A curated list of ✨ awesome ✨ resources for learning about data analytics, machine learning, artificial intelligence, and big data.
- Books
- Podcasts
- Newsletters
- Overviews
- Case Studies, Use Cases, Blogs, Papers
- Data Analytics Process
- Algorithms and Techniques
- APIs, Libraries, Tools
- Big Data
- Courses
- Datasets
- Misc
- Deep Learning - The Straight Dope
- Reinforcement Learning: An Introduction
- Data Science for Business
- Math for Machine Learning: Open Doors to Data Science and Artificial Intelligence
- Advances in Financial Machine Learning (YouTube overview)
- Seeing Theory - A visual introduction to probability and statistics
- Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.
- Introduction to Data Mining
- Data Science from Scratch
- Practical Statistics for Data Scientists
- Learning to Love Data Science
- Doing Data Science
- Data Mining
- Rebooting AI: Building Artificial Intelligence We Can Trust. Gary Marcus. 2019. Commentary by Matt Turck.
- The AI Advantage: How to Put the Artificial Intelligence Revolution to Work (Management on the Cutting Edge). Thomas H. Davenport. 2018.
- Applied Artificial Intelligence: A Handbook For Business Leaders. Mariya Yao. 2018.
- O'Reilly Data Show
- Storytelling with Data
- Data Skeptic
- Linear Digressions
- DataFramed
- This Week in ML and AI
- Machine Learning Guide
- Data Elixir
- KDnuggets News
- O'Reilly Data & AI Newsletter
- Data Science Weekly
- AAAI Alert
- Medium Weekly Digest
- TOPBOTS
- 44 Noteworthy Big Data Statistics in 2019
- The 4 Types of Data Analytics
- What Are Artificial Intelligence, Machine Learning, and Deep Learning?
- Machine Learning "What I really do" panel
- The Data Science Industry: Who Does What (Infographic)
- I have data. I need insights. Where do I start?
- Machine Learning for Economists: An Introduction
- A Gentle Guide to Machine Learning
- A visual introduction to machine learning
- Machine Learning Mindmap / Cheatsheet
- Machine Learning for Humans. Aug 2017.
- 4-Steps to Get Started in Machine Learning March 2014.
- Jason's Machine Learning 101
- Getting Value from Machine Learning Isn’t About Fancier Algorithms — It’s About Making It Easier to Use
- ML Resources
- Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found
- Artificial Intelligence—A Game Changer for Climate Change and the Environment
- AI, For Real. HBR. July 2017.
- A list of artificial intelligence tools you can use today — for personal use
- A list of artificial intelligence tools you can use today — for businesses
- AI and Deep Learning, Explained Simply
- The AI Hierarchy of Needs. August 2017.
- A Survey of 3,000 Executives Reveals How Businesses Succeed with AI. HBR. August 2017.
- How to Regulate Artificial Intelligence. Sep 2017.
- Will AI kill us all after taking our jobs? Sep 2017.
- Is AI Riding a One-Trick Pony?
- 51 Artificial Intelligence (AI) Predictions For 2018. Forbes, Nov 2017.
- How Do Machines Learn? Fun little video.
- What AI can and can’t do (yet) for your business
- The Simple Economics of Machine Intelligence. HBR, 2017.
- Tencent says there are only 300,000 AI engineers worldwide, but millions are needed
- The GANfather: The man who’s given machines the gift of imagination
- What is AI, Really?
- Apple and Its Rivals Bet Their Futures on These Men’s Dreams. An oral history of artificial intelligence, as told by its godfathers, gadflies, and Justin Trudeau.
- Physicist Max Tegmark on the promise and pitfalls of artificial intelligence
- A 6-Minute Intro to AI
- AI Knowledge Map: How To Classify AI Technologies
- Artificial Intelligence in Business Gets Real
- The limitations of deep learning
- Structured Deep Learning
- Using Deep Learning to Solve Real World Problems
- Deep Learning: A Critical Appraisal. Jan 2018.
- Feature Visualization: How neural networks build up their understanding of images
- The Building Blocks of Interpretability
- An Introduction to Deep Learning for Tabular Data. April 2018.
- Educational Data Mining and Learning Analytics
- Learning analytics in higher education: A review of UK and international practice.
- The beginner's guide to predictive workforce analytics.
- Kaggle: Human Resources Analytics
- IBM Employee attrition dataset
- Using Machine Learning to Predict and Explain Employee Attrition
- NYC Analytics. NYC Mayor’s Office of Data Analysis describes their data management system and improvements in operations.
- UK Government, Tax Agent Segmentation.
- Data.gov, Applications
- The NFL’s Brewing Information War
- TED: The math behind basketball's wildest moves
- AI in sports
- NBA Data Analytics: Changing the Game
- Optimize Your Operations With Predictive Maintenance: Leverage Real-Time IoT Data to Anticipate Equipment Failure
- Applied Data Science: Solving a Predictive Maintenance Business Problem
- Targeting Disaster Relief From Space. July 2017.
- Top 10 Videos on Machine Learning in Finance
- Impact Of Artificial Intelligence And Machine Learning on Trading And Investing
- Ghosts in the Machine: AI, risks and regulations in financial markets
- Introduction to Deep Learning Trading in Hedge Funds
- Introduction to Learning to Trade with Reinforcement Learning. Feb 2018.
- CASE STUDY: FERRATUM BANK
- Machine Learning: Challenges, Lessons, and Opportunities in Credit Risk Modeling. July 2017
- Consumer Credit Risk Models via Machine-Learning Algorithms. Amir E. Khandani, Adlar J. Kim, and Andrew W. Lo. 2010.
- How to Build Credit Risk Models Using AI and Machine Learning
- How a Japanese cucumber farmer is using deep learning and TensorFlow. Google ML Blog. August 2016.
- How Artificial Intelligence Is Raising The Bar On The Science Of Marketing. May 2018.
- Identifying churn drivers with Random Forests. Jan 2018.
- Deep Learning With Keras To Predict Customer Churn. Jan 2018.
- A Day in the Life of a Marketing Analytics Professional. Aug 2018.
- What is the most important step in a machine learning project?
- https://machinelearningmastery.com/machine-learning-performance-improvement-cheat-sheet/
- A Basic Recipe for Machine Learning
- Fundamentals of Data Visualization. Great online book by Claus O. Wilke.
- skimr. Excellent R package for data exploration.
- Effectively Using Matplotlib. April 2017.
- What's so hard about histograms?
- GeoSpatial Data Visualization in R. July 2017.
- The 5 Common Mistakes That Lead to Bad Data Visualization
- Three Common Mistakes With Company-level Dashboards. Nov 2017.
- Visualizing Incomplete and Missing Data. Jan 2018.
- Data Visualization Cheat Sheet
- Data-Driven Storytelling
- Visual Vocabulary - Vega Edition
- Common Probability Distributions: The Data Scientist’s Crib Sheet
- How to Perform Data Cleaning for Machine Learning with Python. March 2020.
- The Ultimate Guide to Basic Data Cleaning
- An introduction to data cleaning with R
- Reducing Dimensionality from Dimensionality Reduction Techniques. July 2017.
- Your Data is Being Manipulated
- Dealing with categorical features in machine learning
- How to Handle Imbalanced Classes in Machine Learning. July 2017.
- 8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset
- Understanding Feature Engineering (Part 1) — Continuous Numeric Data. Jan 2018.
- Intro to Feature Engineering with TensorFlow - Machine Learning Recipes #9. Google Developers.
- Feature Hashing (a.k.a. The Hashing Trick) With R
- About Feature Scaling and Normalization – and the effect of standardization for machine learning algorithms
- What metrics should be used for evaluating a model on an imbalanced data set? (precision + recall or ROC=TPR+FPR)
- YouTube: The tradeoff between sensitivity and specificity
- Precision, Recall, AUCs and ROCs. Jan 2015.
- YouTube: Boosting
- YouTube: Bagging
- YouTube: ROC curves
- YouTube: ROC Curves explained
- The Best Metric to Measure Accuracy of Classification Models
- YouTube: How to evaluate a classifier in scikit-learn
- Performance Metrics for Classification problems in Machine Learning
- Choosing the Right Metric for Evaluating Machine Learning Models — Part 2
- YouTube: Kappa Coefficient
- Understanding Classification Thresholds Using Isocurves
- How to squeeze the most from your training data
- Visualizing Cross-validation Code. Sep 2017.
- Selecting the best model in scikit-learn using cross-validation. June 2015.
- Bias-Variance Tradeoff in Machine Learning
- Part 4 - The Bias-Variance Dilemma July 2017.
- Visual Intro to Machine Learning - Part 2
- Putting ML in Production
- Machine Learning Engineering
- Software development best practices in a deep learning environment. April 2019.
- Software Engineering for Machine Learning: A Case Study. Blog commentary on this paper.
- Machine Learning Software Engineering: Top Five Best Practices
- Best Practices in Machine Learning Infrastructure. July 2019.
- Rules of Machine Learning: Best Practices for ML Engineering
- Using Machine Learning to Predict Value of Homes On Airbnb. Airbnb Blog. July 2017.
- Google Quick, Draw!
- NYTimes: Will you Graduate? Ask Big Data.
- Analyzing 1.1 billion NYC taxi and Uber trips
- The Next Wave: Predicting the future of coffee in New York City. Medium, Sep 2017.
- New-Age Machine Learning Algorithms in Retail Lending. Sep 2017.
- Comparing supervised learning algorithms. Feb 2015.
- How to choose algorithms for Microsoft Azure Machine Learning
- An Empirical Comparison of Supervised Learning Algorithms
- Machine Learning Cheat sheet
- Machine Learning: Patterns for Predictive Analytics
- Machine Learning Algorithm Cheat Sheet. Sep 2014.
- Cheat Sheet – 10 Machine Learning Algorithms & R Commands. Jan 2015.
- scikit-learn: Choosing the right estimator
- Video: Hello World - Machine Learning Recipes #1. Mar 2016. Google Developers.
- Top 10 data mining algorithms in plain English
- Understanding Machine Learning Algorithms
- The 5 Clustering Algorithms Data Scientists Need to Know
- Hierarchical Clustering in R. April 2017.
- dendextend: a package for visualizing, adjusting, and comparing dendrograms
- A Practical Guide to Tree Based Learning Algorithms. July 2017.
- Blog Post: Machine Learning Made Easy with Talend – Decision Trees
- Blog post: Why do decision trees work?
- Video: Visualizing a Decision Tree - Machine Learning Recipes #2. Mar 2016. Google Developers.
- Book Chapter: Classification: Basic Concepts, Decision Trees, and Model Evaluation
- The `caret` package
- How Decision Trees Work
- Awesome Decision Tree Research Papers
- Understanding Decision Trees for Classification in Python
- Video: MarI/O - Machine Learning for Video Games. June 2015.
- Introduction to Neural Networks, Advantages and Applications
- Summary of Unintuitive Properties of Neural Networks
- Neuroscience-Inspired Artificial Intelligence
- 37 Reasons why your Neural Network is not working
- 7 Steps to Understanding Deep Learning
- Neural Network Foundations, Explained: Activation Function
- Neural Network from Scratch
- But what is a Neural Network? | Deep learning, Part 1
- Gradient descent, how neural networks learn | Deep learning, part 2
- Ranking Popular Deep Learning Libraries for Data Science. Oct 2017
- Exploring Recurrent Neural Networks. Dec 2017.
- Convolutional Neural Networks in Python with Keras. DataCamp tutorial. Dec 2017.
- When reinforcement learning should not be used?
- Hacker's guide to Neural Networks. Andrej Karpathy's blog.
- Deep Reinforcement Learning: Pong from Pixels. Andrej Karpathy's blog. May 2016.
- The Unreasonable Effectiveness of Recurrent Neural Networks. Andrej Karpathy's blog. 2015.
- The Neural Network Zoo. Sep 2016.
- A Simple Starter Guide to Build a Neural Network
- Deep Learning: Which Loss and Activation Functions should I use?
- Creating an Artificial Neural Network from Scratch in R. GitHub tutorial.
- MIT Deep Learning Basics: Introduction and Overview with TensorFlow. Feb 2019.
- Bayes’ Rule Applied - Using Bayesian Inference on a real-world problem
- A practical explanation of a Naive Bayes classifier
- Nomograms for Visualization of Naive Bayesian Classifier
- Support Vector Machines: A Simple Explanation
- An introduction to Support Vector Machines (SVM)
- Support Vector Machine (SVM) Tutorial. August 2017.
- A Gentle Introduction on Market Basket Analysis — Association Rules Sep 2017
- Association Rules and the Apriori Algorithm: A Tutorial
- Kaggle: Frequent Itemsets and Association Rules
- Association Analysis Simplified
- A Novel Method of Interestingness Measures for Association Rules Mining Based on Profit.
- The Research on Measure Method of Association Rules Mining
- Interestingness Measures for Data Mining: A Survey
- Recommender Systems 101 – a step by step practical example in R
- Using R package, recommenderlab, for predicting ratings for MovieLens data
- Recommender Systems Comparison
- Building a Movie Recommendation System
- Building a Music Recommender with Deep Learning. Content-based.
- Recommendation System Algorithms: An Overview. July 2017.
- Spotify’s Discover Weekly: How machine learning finds your new music. Oct 2017.
- Instacart Market Basket Analysis, Winner's Interview: 2nd place, Kazuki Onodera
- Machine learning at Spotify: You are what you stream
- What makes a good recommender system? Rubikloud blog, March 2017.
- Exploring Recommendation Systems. Jan 2018.
- Production Recommendation Systems with Cloudera
- Listing Embeddings for Similar Listing Recommendations and Real-time Personalization in Search
- Predicting movie ratings and recommender systems
- How does Netflix recommend movies? Matrix Factorization
- A survey of food recommenders
- Recommenders galore
- Simulacra And Selection
- Video: Ensemble
- Ensemble Learning to Improve Machine Learning Results. Sep 2017.
- Interpretable Machine Learning with XGBoost
- Network Analysis and Visualization with R and igraph
- Graph-powered Machine Learning at Google. October 2016.
- Systems Applications of Social Networks. ACM Computing Surveys, Sep 2017.
- Data Mining for Predictive Social Network Analysis – Brazil Elections Case Study. Nov 2015.
- The Star Wars social networks – who is the central character? Dec 2015.
- GRAKN.AI: Example Projects
- Visual network analysis with Gephi
- Network science reveals the secrets of the world’s best soccer team
- 5 Ways to Get Started with Reinforcement Learning
- Reinforcement Learning and Its Practical Applications
- Reinforcement Learning - Ep. 30 (Deep Learning SIMPLIFIED)
- Reinforcement Learning Basics
- Reinforcement Learning Explained
- Q Learning Explained
- A Tutorial on Reinforcement Learning I
- MIT 6.S191 Lecture 6: Deep Reinforcement Learning
- Reinforcement Learning FAQ: Frequently Asked Questions about Reinforcement Learning
- Exclusive: Interview with Rich Sutton, the Father of Reinforcement Learning
- Simple Reinforcement Learning with Tensorflow Part 0: Q-Learning with Tables and Neural Networks. Medium. August 2016. 9-part series.
- Schooling Flappy Bird: A Reinforcement Learning Tutorial
- TensorFlow Tutorial For Beginners. July 2017.
- Square off: Machine learning libraries. Jan 2018.
- Comparing Machine Learning as a Service: Amazon, Microsoft Azure, Google Cloud AI. Jan 2018.
- Comparing Top Deep Learning Frameworks: Deeplearning4j, PyTorch, TensorFlow, Caffe, Keras, MxNet, Gluon & CNTK
- R for Reproducible Scientific Analysis: Reference. A nice set of tutorials from Software Carpentry.
- R for Data Science. An excellent online book by Garrett Grolemund and Hadley Wickham.
- Awesome R - A curated list of awesome R packages and tools.
- swirl: Learn R, in R. swirl teaches you R programming and data science interactively, at your own pace, and right in the R console!
- FREE COURSE: Introduction to R
- Data Import Cheat Sheet
- Data Transformation Cheat Sheet
- Sparklyr Cheat Sheet
- R Markdown Cheat Sheet
- R Markdown Reference Guide
- RStudio IDE Cheat Sheet
- Data Visualization Cheat Sheet
- caret package: classification and regression training
- DataCamp: Cleaning Data in R
- DataCamp: Joining Data in R with dplyr
- DataCamp: Data Manipulation in R with dplyr
- Tidyverse, an opinionated Data Science Toolbox in R from Hadley Wickham
- aRrgh: a newcomer’s (angry) guide to R
- Making R Code Faster: A Case Study
- TensorFlow for R
- Introducing ViewPipeSteps: Towards Observable Programming in R
- FastR
- DataCamp: Intro to Python for Data Science. Free online course.
- DataCamp: All Python courses. Free and paid online courses.
- Software Carpentry: Programming with Python. Free online course.
- The Google Python class. Free online course.
- Coursera: Python for Everybody. Free online course.
- Learn Python the Hard Way. Free online book.
- YouTube: Python Programming
- YouTube videos of old Khan Academy lectures: Python
- Data Science from Scratch. Book.
- Python for Data Analysis. Book.
- Awesome Python. Free curated list of more Python resources.
- KDNuggets: 7 Steps to Mastering Machine Learning With Python. Article.
- A Dramatic Tour through Python’s Data Visualization Landscape (including ggplot and Altair). Blog post.
- Python Seaborn Cheat Sheet For Statistical Data Visualization. Aug 2017.
- Top 15 Python Libraries for Data Science in 2017
- 6 Reasons Why Python Is Suddenly Super Popular
- A Visual Intro to NumPy and Data Representation
- A landscape diagram for Python data. March 2019.
- SQL Murder Mystery
- DataCamp: Intro to SQL for Data Science
- DataCamp: Joining Data in PostgreSQL
- SW Carpentry: Databases and SQL
- The SQL Tutorial for Data Analysis
- SQL is 43 years old - here’s 8 reasons we still use it today. April 2017. HN Post.
- SQL Tutorial: How To Write Better Queries
- Why SQL is beating NoSQL, and what this means for the future of data
- Franchise: An open-source notebook for SQL
- SQL Window Functions to Pass a Data Analytics Interview (Opinionated SQL Series Part 2/N)
- Select Star SQL. This is an interactive book which aims to be the best place on the internet for learning SQL.
- SQL Interview Questions: 3 Tech Screening Exercises (For Data Analysts)
- OLAP queries in SQL: A Refresher
- 21 of the best free resources to learn SQL
- The Weka Workbench
- Video: Weka Data Mining Tutorial for First Time & Beginner Users
- Videos: WekaMOOC: Data Mining with Weka
- FutureLearn: Data Mining with Weka
- Learning Spark: Lightning-Fast Big Data Analysis
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale
- Databricks Whitepapers
- Fast track Apache Spark. Sep 2017.
- Should Spark have an API for R?
- Quora: How do I learn Apache Spark?
- Apache Spark: A Unified Engine for Big Data Processing
- Using Apache Spark to Analyze Large Neuroimaging Datasets. August 2016.
- Apache Spark @Scale: A 60 TB+ production use case. August 2016.
- Big Data Processing with Apache Spark – Part 1: Introduction
- Intro to Apache Spark
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- A Powerful Big Data Trio: Spark, Parquet and Avro
- Interactive Analysis
- The RDD API by example
- Why Apache Spark is a Crossover Hit for Data Scientists
- Building a food recommendation engine with Spark / MLlib and Play
- Movie Recommendations and More With Spark
- Blog post: $1.44 per terabyte: setting a new world record with Apache Spark Nov 2016
- Blog post: How-to: Predict Telco Churn with Apache Spark MLlib
- KDNuggets: 7 Steps to Mastering Apache Spark 2.0
- Databricks: Introducing Apache Spark 2.0
- KDNuggets: Apache Spark Key Terms, Explained
- Article: Spark Streaming: What Is It and Who’s Using It? Nov 2015.
- Spark Summit 2013 - The State of Spark, and Where We're Going Next - Matei Zaharia
- First Steps with Spark - Screencast #1
- Spark Documentation Overview – Screencast #2
- Transformations and Caching - Spark Screencast #3
- A Standalone Job in Scala - Spark Screencast #4
- Apache Spark on YouTube
- Advanced Apache Spark Training
- Structuring Apache Spark 2.0
- Apache Spark 2.0: A Deep Dive Into Structured Streaming
- YouTube: Spark and Spark Streaming at Uber - Meetup talk with Tathagata Das
- YouTube: Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
- YouTube: Intro to Spark Streaming
- RStudio Webinar: Using Spark with Shiny and R Markdown
- Big Data Architecture: A Complete and Detailed Overview
- The Infrastructure Behind Twitter: Scale. Jan 2017.
- Study on Big Data in Public Health, Telemedicine and Healthcare. Dec 2016.
- Michael Stonebraker | Big Data is (at least) Four Different Problems
- Don't use Hadoop - your data isn't that big. 2013. (HN discussion.)
- Data Lake – the evolution of data processing
- Why NoSQL Database?
- 7 Steps to Understanding NoSQL Databases
- Types of NoSQL databases and key criteria for choosing them
- NoSQL Data Modeling Techniques
- NoSQL Databases: a Survey and Decision Guidance
- Stack Overflow: What does “Document-oriented” vs. Key-Value mean when talking about MongoDB vs Cassandra?
- Visual Guide to NoSQL Systems
- NoSQL for Dummies
- Video: GOTO 2012 • Introduction to NoSQL • Martin Fowler
- Video: The Art Of Database Design
- Python Course: Lambda, filter, reduce and map
- HPC MapReduce Exercise: Hands-On Lab
- Book: Data-Intensive Text Processing with MapReduce
- Blog: MapReduce Questions and Answers
- Every single Machine Learning course on the internet, ranked by your reviews
- CMU: Statistical Machine Learning
- CMU: Introduction to Machine Learning
- Elite Data Science
- Machine Learning Crash Course
- Stanford: Data Mining Certificates Online
- MIT OpenCourseware
- UCI Machine Learning Repository
- Kaggle Datasets
- r/datasets
- Awesome public datasets
- R's `datasets` package
- Stanford Large Network Dataset Collection
- Data is Plural
- FiveThirtyEight's datasets
- 9 Must-Have Datasets for Investigating Recommender Systems
- Datasets for Recommender Systems
- Wikipedia: List of datasets for ML research
- Google Dataset Search
- Data Commons
- Recommender Systems Datasets
- The State of Data Science and Machine Learning, 2017 Survey
- AI and Deep Learning in 2017 – A Year in Review. Dec 31, 2017. WildML.
- Scaling Analytics at Wish. Jan 8, 2018.
- 30 Amazing Machine Learning Projects for the Past Year (v.2018). Jan 2018.
- Ethical Data Practices
- Big Companies Are Embracing Analytics, But Most Still Don’t Have a Data-Driven Culture
- Data Science and Machine Learning Interview Questions
- How to Build Disruptive Data Science Teams: 10 Best Practices
- NLP Interview Questions
- Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing
- DataCamp: Intro to data.world in Python
- Data Science Weekly's Data Science Resources
- 7 command line tools for data science
- Silicon Valley siphons our data like oil. But the deepest drilling has just begun. Aug 2017
- Huge Trello List of Great Data Science Resources
- Fairness in Machine Learning. NIPS 2017 tutorial.
- The Product Possibilities of Interpretability
- When an AI finally kills someone, who will be responsible?
- Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead