
Self-Learning Data Science Roadmap for anyone interested in how to break into the field


savery1/Self-Learning-Data-Science-Roadmap


Self-Learning-Data-Science-Roadmap

Self-Learning Data Science Roadmap for anyone interested in how to break into the field!

This repository is intended to provide a free self-learning roadmap for the field of Data Science. It collects some of the best resources I have found scattered across the internet (mostly free, with a few paid ones) and consolidates them into a single repository to help build the mindset and skills required to learn the field.

Data Science vs Data Analytics vs Data Engineering - What's the Difference?

These terms are often wrongly used interchangeably, but there are distinct differences:

  • Data Science is a multidisciplinary field that focuses on examining raw and structured data sets and providing potentially actionable insights. The field of Data Science is concerned with making sure we are asking the right questions, as opposed to finding exact answers. Data Scientists require skill sets centered on Computer Science, Mathematics, and Statistics. Data Scientists analyze data with techniques such as machine learning, trend analysis, linear regression, and predictive modeling, and apply these techniques using tools such as Python and R.
  • Data Analytics focuses on examining existing data sets and creating solutions to capture, process, and organize data in order to draw actionable insights. The field of Data Analytics looks for general process, business, and engineering improvements we can make based on questions we don't yet know the answers to. Data Analysts require skill sets centered on Statistics, Mathematics, and a high-level understanding of Computer Science. Data Analytics involves data cleaning, data visualization, and simple modeling. Common Data Analytics tools include Microsoft Power BI, Tableau, and SQL.
  • Data Engineering focuses on creating the infrastructure and tools required to support the business. Data Engineers determine the optimal ways to store and extract data, which involves writing scripts and building data warehouses. Data Engineers require skill sets centered on Software Engineering, Computer Science, and high-level Data Science. The tools Data Engineers mainly use are Python, Java, Scala, Hadoop, and Spark.

Recommended Path to Follow

To take full advantage of this repository, I suggest the following:

  • Brush up on the math/stats
  • Learn Python basics
  • Learn Python advanced
  • Learn Python basic Data Science libraries (NumPy, pandas, Matplotlib, Seaborn, etc.)
  • Learn Python Machine Learning (Scikit-Learn)
  • Learn Python Deep Learning (TensorFlow)
  • Practice all of the above
  • Build a basic project

Legend

  • 📹 Video Content
  • 📕 Online Article Content

Medium Articles

---> More to come on this. I will be publishing my own content on Medium (links to the articles will be posted here), going in depth on how to analyze basic problems and models using techniques such as linear regression, A/B testing, neural networks, and high-level machine learning. I'll also cover examples of how to use visual applications such as Microsoft Power BI to build reports and dashboards.

Statistics / Probability

Descriptive Statistics
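As a quick illustration of the usual summary measures, here is a minimal sketch using only Python's built-in `statistics` module. The data values are hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

# Hypothetical sample data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # arithmetic average
median = statistics.median(data)        # middle of the sorted data (average of the two middle values here)
mode = statistics.mode(data)            # most frequent value
pvariance = statistics.pvariance(data)  # population variance (divides by n)
pstdev = statistics.pstdev(data)        # population standard deviation
stdev = statistics.stdev(data)          # sample standard deviation (divides by n - 1)

print(mean, median, mode, pvariance, pstdev, round(stdev, 3))
```

The sample standard deviation is slightly larger than the population one because it divides by n - 1 instead of n.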

Probability

Combinations and Permutations
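The difference between the two comes down to whether order matters. A minimal sketch, assuming Python 3.8+ (where `math.perm` and `math.comb` were added), counting ways to pick 3 items out of 5:

```python
import itertools
import math

n, k = 5, 3

perms = math.perm(n, k)  # ordered arrangements: n! / (n - k)!
combs = math.comb(n, k)  # unordered selections: n! / (k! * (n - k)!)

# Cross-check by brute-force enumeration
assert perms == len(list(itertools.permutations(range(n), k)))
assert combs == len(list(itertools.combinations(range(n), k)))

# Each unordered selection of k items can be ordered in k! ways
assert combs == perms // math.factorial(k)

print(perms, combs)  # 60 10
```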

Distributions

Confidence Intervals
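A confidence interval for a mean is typically the sample mean plus or minus a critical value times the standard error. The sketch below assumes a normal approximation (z critical value) and uses a hypothetical sample. For small samples like this one, a t critical value would give a slightly wider, more appropriate interval, but the standard library has no t distribution.

```python
import math
import statistics

# Hypothetical sample of 10 measurements, for illustration only
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# Critical value for a two-sided 95% interval under the normal approximation (about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)

lower, upper = mean - z * sem, mean + z * sem
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```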

Hypothesis Testing
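The roadmap does not prescribe a particular test, so as one assumed example that needs only the standard library, here is a two-sided permutation test for a difference in means. The null hypothesis is that group labels don't matter; the p-value is the fraction of random relabelings that produce a difference at least as large as the observed one. The function name and the data are hypothetical.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # The +1 correction counts the observed split itself and keeps p strictly positive
    return (count + 1) / (n_resamples + 1)


# Hypothetical measurements for two groups
group_a = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9]
group_b = [10.9, 11.2, 11.0, 10.8, 11.1, 11.3]

p_value = permutation_test(group_a, group_b)
print(f"p = {p_value:.4f}")  # a small p suggests the means genuinely differ
```

A permutation test makes no normality assumption, which is why it is a convenient first example; classic parametric tests such as the t-test answer the same question under stronger assumptions.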

Linear Algebra

Vectors and Spaces

Dot Product
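The dot product multiplies matching components and sums them. A minimal pure-Python sketch (the helper name `dot` is hypothetical) that also uses the dot product to get the cosine of the angle between two vectors:

```python
import math


def dot(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


u = [1, 2, 3]
v = [4, 5, 6]
print(dot(u, v))  # 1*4 + 2*5 + 3*6 = 32

# u . v = |u| |v| cos(theta), so the dot product also measures the angle between vectors
cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Perpendicular (orthogonal) vectors have a dot product of zero
print(dot([1, 0], [0, 1]))  # 0
```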

Matrix Transformations

Eigenvalues and Eigenvectors
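For a 2x2 matrix [[a, b], [c, d]], the eigenvalues are the roots of the characteristic polynomial λ² - (a + d)λ + (ad - bc) = 0. A minimal sketch that handles only the real-eigenvalue case (the helper names are hypothetical):

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] via the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    disc = trace ** 2 - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


A = [[2, 1], [1, 2]]
l1, l2 = eigenvalues_2x2(2, 1, 1, 2)
print(l1, l2)  # 3.0 1.0

# An eigenvector only gets scaled by A: A v = λ v
print(matvec(A, [1, 1]))   # [3, 3]  -> 3 * [1, 1]
print(matvec(A, [1, -1]))  # [1, -1] -> 1 * [1, -1]
```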

Integrals

Download Python

Use this link to download Anaconda. Anaconda is a distribution that installs Python along with the Spyder IDE and Jupyter Notebook.

Python Programming Basics

Python Programming Advanced

Python Data Science Libraries

NumPy

NumPy is a Python library used for working with arrays.

  • 📹 Complete Python NumPy Tutorial - This video is about an hour long but provides a basic understanding of how to use the NumPy library, for those of you who would rather watch than read.
  • 📹 NumPy Explained in 5 Minutes - For those of you who want the 5-minute explanation.
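As a first taste of working with arrays, here is a minimal sketch assuming NumPy is installed (Anaconda includes it). It shows that arithmetic on arrays is element-wise, without writing explicit loops.

```python
import numpy as np

a = np.array([1, 2, 3, 4])   # 1-D array built from a Python list
b = np.arange(4)             # [0 1 2 3]

total = a + b                # element-wise addition, no explicit loop
scaled = a * 10              # the scalar is applied to every element
zeros = np.zeros((2, 3))     # 2 x 3 array of floats

print(total)                  # [1 3 5 7]
print(a.mean(), zeros.shape)  # 2.5 (2, 3)
```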

NumPy Basics

Shape Manipulation
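A short sketch of common shape operations on the same 12 values, assuming NumPy is installed:

```python
import numpy as np

a = np.arange(12)                  # shape (12,)
grid = a.reshape(3, 4)             # same 12 values as 3 rows x 4 columns
col = a.reshape(-1, 1)             # -1 lets NumPy infer that dimension: shape (12, 1)
flat = grid.ravel()                # back to 1-D: shape (12,)
transposed = grid.T                # rows and columns swapped: shape (4, 3)
stacked = np.vstack([grid, grid])  # stacked vertically: shape (6, 4)

print(grid.shape, col.shape, transposed.shape, stacked.shape)  # (3, 4) (12, 1) (4, 3) (6, 4)
```

Reshaping never changes the number of elements, so `a.reshape(5, 3)` would raise a `ValueError` here.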

Copies and Views
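A common source of bugs is modifying a view and unexpectedly changing the original array. A minimal sketch, assuming NumPy is installed, contrasting a slice (view), an explicit copy, and integer-array indexing (which always copies):

```python
import numpy as np

a = np.arange(6)          # [0 1 2 3 4 5]

view = a[1:4]             # basic slicing returns a view of a's data
view[0] = 100             # writing through the view also changes a

copy = a[1:4].copy()      # an explicit copy owns its own data
copy[0] = -1              # a is unaffected

fancy = a[[0, 1]]         # integer-array ("fancy") indexing returns a copy

print(a)                                                      # [  0 100   2   3   4   5]
print(np.shares_memory(a, view))                              # True
print(np.shares_memory(a, copy), np.shares_memory(a, fancy))  # False False
```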

Less Basic

Advanced Indexing and Index Tricks
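A brief sketch of integer-array and boolean indexing, assuming NumPy is installed; the arrays are arbitrary examples:

```python
import numpy as np

a = np.arange(10) * 2         # [ 0  2  4  6  8 10 12 14 16 18]

picked = a[[1, 3, 5]]         # integer-array indexing: [ 2  6 10]
mask = a > 10                 # boolean array, True where the condition holds
large = a[mask]               # boolean indexing: [12 14 16 18]

clipped = a.copy()
clipped[clipped > 10] = 0     # assign through a boolean mask

m = np.arange(9).reshape(3, 3)
diag = m[[0, 1, 2], [0, 1, 2]]     # paired row/column indices: [0 4 8]
block = m[np.ix_([0, 2], [1, 2])]  # np.ix_ selects a sub-grid: rows 0, 2 and columns 1, 2
```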

Linear Algebra
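The linear algebra topics above map onto `numpy.linalg`. A minimal sketch, assuming NumPy is installed, that solves an illustrative system 3x + y = 9, x + 2y = 8 and checks an eigenpair numerically:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)     # solves A @ x = b; generally preferred over inv(A) @ b
print(x)                      # approximately [2, 3]

det = np.linalg.det(A)        # 3*2 - 1*1 = 5, up to floating-point rounding
eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are the eigenvectors

# Check A v = λ v for the first eigenpair
v = eigvecs[:, 0]
print(np.allclose(A @ v, eigvals[0] * v))  # True
```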

Tricks and Tips

Additional Resources

  • 📕 101 Python Exercises - Highly recommend doing these after completing all the lessons above. Each exercise poses a question and shows the desired output and the solution. If you can't solve a question, research it online first before looking at the solution.
  • 📕 Extensive NumPy Tutorial - An extensive tutorial that covers NumPy and its relation to other Python features.

Pandas

---> More to come on this

Matplotlib

---> More to come on this

Seaborn

---> More to come on this

Scikit-Learn

---> More to come on this

TensorFlow

---> More to come on this

Recommended Projects

---> More to come on this

Recommended Paid Content for Additional Learning

---> More to come on this
