Self-Learning Data Science Roadmap for anyone interested in how to break into the field!
This repository is intended to provide a free Self-Learning Roadmap for the field of Data Science. It consolidates some of the best resources I have found scattered across the internet (most are free, a few are paid) into a single repository to help you build the mindset and skills required to learn the field.
The terms Data Science, Data Analytics, and Data Engineering are often wrongly used interchangeably, but there are distinct differences:
- Data Science is a multidisciplinary field that focuses on analyzing raw and structured data sets to provide potentially actionable insights. Data Science is concerned with making sure we are asking the right questions, as opposed to finding exact answers. Data Scientists require skill sets centered on Computer Science, Mathematics, and Statistics, and they analyze data with techniques such as machine learning, trend analysis, linear regression, and predictive modeling. The tools Data Scientists use to apply these techniques include Python and R.
- Data Analytics focuses on examining existing data sets and creating solutions to capture, process, and organize data in order to draw actionable insights. Data Analytics looks for general process, business, and engineering improvements based on questions we don't yet know the answers to. Data Analysts require skill sets centered on Statistics, Mathematics, and a high-level understanding of Computer Science. The work involves data cleaning, data visualization, and simple modeling. Common Data Analytics tools include Microsoft Power BI, Tableau, and SQL.
- Data Engineering focuses on building the infrastructure and tools required to support the business. Data Engineers determine the optimal ways to store and extract data, which involves writing scripts and building data warehouses. Data Engineers require skill sets centered on Software Engineering, Computer Science, and a high-level understanding of Data Science. The tools Data Engineers use are mainly Python, Java, Scala, Hadoop, and Spark.
To take full advantage of this repository, I suggest the following:
- Brush up on the math/stats
- Learn Python basics
- Learn advanced Python
- Learn the basic Python Data Science libraries (NumPy, pandas, Matplotlib, seaborn, etc.)
- Learn Machine Learning in Python (Scikit-Learn)
- Learn Deep Learning in Python (TensorFlow)
- Practice all of the above
- Build a basic project
- 📹 Video Content
- 📖 Online Article Content
---> More to come on this. I will be publishing my own content on Medium (links to the articles will be posted here), going into depth on how to analyze basic problems and build models using techniques such as linear regression, A/B testing, neural networks, and high-level machine learning. I'll also cover examples of how to use visualization tools such as Microsoft Power BI to build reports and dashboards.
- 📹 Theoretical Probability
- 📹 Sample Spaces
- 📹 Set Operations
- 📹 Addition Rule
- 📹 Multiplication Rule for Independent Events
- 📹 Multiplication Rule for Dependent Events
- 📹 Conditional Probability and Independence
- 📹 Counting Principle and Factorial
- 📹 Permutations
- 📹 Combinations
- 📹 Normal Distribution and the Empirical Rule
- 📹 Introduction to Sampling Distributions
- 📹 Sampling Distribution of a Sample Proportion
- 📹 Sampling Distribution of a Sample Mean
- 📹 Hypothesis Testing
- 📹 Error Probabilities and Power
- 📹 Tests about a Population Proportion
- 📹 Tests about a Population Mean
- 📹 Vectors
- 📹 Linear Combinations and Span
- 📹 Linear Dependence and Independence
- 📹 Linear Subspace
- 📹 Functions and Linear Transformations
- 📹 Transformations and Matrix Multiplication
- 📹 Inverse Functions and Transformations
- 📹 Inverses and Determinants
- 📹 Transpose of a Matrix
- 📹 Approximation with Riemann Sums
- 📹 Definite Integrals with Riemann Sums
- 📹 The Fundamental Theorem of Calculus and Accumulation Functions
- 📹 Properties of Definite Integrals
- 📹 The Fundamental Theorem of Calculus and Definite Integrals
- 📹 Reverse Power Rule
- 📹 Indefinite Integrals of Common Functions
- 📹 Definite Integrals of Common Functions
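Several of the probability, statistics, linear algebra, and calculus topics above reduce to a few lines of code, which is a handy way to check your understanding after watching the videos. The following is a minimal sketch using only the Python standard library (`math.perm`, `math.comb`, and `statistics.NormalDist`; the type hints assume Python 3.9+). Every function name here is a hypothetical helper for illustration, not something taken from the linked courses, and the z-test assumes a sample large enough for the normal approximation to hold.

```python
import math
from statistics import NormalDist
from typing import Callable


def count_arrangements(n: int, k: int) -> tuple[int, int]:
    """Ordered arrangements (permutations) and unordered selections
    (combinations) of k items chosen from n."""
    return math.perm(n, k), math.comb(n, k)


def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """P(A | B) = P(A and B) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be nonzero")
    return p_a_and_b / p_b


def within_k_sigma(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations
    of the mean (the empirical rule: ~68%, ~95%, ~99.7% for k = 1, 2, 3)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


def proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Two-sided one-sample z-test of H0: p = p0 (normal approximation)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # computed under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def inverse_2x2(m: list[list[float]]) -> list[list[float]]:
    """Inverse of a 2x2 matrix [[a, b], [c, d]] via its determinant ad - bc."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is 0)")
    return [[d / det, -b / det], [-c / det, a / det]]


def left_riemann_sum(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Approximate the definite integral of f over [a, b] with n left rectangles."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx
```

For example, `count_arrangements(5, 2)` returns `(20, 10)`, `proportion_z_test(60, 100, 0.5)` gives z close to 2.0 with a p-value of about 0.0455, and `left_riemann_sum(lambda x: x**2, 0, 1, 1000)` comes out close to the exact value 1/3.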
Use this link to download Anaconda. Anaconda is a distribution that installs Python (along with the Spyder IDE) and Jupyter Notebooks.
- 📖 Hello, World!
- 📖 Variables and types
- 📖 Lists
- 📖 Basic Operators
- 📖 String Formatting
- 📖 Basic String Operations
- 📖 Conditions
- 📖 Loops
- 📖 Functions
- 📖 Classes and Objects
- 📖 Dictionaries
- 📖 Modules and Packages
- 📖 Generators
- 📖 List Comprehensions - Know this.
- 📖 Multiple Function Arguments
- 📖 Regular Expressions
- 📖 Exception Handling
- 📖 Sets
- 📖 Serialization
- 📖 Partial Functions
- 📖 Code Introspection
- 📖 Closures
- 📖 Decorators - Know this.
- 📖 Map, Filter, Reduce - Know this.
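The items marked "Know this" (list comprehensions, decorators, and map/filter/reduce) come up constantly in data science code. Below is a minimal sketch that exercises them, plus closures and generators from the same list. Names such as `count_calls`, `make_multiplier`, and `countdown` are hypothetical examples, not taken from the linked tutorials.

```python
from functools import reduce, wraps


# Decorator: wraps a function and counts how many times it has been called
def count_calls(func):
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    """Return x squared."""
    return x * x


numbers = [1, 2, 3, 4, 5, 6]

# List comprehension: squares of the even numbers
even_squares = [square(n) for n in numbers if n % 2 == 0]

# The same result built with filter and map
even_squares_mf = list(map(square, filter(lambda n: n % 2 == 0, numbers)))

# reduce folds a sequence into a single value, here the product of all numbers
product = reduce(lambda acc, n: acc * n, numbers, 1)


# Closure: the inner function remembers `factor` after make_multiplier returns
def make_multiplier(factor):
    def multiply(x):
        return x * factor
    return multiply


triple = make_multiplier(3)


# Generator: yields values lazily instead of building the whole list up front
def countdown(n):
    while n > 0:
        yield n
        n -= 1


print(even_squares)        # [4, 16, 36]
print(even_squares_mf)     # [4, 16, 36]
print(product)             # 720
print(square.calls)        # 6 (three calls from each version above)
print(triple(7))           # 21
print(list(countdown(3)))  # [3, 2, 1]
```

The comprehension and the `map`/`filter` version produce the same list; comprehensions are generally considered the more readable choice in Python, while `map`/`filter` tend to appear in functional-style code.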
NumPy is a Python library used for working with arrays.
- 📹 Complete Python NumPy Tutorial - An hour-long video that gives a basic understanding of how to use the NumPy library, for those of you who would rather listen than read.
- 📹 NumPy Explained in 5 Minutes - For those of you who want the 5-minute explanation.
- 📖 NumPy Example
- 📖 Array Creation
- 📖 Printing Arrays
- 📖 Basic Operations
- 📖 Universal Functions
- 📖 Indexing, Slicing, Iterating
- 📖 Shape Manipulation
- 📖 Stacking together different arrays
- 📖 Splitting one array into several smaller ones
- 📖 No Copy At All
- 📖 View or Shallow Copy
- 📖 Deep Copy
- 📖 Functions and Methods Overview
- 📖 Broadcasting
- 📖 Indexing with Arrays of Indices
- 📖 Indexing with Boolean Arrays
- 📖 The ix_() function
- 📖 Indexing with strings
- 📖 Automatic Reshaping
- 📖 Vector Stacking
- 📖 101 Python Exercises - I highly recommend doing these after finishing all the lessons above. Each exercise states the question, shows the desired output, and provides the solution. If you can't solve a question, research it online before looking at the solution.
- 📖 Extensive NumPy Tutorial - An extensive tutorial that covers NumPy and its relation to other Python features.
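Several of the NumPy topics above (automatic reshaping, broadcasting, boolean indexing, no copy vs. view vs. deep copy, `ix_()`, and stacking/splitting) fit into a few lines. Here is a minimal sketch, assuming NumPy is installed (it ships with Anaconda); the array values are arbitrary examples.

```python
import numpy as np

# Automatic reshaping: -1 tells NumPy to infer that dimension (12 / 3 = 4)
a = np.arange(12).reshape(3, -1)   # rows [0..3], [4..7], [8..11]

# Broadcasting: the (4,) row is stretched across each of the 3 rows of `a`
row = np.array([10, 20, 30, 40])
b = a + row                        # b[0] is [10, 21, 32, 43]

# Boolean indexing: keep elements where the mask is True (returns a copy)
evens = a[a % 2 == 0]              # [0, 2, 4, 6, 8, 10]

# No copy vs. view vs. deep copy
same = a                           # plain assignment: no copy, same object
view = a[0]                        # basic slicing: a view sharing a's memory
deep = a[0].copy()                 # .copy(): an independent array
view[0] = 99                       # writes through to a[0, 0]
deep[1] = -1                       # leaves `a` unchanged

# np.ix_: rows 0 and 2 crossed with columns 1 and 3
sub = a[np.ix_([0, 2], [1, 3])]    # [[1, 3], [9, 11]]

# Stacking and splitting
stacked = np.vstack([a, row])      # `row` appended as a 4th row, shape (4, 4)
left, right = np.hsplit(a, 2)      # two (3, 2) arrays, split column-wise
```

Because `view` shares memory with `a`, writing to it changes `a` as well; call `.copy()` whenever you need an array you can modify independently.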
---> More to come on the remaining roadmap steps: pandas, Matplotlib, seaborn, Scikit-Learn, TensorFlow, practice, and a basic project.