This tutorial aims to give a brief overview of four of the best-known Python tools for data analysis:
- Jupyter - Jupyter is where we will run our code and document our findings and methods.
- NumPy - NumPy is the numerical library that most scientific Python libraries are built on. It provides n-dimensional arrays and matrices, which pure Python lacks.
- Pandas - Pandas is a well-known library built on top of NumPy. It makes many data-handling tasks easier by adding labeled columns and indexes.
- Matplotlib - Matplotlib is a widely used library dedicated to data visualization.
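To make the difference between pure Python, NumPy, and Pandas concrete, here is a minimal sketch (not taken from the notebooks). It assumes NumPy and Pandas are installed; the numbers and the labels `x`, `y`, `row1`, and `row2` are arbitrary examples.

```python
import numpy as np
import pandas as pd

# Pure Python: "+" on two lists concatenates them instead of adding element-wise
py_sum = [1, 2, 3] + [4, 5, 6]          # [1, 2, 3, 4, 5, 6]

# NumPy: arrays support element-wise arithmetic
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np_sum = a + b                          # array([5, 7, 9])

# A 2-D NumPy array can be used as a matrix
m = np.array([[1, 2], [3, 4]])
product = m @ m                         # matrix multiplication: [[7, 10], [15, 22]]

# Pandas: the same data, but with column names and an index
df = pd.DataFrame(m, columns=["x", "y"], index=["row1", "row2"])
y_col = df["y"]                         # select a column by its name
cell = df.loc["row2", "x"]              # select a value by index label and column name: 3
```

The NumPy lines work with positions only, while the Pandas lines can refer to the same values by name, which is the convenience the Pandas notebooks build on.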
The tutorial is divided into five Jupyter notebooks:
- Jupyter - Introduction to the environment we will work in
- NumPy - Introduction to NumPy methods, arrays, and matrices
- Pandas Series - Introduction to Series in Pandas (one-dimensional labeled data, similar to arrays)
- Pandas DataFrames - Introduction to DataFrames in Pandas. DataFrames are two-dimensional like matrices, but in practice they behave much like Excel spreadsheets
- Matplotlib - Introduction to the Matplotlib visualization library. We will also explore the Matplotlib-based charts built into Pandas.
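As a preview of what the Pandas and Matplotlib notebooks cover, the sketch below builds a Series and a DataFrame and draws a bar chart with Pandas' built-in `DataFrame.plot`, which uses Matplotlib under the hood. It is an illustrative example, not code from the notebooks: the weather values and the column names `temp_c` and `rain_mm` are made up. It selects the non-interactive `Agg` backend so it also runs outside Jupyter.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; inside Jupyter charts display inline instead
import pandas as pd

days = ["Mon", "Tue", "Wed"]

# A Series: one-dimensional values with an index (hypothetical example data)
temps = pd.Series([21.5, 23.0, 19.8], index=days, name="temp_c")

# A DataFrame: several columns sharing one index, much like a spreadsheet
df = pd.DataFrame(
    {"temp_c": temps.values, "rain_mm": [0.0, 1.2, 4.5]},
    index=days,
)

warmest_day = df["temp_c"].idxmax()     # index label of the largest value: "Tue"
mean_rain = df["rain_mm"].mean()        # roughly 1.9

# Pandas' plot method draws with Matplotlib and returns a Matplotlib Axes
ax = df.plot(kind="bar", title="Daily weather (example data)")
ax.set_ylabel("value")

# Save to an in-memory buffer instead of a file, to keep the sketch side-effect free
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
```

Because `df.plot` returns an ordinary Matplotlib `Axes`, anything learned in the Matplotlib notebook (labels, titles, saving figures) also applies to charts created from Pandas.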
All these libraries have excellent documentation, and I encourage you to read it once you understand these basics.
You can also view and run the notebooks on MyBinder.