leportella/introduction-data-science-modules

Introduction to Data Science Modules

This tutorial gives a brief overview of four of the best-known libraries for data analysis:

  1. Jupyter - Jupyter is where we will run our code and document our findings and methods.
  2. Numpy - Numpy is the numerical library that forms the basis for most scientific Python libraries today. It introduces arrays and matrices, which pure Python lacks.
  3. Pandas - Pandas is a widely used library built on top of Numpy. It makes many data-handling tasks easier by introducing column names and indexes.
  4. Matplotlib - Matplotlib is a library dedicated to visualization, and the one that the built-in Pandas charts rely on.
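
As a rough illustration of what the list above describes, the sketch below contrasts a plain Python list with a Numpy array and then adds Pandas labels on top. It is not taken from the notebooks; the variable names, column names (`a`, `b`) and index labels (`first`, `second`) are made up for the example.

```python
import numpy as np
import pandas as pd

# A plain Python list has no element-wise arithmetic: `* 2` repeats the list.
values = [1, 2, 3]
doubled_list = values * 2          # [1, 2, 3, 1, 2, 3]

# A Numpy array applies the operation to every element instead.
array = np.array(values)
doubled_array = array * 2          # array([2, 4, 6])

# A two-dimensional array plays the role of a matrix.
matrix = np.array([[1, 2], [3, 4]])
product = matrix @ matrix          # matrix multiplication

# Pandas adds labels on top of the array: named columns and a row index.
# The labels below are hypothetical, chosen only for this example.
table = pd.DataFrame(matrix, columns=["a", "b"], index=["first", "second"])
value = table.loc["second", "a"]   # look up by label instead of by position
```

The notebooks cover each of these steps in more detail.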

It is divided into five Jupyter notebooks:

  1. Jupyter - Introduction to the environment we will work in
  2. Numpy - Introduction to Numpy methods, arrays and matrices
  3. Pandas Series - Introduction to the concept of Series in Pandas (similar to arrays)
  4. Pandas DataFrames - Introduction to the concept of DataFrames in Pandas. DataFrames resemble matrices, but with labeled rows and columns, much like an Excel spreadsheet
  5. Matplotlib - Introduction to the Matplotlib visualization library. We will also explore the built-in Pandas charts, which are implemented with Matplotlib.
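
To give a feel for the last three notebooks, here is a minimal sketch of a DataFrame, a Series taken from it, and a chart drawn through the Pandas plotting interface. The data and the names `weather`, `temp_c` and `rain_mm` are invented for illustration and do not come from the notebooks. The `matplotlib.use("Agg")` line is an assumption that lets the snippet run as a plain script without a display; inside Jupyter it is usually unnecessary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import pandas as pd

# A DataFrame is a table with named columns and a row index,
# much like a small spreadsheet. The values here are made up.
weather = pd.DataFrame(
    {"temp_c": [21.5, 23.0, 19.5], "rain_mm": [0.0, 1.2, 5.4]},
    index=["mon", "tue", "wed"],
)

# Selecting one column returns a Series: a labeled one-dimensional array.
temps = weather["temp_c"]
warmest_day = temps.idxmax()               # index label of the largest value

# Boolean conditions filter rows, keeping their labels.
rainy = weather[weather["rain_mm"] > 1.0]

# DataFrame.plot draws with Matplotlib and returns a Matplotlib Axes,
# so the usual Matplotlib methods remain available for customization.
ax = weather.plot(kind="bar")
ax.set_ylabel("value")
```

Because `plot` hands back a Matplotlib object, anything learned in the Matplotlib notebook also applies to charts created from Pandas.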

All of these libraries have wonderful documentation. I encourage you to check it out once you understand these basics.

Do it yourself!

You can run the notebooks yourself on MyBinder.
