A Python Tour of Data Science


This short primer introduces the scientific Python stack for Data Science. It is designed as a tour of the major Python packages used for the main computational tasks encountered in the sexiest job of the 21st century. By the end of the tour, you'll have a broad overview of the available libraries, as well as why and how they are used for each task. The notebook aims to answer the following question: which tool should I use for which task, and how?

The primer is a Jupyter notebook.

  • The easiest way to play with it from your browser without installing anything is to click on the binder badge.
  • If you only want to look at it, open the HTML version rendered by nbviewer.
  • The most interactive way is to run the code yourself, after installing Python and the required packages on your computer.
    # brew / apt-get / yum / pacman
    package-manager install python3
    # virtual environment
    python3 -m venv /path/to/new/virtual/env
    . /path/to/new/virtual/env/bin/activate
    # clone repository
    git clone
    cd python_tour_of_data_science
    make install  # install the dependencies (requirements.txt)
    make          # run the notebook to be sure everything is fine
    make clean    # clear the generated outputs
    # display notebook
    jupyter notebook
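
Before launching the notebook, it can help to confirm that the virtual environment is active and that the dependencies are importable. The following is a minimal sketch using only the standard library. The helper names `missing_modules` and `in_virtual_env` are hypothetical, and the package list is an assumption made for illustration: the authoritative list is the repository's requirements.txt.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def in_virtual_env():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, not the base install.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    # Assumed package list, for illustration only: use the names
    # from requirements.txt instead.
    wanted = ["numpy", "pandas", "matplotlib"]
    print("virtual environment active:", in_virtual_env())
    print("missing modules:", missing_modules(wanted) or "none")
```

If the script reports missing modules, re-running `make install` inside the activated environment is the likely fix.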

All code and examples are released under the terms of the MIT License.