2-Day Workshop - Introduction to Data Science in Python
Materials for the Paris-Saclay Center for Data Science python workshop
Data science is gaining attention and is impacting many scientific fields and applications. It encompasses a large number of topics such as data mining, data wrangling, data visualisation, pattern recognition, and machine learning.
This workshop intends to give an introduction to some of these topics using Python and the PyData ecosystem. It is not a course on deep learning.
Note: the material in this repository is a work in progress, not the finalized material.
Day 1 - Data wrangling, exploration, and visualisation
Goal: introduce the PyData ecosystem to manipulate, explore, and visualize data.
- Introduction to the basics of numpy, pandas, and matplotlib.
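As a rough taste of the Day 1 topics, the sketch below shows vectorized arithmetic with NumPy and a split-apply-combine aggregation with pandas. It is not taken from the workshop notebooks; the city names and temperatures are made-up toy values, and plotting with matplotlib is left out to keep the example short.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies to the whole array at once, no explicit loop
temperatures_c = np.array([12.5, 15.0, 9.5, 20.0])  # made-up toy values
temperatures_f = temperatures_c * 9 / 5 + 32

# pandas: a labeled table built from plain Python lists and NumPy arrays
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Lyon", "Lyon"],
    "temp_c": temperatures_c,
    "temp_f": temperatures_f,
})

# Split-apply-combine: group rows by city, then average each group
mean_per_city = df.groupby("city")["temp_c"].mean()
print(mean_per_city)
```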
Day 2 - Machine learning
Goal: introduce the basics of machine learning using the scikit-learn library.
- Get familiar with the general principles of machine learning;
- Apply these principles with the scikit-learn library on toy and real-world data examples.
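One general principle that recurs throughout scikit-learn is the shared estimator interface (`fit`, `predict`, `score`) combined with evaluation on held-out data. The sketch below illustrates this on the iris toy dataset bundled with scikit-learn; the choice of dataset, of `LogisticRegression`, and of the split parameters are assumptions for illustration, not necessarily what the workshop notebooks use.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small toy dataset shipped with scikit-learn: X has shape (150, 4), y holds 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples so the model is scored on data it never saw during fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every scikit-learn estimator follows the same fit / predict / score pattern
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out set
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping `LogisticRegression` for another classifier (for example `sklearn.neighbors.KNeighborsClassifier`) leaves the rest of the code unchanged, which is the main point of the common interface.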
The course uses Python 3 and data analysis packages such as NumPy, pandas, scikit-learn, matplotlib, and seaborn. To install the required libraries, we highly recommend Anaconda or Miniconda (https://www.anaconda.com/download/), or another Python distribution that includes the scientific libraries. This recommendation applies to all platforms: Windows, Linux, and macOS.
For first-time users and people not fully confident with the command line, we advise installing Anaconda by downloading and running the Python 3.x installer from https://www.anaconda.com/download/. Most recent computers require the 64-bit installer.
Note: if you are already familiar with the command line and Python environments, you could opt for Miniconda instead of Anaconda and download it from https://conda.io/miniconda.html. The main difference is that Anaconda installs a graphical user interface (Anaconda Navigator) and a large set of scientific packages (e.g. https://docs.anaconda.com/anaconda/packages/py3.6_win-64/), whereas with Miniconda you install all packages yourself from the command line. On the other hand, Miniconda requires less disk space. If you choose Miniconda, create the workshop environment with the following command:
conda env create -f environment.yml
Install/check of required packages
This tutorial requires recent installations of
- NumPy
- pandas
- matplotlib
- seaborn
- scikit-learn
- Jupyter notebook
The last one is important and you should be able to type:
in your terminal window and see the notebook panel load in your web browser. Try opening and running a notebook from the material to check that it works. Alternatively, you can use Jupyter notebook.
After obtaining the material, we strongly recommend running the check_env.py script located at the top level of this repository:
python check_env.py
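The repository's own check_env.py is not reproduced here. To give a rough idea of what such a check involves, the following is a minimal sketch that reports whether each package can be imported and which version is installed. The package list and the import-name to distribution-name mapping are assumptions based on the packages named in this README, and `check_packages` is a hypothetical helper.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of import names for the packages mentioned in this README
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "notebook"]

# Import name -> distribution name, where the two differ
DIST_NAMES = {"sklearn": "scikit-learn"}


def check_packages(import_names):
    """Return {import_name: version string, "unknown", or None if not importable}."""
    report = {}
    for name in import_names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable in this environment
            continue
        dist = DIST_NAMES.get(name, name)
        try:
            report[name] = version(dist)
        except PackageNotFoundError:
            # Importable, but no installed distribution metadata under that name
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, ver in check_packages(REQUIRED).items():
        status = "OK" if ver else "MISSING"
        print(f"[{status}] {name} {ver or ''}")
```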
We also recommend updating scikit-learn to the latest release to ensure the best compatibility with the teaching material. To upgrade an already installed package, run
conda update [package-name]
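If you want a quick programmatic check of whether an installed package is older than some version before running the update command, a minimal sketch follows. The threshold "1.0" is a hypothetical placeholder, not a requirement stated by this workshop, and the helpers are hypothetical. Pre-release tags such as "rc1" are simply ignored; the third-party `packaging.version.Version` class handles such cases more rigorously.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver):
    """Leading numeric release components, e.g. '1.4.0rc1' -> (1, 4, 0)."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric segments such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def is_older(installed, minimum):
    """True if `installed` is an older release than `minimum`."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that (1, 3) and (1, 3, 0) compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def needs_update(dist_name, minimum):
    """True if the distribution is not installed or is older than `minimum`."""
    try:
        return is_older(version(dist_name), minimum)
    except PackageNotFoundError:
        return True


if __name__ == "__main__":
    # Hypothetical threshold for illustration only
    if needs_update("scikit-learn", "1.0"):
        print("Consider running: conda update scikit-learn")
```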
Depending on how you installed