This tutorial requires pandas, scikit-learn, and IPython with the IPython Notebook (now distributed as the Jupyter Notebook). These can be installed with pip by typing the following in a terminal:
pip install --upgrade pip
pip install scipy
pip install numpy pandas scikit-learn ipython
pip install jupyter
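To confirm that the installs succeeded before going further, a short check along these lines may help. It is only a sketch: the list of import names is an assumption based on the packages above (scikit-learn is imported as sklearn, and the notebook server is assumed to be importable as notebook), and missing_packages is a hypothetical helper, not part of any library.

```python
import importlib.util

# Import names, which can differ from pip names:
# scikit-learn is imported as "sklearn".
# The "notebook" entry is an assumption about how Jupyter is installed.
REQUIRED = ["numpy", "scipy", "pandas", "sklearn", "IPython", "notebook"]


def missing_packages(names):
    """Return the names that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If anything is reported as not found, rerun the matching pip install command above.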
Next, clone the material in this tutorial using git as follows:
git clone https://github.com/savarin/python_for_ml.git
We will be reviewing the materials with the IPython Notebook. You should be able to type
jupyter notebook
in your terminal window and see the notebook panel load in your web browser.
The tutorial will start with data manipulation using pandas: loading and cleaning data. We'll then use scikit-learn to make predictions. By the end of the session, we will have worked through the Kaggle Titanic data from start to finish, over a number of iterations of increasing sophistication. Time permitting, we'll also have a brief discussion of cross-validation.
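As a rough preview of that workflow, the sketch below cleans a small table with pandas, fits a scikit-learn model, and scores it with cross-validation. The rows are made up for illustration and only borrow Titanic-style column names; the choice of logistic regression, the IsFemale column, and the 3-fold split are assumptions, not necessarily what the tutorial notebooks use.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Made-up rows with Titanic-style column names; this is NOT the Kaggle data.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2, 3, 1, 2, 3, 1, 2],
    "Sex": ["male", "female", "female", "female", "male", "male",
            "male", "female", "female", "male", "male", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0,
            2.0, 27.0, 14.0, None, 40.0, 30.0],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1],
})

# Cleaning: fill missing ages with the median and encode sex as 0/1.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["IsFemale"] = (df["Sex"] == "female").astype(int)

X = df[["Pclass", "Age", "IsFemale"]]
y = df["Survived"]

# Prediction: fit a simple classifier and predict on the same rows.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
predictions = model.predict(X)

# Cross-validation: accuracy on 3 held-out folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3)
print("Cross-validated accuracy per fold:", scores)
```

With the real data, the DataFrame would instead come from pd.read_csv on the training file in the cloned repository.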