# Machine Learning for music playlists

## Creating music playlists in iTunes with Scikit-Learn tools
— March 2016 —

### 1. Analysis overview
This is a series of posts devoted to the analysis of my iTunes music library using **[Scikit-Learn](http://scikit-learn.org)** tools and the **[Echo Nest API](http://the.echonest.com/)**.

The purpose of the analysis is to detect tracks in my iTunes music library that suit my fitness practices: "cycling", "yoga", and "ballet". I chose these categories because I wanted two very different classes of songs ("yoga" and "cycling") and one intermediate class, "ballet", that might fall somewhere between the other two. I'd like to find out which track attributes account for the difference.

To solve this problem, I use Scikit-Learn machine learning classification algorithms.
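As a rough illustration of the three-class setup described above, the sketch below trains a scikit-learn classifier on synthetic tracks and scores it with 5-fold cross-validation. The feature columns (tempo, energy, danceability), the class centres, and the choice of an RBF `SVC` are assumptions made for this example, not the features or model used in the notebooks. It imports `cross_val_score` from `sklearn.model_selection` (scikit-learn 0.18 and later); older releases, such as the 0.14 minimum listed in the installation notes, provide it in `sklearn.cross_validation`.

```python
# Illustrative sketch only: synthetic data, hypothetical features and class centres.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Hypothetical per-track features: tempo (BPM), energy (0-1), danceability (0-1).
# The centres below are invented; "ballet" sits between "yoga" and "cycling".
centres = {
    "cycling": (130.0, 0.85, 0.70),
    "yoga": (80.0, 0.20, 0.30),
    "ballet": (100.0, 0.40, 0.45),
}

X_parts, y_parts = [], []
for label, (tempo, energy, dance) in centres.items():
    n = 40  # tracks per class
    X_parts.append(np.column_stack([
        rng.normal(tempo, 10.0, n),
        rng.normal(energy, 0.08, n),
        rng.normal(dance, 0.08, n),
    ]))
    y_parts.append(np.full(n, label))

X = np.vstack(X_parts)
y = np.concatenate(y_parts)

# Scale the features, then classify; the pipeline is refit inside every fold.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("5-fold accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```

Wrapping the scaler and the classifier in one pipeline means the scaler is refit on the training part of each fold, so the validation folds never leak into the standardization step.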


### 2. Goals
2.1. Build music playlists from my iTunes music library for my fitness practices.

2.2. Explore the basics of Scikit-Learn tools, including:
* Classification algorithms;
* One-Class SVM for novelty detection;
* Model validation;
* Data standardization;
* Dimensionality reduction.

2.3. Explore the Echo Nest API and track attributes. 

2.4. Document the learning process step by step, for myself and for other newcomers to data analysis. Please keep in mind that I'm a greenhorn not only at data analysis but also at Python.
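Two of the preprocessing steps named in goal 2.2, data standardization and dimensionality reduction, can be sketched with scikit-learn's `StandardScaler` and `PCA`. The four columns below are hypothetical stand-ins for track attributes on very different scales, generated at random; this is an illustration, not the preprocessing code from 03_Preprocessing.

```python
# Illustrative sketch only: random data standing in for real track attributes.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
n_tracks = 200

# Hypothetical columns: tempo (BPM), loudness (dB), energy (0-1), duration (s).
X = np.column_stack([
    rng.normal(110, 25, n_tracks),
    rng.normal(-10, 4, n_tracks),
    rng.uniform(0, 1, n_tracks),
    rng.normal(240, 60, n_tracks),
])

# Standardize: every column gets zero mean and unit variance, so columns
# measured in large units do not dominate the ones measured in small units.
X_std = StandardScaler().fit_transform(X)

# Reduce to two principal components, e.g. for a 2-D scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

print(X_2d.shape)  # (200, 2)
print(pca.explained_variance_ratio_)
```

Standardizing first matters for PCA: without it, the column with the largest raw variance (here, duration in seconds) would dominate the first principal component.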

### 3. Contents
The series of posts includes the following notebooks:  
[00_Summary](http://localhost:8888/notebooks/00_Summary.ipynb) — Summary of this analysis, its goals and methods, installation notes.  
[01_Data_preparation](http://localhost:8888/notebooks/01_Data_preparation.ipynb) — Data gathering and cleaning.  
[02_Data_visualisation](http://localhost:8888/notebooks/02_Data_Visualisation.ipynb) — Visualisation and overview of data.  
[03_Preprocessing](http://localhost:8888/notebooks/03_Preprocessing.ipynb) — Data preprocessing to use it as input for Scikit-learn machine learning algorithms.  
[04_Novelty_detection](http://localhost:8888/notebooks/04_Novelty_detection.ipynb) — Apply One-Class SVM algorithm to identify matching tracks in the unlabeled dataset.  
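A minimal sketch of the novelty-detection idea behind 04_Novelty_detection, assuming the approach is to fit a One-Class SVM on tracks already labeled for one practice and then flag matching tracks in the unlabeled library. The two features, the synthetic data, and the `nu` and `gamma` values are illustrative assumptions, not the settings used in the notebook.

```python
# Illustrative sketch only: synthetic tracks, hypothetical features and parameters.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(1)

# Hypothetical labeled playlist: tracks known to suit cycling.
# Columns: tempo (BPM), energy (0-1).
cycling = np.column_stack([rng.normal(130, 8, 100), rng.normal(0.8, 0.05, 100)])

# Hypothetical unlabeled library: 20 cycling-like tracks, then 20 slow, calm ones.
library = np.vstack([
    np.column_stack([rng.normal(130, 8, 20), rng.normal(0.8, 0.05, 20)]),
    np.column_stack([rng.normal(75, 8, 20), rng.normal(0.2, 0.05, 20)]),
])

# Fit the scaler on the training class only, then apply it to the library.
scaler = StandardScaler().fit(cycling)

# nu: upper bound on the fraction of training tracks treated as outliers.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5)
detector.fit(scaler.transform(cycling))

# predict() returns +1 for tracks resembling the training class, -1 otherwise.
pred = detector.predict(scaler.transform(library))
candidates = np.where(pred == 1)[0]
print("Suggested for the cycling playlist:", candidates)
```

Because the model only ever sees one class, it needs no negative examples; `nu` is also a lower bound on the fraction of support vectors, so raising it tightens the boundary and yields a shorter, more conservative playlist.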

### 4. Installation notes
These notebooks require the following packages:
* numpy version 1.5 or later: http://www.numpy.org/
* pandas version 0.17.0 or later: http://pandas.pydata.org/
* scipy version 0.10 or later: http://www.scipy.org/
* matplotlib version 1.3 or later: http://matplotlib.org/
* scikit-learn version 0.14 or later: http://scikit-learn.org
* ipython version 2.0 or later, with notebook support: http://ipython.org
* seaborn version 0.7.0 or later: https://stanford.edu/~mwaskom/software/seaborn/#
* pyechonest version 9.0.0 or later: https://github.com/echonest/pyechonest
* pyItunes library: https://github.com/liamks/pyitunes
* sqlitedict library: https://github.com/piskvorky/sqlitedict
* HDF5 requires several libraries; detailed installation instructions can be found in [this blog post](http://rustyrazorblade.com/2014/11/getting-started-with-pandas-and-hdf5/).
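Before opening the notebooks, a small script like the one below can report whether the Python packages listed above are installed and recent enough. It is a convenience sketch, not part of the original project: the helper names `version_tuple` and `check_requirements` are hypothetical, the version parsing only handles simple numeric versions, and pyechonest, pyItunes, sqlitedict and the HDF5 libraries are left out of the check.

```python
# Hypothetical helper script, not part of the original repository.
import importlib

# Minimum versions copied from the list above. pyechonest, pyItunes,
# sqlitedict and the HDF5 libraries are deliberately not checked here.
REQUIRED = {
    "numpy": "1.5",
    "pandas": "0.17.0",
    "scipy": "0.10",
    "matplotlib": "1.3",
    "sklearn": "0.14",  # scikit-learn is imported as "sklearn"
    "IPython": "2.0",
    "seaborn": "0.7.0",
}


def version_tuple(version):
    """Turn '1.10.4' or '0.17.0rc1' into a comparable tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(required):
    """Return {module name: status string} for each required module."""
    report = {}
    for name, minimum in required.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = "installed (version unknown)"
        elif version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok (%s)" % installed
        else:
            report[name] = "too old (%s, need %s)" % (installed, minimum)
    return report


if __name__ == "__main__":
    for name, status in check_requirements(REQUIRED).items():
        print("%-12s %s" % (name, status))
```

Versions are compared as integer tuples rather than strings, so "1.10" correctly counts as newer than "1.5".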

### 5. Notebook listing
You can view the notebooks using the nbviewer service.

Note, however, that you cannot modify or run the notebooks within nbviewer. To modify them, first download the repository, change to the notebooks directory, and run `ipython notebook`. For more information on the IPython notebook, see http://ipython.org/notebook.html.