# iTunes music library analysis: Novelty Detection
This is the 4th post in a series devoted to the analysis of my iTunes music library using Scikit-Learn tools.   
The purpose of the analysis is to detect tracks in my iTunes music library that suit my fitness practices: "cycling", "yoga", and "ballet". To solve this problem I use machine learning classification algorithms.    

The previous posts cover the following steps:
1. [00_Summary]() — Summary of this analysis, its goals and methods, installation notes.
2. [01_Data_preparation]() — Data gathering and cleaning.
3. [02_Data_visualisation]() — Visualisation and overview of data.
4. [03_Preprocessing]() — Data preprocessing to use it as input for Scikit-learn machine learning algorithms.

As a result of the previous steps I have two databases (DBs): 
* the training DB contains 88 tracks, each labeled with one of the three classes: "ballet", "cycling", "yoga";
* the test DB contains 444 unlabeled tracks. 

The three classes I have in the training set don't cover all classes of music I have in my iTunes music library (test DB). Because of that I can't apply a classification algorithm to the whole test set as it will assign even irrelevant tracks to some class.  
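To illustrate the problem, here is a minimal sketch on synthetic data. The 2D points, the cluster centers and the choice of `SVC` are assumptions made for the illustration only; they are not the features or the classifier used in this series. A classifier trained on three classes returns one of those three labels even for a point that lies far away from all of them.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)

# Three synthetic clusters standing in for the labeled tracks (hypothetical data)
centers = {"ballet": (0, 0), "cycling": (3, 0), "yoga": (0, 3)}
X = np.vstack([np.array(c) + 0.3 * rng.randn(20, 2) for c in centers.values()])
y = np.repeat(list(centers.keys()), 20)

# Any multi-class classifier would do; SVC is an arbitrary choice here
clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)

# A point far from every cluster, standing in for an irrelevant track
irrelevant_track = np.array([[50.0, -40.0]])
predicted = clf.predict(irrelevant_track)[0]

# The classifier has no "none of the above" option, so it still picks a class
print("predicted class for the irrelevant track:", predicted)
```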

In this notebook I identify the tracks in the unlabeled dataset that fit the classes of the training dataset and eliminate the tracks that don't fit any of them. Only then will I perform classification.  

For this purpose I use the [One-Class SVM](http://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html#sklearn.svm.OneClassSVM) unsupervised algorithm. One-Class SVM is used for novelty detection: given a set of samples (the training set), it learns a soft boundary around that set and then classifies new points (the test set) as belonging to the set or not. It's important to point out that the algorithm assumes the training data is not polluted by outliers.
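As a rough illustration of this behaviour, the sketch below fits `OneClassSVM` on a synthetic 2D cluster and then scores new points. The data and the `nu` and `gamma` values are assumptions chosen for the example, not the ones used in this analysis.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)

# Synthetic stand-in for the training set: a tight cluster around the origin
X_train = 0.3 * rng.randn(100, 2)

# Synthetic stand-in for the test set: points like the training data and points far from it
X_similar = 0.3 * rng.randn(20, 2)
X_far = rng.uniform(low=4, high=6, size=(20, 2))

# nu and gamma are illustrative values only
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5)
clf.fit(X_train)

# predict returns +1 for points inside the learned boundary and -1 for novelties
pred_similar = clf.predict(X_similar)
pred_far = clf.predict(X_far)

print("similar points kept:", int((pred_similar == 1).sum()), "of", len(X_similar))
print("far points kept:", int((pred_far == 1).sum()), "of", len(X_far))
```

Only the points predicted as +1 would be passed on to the classification step.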

As a shortcut, in this notebook I import the module "data_cleaning.py", which performs the steps from the [01_Data_preparation]() notebook. I start by importing the modules required in this notebook.

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from IPython.display import display
import pandas as pd
import numpy as np
from sqlitedict import SqliteDict

# import my module from the previous notebook
import data_cleaning as cln

# set seaborn plot defaults
import seaborn as sns
sns.set(palette="husl")
sns.set_context("notebook")
sns.set_style("ticks")

# format floating point numbers
# within pandas data structures
pd.set_option('display.float_format', '{:.2f}'.format)

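I use the radial basis function, or 'rbf', kernel; the 'nu' value was chosen by trial and error. 'nu' takes a value in the interval (0, 1] and is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, so it roughly controls the share of training tracks that may fall outside the learned boundary.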
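One possible way to run such a trial-and-error search is sketched below. It fits the model for a few candidate 'nu' values and reports the share of training points that end up outside the boundary. The synthetic data, the candidate values and the fixed `gamma` are assumptions for the illustration, not the settings used in this analysis.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Synthetic stand-in for the preprocessed training features (hypothetical data)
X_train = rng.randn(200, 2)

outlier_fraction = {}
for nu in (0.05, 0.2, 0.5):
    # gamma is fixed to an illustrative value so that only nu varies
    clf = OneClassSVM(kernel="rbf", nu=nu, gamma=0.5).fit(X_train)
    # share of training points that fall outside the learned boundary
    outlier_fraction[nu] = float((clf.predict(X_train) == -1).mean())
    print("nu={:.2f}: {:.2f} of training points flagged as outliers".format(
        nu, outlier_fraction[nu]))
```

A larger 'nu' gives a tighter boundary that rejects more of the training points, so the value is a trade-off between keeping relevant tracks and filtering out irrelevant ones.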


