# AI Machine Learning Practice 

# Information

- Dataset: Music dataset

- Objectives: Classification

- Time Limit: 1 min

- Score: Classification Accuracy (Test Data)

- Please read all markdown cells carefully

- About Dataset: Music Style Data
    - 348 float-type music features (frequency, tone, tempo, timbre, ...)
    - Label: Music Style
        - 1: Melancholy
        - 2: Romantic
        - 3: Rhythmical
    

## [Step 0] Importing Packages

You must specify all the packages you use in this practice in the cell below.



In [1]:
from __future__ import print_function
import os
data_path = ['data']

from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

## [Step 1] Read Data

The training dataset is in the 'data' directory.


In [2]:
import pandas as pd

# Import the data using the file path
filepath = os.sep.join(data_path + ['music_train_data.csv'])
data = pd.read_csv(filepath)

In [3]:
data.head(1)

,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,...,f340,f341,f342,f343,f344,f345,f346,f347,f348,answer
0,-0.166614,0.284691,-0.011022,-1.028812,0.101653,0.498247,-0.314566,1.208697,-1.503008,-1.457764,...,2.136721,-1.193955,0.040614,1.127366,0.741521,-0.70773,0.077748,0.832992,-1.291423,2


In [4]:
print(data.shape)
print(data.dtypes)

(650, 349)
f1        float64
f2        float64
f3        float64
f4        float64
f5        float64
           ...   
f345      float64
f346      float64
f347      float64
f348      float64
answer      int64
Length: 349, dtype: object


In [5]:
features = data.columns[:-1]
X_data = data[features]
y_data = data['answer']

## [Step 2] Data Preprocessing

* Preprocessing code is below
* You must explain your method in this markdown cell
* (Important) You must define a transform function for the test data

In [6]:
# Sample Code - Min Max Scaling
import warnings
warnings.filterwarnings('ignore', module='sklearn')
msc = MinMaxScaler()

X_data = pd.DataFrame(msc.fit_transform(X_data),  # fit_transform returns an np.array, not a DataFrame
                    columns=X_data.columns)

X_data.head(5)

,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,...,f339,f340,f341,f342,f343,f344,f345,f346,f347,f348
0,0.630211,0.538339,0.46376,0.040328,0.736837,0.780866,0.28018,0.264939,0.210008,0.169543,...,0.487044,0.518064,0.095777,0.52259,0.586644,0.320691,0.475852,0.228383,0.78003,0.362149
1,0.552919,0.606669,0.201281,0.116883,0.52788,0.914367,0.110937,0.053678,0.436478,0.483335,...,0.518388,0.008166,0.231839,0.684377,0.242407,0.143329,0.379794,0.364343,0.34701,0.594168
2,0.621325,0.551232,0.578177,0.174603,0.573474,0.715899,0.222025,0.05408,0.437189,0.647545,...,0.317538,0.221009,0.251789,0.472166,0.226927,0.179582,0.605205,0.027091,0.341074,0.42147
3,0.652887,0.513503,0.618537,0.072859,0.618504,0.617193,0.20625,0.047912,0.642507,0.46235,...,0.548682,0.251897,0.268742,0.298482,0.3063,0.130865,0.0,0.407966,0.829491,0.609766
4,0.805443,0.343634,0.486271,0.549784,0.850448,0.701505,0.384685,0.245385,0.422189,0.215617,...,0.549331,0.030774,0.267452,0.413211,0.655781,0.263345,0.46422,0.14344,0.873379,0.567479


In [7]:
# transform function
# Do not change the function name
def transform_test(X_test_data):
    X_test_data = msc.transform(X_test_data)
    return X_test_data
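
The requirement above amounts to fitting the scaler on the training features only and reusing the learned minimum and maximum for the test features. The following is a minimal sketch of that idea, assuming synthetic stand-in data (the five `f1`..`f5` columns, the sample sizes, and the variable names `scaler`, `X_train`, and `X_test` are illustrative and not part of the assignment data).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the music features (assumption: not the real dataset)
rng = np.random.default_rng(0)
columns = ['f{}'.format(i) for i in range(1, 6)]
X_train = pd.DataFrame(rng.normal(size=(100, 5)), columns=columns)
X_test = pd.DataFrame(rng.normal(size=(20, 5)), columns=columns)

# Fit the scaler on the training features only
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

def transform_test(X_test_data):
    # Reuse the min and max learned from the training data; do not refit here
    return scaler.transform(X_test_data)

X_test_scaled = transform_test(X_test)

# Training features span [0, 1]; test features may fall slightly outside that range
print(X_train_scaled.min(), X_train_scaled.max())
print(X_test_scaled.shape)
```

Calling `fit_transform` again on the test data would rescale it with different statistics, so the model would see features on a different scale than it was trained on.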

## Model Training

* Training code is below
* You must explain your method in this markdown cell
* (Important) Your model variable should be named 'model' !!! 

In [8]:
# Sample code - kNN Classification
model = KNeighborsClassifier(n_neighbors=3)

model = model.fit(X_data, y_data)

y_pred = model.predict(X_data)
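
The sample code fixes `n_neighbors=3`. One possible way to choose this value is a cross-validated grid search, sketched below under stated assumptions: the data is synthetic (generated with `make_classification`, not the music dataset), and the candidate values in `param_grid` and the names `pipe`, `search`, and `best_k` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic three-class stand-in data (assumption: not the real dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=8, n_classes=3,
                                     random_state=0)
y_demo = y_demo + 1  # shift labels to 1, 2, 3 to mirror the label scheme above

# Scaling sits inside the pipeline so each CV fold is scaled with its own training split
pipe = Pipeline([('scale', MinMaxScaler()),
                 ('knn', KNeighborsClassifier())])

# Candidate neighbor counts (illustrative choices)
param_grid = {'knn__n_neighbors': [1, 3, 5, 7, 9, 11]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_demo, y_demo)

best_k = search.best_params_['knn__n_neighbors']
print('best n_neighbors:', best_k)
print('mean CV accuracy:', search.best_score_)
```

In this notebook the scaling is already done by `msc` and `transform_test`, so only the selected value would carry over, for example `model = KNeighborsClassifier(n_neighbors=best_k)`.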

## Check Accuracy 

* Check your training data accuracy

In [9]:
# Function to calculate the fraction of values that were correctly predicted

def accuracy(real, predict):
    return sum(real == predict) / float(real.shape[0])

In [10]:
print(accuracy(y_data, y_pred))

0.9230769230769231
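
Training accuracy tends to be optimistic for kNN, because each training point counts itself among its own nearest neighbors. A held-out estimate such as k-fold cross-validation is usually closer to what the test data will show. The sketch below assumes synthetic data and two illustrative candidates (kNN and logistic regression); the names `candidates` and `results` are hypothetical, and the real classifiers to compare are up to you.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic three-class stand-in data (assumption: not the real dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=8, n_classes=3,
                                     random_state=0)
y_demo = y_demo + 1  # labels 1, 2, 3

# Illustrative candidate classifiers
candidates = {
    'kNN (k=3)': KNeighborsClassifier(n_neighbors=3),
    'logistic regression': LogisticRegression(max_iter=1000),
}

results = {}
for name, estimator in candidates.items():
    pipeline = make_pipeline(MinMaxScaler(), estimator)
    # Mean accuracy over 5 held-out folds
    cv_accuracy = cross_val_score(pipeline, X_demo, y_demo,
                                  cv=5, scoring='accuracy').mean()
    # Accuracy on the same data the model was fitted on
    train_accuracy = pipeline.fit(X_demo, y_demo).score(X_demo, y_demo)
    results[name] = (train_accuracy, cv_accuracy)
    print('{}: train={:.3f}, 5-fold CV={:.3f}'.format(
        name, train_accuracy, cv_accuracy))
```

A large gap between the two numbers for a given model suggests that its training accuracy should not be trusted as a predictor of the test score.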


## Analysis 

* Analyze your model's result
* You may use additional metrics (F1 Score, Confusion matrix) or visualize your results using plots
* Hint: A PCA plot will help you understand the dataset (which class is the most challenging to classify?)
* Hint: You may also compare different models to choose the best one among the classifiers we learned this semester
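
As one way to apply the additional metrics mentioned above, the sketch below computes a confusion matrix and per-class F1 scores on a held-out validation split. It assumes synthetic data and a 70/30 split; the names `X_val`, `clf`, and `hardest` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class stand-in data (assumption: not the real dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=8, n_classes=3,
                                     random_state=0)
y_demo = y_demo + 1  # labels 1, 2, 3

# Hold out 30% of the samples, keeping class proportions equal in both splits
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
y_val_pred = clf.predict(X_val)

labels = [1, 2, 3]
# Rows are true labels, columns are predicted labels
cm = confusion_matrix(y_val, y_val_pred, labels=labels)
# average=None returns one F1 score per class, in the order of `labels`
per_class_f1 = f1_score(y_val, y_val_pred, labels=labels, average=None)

# The class with the lowest F1 score is the hardest one for this model
hardest = labels[int(np.argmin(per_class_f1))]

print(cm)
print(per_class_f1)
print('hardest class:', hardest)
```

Off-diagonal entries of the matrix show which pairs of styles the model confuses with each other.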

In [11]:
# Your code here
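
For the PCA hint, a possible starting point is to project the scaled features onto the first two principal components and inspect how the classes separate. This is a minimal sketch on synthetic data; `X_2d` and `centroids` are illustrative names, and the plotting step is left out so the example depends only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Synthetic three-class stand-in data (assumption: not the real dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=8, n_classes=3,
                                     random_state=0)
y_demo = y_demo + 1  # labels 1, 2, 3

# Scale first so that no single feature dominates the principal components
X_scaled = MinMaxScaler().fit_transform(X_demo)

# Project onto the two directions of largest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print('explained variance ratio:', pca.explained_variance_ratio_)

# Class centroids in the projected plane; close centroids hint at overlapping classes
centroids = {}
for label in np.unique(y_demo):
    centroids[int(label)] = X_2d[y_demo == label].mean(axis=0)
    print('class {}: centroid = {}'.format(label, centroids[int(label)]))
```

Plotting `X_2d[:, 0]` against `X_2d[:, 1]` as a scatter plot colored by `y_demo` gives the PCA plot itself. Two components often capture only part of the variance, so overlap in this plane does not by itself prove that the classes are inseparable.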

# Test data

* The TA will check your model's test data accuracy
* (Important) Do not change the code below

In [12]:
filepath = os.sep.join( ['data', 'music_test_data.csv'])
t_data = pd.read_csv(filepath)
features = t_data.columns
X_t_data = t_data[features]
X_t_data = transform_test(X_t_data)

y_pred = model.predict(X_t_data)
np.savetxt('out.txt', y_pred, fmt='%d', delimiter='\n')