# Logistic regression

This notebook implements logistic regression to classify songs pulled from my Spotify account, each of which I have labeled 1 (like) or 0 (dislike).
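For reference, here is a minimal NumPy sketch of what a from-scratch logistic regression usually involves: a sigmoid hypothesis, the mean binary cross-entropy cost, and batch gradient descent. The function names (`sigmoid`, `cost`, `fit`, `predict`) and the hyperparameters `alpha` and `iterations` are illustrative assumptions. They are not taken from the `log_reg` module imported below, whose contents are not shown here.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y):
    """Mean binary cross-entropy; X must already include a bias column of ones."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # keeps log() away from log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))


def fit(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent on the cross-entropy cost (illustrative sketch).

    X: (m, n) feature matrix WITHOUT a bias column.
    y: (m,) array of 0/1 labels.
    Returns theta of shape (n + 1,), where theta[0] is the bias term.
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])  # prepend the bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(iterations):
        grad = Xb.T @ (sigmoid(Xb @ theta) - y) / m  # gradient of the mean cost
        theta -= alpha * grad
    return theta


def predict(theta, X, threshold=0.5):
    """Return 0/1 predictions for X (no bias column)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb @ theta) >= threshold).astype(int)


# Tiny one-feature toy problem: negative inputs are "dislike", positive are "like".
X_toy = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_toy = np.array([0, 0, 1, 1])
theta = fit(X_toy, y_toy)
print(predict(theta, X_toy))  # [0 0 1 1]
```

The gradient `X^T (sigmoid(X theta) - y) / m` has the same form as in linear regression, with the sigmoid applied to the linear score; that is why the same descent loop works once the hypothesis is swapped.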

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from log_reg import *  # the notebook's own logistic regression module (not shown here)

# make matplotlib figures appear inline in the notebook
%matplotlib inline
plt.rcParams['figure.figsize'] = (14.0, 8.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# make the notebook automatically reload external python modules
%load_ext autoreload
%autoreload 2
```

## Data gathering

First, we read the data from a CSV file into a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) and get a sense of what the columns mean. A complete description of each audio feature is in Spotify's [Web API audio features reference](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/).

```python
df = pd.read_csv('data.csv')
df.head()
```

| Unnamed: 0 | name | acousticness | analysis_url | danceability | duration_ms | energy | id | instrumentalness | key | liveness | loudness | mode | speechiness | tempo | time_signature | track_href | type | uri | valence | like |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marry Me | 0.113 | https://api.spotify.com/v1/audio-analysis/0OWZ... | 0.588 | 205453 | 0.408 | 0OWZFobGSIW9GrSlQ9C5pc | 0.0 | 3 | 0.13 | -8.662 | 1 | 0.0237 | 85.008 | 4 | https://api.spotify.com/v1/tracks/0OWZFobGSIW9... | audio_features | spotify:track:0OWZFobGSIW9GrSlQ9C5pc | 0.484 | 1.0 |
| 1 | Is It Really Me You're Missing | 0.653 | https://api.spotify.com/v1/audio-analysis/1WO6... | 0.535 | 232800 | 0.297 | 1WO6kvO7P8TOsuhBaqun5w | 0.0 | 6 | 0.118 | -6.043 | 1 | 0.0665 | 147.569 | 4 | https://api.spotify.com/v1/tracks/1WO6kvO7P8TO... | audio_features | spotify:track:1WO6kvO7P8TOsuhBaqun5w | 0.0492 | 1.0 |
| 2 | Hollow Crown | 0.103 | https://api.spotify.com/v1/audio-analysis/15ba... | 0.62 | 171320 | 0.564 | 15baNmdl3WSqSKnT0YDG2v | 0.0 | 8 | 0.0735 | -7.09 | 1 | 0.0392 | 150.034 | 4 | https://api.spotify.com/v1/tracks/15baNmdl3WSq... | audio_features | spotify:track:15baNmdl3WSqSKnT0YDG2v | 0.301 | 1.0 |
| 3 | A Big World | 0.312 | https://api.spotify.com/v1/audio-analysis/4vhW... | 0.394 | 186681 | 0.505 | 4vhW66VGfINDEOssckYMIW | 0.0 | 7 | 0.268 | -7.638 | 1 | 0.0473 | 109.446 | 4 | https://api.spotify.com/v1/tracks/4vhW66VGfIND... | audio_features | spotify:track:4vhW66VGfINDEOssckYMIW | 0.255 | 1.0 |
| 4 | Roman Sky | 0.139 | https://api.spotify.com/v1/audio-analysis/1hy6... | 0.455 | 300361 | 0.42 | 1hy6eKT3JRhi3ODXpL8Ubu | 3.3e-05 | 2 | 0.101 | -8.731 | 0 | 0.0297 | 130.045 | 4 | https://api.spotify.com/v1/tracks/1hy6eKT3JRhi... | audio_features | spotify:track:1hy6eKT3JRhi3ODXpL8Ubu | 0.101 | 1.0 |
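The `Unnamed: 0` column in this output looks like a row index that was written into `data.csv` when the file was saved. By default, `DataFrame.to_csv` writes the index as an unnamed first column, and `pd.read_csv` then labels that column `Unnamed: 0`. If that is what happened here, the column says nothing about the song and should not end up as a feature. The sketch below reproduces the effect on a tiny in-memory CSV built from two of the rows shown above, and shows that passing `index_col=0` reads the column back as the index instead.

```python
import io

import pandas as pd

# Two rows copied from the df.head() output above (a subset of the columns).
original = pd.DataFrame({
    "name": ["Marry Me", "Hollow Crown"],
    "acousticness": [0.113, 0.103],
    "like": [1.0, 1.0],
})

buffer = io.StringIO()
original.to_csv(buffer)  # index=True by default, so the 0, 1 index is written too

buffer.seek(0)
naive = pd.read_csv(buffer)
print(list(naive.columns))  # ['Unnamed: 0', 'name', 'acousticness', 'like']

buffer.seek(0)
fixed = pd.read_csv(buffer, index_col=0)  # use the first column as the index
print(list(fixed.columns))  # ['name', 'acousticness', 'like']
```

In this notebook the equivalent would be `pd.read_csv('data.csv', index_col=0)`, or adding `'Unnamed: 0'` to the list of dropped columns.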


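```python
# drop the song name, the Spotify IDs/URLs, and the constant 'type' column,
# none of which are audio features
df.drop(columns=['analysis_url', 'id', 'track_href', 'type', 'uri', 'name'], inplace=True)
```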

```python
df.head()
```

| Unnamed: 0 | acousticness | danceability | duration_ms | energy | instrumentalness | key | liveness | loudness | mode | speechiness | tempo | time_signature | valence | like |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.113 | 0.588 | 205453 | 0.408 | 0.0 | 3 | 0.13 | -8.662 | 1 | 0.0237 | 85.008 | 4 | 0.484 | 1.0 |
| 1 | 0.653 | 0.535 | 232800 | 0.297 | 0.0 | 6 | 0.118 | -6.043 | 1 | 0.0665 | 147.569 | 4 | 0.0492 | 1.0 |
| 2 | 0.103 | 0.62 | 171320 | 0.564 | 0.0 | 8 | 0.0735 | -7.09 | 1 | 0.0392 | 150.034 | 4 | 0.301 | 1.0 |
| 3 | 0.312 | 0.394 | 186681 | 0.505 | 0.0 | 7 | 0.268 | -7.638 | 1 | 0.0473 | 109.446 | 4 | 0.255 | 1.0 |
| 4 | 0.139 | 0.455 | 300361 | 0.42 | 3.3e-05 | 2 | 0.101 | -8.731 | 0 | 0.0297 | 130.045 | 4 | 0.101 | 1.0 |
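After the drop, every remaining column is numeric, but the scales differ widely: in the rows above, `duration_ms` is in the hundreds of thousands, `tempo` is around 100, `loudness` is negative, and most other features lie between 0 and 1. Gradient descent on unscaled features like these tends to converge slowly, so a common next step is to separate the `like` label from the features and standardize each feature to zero mean and unit variance. The helper `split_and_standardize` below is hypothetical (it is not part of `log_reg` or pandas), and it assumes that `Unnamed: 0` is a leftover row index rather than a real feature.

```python
import numpy as np
import pandas as pd


def split_and_standardize(df, label="like", drop=("Unnamed: 0",)):
    """Hypothetical helper: split df into a standardized feature matrix X
    and a label vector y.

    Columns in `drop` (assumed to be a leftover CSV index) are ignored if
    present. Also returns the per-feature mean and std so the same scaling
    can be applied to new songs later.
    """
    features = df.drop(columns=[label, *drop], errors="ignore")
    X = features.to_numpy(dtype=float)
    y = df[label].to_numpy(dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # a constant column (e.g. time_signature in this five-row sample) has std 0;
    # leave it unscaled instead of dividing by zero
    sigma[sigma == 0] = 1.0
    return (X - mu) / sigma, y, mu, sigma


# The five rows from the df.head() output above, restricted to a few columns.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "duration_ms": [205453, 232800, 171320, 186681, 300361],
    "tempo": [85.008, 147.569, 150.034, 109.446, 130.045],
    "loudness": [-8.662, -6.043, -7.09, -7.638, -8.731],
    "time_signature": [4, 4, 4, 4, 4],
    "like": [1.0, 1.0, 1.0, 1.0, 1.0],
})

X, y, mu, sigma = split_and_standardize(sample)
print(X.shape)  # (5, 4)
print(X.mean(axis=0))  # approximately 0 for every column
```

On the full DataFrame this would be called as `X, y, mu, sigma = split_and_standardize(df)`. The returned `mu` and `sigma` should be reused, not recomputed, when scaling a held-out test set, so that both sets are transformed identically.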
