# Machine Learning - Final project
### Empathy Prediction using the [Young people survey](https://www.kaggle.com/miroslavsabo/young-people-survey/) dataset

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="white", color_codes=True)
sns.set_context(rc={"font.family":'sans',"font.size":24,"axes.titlesize":24,"axes.labelsize":24})  
pd.options.display.max_columns = 150
pd.options.display.max_rows = 150

#### Loading datasets

In [89]:
df = pd.read_csv('./input/responses.csv')
columns_desc = pd.read_csv('./input/columns.csv')

In [90]:
df.shape

(1010, 150)

#### Defining an easy way to access column descriptions

In [70]:
all_columns = list(df)

# Map each short column name to its full description (first column of columns.csv)
desc = {}
for col_name in all_columns:
    desc[col_name] = columns_desc.loc[columns_desc['short'] == col_name].iloc[0, 0]

def print_desc(column_name):
    print(column_name + " --> " + desc[column_name])

Example:

In [71]:
for col in all_columns[:5]:
    print_desc(col)

Music --> I enjoy listening to music.
Slow songs or fast songs --> I prefer.
Dance --> Dance, Disco, Funk
Folk --> Folk music
Country --> Country
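
As a side note, the same lookup can be built in a single pass instead of filtering the description table once per column. The sketch below uses a tiny stand-in table; the column names `original` and `short` and the `_demo` variable names are assumptions made for illustration, not taken from the notebook.

```python
import pandas as pd

# Toy stand-in for columns.csv; the column names 'original' and 'short' are assumed.
columns_desc_demo = pd.DataFrame({
    "original": ["I enjoy listening to music.", "Folk music"],
    "short": ["Music", "Folk"],
})

# Pair every short name with its description and build the dict in one pass.
desc_demo = dict(zip(columns_desc_demo["short"], columns_desc_demo["original"]))

print("Folk --> " + desc_demo["Folk"])
```

If a short name appeared more than once, `dict` would keep the last description, whereas the loop above keeps the first.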


## Data Exploration and preprocessing

### Missing values

Let's start by checking how many rows are missing the class label. I will delete these rows, since it makes no sense to impute the target or to train or test on rows without it.

In [86]:
df['Empathy'].isna().sum()

5

In [91]:
df.dropna(subset=['Empathy'], inplace=True)

In [92]:
df.shape

(1005, 150)
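
To make the effect of `subset` concrete, here is a minimal sketch on a made-up three-row frame (the values are invented for illustration). Only rows with a missing `Empathy` are dropped; missing values in other columns are left for the imputation step.

```python
import numpy as np
import pandas as pd

# Invented toy data: one row lacks the label, another lacks a feature.
toy = pd.DataFrame({
    "Music": [5.0, np.nan, 3.0],
    "Empathy": [4.0, 2.0, np.nan],
})

# Drop only the rows whose target is missing.
labelled = toy.dropna(subset=["Empathy"])

print(labelled.shape)
print(labelled["Music"].isna().sum())
```

Note that the original index labels are kept, so the row labels may have gaps after dropping.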

Let's now look at the percentage of missing values in each column, sorted in descending order.

In [94]:
df_na = (df.isnull().sum() / len(df)) * 100
df_na = df_na.drop(df_na[df_na == 0].index).sort_values(ascending=False)
missing_data = pd.DataFrame({'Missing Ratio': df_na})
missing_data.head(150).T

Unnamed: 0,Height,Weight,Passive sport,Chemistry,Geography,Theatre,Documentary,Smoking,Latino,Punk,Criminal damage,Compassion to animals,Final judgement,"Reggae, Ska",Gardening,Alternative,Rock n roll,"Techno, Trance",Age,Classical music,"Swing, Jazz",Movies,Biology,Reading,Giving,PC,Gender,Number of siblings,Responding to a serious letter,Daily events,Science and technology,Art exhibitions,Friends versus money,Writing,"Countryside, outdoors",Rock,Folk,Self-criticism,Spiders,Country,Economy Management,Energy levels,Workaholism,Psychology,Foreign languages,Prioritising workload,Getting up,Medicine,Alcohol,Socializing,Village - town,Pets,Active sport,Reliability,Loss of interest,Fun with friends,Cars,House - block of flats,Funniness,Decision making,Dance,"Hiphop, Rap",Questionnaires or polls,Finding lost valuables,Personality,Small - big dogs,Happiness in life,Getting angry,Children,Mood swings,Western,Internet,Cheating in school,Hypochondria,Judgment calls,Romantic,Comedy,Fantasy/Fairy tales,Animated,Religion,Physics,Dancing,Metal or Hardrock,Pop,Mathematics,Music,Appearence and gestures,Spending on looks,Life struggles,Interests or hobbies,Charity,Finances,Elections,Thinking ahead,Writing notes,Healthy eating,Entertainment spending,Rats,Heights,Left - right handed
Missing Ratio,1.99005,1.99005,1.492537,0.995025,0.895522,0.79602,0.79602,0.79602,0.79602,0.79602,0.696517,0.696517,0.696517,0.696517,0.696517,0.696517,0.696517,0.696517,0.696517,0.696517,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.597015,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.497512,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.39801,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507,0.298507
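
An equivalent and slightly shorter way to get the same percentages is to take the mean of the boolean missing-value mask. This is only a sketch on invented data, with hypothetical variable names:

```python
import numpy as np
import pandas as pd

# Invented toy data with different amounts of missing values per column.
toy = pd.DataFrame({
    "Height": [170.0, np.nan, 180.0, 165.0],
    "Weight": [60.0, 70.0, np.nan, np.nan],
    "Age": [20.0, 21.0, 19.0, 22.0],
})

# The mean of a boolean mask is the fraction of True values.
missing_pct = toy.isna().mean() * 100

# Keep only columns that have missing values, largest share first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)

print(missing_pct)
```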


The percentage of missing values is low (at most about 2% per column), so we can impute them with each column's mode without introducing much bias.

In [95]:
df[all_columns]=df[all_columns].fillna(df.mode().iloc[0])

In [97]:
df.isnull().any().any()

False

No missing values remain.
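
For readers unfamiliar with the idiom, the sketch below shows on invented data why `df.mode().iloc[0]` works for both numeric and text columns. `DataFrame.mode()` returns one row per tied mode, so taking the first row gives a single fill value per column.

```python
import numpy as np
import pandas as pd

# Invented toy data mixing a numeric and a text column.
toy = pd.DataFrame({
    "Music": [5.0, 5.0, np.nan, 3.0],
    "Smoking": ["never smoked", None, "never smoked", "tried smoking"],
})

# First row of mode(): the most frequent value of each column.
modes = toy.mode().iloc[0]

# fillna with a Series matches fill values to columns by name.
filled = toy.fillna(modes)

print(filled)
```

When several values tie for the mode, pandas returns them sorted, so `iloc[0]` picks the smallest one.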

### Data types

In [17]:
df['Music'] = df['Music'].astype(int)

In [105]:
df.dtypes[:10]

Music                       float64
Slow songs or fast songs    float64
Dance                       float64
Folk                        float64
Country                     float64
Classical music             float64
Musical                     float64
Pop                         float64
Rock                        float64
Metal or Hardrock           float64
dtype: object

In [109]:
float_cols = df.dtypes == np.float64

In [115]:
fake_float = list(df.loc[:, float_cols])

In [116]:
len(fake_float)

134

134 columns are stored as float but should be int, since their decimal part is always 0. Let's convert them.

In [117]:
df[fake_float] = df[fake_float].astype(int)
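
The claim that the decimal part is always 0 can be checked before casting. The following is a minimal sketch on invented data (the half-centimetre `Height` value is made up to show a column that would be left alone); it is not the notebook's own check.

```python
import pandas as pd

# Invented toy data: 'Music' holds whole numbers, 'Height' does not.
toy = pd.DataFrame({
    "Music": [5.0, 4.0, 3.0],
    "Height": [170.5, 180.0, 165.0],
})

# Float columns whose values all have a zero fractional part.
float_cols = [c for c in toy.columns if pd.api.types.is_float_dtype(toy[c])]
whole_cols = [c for c in float_cols if (toy[c] % 1 == 0).all()]

# Cast only those columns; the others keep their float dtype.
toy = toy.astype({c: "int64" for c in whole_cols})

print(toy.dtypes)
```

This check assumes missing values have already been imputed, since `NaN % 1 == 0` is False and such a column would be skipped.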

### One hot encoding
Since the algorithms I have in mind do not handle categorical values, I need to either one-hot encode them or, for ordinal data, map the values to integers.

Let's look at the categorical columns and decide which ones can be treated as ordinal.

In [121]:
categorical = df.dtypes == object

In [123]:
df.loc[:, categorical].T

Unnamed: 0,0,1,2,3,4,...,1009
Smoking,never smoked,never smoked,tried smoking,former smoker,tried smoking,...,tried smoking
Alcohol,drink a lot,drink a lot,drink a lot,drink a lot,social drinker,...,social drinker
Punctuality,i am always on time,i am often early,i am often running late,i am often early,i am always on time,...,i am often running late
Lying,never,sometimes,sometimes,only to avoid hurting someone,everytime it suits me,...,everytime it suits me
Internet usage,few hours a day,few hours a day,few hours a day,most of the day,few hours a day,...,few hours a day
Gender,female,female,female,female,female,...,male
Left - right handed,right handed,right handed,right handed,right handed,right handed,...,right handed
Education,college/bachelor degree,college/bachelor degree,secondary school,college/bachelor degree,secondary school,...,secondary school
Only child,no,no,no,yes,no,...,no
Village - town,village,city,city,city,village,...,village


In [128]:
categorical_cols = list(df.loc[:, categorical])

In [130]:
for col in categorical_cols:
    print_desc(col)

Smoking --> Smoking habits
Alcohol --> Drinking
Punctuality --> Timekeeping.
Lying --> Do you lie to others?
Internet usage --> How much time do you spend online?
Gender --> Gender
Left - right handed --> I am
Education --> Highest education achieved
Only child --> I am the only child
Village - town --> I spent most of my childhood in a
House - block of flats --> I lived most of my childhood in a
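
A quick way to decide between ordinal, binary and one-hot treatment is to count the distinct values in each non-numeric column. The sketch below does this on invented data; `select_dtypes(exclude="number")` is used here as one possible way to pick out the text columns, and the variable names are hypothetical.

```python
import pandas as pd

# Invented toy data: two text columns and one numeric column.
toy = pd.DataFrame({
    "Gender": ["female", "male", "female", "female"],
    "Smoking": ["never smoked", "tried smoking", "former smoker", "never smoked"],
    "Age": [20, 19, 22, 21],
})

# Everything that is not numeric is treated as categorical here.
categorical_demo = toy.select_dtypes(exclude="number").columns.tolist()

# Number of distinct values per categorical column.
n_levels = toy[categorical_demo].nunique()

print(n_levels)
```

Columns with exactly two levels are candidates for a single 0/1 column, while columns with more levels need either an ordering or one-hot encoding.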


Many of these columns are ordinal, which is good news: encoding them as integers avoids the feature explosion that one-hot encoding (OHE) would cause. I think the ordinal columns are:

* Alcohol 
* Punctuality
* Internet usage
* Education
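
One way to encode such columns is an ordered `pd.Categorical`, whose integer codes follow an explicit category order. In the sketch below the orderings are my own reading of the labels seen in the data preview, not the author's mapping, and they may not cover every answer in the full dataset.

```python
import pandas as pd

# Assumed orderings from lowest to highest; not taken from the notebook.
ordinal_orders = {
    "Alcohol": ["never", "social drinker", "drink a lot"],
    "Internet usage": ["less than an hour a day", "few hours a day", "most of the day"],
}

# Invented toy rows using labels seen in the preview.
toy = pd.DataFrame({
    "Alcohol": ["drink a lot", "never", "social drinker"],
    "Internet usage": ["few hours a day", "most of the day", "less than an hour a day"],
})

encoded = toy.copy()
for col, order in ordinal_orders.items():
    # codes are 0..n-1 in the given order; labels not in the list become -1.
    encoded[col] = pd.Categorical(toy[col], categories=order, ordered=True).codes

print(encoded)
```

Checking for `-1` after encoding is a cheap way to catch labels that the assumed ordering missed.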

Other columns have only two possible values (True/False, Yes/No, and so on) and can be encoded in a single column as an integer (0 or 1). These features are:

* Gender
* Left - right handed
* Only child
* Village - town
* House - block of flats
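
A minimal sketch of this 0/1 encoding with explicit dictionaries is shown below. Which label gets 0 and which gets 1 is an arbitrary choice made here for illustration, and the data is invented.

```python
import pandas as pd

# Arbitrary, assumed label-to-integer mappings for two of the binary columns.
binary_maps = {
    "Gender": {"female": 0, "male": 1},
    "Only child": {"no": 0, "yes": 1},
}

# Invented toy rows.
toy = pd.DataFrame({
    "Gender": ["female", "male", "female"],
    "Only child": ["no", "yes", "no"],
})

encoded = toy.copy()
for col, mapping in binary_maps.items():
    # Labels missing from the mapping would become NaN.
    encoded[col] = toy[col].map(mapping)

print(encoded)
```

Writing the mapping out by hand keeps the meaning of 0 and 1 visible, which helps when interpreting model coefficients later.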

The remaining categorical attributes are either the ones I think matter most for predicting empathy or are not really ordinal, and I do not want to introduce a bias by treating them as ordinal. These attributes are:

