# TLADS Drill
Categorize each variable in the ESS dataset as categorical or continuous and, if continuous, as ordinal, interval, or ratio. Check your work with your mentor, and discuss what these classifications might imply for feature engineering with this data.

Now we have a clean dataset made up of an outcome variable and a set of other variables that initial explorations and/or domain knowledge suggest would be valuable in our model. The next step is to transform our potential predictor variables into features. Features are variables that have been transformed in ways that make them best-suited to work within our model to explain variance in the outcome of interest. Feature engineering is a broad and complex topic, and an opportunity to get creative with your data. In the next section, we’ll talk about how to pare down a set of features into the best ones for your problem, but to do that we need lots and lots of different features that highlight different information from the data.
Feature engineering can be a lot of fun: there are very few limits on what you can try. For the rest of this assignment, we'll go through some feature engineering options using an example. When working with your data, don’t limit yourself to what you see here – try anything that you think will highlight a particular feature within your dataset.
We'll continue to work with the European Social Survey data from the last assignment. The data, if you need it again, is here and the codebook is available here.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

# Set the default plot aesthetics to be prettier.
sns.set_style("white")

In [7]:
# Loading the data again.

df = pd.read_csv("https://raw.githubusercontent.com/Thinkful-Ed/data-201-resources/master/ESS_practice_data/ESSdata_Thinkful.csv")

# Preview the first rows, the countries present, and the column names.
print(df.head())

print(df.cntry.unique())

print(df.columns)

  cntry  idno  year  tvtot  ppltrst  pplfair  pplhlp  happy  sclmeet  sclact  \
0    CH   5.0     6    3.0      3.0     10.0     5.0    8.0      5.0     4.0   
1    CH  25.0     6    6.0      5.0      7.0     5.0    9.0      3.0     2.0   
2    CH  26.0     6    1.0      8.0      8.0     8.0    7.0      6.0     3.0   
3    CH  28.0     6    4.0      6.0      6.0     7.0   10.0      6.0     2.0   
4    CH  29.0     6    5.0      6.0      7.0     5.0    8.0      7.0     2.0   

   gndr  agea  partner  
0   2.0  60.0      1.0  
1   2.0  59.0      1.0  
2   1.0  24.0      2.0  
3   2.0  64.0      1.0  
4   2.0  55.0      1.0  
['CH' 'CZ' 'DE' 'ES' 'NO' 'SE']
Index(['cntry', 'idno', 'year', 'tvtot', 'ppltrst', 'pplfair', 'pplhlp',
       'happy', 'sclmeet', 'sclact', 'gndr', 'agea', 'partner'],
      dtype='object')
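
Before assigning a type to each variable, it can help to look at how many distinct values each column takes and over what range. The sketch below is one way to do that, not part of the original drill. It uses a small hand-made frame, `sample`, with hypothetical rows shaped like the preview above, so that it runs without downloading the file. The helper name `summarize_columns` is also an assumption made for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like the preview above; not the real ESS file.
sample = pd.DataFrame({
    "cntry": ["CH", "CH", "CZ", "DE", "DE"],
    "idno": [5.0, 25.0, 26.0, 28.0, 29.0],
    "happy": [8.0, 9.0, 7.0, 10.0, 8.0],
    "gndr": [2.0, 2.0, 1.0, 2.0, 2.0],
    "agea": [60.0, 59.0, 24.0, 64.0, 55.0],
})


def summarize_columns(frame):
    """Return dtype, distinct-value count, and min/max for each column."""
    numeric = frame.select_dtypes("number")
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(),
        "min": numeric.min(),   # NaN for non-numeric columns
        "max": numeric.max(),   # NaN for non-numeric columns
    })
    # Keep the rows in the same order as the original columns.
    return summary.reindex(frame.columns)


summary = summarize_columns(sample)
print(summary)
```

On the real data, the same call would be `summarize_columns(df)`. A column with only a handful of distinct integer codes is probably categorical or ordinal, while a column with many distinct values on a natural scale (such as `agea`) is a candidate for ratio. The codebook remains the authority on what the codes mean.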


# Variables

Variable Name | Variable Type
--- | ---
cntry | Categorical
idno | Categorical (respondent identifier, not a measurement)
year | Continuous -- Interval
tvtot | Continuous -- Ordinal (banded 0-7 scale)
ppltrst | Continuous -- Ordinal
pplfair | Continuous -- Ordinal
pplhlp | Continuous -- Ordinal
happy | Continuous -- Ordinal
sclmeet | Continuous -- Ordinal (1-7 frequency scale)
sclact | Continuous -- Ordinal
gndr | Categorical
agea | Continuous -- Ratio
partner | Categorical
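
One thing these classifications might imply for feature engineering: the numeric codes of categorical variables such as `cntry` and `partner` are labels, not quantities, so a common option is to expand them into indicator (dummy) columns before modeling, while ratio variables such as `agea` can be used as they are. The sketch below illustrates this with `pandas.get_dummies` on a small hand-made frame. The rows in `sample` are hypothetical, and the choice of which columns to dummy-code is an assumption for illustration rather than a prescribed answer to the drill.

```python
import pandas as pd

# Hypothetical rows using column names from the ESS data; not the real file.
sample = pd.DataFrame({
    "cntry": ["CH", "CZ", "DE", "CH"],
    "partner": [1.0, 2.0, 1.0, 1.0],
    "happy": [8.0, 9.0, 7.0, 10.0],
    "agea": [60.0, 59.0, 24.0, 64.0],
})

# Store the partner codes as integers so the indicator columns get
# clean names (partner_1, partner_2) instead of partner_1.0.
sample["partner"] = sample["partner"].astype(int)

# Columns treated as categorical here (an assumption for this sketch).
categorical_cols = ["cntry", "partner"]

# Replace each categorical column with one 0/1 column per category.
features = pd.get_dummies(sample, columns=categorical_cols, dtype=int)

print(features.columns.tolist())
```

Ordinal variables such as `happy` are left untouched in this sketch. Whether to treat them as numeric or to dummy-code them as well is a judgment call worth raising with your mentor.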