In [2]:
%run ../../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython

config_ipython()
setup_matplotlib()
set_css_style()

# Feature engineering

Feature engineering is the process of transforming the input data into a useful representation for your model to learn from. The process is usually human-driven: there is no one-size-fits-all recipe, and you have to apply your domain knowledge of the problem you're trying to solve in order to decide how to encode your features.
How you engineer the features of a model also depends heavily on the type and quality of the data you have in the first place.

Feature engineering is about deciding what to compute from your data so that the resulting representation captures the problem well enough for a model to solve it.

## Some techniques for known problems

### Feature hashing

Feature hashing is a trick used to save space and to retrieve features efficiently in memory. You apply a hash function to the features and use the resulting hash values as indices into a vector that stores all the feature values. It is particularly useful in problems with a large number of features.

For instance, if feature *A* gets hashed into 56, then index 56 of the vector is the one to be updated.
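The idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `hash_index` and `hash_features`, the choice of MD5 as the hash function, and the 64-slot vector are all assumptions made for the example. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings changes between interpreter runs.

```python
import hashlib


def hash_index(feature_name: str, n_buckets: int) -> int:
    """Map a feature name to a slot of the vector (hypothetical helper)."""
    digest = hashlib.md5(feature_name.encode("utf-8")).digest()
    # Take the first 8 bytes as an integer and fold it into the vector size.
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_features(features: dict, n_buckets: int = 64) -> list:
    """Store feature values in a fixed-size vector indexed by hash value."""
    vector = [0.0] * n_buckets
    for name, value in features.items():
        # Features that collide on the same index have their values added up.
        vector[hash_index(name, n_buckets)] += value
    return vector


example = {"A": 1.0, "B": 2.5, "C": -1.0}
hashed = hash_features(example, n_buckets=64)
```

The vector has a fixed length no matter how many distinct features appear. The price is that two features may collide on the same index, in which case this sketch simply sums their values.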

### One-hot encoding

One-hot encoding is a procedure often used to transform categorical variables into numerical representations so that they can be used as features in a model. The name comes from the fact that a *one-hot* is a group of bits containing a single 1 while all the others are 0. Conversely, a *one-cold* is a group of bits containing a single 0 among 1's. To one-hot encode a variable, consider all the states it can take, use as many bits as there are states, and light up a different bit for each state. The table below lists the numbers 0-7 (in decimal and binary) encoded in a one-hot fashion: there are 8 states, so the one-hot representation is a string of 8 bits, and for each number you set the corresponding bit to 1 while keeping all the rest at 0.

<table style="width:50%">
  <tr>
    <th>Decimal</th><th>Binary</th><th>One-hot</th>
  </tr>
  <tr>
    <td>0</td><td>000</td><td>00000001</td> 
  </tr>
  <tr>
    <td>1</td><td>001</td><td>00000010</td> 
  </tr>
  <tr>
    <td>2</td><td>010</td><td>00000100</td> 
  </tr>
  <tr>
    <td>3</td><td>011</td><td>00001000</td> 
  </tr>
  <tr>
    <td>4</td><td>100</td><td>00010000</td> 
  </tr>
  <tr>
    <td>5</td><td>101</td><td>00100000</td> 
  </tr>
  <tr>
    <td>6</td><td>110</td><td>01000000</td> 
  </tr>
  <tr>
    <td>7</td><td>111</td><td>10000000</td> 
  </tr>
</table>
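The rows of the table above can be reproduced with a bit shift, as in the short sketch below. The function names `one_hot_bits` and `one_cold_bits` are made up for this illustration, and the bit for state $k$ is counted from the right, as in the table.

```python
def one_hot_bits(k: int, n_states: int) -> str:
    """Return the one-hot bit string for state k out of n_states."""
    if not 0 <= k < n_states:
        raise ValueError("k must be in the range [0, n_states)")
    # Shift a single 1 into position k, then pad with zeros on the left.
    return format(1 << k, f"0{n_states}b")


def one_cold_bits(k: int, n_states: int) -> str:
    """Return the one-cold bit string: a single 0 among 1's."""
    all_ones = (1 << n_states) - 1
    # Flip only bit k of the all-ones pattern.
    return format(all_ones ^ (1 << k), f"0{n_states}b")


# (decimal, binary, one-hot) for the numbers 0-7
rows = [(k, format(k, "03b"), one_hot_bits(k, 8)) for k in range(8)]
```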

Let's do an example with a proper categorical variable. Let's say one of the features you have is the status of the weather, and that it can take any of three states: sunny, cloudy, rainy. Using one-hot, you would encode it as

| Weather status        | One-hotted  |
| ------------- |:-------------:| 
| sunny     | 001 | 
| cloudy    | 010      |
| rainy     | 100      |

At the end, from one feature with 3 states, you end up with 3 binary features. This procedure adds dimensionality because there will be one feature for each state of the categorical variable, each containing either a 0 or a 1. In general, a categorical variable that takes $d$ distinct values over $n$ observations becomes $d$ binary variables with $n$ observations each.
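As a rough sketch of the procedure on the weather example, the plain-Python function below turns a list of categorical values into rows of 0's and 1's. The name `one_hot_encode` and the sample data are assumptions made for illustration. The columns are ordered alphabetically by state, which is an arbitrary choice and differs from the bit order in the table above.

```python
def one_hot_encode(values):
    """One-hot encode a list of categorical values.

    Returns the ordered list of states and one row of 0/1 per observation.
    """
    states = sorted(set(values))  # one column per distinct state
    column = {state: i for i, state in enumerate(states)}
    encoded = []
    for value in values:
        row = [0] * len(states)
        row[column[value]] = 1  # light up the bit for this state
        encoded.append(row)
    return states, encoded


weather = ["sunny", "cloudy", "rainy", "sunny"]
states, encoded = one_hot_encode(weather)
# states  -> ['cloudy', 'rainy', 'sunny']
# encoded -> [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Here $n = 4$ observations of a variable with $d = 3$ states become 3 binary columns of 4 values each, and every row contains exactly one 1.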