In [1]:
%run Latex_macros.ipynb

<IPython.core.display.Latex object>

In [2]:
# My standard magic !  You will see this in almost all my notebooks.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%load_ext autoreload
%autoreload 1

%matplotlib inline

# The mechanics of transformations

We briefly introduced transformations in [the overview of the "Prepare the data" step of the Recipe for ML](Prepare_data_Overview.ipynb).

We recap the key points:

## Fitting transformations

To review: transformations (feature engineering)
- take an example: a vector $\x^\ip$ with $n$ features
- produce a new vector $\tilde\x^\ip$ with $n'$ features

We ultimately fit the model with the transformed *training* examples.

<table>
    <tr>
        <th><center>Feature Engineering</center></th>
    </tr>
    <tr>
        <td><img src="images/Feature_engineering.jpg"></td>
    </tr>
</table>

Transformations often have their own parameters $\Theta_\text{transform}$, which are separate from
the $\Theta$ parameters of the model.

For example: a "missing data transformation"
- Substitutes the mean/median (over the training examples) of feature $j$ for a missing value of that feature
- Is "fit" (or "trained") by giving it all the training examples
- Records the mean/median of each feature in $\Theta_\text{transform}$
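
As a minimal sketch of this idea, the cell below fits `sklearn`'s `SimpleImputer` with the median strategy. The tiny array `X_train` is made up for illustration; the fitted medians (our $\Theta_\text{transform}$) are stored in the imputer's `statistics_` attribute.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training set: 4 examples, 2 features, some values missing (np.nan)
X_train = np.array([[1.0, 10.0],
                    [2.0, np.nan],
                    [np.nan, 30.0],
                    [4.0, 50.0]])

# "Fitting" the transformation computes the per-feature median over the training examples
imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)

# Theta_transform: one median per feature, ignoring the missing values
theta_transform = imputer.statistics_
print(theta_transform)

# Applying the transformation fills each missing value with the recorded median
X_train_filled = imputer.transform(X_train)
print(X_train_filled)
```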


## Transformations are applied to both training and test examples

The domain of the prediction model $h$
- is the domain of the transformed $\tilde{\x}$
- **not** the domain of the original examples $\x$

So before any example, such as a test example, is fed through the model, it must be transformed.
- Transformed using the $\Theta_\text{transform}$ obtained by fitting training examples
- We **do not** refit $\Theta_\text{transform}$ on test examples!
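
The sketch below contrasts the correct and incorrect treatment of test examples, using `StandardScaler` as a stand-in transformation. The arrays are invented for illustration, and the variable names (`X_test_t`, `X_test_wrong`) are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one feature, test values far from the training range
X_train = np.array([[0.0], [2.0], [4.0]])
X_test = np.array([[10.0], [12.0]])

# Fit Theta_transform (mean and standard deviation) on the training examples only
scaler = StandardScaler()
scaler.fit(X_train)

# Correct: transform both sets with the parameters learned from training data
X_train_t = scaler.transform(X_train)
X_test_t = scaler.transform(X_test)

# Incorrect: refitting on the test examples gives them their own mean and scale,
# so the model would see inputs on a different scale than it was trained on
X_test_wrong = StandardScaler().fit_transform(X_test)

print("Theta_transform mean:", scaler.mean_)
print("Test, transformed with training parameters:", X_test_t.ravel())
print("Test, wrongly refit:", X_test_wrong.ravel())
```

With the training parameters, the test examples land well outside the range of the transformed training examples, which is the honest picture; the refit version hides that difference.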

<table>
    <tr>
        <th><center>Feature engineering: fit, then transform</center></th>
    </tr>
    <tr>
        <td><img src="images/Feature_engineering_fit.jpg" width=1000></td>
    </tr>
</table>

## Targets and features can be transformed

Although we have framed transformations as something performed on features $\x$, we sometimes
transform the target $\y$ as well.
- Logistic Regression transformed the target  $p$ to log odds: $\log{\frac{p}{1-p}}$
    - The log odds is amenable to a linear model; the raw target is not
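
A small numeric illustration of the log odds transformation, using plain `numpy`. The probabilities in `p` are made-up values.

```python
import numpy as np

# Hypothetical target probabilities, each strictly between 0 and 1
p = np.array([0.1, 0.5, 0.9])

# Log odds: maps the interval (0, 1) onto the whole real line
log_odds = np.log(p / (1 - p))
print(log_odds)
```

A probability of 0.5 maps to log odds of 0, and probabilities that are symmetric about 0.5 map to log odds of equal magnitude and opposite sign.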

## Inverting transformations

If we transform the target, then the values predicted by the model
are in the same units as the transformed targets
- Example: log odds rather than probability
- Example: you might convert a price level to a percent change
    - Your predictions are then predictions of percent change, not price

We probably want to report our predictions to our clients in the original domain of the targets.

You may need to *invert the transformation* to convert the prediction $\hat{y}$ back into the same units as the original targets.
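
Here is a sketch of inverting the two example transformations by hand. The predicted values (`z_hat`, `pct_change_hat`), the reference price `last_price`, and the helper `inverse_log_odds` are all hypothetical, chosen only to illustrate the arithmetic.

```python
import numpy as np

def inverse_log_odds(z):
    """Invert the log odds transformation: map log odds back to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical model predictions, in units of log odds
z_hat = np.array([-2.0, 0.0, 2.0])
p_hat = inverse_log_odds(z_hat)
print("Predicted probabilities:", p_hat)

# Hypothetical prediction of percent change (5%), given the last observed price
last_price = 100.0
pct_change_hat = 0.05

# Invert "price level -> percent change" to recover a predicted price
price_hat = last_price * (1.0 + pct_change_hat)
print("Predicted price:", price_hat)
```

Note that inverting the percent change requires the previous price, so any quantity needed for the inversion has to be kept alongside the prediction.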


## Transformations in `sklearn`

`sklearn` provides an easy API with the following methods:
- `fit`: set the parameters of the transformation by fitting to the training data
- `transform`: apply the transformation.  Do this for both train and test data
- `inverse_transform`: convert from transformed data units back to the original units
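
The three methods can be seen together in a short sketch. `MinMaxScaler` is used here only as one example of an invertible `sklearn` transformer, and the target values are invented.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical targets (e.g., prices), as column vectors
y_train = np.array([[100.0], [150.0], [200.0]])
y_test = np.array([[175.0]])

scaler = MinMaxScaler()

# fit: learn the transformation's parameters (min and max) from training data only
scaler.fit(y_train)

# transform: apply the same fitted transformation to train and test data
y_train_t = scaler.transform(y_train)
y_test_t = scaler.transform(y_test)

# inverse_transform: convert transformed values back to the original units
y_test_back = scaler.inverse_transform(y_test_t)

print(y_train_t.ravel())
print(y_test_t.ravel())
print(y_test_back.ravel())
```

Many `sklearn` transformers also offer `fit_transform`, which fits and transforms in one call; it is meant for the training data only.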

In [1]:
print("Done")

Done
