In [1]:
%run Latex_macros.ipynb

<IPython.core.display.Latex object>

In [2]:
# My standard magic !  You will see this in almost all my notebooks.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%load_ext autoreload
%autoreload 1

%matplotlib inline

# The mechanics of transformations

We briefly introduced transformations in [the overview of the Prepare the data step of the Recipe for ML](Prepare_data_Overview.ipynb).

We recap the key points:

## Fitting transformations

A feature engineering step, or transformation,
- takes an example: a vector $\x^\ip$ with $n$ features
- produces a new vector $\tilde\x^\ip$ with $n'$ features

We ultimately fit the model with the transformed *training* examples.
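Here is a transformation that changes the number of features. This minimal sketch uses `sklearn`'s `PolynomialFeatures`, chosen only for illustration since any feature engineering step would do. It maps each example with $n = 2$ features to one with $n' = 5$ features.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two training examples, each with n = 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Degree-2 polynomial features without the constant (bias) column:
# each example [x1, x2] becomes [x1, x2, x1^2, x1*x2, x2^2]
poly = PolynomialFeatures(degree=2, include_bias=False)
X_tilde = poly.fit_transform(X)

n, n_prime = X.shape[1], X_tilde.shape[1]   # n = 2, n' = 5
# X_tilde[0] holds 1, 2, 1, 2, 4 (i.e., x1, x2, x1^2, x1*x2, x2^2 for the first example)
```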


<table>
    <tr>
        <th><center>Feature Engineering</center></th>
    </tr>
    <tr>
        <td><img src="images/W3_L1_S3_Feature_engineering.png"></td>
    </tr>
</table>

Transformations have parameters $\Theta_\text{transform}$ distinct from the model's parameters $\Theta$.
- Example: missing data imputation replaces a missing value of a feature with the mean (or median) of that feature
- $\Theta_\text{transform}$ stores this mean (or median) value
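Imputation makes $\Theta_\text{transform}$ concrete. This sketch assumes `sklearn`'s `SimpleImputer` with the median strategy on a tiny made-up training set. The learned medians live in the fitted imputer's `statistics_` attribute, and they are reused to fill missing values in any later example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Training examples; np.nan marks a missing value in the second feature
X_train = np.array([[1.0, 10.0],
                    [2.0, np.nan],
                    [3.0, 30.0]])

imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)

# Theta_transform: per-feature medians learned from the training examples
# (2.0 for the first feature, 20.0 for the second)
theta_transform = imputer.statistics_

# A new example with a missing first feature is filled with the stored median
X_new = np.array([[np.nan, 50.0]])
X_new_imputed = imputer.transform(X_new)   # first feature becomes 2.0
```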


Our prediction is thus
$$
\begin{array}{lcl}
\hat{\y} & = & h_\Theta (\tilde{\x}) \\
& = &h_\Theta( \, T_{\Theta_\text{transform}}(\x) \,)
\end{array}
$$
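The composition $h_\Theta( T_{\Theta_\text{transform}}(\x) )$ can be written out by hand. This is a minimal sketch on synthetic data. It assumes standardization (`StandardScaler`) as the transformation $T$ and `LinearRegression` as the model $h$, both picked only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(loc=100.0, scale=20.0, size=(50, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

# T_{Theta_transform}: fit the transformation on the training examples
scaler = StandardScaler().fit(X_train)
X_train_tilde = scaler.transform(X_train)

# h_Theta: fit the model on the *transformed* training examples
model = LinearRegression().fit(X_train_tilde, y_train)

# Prediction for a new example x: y_hat = h_Theta( T(x) )
x_new = rng.normal(loc=100.0, scale=20.0, size=(1, 3))
y_hat = model.predict(scaler.transform(x_new))
```

Skipping `scaler.transform` at prediction time would feed the model raw values it was never trained on.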

Transformations can be applied to the target as well.  For example
- One Hot Encoding a categorical target for a Classification task
- Scaling the target (e.g., pixel intensities from a range $[0 \ldots 255]$ to a range $[-1 \ldots +1 ]$)
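The pixel example is a fixed affine map, so it is easy to write together with its inverse. In this sketch, `scale_pixels` and `unscale_pixels` are hypothetical helper names, not library functions.

```python
import numpy as np

def scale_pixels(p):
    """Hypothetical helper: map intensities from [0, 255] to [-1, +1]."""
    return p / 127.5 - 1.0

def unscale_pixels(s):
    """Hypothetical helper: inverse map from [-1, +1] back to [0, 255]."""
    return (s + 1.0) * 127.5

pixels = np.array([0.0, 127.5, 255.0])
scaled = scale_pixels(pixels)        # -1.0, 0.0, 1.0
restored = unscale_pixels(scaled)    # back to the original intensities
```

Here the parameters (0 and 255) are known in advance, so nothing is fit. By contrast, `sklearn`'s `MinMaxScaler(feature_range=(-1, 1))` would learn the minimum and maximum from the training examples.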

If we transform the target $\y$ into new units, the predicted $\hat{\y}$ will also be in the new units
- If we want to report our prediction in original units
- We must be able to invert the transformation

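For example:
- Logistic Regression models the Log Odds of the positive class as a linear function of the features, so its raw prediction is in units of Log Odds
- To report our prediction as a probability, or as one class of the Categorical target, we invert the Log Odds with the sigmoid (logistic) function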
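This sketch shows the inversion with `sklearn`'s `LogisticRegression` on synthetic binary data. For a binary problem, `decision_function` returns the predicted Log Odds of the positive class. The sigmoid maps these back to a probability, and thresholding at 0.5 gives a class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# The linear score: predicted log odds of the positive class (class 1)
log_odds = clf.decision_function(X[:5])

# Invert the log-odds (logit) transformation with the sigmoid to get a probability
prob = 1.0 / (1.0 + np.exp(-log_odds))

# ... and report a class: the positive class when the probability exceeds 0.5
pred = (prob > 0.5).astype(int)
```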

# Apply transformations consistently

Suppose you transform your raw training set 
$$\langle \X, \y \rangle$$

to 
$$\langle \X', \y' \rangle$$

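The model was fit on transformed training examples, so every example it sees must be in the same transformed space.

In order to satisfy the Fundamental Assumption of Machine Learning
- you must apply the **identical** transformation to
    - validation examples
    - test examples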


By wrapping all your transformations in an `sklearn` `Pipeline`
- you can ensure that your transformations are applied consistently to each example
- regardless of its source
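This minimal sketch wraps imputation, scaling, and a model in a `Pipeline`. The step names and the choice of transformers and model are assumptions for illustration. `fit` fits each transformation on the training examples only. `predict` then pushes test examples through those same fitted transformations.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=100)
X[rng.random(X.shape) < 0.1] = np.nan   # knock out roughly 10% of feature values

X_train, y_train = X[:80], y[:80]
X_test = X[80:]

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])

# fit: each transformation is fit on the training examples,
# then the model is fit on the transformed training examples
pipe.fit(X_train, y_train)

# predict: test examples pass through the *same* fitted transformations
y_pred = pipe.predict(X_test)
```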

But remember
- the transformation parameters $\Theta_\text{transform}$
- are fit to the **training examples** only
- never re-fit to test examples
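This small sketch with `StandardScaler` and made-up numbers shows what goes wrong when the transformation is re-fit. Transforming the test examples with the training parameters shows they lie far above the training mean. Re-fitting on the test examples hides this by centering them on their own mean.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[10.0], [11.0]])

# Theta_transform = training mean (3.0) and training standard deviation (sqrt(2))
scaler = StandardScaler().fit(X_train)

# Correct: transform test examples with the training parameters
X_test_ok = scaler.transform(X_test)               # about 4.95 and 5.66

# Wrong: re-fitting on the test examples silently changes Theta_transform
X_test_wrong = StandardScaler().fit_transform(X_test)   # -1.0 and 1.0
```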

One simple way to remember this
- assume you can look at your test examples only **one at a time** rather than as a collection
- it doesn't make sense to "fit" a transformation on a singleton
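Here is the singleton argument in code, assuming `StandardScaler` as the transformation. A mean "fit" on one example is the example itself, and its variance is 0. `sklearn` treats a zero variance as a scale of 1, so every feature maps to 0 regardless of its value.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_single = np.array([[42.0, -7.0]])   # one test example with two features

# "Fitting" on a singleton erases all information in the example
z = StandardScaler().fit_transform(x_single)   # every entry is 0.0
```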

# Transformed targets: remember to invert your prediction !

Suppose you transform your raw training set 
$$\langle \X, \y \rangle$$

to 
$$\langle \X', \y' \rangle$$

where $f$ is the transformation applied to targets
$$
\y' = f(\y)
$$

The units of $\y$ change from $u$ (e.g., dollars) to $u'$ (e.g., dimensionless z-score)

Then your model's predictions
$$
\hat{\y}'
$$
are in units of $u'$ **not** $u$.

You must **invert** the transformed predicted target $\hat{\y}'$ back to units of $u$
$$
\hat{\y} = f^{-1} ( \hat{\y}' )
$$

For example
$$
\y \mapsto \frac{\y - \mu}{\sigma}
$$
where $\mu, \sigma$ are the mean and standard deviation of the *training* examples $\y$

Then the inverse transformation is
$$
\hat{\y} = f^{-1}(\hat{\y}') = \sigma \, \hat{\y}' + \mu
$$
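The z-score round trip can be written directly in numpy. In this sketch, `f` and `f_inv` mirror the notation above. The target values and the predicted z-score of 0.5 are hypothetical numbers chosen for illustration.

```python
import numpy as np

y_train = np.array([100.0, 200.0, 300.0])   # training targets in units u (e.g., dollars)

# Fit the transformation parameters on the training targets only
mu, sigma = y_train.mean(), y_train.std()

def f(y):
    """Forward transformation: units u -> z-score units u'."""
    return (y - mu) / sigma

def f_inv(y_prime):
    """Inverse transformation: z-score units u' -> units u."""
    return sigma * y_prime + mu

y_train_prime = f(y_train)

# Suppose a model predicts 0.5 in z-score units (hypothetical value)
y_hat_prime = 0.5
y_hat = f_inv(y_hat_prime)   # the prediction back in dollars
```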

Many `sklearn` transformers (e.g., `StandardScaler`, `MinMaxScaler`) provide an `inverse_transform` method to facilitate this.
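This sketch applies `inverse_transform` to a regression target, using synthetic data and `LinearRegression` as an illustrative model. `sklearn` transformers expect 2-D input, so the target is reshaped to a column. For comparison, the sketch also uses `sklearn.compose.TransformedTargetRegressor`, which performs the same transform-then-invert steps automatically.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = 1000.0 + 250.0 * X_train[:, 0] + rng.normal(scale=10.0, size=50)  # units u
X_new = rng.normal(size=(3, 2))

# Fit the target transformation on the training targets (reshaped to a column)
y_scaler = StandardScaler().fit(y_train.reshape(-1, 1))
y_train_prime = y_scaler.transform(y_train.reshape(-1, 1)).ravel()

model = LinearRegression().fit(X_train, y_train_prime)
y_hat_prime = model.predict(X_new)                                      # units u'
y_hat = y_scaler.inverse_transform(y_hat_prime.reshape(-1, 1)).ravel()  # units u

# The same steps, handled automatically
ttr = TransformedTargetRegressor(regressor=LinearRegression(),
                                 transformer=StandardScaler())
ttr.fit(X_train, y_train)
y_hat_auto = ttr.predict(X_new)
```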

## Transformers in `sklearn`

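A transformer in `sklearn` provides the following methods
- `fit`: using training examples, compute $\Theta_\text{transform}$
- `transform`: using the stored $\Theta_\text{transform}$, map an example $\x^\ip$ into the transformed example $\tilde{\x}^\ip$
- `fit_transform`: `fit` followed by `transform`, typically applied to the training examples
- `inverse_transform` (provided by many, but not all, transformers): map a transformed example $\tilde{\x}^\ip$ back to its source example $\x^\ip$

The same methods apply when a transformer is used on the target $\y^\ip$ rather than on the features.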
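The interface above can be sketched as a small custom transformer. `MeanCenterer` is a hypothetical class written only for illustration; it subtracts each feature's training mean. It subclasses `sklearn.base.BaseEstimator` and `TransformerMixin`, and the mixin supplies `fit_transform` from our `fit` and `transform`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtract each feature's training mean."""

    def fit(self, X, y=None):
        # Theta_transform: per-feature means of the training examples
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Map x -> x_tilde using the stored Theta_transform
        return np.asarray(X, dtype=float) - self.mean_

    def inverse_transform(self, X_tilde):
        # Map x_tilde -> x, undoing transform
        return np.asarray(X_tilde, dtype=float) + self.mean_

X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
centerer = MeanCenterer().fit(X_train)

X_tilde = centerer.transform(X_train)          # [[-1, -10], [1, 10]]
X_back = centerer.inverse_transform(X_tilde)   # recovers X_train
Z = MeanCenterer().fit_transform(X_train)      # fit_transform comes from TransformerMixin
```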

In [3]:
print("Done")

Done
