<img src="../images/logo.png" align='right' width=250px>

# Custom Transformer Assignment

This is the first and simplest version of the custom transformer assignment. We will provide you with a piece of code that correctly handles our data, and you have to rewrite it as a transformer!


In [None]:
import numpy as np

## Part 1
Imagine we have a dataset with six columns of dummy (0/1) variables. We want to perform feature selection by removing every column in which all values are the same.

In [None]:
X = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 1, 1],
        [1, 1, 1, 1, 1],
    ]
).T
X

Luckily, we already have a piece of NumPy code that does exactly this:

In [None]:
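# Number of ones in each column
sums = np.sum(X, axis=0)
# Keep columns that are neither all zeros (sum 0) nor all ones (sum equal to the number of rows)
idx = np.where((sums != 0) & (sums != X.shape[0]))[0]
removed_columns_X = X[:, idx]
removed_columns_X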
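To see what each step of that selection computes, here is a minimal standalone sketch that prints the intermediate values. It assumes, as in this example, that every column contains only 0s and 1s, so a column is constant exactly when its sum is 0 or equal to the number of rows; `np.flatnonzero` is used as an equivalent of `np.where(...)[0]`.

```python
import numpy as np

# Same data as above: 5 rows (samples) x 6 columns (dummy features).
X = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 1, 1],
        [1, 1, 1, 1, 1],
    ]
).T

n_rows = X.shape[0]
sums = X.sum(axis=0)  # number of ones per column
print(sums)  # [0 2 1 5 3 5]

# Assumes 0/1 columns: constant means all zeros (sum 0) or all ones (sum n_rows).
is_constant = (sums == 0) | (sums == n_rows)
print(is_constant)

# Indices of the columns to keep.
idx = np.flatnonzero(~is_constant)
print(idx)  # [1 2 4]
```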

However, we would like to create a scikit-learn transformer that does this for us, so that it can become part of a pipeline! Let's create this transformer.

There are a few steps to take:
* Your imports: Which classes should your custom transformer inherit from? _Hint:_ Check out the [documentation](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.base).
* Define your class! We assume you will call it `CustomTransformer`. It can be named anything, but keep in mind that the check later on assumes it's called `CustomTransformer`.
* `.__init__()` method: This is where you pass any arguments that your fit or transform method needs. Is that the case for this example?
* `.fit()` method: This takes at least `X` (and, by scikit-learn convention, an optional `y=None`), and it should always return the instance itself (`self`) at the end. For now, let's just return `self` and nothing else.
* `.transform()` method: This is where the magic happens. Make sure you're not mutating the input data!
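As a reference for the structure only (not the solution), here is a minimal sketch of a scikit-learn-style transformer. The class `AddConstantTransformer` and its `constant` parameter are hypothetical and deliberately do something different from this assignment. It assumes scikit-learn is installed and inherits from `TransformerMixin` (which provides `fit_transform`) and `BaseEstimator` (which provides `get_params`/`set_params`).

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class AddConstantTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical example transformer: adds a fixed constant to every value."""

    def __init__(self, constant=1):
        # Store __init__ arguments unchanged under the same name, so that
        # BaseEstimator.get_params() / set_params() can find them.
        self.constant = constant

    def fit(self, X, y=None):
        # Nothing to learn here, but fit must still return the instance itself.
        return self

    def transform(self, X):
        # Adding to np.asarray(X) builds a new array, so the input is not mutated.
        return np.asarray(X) + self.constant


X_demo = np.zeros((2, 3))
X_shifted = AddConstantTransformer(constant=5).fit_transform(X_demo)
# X_shifted is a new 2 x 3 array filled with 5.0; X_demo is still all zeros.
```

Listing the mixin before `BaseEstimator` follows the ordering used by scikit-learn's own estimators.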

In [None]:
# Your code here.

In [None]:
# %load ../answers/simple_part_1.py

Let's check our implementation!

In [None]:
transformer = CustomTransformer()
transformer.fit_transform(X)

_Bonus_: Show how you can use your `CustomTransformer` in a pipeline. Create a pipeline that combines your `CustomTransformer` with a KMeans clustering model (an unsupervised model, since we have no _y_).
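One way to sketch the bonus, assuming scikit-learn is installed. Because this snippet has to run on its own, it uses scikit-learn's `VarianceThreshold` as a stand-in for `CustomTransformer`: with its default threshold of 0.0 it also drops features that are constant in the training data. Replace `VarianceThreshold()` with `CustomTransformer()` once your class is defined; `n_clusters=2` and `random_state=0` are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline

X = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 1, 1],
        [1, 1, 1, 1, 1],
    ]
).T

# Stand-in for CustomTransformer(): removes zero-variance (constant) columns.
pipeline = make_pipeline(
    VarianceThreshold(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)  # one cluster label per row of X

# make_pipeline names each step after its lowercased class name.
selector = pipeline.named_steps["variancethreshold"]
print(selector.get_support(indices=True))  # [1 2 4]
print(labels)
```

With your own class in the first step, the step name would be `"customtransformer"`.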

## Part 2

Now, let's say we have some new data. Our transformer was fitted on the original data, which means that on the new data the same columns (indices) should be deleted as for the original dataset, even if some of these columns _do_ contain dissimilar values in the new data.

Edit the transformer from Part 1 so that `.fit()` stores in the transformer's internal state which columns should be kept, and `.transform()` keeps exactly those columns whenever it is applied.
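The key idea of Part 2 is that anything learned from the data is computed in `.fit()` and reused in `.transform()`. The sketch below illustrates this pattern with a hypothetical `ColumnMeanCenterer` (unrelated to this assignment's column selection). It stores the training column means in `means_`, following scikit-learn's convention of a trailing underscore for fitted attributes, and calls `sklearn.utils.validation.check_is_fitted` so that transforming before fitting raises an error.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_is_fitted


class ColumnMeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical example: subtracts the column means learned during fit."""

    def fit(self, X, y=None):
        # Learned state goes into attributes with a trailing underscore.
        self.means_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Raises NotFittedError if fit has not been called yet.
        check_is_fitted(self)
        # Reuse the means learned from the training data; do not recompute them on X.
        return np.asarray(X, dtype=float) - self.means_


X_train = np.array([[0.0, 10.0], [2.0, 20.0]])
X_new = np.array([[1.0, 15.0], [3.0, 35.0]])

centerer = ColumnMeanCenterer().fit(X_train)
print(centerer.means_)  # means_ holds 1.0 and 15.0, taken from X_train only
print(centerer.transform(X_new))  # X_new minus the training means
```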

In [None]:
X_test = np.array(
    [
        [0, 0, 1, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 1, 1, 1],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 1, 1],
    ]
).T
X_test
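To see why storing the selection matters, the sketch below compares the columns chosen on `X` with the columns a refit on `X_test` would choose. The helper `non_constant_columns` is hypothetical and simply repackages the Part 1 NumPy code, assuming 0/1 columns.

```python
import numpy as np


def non_constant_columns(data):
    """Hypothetical helper: indices of 0/1 columns that are neither all 0s nor all 1s."""
    sums = data.sum(axis=0)
    return np.flatnonzero((sums > 0) & (sums < data.shape[0]))


X = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 1, 1],
        [1, 1, 1, 1, 1],
    ]
).T

X_test = np.array(
    [
        [0, 0, 1, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 1, 1, 1],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 1, 1],
    ]
).T

train_idx = non_constant_columns(X)      # columns chosen on the training data
test_idx = non_constant_columns(X_test)  # what a refit on X_test would choose

print(train_idx)  # [1 2 4]
print(test_idx)   # [0 1 2 3 5]

# The fitted transformer should apply train_idx to X_test, giving a 5 x 3 array.
print(X_test[:, train_idx])
```

A refit on `X_test` would keep five columns (including the three that were constant in `X`) and drop column 4, which is all zeros in `X_test`; the transformer should instead keep columns 1, 2 and 4.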

In [None]:
# Your new transformer code here.

In [None]:
# %load ../answers/simple_part_2.py

In [None]:
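# Re-create and fit the transformer so that it uses your new class definition
transformer = CustomTransformer().fit(X)
X_test_transformed = transformer.transform(X_test)
X_test_transformed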

Run the cell below to test whether your transformer behaves as expected.

In [None]:
# Test cell
answer = np.array([[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]])
# array_equal also checks that the shapes match, unlike an elementwise == comparison
assert np.array_equal(answer, X_test_transformed)