# Filter Selection

Removes specific features from a dataset.

This notebook shows:
- how to use the [SDK](https://platiagro.github.io/sdk/) to load datasets, save models and other artifacts.
- how to declare parameters and use them to build reusable components.

## Declare parameters and model hyperparameters
Components may declare (and use) these default parameters:
- dataset

Use these parameters to load/save datasets, models, metrics, and figures with the help of [PlatIAgro SDK](https://platiagro.github.io/sdk/). <br />
You may also declare custom parameters to set when running an experiment.

Select the hyperparameters and their respective values to be used when configuring the filter:
- features_to_filter

In [None]:
# parameters
dataset = "iris" #@param {type:"string"}

# hyperparameters
features_to_filter = ["Species"] #@param {type:"list", label:"Features to Filter", description:"Removes the selected features from the dataset."}

## Load dataset

Load the whole dataset into a `pandas.DataFrame`, then convert it to a NumPy array.

In [None]:
from platiagro import load_dataset

df = load_dataset(name=dataset)
X = df.to_numpy()

## Load metadata about the dataset
For example, below we get the feature type of each column in the dataset (e.g. categorical, numerical, or datetime).

In [None]:
import numpy as np
from platiagro import stat_dataset

metadata = stat_dataset(name=dataset)
featuretypes = metadata["featuretypes"]

columns = df.columns.to_numpy()
featuretypes = np.array(featuretypes)
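
Because `columns` and `featuretypes` are aligned arrays, a boolean mask on one can index the other. The following is a minimal sketch of that idea; the column names and the type labels (`"Numerical"`, `"Categorical"`) are illustrative assumptions, not values confirmed by this notebook, so check what `stat_dataset` actually returns for your dataset.

```python
import numpy as np

# Hypothetical stand-ins for the `columns` and `featuretypes` arrays built above.
columns = np.array(
    ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"]
)
# Assumed type labels, one per column, in the same order as `columns`.
featuretypes = np.array(
    ["Numerical", "Numerical", "Numerical", "Numerical", "Categorical"]
)

# Comparing the type array to a label yields a boolean mask
# that selects the matching column names.
numerical_columns = columns[featuretypes == "Numerical"]
categorical_columns = columns[featuretypes == "Categorical"]

print(numerical_columns)
print(categorical_columns)
```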

## Wrap the custom transformer

In [None]:
%%writefile CustomTransformer.py
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class Filter(BaseEstimator, TransformerMixin):
    """Feature selector that removes specific features.
    
    This feature selection algorithm looks only at the column names
    and removes the selected ones.
    
    Attributes:
        features: A list of features to be removed.
        columns: An np.ndarray with the current features of the dataset.
    """

    def __init__(self, features: list, columns: np.ndarray):
        """Inits Filter class.
        
        Args:
            features: features to be removed.
            columns: columns of the dataset.
        """
        self.features = features
        self.columns = columns
    
    def transform(self, X: np.ndarray) -> np.ndarray:
        """Remove the selected features from X.
        
        Args:
            X: the input samples.
            
        Returns:
            np.ndarray: the input samples without the removed features.
        """
        return np.delete(X, self.indexes, axis=1)
    
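    def fit(self, X: np.ndarray, y=None) -> "Filter":
        """Fit the model.
        
        Learn the indexes of the features to be removed.
        
        Args:
            X: the input samples.
            y: ignored; accepted for compatibility with scikit-learn pipelines.
        
        Returns:
            self
        """
        # np.isin replaces np.in1d, which is deprecated since NumPy 2.0
        self.indexes = np.where(np.isin(self.columns, self.features))[0]
        return self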
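
The core of `fit` and `transform` is a membership test followed by a column deletion. The sketch below reproduces that mechanism on a toy array so the two steps can be inspected in isolation; the column names and values are made up for illustration, and it uses `np.isin`, which performs the same membership test as `np.in1d` for 1-D input.

```python
import numpy as np

# Toy data: three columns, two rows (illustrative values only).
columns = np.array(["a", "b", "c"])
features = ["b"]
X = np.array([[1, 2, 3],
              [4, 5, 6]])

# Step 1 (what `fit` does): find the positions of the columns to remove.
indexes = np.where(np.isin(columns, features))[0]

# Step 2 (what `transform` does): delete those positions along the column axis.
X_filtered = np.delete(X, indexes, axis=1)

print(indexes)
print(X_filtered)
```

Names in `features` that do not appear in `columns` are silently ignored by this approach, so a misspelled feature name leaves the data unchanged rather than raising an error.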

## Filter dataset

In [None]:
import numpy as np
import pandas as pd
from CustomTransformer import Filter
from sklearn.pipeline import make_pipeline

# Build a pipeline with the filter as its only step
pipeline = make_pipeline(
    Filter(features=features_to_filter, columns=columns)
)

# Fit the filter and remove the selected columns from `X`
X = pipeline.fit_transform(X)

# Put the result back in a pd.DataFrame, keeping the remaining column names
features_after_pipeline = columns[~np.isin(columns, features_to_filter)]
df = pd.DataFrame(X, columns=features_after_pipeline)
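
As a sanity check, the result of the pipeline should have the same columns as dropping the features directly in pandas. This is a minimal sketch on a hypothetical two-row DataFrame (`toy_df` and `toy_features_to_filter` are illustrative names, not part of the notebook), not a replacement for the pipeline, which is what gets saved for deployment.

```python
import pandas as pd

# Hypothetical miniature dataset, for illustration only.
toy_df = pd.DataFrame(
    {"SepalLengthCm": [5.1, 4.9], "Species": ["setosa", "setosa"]}
)
toy_features_to_filter = ["Species"]

# DataFrame.drop removes the named columns and returns a new DataFrame.
expected = toy_df.drop(columns=toy_features_to_filter)
remaining = list(expected.columns)

print(remaining)
print(expected.dtypes)
```

One difference worth knowing: converting a mixed-type DataFrame with `to_numpy()` produces an `object` array, so a DataFrame rebuilt from it has `object` columns, whereas `drop` preserves the original dtypes.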

## Save dataset

Store the transformed dataset in object storage.

In [None]:
from platiagro import save_dataset

save_dataset(name=dataset, df=df)

## Save model

Store the model artifacts in object storage.<br>
This makes the model available for future deployments.

In [None]:
from platiagro import save_model

save_model(pipeline=pipeline,
           columns=columns,
           features_after_pipeline=features_after_pipeline)