# Feature names out

Being able to identify columns after applying sklearn transformers is important. By default, sklearn transformers return `numpy.array` outputs, which lack the column labels of a `pandas.DataFrame`.

Fortunately, a solution to this problem is available as of `sklearn==1.1`: the `get_feature_names_out` method, which returns the names of the output features as an array of strings. In addition, `FunctionTransformer` accepts a `feature_names_out` parameter for cases where the output names have to be specified explicitly.

In [2]:
import numpy as np
import pandas as pd

from IPython.display import HTML
header_template = "<p style='font-size:17px'>{}</p>"

from sklearn.preprocessing import (
    FunctionTransformer,
    StandardScaler
)

## Defined names

Some transformers determine the names of their output columns themselves. For these, you can simply call `get_feature_names_out` on the fitted object and the names of the output features will be returned. Some examples of such transformers:

- `StandardScaler`;
- `OneHotEncoder`;
- `ColumnTransformer`;
- `PolynomialFeatures`;
- `CountVectorizer`;
- `TfidfVectorizer`.

The following cell shows this for `StandardScaler`, which simply keeps the names of the input columns:

In [11]:
input_frame = pd.DataFrame({
    f"feature{n}" : np.random.normal(0, 10, 10)
    for n in range(3)
})
display(HTML(header_template.format("Input dataframe")))
display(input_frame)

my_scaler = StandardScaler()
display(HTML(header_template.format("transform result")))
display(my_scaler.fit_transform(input_frame))

display(HTML(header_template.format(".get_feature_names_out result")))
display(my_scaler.get_feature_names_out())

| | feature0 | feature1 | feature2 |
|---|---|---|---|
| 0 | 12.894576 | -7.21142 | 22.376874 |
| 1 | -4.384511 | -3.011719 | 11.893116 |
| 2 | 2.246345 | -12.251342 | -23.880928 |
| 3 | 4.267464 | -14.256397 | 11.316802 |
| 4 | 0.489308 | -0.981538 | 25.31344 |
| 5 | 0.057029 | 3.164312 | 3.665438 |
| 6 | -8.548567 | -7.245126 | 3.60271 |
| 7 | 6.028535 | -4.029933 | -2.911737 |
| 8 | -5.958672 | -6.129775 | 14.865591 |
| 9 | -0.430953 | -8.197262 | -14.039734 |


array([[ 2.06797108, -0.24570913,  1.16687998],
       [-0.85410375,  0.61679908,  0.45384801],
       [ 0.26724354, -1.28077665, -1.97925246],
       [ 0.60903596, -1.69256235,  0.41465121],
       [-0.02988991,  1.03374489,  1.36660467],
       [-0.10299274,  1.8851937 , -0.10574115],
       [-1.55828924, -0.25263156, -0.11000745],
       [ 0.90685157,  0.40768479, -0.55307464],
       [-1.12031085, -0.02356759,  0.65601501],
       [-0.18551566, -0.44817518, -1.30992319]])

array(['feature0', 'feature1', 'feature2'], dtype=object)

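## Specify

Some transformers, such as `FunctionTransformer`, do not define output names by default. For them, you can pass a callable as `feature_names_out`; it receives the transformer and the input feature names and returns the names of the output columns: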

In [2]:
my_transformer = FunctionTransformer(
    lambda X : np.concatenate(
        [
            np.array(X**2),
            np.array(X**3)
        ], 
        axis = 1
    ),
    feature_names_out = (
        lambda transformer, input_features: [
            f"{input_features[0]} square", 
            f"{input_features[0]} cubic"
        ]
    )
)

In [10]:
X = pd.Series(
    np.random.normal(0, 10, 10), 
    name="Value"
).to_frame()

display(HTML(header_template.format("Original output")))
display(my_transformer.fit_transform(X))

display(HTML(header_template.format(".get_feature_names_out")))
display(my_transformer.get_feature_names_out())

array([[ 1.06098100e+02,  1.09285215e+03],
       [ 4.73848169e+01, -3.26181113e+02],
       [ 1.46308258e+01, -5.59632930e+01],
       [ 6.14459336e+00,  1.52313964e+01],
       [ 2.16762967e+00, -3.19137347e+00],
       [ 3.88564680e+02, -7.65940408e+03],
       [ 2.05485138e+02,  2.94557866e+03],
       [ 2.66129126e+02,  4.34148807e+03],
       [ 1.46074375e+02, -1.76547290e+03],
       [ 1.00912952e+02,  1.01372548e+03]])

array(['Value square', 'Value cubic'], dtype=object)