# <span style="color:#ff5f27"> 👨🏻‍🏫 Custom Transformation Functions</span>

In this tutorial, you will learn how to create custom transformation functions in the Hopsworks Feature Store.

## <span style="color:#ff5f27">🗄️ Table of Contents</span>
- [📝 Imports](#1)
- [🔮 Connecting to Hopsworks Feature Store](#3)
- [👩🏻‍🔬 Creation of Custom Transformation Functions](#2)
- [✔️ Testing of Custom Transformation Functions](#2a)
- [✍🏻 Saving Custom Transformation Functions in Hopsworks](#4)

<a name='1'></a>
# <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
# Importing necessary libraries
import pandas as pd                         # For data manipulation and analysis using DataFrames
import numpy as np                          # For numerical computations and arrays
import os                                   # For operating system-related functions
import joblib                               # For saving and loading model files

import xgboost as xgb                       # For using the XGBoost machine learning library
from sklearn.metrics import accuracy_score  # For evaluating model accuracy

<a name='3'></a>
# <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

The next step is to log in to the Hopsworks platform.

In [2]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

2024-09-20 20:20:48,030 INFO: Initializing external client
2024-09-20 20:20:48,030 INFO: Base URL: https://localhost:28181
2024-09-20 20:20:48,850 INFO: Python Engine initialized.

Logged in to project, explore it here https://localhost:28181/p/119


---
<a name='2'></a>
# <span style="color:#ff5f27;">👩🏻‍🔬 Creation of Custom Transformation Functions</span>

In Hopsworks, custom transformation functions are defined with the `@udf` decorator. They are implemented as Pandas UDFs, which allows efficient processing of large datasets. Hopsworks supports several types of transformations and also gives a UDF access to the training dataset statistics of any feature passed to it as input. For more details, refer to the [official documentation](https://docs.hopsworks.ai/latest/user_guides/fs/transformation_functions/).

Below are two examples of User-Defined Functions (UDFs): `add_one` and `scaler`.

The `add_one` function is a basic transformation that takes a feature as input and increments its value by one.

In [3]:
@hopsworks.udf(return_type=int, drop=["feature"])
def add_one(feature: pd.Series) -> pd.Series:
    return feature + 1
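Because `add_one` declares `return_type=int`, it can help to check what dtype the expression `feature + 1` actually produces in plain pandas. The following is a minimal sketch that uses only pandas and makes no claim about how Hopsworks itself handles a dtype mismatch; the variable names are illustrative.

```python
import pandas as pd

complete = pd.Series([0, 5, 10])                      # no missing values
with_missing = pd.Series([0, None, 10])               # None becomes NaN, so the dtype is float64
nullable = pd.Series([0, None, 10], dtype="Int64")    # pandas nullable integer dtype

# Same expression as in add_one, applied to each Series
print((complete + 1).dtype)
print((with_missing + 1).dtype)
print((nullable + 1).dtype)
```

If a feature may contain missing values, the result of the addition is no longer a plain integer column unless a nullable integer dtype is used.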




The `scaler` function takes a feature as input, along with its associated statistics, and scales the values to a range between 0 and 1. It then returns the transformed feature.

In [4]:
from hopsworks.hsfs.transformation_statistics import TransformationStatistics

@hopsworks.udf(return_type=float, drop=["feature"])
def scaler(feature: pd.Series, statistics=TransformationStatistics("feature")) -> pd.Series:
    return (feature - statistics.feature.min) / (statistics.feature.max - statistics.feature.min)
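The arithmetic inside `scaler` can be sketched in plain pandas to make its edge cases visible. The helper below, `min_max_scale`, is a hypothetical stand-in and not part of the Hopsworks API; it takes the minimum and maximum as ordinary arguments instead of reading them from `TransformationStatistics`, and it adds a guard for a zero range that the UDF above does not have.

```python
import pandas as pd


def min_max_scale(feature: pd.Series, min_value: float, max_value: float) -> pd.Series:
    """Scale `feature` with externally supplied min/max statistics (hypothetical helper)."""
    value_range = max_value - min_value
    if value_range == 0:
        # A constant feature has no spread; return zeros instead of dividing by zero.
        return pd.Series(0.0, index=feature.index)
    return (feature - min_value) / value_range


scaled = min_max_scale(pd.Series([0, 5, 10]), min_value=0, max_value=10)
print(scaled.tolist())

constant = min_max_scale(pd.Series([3, 3, 3]), min_value=3, max_value=3)
print(constant.tolist())
```

Note that the result only stays between 0 and 1 for values inside the range described by the statistics; a value larger than the recorded maximum scales to more than 1.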

<a name='2a'></a>
## <span style="color:#ff5f27;">✔️ Testing of Custom Transformation Functions</span>

Once a UDF is defined, it should be thoroughly tested to ensure it works as intended.

In Hopsworks, the `output_column_names` property of a UDF is normally generated when the UDF is attached to a feature group or feature view, so it must be set manually before testing. Once it is set, retrieve the executable function with the `get_udf` method and call it with a Pandas Series as input.

In [5]:
# Assign output column names
add_one.output_column_names = ["add_one_feature"]

# Get the executable UDF (add_one does not use statistics)
udf = add_one.get_udf()

# Create testing Series
feature = pd.Series([0, 5, 10])

print("⛳️ The incremented values are:", udf(feature).values.tolist())

⛳️ The incremented values are: [1, 6, 11]


The `scaler` UDF relies on the statistics of the training dataset. To test it, set its `transformation_statistics` attribute to a list of `FeatureDescriptiveStatistics` objects that hold the test values for the statistics.

In [6]:
from hopsworks.hsfs.statistics import FeatureDescriptiveStatistics

# Assign test statistics since the UDF uses statistics
statistics = [FeatureDescriptiveStatistics(feature_name="feature", min=0, max=10)]
scaler.transformation_statistics = statistics

# Assign output column names 
scaler.output_column_names = ["scaler_feature"]

# Get the executable UDF based on the transformation statistics
udf = scaler.get_udf()

# Get testing Series
feature = pd.Series([0, 5, 10])

print("⛳️ The Scaled Values are:", udf(feature).values.tolist())

⛳️ The Scaled Values are: [0.0, 0.5, 1.0]
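The printed checks above can also be written as assertions, so that a regression fails loudly instead of relying on visual inspection. This is a minimal sketch using `pandas.testing.assert_series_equal`; `check_transformation` and `add_one_plain` are hypothetical names, and `add_one_plain` only stands in for the callable returned by `get_udf()` so that the example runs without a Hopsworks connection.

```python
import pandas as pd


def check_transformation(func, inputs: pd.Series, expected: pd.Series) -> None:
    """Raise AssertionError if func(inputs) differs from expected."""
    result = func(inputs)
    # check_names=False: ignore the Series name, which may differ from the expected one
    pd.testing.assert_series_equal(result, expected, check_names=False)


# Stand-in for the executable function obtained from `get_udf()` (assumption)
def add_one_plain(feature: pd.Series) -> pd.Series:
    return feature + 1


check_transformation(add_one_plain, pd.Series([0, 5, 10]), pd.Series([1, 6, 11]))
print("add_one check passed")
```

In the notebook, the `udf` object from the cells above could be passed as `func` in the same way, provided it returns a Series with the expected dtype.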


Once a custom transformation function (UDF) is defined, it can be used as an [on-demand transformation](https://docs.hopsworks.ai/latest/user_guides/fs/feature_group/on_demand_transformations/) by attaching it to a Feature Group, or as a [model-dependent transformation](https://docs.hopsworks.ai/latest/user_guides/fs/feature_view/model-dependent-transformations/) by attaching it to a Feature View.

Additionally, UDFs can be saved in the Hopsworks Feature Store, allowing them to be retrieved and reused in the future.

<a name='4'></a>
## <span style="color:#ff5f27;"> ✍🏻 Saving Custom Transformation Functions in Hopsworks</span>

To save a transformation function in Hopsworks, first create it with the `.create_transformation_function()` method, which takes the following parameters:

- `transformation_function`: Your custom transformation function/UDF.
- `version`: The version of your custom transformation function.

Then call the `.save()` method to persist the transformation function in the backend.

In [7]:
scaler = fs.create_transformation_function(
    scaler,
    version=1,
)
scaler.save()

RestAPIError: Metadata operation error: (url: https://localhost:28181/hopsworks-api/api/project/119/featurestores/67/transformationfunctions). Server response: 
HTTP code: 400, HTTP reason: Bad Request, body: b'{"errorCode":270159,"usrMsg":"Transformation function: scaler, version: 1","errorMsg":"The provided transformation function name and version already exists"}', error code: 270159, error msg: The provided transformation function name and version already exists, user msg: Transformation function: scaler, version: 1

The request fails here because `scaler` version 1 had already been saved to this feature store by an earlier run. A name and version pair can only be registered once, so a modified function would need a different `version` (or name).

Now let's check that the custom transformation function is present in the feature store. You can use the `get_transformation_functions` method for this.

In [8]:
# Check if your transformation functions are present in the feature store
[tf for tf in fs.get_transformation_functions()]

[Transformation Function : robust_scaler(feature),
 Transformation Function : one_hot_encoder(feature),
 Transformation Function : min_max_scaler(feature),
 Transformation Function : label_encoder(feature),
 Transformation Function : standard_scaler(feature),
 Transformation Function : add_one(feature),
 Transformation Function : scaler(feature)]

A transformation function saved in Hopsworks can be retrieved with the `get_transformation_function` method.

In [9]:
scaler = fs.get_transformation_function(name="scaler", version=1)
scaler

Transformation Function : scaler(feature)

---

## <span style="color:#ff5f27;">⏭️ **Next:** Part 01 Feature Pipeline </span>

In the following notebook, you will create feature groups and use on-demand transformation functions to create on-demand features.