# Migrating to Pyreal

In this tutorial, we'll go through an example of migrating your existing ML pipeline to Pyreal, 
so you can start getting explanations and using your model more effectively.

## Tutorial goals
1. Learn how to migrate your existing data transformers to the Pyreal framework
2. Learn how to create a `RealApp` from an existing model
3. Learn how to use your new `RealApp` to interact with and understand your ML model

Throughout, we will use the [Titanic dataset](https://www.kaggle.com/c/titanic/data).

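## Problem Setup

We start from an existing scikit-learn pipeline trained on the Titanic training data. It drops unused columns, imputes missing values, one-hot encodes the categorical features, and fits a `GradientBoostingClassifier`.
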
```python
from pyreal.sample_applications import titanic

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Categorical columns: fill missing values with the most frequent value, then one-hot encode
categorical_transforms = Pipeline(steps=[('imputer', SimpleImputer(strategy='most_frequent')),
                                         ('onehot', OneHotEncoder(sparse_output=False))])

# Define the column transformer
preprocessor = ColumnTransformer(
    transformers=[
        ('column_dropper', 'drop', ["PassengerId", "Name", "Ticket", "Cabin"]),  # Drop unused columns
        ('mean_imputer', SimpleImputer(strategy='mean'), ["Pclass", "Age", "SibSp", "Parch", "Fare"]),  # Impute with mean
        ('categorical_transformer', categorical_transforms, ["Sex", "Embarked"])
    ]
)

# Define the pipeline: preprocessing followed by the classifier
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('model', GradientBoostingClassifier())  # Gradient boosting classifier
])

X_train, y_train = titanic.load_data(include_targets=True)
pipeline.fit(X_train, y_train)
```
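The `preprocessor` sends each listed group of columns to its own transformer and discards the columns assigned to `'drop'`. Because `ColumnTransformer` defaults to `remainder='drop'`, it also discards any column that is not listed at all. This routing decides which features the model actually sees. Below is a minimal pure-Python sketch of the routing applied to a few made-up rows stored as dicts. `route_columns` and `mean_impute` are hypothetical helpers written for illustration, not scikit-learn's implementation, and the sketch returns dicts rather than a NumPy array for readability.

```python
from statistics import mean


def mean_impute(rows, columns):
    """Return rows restricted to `columns`, with None replaced by each column's mean."""
    out = [{} for _ in rows]
    for col in columns:
        fill = mean([r[col] for r in rows if r[col] is not None])
        for o, r in zip(out, rows):
            o[col] = fill if r[col] is None else r[col]
    return out


def route_columns(rows, spec):
    """Hypothetical sketch of ColumnTransformer-style routing.

    `spec` is a list of (name, transform, columns) triples, where `transform`
    is either the string 'drop' or a function(rows, columns) -> rows.
    Columns that appear in no triple are discarded (like remainder='drop').
    """
    outputs = [{} for _ in rows]
    for _name, transform, columns in spec:
        if transform == "drop":
            continue  # explicitly dropped columns never reach the output
        for o, t in zip(outputs, transform(rows, columns)):
            o.update(t)
    return outputs


# Toy rows using a subset of the Titanic columns (made-up values)
rows = [
    {"PassengerId": 1, "Name": "A", "Age": 22.0, "Fare": 7.25, "Sex": "male"},
    {"PassengerId": 2, "Name": "B", "Age": None, "Fare": 71.28, "Sex": "female"},
    {"PassengerId": 3, "Name": "C", "Age": 26.0, "Fare": 7.93, "Sex": "female"},
]
spec = [
    ("column_dropper", "drop", ["PassengerId", "Name"]),
    ("mean_imputer", mean_impute, ["Age", "Fare"]),
]
result = route_columns(rows, spec)
print(result[1])  # {'Age': 24.0, 'Fare': 71.28}; "Sex" is unlisted, so it is discarded
```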


Pyreal's `sklearn_pipeline_to_pyreal` function converts the preprocessing steps of a scikit-learn pipeline into a list of Pyreal `Transformer` objects. With `verbose=True`, it reports each step as it is added. The final `GradientBoostingClassifier` step is skipped because it does not appear to be a transformer.

```python
from pyreal.transformers.utils import sklearn_pipeline_to_pyreal
sklearn_pipeline_to_pyreal(pipeline, verbose=True)
```

```
Adding transformer drop for columns ['PassengerId', 'Name', 'Ticket', 'Cabin']
Adding transformer SimpleImputer() for columns ['Pclass', 'Age', 'SibSp', 'Parch', 'Fare']
Adding transformer Pipeline(steps=[('imputer', SimpleImputer(strategy='most_frequent')),
                ('onehot', OneHotEncoder(sparse_output=False))]) for columns ['Sex', 'Embarked']
Skipping step GradientBoostingClassifier() as it does not appear to be a transformer

[<pyreal.transformers.generic_transformer.Transformer at 0x1bedcc3e7d0>,
 <pyreal.transformers.generic_transformer.Transformer at 0x1bedcc3e800>,
 <pyreal.transformers.generic_transformer.Transformer at 0x1bedcc3e830>]
```
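The log shows the conversion expanding the `ColumnTransformer` into one transformer per column group and skipping the model step. The sketch below illustrates one way a conversion could make that decision. It assumes a duck-typing rule: expand any step that has a `transformers` list, keep any step with a `transform` method, and skip everything else. `collect_transformer_steps` and the `Fake*` classes are hypothetical stand-ins, not Pyreal's or scikit-learn's code.

```python
class FakeImputer:
    """Stand-in for a transformer step: it has a transform method."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class FakeColumnTransformer:
    """Stand-in for a ColumnTransformer: holds (name, transformer, columns) triples."""

    def __init__(self, transformers):
        self.transformers = transformers

    def transform(self, X):
        return X


class FakeClassifier:
    """Stand-in for a model step: it predicts but has no transform method."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


def collect_transformer_steps(steps, verbose=True):
    """Hypothetical sketch: flatten pipeline steps into (transformer, columns) pairs."""
    collected = []
    for name, step in steps:
        if hasattr(step, "transformers"):
            # Expand a column-routing step into one entry per column group
            for _sub_name, sub, columns in step.transformers:
                collected.append((sub, columns))
        elif hasattr(step, "transform"):
            collected.append((step, None))  # None: applies to every column
        elif verbose:
            print(f"Skipping step {name!r}: it has no transform method")
    return collected


steps = [
    ("preprocessor", FakeColumnTransformer([
        ("column_dropper", "drop", ["PassengerId", "Name"]),
        ("mean_imputer", FakeImputer(), ["Age", "Fare"]),
    ])),
    ("model", FakeClassifier()),
]
collected = collect_transformer_steps(steps)
print(len(collected))  # 2: the drop group and the imputer group; the model is skipped
```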

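Alternatively, you can build equivalent transformers by hand from Pyreal's built-in transformer classes:

```python
# Note: this imports Pyreal's OneHotEncoder, which shadows the scikit-learn OneHotEncoder
# imported earlier (harmless here, since the scikit-learn pipeline is already built)
from pyreal.transformers import MultiTypeImputer, OneHotEncoder, ColumnDropTransformer

transformers = []

# Drop columns not used by the model
transformers.append(ColumnDropTransformer(columns=["PassengerId", "Name", "Ticket", "Cabin"]))

# Impute missing values (both numeric and categorical columns)
transformers.append(MultiTypeImputer())

# One-hot encode the categorical columns
transformers.append(OneHotEncoder(columns=["Sex", "Embarked"]))
```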
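The list order suggests the steps run one after another: drop columns, then fill missing values, then expand the categorical columns. The sketch below shows that chaining in plain Python, assuming each step exposes `fit` and `transform` over a list of dict rows and is fitted on the previous step's output. `DropColumns`, `ImputeMissing`, and `OneHot` are hypothetical toy classes that loosely stand in for `ColumnDropTransformer`, `MultiTypeImputer`, and `OneHotEncoder`. They are not Pyreal's implementations. The imputation rule used here (mean for numbers, most frequent value otherwise) is also an assumption.

```python
from collections import Counter
from statistics import mean


class DropColumns:
    """Toy stand-in for ColumnDropTransformer."""

    def __init__(self, columns):
        self.columns = set(columns)

    def fit(self, rows):
        return self

    def transform(self, rows):
        return [{k: v for k, v in r.items() if k not in self.columns} for r in rows]


class ImputeMissing:
    """Toy imputer (assumed rule): mean for numeric columns, most frequent value otherwise."""

    def fit(self, rows):
        self.fill = {}
        for col in rows[0]:
            observed = [r[col] for r in rows if r[col] is not None]
            if all(isinstance(v, (int, float)) for v in observed):
                self.fill[col] = mean(observed)
            else:
                self.fill[col] = Counter(observed).most_common(1)[0][0]
        return self

    def transform(self, rows):
        return [{k: (self.fill[k] if v is None else v) for k, v in r.items()} for r in rows]


class OneHot:
    """Toy encoder: replace each listed column with 0/1 indicator columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, rows):
        self.categories = {c: sorted({r[c] for r in rows}) for c in self.columns}
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            new = {k: v for k, v in r.items() if k not in self.columns}
            for c in self.columns:
                for cat in self.categories[c]:
                    new[f"{c}_{cat}"] = int(r[c] == cat)
            out.append(new)
        return out


def fit_transform_chain(steps, rows):
    """Fit each step on the previous step's output, then pass its transformed rows on."""
    for step in steps:
        rows = step.fit(rows).transform(rows)
    return rows


# Toy rows (made-up values); row 2 is missing Age and Embarked
rows = [
    {"PassengerId": 1, "Age": 22.0, "Sex": "male", "Embarked": "S"},
    {"PassengerId": 2, "Age": None, "Sex": "female", "Embarked": None},
    {"PassengerId": 3, "Age": 26.0, "Sex": "female", "Embarked": "C"},
    {"PassengerId": 4, "Age": 30.0, "Sex": "male", "Embarked": "S"},
]
steps = [DropColumns(["PassengerId"]), ImputeMissing(), OneHot(["Sex", "Embarked"])]
result = fit_transform_chain(steps, rows)
print(result[1])  # {'Age': 26.0, 'Sex_female': 1, 'Sex_male': 0, 'Embarked_C': 0, 'Embarked_S': 1}
```

Because the encoder is fitted after imputation, the missing `Embarked` value has already been replaced with `"S"`. It is therefore encoded like any other passenger from that port rather than becoming an extra category.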