# Migrating to Pyreal

In this tutorial, we'll go through an example of migrating your existing ML pipeline to Pyreal, 
so you can start getting explanations and using your model more effectively.

## Tutorial goals
1. Learn how to migrate your existing data transformers to the Pyreal framework
2. Learn how to create a `RealApp` from an existing model
3. Learn how to use your new `RealApp` to interact with and understand your ML model

In this tutorial, we will use the [Titanic dataset](https://www.kaggle.com/c/titanic/data).

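## Problem Setup

We start from an existing scikit-learn pipeline that preprocesses the Titanic data and trains a gradient boosting classifier.
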
In [1]:
from pyreal.sample_applications import titanic

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Define the column transformer
categorical_transforms = Pipeline(steps=[('imputer', SimpleImputer(strategy='most_frequent')),
                                         ('onehot', OneHotEncoder(sparse_output=False))])
preprocessor = ColumnTransformer(
    transformers=[
        ('column_dropper', 'drop', ["PassengerId", "Name", "Ticket", "Cabin"]),  # Drop columns
        ('mean_imputer', SimpleImputer(strategy='mean'), ["Pclass", "Age", "SibSp", "Parch", "Fare"]),  # Impute with mean
        ('categorical_transformer', categorical_transforms, ["Sex", "Embarked"])
    ]
)

# Define the pipeline
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('model', GradientBoostingClassifier())  # Gradient boosting classifier
])

X_train, y_train = titanic.load_data(include_targets=True)
pipeline.fit(X_train, y_train)

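Next, we create a `RealApp` directly from the fitted pipeline and use it to compute feature importance.

In [2]: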
from pyreal import RealApp

app = RealApp.from_sklearn(pipeline, X_train=X_train, y_train=y_train, verbose=1)
app.produce_feature_importance()

Adding ColumnDropTransformer for columns ['PassengerId', 'Name', 'Ticket', 'Cabin']
Adding SimpleImputer() for columns ['Pclass', 'Age', 'SibSp', 'Parch', 'Fare']
Adding SimpleImputer(strategy='most_frequent') for columns ['Sex', 'Embarked']
Adding OneHotEncoder for columns ['Sex', 'Embarked']
Skipping step GradientBoostingClassifier() as it does not appear to be a transformer


| Feature Name | Importance |
|---|---|
| PassengerId | 0.0 |
| Pclass | 0.684353 |
| Name | 0.0 |
| Sex | 1.21064 |
| Age | 0.371283 |
| SibSp | 0.121557 |
| Parch | 0.009867 |
| Ticket | 0.0 |
| Fare | 0.321244 |
| Cabin | 0.0 |
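
To read the importance scores above more easily, they can be ranked from most to least important. This is a minimal sketch in which the scores are copied by hand from the output above into a plain dictionary. The names `importance`, `ranked`, and `top_three` are illustrative, and nothing is assumed here about the return type of `produce_feature_importance()`.

```python
# Importance scores copied from the output above
importance = {
    "PassengerId": 0.0,
    "Pclass": 0.684353,
    "Name": 0.0,
    "Sex": 1.21064,
    "Age": 0.371283,
    "SibSp": 0.121557,
    "Parch": 0.009867,
    "Ticket": 0.0,
    "Fare": 0.321244,
    "Cabin": 0.0,
}

# Sort features from highest to lowest importance
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Print only the features that contribute to the model
for name, score in ranked:
    if score > 0:
        print(f"{name:<8}{score:.6f}")

top_three = [name for name, _ in ranked[:3]]
```

The four features with an importance of 0.0 are exactly the columns the pipeline drops (`PassengerId`, `Name`, `Ticket`, `Cabin`), so they never reach the model.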
