# Snowpark ML - QuickChurnModelNotebok_ML_SIDEKICK

### Imports

In [None]:
from snowflake.snowpark.context import get_active_session
from snowflake.ml.modeling.pipeline import Pipeline
from snowflake.ml.registry import Registry
from snowflake.ml.modeling.preprocessing.standard_scaler import StandardScaler
from snowflake.ml.modeling.xgboost.xgb_classifier import XGBClassifier

Get the active Snowpark session.

In [None]:
session = get_active_session()

Load the training DataFrame.

Consider using [`random_split`](https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.DataFrame.random_split)
if your data is not already split into training and test sets, or [`sample_by`](https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.DataFrame.sample_by)
for stratified sampling.
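
A minimal sketch of the `random_split` suggestion, assuming the table has not been split yet. The helper name `split_train_test`, the 80/20 ratio, and the seed are illustrative assumptions, not part of this notebook; the only Snowpark call used is `DataFrame.random_split(weights, seed=...)`.

```python
def split_train_test(df, train_fraction=0.8, seed=42):
    """Split a Snowpark DataFrame into (train, test) parts.

    `df` is assumed to expose `random_split(weights, seed=...)`, as
    `snowflake.snowpark.DataFrame` does. The ratio and the seed are
    illustrative choices, not values taken from this notebook.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    weights = [train_fraction, 1.0 - train_fraction]
    train_part, test_part = df.random_split(weights, seed=seed)
    return train_part, test_part


# Hypothetical usage inside the notebook (needs an active session):
# full_df = session.table('"CHURN_PROD"."ANALYTICS"."TELCO_CHURN_PDF"')
# train_df, test_df = split_train_test(full_df)
```

Rows are assigned to each part at random, so the resulting sizes only approximate the requested fractions.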

In [None]:
train_df = session.table('"CHURN_PROD"."ANALYTICS"."TELCO_CHURN_PDF"')

Preview the DataFrame.

In [None]:
train_df.show()

Build the model pipeline: standardize the feature columns, then fit an XGBoost classifier.

In [None]:
pipeline = Pipeline(
    [
        (
            "StandardScaler",
            StandardScaler(
                input_cols=[
                    '"AccountWeeks"',
                    '"ContractRenewal"',
                    '"DataPlan"',
                    '"DataUsage"',
                    '"CustServCalls"',
                    '"DayMins"',
                    '"MonthlyCharge"',
                    '"DayCalls"',
                    '"RoamMins"',
                    '"OverageFee"',
                ],
                output_cols=[
                    '"AccountWeeks"',
                    '"ContractRenewal"',
                    '"DataPlan"',
                    '"DataUsage"',
                    '"CustServCalls"',
                    '"DayMins"',
                    '"MonthlyCharge"',
                    '"DayCalls"',
                    '"RoamMins"',
                    '"OverageFee"',
                ],
                with_mean=True,
                with_std=True,
            ),
        ),
        (
            "XGBClassifier",
            XGBClassifier(
                input_cols=[
                    '"AccountWeeks"',
                    '"ContractRenewal"',
                    '"DataPlan"',
                    '"DataUsage"',
                    '"CustServCalls"',
                    '"DayMins"',
                    '"MonthlyCharge"',
                    '"DayCalls"',
                    '"RoamMins"',
                    '"OverageFee"',
                ],
                label_cols=['"Churn"'],
                output_cols=['"OUTPUT_Churn"'],
            ),
        ),
    ]
)
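
To make the `with_mean=True, with_std=True` settings concrete, here is a plain-Python sketch of what standardizing one column amounts to. It assumes the population standard deviation and a scale of 1 for constant columns, as scikit-learn's `StandardScaler` does; the Snowflake implementation may differ in details. The function name `standardize` and the sample values are made up for illustration.

```python
from statistics import fmean, pstdev


def standardize(values, with_mean=True, with_std=True):
    """Return z-scores for one numeric column.

    Assumption: population standard deviation (ddof=0), and a scale of 1
    when the column is constant, mirroring scikit-learn's behaviour.
    """
    center = fmean(values) if with_mean else 0.0
    scale = pstdev(values) if with_std else 1.0
    if scale == 0.0:
        scale = 1.0  # avoid dividing by zero for constant columns
    return [(v - center) / scale for v in values]


day_mins = [100.0, 200.0, 300.0]  # made-up sample values
scaled = standardize(day_mins)
print(scaled)
```

After scaling, the column has mean 0 and standard deviation 1. Because the pipeline uses the same names for `input_cols` and `output_cols`, the scaled values replace the original columns before they reach the classifier.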


To perform hyperparameter tuning in the pipeline, replace the final estimator step (the `"XGBClassifier"` tuple) with the following code and add `from snowflake.ml.modeling.model_selection import GridSearchCV` to the imports.
The example is runnable and contains one possible grid of hyperparameter combinations.
To learn more about GridSearchCV, see the [docs](https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/latest/api/modeling/snowflake.ml.modeling.model_selection.GridSearchCV).

```python
(
    "GridSearchCV",
    GridSearchCV(
        estimator=XGBClassifier(),
        param_grid={
            "n_estimators": [100, 200, 300],
            "max_depth": [3, 4, 5],
            "learning_rate": [0.1, 0.01, 0.001],
        },
        input_cols=[
            '"AccountWeeks"',
            '"ContractRenewal"',
            '"DataPlan"',
            '"DataUsage"',
            '"CustServCalls"',
            '"DayMins"',
            '"MonthlyCharge"',
            '"DayCalls"',
            '"RoamMins"',
            '"OverageFee"',
        ],
        label_cols=['"Churn"'],
        output_cols=['"OUTPUT_Churn"'],
        passthrough_cols=[],
        sample_weight_col=None,
    ),
)
```
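
Before running a grid search it helps to know how many models it will train. The sketch below enumerates the grid from the example step in plain Python. The fold count of 5 is an assumption (it is the scikit-learn default when `cv` is not set); check the GridSearchCV docs for the value that applies to your version.

```python
from itertools import product

# Same grid as in the example GridSearchCV step.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 4, 5],
    "learning_rate": [0.1, 0.01, 0.001],
}

# Cartesian product of all value lists: one dict per candidate model.
names = list(param_grid)
combinations = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

n_folds = 5  # assumption: default number of cross-validation folds
total_fits = len(combinations) * n_folds

print(len(combinations), "candidate settings")
print(total_fits, "model fits, not counting a final refit on all data")
```

Each extra value added to any hyperparameter multiplies the number of candidates, so the cost of the search grows quickly with the grid size.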


Fit the pipeline on the training data.

In [None]:
pipeline.fit(train_df)

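Predict on the training data. No held-out test set is defined in this notebook, so these predictions do not measure how well the model generalizes.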

In [None]:
result = pipeline.predict(train_df)

Review the results.

In [None]:
result.show()
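
Beyond eyeballing `result.show()`, the `"Churn"` and `"OUTPUT_Churn"` columns can be compared numerically. The sketch below is plain Python and assumes the two columns have already been pulled into lists and that churn is coded as 1; the helper names and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary label."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from confusion counts."""
    tp, fp, tn, fn = (counts[k] for k in ("tp", "fp", "tn", "fn"))
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels standing in for the "Churn" and "OUTPUT_Churn" columns.
y_true = [1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(counts)
print(counts)
print(metrics)
```

Precision and recall are worth reporting alongside accuracy when churners are a minority of customers. Metrics computed on the same rows the model was fitted on will look better than they would on unseen data.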

Log the model to the registry.

In [None]:
reg = Registry(session, database_name="CHURN_PROD", schema_name="ANALYTICS")
reg.log_model(model_name="QuickChurnModelNotebok_ML_SIDEKICK", model=pipeline)

In [None]:
reg.get_model("QuickChurnModelNotebok_ML_SIDEKICK").show_versions()
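
The `log_model` call in this notebook does not pass a version name. If you want explicit, human-readable versions, `log_model` also accepts `version_name` and `comment` keyword arguments. The wrapper below is a sketch; the helper name `log_versioned`, the version label, and the comment text are hypothetical.

```python
def log_versioned(reg, model, model_name, version_name, comment=None):
    """Log `model` under an explicit version name.

    `reg` is assumed to behave like `snowflake.ml.registry.Registry`,
    whose `log_model` accepts `model`, `model_name`, `version_name`,
    and `comment` as keyword arguments.
    """
    return reg.log_model(
        model=model,
        model_name=model_name,
        version_name=version_name,
        comment=comment,
    )


# Hypothetical usage inside the notebook:
# log_versioned(
#     reg,
#     pipeline,
#     "QuickChurnModelNotebok_ML_SIDEKICK",
#     version_name="V1",  # illustrative label
#     comment="StandardScaler + XGBClassifier baseline",
# )
```

The logged versions then appear in the `show_versions()` output under the names you chose.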