# Model Packaging Example

## Before Everything

### Install `snowflake-ml-python` locally

Until `snowflake-ml-python` is publicly available, you have to install it from a wheel file. Once it is released, you can install it like any other package with pip or conda.

In [1]:
%pip install snowflake_ml_python-0.3.2-py3-none-any.whl

# Snowpark Connector, Snowpark Library, Session
import snowflake.connector
import snowflake.snowpark
import snowflake.ml.preprocessing as snowml
from snowflake.snowpark import Session
from snowflake.snowpark.version import VERSION
from snowflake.ml.utils import connection_params

Note: We suggest using a pure pip environment or an empty conda environment for this example. If you must install SnowML in a conda environment that already contains packages, install all of its requirements first and then install `snowflake-ml-python` with the `--no-deps` flag.

### Setup Notebook

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
# Scale cell width with the browser window to accommodate .show() commands for wider tables.
from IPython.display import display, HTML

display(HTML("<style>.container { width:100% !important; }</style>"))

### Start Snowpark Session

To avoid exposing credentials on GitHub, we use a small utility, `SnowflakeLoginOptions`. It lets you store your default credentials in `~/.snowsql/config` in the following format:
```
[connections]
accountname = <string>   # Account identifier to connect to Snowflake.
username = <string>      # User name in the account. Optional.
password = <string>      # User password. Optional.
dbname = <string>        # Default database. Optional.
schemaname = <string>    # Default schema. Optional.
warehousename = <string> # Default warehouse. Optional.
#rolename = <string>      # Default role. Optional.
#authenticator = <string> # Authenticator: 'snowflake', 'externalbrowser', etc
```
Please follow [this](https://docs.snowflake.com/en/user-guide/snowsql-start.html#configuring-default-connection-settings) for more details.
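
To make the file format concrete, here is a minimal sketch of reading a `[connections]` section and renaming its keys to Snowpark-style connection parameters. It is not the implementation of `SnowflakeLoginOptions`; the sample values, the `KEY_MAP` renaming table, and the helper `read_connection_params` are assumptions made for illustration.

```python
import configparser

# Hypothetical sample of the [connections] section; all values are placeholders.
SAMPLE_CONFIG = """
[connections]
accountname = my_account
username = my_user
password = my_password
dbname = MY_DB
schemaname = MY_SCHEMA
warehousename = MY_WH
"""

# Assumed mapping from the config keys shown above to connection parameter names.
KEY_MAP = {
    "accountname": "account",
    "username": "user",
    "password": "password",
    "dbname": "database",
    "schemaname": "schema",
    "warehousename": "warehouse",
    "rolename": "role",
    "authenticator": "authenticator",
}


def read_connection_params(config_text: str) -> dict:
    """Parse the [connections] section and rename the keys it contains."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["connections"]
    return {KEY_MAP[key]: value for key, value in section.items() if key in KEY_MAP}


params = read_connection_params(SAMPLE_CONFIG)
print(sorted(params))
```

Optional keys that are absent from the file, such as `rolename` here, are simply left out of the resulting dictionary.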

In [4]:
from snowflake.ml.utils.connection_params import SnowflakeLoginOptions
from snowflake.snowpark import Session

session = Session.builder.configs(SnowflakeLoginOptions()).create()

### Make `snowflake-ml-python` available to deployed models

Because `snowflake-ml-python` is not yet available in the Anaconda channel, we have to supply it manually so that it can be imported when the model is deployed to Snowflake. To avoid uploading it again and again, we upload the wheel file once to a temporary stage.

In [5]:
SNOW_ML_WHEEL_LOCAL_PATH = "~/snowml/bazel-bin/snowflake/ml/snowflake_ml_python-0.3.2-py3-none-any.whl"

In [6]:
import os


def upload_snowml_to_tmp_stage(session: Session, wheel_path: str) -> str:
    """Upload the local SnowML wheel to the session's temporary stage.

    Args:
        session: Snowpark session.
        wheel_path: Path to the local SnowML wheel file.

    Returns:
        The stage path of the uploaded wheel file.
    """
    tmp_stage = session.get_session_stage()
    _ = session.file.put(wheel_path, tmp_stage, auto_compress=False, overwrite=True)
    whl_filename = os.path.basename(wheel_path)
    return f"{tmp_stage}/{whl_filename}"

In [7]:
SNOW_ML_WHEEL_STAGE_PATH = upload_snowml_to_tmp_stage(session, SNOW_ML_WHEEL_LOCAL_PATH)
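
The returned value is just the stage name joined with the wheel's file name. The sketch below isolates that path logic so it can be checked without a Snowflake connection. The helper `stage_path_for` and the stage name `@my_stage` are hypothetical. Whether the upload call expands a leading `~` for you is not verified here, so the sketch expands it explicitly as a precaution.

```python
import os


def stage_path_for(wheel_path: str, stage: str) -> str:
    """Return the stage location a local wheel would have after upload."""
    local_path = os.path.expanduser(wheel_path)  # resolve a leading "~" if possible
    return f"{stage}/{os.path.basename(local_path)}"


# "@my_stage" is a placeholder, not a real session stage name.
example = stage_path_for(
    "~/snowml/bazel-bin/snowflake/ml/snowflake_ml_python-0.3.2-py3-none-any.whl",
    "@my_stage",
)
print(example)
```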

### Open/Create Model Registry

A model registry needs to be created before it can be used. Creating it creates a new database in the current account, so the active role needs permission to create a database. Once created, the model registry can be opened without creating it again.

In [8]:
REGISTRY_DATABASE_NAME = "TEMP"
REGISTRY_SCHEMA_NAME = "WZHAO"

In [9]:
from snowflake.ml.registry import model_registry
model_registry.create_model_registry(session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME)
registry = model_registry.ModelRegistry(session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME)



## Use with snowml model

In [10]:
from snowflake.ml.modeling.xgboost import XGBClassifier
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd


iris = load_iris()
df = pd.DataFrame(
    data=np.c_[iris["data"], iris["target"]],
    columns=iris["feature_names"] + ["target"],
)
# Upper-case the column names, then strip the unit suffix and the spaces.
df.columns = [s.replace(" (CM)", "").replace(" ", "") for s in df.columns.str.upper()]

INPUT_COLUMNS = ['SEPALLENGTH', 'SEPALWIDTH', 'PETALLENGTH', 'PETALWIDTH']
LABEL_COLUMNS = 'TARGET'
OUTPUT_COLUMNS = 'PREDICTED_TARGET'
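
The column renaming can be checked in isolation. The sketch below applies the same string operations to the feature names that `load_iris()` returns, plus the label column, using a hypothetical helper `normalize` and no pandas.

```python
# Feature names as returned by scikit-learn's load_iris(), plus the label column.
raw_columns = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
    "target",
]


def normalize(name: str) -> str:
    """Upper-case the name, drop the unit suffix, and remove spaces."""
    return name.upper().replace(" (CM)", "").replace(" ", "")


normalized = [normalize(name) for name in raw_columns]
print(normalized)
```

The first four results are the names listed in `INPUT_COLUMNS`, and the last one is the label column.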

In [11]:
df

     SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET
0            5.1         3.5          1.4         0.2     0.0
1            4.9         3.0          1.4         0.2     0.0
2            4.7         3.2          1.3         0.2     0.0
3            4.6         3.1          1.5         0.2     0.0
4            5.0         3.6          1.4         0.2     0.0
...          ...         ...          ...         ...     ...
145          6.7         3.0          5.2         2.3     2.0
146          6.3         2.5          5.0         1.9     2.0
147          6.5         3.0          5.2         2.0     2.0
148          6.2         3.4          5.4         2.3     2.0


In [12]:
test_features = df[:10]
model_version = "1_007"

### XGBoost model

In [13]:
clf_xgb = XGBClassifier(input_cols=INPUT_COLUMNS,
                          output_cols=OUTPUT_COLUMNS,
                          label_cols=LABEL_COLUMNS)

clf_xgb.fit(df)

<snowflake.ml.modeling.xgboost.xgb_classifier.XGBClassifier at 0x7f9c589c5dc0>

In [14]:
prediction = clf_xgb.predict(test_features)
prediction_proba = clf_xgb.predict_proba(test_features)

In [15]:
model_name = "SIMPLE_XGB_MODEL"
deploy_name = "xgb_model_predict"

In [16]:
# A name and model tags can be added to the model at registration time.
model_id = registry.log_model(
    model_name=model_name,
    model_version=model_version,
    model=clf_xgb,
    tags={"stage": "testing", "classifier_type": "XGBClassifier"},
    sample_input_data=test_features[:10], # this line can be removed after modelSignature
)

# The object API can be used to reference a model after creation.
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
print("Registered new model:", model_id)

Registered new model: a1af4d7afbf111ed8e10ce0e8c87ef9b


### Test the result using load_model

In [17]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
restored_clf = model.load_model()

restored_prediction = restored_clf.predict(test_features)

print("Original prediction:", prediction[:10])
print("Restored prediction:", restored_prediction[:10])

print("Result comparison:", np.array_equal(prediction, restored_prediction[prediction.columns]))

Original prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1         3.5          1.4         0.2     0.0                 0
1          4.9         3.0          1.4         0.2     0.0                 0
2          4.7         3.2          1.3         0.2     0.0                 0
3          4.6         3.1          1.5         0.2     0.0                 0
4          5.0         3.6          1.4         0.2     0.0                 0
5          5.4         3.9          1.7         0.4     0.0                 0
6          4.6         3.4          1.4         0.3     0.0                 0
7          5.0         3.4          1.5         0.2     0.0                 0
8          4.4         2.9          1.4         0.2     0.0                 0
9          4.9         3.1          1.5         0.1     0.0                 0
Restored prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1        
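
The comparison indexes the restored result with `prediction.columns` before calling `np.array_equal`. A plausible reason, assumed here rather than confirmed by the library, is that the two DataFrames may list the same columns in a different order. The sketch below uses two small made-up frames to show the effect.

```python
import numpy as np
import pandas as pd

# Two hypothetical results with identical data but a different column order.
local = pd.DataFrame({"A": [1.0, 2.0], "PREDICTED": [5.0, 7.0]})
restored = pd.DataFrame({"PREDICTED": [5.0, 7.0], "A": [1.0, 2.0]})

# Comparing positionally fails because the columns do not line up.
positional = np.array_equal(local, restored)

# Reordering the restored frame to the local column order makes them match.
aligned = np.array_equal(local, restored[local.columns])

print(positional, aligned)
```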

### Testing on deploy

#### Predict function match/mismatch? - comparison between deploy and local

In [18]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmpuowyt6bq.py




xgb_model_predict is deployed to warehouse.


In [19]:
remote_prediction = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction[:10])

print("Result comparison:", np.array_equal(prediction, remote_prediction.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1         3.5          1.4         0.2     0.0                 0
1          4.9         3.0          1.4         0.2     0.0                 0
2          4.7         3.2          1.3         0.2     0.0                 0
3          4.6         3.1          1.5         0.2     0.0                 0
4          5.0         3.6          1.4         0.2     0.0                 0
5          5.4         3.9          1.7         0.4     0.0                 0
6          4.6         3.4          1.4         0.3     0.0                 0
7          5.0         3.4          1.5         0.2     0.0                 0
8          4.4         2.9          1.4         0.2     0.0                 0
9          4.9         3.1          1.5         0.1     0.0                 0
Result comparison: True


#### Predict_proba function match/mismatch? - comparison between deploy and local

In [20]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict_proba",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmpgltgg1aw.py




xgb_model_predict is deployed to warehouse.


In [21]:
remote_prediction_proba = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction_proba[:10])

print("Result comparison:", np.allclose(prediction_proba, remote_prediction_proba.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  \
0          5.1         3.5          1.4         0.2     0.0   
1          4.9         3.0          1.4         0.2     0.0   
2          4.7         3.2          1.3         0.2     0.0   
3          4.6         3.1          1.5         0.2     0.0   
4          5.0         3.6          1.4         0.2     0.0   
5          5.4         3.9          1.7         0.4     0.0   
6          4.6         3.4          1.4         0.3     0.0   
7          5.0         3.4          1.5         0.2     0.0   
8          4.4         2.9          1.4         0.2     0.0   
9          4.9         3.1          1.5         0.1     0.0   

   predict_proba_0.0  predict_proba_1.0  predict_proba_2.0  
0           0.996803           0.002383           0.000814  
1           0.996362           0.002382           0.001256  
2           0.996803           0.002383           0.000814  
3           0.996795           0.002383    
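
The class labels are compared with `np.array_equal`, while the probabilities are compared with `np.allclose`. One likely reason, stated here as an assumption, is that floating-point values computed in different environments can differ in their last bits. The sketch below shows the difference between the two checks on made-up numbers.

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
local = np.array([0.1 + 0.2, 0.7])
remote = np.array([0.3, 0.7])

exact = np.array_equal(local, remote)  # element-wise exact equality
close = np.allclose(local, remote)     # equality within a small tolerance

print(exact, close)
```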

### Random Forest model *from ensemble*


In [22]:
from snowflake.ml.modeling.ensemble import RandomForestClassifier

In [23]:
clf_rf = RandomForestClassifier(input_cols=INPUT_COLUMNS,
                          output_cols=OUTPUT_COLUMNS,
                          label_cols=LABEL_COLUMNS)

clf_rf.fit(df)

<snowflake.ml.modeling.ensemble.random_forest_classifier.RandomForestClassifier at 0x7f9c6c2512e0>

In [24]:
prediction = clf_rf.predict(test_features)
prediction_proba = clf_rf.predict_proba(test_features)
prediction_log_proba = clf_rf.predict_log_proba(test_features)


In [25]:
model_name = "SIMPLE_RF_MODEL"
deploy_name = "rf_model_predict"
classifier_type = "RFClassifier"

In [26]:
# A name and model tags can be added to the model at registration time.
model_id = registry.log_model(
    model_name=model_name,
    model_version=model_version,
    model=clf_rf,
    tags={"stage": "testing", "classifier_type": classifier_type},
    sample_input_data=test_features, # this line can be removed after modelSignature
)

# The object API can be used to reference a model after creation.
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
print("Registered new model:", model_id)


Registered new model: fe277924fbf111ed8e10ce0e8c87ef9b


#### Comparison between load_model

In [27]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
restored_clf = model.load_model()

restored_prediction = restored_clf.predict(test_features)

print("Original prediction:", prediction[:10])
print("Restored prediction:", restored_prediction[:10])

print("Result comparison:", np.array_equal(prediction, restored_prediction[prediction.columns]))

Original prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1         3.5          1.4         0.2     0.0               0.0
1          4.9         3.0          1.4         0.2     0.0               0.0
2          4.7         3.2          1.3         0.2     0.0               0.0
3          4.6         3.1          1.5         0.2     0.0               0.0
4          5.0         3.6          1.4         0.2     0.0               0.0
5          5.4         3.9          1.7         0.4     0.0               0.0
6          4.6         3.4          1.4         0.3     0.0               0.0
7          5.0         3.4          1.5         0.2     0.0               0.0
8          4.4         2.9          1.4         0.2     0.0               0.0
9          4.9         3.1          1.5         0.1     0.0               0.0
Restored prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1        

#### Comparison between deploy

In [28]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmpvnzzuuxw.py




rf_model_predict is deployed to warehouse.


In [29]:
remote_prediction = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction[:10])

print("Result comparison:", np.array_equal(prediction, remote_prediction.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1         3.5          1.4         0.2     0.0               0.0
1          4.9         3.0          1.4         0.2     0.0               0.0
2          4.7         3.2          1.3         0.2     0.0               0.0
3          4.6         3.1          1.5         0.2     0.0               0.0
4          5.0         3.6          1.4         0.2     0.0               0.0
5          5.4         3.9          1.7         0.4     0.0               0.0
6          4.6         3.4          1.4         0.3     0.0               0.0
7          5.0         3.4          1.5         0.2     0.0               0.0
8          4.4         2.9          1.4         0.2     0.0               0.0
9          4.9         3.1          1.5         0.1     0.0               0.0
Result comparison: True


In [30]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict_proba",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmp6rm6hkvn.py




rf_model_predict is deployed to warehouse.


In [31]:
remote_prediction_proba = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction_proba[:10])

print("Result comparison:", np.array_equal(prediction_proba, remote_prediction_proba.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  \
0          5.1         3.5          1.4         0.2     0.0   
1          4.9         3.0          1.4         0.2     0.0   
2          4.7         3.2          1.3         0.2     0.0   
3          4.6         3.1          1.5         0.2     0.0   
4          5.0         3.6          1.4         0.2     0.0   
5          5.4         3.9          1.7         0.4     0.0   
6          4.6         3.4          1.4         0.3     0.0   
7          5.0         3.4          1.5         0.2     0.0   
8          4.4         2.9          1.4         0.2     0.0   
9          4.9         3.1          1.5         0.1     0.0   

   predict_proba_0.0  predict_proba_1.0  predict_proba_2.0  
0                1.0                0.0                0.0  
1                1.0                0.0                0.0  
2                1.0                0.0                0.0  
3                1.0                0.0    

In [32]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict_log_proba",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmpfpyrwg0l.py




rf_model_predict is deployed to warehouse.


In [33]:
remote_prediction_log_proba = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction_log_proba[:10])

print("Result comparison:", np.array_equal(prediction_log_proba, remote_prediction_log_proba.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  \
0          5.1         3.5          1.4         0.2     0.0   
1          4.9         3.0          1.4         0.2     0.0   
2          4.7         3.2          1.3         0.2     0.0   
3          4.6         3.1          1.5         0.2     0.0   
4          5.0         3.6          1.4         0.2     0.0   
5          5.4         3.9          1.7         0.4     0.0   
6          4.6         3.4          1.4         0.3     0.0   
7          5.0         3.4          1.5         0.2     0.0   
8          4.4         2.9          1.4         0.2     0.0   
9          4.9         3.1          1.5         0.1     0.0   

   predict_log_proba_0.0  predict_log_proba_1.0  predict_log_proba_2.0  
0                    0.0                   -inf                   -inf  
1                    0.0                   -inf                   -inf  
2                    0.0                   -inf                   -i
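
The `-inf` entries follow directly from probabilities that are exactly zero, since the logarithm of zero is negative infinity. The sketch below reproduces this with NumPy on a made-up probability row and shows that an exact comparison still works when both sides contain `-inf` in the same positions.

```python
import numpy as np

# A made-up probability row in which one class receives all of the mass.
proba = np.array([[1.0, 0.0, 0.0]])

# np.log(0.0) emits a divide-by-zero RuntimeWarning; silence it for this block.
with np.errstate(divide="ignore"):
    log_proba = np.log(proba)

# Exact comparison treats -inf as equal to -inf.
same = np.array_equal(log_proba, log_proba.copy())

print(log_proba, same)
```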

### Logistic Regression model

We test with a logistic regression model because it exposes all of the inference methods: `predict`, `predict_proba`, `predict_log_proba`, and `decision_function`.

In [34]:
from snowflake.ml.modeling.linear_model import LogisticRegression

In [35]:
clf_lr = LogisticRegression(
    input_cols=INPUT_COLUMNS,
    output_cols=OUTPUT_COLUMNS,
    label_cols=LABEL_COLUMNS,
    max_iter=1000,
)

clf_lr.fit(df)

<snowflake.ml.modeling.linear_model.logistic_regression.LogisticRegression at 0x7f9c5927fd30>

In [36]:
prediction = clf_lr.predict(test_features)
prediction_proba = clf_lr.predict_proba(test_features)
prediction_log_proba = clf_lr.predict_log_proba(test_features)
prediction_decision = clf_lr.decision_function(test_features)

In [37]:
model_name = "SIMPLE_LR_MODEL"
deploy_name = "lr_model_predict"
classifier_type = "LogisticRegression"

In [38]:
# A name and model tags can be added to the model at registration time.
model_id = registry.log_model(
    model_name=model_name,
    model_version=model_version,
    model=clf_lr,
    tags={"stage": "testing", "classifier_type": classifier_type},
    sample_input_data=test_features, # this line can be removed after modelSignature
)

# The object API can be used to reference a model after creation.
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
print("Registered new model:", model_id)

Registered new model: 1ae7d6b2fbf211ed8e10ce0e8c87ef9b


#### Comparison between load_model

In [39]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
restored_clf = model.load_model()

restored_prediction = restored_clf.predict(test_features)

print("Original prediction:", prediction[:10])
print("Restored prediction:", restored_prediction[:10])

print("Result comparison:", np.array_equal(prediction, restored_prediction[prediction.columns]))

Original prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1         3.5          1.4         0.2     0.0               0.0
1          4.9         3.0          1.4         0.2     0.0               0.0
2          4.7         3.2          1.3         0.2     0.0               0.0
3          4.6         3.1          1.5         0.2     0.0               0.0
4          5.0         3.6          1.4         0.2     0.0               0.0
5          5.4         3.9          1.7         0.4     0.0               0.0
6          4.6         3.4          1.4         0.3     0.0               0.0
7          5.0         3.4          1.5         0.2     0.0               0.0
8          4.4         2.9          1.4         0.2     0.0               0.0
9          4.9         3.1          1.5         0.1     0.0               0.0
Restored prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1        

#### Comparison between deploy

In [40]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmpp1upu42a.py




lr_model_predict is deployed to warehouse.


In [41]:
remote_prediction = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction[:10])

print("Result comparison:", np.array_equal(prediction, remote_prediction.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  PREDICTED_TARGET
0          5.1         3.5          1.4         0.2     0.0               0.0
1          4.9         3.0          1.4         0.2     0.0               0.0
2          4.7         3.2          1.3         0.2     0.0               0.0
3          4.6         3.1          1.5         0.2     0.0               0.0
4          5.0         3.6          1.4         0.2     0.0               0.0
5          5.4         3.9          1.7         0.4     0.0               0.0
6          4.6         3.4          1.4         0.3     0.0               0.0
7          5.0         3.4          1.5         0.2     0.0               0.0
8          4.4         2.9          1.4         0.2     0.0               0.0
9          4.9         3.1          1.5         0.1     0.0               0.0
Result comparison: True


In [42]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict_proba",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmp9p9ocx8r.py




lr_model_predict is deployed to warehouse.


In [43]:
remote_prediction_proba = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction_proba[:10])

print("Result comparison:", np.allclose(prediction_proba, remote_prediction_proba.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  \
0          5.1         3.5          1.4         0.2     0.0   
1          4.9         3.0          1.4         0.2     0.0   
2          4.7         3.2          1.3         0.2     0.0   
3          4.6         3.1          1.5         0.2     0.0   
4          5.0         3.6          1.4         0.2     0.0   
5          5.4         3.9          1.7         0.4     0.0   
6          4.6         3.4          1.4         0.3     0.0   
7          5.0         3.4          1.5         0.2     0.0   
8          4.4         2.9          1.4         0.2     0.0   
9          4.9         3.1          1.5         0.1     0.0   

   predict_proba_0.0  predict_proba_1.0  predict_proba_2.0  
0           0.981584           0.018416       1.449704e-08  
1           0.971334           0.028666       3.019028e-08  
2           0.985275           0.014725       1.233695e-08  
3           0.976064           0.023936    

In [44]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict_log_proba",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmp4kciu22a.py




lr_model_predict is deployed to warehouse.


In [45]:
remote_prediction_log_proba = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction_log_proba[:10])

print("Result comparison:", np.allclose(prediction_log_proba, remote_prediction_log_proba.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  \
0          5.1         3.5          1.4         0.2     0.0   
1          4.9         3.0          1.4         0.2     0.0   
2          4.7         3.2          1.3         0.2     0.0   
3          4.6         3.1          1.5         0.2     0.0   
4          5.0         3.6          1.4         0.2     0.0   
5          5.4         3.9          1.7         0.4     0.0   
6          4.6         3.4          1.4         0.3     0.0   
7          5.0         3.4          1.5         0.2     0.0   
8          4.4         2.9          1.4         0.2     0.0   
9          4.9         3.1          1.5         0.1     0.0   

   predict_log_proba_0.0  predict_log_proba_1.0  predict_log_proba_2.0  
0              -0.018588              -3.994513             -18.049321  
1              -0.029085              -3.552040             -17.315746  
2              -0.014834              -4.218213             -18.2106

In [46]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="decision_function",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmpevdk6spg.py




lr_model_predict is deployed to warehouse.


In [47]:
remote_prediction_decision_function = model.predict(deployment_name=deploy_name, data=test_features)

print("Remote prediction:", remote_prediction_decision_function[:10])

print("Result comparison:", np.allclose(prediction_decision, remote_prediction_decision_function.values))

Remote prediction:    SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET  \
0          5.1         3.5          1.4         0.2     0.0   
1          4.9         3.0          1.4         0.2     0.0   
2          4.7         3.2          1.3         0.2     0.0   
3          4.6         3.1          1.5         0.2     0.0   
4          5.0         3.6          1.4         0.2     0.0   
5          5.4         3.9          1.7         0.4     0.0   
6          4.6         3.4          1.4         0.3     0.0   
7          5.0         3.4          1.5         0.2     0.0   
8          4.4         2.9          1.4         0.2     0.0   
9          4.9         3.1          1.5         0.1     0.0   

   decision_function_0.0  decision_function_1.0  decision_function_2.0  
0               7.335553               3.359628             -10.695181  
1               6.936539               3.413583             -10.350122  
2               7.466404               3.263025             -10.7294
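
The deploy-then-compare cells above repeat the same steps for each target method. The sketch below folds them into one loop. The helper `compare_remote` is hypothetical; it only assumes that the model object has `deploy` and `predict` methods taking the keyword arguments used in the cells above. `FakeModel` is a stand-in so that the loop can run without a Snowflake session.

```python
import numpy as np


def compare_remote(model, deploy_name, data, local_results, options):
    """Deploy each target method in turn and compare remote output with local output."""
    report = {}
    for method, local in local_results.items():
        model.deploy(deployment_name=deploy_name, target_method=method, options=options)
        remote = model.predict(deployment_name=deploy_name, data=data)
        report[method] = bool(np.allclose(np.asarray(local), np.asarray(remote)))
    return report


class FakeModel:
    """Hypothetical stand-in for a model reference, used only to exercise the loop."""

    def __init__(self, outputs):
        self._outputs = outputs
        self._method = None

    def deploy(self, deployment_name, target_method, options):
        # Remember which method the deployment currently serves.
        self._method = target_method

    def predict(self, deployment_name, data):
        return self._outputs[self._method]


local_results = {
    "predict": np.array([0.0, 1.0]),
    "predict_proba": np.array([[0.9, 0.1], [0.2, 0.8]]),
}
fake = FakeModel(
    {
        "predict": np.array([0.0, 1.0]),
        "predict_proba": np.array([[0.9, 0.1], [0.2, 0.7]]),  # deliberately different
    }
)
report = compare_remote(fake, "demo_predict", data=None, local_results=local_results, options={})
print(report)
```

With a real model reference, `local_results` would map each method name to the locally computed result, for example `{"predict": prediction, "predict_proba": prediction_proba}`.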

### Pipeline model

It is important to check that the whole pipeline, including its preprocessing steps, is stored with the model.

In [48]:
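def add_simple_category(df):
    """Add a categorical SIMPLE column by binning SEPALLENGTH (modifies df in place)."""
    bins = (-1, 4, 5, 6, 10)
    group_names = ['Unknown', '1_quartile', '2_quartile', '3_quartile']
    # pd.cut uses right-closed intervals by default, e.g. (4, 5] -> '1_quartile'.
    categories = pd.cut(df.SEPALLENGTH, bins, labels=group_names)
    df['SIMPLE'] = categories
    return df


df_cat = add_simple_category(df)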

numeric_features=['SEPALLENGTH', 'SEPALWIDTH', 'PETALLENGTH', 'PETALWIDTH']
categorical_features = ['SIMPLE']
numeric_features_output = [x + '_O' for x in numeric_features]
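
By default, `pd.cut` assigns each value to a right-closed interval, so a sepal length of exactly 5.0 falls into `(4, 5]`. The sketch below reproduces that rule with the standard library only, using the same bin edges and labels. The helper `bin_label` is hypothetical, and it raises an error for out-of-range values where `pd.cut` would return a missing value.

```python
from bisect import bisect_left

BINS = (-1, 4, 5, 6, 10)
LABELS = ["Unknown", "1_quartile", "2_quartile", "3_quartile"]


def bin_label(value: float) -> str:
    """Return the label of the right-closed interval (lo, hi] that contains value."""
    if not BINS[0] < value <= BINS[-1]:
        raise ValueError(f"{value} is outside the bin range")
    # bisect_left finds the first edge >= value; the interval index is one less.
    return LABELS[bisect_left(BINS, value) - 1]


labels = [bin_label(v) for v in (5.1, 4.9, 5.0, 6.3)]
print(labels)
```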

In [49]:
# Define the demo table and a working schema for testing.


############################################################################
# NOTE:
#    Set the work_schema variable to a schema that exists in your account.
############################################################################
work_schema = 'TEST'
demo_table = 'IRIS_UPPER'

# write the DF to Snowflake and create a Snowflake DF
session.write_pandas(df_cat, demo_table, auto_create_table=True, table_type="temporary", schema=work_schema)

<snowflake.snowpark.table.Table at 0x7f9c5929f850>

In [50]:
# Iris Snowflake table; read from the same schema the table was written to.
input_tbl = f"{session.get_current_database()}.{work_schema}.{demo_table}"
iris_df = session.table(input_tbl)
print(iris_df.limit(10).to_pandas())

   SEPALLENGTH  SEPALWIDTH  PETALLENGTH  PETALWIDTH  TARGET      SIMPLE
0          5.1         3.5          1.4         0.2     0.0  2_quartile
1          4.9         3.0          1.4         0.2     0.0  1_quartile
2          4.7         3.2          1.3         0.2     0.0  1_quartile
3          4.6         3.1          1.5         0.2     0.0  1_quartile
4          5.0         3.6          1.4         0.2     0.0  1_quartile
5          5.4         3.9          1.7         0.4     0.0  2_quartile
6          4.6         3.4          1.4         0.3     0.0  1_quartile
7          5.0         3.4          1.5         0.2     0.0  1_quartile
8          4.4         2.9          1.4         0.2     0.0  1_quartile
9          4.9         3.1          1.5         0.1     0.0  1_quartile


In [51]:
from snowflake.ml.modeling.linear_model import LogisticRegression
from snowflake.ml.preprocessing import MinMaxScaler, StandardScaler, OneHotEncoder
from snowflake.ml.framework.pipeline import Pipeline

# Chain one-hot encoding, min-max scaling, and the classifier into a single pipeline.
pipeline = Pipeline(
    steps=[
        ('ONEHOT', OneHotEncoder(input_cols=categorical_features, output_cols='cat_output', drop_input_cols=True)),
        ('SCALER', MinMaxScaler(clip=True, input_cols=numeric_features, output_cols=numeric_features_output, drop_input_cols=True)),
        ('CLASSIFIER', LogisticRegression(label_cols=LABEL_COLUMNS)),
    ]
)
pipeline.fit(iris_df)


<snowflake.ml.framework.pipeline.Pipeline at 0x7f9c6c3e6ca0>
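
To make the two preprocessing steps concrete, here is a rough local analogue written with plain pandas. It is a sketch under stated assumptions, not the `snowflake.ml` implementation: it assumes `OneHotEncoder` produces one column per observed category prefixed with the `output_cols` value, and that `MinMaxScaler(clip=True)` follows the scikit-learn meaning of clipping transformed values to the fitted range. The `toy` frame and the `unseen` values are hypothetical.

```python
import pandas as pd

# Hypothetical three-row frame using the notebook's column names.
toy = pd.DataFrame(
    {
        "SEPALLENGTH": [4.4, 5.1, 5.4],
        "SIMPLE": ["1_quartile", "2_quartile", "2_quartile"],
    }
)

# One-hot encode the categorical column; the prefix mirrors output_cols='cat_output'.
encoded = pd.get_dummies(toy["SIMPLE"], prefix="cat_output").astype(float)

# Min-max scale the numeric column to [0, 1] using the min and max seen at fit time.
lo, hi = toy["SEPALLENGTH"].min(), toy["SEPALLENGTH"].max()
scaled = ((toy["SEPALLENGTH"] - lo) / (hi - lo)).rename("SEPALLENGTH_O")

# With clipping, values outside the fitted range are pinned to 0 or 1 at transform time.
unseen = pd.Series([4.0, 6.0])
unseen_scaled = ((unseen - lo) / (hi - lo)).clip(0.0, 1.0)

features = pd.concat([encoded, scaled], axis=1)
print(features)
print(unseen_scaled.tolist())
```

The resulting column names follow the same pattern as the prediction output shown later in this section (`cat_output_<category>` and `<feature>_O`).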

In [52]:
iris_df_test = iris_df.limit(10)
prediction = pipeline.predict(iris_df_test)


In [53]:
pipeline.fit(iris_df.to_pandas())



<snowflake.ml.framework.pipeline.Pipeline at 0x7f9c6c3e6ca0>

In [54]:
prediction = pipeline.predict(iris_df_test.to_pandas())
prediction_log_proba = pipeline.predict_log_proba(iris_df_test.to_pandas())
prediction_proba = pipeline.predict_proba(iris_df_test.to_pandas())
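
The three prediction methods are related. As a minimal sketch, assuming the classifier follows the usual scikit-learn convention (log-probabilities are the natural log of the probabilities, and the predicted class is the one with the highest probability), the relationship can be checked with NumPy alone. The `proba` array below is hypothetical and is not output from the pipeline.

```python
import numpy as np

# Hypothetical class probabilities for three rows and three classes.
proba = np.array(
    [
        [0.90, 0.08, 0.02],
        [0.10, 0.70, 0.20],
        [0.05, 0.15, 0.80],
    ]
)

# Log-probabilities are the element-wise natural log of the probabilities.
log_proba = np.log(proba)

# The predicted class index is the column with the highest probability.
predicted = proba.argmax(axis=1)

# Exponentiating recovers the probabilities, and argmax is unchanged by the log.
assert np.allclose(np.exp(log_proba), proba)
assert np.array_equal(predicted, log_proba.argmax(axis=1))
print(predicted)
```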



In [55]:
model_name = "SIMPLE_PP_MODEL"
deploy_name = "pp_model_predict"
classifier_type = "Pipeline"
model_version = f"{model_name}_007"

In [56]:
# A name and model tags can be added to the model at registration time.
model_id = registry.log_model(
    model_name=model_name,
    model_version=model_version,
    model=pipeline,
    tags={"stage": "testing", "classifier_type": classifier_type},
    sample_input_data=iris_df_test.to_pandas(),  # this argument can be dropped once modelSignature is available
)

# The object API can be used to reference a model after creation.
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
print("Registered new model:", model_id)



Registered new model: 46bab6bafbf211ed8e10ce0e8c87ef9b


#### Comparison with `load_model` predictions

In [57]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
restored_clf = model.load_model()

restored_prediction = restored_clf.predict(iris_df_test.to_pandas())

print("Original prediction:", prediction[:10])
print("Restored prediction:", restored_prediction[:10])

print("Result comparison:", np.array_equal(prediction, restored_prediction[prediction.columns]))

Original prediction:    TARGET  "cat_output_1_quartile"  "cat_output_2_quartile"  \
0     0.0                      0.0                      1.0   
1     0.0                      1.0                      0.0   
2     0.0                      1.0                      0.0   
3     0.0                      1.0                      0.0   
4     0.0                      1.0                      0.0   
5     0.0                      0.0                      1.0   
6     0.0                      1.0                      0.0   
7     0.0                      1.0                      0.0   
8     0.0                      1.0                      0.0   
9     0.0                      1.0                      0.0   

   "cat_output_3_quartile"  SEPALLENGTH_O  SEPALWIDTH_O  PETALLENGTH_O  \
0                      0.0       0.222222      0.625000       0.067797   
1                      0.0       0.166667      0.416667       0.067797   
2                      0.0       0.111111      0.500000       0
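
When comparing prediction frames, column order and floating-point noise both matter. The sketch below uses two small hypothetical frames (`local` and `remote`, not real pipeline output) to show why the columns are aligned first, and how `np.array_equal` (exact) differs from `np.allclose` (tolerant).

```python
import numpy as np
import pandas as pd

# Hypothetical local predictions; 0.1 + 0.2 is not exactly 0.3 in floating point.
local = pd.DataFrame({"TARGET": [0.0, 1.0], "SEPALLENGTH_O": [0.1 + 0.2, 0.5]})

# Hypothetical round-tripped predictions: different column order, slightly different float.
remote = pd.DataFrame({"SEPALLENGTH_O": [0.3, 0.5], "TARGET": [0.0, 1.0]})

# Align columns by name before comparing values by position.
aligned = remote[local.columns]

exact = np.array_equal(local.values, aligned.values)        # bit-for-bit equality
close = np.allclose(local.values, aligned.values)           # equality within a tolerance
unaligned_close = np.allclose(local.values, remote.values)  # compares mismatched columns

print(exact, close, unaligned_close)
```

An exact comparison is reasonable when the same object runs in the same process. A tolerant comparison is the safer choice when values pass through serialization or a different runtime.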



#### Comparison with deployed `predict`

In [58]:
registry = model_registry.ModelRegistry(
    session=session, database_name=REGISTRY_DATABASE_NAME, schema_name=REGISTRY_SCHEMA_NAME
)
model = model_registry.ModelReference(registry=registry, model_name=model_name, model_version=model_version)
model.deploy(
    deployment_name=deploy_name,
    target_method="predict",
    options={"_snowml_wheel_path": SNOW_ML_WHEEL_STAGE_PATH, "relax_version": True},
)



Generated UDF file is persisted at: /var/folders/76/47j700wn3g905_97713xwpnm0000gn/T/tmp0_o73cne.py




pp_model_predict is deployed to warehouse.


In [59]:
remote_prediction = model.predict(deployment_name=deploy_name, data=iris_df_test.to_pandas())

print("Remote prediction:", remote_prediction[:10])

print("Result comparison:", np.allclose(prediction, remote_prediction.values))

Remote prediction:    TARGET  "cat_output_1_quartile"  "cat_output_2_quartile"  \
0     0.0                      0.0                      1.0   
1     0.0                      1.0                      0.0   
2     0.0                      1.0                      0.0   
3     0.0                      1.0                      0.0   
4     0.0                      1.0                      0.0   
5     0.0                      0.0                      1.0   
6     0.0                      1.0                      0.0   
7     0.0                      1.0                      0.0   
8     0.0                      1.0                      0.0   
9     0.0                      1.0                      0.0   

   "cat_output_3_quartile"  SEPALLENGTH_O  SEPALWIDTH_O  PETALLENGTH_O  \
0                      0.0       0.222222      0.625000       0.067797   
1                      0.0       0.166667      0.416667       0.067797   
2                      0.0       0.111111      0.500000       0.0