39 changes: 37 additions & 2 deletions CHANGELOG.md
@@ -1,6 +1,41 @@
# Release History

## 1.0.5
## 1.0.6

### New Features
- Model Registry: Added `create_if_not_exists` parameter to the constructor.
- Model Registry: Added `get_or_create_model_registry` API.
- Model Registry: Added support for GPU inference when deploying XGBoost (`xgboost.XGBModel` and `xgboost.Booster`), PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`), and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models to Snowpark Container Services.
- Model Registry: When inferring model signatures, `Sequence` of built-in types, `Sequence` of `numpy.ndarray`, `Sequence` of `torch.Tensor`, and `Sequence` of `tensorflow.Tensor` can now be used instead of only `List` of them.
- Model Registry: Added `get_training_dataset` API.
- Model Development: The size of metrics results can now exceed the previous 8 MB limit.
- Model Registry: Added support for saving, loading, and deploying HuggingFace pipeline objects (`transformers.Pipeline`) and our wrapper (`snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel`). When the wrapper is used to specify configurations, the model for the pipeline is loaded dynamically at deployment time. Currently, the following tasks are supported for logging without manually specifying model signatures (see the sketch after this list):
- "conversational"
- "fill-mask"
- "question-answering"
- "summarization"
- "table-question-answering"
- "text2text-generation"
- "text-classification" (alias "sentiment-analysis" available)
- "text-generation"
- "token-classification" (alias "ner" available)
- "translation"
- "translation_xx_to_yy"
- "zero-shot-classification"

### Bug Fixes
- Model Development: Fixed a bug when using the simple imputer with NumPy >= 1.25.
- Model Development: Fixed a bug when inferring the type of label columns.

### Behavior Changes
- Model Registry: `log_model()` now returns a `ModelReference` object instead of a model ID.
- Model Registry: When deploying a model with only one target method, the `target_method` argument can be omitted.
- Model Registry: When using a version of snowflake-ml-python newer than what is available in the Snowflake Anaconda Channel, the `embed_local_ml_library` option is automatically set to `True` if not specified.
- Model Registry: When deploying a model to Snowpark Container Services with GPU, the default value of `num_workers` is 1.
- Model Registry: The `keep_order` and `output_with_input_features` deploy options have been removed. The behavior is now controlled by the type of input passed to `model.predict()`: a `pandas.DataFrame` input behaves as `keep_order=True` and `output_with_input_features=False` did before, while a `snowpark.DataFrame` input behaves as `keep_order=False` and `output_with_input_features=True` did before.
- Model Registry: When logging and deploying PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models, we no longer accept models whose input and output are each a list of tensors. Instead, we now accept models that take one or more tensors as positional arguments and return a tensor or a tuple of tensors. The input and output DataFrames used when predicting remain the same as before: every column is an array feature containing a tensor. (See the sketch after this list for the new `log_model()`/`predict()` flow.)
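
A minimal sketch of the new flow, assuming an existing `registry` object and an already-created deployment named `"my_deployment"`; the argument names are illustrative assumptions, not the exact API:

```python
# log_model() now returns a ModelReference rather than a model ID.
model_ref = registry.log_model(
    model_name="my_model",
    model_version="v1",
    model=sklearn_model,  # any supported model object
)

# Predicting with a pandas DataFrame preserves row order and returns only
# the output columns (old keep_order=True, output_with_input_features=False).
predictions_local = model_ref.predict("my_deployment", pandas_df)

# Predicting with a Snowpark DataFrame returns input features alongside the
# outputs, with no ordering guarantee (old keep_order=False,
# output_with_input_features=True).
predictions_remote = model_ref.predict("my_deployment", snowpark_df)
```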

## 1.0.5 (2023-08-17)

### New Features

@@ -13,7 +48,7 @@
- Model Registry: Fixed an issue where the UDF name created when deploying a model was not identical to the name provided, so it could not be correctly dropped when the deployment was dropped.
- `connection_params.SnowflakeLoginOptions()`: Added support for `private_key_path`.

## 1.0.4
## 1.0.4 (2023-07-28)

### New Features

2 changes: 1 addition & 1 deletion bazel/environments/conda-env-build.yml
@@ -14,5 +14,5 @@ dependencies:
- numpy==1.24.3
- packaging==23.0
- pyyaml==6.0
- scikit-learn==1.2.2
- scikit-learn==1.3.0
- xgboost==1.7.3
5 changes: 4 additions & 1 deletion bazel/environments/conda-env-snowflake.yml
@@ -9,6 +9,7 @@ dependencies:
- aiohttp==3.8.3
- anyio==3.5.0
- boto3==1.24.28
- cachetools==4.2.2
- cloudpickle==2.0.0
- conda-libmamba-solver==23.3.0
- coverage==6.3.2
@@ -23,6 +24,7 @@ dependencies:
- lightgbm==3.3.5
- mlflow==2.3.1
- moto==4.0.11
- multipledispatch==0.6.0
- mypy==0.981
- networkx==2.8.4
- numpy==1.24.3
@@ -36,13 +38,14 @@ dependencies:
- requests==2.29.0
- ruamel.yaml==0.17.21
- s3fs==2022.11.0
- scikit-learn==1.2.2
- scikit-learn==1.3.0
- scipy==1.9.3
- snowflake-connector-python==3.0.3
- snowflake-snowpark-python==1.5.1
- sqlparse==0.4.3
- tensorflow==2.10.0
- transformers==4.29.2
- types-protobuf==4.23.0.1
- types-requests==2.30.0.0
- typing-extensions==4.5.0
- xgboost==1.7.3
6 changes: 5 additions & 1 deletion bazel/environments/conda-env.yml
@@ -9,9 +9,11 @@ dependencies:
- aiohttp==3.8.3
- anyio==3.5.0
- boto3==1.24.28
- cachetools==4.2.2
- cloudpickle==2.0.0
- conda-forge::starlette==0.27.0
- conda-forge::types-PyYAML==6.0.12
- conda-forge::types-cachetools==4.2.2
- conda-libmamba-solver==23.3.0
- coverage==6.3.2
- cryptography==39.0.1
@@ -25,6 +27,7 @@ dependencies:
- lightgbm==3.3.5
- mlflow==2.3.1
- moto==4.0.11
- multipledispatch==0.6.0
- mypy==0.981
- networkx==2.8.4
- numpy==1.24.3
@@ -39,13 +42,14 @@ dependencies:
- requests==2.29.0
- ruamel.yaml==0.17.21
- s3fs==2022.11.0
- scikit-learn==1.2.2
- scikit-learn==1.3.0
- scipy==1.9.3
- snowflake-connector-python==3.0.3
- snowflake-snowpark-python==1.5.1
- sqlparse==0.4.3
- tensorflow==2.10.0
- transformers==4.29.2
- types-protobuf==4.23.0.1
- types-requests==2.30.0.0
- typing-extensions==4.5.0
- xgboost==1.7.3
7 changes: 4 additions & 3 deletions ci/conda_recipe/meta.yaml
@@ -17,7 +17,7 @@ build:
noarch: python
package:
name: snowflake-ml-python
version: 1.0.5
version: 1.0.6
requirements:
build:
- python
@@ -34,7 +34,7 @@ requirements:
- python
- pyyaml>=6.0,<7
- requests
- scikit-learn>=1.2.1,<1.3
- scikit-learn>=1.2.1,<1.4
- scipy>=1.9,<2
- snowflake-connector-python>=3.0.3,<4
- snowflake-snowpark-python>=1.5.1,<2
@@ -43,8 +43,9 @@ requirements:
- xgboost>=1.7.3,<2
run_constrained:
- lightgbm==3.3.5
- mlflow>=2.1.0,<3
- mlflow>=2.1.0,<2.4
- tensorflow>=2.9,<3
- torchdata>=0.4,<1
- transformers>=4.29.2,<5
source:
path: ../../
26 changes: 19 additions & 7 deletions codegen/sklearn_wrapper_template.py_template
@@ -25,6 +25,10 @@ from snowflake.snowpark import DataFrame, Session
from snowflake.snowpark.functions import pandas_udf, sproc
from snowflake.snowpark.types import PandasSeries
from snowflake.snowpark._internal.type_utils import convert_sp_to_sf_type
from snowflake.snowpark._internal.utils import (
TempObjectType,
random_name_for_temp_object,
)

from snowflake.ml.model.model_signature import (
DataType,
@@ -244,7 +248,7 @@ class {transform.original_class_name}(BaseTransformer):
cp.dump(self._sklearn_object, local_transform_file)

# Create temp stage to run fit.
transform_stage_name = "SNOWML_TRANSFORM_{{safe_id}}".format(safe_id=self._get_rand_id())
transform_stage_name = random_name_for_temp_object(TempObjectType.STAGE)
stage_creation_query = f"CREATE OR REPLACE TEMPORARY STAGE {{transform_stage_name}};"
SqlResultValidator(
session=session,
@@ -258,7 +262,7 @@ class {transform.original_class_name}(BaseTransformer):
stage_result_file_name = posixpath.join(transform_stage_name, os.path.basename(local_transform_file_name))
local_result_file_name = get_temp_file_path()

fit_sproc_name = "SNOWML_FIT_{{safe_id}}".format(safe_id=self._get_rand_id())
fit_sproc_name = random_name_for_temp_object(TempObjectType.PROCEDURE)
statement_params = telemetry.get_function_usage_statement_params(
project=_PROJECT,
subproject=_SUBPROJECT,
@@ -439,8 +443,7 @@ class {transform.original_class_name}(BaseTransformer):
pkg_versions=self._get_dependencies(), session=session, subproject=_SUBPROJECT)

# Register vectorized UDF for batch inference
batch_inference_udf_name = "SNOWML_BATCH_INFERENCE_{{safe_id}}_{{method}}".format(
safe_id=self._get_rand_id(), method=inference_method)
batch_inference_udf_name = random_name_for_temp_object(TempObjectType.FUNCTION)

# Need to do this since if we use self._sklearn_object directly in the UDF, Snowpark
# will try to pickle all of self which fails.
@@ -701,8 +704,17 @@ class {transform.original_class_name}(BaseTransformer):
expected_type_inferred = "{transform.udf_datatype}"
# when it is classifier, infer the datatype from label columns
if expected_type_inferred == "" and 'predict' in self.model_signatures:
# Batch inference takes a single expected output column type. Use the first column's type for now.
# TODO: Handle varying output column types.
label_cols_signatures = [row for row in self.model_signatures['predict'].outputs if row.name in self.output_cols]
if len(label_cols_signatures) == 0:
error_str = f"Output columns {{self.output_cols}} do not match model signatures {{self.model_signatures['predict'].outputs}}."
raise exceptions.SnowflakeMLException(
error_code=error_codes.INVALID_ATTRIBUTE,
original_exception=ValueError(error_str),
)
expected_type_inferred = convert_sp_to_sf_type(
self.model_signatures['predict'].outputs[0].as_snowpark_type()
label_cols_signatures[0].as_snowpark_type()
)

output_df = self._batch_inference(
@@ -955,7 +967,7 @@ class {transform.original_class_name}(BaseTransformer):
cp.dump(self._sklearn_object, local_score_file)

# Create temp stage to run score.
score_stage_name = "SNOWML_SCORE_{{safe_id}}".format(safe_id=self._get_rand_id())
score_stage_name = random_name_for_temp_object(TempObjectType.STAGE)
session = dataset._session
assert session is not None # keep mypy happy
stage_creation_query = f"CREATE OR REPLACE TEMPORARY STAGE {{score_stage_name}};"
@@ -968,7 +980,7 @@

# Use posixpath to construct stage paths
stage_score_file_name = posixpath.join(score_stage_name, os.path.basename(local_score_file_name))
score_sproc_name = "SNOWML_SCORE_{{safe_id}}".format(safe_id=self._get_rand_id())
score_sproc_name = random_name_for_temp_object(TempObjectType.PROCEDURE)
statement_params = telemetry.get_function_usage_statement_params(
project=_PROJECT,
subproject=_SUBPROJECT,
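
For context, the template now derives temporary object names from Snowpark's internal helper instead of the hand-rolled `SNOWML_*_{{safe_id}}` strings. A minimal sketch of the pattern, assuming an active Snowpark `session` and that this internal API keeps its current signature:

```python
from snowflake.snowpark._internal.utils import (
    TempObjectType,
    random_name_for_temp_object,
)

# Each call yields a unique randomized name, so the stages, stored
# procedures, and UDFs created during fit/predict/score cannot collide
# across concurrent runs.
stage_name = random_name_for_temp_object(TempObjectType.STAGE)
sproc_name = random_name_for_temp_object(TempObjectType.PROCEDURE)
udf_name = random_name_for_temp_object(TempObjectType.FUNCTION)

session.sql(f"CREATE OR REPLACE TEMPORARY STAGE {stage_name}").collect()
```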
21 changes: 18 additions & 3 deletions requirements.yml
@@ -68,6 +68,7 @@
version_requirements: ">=0.15,<2"
tags:
- build_essential
- deployment_core
# For fsspec[http] in conda
- name_conda: aiohttp
dev_version_conda: "3.8.3"
@@ -123,7 +124,7 @@
- build_essential
- name: mlflow
dev_version: "2.3.1"
version_requirements: ">=2.1.0,<3"
version_requirements: ">=2.1.0,<2.4"
requirements_extra_tags:
- mlflow
- name: moto
@@ -176,8 +177,8 @@
- name: s3fs
dev_version: "2022.11.0"
- name: scikit-learn
dev_version: "1.2.2"
version_requirements: ">=1.2.1,<1.3"
dev_version: "1.3.0"
version_requirements: ">=1.2.1,<1.4"
tags:
- build_essential
- name: scipy
@@ -211,6 +212,11 @@
- torch
- name: transformers
dev_version: "4.29.2"
version_requirements: ">=4.29.2,<5"
requirements_extra_tags:
- transformers
- name: types-requests
dev_version: "2.30.0.0"
- name: types-protobuf
dev_version: "4.23.0.1"
- name: types-PyYAML
@@ -226,3 +232,12 @@
version_requirements: ">=1.7.3,<2"
tags:
- build_essential
- name: types-cachetools
dev_version: "4.2.2"
from_channel: conda-forge
- name: cachetools
dev_version: "4.2.2"
# TODO: this will be a user-side dependency requirement;
# enable when we are releasing FS.
- name: multipledispatch
dev_version: "0.6.0"