Project import generated by Copybara. (#98)
GitOrigin-RevId: e6f7fd62eecf65ab1db362c8c4d2cb8a5fb32c9a

Co-authored-by: Snowflake Authors <noreply@snowflake.com>
sfc-gh-thoyt and Snowflake Authors committed May 1, 2024
1 parent e73e178 commit c530f5c
Showing 120 changed files with 6,758 additions and 2,091 deletions.
63 changes: 62 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,67 @@
# Release History

## 1.4.1
## 1.5.0

### Bug Fixes

- Registry: Fix invalid parameter 'SHOW_MODEL_DETAILS_IN_SHOW_VERSIONS_IN_MODEL' error.

### Behavior Changes

- Model Development: The behavior of `fit_transform` has changed for all estimators.
  First, it now covers every estimator that implements this function; second, the output
  is a pandas DataFrame or a Snowpark DataFrame (`Union[pandas.DataFrame, snowpark.DataFrame]`)
  rather than a raw array. A minimal sketch follows below.
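
A minimal sketch of the new behavior, using `PCA` purely as one example of an estimator that exposes `fit_transform`; the column and output names are placeholders, not part of the release notes:

```python
import pandas as pd

from snowflake.ml.modeling.decomposition import PCA

df = pd.DataFrame({
    "F1": [1.0, 2.0, 3.0, 4.0],
    "F2": [4.0, 3.0, 2.0, 1.0],
})

# Placeholder column names; any wrapped estimator implementing fit_transform behaves the same way.
pca = PCA(n_components=1, input_cols=["F1", "F2"], output_cols=["PC1"])

# fit_transform now returns a DataFrame (pandas here, Snowpark for a Snowpark input)
# instead of a raw array such as the fitted estimator's embedding_.
result = pca.fit_transform(df)
print(type(result))
```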

#### Model Registry (PrPr)

`snowflake.ml.registry.artifact` and related `snowflake.ml.model_registry.ModelRegistry` APIs have been removed.

- Removed `snowflake.ml.registry.artifact` module.
- Removed `ModelRegistry.log_artifact()`, `ModelRegistry.list_artifacts()`, `ModelRegistry.get_artifact()`
- Removed `artifacts` argument from `ModelRegistry.log_model()`

#### Dataset (PrPr)

`snowflake.ml.dataset.Dataset` has been redesigned to be backed by Snowflake Dataset entities. A usage sketch follows the list below.

- New `Dataset`s can be created with `Dataset.create()` and existing `Dataset`s may be loaded
with `Dataset.load()`.
- `Dataset`s now maintain an immutable `selected_version` state. The `Dataset.create_version()` and
`Dataset.load_version()` APIs return new `Dataset` objects with the requested `selected_version` state.
- Added `dataset.create_from_dataframe()` and `dataset.load_dataset()` convenience APIs as a shortcut
to creating and loading `Dataset`s with a pre-selected version.
- `Dataset.materialized_table` and `Dataset.snapshot_table` no longer exist; `Dataset.fully_qualified_name`
  is the closest equivalent.
- `Dataset.df` no longer exists. Instead, use `DatasetReader.read.to_snowpark_dataframe()`.
- `Dataset.owner` has been moved to `Dataset.selected_version.owner`
- `Dataset.desc` has been moved to `DatasetVersion.selected_version.comment`
- `Dataset.timestamp_col`, `Dataset.label_cols`, `Dataset.feature_store_metadata`, and
`Dataset.schema_version` have been removed.
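
A minimal usage sketch of the redesigned API, assuming a connected Snowpark session; the database, schema, table, and version names are placeholders, and keyword names such as `input_dataframe` are best-effort assumptions rather than confirmed signatures:

```python
from snowflake.ml import dataset
from snowflake.ml.utils.connection_params import SnowflakeLoginOptions
from snowflake.snowpark import Session

session = Session.builder.configs(SnowflakeLoginOptions()).create()
source_df = session.table("MY_DB.MY_SCHEMA.TRAINING_DATA")  # placeholder source table

# Create the Dataset entity, then materialize an immutable version from a DataFrame.
ds = dataset.Dataset.create(session, "MY_DB.MY_SCHEMA.MY_DATASET")
ds_v1 = ds.create_version("v1", input_dataframe=source_df)

# Load an existing Dataset and select a version; both calls return Dataset objects
# carrying the requested selected_version.
ds_v1 = dataset.Dataset.load(session, "MY_DB.MY_SCHEMA.MY_DATASET").load_version("v1")

# Closest replacements for the removed attributes:
print(ds_v1.fully_qualified_name)           # instead of materialized_table / snapshot_table
print(ds_v1.selected_version.owner)         # instead of Dataset.owner
sp_df = ds_v1.read.to_snowpark_dataframe()  # instead of Dataset.df
```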

#### Feature Store (PrPr)

`FeatureStore.generate_dataset` argument list has been changed to match the new
`snowflake.ml.dataset.Dataset` definition. A migration sketch follows the list below.

- `materialized_table` has been removed and replaced with `name` and `version`.
- `name` has been moved to the first positional argument.
- `save_mode` has been removed as `merge` behavior is no longer supported. The new behavior is always `errorifexists`.
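
A hedged before/after sketch of the call-site change, assuming an existing feature store, spine table, and feature view (all names below are placeholders); only the arguments named in this list are taken from the release notes:

```python
from snowflake.ml.feature_store import CreationMode, FeatureStore
from snowflake.ml.utils.connection_params import SnowflakeLoginOptions
from snowflake.snowpark import Session

session = Session.builder.configs(SnowflakeLoginOptions()).create()
fs = FeatureStore(
    session=session,
    database="MY_DB",
    name="MY_FEATURE_STORE",
    default_warehouse="MY_WH",
    creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
)
spine_df = session.table("MY_DB.MY_SCHEMA.SPINE")            # placeholder spine table
feature_view = fs.get_feature_view("MY_FEATURE_VIEW", "v1")  # placeholder feature view

# Old call (removed): generate_dataset(spine_df=..., features=..., materialized_table=..., save_mode="merge")
# New call: `name` comes first, a `version` is supplied, and the write behaves as errorifexists.
ds = fs.generate_dataset(
    "MY_DB.MY_SCHEMA.MY_TRAINING_DATASET",
    spine_df=spine_df,
    features=[feature_view],
    version="v1",
)
```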

### New Features

- Registry: Add `export` method to `ModelVersion` instance to export model files.
- Registry: Add `load` method to `ModelVersion` instance to load the underlying object from the model.
- Registry: Add `Model.rename` method to `Model` instance to rename or move a model. (A usage sketch follows below.)
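
A brief, hedged sketch of the new methods; the model name, version, target directory, and new name are placeholders:

```python
from snowflake.ml.registry import Registry
from snowflake.ml.utils.connection_params import SnowflakeLoginOptions
from snowflake.snowpark import Session

session = Session.builder.configs(SnowflakeLoginOptions()).create()
reg = Registry(session=session)

m = reg.get_model("MY_MODEL")         # placeholder model name
mv = m.version("V1")                  # placeholder version name

mv.export("/tmp/my_model_export")     # export the model files to a local directory
obj = mv.load()                       # load the underlying model object into memory

m.rename("MY_DB.MY_SCHEMA.NEW_NAME")  # rename, or move the model by giving a fully qualified name
```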

#### Dataset (PrPr)

- Added Snowpark DataFrame integration using `Dataset.read.to_snowpark_dataframe()`.
- Added pandas DataFrame integration using `Dataset.read.to_pandas()`.
- Added PyTorch and TensorFlow integrations using `Dataset.read.to_torch_datapipe()`
  and `Dataset.read.to_tf_dataset()` respectively.
- Added `fsspec`-style file integration using `Dataset.read.files()` and `Dataset.read.filesystem()`.
  A usage sketch for these readers follows below.
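
A hedged sketch of the reader integrations above; the dataset name and version are placeholders, and the batch-related keyword arguments are assumptions:

```python
from snowflake.ml import dataset
from snowflake.ml.utils.connection_params import SnowflakeLoginOptions
from snowflake.snowpark import Session

session = Session.builder.configs(SnowflakeLoginOptions()).create()
ds = dataset.load_dataset(session, "MY_DB.MY_SCHEMA.MY_DATASET", "v1")  # placeholder name/version

sp_df = ds.read.to_snowpark_dataframe()  # Snowpark DataFrame
pd_df = ds.read.to_pandas()              # pandas DataFrame
pipe = ds.read.to_torch_datapipe(batch_size=32, shuffle=True, drop_last_batch=True)
tf_ds = ds.read.to_tf_dataset(batch_size=32, shuffle=True, drop_last_batch=True)
file_paths = ds.read.files()             # fsspec-style file paths
filesystem = ds.read.filesystem()        # fsspec filesystem object
```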

## 1.4.1 (2024-04-18)

### New Features

2 changes: 1 addition & 1 deletion ci/conda_recipe/meta.yaml
@@ -17,7 +17,7 @@ build:
noarch: python
package:
name: snowflake-ml-python
version: 1.4.1
version: 1.5.0
requirements:
build:
- python
6 changes: 6 additions & 0 deletions codegen/sklearn_wrapper_generator.py
@@ -13,6 +13,7 @@
NP_CONSTANTS = [c for c in dir(np) if type(getattr(np, c, None)) == float or type(getattr(np, c, None)) == int]
LOAD_BREAST_CANCER = "load_breast_cancer"
LOAD_IRIS = "load_iris"
LOAD_DIGITS = "load_digits"
LOAD_DIABETES = "load_diabetes"


@@ -278,6 +279,7 @@ def _is_deterministic(class_object: Tuple[str, type]) -> bool:
return not (
WrapperGeneratorFactory._is_class_of_type(class_object[1], "LinearDiscriminantAnalysis")
or WrapperGeneratorFactory._is_class_of_type(class_object[1], "BernoulliRBM")
or WrapperGeneratorFactory._is_class_of_type(class_object[1], "TSNE")
)

@staticmethod
@@ -739,6 +741,7 @@ def _populate_function_doc_fields(self) -> None:
_METHODS = [
"fit",
"fit_predict",
"fit_transform",
"predict",
"predict_log_proba",
"predict_proba",
@@ -775,6 +778,7 @@ def _populate_function_doc_fields(self) -> None:
self.transform_docstring = self.estimator_function_docstring["transform"]
self.predict_docstring = self.estimator_function_docstring["predict"]
self.fit_predict_docstring = self.estimator_function_docstring["fit_predict"]
self.fit_transform_docstring = self.estimator_function_docstring["fit_transform"]
self.predict_proba_docstring = self.estimator_function_docstring["predict_proba"]
self.score_samples_docstring = self.estimator_function_docstring["score_samples"]
self.predict_log_proba_docstring = self.estimator_function_docstring["predict_log_proba"]
@@ -898,6 +902,8 @@ def _populate_integ_test_fields(self) -> None:
self.test_dataset_func = LOAD_BREAST_CANCER
elif self._is_regressor:
self.test_dataset_func = LOAD_DIABETES
elif WrapperGeneratorFactory._is_class_of_type(self.class_object[1], "SpectralEmbedding"):
self.test_dataset_func = LOAD_DIGITS
else:
self.test_dataset_func = LOAD_IRIS

95 changes: 43 additions & 52 deletions codegen/sklearn_wrapper_template.py_template
@@ -54,12 +54,6 @@ _SUBPROJECT = "".join([s.capitalize() for s in "{transform.root_module_name}".re

DATAFRAME_TYPE = Union[DataFrame, pd.DataFrame]

def _is_fit_transform_method_enabled() -> Callable[[Any], bool]:
def check(self: BaseTransformer) -> TypeGuard[Callable[..., object]]:
return {transform.fit_transform_manifold_function_support} and callable(getattr(self._sklearn_object, "fit_transform", None))
return check


class {transform.original_class_name}(BaseTransformer):
r"""{transform.estimator_class_docstring}
"""
@@ -174,20 +168,17 @@ class {transform.original_class_name}(BaseTransformer):
self,
dataset: DataFrame,
inference_method: str,
) -> List[str]:
"""Util method to run validate that batch inference can be run on a snowpark dataframe and
return the available package that exists in the snowflake anaconda channel
) -> None:
"""Util method to run validate that batch inference can be run on a snowpark dataframe.

Args:
dataset: snowpark dataframe
inference_method: the inference method such as predict, score...

Raises:
SnowflakeMLException: If the estimator is not fitted, raise error
SnowflakeMLException: If the session is None, raise error

Returns:
A list of available package that exists in the snowflake anaconda channel
"""
if not self._is_fitted:
raise exceptions.SnowflakeMLException(
@@ -205,9 +196,7 @@ class {transform.original_class_name}(BaseTransformer):
"Session must not specified for snowpark dataset."
),
)
# Validate that key package version in user workspace are supported in snowflake conda channel
return pkg_version_utils.get_valid_pkg_versions_supported_in_snowflake_conda_channel(
pkg_versions=self._get_dependencies(), session=session, subproject=_SUBPROJECT)


@available_if(original_estimator_has_callable("predict")) # type: ignore[misc]
@telemetry.send_api_usage_telemetry(
@@ -246,7 +235,8 @@ class {transform.original_class_name}(BaseTransformer):

expected_type_inferred = convert_sp_to_sf_type(label_cols_signatures[0].as_snowpark_type())

self._deps = self._batch_inference_validate_snowpark(dataset=dataset, inference_method=inference_method)
self._batch_inference_validate_snowpark(dataset=dataset, inference_method=inference_method)
self._deps = self._get_dependencies()
assert isinstance(
dataset._session, Session
) # mypy does not recognize the check in _batch_inference_validate_snowpark()
@@ -321,10 +311,8 @@ class {transform.original_class_name}(BaseTransformer):
if all(x == output_types[0] for x in output_types) and len(output_types) == len(self.output_cols):
expected_dtype = convert_sp_to_sf_type(output_types[0])

self._deps = self._batch_inference_validate_snowpark(
dataset=dataset,
inference_method=inference_method,
)
self._batch_inference_validate_snowpark(dataset=dataset, inference_method=inference_method)
self._deps = self._get_dependencies()
assert isinstance(dataset._session, Session) # mypy does not recognize the check in _batch_inference_validate_snowpark()

transform_kwargs = dict(
@@ -383,16 +371,32 @@ class {transform.original_class_name}(BaseTransformer):
self._is_fitted = True
return output_result


@available_if(_is_fit_transform_method_enabled()) # type: ignore[misc]
def fit_transform(self, dataset: Union[DataFrame, pd.DataFrame]) -> Union[Any, npt.NDArray[Any]]:
@available_if(original_estimator_has_callable("fit_transform")) # type: ignore[misc]
def fit_transform(self, dataset: Union[DataFrame, pd.DataFrame], output_cols_prefix: str = "fit_transform_",) -> Union[DataFrame, pd.DataFrame]:
""" {transform.fit_transform_docstring}
output_cols_prefix: Prefix for the response columns
Returns:
Transformed dataset.
"""
self.fit(dataset)
assert self._sklearn_object is not None
return self._sklearn_object.embedding_
self._infer_input_output_cols(dataset)
super()._check_dataset_type(dataset)
model_trainer = ModelTrainerBuilder.build_fit_transform(
estimator=self._sklearn_object,
dataset=dataset,
input_cols=self.input_cols,
label_cols=self.label_cols,
sample_weight_col=self.sample_weight_col,
autogenerated=self._autogenerated,
subproject=_SUBPROJECT,
)
output_result, fitted_estimator = model_trainer.train_fit_transform(
drop_input_cols=self._drop_input_cols,
expected_output_cols_list=self.output_cols,
)
self._sklearn_object = fitted_estimator
self._is_fitted = True
return output_result


def _get_output_column_names(self, output_cols_prefix: str, output_cols: Optional[List[str]] = None) -> List[str]:
@@ -475,10 +479,8 @@ class {transform.original_class_name}(BaseTransformer):
expected_output_cols = self._get_output_column_names(output_cols_prefix)

if isinstance(dataset, DataFrame):
self._deps = self._batch_inference_validate_snowpark(
dataset=dataset,
inference_method=inference_method,
)
self._batch_inference_validate_snowpark(dataset=dataset, inference_method=inference_method)
self._deps = self._get_dependencies()
assert isinstance(
dataset._session, Session
) # mypy does not recognize the check in _batch_inference_validate_snowpark()
@@ -535,10 +537,8 @@ class {transform.original_class_name}(BaseTransformer):
transform_kwargs: BatchInferenceKwargsTypedDict = dict()

if isinstance(dataset, DataFrame):
self._deps = self._batch_inference_validate_snowpark(
dataset=dataset,
inference_method=inference_method,
)
self._batch_inference_validate_snowpark(dataset=dataset, inference_method=inference_method)
self._deps = self._get_dependencies()
assert isinstance(
dataset._session, Session
) # mypy does not recognize the check in _batch_inference_validate_snowpark()
@@ -592,10 +592,8 @@ class {transform.original_class_name}(BaseTransformer):
expected_output_cols = self._get_output_column_names(output_cols_prefix)

if isinstance(dataset, DataFrame):
self._deps = self._batch_inference_validate_snowpark(
dataset=dataset,
inference_method=inference_method,
)
self._batch_inference_validate_snowpark(dataset=dataset, inference_method=inference_method)
self._deps = self._get_dependencies()
assert isinstance(
dataset._session, Session
) # mypy does not recognize the check in _batch_inference_validate_snowpark()
@@ -653,10 +651,8 @@ class {transform.original_class_name}(BaseTransformer):
expected_output_cols = self._get_output_column_names(output_cols_prefix)

if isinstance(dataset, DataFrame):
self._deps = self._batch_inference_validate_snowpark(
dataset=dataset,
inference_method=inference_method,
)
self._batch_inference_validate_snowpark(dataset=dataset, inference_method=inference_method)
self._deps = self._get_dependencies()
assert isinstance(dataset._session, Session) # mypy does not recognize the check in _batch_inference_validate_snowpark()
transform_kwargs = dict(
session=dataset._session,
@@ -710,17 +706,15 @@ class {transform.original_class_name}(BaseTransformer):
transform_kwargs: ScoreKwargsTypedDict = dict()

if isinstance(dataset, DataFrame):
self._deps = self._batch_inference_validate_snowpark(
dataset=dataset,
inference_method="score",
)
self._batch_inference_validate_snowpark(dataset=dataset, inference_method="score")
self._deps = self._get_dependencies()
selected_cols = self._get_active_columns()
if len(selected_cols) > 0:
dataset = dataset.select(selected_cols)
assert isinstance(dataset._session, Session) # keep mypy happy
transform_kwargs = dict(
session=dataset._session,
dependencies=["snowflake-snowpark-python"] + self._deps,
dependencies=self._deps,
score_sproc_imports={transform.score_sproc_imports},
)
elif isinstance(dataset, pd.DataFrame):
@@ -778,11 +772,8 @@ class {transform.original_class_name}(BaseTransformer):
if isinstance(dataset, DataFrame):
# TODO: Solve inconsistent neigh_ind with sklearn due to different precisions in case of close distances.

self._deps = self._batch_inference_validate_snowpark(
dataset=dataset,
inference_method=inference_method,

)
self._batch_inference_validate_snowpark(dataset=dataset, inference_method=inference_method)
self._deps = self._get_dependencies()
assert isinstance(dataset._session, Session) # mypy does not recognize the check in _batch_inference_validate_snowpark()
transform_kwargs = dict(
session = dataset._session,
12 changes: 8 additions & 4 deletions codegen/transformer_autogen_test_template.py_template
@@ -13,7 +13,6 @@ from snowflake.ml.utils.connection_params import SnowflakeLoginOptions
from snowflake.snowpark import Session, DataFrame



class {transform.test_class_name}(TestCase):
def setUp(self) -> None:
"""Creates Snowpark and Snowflake environments for testing."""
@@ -125,7 +124,7 @@ class {transform.test_class_name}(TestCase):

sklearn_reg.fit(**args)

inference_methods = ["transform", "predict", "fit_predict"]
inference_methods = ["transform", "predict", "fit_predict", "fit_transform"]
for m in inference_methods:
if callable(getattr(sklearn_reg, m, None)):
if m == 'predict':
@@ -151,7 +150,7 @@ class {transform.test_class_name}(TestCase):

# TODO(snandamuri): Implement type inference for transform and predict methods to return results with
# correct datatype.
if m == 'transform':
if m == 'transform' or m == 'fit_transform':
actual_arr = output_df_pandas.astype("float64").to_numpy()
else:
actual_arr = output_df_pandas.to_numpy()
@@ -163,7 +162,12 @@
]
actual_arr = output_df_pandas[actual_output_cols].to_numpy()

sklearn_numpy_arr = getattr(sklearn_reg, m)(input_df_pandas[input_cols])
if m == 'fit_transform':
sklearn_numpy_arr = sklearn_reg.fit_transform(**args)
else:
sklearn_numpy_arr = getattr(sklearn_reg, m)(input_df_pandas[input_cols])


if len(sklearn_numpy_arr.shape) == 3:
# VotingClassifier will return results of shape (n_classifiers, n_samples, n_classes)
# when voting = "soft" and flatten_transform = False. We can't handle unflatten transforms,
1 change: 0 additions & 1 deletion snowflake/ml/_internal/BUILD.bazel
@@ -41,7 +41,6 @@ py_library(
srcs = ["env_utils.py"],
deps = [
":env",
"//snowflake/ml/_internal/exceptions",
"//snowflake/ml/_internal/utils:query_result_checker",
"//snowflake/ml/_internal/utils:retryable_http",
],