Pmdarima flavor #5373

BenWilson2 · 2022-02-11T23:36:48Z

What changes are proposed in this pull request?

Add pmdarima flavor

How is this patch tested?

unit tests

Does this PR change the documentation?

No. You can skip the rest of this section.
Yes. Make sure the changed pages / sections render correctly by following the steps below.

Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
next step, otherwise fix it.
Click Details on the right to open the job page of CircleCI.
Click the Artifacts tab.
Click docs/build/html/index.html.
Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

Adds the native pmdarima flavor to MLflow.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

dbczumar

@BenWilson2 Looks great! Just left a few comments, should be good to go after!

dbczumar · 2022-02-16T02:01:40Z

docs/source/models.rst

+    When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` will control the
+    output format. When set to ``False`` or ``None`` (which is the default), the return type will be a
+    ``pandas.series``. When set to ``True``, the return type a ``tuple[pandas.Series, tuple[pandas.Series]]``.


Suggested change

-    When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` will control the
-    output format. When set to ``False`` or ``None`` (which is the default), the return type will be a
-    ``pandas.series``. When set to ``True``, the return type a ``tuple[pandas.Series, tuple[pandas.Series]]``.
+    When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` controls the
+    output format. When set to ``False`` or ``None`` (which is the default), the return type is a
+    ``pandas.series``. When set to ``True``, the return type is a ``tuple[pandas.Series, tuple[pandas.Series]]``.

Can we also explain what each data type represents?

Is this prediction signature compatible with future pmdarima API wrappers / extensions that we may want to introduce?

For the other wrappers that we'll be building, the return types for (a) predictions without confidence intervals (pd.Series) and (b) predictions with confidence intervals (tuple[pd.Series, tuple[pd.Series]]) would be unified as a pd.DataFrame output.
Do you think that we should make the "b" scenario a pd.DataFrame return type and handle the transformation for users here?

Given that the pyfunc model inference format doesn't support tuple outputs (https://github.com/mlflow/mlflow/pull/5370/files#r808409658), I think it makes sense to use a dataframe for (b). In fact, can we use a dataframe for both (a) and (b) and add additional column(s) to the dataframe for (b)?

100%. Will implement this
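
A minimal sketch of the unified output agreed on above, assuming the native forecast is a sequence of point predictions and, when intervals are requested, an (n_periods, 2) array of lower and upper bounds. The helper name `forecast_to_frame` and the column names `yhat`, `yhat_lower` and `yhat_upper` are illustrative assumptions, not necessarily what the merged flavor uses.

```python
import numpy as np
import pandas as pd


def forecast_to_frame(predictions, conf_int=None):
    """Return a DataFrame for both the plain and the confidence-interval case."""
    frame = pd.DataFrame({"yhat": np.asarray(predictions, dtype=float)})
    if conf_int is not None:
        # Each row of `conf_int` is assumed to be [lower, upper] for one period.
        bounds = np.asarray(conf_int, dtype=float)
        frame["yhat_lower"] = bounds[:, 0]
        frame["yhat_upper"] = bounds[:, 1]
    return frame


# Stand-in values; a real model forecast would supply these.
point = [10.0, 11.0, 12.0]
intervals = [[9.0, 11.0], [9.5, 12.5], [10.0, 14.0]]

no_ci = forecast_to_frame(point)               # case (a): one column
with_ci = forecast_to_frame(point, intervals)  # case (b): three columns
```

Case (b) only adds columns to the frame from case (a), so callers can always read the point forecast from the same column.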

dbczumar · 2022-02-16T06:05:13Z

dev/run-python-flavor-tests.sh

@@ -17,6 +17,7 @@ pytest tests/h2o --large
 pytest tests/shap --large
 pytest tests/paddle --large
 pytest tests/prophet --large
+pytest tests/pmdarima --large


Can we add these to cross version tests instead, since pmdarima, unlike Prophet, is maintained & somewhat frequently updated?

definitely! added.

@BenWilson2 Thanks! Can we remove this line from dev/run-python-flavor-tests.sh now that we've added pmdarima to cross version tests?

great catch. Since the prophet flavor is part of x-version testing in master now I'll remove that here as well.

mlflow/pmdarima.py

dbczumar · 2022-02-16T06:10:32Z

docs/source/models.rst

+.. note::
+    When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` will control the
+    output format. When set to ``False`` or ``None`` (which is the default), the return type will be a


Can we provide an example input that populates all of the supported arguments? I know we don't do a good job of this for other flavors, but this is a great opportunity to start!

Added an example showing the primary config for the input df signature to the pyfunc .predict() method. I'm leaving the X parameter out since none of the common open source datasets support exogenous regressors, pmdarima is deprecating that functionality in a near-future release, and it's just really confusing to most users of these libraries to use exog regressor elements.
I'm providing an example output conversion for the confidence interval output to convert it to a DataFrame to illustrate what to do with the model's output as well.

dbczumar · 2022-02-16T06:11:45Z

mlflow/pmdarima.py

+    :param pmdarima_model:
+    :param path:
+    :param conda_env:
+    :param mlflow_model:
+    :param signature:
+    :param input_example:
+    :param pip_requirements:
+    :param extra_pip_requirements:


Can we complete these parameter docstrings and add a description of the overall method?

wow. yikes. totally missed those. On it!

dbczumar · 2022-02-16T06:12:35Z

mlflow/pmdarima.py

+    :param pmdarima_model:
+    :param artifact_path:
+    :param conda_env:
+    :param registered_model_name:
+    :param signature:
+    :param input_example:
+    :param await_registration_for:
+    :param pip_requirements:
+    :param extra_pip_requirements:
+    :param kwargs:
+
+    :return:


Can we complete these parameter docstrings and add a description of the overall method?

dbczumar · 2022-02-16T06:12:43Z

mlflow/pmdarima.py

+    """
+    :param model_uri:
+    :param dst_path:
+    """


Can we complete these parameter docstrings and add a description of the overall method?

dbczumar · 2022-02-16T06:19:02Z

examples/pmdarima/conda.yaml

+  - pip
+  - pip:
+    - pmdarima
+    - mlflow>=1.23.0


MLflow 1.23.0 doesn't have this feature. I think we need mlflow>1.23.1.

dbczumar · 2022-02-16T06:20:00Z

examples/pmdarima/train.py

@@ -0,0 +1,57 @@
+import mlflow


This takes quite a while to run (several minutes). Can we make the workload smaller and add some print statement progress indicators prior to the training portion (e.g. "constructing model...")?

reduced the runtime to ~ 5 seconds and added additional status messages.


BenWilson2 · 2022-02-17T15:43:49Z

Docs rendering verified for table visualization

harupy · 2022-02-21T02:43:38Z

mlflow/ml-package-versions.yml

+  models:
+    minimum: "1.8.4"
+    maximum: "1.8.4"
+    requirements: ["prophet"]


Curious why we need prophet.

It's just so that we can use the data generators in that test module without having to copy the generator into this module.


harupy · 2022-02-22T02:37:41Z

tests/pmdarima/test_pmdarima_model_export.py

+    # NB: dev version of pmdarima uses a __version__ tag of "0.0.0" which will fail the env build.
+    # To be as safe as possible in cross version tests, use the latest released version when
+    # testing dev pmdarima.
+
+    def get_latest_version():
+        ver = pmdarima.__version__
+        if ver == "0.0.0":
+            url = "https://pypi.org/pypi/pmdarima/json"
+            resp = requests.get(url).json()
+            releases = resp["releases"].keys()
+            return sorted(releases, key=parse_version, reverse=True)[0]
+        else:
+            return ver


Can we use --no-conda flag instead?

Example:

mlflow/tests/xgboost/test_xgboost_model_export.py

Line 35 in b81a1d3

EXTRA_PYFUNC_SERVING_TEST_ARGS = [] if _is_available_on_pypi("xgboost") else ["--no-conda"]

@BenWilson2 Can we address this comment?

most definitely. Thank you for the tip @harupy !

dbczumar · 2022-02-23T06:57:23Z

docs/source/models.rst

+model flavor in native pmdarima formats.
+
+.. note::
+    When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` controls the


Nit: return_conf_int isn't a method argument, right? It's an optional column in the input pandas dataframe.

very good point. Changed this note to make it far more clear what is going on with this configuration (and that by omitting the column in the configuration argument DataFrame, the default will be False for calculating CI values).

dbczumar · 2022-02-23T06:58:29Z

docs/source/models.rst

+format via the :py:func:`mlflow.pmdarima.save_model()` and :py:func:`mlflow.pmdarima.log_model()` methods.
+These methods also add the ``python_function`` flavor to the MLflow Models that they produce, allowing the
+model to be interpreted as generic Python functions for inference via :py:func:`mlflow.pyfunc.load_model()`.
+This loaded PyFunc model can only be scored with a DataFrame input.


Can we provide an overview of the accepted DataFrame input columns and length (i.e. single-row DataFrames)?

Added a table, a listing of what the values mean, and warnings for ensuring that it's a single row DF and that n_periods is not optional.
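
As a hedged illustration of the input described in this thread, a single-row configuration DataFrame might look like the following. The column names `n_periods`, `return_conf_int` and `alpha` come from the review discussion; the values are arbitrary, `X` is omitted, and `loaded_pyfunc` is a placeholder name rather than anything defined in the PR.

```python
import pandas as pd

# One row only: each column is a forecast setting, not an observation.
predict_conf = pd.DataFrame(
    [{"n_periods": 30, "return_conf_int": True, "alpha": 0.1}]
)

# With a model loaded through mlflow.pyfunc.load_model(), this frame would be
# passed as the sole argument:
#   forecast = loaded_pyfunc.predict(predict_conf)
```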

dbczumar · 2022-02-23T07:04:33Z

mlflow/ml-package-versions.yml

+      pip install $CACHE_DIR/pmdarima-*.whl
+
+  models:
+    minimum: "1.8.4"


We seem to be compatible with pmdarima < 1.8.4 based on your warning logic for the case where exogenous regressors are specified with pmdarima < 1.8.0. Can we drop the minimum version down to 1.8.0, providing support for all versions of pmdarima released within the last year (+ December 2020)?

changed and upped the cross version test to 1.8.5 (released 2 days ago)

dbczumar · 2022-02-23T07:06:29Z

mlflow/pmdarima.py

+
+
+@format_docstring(LOG_MODEL_PARAM_DOCS.format(package_name=FLAVOR_NAME))
+def save_model(


Can we add the @experimental decorator to the public methods in this module?
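
For readers unfamiliar with the pattern, here is a stand-in sketch of what an "experimental" marker decorator can do. This is not MLflow's implementation; the notice text and the simplified `save_model` signature are assumptions made for illustration.

```python
def experimental(func):
    """Stand-in decorator: prepend an experimental notice to the docstring."""
    notice = (
        ".. note:: Experimental: this function may change or be removed "
        "in a future release without warning.\n\n"
    )
    func.__doc__ = notice + (func.__doc__ or "")
    return func


@experimental
def save_model(pmdarima_model, path):
    """Save a model to ``path`` (body omitted in this sketch)."""
```

The function object itself is returned unchanged, so only the rendered documentation differs.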

dbczumar · 2022-02-23T07:06:50Z

docs/source/models.rst

@@ -784,6 +784,60 @@ method to load MLflow Models with the ``prophet`` model flavor in native prophet

 For more information, see :py:mod:`mlflow.prophet`.

+Pmdarima (``pmdarima``)


Can we indicate that this is experimental?

Suggested change

-Pmdarima (``pmdarima``)
+Pmdarima (``pmdarima``) (Experimental)

dbczumar

@BenWilson2 Looks fantastic! Should be ready to go after all remaining comments are addressed!

harupy · 2022-02-24T03:21:12Z

tests/pmdarima/test_pmdarima_model_export.py

+    assert len(forecast_with_ci == 10)
+    assert len(forecast_with_ci.columns.values == 3)


Suggested change

-    assert len(forecast_with_ci == 10)
-    assert len(forecast_with_ci.columns.values == 3)
+    assert len(forecast_with_ci) == 10
+    assert len(forecast_with_ci.columns.values) == 3

yikes that's quite a typo. Fixed.
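
A short sketch of why the original assertions could never fail for a non-empty result, using a three-row stand-in frame (the column name `yhat` is an assumption):

```python
import pandas as pd

forecast = pd.DataFrame({"yhat": [1.0, 2.0, 3.0]})  # 3 rows, not 10

# Misplaced parenthesis: `forecast == 10` is an elementwise comparison that
# yields another 3-row DataFrame, so len(...) is 3, which is truthy.
misplaced = len(forecast == 10)

# Intended check: compare the row count itself with the expected value.
intended = len(forecast) == 10
```

Here `assert misplaced` passes even though the frame has the wrong number of rows, while `assert intended` fails as it should.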

harupy · 2022-02-24T03:21:25Z

tests/pmdarima/test_pmdarima_model_export.py

+    assert len(forecast_no_ci == 10)
+    assert len(forecast_no_ci.columns.values == 1)


Suggested change

-    assert len(forecast_no_ci == 10)
-    assert len(forecast_no_ci.columns.values == 1)
+    assert len(forecast_no_ci) == 10
+    assert len(forecast_no_ci.columns.values) == 1

harupy · 2022-02-24T10:22:41Z

examples/pmdarima/conda.yaml

+  - pip
+  - pip:
+    - pmdarima
+    - mlflow>=1.23.1


nit: can we insert a newline?

harupy · 2022-02-24T10:23:08Z

mlflow/ml-package-versions.yml

+    maximum: "1.8.4"
+    requirements: ["prophet"]
+    run: |
+      pytest tests/pmdarima/test_pmdarima_model_export.py --large


nit: can we insert a newline?

harupy · 2022-02-24T10:29:12Z

mlflow/pmdarima.py

+            raise MlflowException(
+                f"The provided prediction pd.DataFrame {dataframe.to_string()} "
+                f"contains {len(dataframe)} rows. Only 1 row should be supplied."
+            )


This line prints out an error message that looks like this:

The provided prediction pd.DataFrame a b 0 1 3 1 2 4 contains 2 rows. Only 1 row should be supplied.

Can we remove dataframe.to_string() and say the provided prediction pd.DataFrame contains ...?

removed and cleaned up the wording

harupy · 2022-02-24T10:46:37Z

mlflow/pmdarima.py

+        # `X` entries as a 2D array structure to the predict method.
+        exogoneous_regressor = attrs.get("X", None)
+
+        if exogoneous_regressor and self._pmdarima_version < "1.8.0":


Suggested change

-        if exogoneous_regressor and self._pmdarima_version < "1.8.0":
+        if exogoneous_regressor and Version(self._pmdarima_version) < Version("1.8.0"):

Can we use packaging.version.Version here to compare versions correctly?

>>> from packaging.version import Version
>>> "1.8.0.rc1" > "1.8.0"
True
>>> Version("1.8.0.rc1") > Version("1.8.0")
False

great point. Updated.
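
The release-candidate case above is one failure mode. A second one, sketched here with a hypothetical version string, is a multi-digit component, where plain string comparison gets the ordering backwards:

```python
from packaging.version import Version

installed = "1.10.0"  # hypothetical version string, chosen to expose the bug

# Lexicographic comparison: "1" sorts before "8" at the third character,
# so the string check claims 1.10.0 is older than 1.8.0.
string_says_older = installed < "1.8.0"

# Semantic comparison: release (1, 10, 0) is newer than (1, 8, 0).
version_says_older = Version(installed) < Version("1.8.0")
```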

harupy · 2022-02-24T11:10:38Z

mlflow/pmdarima.py

+        if not set(df_schema).issubset(schema):
+            raise MlflowException(
+                f"The provided schema {df_schema} contains invalid columns. "
+                f"Columns must be part of: {schema}"
+            )


The current implementation throws an exception if the given dataframe looks like this:

pd.DataFrame(
    {
        # valid columns
        "n_periods": [0],
        "X": [0],
        "return_conf_int": [0],
        "alpha": [0],
        # invalid columns
        "foo": [0],
    }
)

Does the foo column cause any problems in the subsequent process? It appears the current implementation ignores it.

Changed this implementation and simplified the validation check (only n_periods is required so we only throw if we can't get a valid value there). Any extra elements in the passed in DF will be ignored.
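
A rough sketch of the simplified validation described in the reply, under stated assumptions: `ValueError` stands in for `MlflowException`, the helper name `parse_predict_conf` is hypothetical, and the `alpha` default of 0.05 is assumed rather than taken from the PR.

```python
import pandas as pd


def parse_predict_conf(dataframe):
    """Validate a single-row config frame; only `n_periods` is required."""
    if len(dataframe) != 1:
        raise ValueError(
            f"The provided prediction pd.DataFrame contains {len(dataframe)} rows. "
            "Only 1 row should be supplied."
        )
    row = dataframe.to_dict(orient="records")[0]
    if "n_periods" not in row:
        raise ValueError("The prediction configuration must include `n_periods`.")
    # Unknown columns (for example "foo") are simply never read.
    return {
        "n_periods": int(row["n_periods"]),
        "return_conf_int": bool(row.get("return_conf_int", False)),
        "alpha": float(row.get("alpha", 0.05)),  # assumed default
    }


conf = parse_predict_conf(pd.DataFrame({"n_periods": [5], "foo": [0]}))
```

Reading only the known keys keeps the check permissive about extra columns while still failing fast on the two conditions that make a forecast impossible.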

harupy · 2022-02-24T11:36:33Z

mlflow/pmdarima.py

+                      .. code-block:: py
+
+                      from mlflow.models.signature import infer_signature
+
+                      model = pmdarima.auto_arima(data)
+                      predictions = model.predict(n_periods=30, return_conf_int=False)
+                      signature = infer_signature(data, predictions)


Suggested change

-                      .. code-block:: py
-
-                      from mlflow.models.signature import infer_signature
-
-                      model = pmdarima.auto_arima(data)
-                      predictions = model.predict(n_periods=30, return_conf_int=False)
-                      signature = infer_signature(data, predictions)
+                      .. code-block:: py
+
+                        from mlflow.models.signature import infer_signature
+
+                        model = pmdarima.auto_arima(data)
+                        predictions = model.predict(n_periods=30, return_conf_int=False)
+                        signature = infer_signature(data, predictions)

Can we indent this block?

harupy

Left some comments, otherwise LGTM!


docs/source/models.rst

dbczumar

LGTM once note order is swapped!


BenWilson2 added 5 commits February 7, 2022 09:48

wip

f6aff8f

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

WIP

6f724c7

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

WIP

2caf9a6

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

lint fix

7e44409

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

pmdarima flavor implementation

84c2f65

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

github-actions bot added labels Feb 11, 2022: area/docs (Documentation issues), area/examples (Example code), area/models (MLmodel format, model serialization/deserialization, flavors), rn/feature (Mention under Features in Changelogs)

BenWilson2 added 3 commits February 11, 2022 19:06

exclude pmdarima from small test suite

3b68b02

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

convert serving pd series type to numpy for equivalency validation

5931b3f

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

adjust the return format for serve and score

b5e36a3

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

BenWilson2 requested review from harupy, WeichenXu123 and dbczumar February 12, 2022 19:15

dbczumar reviewed Feb 16, 2022


BenWilson2 added 6 commits February 16, 2022 15:58

PR feedback fixes and additional validations

c7f86d2

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

Change pyfunc predict return type to DataFrame and update docs

e540467

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

Fix doc table span

b9d68d2

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

fix cross version tests

bc8f127

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

fix dupe import

9f7ab87

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

fix import statements

109f5f9

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

BenWilson2 requested a review from dbczumar February 17, 2022 15:43

harupy reviewed Feb 21, 2022


harupy added the enable-dev-tests Enables cross-version tests for dev versions label Feb 21, 2022

BenWilson2 added 4 commits February 21, 2022 11:19

build whl file for cross version tests dev version

bd86359

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

fix typo in ml-package-versions.yml

82622aa

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

fix serving container version issue for pmdarima in cross version tests for dev branch

ebdac72

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

remove commented out line

ab8da42

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

harupy reviewed Feb 22, 2022


dbczumar reviewed Feb 23, 2022


harupy reviewed Feb 24, 2022


harupy approved these changes Feb 24, 2022


BenWilson2 added 3 commits February 24, 2022 13:33

PR feedback

2a45373

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

Merge branch 'master' of https://github.com/mlflow/mlflow into pmdarima-flavor

479e868

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

fix annoying variable name typo

6ff6ad0

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

BenWilson2 commented Feb 25, 2022

docs/source/models.rst

dbczumar approved these changes Feb 25, 2022


relocate note in docs to a more natural position for readability and flow

43a2e20

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

BenWilson2 merged commit 3973283 into master Feb 25, 2022


