-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pmdarima flavor #5373
Pmdarima flavor #5373
Conversation
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenWilson2 Looks great! Just left a few comments, should be good to go after!
docs/source/models.rst
Outdated
When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` will control the | ||
output format. When set to ``False`` or ``None`` (which is the default), the return type will be a | ||
``pandas.series``. When set to ``True``, the return type a ``tuple[pandas.Series, tuple[pandas.Series]]``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` will control the | |
output format. When set to ``False`` or ``None`` (which is the default), the return type will be a | |
``pandas.series``. When set to ``True``, the return type a ``tuple[pandas.Series, tuple[pandas.Series]]``. | |
When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` controls the | |
output format. When set to ``False`` or ``None`` (which is the default), the return type is a | |
``pandas.series``. When set to ``True``, the return type is a ``tuple[pandas.Series, tuple[pandas.Series]]``. |
Can we also explain what each data type represents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this prediction signature compatible with future pmdarima API wrappers . extensions that we may want to introduce?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other wrappers that we'll be building the return type for either a) without confidence intervals (pd.Series) and b) with confidence intervals (tuple[pd.Series, tuple[pd.Series]]) would be unified as a pd.DataFrame output.
Do you think that we should make make the "b" scenario a pd.DataFrame return type and handle the transformation for users here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the pyfunc model inference format doesn't support tuple outputs (https://github.com/mlflow/mlflow/pull/5370/files#r808409658), I think it makes sense to use a dataframe for (b). In fact, can we use a dataframe for both (a) and (b) and add additional column(s) to the dataframe for (b)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%. Will implement this
dev/run-python-flavor-tests.sh
Outdated
@@ -17,6 +17,7 @@ pytest tests/h2o --large | |||
pytest tests/shap --large | |||
pytest tests/paddle --large | |||
pytest tests/prophet --large | |||
pytest tests/pmdarima --large |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add these to cross version tests instead, since pmdarima, unlike Prophet, is maintained & somewhat frequently updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely! added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenWilson2 Thanks! Can we remove this line from dev/run-python-flavor-tests.sh
now that we've added pmdarima to cross version tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch. Since the prophet flavor is part of x-version testing in master now I'll remove that here as well.
docs/source/models.rst
Outdated
.. note:: | ||
When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` will control the | ||
output format. When set to ``False`` or ``None`` (which is the default), the return type will be a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we provide an example input that populates all of the supported arguments? I know we don't do a good job of this for other flavors, but this is a great opportunity to start!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example showing the primary config for the input df signature to the pyfunc .predict()
method. I'm leaving the X
parameter out since none of the common open source datasets support exogeneous regressors, pmdarima is deprecating that functionality in a near-future release, and it's just really confusing to most users of these libraries to use exog regressor elements.
I'm providing an example output conversion for the confidence interval output to convert it to a DataFrame to illustrate what to do with the model's output as well.
mlflow/pmdarima.py
Outdated
:param pmdarima_model: | ||
:param path: | ||
:param conda_env: | ||
:param mlflow_model: | ||
:param signature: | ||
:param input_example: | ||
:param pip_requirements: | ||
:param extra_pip_requirements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we complete these parameter docstrings and add a description of the overall method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow. yikes. totally missed those. On it!
mlflow/pmdarima.py
Outdated
:param pmdarima_model: | ||
:param artifact_path: | ||
:param conda_env: | ||
:param registered_model_name: | ||
:param signature: | ||
:param input_example: | ||
:param await_registration_for: | ||
:param pip_requirements: | ||
:param extra_pip_requirements: | ||
:param kwargs: | ||
|
||
:return: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we complete these parameter docstrings and add a description of the overall method?
mlflow/pmdarima.py
Outdated
""" | ||
:param model_uri: | ||
:param dst_path: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we complete these parameter docstrings and add a description of the overall method?
examples/pmdarima/conda.yaml
Outdated
- pip | ||
- pip: | ||
- pmdarima | ||
- mlflow>=1.23.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MLflow 1.23.0
doesn't have this feature. I think we need mflow>1.23.1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
@@ -0,0 +1,57 @@ | |||
import mlflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This takes quite awhile to run (several minutes). Can we make the workload smaller and add some print statement progress indicators prior to the training portion (e.g. "constructing model...")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reduced the runtime to ~ 5 seconds and added additional status messages.
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Docs rendering verified for table visualization |
models: | ||
minimum: "1.8.4" | ||
maximum: "1.8.4" | ||
requirements: ["prophet"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why we need prophet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just so that we can use the data generators in that test module without having to copy the generator into this module.
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…ts for dev branch Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
# NB: dev version of pmdarima uses a __version__ tag of "0.0.0" which will fail the env build. | ||
# To be as safe as possible in cross version tests, use the latest released version when | ||
# testing dev pmdarima. | ||
|
||
def get_latest_version(): | ||
ver = pmdarima.__version__ | ||
if ver == "0.0.0": | ||
url = "https://pypi.org/pypi/pmdarima/json" | ||
resp = requests.get(url).json() | ||
releases = resp["releases"].keys() | ||
return sorted(releases, key=parse_version, reverse=True)[0] | ||
else: | ||
return ver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use --no-conda
flag instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example:
EXTRA_PYFUNC_SERVING_TEST_ARGS = [] if _is_available_on_pypi("xgboost") else ["--no-conda"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenWilson2 Can we address this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most definitely. Thank you for the tip @harupy !
docs/source/models.rst
Outdated
model flavor in native pmdarima formats. | ||
|
||
.. note:: | ||
When predicting a ``pmdarima`` flavor, the ``predict`` method argument ``return_conf_int`` controls the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: return_conf_int
isn't a method argument, right? It's an optional column in the input pandas dataframe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very good point. Changed this note to make it far more clear what is going on with this configuration (and that by omitting the column in the configuration argument DataFrame, the default will be False for calculating CI values.
format via the :py:func:`mlflow.pmdarima.save_model()` and :py:func:`mlflow.pmdarima.log_model()` methods. | ||
These methods also add the ``python_function`` flavor to the MLflow Models that they produce, allowing the | ||
model to be interpreted as generic Python functions for inference via :py:func:`mlflow.pyfunc.load_model()`. | ||
This loaded PyFunc model can only be scored with a DataFrame input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we provide an overview of the accepted DataFrame input columns and length (i.e. single-row DataFrames)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a table, a listing of what the values mean, and warnings for ensuring that it's a single row DF and that n_periods is not optional.
mlflow/ml-package-versions.yml
Outdated
pip install $CACHE_DIR/pmdarima-*.whl | ||
|
||
models: | ||
minimum: "1.8.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be compatible with pmdarima < 1.8.4 based on your warning logic for the case where exogenous regressors are specified with pmdarima < 1.8.0
. Can we drop the minimum version down to 1.8.0, providing support for all versions of pmdarima released within the last year (+ December 2020)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed and upped the cross version test to 1.8.5 (released 2 days ago)
|
||
|
||
@format_docstring(LOG_MODEL_PARAM_DOCS.format(package_name=FLAVOR_NAME)) | ||
def save_model( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the @experimental
decorator to the public methods in this module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
docs/source/models.rst
Outdated
@@ -784,6 +784,60 @@ method to load MLflow Models with the ``prophet`` model flavor in native prophet | |||
|
|||
For more information, see :py:mod:`mlflow.prophet`. | |||
|
|||
Pmdarima (``pmdarima``) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we indicate that this is experimental?
Pmdarima (``pmdarima``) | |
Pmdarima (``pmdarima``) (Experimental) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenWilson2 Looks fantastic! Should be ready to go after all remaining comments are addressed!
assert len(forecast_with_ci == 10) | ||
assert len(forecast_with_ci.columns.values == 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert len(forecast_with_ci == 10) | |
assert len(forecast_with_ci.columns.values == 3) | |
assert len(forecast_with_ci) == 10 | |
assert len(forecast_with_ci.columns.values) == 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yikes that's quite a typo. Fixed.
assert len(forecast_no_ci == 10) | ||
assert len(forecast_no_ci.columns.values == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert len(forecast_no_ci == 10) | |
assert len(forecast_no_ci.columns.values == 1) | |
assert len(forecast_no_ci) == 10 | |
assert len(forecast_no_ci.columns.values) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed typo
examples/pmdarima/conda.yaml
Outdated
- pip | ||
- pip: | ||
- pmdarima | ||
- mlflow>=1.23.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we insert a newline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!
mlflow/ml-package-versions.yml
Outdated
maximum: "1.8.4" | ||
requirements: ["prophet"] | ||
run: | | ||
pytest tests/pmdarima/test_pmdarima_model_export.py --large |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we insert a newline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
mlflow/pmdarima.py
Outdated
raise MlflowException( | ||
f"The provided prediction pd.DataFrame {dataframe.to_string()} " | ||
f"contains {len(dataframe)} rows. Only 1 row should be supplied." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line prints out an error message that looks like this:
The provided prediction pd.DataFrame a b
0 1 3
1 2 4 contains 2 rows. Only 1 row should be supplied.
Can we remove dataframe.to_string()
and say the provided prediction pd.DataFrame contains ...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed and cleaned up the wording
mlflow/pmdarima.py
Outdated
# `X` entries as a 2D array structure to the predict method. | ||
exogoneous_regressor = attrs.get("X", None) | ||
|
||
if exogoneous_regressor and self._pmdarima_version < "1.8.0": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if exogoneous_regressor and self._pmdarima_version < "1.8.0": | |
if exogoneous_regressor and Version(self._pmdarima_version) < Version("1.8.0"): |
Can we use packaging.version.Version
here to compare versions correctly?
>>> from packaging.version import Version
>>> "1.8.0.rc1" > "1.8.0"
True
>>> Version("1.8.0.rc1") > Version("1.8.0")
False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point. Updated.
mlflow/pmdarima.py
Outdated
if not set(df_schema).issubset(schema): | ||
raise MlflowException( | ||
f"The provided schema {df_schema} contains invalid columns. " | ||
f"Columns must be part of: {schema}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation throws an exception if the given dataframe looks like this:
pd.DataFrame(
{
# valid columns
"n_periods": [0],
"X": [0],
"return_conf_int": [0],
"alpha": [0],
# invalid columns
"foo": [0],
}
)
Does the foo
column cause any problems in the subsequent process? It appears the current implementation ignores it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this implementation and simplified the validation check (only n_periods is required so we only throw if we can't get a valid value there). Any extra elements in the passed in DF will be ignored.
mlflow/pmdarima.py
Outdated
.. code-block:: py | ||
|
||
from mlflow.models.signature import infer_signature | ||
|
||
model = pmdarima.auto_arima(data) | ||
predictions = model.predict(n_periods=30, return_conf_int=False) | ||
signature = infer_signature(data, predictions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. code-block:: py | |
from mlflow.models.signature import infer_signature | |
model = pmdarima.auto_arima(data) | |
predictions = model.predict(n_periods=30, return_conf_int=False) | |
signature = infer_signature(data, predictions) | |
.. code-block:: py | |
from mlflow.models.signature import infer_signature | |
model = pmdarima.auto_arima(data) | |
predictions = model.predict(n_periods=30, return_conf_int=False) | |
signature = infer_signature(data, predictions) |
Can we indent this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, otherwise LGTM!
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…ma-flavor Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once note order is swapped!
…flow Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
What changes are proposed in this pull request?
Add pmdarima flavor
How is this patch tested?
unit tests
Does this PR change the documentation?
ci/circleci: build_doc
check. If it's successful, proceed to thenext step, otherwise fix it.
Details
on the right to open the job page of CircleCI.Artifacts
tab.docs/build/html/index.html
.Release Notes
Is this a user-facing change?
Adds the native pmdarima flavor to MLflow.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes