
Commit

Merge branch 'master' into branch-2.3
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
BenWilson2 committed Apr 26, 2023
2 parents 6e05102 + cef03da commit b7d8406
Showing 73 changed files with 3,118 additions and 1,005 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/r.yml
@@ -36,6 +36,10 @@ jobs:
      - uses: ./.github/actions/setup-pyenv
      - uses: ./.github/actions/setup-java
      - uses: r-lib/actions/setup-r@v2
        # Note: the version of R released on 4/21/23, 4.3.0, has build issues with devtools.
        # Remove this version pin once issues on dependent packages have been fixed.
        with:
          r-version: "4.2.3"
      # This step dumps the current set of R dependencies and R version into files to be used
      # as a cache key when caching/restoring R dependencies.
      - name: Dump dependencies
44 changes: 44 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,49 @@
# CHANGELOG

## 2.3.0 (2023-04-18)

MLflow 2.3.0 includes several major features and improvements:

Features:

- [Models] Introduce a new `transformers` named flavor (#8236, #8181, #8086, @BenWilson2)
- [Models] Introduce a new `openai` named flavor (#8191, #8155, @harupy)
- [Models] Introduce a new `langchain` named flavor (#8251, #8197, @liangz1, @sunishsheth2009)
- [Models] Add support for `Pytorch` and `Lightning` 2.0 (#8072, @shrinath-suresh)
- [Tracking] Add support for logging LLM input, output, and prompt artifacts (#8234, #8204, @sunishsheth2009)
- [Tracking] Add support for HTTP Basic Auth in the MLflow tracking server (#8130, @gabrielfu)
- [Tracking] Add `search_model_versions` to the fluent API (#8223, @mariusschlegel)
- [Artifacts] Add support for parallelized artifact downloads (#8116, @apurva-koti)
- [Artifacts] Add support for parallelized artifact uploads for AWS (#8003, @harupy)
- [Artifacts] Add content type headers to artifact upload requests for the `HttpArtifactRepository` (#8048, @WillEngler)
- [Model Registry] Add alias support for logged models within Model Registry (#8164, #8094, #8055, @arpitjasa-db)
- [UI] Add support for custom domain git providers (#7933, @gusghrlrl101)
- [Scoring] Add plugin support for customization of MLflow serving endpoints (#7757, @jmahlik)
- [Scoring] Add support to MLflow serving that allows configuration of multiple inference workers (#8035, @M4nouel)
- [Sagemaker] Add support for asynchronous inference configuration on Sagemaker (#8009, @thomasbell1985)
- [Build] Remove `shap` as a core dependency of MLflow (#8199, @jmahlik)

Bug fixes:

- [Models] Fix a bug with `tensorflow` autologging for models with multiple inputs (#8097, @jaume-ferrarons)
- [Recipes] Fix a bug with `Pandas` 2.0 updates for profiler rendering of datetime types (#7925, @sunishsheth2009)
- [Tracking] Prevent exceptions from being raised if a parameter is logged with an existing key whose value is identical to the logged parameter (#8038, @AdamStelmaszczyk)
- [Tracking] Fix an issue with deleting experiments in the FileStore backend (#8178, @mariusschlegel)
- [Tracking] Fix a UI bug where the "Source Run" field in the Model Version page points to an incorrect set of artifacts (#8156, @WeichenXu123)
- [Tracking] Fix a bug wherein renaming a run reverts its current lifecycle status to `UNFINISHED` (#8154, @WeichenXu123)
- [Tracking] Fix a bug where a file URI could be used as a model version source (#8126, @harupy)
- [Projects] Fix an issue with MLflow projects that have submodules contained within a project (#8050, @kota-iizuka)
- [Examples] Fix `lightning` hyperparameter tuning examples (#8039, @BenWilson2)
- [Server-infra] Fix bug with Cache-Control headers for static server files (#8016, @jmahlik)

Documentation updates:

- [Examples] Add a new and thorough example for the creation of custom model flavors (#7867, @benjaminbluhm)

Small bug fixes and documentation updates:

#8262, #8252, #8250, #8228, #8221, #8203, #8134, #8040, #7994, #7934, @BenWilson2; #8258, #8255, #8253, #8248, #8247, #8245, #8243, #8246, #8244, #8242, #8240, #8229, #8198, #8192, #8112, #8165, #8158, #8152, #8148, #8144, #8143, #8120, #8107, #8105, #8102, #8088, #8089, #8096, #8075, #8073, #8076, #8063, #8064, #8033, #8024, #8023, #8021, #8015, #8005, #7982, #8002, #7987, #7981, #7968, #7931, #7930, #7929, #7917, #7918, #7916, #7914, #7913, @harupy; #7955, @arjundc-db; #8219, #8110, #8093, #8087, #8091, #8092, #8029, #8028, #8031, @jerrylian-db; #8187, @apurva-koti; #8210, #8001, #8000, @arpitjasa-db; #8161, #8127, #8095, #8090, #8068, #8043, #7940, #7924, #7923, @dbczumar; #8147, @morelen17; #8106, @WeichenXu123; #8117, @eltociear; #8100, @laerciop; #8080, @elado; #8070, @grofte; #8066, @yukimori; #8027, #7998, @liangz1; #7999, @martlaf; #7964, @viditjain99; #7928, @alekseyolg; #7909, #7901, #7844, @smurching; #7971, @n30111; #8012, @mingyu89; #8137, @lobrien; #7992, @robmarkcole; #8263, @sunishsheth2009

## 2.2.2 (2023-03-14)

MLflow 2.2.2 is a patch release containing the following bug fixes:
119 changes: 110 additions & 9 deletions docs/source/models.rst
@@ -646,6 +646,67 @@ Finally, you can use the :py:func:`mlflow.h2o.load_model()` method to load MLflo

For more information, see :py:mod:`mlflow.h2o`.

h2o pyfunc usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For a minimal h2o model, here is an example of using the pyfunc ``predict()`` method in a classification scenario:

.. code-block:: python

    import mlflow
    import h2o

    h2o.init()

    from h2o.estimators.glm import H2OGeneralizedLinearEstimator

    # import the prostate data
    df = h2o.import_file(
        "http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip"
    )

    # convert the columns to factors
    df["CAPSULE"] = df["CAPSULE"].asfactor()
    df["RACE"] = df["RACE"].asfactor()
    df["DCAPS"] = df["DCAPS"].asfactor()
    df["DPROS"] = df["DPROS"].asfactor()

    # split the data
    train, test, valid = df.split_frame(ratios=[0.7, 0.15])

    # generate a GLM model
    glm_classifier = H2OGeneralizedLinearEstimator(
        family="binomial", lambda_=0, alpha=0.5, nfolds=5, compute_p_values=True
    )

    with mlflow.start_run():
        glm_classifier.train(
            y="CAPSULE", x=["AGE", "RACE", "VOL", "GLEASON"], training_frame=train
        )
        metrics = glm_classifier.model_performance()
        metrics_to_track = ["MSE", "RMSE", "r2", "logloss"]
        metrics_to_log = {
            key: value
            for key, value in metrics._metric_json.items()
            if key in metrics_to_track
        }
        params = glm_classifier.params
        mlflow.log_params(params)
        mlflow.log_metrics(metrics_to_log)
        model_info = mlflow.h2o.log_model(glm_classifier, artifact_path="h2o_model_info")

    # load h2o model and make a prediction
    h2o_pyfunc = mlflow.pyfunc.load_model(model_uri=model_info.model_uri)
    test_df = test.as_data_frame()
    predictions = h2o_pyfunc.predict(test_df)
    print(predictions)

    # it is also possible to load the model and predict using h2o methods on the h2o frame
    # h2o_model = mlflow.h2o.load_model(model_info.model_uri)
    # predictions = h2o_model.predict(test)

.. _tf-keras-example:

Keras (``keras``)
^^^^^^^^^^^^^^^^^

@@ -952,16 +1013,47 @@ For more information, see :py:mod:`mlflow.spark`.
TensorFlow (``tensorflow``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``tensorflow`` model flavor allows TensorFlow Core models and Keras models
to be logged in MLflow format via the :py:func:`mlflow.tensorflow.save_model()` and
:py:func:`mlflow.tensorflow.log_model()` methods. These methods also add the ``python_function``
flavor to the MLflow Models that they produce, allowing the models to be interpreted as generic
Python functions for inference via :py:func:`mlflow.pyfunc.load_model()`. This loaded PyFunc model
can be scored with both DataFrame input and numpy array input. Finally, you can use the
:py:func:`mlflow.tensorflow.load_model()` method to load MLflow Models with the ``tensorflow``
flavor as TensorFlow Core models or Keras models.

The simple example below shows how to log params and metrics in MLflow for a custom training loop
using the low-level TensorFlow API. See `tf-keras-example`_ for an example of MLflow and ``tf.keras`` models.


.. code-block:: python

    import numpy as np
    import tensorflow as tf
    import mlflow

    x = np.linspace(-4, 4, num=512)
    y = 3 * x + 10

    # estimate w and b where y = w * x + b
    learning_rate = 0.1
    x_train = tf.Variable(x, trainable=False, dtype=tf.float32)
    y_train = tf.Variable(y, trainable=False, dtype=tf.float32)

    # initial values
    w = tf.Variable(1.0)
    b = tf.Variable(1.0)

    with mlflow.start_run():
        mlflow.log_param("learning_rate", learning_rate)

        for i in range(1000):
            with tf.GradientTape(persistent=True) as tape:
                # calculate MSE = 0.5 * (y_predict - y_train)^2
                y_predict = w * x_train + b
                loss = 0.5 * tf.reduce_mean(tf.square(y_predict - y_train))
            mlflow.log_metric("loss", value=loss.numpy(), step=i)

            # Update the trainable variables
            # w = w - learning_rate * gradient of loss function w.r.t. w
            # b = b - learning_rate * gradient of loss function w.r.t. b
            w.assign_sub(learning_rate * tape.gradient(loss, w))
            b.assign_sub(learning_rate * tape.gradient(loss, b))

    print(f"W = {w.numpy():.2f}, b = {b.numpy():.2f}")

For more information, see :py:mod:`mlflow.tensorflow`.

ONNX (``onnx``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2000,6 +2092,15 @@ and standardizes both the inputs and outputs of pipeline inference. This conform
and batch inference by coercing the data structures that are required for ``transformers`` inference pipelines
to formats that are compatible with json serialization and casting to Pandas DataFrames.

.. note::
    Certain ``TextGenerationPipeline`` types, particularly instructional-based ones, may return the original
    prompt and line-formatting newline characters (``"\n"``) in their outputs. For these pipeline types, if
    you would like to disable the prompt return, you can set ``"include_prompt": False`` in the
    ``inference_config`` dictionary when saving or logging the model. To remove the newline characters from
    the body of the generated text output, you can add the ``"collapse_whitespace": True`` option to the
    ``inference_config`` dictionary. If the pipeline type being saved does not inherit from
    ``TextGenerationPipeline``, these options will not perform any modification to the output returned from
    pipeline inference.
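
As a minimal sketch of supplying these options at logging time (the ``gpt2`` model, the prompt, and the
artifact path below are illustrative assumptions, not requirements of the API):

.. code-block:: python

    import transformers
    import mlflow

    # an illustrative text-generation pipeline; any TextGenerationPipeline-based model applies
    generator = transformers.pipeline(task="text-generation", model="gpt2")

    with mlflow.start_run():
        model_info = mlflow.transformers.log_model(
            transformers_model=generator,
            artifact_path="text_generator",
            # strip the echoed prompt and collapse newline runs in the generated text
            inference_config={"include_prompt": False, "collapse_whitespace": True},
        )

    # the logged model can then be loaded as a pyfunc and queried with plain strings
    loaded = mlflow.pyfunc.load_model(model_info.model_uri)
    print(loaded.predict(["Tell me a short story about a dog:"]))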

.. attention::
    Not all ``transformers`` pipeline types are supported. See the table below for the list of currently
    supported Pipeline types that can be loaded as ``pyfunc``.
2 changes: 2 additions & 0 deletions docs/source/python_api/index.rst
@@ -8,8 +8,10 @@ exposed in the :py:mod:`mlflow` module, so we recommend starting there.

.. toctree::
    :glob:
    :maxdepth: 1

    *
    openai/index.rst


See also the :ref:`index of all functions and classes<genindex>`.
File renamed without changes.
