[Docs] Add V1.3.1 release notes (#3670)
jillnogold committed May 31, 2023
1 parent 65dd220 commit 7bf2601
Showing 7 changed files with 26 additions and 25 deletions.
3 changes: 0 additions & 3 deletions docs/change-log/index.md
Original file line number Diff line number Diff line change
@@ -28,11 +28,9 @@
| ML-3819 | Reduce overly-verbose logs on the backend side. [View in Git](https://github.com/mlrun/mlrun/pull/3531). [View in Git](https://github.com/mlrun/mlrun/pull/3553). |
| ML-3823 | Optimized `/projects` endpoint to work faster. [View in Git](https://github.com/mlrun/mlrun/pull/3560). |


### Documentation
New sections describing [Git best practices](../projects/git-best-practices.html) and an example [Nuclio function](../concepts/nuclio-real-time-functions.html#example-of-nuclio-function).


## v1.3.0

### Client/server matrix, prerequisites, and installing
@@ -239,7 +237,6 @@ The `--ensure-project` flag of the `mlrun project` CLI command is deprecated and
| --- | ----------------------------------------------------------------- |
| ML-3797, ML-3798 | Fixed presenting and serving large-sized projects. [View in Git](https://github.com/mlrun/mlrun/pull/3477). |


## v1.2.1

### New and updated features
2 changes: 1 addition & 1 deletion docs/feature-store/feature-sets.md
@@ -124,7 +124,7 @@ df = fstore.ingest(stocks_set, stocks_df)

The graph steps can use built-in transformation classes, simple python classes, or function handlers.

-See more details in [Feature set transformations](transformations.html) and See more details in {ref}`transformations`.
+See more details in {ref}`Feature set transformations <transformations>`.

## Simulate and debug the data pipeline with a small dataset
During the development phase it's pretty common to check the feature set definition and to simulate the creation of the feature set before
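The simulate-and-debug step referred to above can be sketched with plain pandas: a graph step is ultimately a function handler applied to data, so running it on a tiny sample DataFrame checks the logic before any real ingestion. The handler and column names below are illustrative assumptions, not MLRun API:

```python
import pandas as pd

# Illustrative function handler for a graph step; "add_spread" and the
# bid/ask columns are made-up names for this sketch, not MLRun API.
def add_spread(df: pd.DataFrame) -> pd.DataFrame:
    """Derive a bid/ask spread feature from raw stock quotes."""
    df = df.copy()
    df["spread"] = df["ask"] - df["bid"]
    return df

# A small sample dataset is enough to simulate the step before ingesting real data
sample = pd.DataFrame(
    {"ticker": ["GOOG", "MSFT"], "bid": [720.50, 51.95], "ask": [720.90, 51.96]}
)
result = add_spread(sample)
```

In a real feature set this handler would be registered as a step on the feature set's graph; the point of the sketch is that the same function can be exercised in memory on a handful of rows first.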
31 changes: 16 additions & 15 deletions docs/feature-store/feature-store-overview.md
@@ -4,18 +4,18 @@
In machine-learning scenarios, generating a new feature, called feature engineering, takes a tremendous amount of work. The same features
must be used both for training, based on historical data, and for the model prediction based on the online or real-time data. This creates a
significant additional engineering effort, and leads to model inaccuracy when the online and offline features do not match. Furthermore,
-monitoring solutions must be built to track features and results and send alerts of data or model drift.
+monitoring solutions must be built to track features and results, and to send alerts upon data or model drift.

Consider a scenario in which you train a model and one of its features is a comparison of the current amount to the average amount spent
-during the last 3 months by the same person. Creating such a feature is easy when you have the full dataset in training, but in serving,
+during the last 3 months by the same person. Creating such a feature is easy when you have the full dataset in training, but for serving
this feature must be calculated in an online manner. The "brute-force" way to address this is to have an ML engineer create an online
-pipeline that reimplements all the feature calculations done in the offline process. This is not just time-consuming and error-prone, but
+pipeline that re-implements all the feature calculations that comprise the offline process. This is not just time-consuming and error-prone, but
very difficult to maintain over time, and results in a lengthy deployment time. This is exacerbated when having to deal with thousands of
-features with an increasing number of data engineers and data scientists that are creating and using the features.
+features, and an increasing number of data engineers and data scientists that are creating and using the features.

![Challenges managing features](../_static/images/challenges_managing_features.png)

-With MLRun's feature store you can easily define features during the training, that are deployable to serving, without having to define all the
+With MLRun's feature store you can easily define features during the training, which are deployable to serving, without having to define all the
"glue" code. You simply create the necessary building blocks to define features and integration, with offline and online storage systems to access the features.

![Feature store diagram](../_static/images/feature_store_diagram.png)
@@ -26,11 +26,11 @@ This can be raw data (e.g., transaction amount, image pixel, etc.) or a calculat
from average, pattern on image, etc.).
- **{ref}`feature-sets`** &mdash; A grouping of features that are ingested together and stored in a logical group. Feature sets take data from
offline or online sources, build a list of features through a set of transformations, and store the resulting features, along with the
-associated metadata and statistics. For example, a transaction may be grouped by the ID of a person performing the transfer or by the device
+associated metadata and statistics. For example, transactions could be grouped by the ID of a person performing the transfer or by the device
identifier used to perform the transaction. You can also define the timestamp source in the feature set, and ingest data into a
feature set.
- **[Execution](./feature-sets.html#add-transformations)** &mdash; A set of operations performed on the data while it is
-ingested. The graph contains steps that represent data sources and targets, and can also contain steps that transform and enrich the data that is passed through the feature set. For a deeper dive, see {ref}`transformations`.
+ingested. The transformation graph contains steps that represent data sources and targets, and can also include steps that transform and enrich the data that is passed through the feature set. For a deeper dive, see {ref}`transformations`.
- **{ref}`Feature vectors <create-use-feature-vectors>`** &mdash; A set of features, taken from one or more feature sets. The feature vector is defined prior to model
training and serves as the input to the model training process. During model serving, the feature values in the vector are obtained from an online service.
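The relationship between entities, feature sets, and feature vectors can be illustrated with plain pandas. This is a conceptual sketch only — the data and names are invented, and in MLRun the feature store performs these joins for you:

```python
import pandas as pd

# Two "feature sets", both keyed by the same entity (the account ID)
transactions = pd.DataFrame({"account": [1, 2], "avg_amount_3m": [120.0, 75.5]})
devices = pd.DataFrame({"account": [1, 2], "device_risk": [0.1, 0.8]})

# A "feature vector" selects features from one or more feature sets,
# joined on the shared entity key
feature_vector = transactions.merge(devices, on="account")
```

The entity (`account` here) is what lets features ingested by different pipelines be combined consistently for training and serving.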

@@ -40,9 +40,10 @@ training and serves as the input to the model training process. During model ser

The common flow when working with the feature store is to first define the feature set with its source, transformation graph, and targets.
MLRun's robust transformation engine performs complex operations with just a few lines of Python code. To test the
-execution process, call the `infer` method with a sample DataFrame. This runs all operations in memory without storing the results. Once the
-graph is defined, it's time to ingest the data.
+execution process, call the `infer` method with a sample DataFrame. This runs all operations in memory without storing the results.
+
+Once the
+graph is defined, it's time to ingest the data.
You can ingest data directly from a DataFrame, by calling the feature set {py:class}`~mlrun.feature_store.ingest` method. You can also define an ingestion
process that runs as a Kubernetes job. This is useful if there is a large ingestion process, or if there is a recurrent ingestion and you
want to schedule the job.
@@ -61,20 +62,20 @@ Next, extract a versioned **offline** static dataset for training, based on the
model with the feature vector data by providing the input in the form of `'store://feature-vectors/{project}/{feature_vector_name}'`.
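Composing that input reference is plain string formatting; the project and vector names below are hypothetical placeholders:

```python
# Build the store:// URI that a training job receives as its dataset input.
# "fraud-demo" and "transactions-vector" are made-up example names.
project = "fraud-demo"
feature_vector_name = "transactions-vector"
input_uri = f"store://feature-vectors/{project}/{feature_vector_name}"
```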

Training functions generate models and various model statistics. Use MLRun's auto logging capabilities to store the models along with all
-the relevant data, metadata and measurements.
+the relevant data, metadata, and measurements.

MLRun can apply all the MLOps functionality by using the framework specific `apply_mlrun()` method, which manages the training process and
-automatically logs all the framework specific model details, data, metadata and metrics.
+automatically logs all the framework specific model details, data, metadata, and metrics.

The training job automatically generates a set of results and versioned artifacts (run `train_run.outputs` to view the job outputs).

-For serving, once you validate the feature vector, use the **online** feature service, based on the
-nosql target defined in the feature set for real-time serving. For serving, you define a serving class derived from
+After you validate the feature vector, use the **online** feature service, based on the
+nosql target defined in the feature set, for real-time serving. For serving, you define a serving class derived from
`mlrun.serving.V2ModelServer`. In the class `load` method, call the {py:meth}`~mlrun.feature_store.get_online_feature_service` function with the vector name, which returns
a feature service object. In the class `preprocess` method, call the feature service `get` method to get the values of those features.
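The load/preprocess pattern described above can be sketched in a self-contained way. Here `V2ModelServer` and the feature service are minimal stand-in stubs so the shape of the class is visible without an MLRun cluster; in real code you derive from `mlrun.serving.V2ModelServer` and call `mlrun.feature_store.get_online_feature_service` with your vector name:

```python
class V2ModelServer:
    """Stub standing in for mlrun.serving.V2ModelServer."""

class OnlineFeatureService:
    """Stub standing in for the object returned by get_online_feature_service."""
    _features = {"GOOG": {"spread": 0.40}, "MSFT": {"spread": 0.01}}

    def get(self, entity_rows):
        # Look up the stored online feature values for each entity key
        return [self._features[row["ticker"]] for row in entity_rows]

class MyServingClass(V2ModelServer):
    def load(self):
        # Real code: self.feature_service = get_online_feature_service(<vector name>)
        self.feature_service = OnlineFeatureService()

    def preprocess(self, request: dict, operation):
        # Enrich the incoming entity keys with their online feature values
        request["inputs"] = self.feature_service.get(request["inputs"])
        return request

server = MyServingClass()
server.load()
enriched = server.preprocess({"inputs": [{"ticker": "GOOG"}]}, None)
```

The key design point is that the serving class only sends entity keys; the feature service resolves them to current feature values from the online (nosql) target.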

-Using this feature store centric process, using one computation graph definition for a feature set, you receive an automatic online and
-offline implementation for the feature vectors, with data versioning both in terms of the actual graph that was used to calculate each data
+This feature store centric process, using one computation graph definition for a feature set, gives you an automatic online and
+offline implementation for the feature vectors with data versioning, both in terms of the actual graph that was used to calculate each data
point, and the offline datasets that were created to train each model.

See more information in {ref}`training with the feature store <retrieve-offline-data>` and {ref}`training-serving`.
6 changes: 3 additions & 3 deletions docs/feature-store/feature-store.md
@@ -2,17 +2,17 @@
# Feature store

A feature store provides a single pane of glass for sharing all available features across
-the organization along with their metadata. MLRun Feature store support security, versioning,
+the organization along with their metadata. The MLRun feature store supports security, versioning,
and data snapshots, enabling better data lineage, compliance, and manageability.

As illustrated in the diagram below,
feature stores provide a mechanism (**`Feature Sets`**) to read data from various online or offline sources,
conduct a set of data transformations, and persist the data in online and offline
storage. Features are stored and cataloged along with all their metadata (schema,
labels, statistics, etc.), allowing users to compose **`Feature Vectors`** and use them for training
-or serving. The Feature Vectors are generated when needed, taking into account data versioning and time
+or serving. The feature vectors are generated when needed, taking into account data versioning and time
correctness (time traveling). Different function kinds (Nuclio, Spark, Dask) are used for feature retrieval, real-time
-engine for serving, and batch one for training.
+engines for serving, and batch for training.

<br><img src="../_static/images/feature-store-arch.png" alt="feature-store" width="800"/><br>

2 changes: 1 addition & 1 deletion docs/feature-store/feature-vectors.md
@@ -88,7 +88,7 @@ Defaults to return as a return value to the caller.
- **engine_args** &mdash; kwargs for the processing engine
- **query** &mdash; The query string used to filter rows
- **spark_service** &mdash; Name of the spark service to be used (when using a remote-spark runtime)
-- **join_type** &mdash; (optional) Indicates the join type: `{'left', 'right', 'outer', 'inner'}, default 'inner'`. The Spark retrieval engine only supports entities-based `inner` join (ie. no support for `relations`, no support for `outer`, `left`, `right` joins)
+- **join_type** &mdash; (optional) Indicates the join type: `{'left', 'right', 'outer', 'inner'}, default 'inner'`.
- left: use only keys from left frame (SQL: left outer join)
- right: use only keys from right frame (SQL: right outer join)
- outer: use union of keys from both frames (SQL: full outer join)
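The join kinds listed for `join_type` match standard pandas/SQL join semantics. A quick sketch with toy data (pandas is used here only to illustrate which entity keys each join keeps):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"id": [2, 3], "b": ["u", "v"]})

inner = left.merge(right, on="id", how="inner")      # keys present in both frames
left_join = left.merge(right, on="id", how="left")   # keys from the left frame only
outer = left.merge(right, on="id", how="outer")      # union of keys from both frames
```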
4 changes: 3 additions & 1 deletion docs/monitoring/initial-setup-configuration.ipynb
@@ -33,7 +33,9 @@
" \n",
" `fn.set_tracking(stream_path, batch, sample)`\n",
" \n",
-"- **stream_path** &mdash; the v3io stream path (e.g. `v3io:///users/..`)\n",
+"- **stream_path**\n",
+"  - Enterprise: the v3io stream path (e.g. `v3io:///users/..`)\n",
+"  - CE: a valid Kafka stream (e.g. `kafka://kafka.default.svc.cluster.local:9092`)\n",
"- **sample** &mdash; optional, sample every N requests\n",
"- **batch** &mdash; optional, send micro-batches every N requests\n",
" \n",
3 changes: 2 additions & 1 deletion docs/serving/custom-model-serving-class.md
@@ -172,6 +172,7 @@ To set the tracking stream options, specify the following function spec attribut

fn.set_tracking(stream_path, batch, sample)

-* **stream_path** &mdash; the v3io stream path (e.g. `v3io:///users/..`)
+* **stream_path** &mdash; Enterprise: the v3io stream path (e.g. `v3io:///users/..`); CE: a valid Kafka stream
+(e.g. kafka://kafka.default.svc.cluster.local:9092)
* **sample** &mdash; optional, sample every N requests
* **batch** &mdash; optional, send micro-batches every N requests
