[Docs] Add V1.3.1 release notes (#3670)
jillnogold committed May 31, 2023
1 parent 65dd220 commit 7bf2601
Showing 7 changed files with 26 additions and 25 deletions.
3 changes: 0 additions & 3 deletions docs/change-log/index.md
Original file line number Diff line number Diff line change
@@ -28,11 +28,9 @@
| ML-3819 | Reduce overly-verbose logs on the backend side. [View in Git](https://github.com/mlrun/mlrun/pull/3531). [View in Git](https://github.com/mlrun/mlrun/pull/3553). |
| ML-3823 | Optimized `/projects` endpoint to work faster. [View in Git](https://github.com/mlrun/mlrun/pull/3560). |


### Documentation
New sections describing [Git best practices](../projects/git-best-practices.html) and an example [Nuclio function](../concepts/nuclio-real-time-functions.html#example-of-nuclio-function).


## v1.3.0

### Client/server matrix, prerequisites, and installing
@@ -239,7 +237,6 @@ The `--ensure-project` flag of the `mlrun project` CLI command is deprecated and
| --- | ----------------------------------------------------------------- |
| ML-3797, ML-3798 | Fixed presenting and serving large-sized projects. [View in Git](https://github.com/mlrun/mlrun/pull/3477). |


## v1.2.1

### New and updated features
2 changes: 1 addition & 1 deletion docs/feature-store/feature-sets.md
@@ -124,7 +124,7 @@ df = fstore.ingest(stocks_set, stocks_df)

The graph steps can use built-in transformation classes, simple python classes, or function handlers.

-See more details in [Feature set transformations](transformations.html) and See more details in {ref}`transformations`.
+See more details in {ref}`Feature set transformations <transformations>`.

## Simulate and debug the data pipeline with a small dataset
During the development phase it's pretty common to check the feature set definition and to simulate the creation of the feature set before
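The simulate-and-debug step referred to above can be sketched with plain pandas: a graph step is ultimately a function handler applied to data, so running it on a tiny sample DataFrame checks the logic before any real ingestion. The handler and column names below are illustrative assumptions, not MLRun API:

```python
import pandas as pd

# Illustrative function handler for a graph step; "add_spread" and the
# bid/ask columns are made-up names for this sketch, not MLRun API.
def add_spread(df: pd.DataFrame) -> pd.DataFrame:
    """Derive a bid/ask spread feature from raw stock quotes."""
    df = df.copy()
    df["spread"] = df["ask"] - df["bid"]
    return df

# A small sample dataset is enough to simulate the step before ingesting real data
sample = pd.DataFrame(
    {"ticker": ["GOOG", "MSFT"], "bid": [720.50, 51.95], "ask": [720.90, 51.96]}
)
result = add_spread(sample)
```

In a real feature set this handler would be registered as a step on the feature set's graph; the point of the sketch is that the same function can be exercised in memory on a handful of rows first.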
31 changes: 16 additions & 15 deletions docs/feature-store/feature-store-overview.md
@@ -4,18 +4,18 @@
In machine-learning scenarios, generating a new feature, called feature engineering, takes a tremendous amount of work. The same features
must be used both for training, based on historical data, and for the model prediction based on the online or real-time data. This creates a
significant additional engineering effort, and leads to model inaccuracy when the online and offline features do not match. Furthermore,
-monitoring solutions must be built to track features and results and send alerts of data or model drift.
+monitoring solutions must be built to track features and results, and to send alerts upon data or model drift.

Consider a scenario in which you train a model and one of its features is a comparison of the current amount to the average amount spent
-during the last 3 months by the same person. Creating such a feature is easy when you have the full dataset in training, but in serving,
+during the last 3 months by the same person. Creating such a feature is easy when you have the full dataset in training, but for serving
this feature must be calculated in an online manner. The "brute-force" way to address this is to have an ML engineer create an online
-pipeline that reimplements all the feature calculations done in the offline process. This is not just time-consuming and error-prone, but
+pipeline that re-implements all the feature calculations that comprise the offline process. This is not just time-consuming and error-prone, but
very difficult to maintain over time, and results in a lengthy deployment time. This is exacerbated when having to deal with thousands of
-features with an increasing number of data engineers and data scientists that are creating and using the features.
+features, and an increasing number of data engineers and data scientists that are creating and using the features.

![Challenges managing features](../_static/images/challenges_managing_features.png)

-With MLRun's feature store you can easily define features during the training, that are deployable to serving, without having to define all the
+With MLRun's feature store you can easily define features during the training, which are deployable to serving, without having to define all the
"glue" code. You simply create the necessary building blocks to define features and integration, with offline and online storage systems to access the features.

![Feature store diagram](../_static/images/feature_store_diagram.png)
@@ -26,11 +26,11 @@ This can be raw data (e.g., transaction amount, image pixel, etc.) or a calculat
from average, pattern on image, etc.).
- **{ref}`feature-sets`** &mdash; A grouping of features that are ingested together and stored in a logical group. Feature sets take data from
offline or online sources, build a list of features through a set of transformations, and store the resulting features, along with the
-associated metadata and statistics. For example, a transaction may be grouped by the ID of a person performing the transfer or by the device
+associated metadata and statistics. For example, transactions could be grouped by the ID of a person performing the transfer or by the device
identifier used to perform the transaction. You can also define the timestamp source in the feature set, and ingest data into a
feature set.
- **[Execution](./feature-sets.html#add-transformations)** &mdash; A set of operations performed on the data while it is
-ingested. The graph contains steps that represent data sources and targets, and can also contain steps that transform and enrich the data that is passed through the feature set. For a deeper dive, see {ref}`transformations`.
+ingested. The transformation graph contains steps that represent data sources and targets, and can also include steps that transform and enrich the data that is passed through the feature set. For a deeper dive, see {ref}`transformations`.
- **{ref}`Feature vectors <create-use-feature-vectors>`** &mdash; A set of features, taken from one or more feature sets. The feature vector is defined prior to model
training and serves as the input to the model training process. During model serving, the feature values in the vector are obtained from an online service.
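The relationship between entities, feature sets, and feature vectors can be illustrated with plain pandas. This is a conceptual sketch only — the data and names are invented, and in MLRun the feature store performs these joins for you:

```python
import pandas as pd

# Two "feature sets", both keyed by the same entity (the account ID)
transactions = pd.DataFrame({"account": [1, 2], "avg_amount_3m": [120.0, 75.5]})
devices = pd.DataFrame({"account": [1, 2], "device_risk": [0.1, 0.8]})

# A "feature vector" selects features from one or more feature sets,
# joined on the shared entity key
feature_vector = transactions.merge(devices, on="account")
```

The entity (`account` here) is what lets features ingested by different pipelines be combined consistently for training and serving.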

@@ -40,9 +40,10 @@ training and serves as the input to the model training process. During model ser

The common flow when working with the feature store is to first define the feature set with its source, transformation graph, and targets.
MLRun's robust transformation engine performs complex operations with just a few lines of Python code. To test the
-execution process, call the `infer` method with a sample DataFrame. This runs all operations in memory without storing the results. Once the
-graph is defined, it's time to ingest the data.
+execution process, call the `infer` method with a sample DataFrame. This runs all operations in memory without storing the results.
+
+Once the
+graph is defined, it's time to ingest the data.
You can ingest data directly from a DataFrame, by calling the feature set {py:class}`~mlrun.feature_store.ingest` method. You can also define an ingestion
process that runs as a Kubernetes job. This is useful if there is a large ingestion process, or if there is a recurrent ingestion and you
want to schedule the job.
@@ -61,20 +62,20 @@ Next, extract a versioned **offline** static dataset for training, based on the
model with the feature vector data by providing the input in the form of `'store://feature-vectors/{project}/{feature_vector_name}'`.
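Composing that input reference is plain string formatting; the project and vector names below are hypothetical placeholders:

```python
# Build the store:// URI that a training job receives as its dataset input.
# "fraud-demo" and "transactions-vector" are made-up example names.
project = "fraud-demo"
feature_vector_name = "transactions-vector"
input_uri = f"store://feature-vectors/{project}/{feature_vector_name}"
```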

Training functions generate models and various model statistics. Use MLRun's auto logging capabilities to store the models along with all
-the relevant data, metadata and measurements.
+the relevant data, metadata, and measurements.

MLRun can apply all the MLOps functionality by using the framework specific `apply_mlrun()` method, which manages the training process and
-automatically logs all the framework specific model details, data, metadata and metrics.
+automatically logs all the framework specific model details, data, metadata, and metrics.

The training job automatically generates a set of results and versioned artifacts (run `train_run.outputs` to view the job outputs).

-For serving, once you validate the feature vector, use the **online** feature service, based on the
-nosql target defined in the feature set for real-time serving. For serving, you define a serving class derived from
+After you validate the feature vector, use the **online** feature service, based on the
+nosql target defined in the feature set, for real-time serving. For serving, you define a serving class derived from
`mlrun.serving.V2ModelServer`. In the class `load` method, call the {py:meth}`~mlrun.feature_store.get_online_feature_service` function with the vector name, which returns
a feature service object. In the class `preprocess` method, call the feature service `get` method to get the values of those features.
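The load/preprocess pattern described above can be sketched in a self-contained way. Here `V2ModelServer` and the feature service are minimal stand-in stubs so the shape of the class is visible without an MLRun cluster; in real code you derive from `mlrun.serving.V2ModelServer` and call `mlrun.feature_store.get_online_feature_service` with your vector name:

```python
class V2ModelServer:
    """Stub standing in for mlrun.serving.V2ModelServer."""

class OnlineFeatureService:
    """Stub standing in for the object returned by get_online_feature_service."""
    _features = {"GOOG": {"spread": 0.40}, "MSFT": {"spread": 0.01}}

    def get(self, entity_rows):
        # Look up the stored online feature values for each entity key
        return [self._features[row["ticker"]] for row in entity_rows]

class MyServingClass(V2ModelServer):
    def load(self):
        # Real code: self.feature_service = get_online_feature_service(<vector name>)
        self.feature_service = OnlineFeatureService()

    def preprocess(self, request: dict, operation):
        # Enrich the incoming entity keys with their online feature values
        request["inputs"] = self.feature_service.get(request["inputs"])
        return request

server = MyServingClass()
server.load()
enriched = server.preprocess({"inputs": [{"ticker": "GOOG"}]}, None)
```

The key design point is that the serving class only sends entity keys; the feature service resolves them to current feature values from the online (nosql) target.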

-Using this feature store centric process, using one computation graph definition for a feature set, you receive an automatic online and
-offline implementation for the feature vectors, with data versioning both in terms of the actual graph that was used to calculate each data
+This feature store centric process, using one computation graph definition for a feature set, gives you an automatic online and
+offline implementation for the feature vectors with data versioning, both in terms of the actual graph that was used to calculate each data
point, and the offline datasets that were created to train each model.

See more information in {ref}`training with the feature store <retrieve-offline-data>` and {ref}`training-serving`.
6 changes: 3 additions & 3 deletions docs/feature-store/feature-store.md
@@ -2,17 +2,17 @@
# Feature store

A feature store provides a single pane of glass for sharing all available features across
-the organization along with their metadata. MLRun Feature store support security, versioning,
+the organization along with their metadata. The MLRun feature store supports security, versioning,
and data snapshots, enabling better data lineage, compliance, and manageability.

As illustrated in the diagram below,
feature stores provide a mechanism (**`Feature Sets`**) to read data from various online or offline sources,
conduct a set of data transformations, and persist the data in online and offline
storage. Features are stored and cataloged along with all their metadata (schema,
labels, statistics, etc.), allowing users to compose **`Feature Vectors`** and use them for training
-or serving. The Feature Vectors are generated when needed, taking into account data versioning and time
+or serving. The feature vectors are generated when needed, taking into account data versioning and time
correctness (time traveling). Different function kinds (Nuclio, Spark, Dask) are used for feature retrieval, real-time
-engine for serving, and batch one for training.
+engines for serving, and batch for training.

<br><img src="../_static/images/feature-store-arch.png" alt="feature-store" width="800"/><br>

2 changes: 1 addition & 1 deletion docs/feature-store/feature-vectors.md
@@ -88,7 +88,7 @@ Defaults to return as a return value to the caller.
- **engine_args** &mdash; kwargs for the processing engine
- **query** &mdash; The query string used to filter rows
- **spark_service** &mdash; Name of the spark service to be used (when using a remote-spark runtime)
-- **join_type** &mdash; (optional) Indicates the join type: `{'left', 'right', 'outer', 'inner'}, default 'inner'`. The Spark retrieval engine only supports entities-based `inner` join (ie. no support for `relations`, no support for `outer`, `left`, `right` joins)
+- **join_type** &mdash; (optional) Indicates the join type: `{'left', 'right', 'outer', 'inner'}, default 'inner'`.
- left: use only keys from left frame (SQL: left outer join)
- right: use only keys from right frame (SQL: right outer join)
- outer: use union of keys from both frames (SQL: full outer join)
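The join kinds listed for `join_type` match standard pandas/SQL join semantics. A quick sketch with toy data (pandas is used here only to illustrate which entity keys each join keeps):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"id": [2, 3], "b": ["u", "v"]})

inner = left.merge(right, on="id", how="inner")      # keys present in both frames
left_join = left.merge(right, on="id", how="left")   # keys from the left frame only
outer = left.merge(right, on="id", how="outer")      # union of keys from both frames
```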
4 changes: 3 additions & 1 deletion docs/monitoring/initial-setup-configuration.ipynb
@@ -33,7 +33,9 @@
" \n",
" `fn.set_tracking(stream_path, batch, sample)`\n",
" \n",
-"- **stream_path** &mdash; the v3io stream path (e.g. `v3io:///users/..`)\n",
+"- **stream_path**\n",
+"  - Enterprise: the v3io stream path (e.g. `v3io:///users/..`)\n",
+"  - CE: a valid Kafka stream (e.g. `kafka://kafka.default.svc.cluster.local:9092`)\n",
"- **sample** &mdash; optional, sample every N requests\n",
"- **batch** &mdash; optional, send micro-batches every N requests\n",
" \n",
3 changes: 2 additions & 1 deletion docs/serving/custom-model-serving-class.md
@@ -172,6 +172,7 @@ To set the tracking stream options, specify the following function spec attribut

fn.set_tracking(stream_path, batch, sample)

-* **stream_path** &mdash; the v3io stream path (e.g. `v3io:///users/..`)
+* **stream_path** &mdash; Enterprise: the v3io stream path (e.g. `v3io:///users/..`); CE: a valid Kafka stream
+(e.g. kafka://kafka.default.svc.cluster.local:9092)
* **sample** &mdash; optional, sample every N requests
* **batch** &mdash; optional, send micro-batches every N requests
