[Docs] Add v1.3.0 change log (mlrun#3194)
jillnogold authored and liranbg committed Mar 16, 2023
1 parent 60522ac commit c0c1736
Showing 6 changed files with 244 additions and 31 deletions.
Binary file modified docs/_static/images/batch_inference_prediction_artifact.png
Binary file added docs/_static/images/project-homepage.png
235 changes: 218 additions & 17 deletions docs/change-log/index.md

Large diffs are not rendered by default.

16 changes: 14 additions & 2 deletions docs/data-prep/ingest-data-fs.md
@@ -163,6 +163,12 @@ or pip install mlrun[google-cloud-storage] to install them.

### SQL data source

```{admonition} Note
Tech Preview
```
```{admonition} Limitation
Do not use SQL reserved words as entity names. See more details in [Keywords and Reserved Words](https://dev.mysql.com/doc/refman/8.0/en/keywords.html).
```
`SQLSource` can be used for both batch and real-time ingestion. It supports storey but does not support Spark. To configure
either, pass the `db_uri` parameter or set the `MLRUN_SQL__URL` environment variable, in this format:<br>
`mysql+pymysql://<username>:<password>@<host>:<port>/<db_name>`, for example:
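A minimal sketch of batch ingestion from a SQL source; the table, column names, and credentials are placeholders, and the connection parameter (shown here as `db_url`) may differ by MLRun version:

```python
import mlrun.feature_store as fs
from mlrun.datastore.sources import SQLSource

# Placeholder connection string, in the format shown above
source = SQLSource(
    table_name="my_table",
    db_url="mysql+pymysql://user:password@localhost:3306/my_db",
    key_field="key",  # entity column; must not be a SQL reserved word
)

feature_set = fs.FeatureSet("my_fs", entities=[fs.Entity("key")])
df = fs.ingest(feature_set, source=source)
```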
@@ -273,7 +279,7 @@ For example: `rediss://localhost:6379` creates a Redis target, where:
- The server location is localhost port 6379.
- If the path parameter is not set, MLRun tries to fetch it from the `MLRUN_REDIS__URL` environment variable.
- You cannot pass the username/password as part of the URL. If you want to provide the username/password, use secrets as:
`<prefix_>REDIS_USER <prefix_>REDIS_PASSWORD` where \<prefix> is the optional RedisNoSqlTarget `credentials_prefix` parameter.
- Two types of Redis servers are supported: StandAlone and Cluster (no need to specify the server type in the config).
- A feature set supports one online target only. Therefore `RedisNoSqlTarget` and `NoSqlTarget` cannot be used as two targets of the same feature set.

@@ -286,7 +292,13 @@ explicitly each time with the path parameter, for example:<br>
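A minimal sketch, assuming a feature set named `stocks` and a Redis server on localhost (names, URL, and parameters are placeholders):

```python
import mlrun.feature_store as fs
from mlrun.datastore.targets import RedisNoSqlTarget

stocks_set = fs.FeatureSet("stocks", entities=[fs.Entity("ticker")])
# Explicit path; if omitted, the target falls back to the MLRUN_REDIS__URL env var
stocks_set.set_targets([RedisNoSqlTarget(path="rediss://localhost:6379")], with_defaults=False)
```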

### SQL target store

```{admonition} Note
Tech Preview
```
```{admonition} Limitation
Do not use SQL reserved words as entity names. See more details in [Keywords and Reserved Words](https://dev.mysql.com/doc/refman/8.0/en/keywords.html).
```
The `SQLTarget` online target supports storey but does not support Spark. Aggregations are not supported.<br>
To configure, pass the `db_uri` parameter or set the `MLRUN_SQL__URL` environment variable, in this format:<br>
`mysql+pymysql://<username>:<password>@<host>:<port>/<db_name>`
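A minimal sketch of ingesting a DataFrame into a SQL target; the table, schema, and parameter names are illustrative and may vary by MLRun version:

```python
import pandas as pd
import mlrun.feature_store as fs
from mlrun.datastore.targets import SQLTarget

df = pd.DataFrame({"key": ["a", "b"], "value": [1.0, 2.0]})

target = SQLTarget(
    table_name="my_table",
    db_url="mysql+pymysql://user:password@localhost:3306/my_db",
    create_table=True,                    # create the table if it does not exist
    schema={"key": str, "value": float},  # column types used for table creation
    primary_key_column="key",             # entity column; not a SQL reserved word
)

feature_set = fs.FeatureSet("my_fs", entities=[fs.Entity("key")])
fs.ingest(feature_set, source=df, targets=[target])
```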

6 changes: 3 additions & 3 deletions docs/feature-store/feature-sets.md
@@ -57,11 +57,11 @@ stocks_set = FeatureSet("stocks", entities=[Entity("ticker")])

## Create a feature set without ingesting its data

You can define and register a feature set (and use it in a feature vector) without ingesting its data into MLRun offline targets. This supports all batch sources.

The use case for this is when you have a large amount of data in remote storage that is ready to be consumed by a model-training pipeline.
When this feature is enabled on a feature set, data is **not** saved to the offline target during ingestion. Instead, when `get_offline_features`
is called on a vector containing that feature set, that data is read directly from the source.
Online targets are still ingested, and their value represents a timeslice of the offline source.
Transformations are not allowed when this feature is enabled: no computation graph, no aggregations, etc.
Enable this feature by including `passthrough=True` in the feature set definition, as in the sketch below. All three ingestion engines (Storey, Spark, Pandas) support this feature.
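A minimal sketch, assuming stock data in a remote Parquet file (the path and names are placeholders):

```python
import mlrun.feature_store as fs
from mlrun.datastore.sources import ParquetSource

# Offline data stays at the source; only online targets are materialized
stocks_set = fs.FeatureSet(
    "stocks",
    entities=[fs.Entity("ticker")],
    passthrough=True,
)
fs.ingest(stocks_set, source=ParquetSource(path="s3://my-bucket/stocks.parquet"))
```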
18 changes: 9 additions & 9 deletions docs/feature-store/feature-vectors.md
@@ -61,7 +61,7 @@ You can also view some metadata about the feature vector, including all the features

After a feature vector is saved, it can be used to create both offline (static) datasets and online (real-time) instances to supply as input to a machine learning model.

### Using an offline feature vector

Use the feature store's {py:meth}`~mlrun.feature_store.get_offline_features` function to produce a `dataset` from the feature vector.
It creates the dataset (asynchronously if possible), saves it to the requested target, and returns an {py:class}`~mlrun.feature_store.OfflineVectorResponse`.
@@ -82,13 +82,13 @@ Defaults to return as a return value to the caller.
- **drop_columns** &mdash; (optional) A list of columns to drop from the resulting feature vector.
- **start_time** &mdash; (optional) Datetime; the lower time limit for filtering.
- **end_time** &mdash; (optional) Datetime; the upper time limit for filtering.
- **with_indexes** &mdash; Return the vector with index columns and the timestamp_key from the feature sets. Default is False.
- **update_stats** &mdash; Update features statistics from the requested feature sets on the vector. Default is False.
- **engine** &mdash; Processing engine kind ("local", "dask", or "spark").
- **engine_args** &mdash; kwargs for the processing engine.
- **query** &mdash; The query string used to filter rows.
- **spark_service** &mdash; Name of the Spark service to be used (when using a remote-spark runtime).
- **join_type** &mdash; (optional) Indicates the join type: `{'left', 'right', 'outer', 'inner'}, default 'inner'`. The Spark retrieval engine supports only an entities-based `inner` join (i.e., no support for `relations`, and no `outer`, `left`, or `right` joins).
    - left: use only keys from the left frame (SQL: left outer join)
    - right: use only keys from the right frame (SQL: right outer join)
    - outer: use the union of keys from both frames (SQL: full outer join)
    - inner: use the intersection of keys from both frames (SQL: inner join)
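For instance, a minimal sketch, assuming a previously saved feature vector named `stocks-vec` (names and argument values are placeholders):

```python
import mlrun.feature_store as fs

# Materialize the vector into an offline dataset
resp = fs.get_offline_features(
    "stocks-vec",            # feature vector name or URI
    query="ticker=='GOOG'",  # filter rows using the query string
    with_indexes=False,
    engine="local",
)
df = resp.to_dataframe()  # read the result as a pandas DataFrame
```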
@@ -220,7 +220,7 @@

```python
resp = fs.get_offline_features(
    ...
)
```

### Using an online feature vector

The online feature vector provides real-time feature vectors to the model using the latest data available.
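A minimal sketch, assuming a saved feature vector named `stocks-vec` keyed by a `ticker` entity (names are placeholders):

```python
import mlrun.feature_store as fs

# Open a real-time feature service, query by entity key, then release it
svc = fs.get_online_feature_service("stocks-vec")
resp = svc.get([{"ticker": "GOOG"}])
print(resp)
svc.close()
```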

