[Docs] Add v1.3.0 change log (mlrun#3194)
jillnogold authored and liranbg committed Mar 16, 2023
1 parent 60522ac commit c0c1736
Showing 6 changed files with 244 additions and 31 deletions.
Binary file modified docs/_static/images/batch_inference_prediction_artifact.png
Binary file added docs/_static/images/project-homepage.png
235 changes: 218 additions & 17 deletions docs/change-log/index.md

Large diffs are not rendered by default.

16 changes: 14 additions & 2 deletions docs/data-prep/ingest-data-fs.md
@@ -163,6 +163,12 @@ or pip install mlrun[google-cloud-storage] to install them.

### SQL data source

```{admonition} Note
Tech Preview
```
```{admonition} Limitation
Do not use SQL reserved words as entity names. See more details in [Keywords and Reserved Words](https://dev.mysql.com/doc/refman/8.0/en/keywords.html).
```
`SQLSource` can be used for both batch and real-time ingestion. It supports storey but does not support Spark. To configure
either, pass the `db_uri` parameter or set the `MLRUN_SQL__URL` environment variable, in this format:<br>
`mysql+pymysql://<username>:<password>@<host>:<port>/<db_name>`, for example:
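A minimal sketch of batch ingestion from a SQL source; the table, column names, and credentials are placeholders, and the connection parameter (shown here as `db_url`) may differ by MLRun version:

```python
import mlrun.feature_store as fs
from mlrun.datastore.sources import SQLSource

# Placeholder connection string, in the format shown above
source = SQLSource(
    table_name="my_table",
    db_url="mysql+pymysql://user:password@localhost:3306/my_db",
    key_field="key",  # entity column; must not be a SQL reserved word
)

feature_set = fs.FeatureSet("my_fs", entities=[fs.Entity("key")])
df = fs.ingest(feature_set, source=source)
```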
@@ -273,7 +279,7 @@ For example: `rediss://localhost:6379` creates a Redis target, where:
- The server location is localhost port 6379.
- If the path parameter is not set, MLRun tries to fetch it from the `MLRUN_REDIS__URL` environment variable.
- You cannot pass the username/password as part of the URL. If you want to provide the username/password, use secrets as:
`<prefix_>REDIS_USER <prefix_>REDIS_PASSWORD` where \<prefix> is the optional RedisNoSqlTarget `credentials_prefix` parameter.
- Two types of Redis servers are supported: StandAlone and Cluster (no need to specify the server type in the config).
- A feature set supports one online target only. Therefore `RedisNoSqlTarget` and `NoSqlTarget` cannot be used as two targets of the same feature set.

@@ -286,7 +292,13 @@ explicitly each time with the path parameter, for example:<br>
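A minimal sketch, assuming a feature set named `stocks` and a Redis server on localhost (names, URL, and parameters are placeholders):

```python
import mlrun.feature_store as fs
from mlrun.datastore.targets import RedisNoSqlTarget

stocks_set = fs.FeatureSet("stocks", entities=[fs.Entity("ticker")])
# Explicit path; if omitted, the target falls back to the MLRUN_REDIS__URL env var
stocks_set.set_targets([RedisNoSqlTarget(path="rediss://localhost:6379")], with_defaults=False)
```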

### SQL target store

```{admonition} Note
Tech Preview
```
```{admonition} Limitation
Do not use SQL reserved words as entity names. See more details in [Keywords and Reserved Words](https://dev.mysql.com/doc/refman/8.0/en/keywords.html).
```
The `SQLTarget` online target supports storey but does not support Spark. Aggregations are not supported.<br>
To configure, pass the `db_uri` parameter or set the `MLRUN_SQL__URL` environment variable, in this format:<br>
`mysql+pymysql://<username>:<password>@<host>:<port>/<db_name>`
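A minimal sketch of ingesting a DataFrame into a SQL target; the table, schema, and parameter names are illustrative and may vary by MLRun version:

```python
import pandas as pd
import mlrun.feature_store as fs
from mlrun.datastore.targets import SQLTarget

df = pd.DataFrame({"key": ["a", "b"], "value": [1.0, 2.0]})

target = SQLTarget(
    table_name="my_table",
    db_url="mysql+pymysql://user:password@localhost:3306/my_db",
    create_table=True,                    # create the table if it does not exist
    schema={"key": str, "value": float},  # column types used for table creation
    primary_key_column="key",             # entity column; not a SQL reserved word
)

feature_set = fs.FeatureSet("my_fs", entities=[fs.Entity("key")])
fs.ingest(feature_set, source=df, targets=[target])
```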

6 changes: 3 additions & 3 deletions docs/feature-store/feature-sets.md
@@ -57,11 +57,11 @@ stocks_set = FeatureSet("stocks", entities=[Entity("ticker")])

## Create a feature set without ingesting its data

You can define and register a feature set (and use it in a feature vector) without ingesting its data into MLRun offline targets. This supports all batch sources.

The use case for this is when you have a large amount of data in remote storage that is ready to be consumed by a model-training pipeline.
When this feature is enabled on a feature set, data is **not** saved to the offline target during ingestion. Instead, when `get_offline_features`
is called on a vector containing that feature set, that data is read directly from the source.
Online targets are still ingested, and their value represents a timeslice of the offline source.
Transformations are not allowed when this feature is enabled: no computation graph, no aggregations, etc.
Enable this feature by including `passthrough=True` in the feature set definition, as in the sketch below. All three ingestion engines (Storey, Spark, Pandas) support this feature.
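A minimal sketch, assuming stock data in a remote Parquet file (the path and names are placeholders):

```python
import mlrun.feature_store as fs
from mlrun.datastore.sources import ParquetSource

# Offline data stays at the source; only online targets are materialized
stocks_set = fs.FeatureSet(
    "stocks",
    entities=[fs.Entity("ticker")],
    passthrough=True,
)
fs.ingest(stocks_set, source=ParquetSource(path="s3://my-bucket/stocks.parquet"))
```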
18 changes: 9 additions & 9 deletions docs/feature-store/feature-vectors.md
@@ -61,7 +61,7 @@ You can also view some metadata about the feature vector, including all the features

After a feature vector is saved, it can be used to create both offline (static) datasets and online (real-time) instances to supply as input to a machine learning model.

### Using an offline feature vector

Use the feature store's {py:meth}`~mlrun.feature_store.get_offline_features` function to produce a `dataset` from the feature vector.
It creates the dataset (asynchronously if possible), saves it to the requested target, and returns an {py:class}`~mlrun.feature_store.OfflineVectorResponse`.
@@ -82,13 +82,13 @@ Defaults to return as a return value to the caller.
- **drop_columns** &mdash; (optional) A list of columns to drop from the resulting feature vector.
- **start_time** &mdash; (optional) Datetime; the lower time limit for filtering.
- **end_time** &mdash; (optional) Datetime; the upper time limit for filtering.
- **with_indexes** &mdash; Return the vector with index columns and the timestamp_key from the feature sets. Default is False.
- **update_stats** &mdash; Update features statistics from the requested feature sets on the vector. Default is False.
- **engine** &mdash; Processing engine kind ("local", "dask", or "spark").
- **engine_args** &mdash; kwargs for the processing engine.
- **query** &mdash; The query string used to filter rows.
- **spark_service** &mdash; Name of the Spark service to be used (when using a remote-spark runtime).
- **join_type** &mdash; (optional) Indicates the join type: `{'left', 'right', 'outer', 'inner'}, default 'inner'`. The Spark retrieval engine supports only an entities-based `inner` join (i.e., no support for `relations`, and no `outer`, `left`, or `right` joins).
    - left: use only keys from the left frame (SQL: left outer join)
    - right: use only keys from the right frame (SQL: right outer join)
    - outer: use the union of keys from both frames (SQL: full outer join)
    - inner: use the intersection of keys from both frames (SQL: inner join)
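For instance, a minimal sketch, assuming a previously saved feature vector named `stocks-vec` (names and argument values are placeholders):

```python
import mlrun.feature_store as fs

# Materialize the vector into an offline dataset
resp = fs.get_offline_features(
    "stocks-vec",            # feature vector name or URI
    query="ticker=='GOOG'",  # filter rows using the query string
    with_indexes=False,
    engine="local",
)
df = resp.to_dataframe()  # read the result as a pandas DataFrame
```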
@@ -220,7 +220,7 @@

```python
resp = fs.get_offline_features(
    ...
)
```

### Using an online feature vector

The online feature vector provides real-time feature vectors to the model using the latest data available.
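A minimal sketch, assuming a saved feature vector named `stocks-vec` keyed by a `ticker` entity (names are placeholders):

```python
import mlrun.feature_store as fs

# Open a real-time feature service, query by entity key, then release it
svc = fs.get_online_feature_service("stocks-vec")
resp = svc.get([{"ticker": "GOOG"}])
print(resp)
svc.close()
```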

