Add docs for consuming features in online environment #609

Merged (16 commits) on Sep 14, 2022
30 changes: 15 additions & 15 deletions docs/README.md
@@ -155,26 +155,26 @@ Follow the [quick start Jupyter Notebook](./samples/product_recommendation_demo.

![Architecture Diagram](./images/architecture.png)

| Feathr component | Cloud Integrations |
| ------------------------------- | --------------------------------------------------------------------------- |
| Offline store – Object Store | Azure Blob Storage, Azure ADLS Gen2, AWS S3 |
| Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools, Azure SQL in VM, Snowflake |
| Streaming Source | Kafka, EventHub |
| Online store | Redis, Azure Cosmos DB (coming soon), Aerospike (coming soon) |
| Feature Registry and Governance | Azure Purview, ANSI SQL such as Azure SQL Server |
| Compute Engine | Azure Synapse Spark Pools, Databricks |
| Machine Learning Platform | Azure Machine Learning, Jupyter Notebook, Databricks Notebook |
| File Format | Parquet, ORC, Avro, JSON, Delta Lake, CSV |
| Credentials | Azure Key Vault |
| Feathr component | Cloud Integrations |
| ------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| Offline store – Object Store | Azure Blob Storage, Azure ADLS Gen2, AWS S3 |
| Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools, Azure SQL in VM, Snowflake |
| Streaming Source | Kafka, EventHub |
| Online store | Redis, [Azure Cosmos DB](https://feathr-ai.github.io/feathr/how-to-guides/jdbc-cosmos-notes.html#using-cosmosdb-as-the-online-store), Aerospike (coming soon) |
| Feature Registry and Governance | Azure Purview, ANSI SQL such as Azure SQL Server |
| Compute Engine | Azure Synapse Spark Pools, Databricks |
| Machine Learning Platform | Azure Machine Learning, Jupyter Notebook, Databricks Notebook |
| File Format | Parquet, ORC, Avro, JSON, Delta Lake, CSV |
| Credentials | Azure Key Vault |

## 🚀 Roadmap

For a complete roadmap with estimated dates, please [visit this page](https://github.com/linkedin/feathr/milestones?direction=asc&sort=title&state=open).

- [x] Support streaming
- [x] Support common data sources
- [x] Support streaming features with transformation
- [x] Support common data sources and sinks. Read more in the [Cloud Integrations and Architecture](#️-cloud-integrations-and-architecture) part.
- [x] Support feature store UI, including Lineage and Search functionalities
- [ ] Support a sandbox Feathr environment for a better getting-started experience
- [ ] Support online transformation
- [ ] More Feathr online client libraries such as Java
- [ ] Support feature versioning
- [ ] Support feature monitoring
- [ ] Support feature data deletion and retention
34 changes: 27 additions & 7 deletions docs/dev_guide/build-and-push-feathr-registry-docker-image.md
@@ -6,7 +6,7 @@ parent: Developer Guides

# How to build and push feathr registry docker image

This doc shows how to build feathr registry docker image locally and publish to registry.
This doc shows how to build the Feathr registry docker image locally and publish it to DockerHub.

## Prerequisites

@@ -28,32 +28,52 @@ Run **docker images** command, you will see newly created image listed in output
docker images
```

Run **docker run** command to test docker image locally:
Run the **docker run** command to test the docker image locally.

### Test SQL-based registry

You need to set up the connection string `CONNECTION_STR` for the docker container so that it knows which SQL-based registry to connect to. The connection string will look something like this:

```bash
"Server=tcp:testregistry.database.windows.net,1433;Initial Catalog=testsql;Persist Security Info=False;User ID=feathr@feathrtestsql;Password=StrongPassword;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"
```

Then you can test the docker image locally by running this command:

### Test SQL registry
```bash
docker run --env CONNECTION_STR=<REPLACE_ME> --env API_BASE=api/v1 -it --rm -p 3000:80 feathrfeaturestore/sql-registry
```
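
A malformed `CONNECTION_STR` is an easy mistake to make before launching the container. The sketch below is one hedged way to sanity-check the string first; it is not part of Feathr. The helper names `parse_connection_string` and `missing_keys` are hypothetical, and the list of required keys is an assumption based on the sample string above.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split a 'Key=Value;Key=Value;' style string into a dict."""
    pairs = {}
    # Drop surrounding whitespace and optional double quotes, then split on ';'.
    for part in conn_str.strip().strip('"').split(";"):
        if not part.strip():
            continue  # skip the empty piece after a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment (no '='): {part!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def missing_keys(conn_str, required=("Server", "Initial Catalog", "User ID", "Password")):
    """Return the required keys that are absent or empty (the key list is an assumption)."""
    parsed = parse_connection_string(conn_str)
    return [key for key in required if not parsed.get(key)]
```

This simple split does not handle values that themselves contain `;`, so treat it as a quick pre-flight check rather than a full parser.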

### Test Purview registry

You need to set up a few environment variables, including:

- `PURVIEW_NAME` indicates the Purview service name.
- `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_CLIENT_SECRET` identify the service principal account used to talk to the Purview service.

```bash
docker run --env PURVIEW_NAME=<REPLACE_ME> --env AZURE_CLIENT_ID=<REPLACE_ME> --env AZURE_TENANT_ID=<REPLACE_ME> --env AZURE_CLIENT_SECRET=<REPLACE_ME> --env API_BASE=api/v1 -it --rm -p 3000:80 feathrfeaturestore/feathr-registry
```
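
With several `--env` flags it is easy to leave one unset. The following is a minimal sketch, not part of Feathr, that checks the required variables and assembles the same `docker run` arguments as above. The function name `build_docker_run` and its parameters are hypothetical. It relies on Docker's behavior that `--env NAME` without a value copies `NAME` from the calling environment, which keeps secrets out of the command line.

```python
import os

# Variable names taken from the Purview example above.
PURVIEW_ENV_VARS = ("PURVIEW_NAME", "AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def build_docker_run(image, required_vars, environ=None, api_base="api/v1", host_port=3000):
    """Return the argument list for `docker run`, failing early if a variable is unset."""
    # `environ` should be the environment the command will later be run with.
    environ = os.environ if environ is None else environ
    missing = [name for name in required_vars if not environ.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    cmd = ["docker", "run"]
    for name in required_vars:
        # '--env NAME' (no '=value') makes docker read the value from the caller's environment.
        cmd += ["--env", name]
    cmd += ["--env", f"API_BASE={api_base}", "-it", "--rm", "-p", f"{host_port}:80", image]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd)`, or printed with `shlex.join(cmd)` to inspect the command first.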

### Test SQL registry + RBAC

```bash
docker run --env REACT_APP_ENABLE_RBAC=true --env REACT_APP_AZURE_CLIENT_ID=<REPLACE_ME> --env REACT_APP_AZURE_TENANT_ID=<REPLACE_ME> --env CONNECTION_STR=<REPLACE_ME> --env API_BASE=api/v1 -it --rm -p 3000:80 feathrfeaturestore/feathr-registry
```

After docker image launched, open web browser and navigate to <https://localhost:3000>,verify both UI and backend api can work correctly.
After the docker image has launched, open a web browser, navigate to <https://localhost:3000>, and verify that both the Feathr UI and the registry backend (SQL/Purview) work correctly.

## Upload to DockerHub (For Feathr Release Manager)

## Upload to DockerHub Registry
The Feathr repository already has automatic CD pipelines that publish the docker image to DockerHub on release branches. Please check out the [docker publish workflow](https://github.com/feathr-ai/feathr/blob/main/.github/workflows/docker-publish.yml) for details.

Login with feathrfeaturestore account and then run **docker push** command to publish docker image to DockerHub. Contact Feathr Team (@jainr, @blrchen) for credentials.
If the Feathr release manager wants to publish manually, log in with the feathrfeaturestore account and then run the **docker push** command to publish the docker image to DockerHub. Contact the Feathr Team (@jainr, @blrchen) for credentials.

```bash
docker login
docker push feathrfeaturestore/sql-registry
docker push feathrfeaturestore/feathr-registry
```

## Published Feathr Registry Image

The published Feathr feature registry image is located on [DockerHub here](https://hub.docker.com/r/feathrfeaturestore/feathr-registry).
4 changes: 2 additions & 2 deletions docs/dev_guide/cloud_resource_provision.md
@@ -29,12 +29,12 @@ Invoke Deployment Script from GitHub Repo with parameter for Azure Region.
Available regions can be checked with this command

```powershell
Get-AzLocation | select displayname,location
Get-AzLocation | select displayname,location
```

```powershell

iwr https://raw.githubusercontent.com/linkedin/feathr/main/docs/how-to-guides/deployFeathr.ps1 -outfile ./deployFeathr.ps1; ./deployFeathr.ps1 -AzureRegion '{Assign Your Region}'
iwr https://raw.githubusercontent.com/linkedin/feathr/main/docs/how-to-guides/deployFeathr.ps1 -outfile ./deployFeathr.ps1; ./deployFeathr.ps1 -AzureRegion '{Assign Your Region}'

```

122 changes: 0 additions & 122 deletions docs/dev_guide/deploy-feathr-api-as-webapp.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/dev_guide/feathr-core-code-structure.md
@@ -1,6 +1,6 @@
---
layout: default
title: Documentation Guideline
title: Feathr Core Code Structure
parent: Developer Guides
---

5 changes: 3 additions & 2 deletions docs/how-to-guides/azure-deployment-arm.md
@@ -17,7 +17,9 @@ The provided Azure Resource Manager (ARM) template deploys the following resourc
7. Azure Event Hub
8. Azure Redis

Please note, you need to have **owner access** in the resource group you are deploying this in. Owner access is required to assign role to managed identity within ARM template so it can access key vault and store secrets.
Please note, you need to have **owner access** in the resource group you are deploying this in. Owner access is required to assign a role to the managed identity within the ARM template so it can access the key vault and store secrets. If you don't have this permission, you might want to contact your IT admin to see if they can grant it or run the deployment for you.

Although we recommend that end users deploy the resources using the ARM template, we understand that in many situations users want to reuse existing resources instead of creating new ones, or they run into permission issues. See [Manually connecting existing resources](#manually-connecting-existing-resources) for more details.

## Architecture

@@ -111,7 +113,6 @@ https://{resource_prefix}webapp.azurewebsites.net

![feathr ui landing page](../images/feathr-ui-landingpage.png)


### 5. Initialize RBAC access table (Optional)

If you want to use RBAC access for your deployment, you also need to manually initialize the user access table. Replace `[your-email-account]` with the email account that you are currently using, and this email will be the global admin for Feathr feature registry.
4 changes: 2 additions & 2 deletions docs/how-to-guides/azure_resource_provision.json
@@ -35,13 +35,13 @@
"sqlAdminUsername": {
"type": "String",
"metadata": {
"description": "Specifies the username for admin"
"description": "Specifies the username for SQL Database admin"
}
},
"sqlAdminPassword": {
"type": "SecureString",
"metadata": {
"description": "Specifies the password for admin"
"description": "Specifies the password for SQL Database admin"
}
},
"registryBackend": {
56 changes: 56 additions & 0 deletions docs/how-to-guides/model-inference-with-feathr.md
@@ -0,0 +1,56 @@
---
layout: default
title: Online Model Inference with Features from Feathr
parent: How-to Guides
---

# Online Model Inference with Features from Feathr

After you have materialized features into an online store such as Redis or Azure Cosmos DB, you will usually want to consume those features in a production environment for model inference.

With Feathr's [online client](https://feathr.readthedocs.io/en/latest/#feathr.FeathrClient.get_online_features), this is straightforward. In the sample code below, users only need to configure the online store endpoint (if using Redis) and call `client.get_online_features()` to get the features for a particular key.

```python

## put the section below into the initialization handler
import os
from feathr import FeathrClient

# Set Redis endpoint
os.environ['online_store__redis__host'] = "<replace_with_your_redis_name>.redis.cache.windows.net"
os.environ['online_store__redis__port'] = "6380"
os.environ['online_store__redis__ssl_enabled'] = "True"
os.environ['REDIS_PASSWORD'] = "<put-your-key-here>"

client = FeathrClient()


# put this section in the model inference handler
feature = client.get_online_features(feature_table="nycTaxiCITable",
key='2020-04-15',
feature_names=['f_is_long_trip_distance', 'f_day_of_week'])
# `feature` will be an array representing the features of that particular key.


# `model` will be a ML model that is loaded previously.
result = model.predict(feature)
```

## Best Practices

ML platforms such as Azure Machine Learning, SageMaker, or DataRobot usually offer options to "bring your own container" or use "container inference". These options require end users to write an "entry script" that provides a few functions. In those cases, there are usually two handlers:

- an initialization handler to allow users to load configurations. For example, in Azure Machine Learning, it is a function called `init()`, and in SageMaker, it is `model_fn()`.
- a model inference handler to do the model inference. For example, in Azure Machine Learning, it is called `run()`, and in SageMaker, it is called `predict_fn()`.

In the initialization handler, initialize the environment variables and initialize `FeathrClient` as shown in the above script; in the inference handler, call this line:

```python
# put this section in the model inference handler
feature = client.get_online_features(feature_table="nycTaxiCITable",
key='2020-04-15',
feature_names=['f_is_long_trip_distance', 'f_day_of_week'])
# `feature` will be an array representing the features of that particular key.
# `model` will be a ML model that is loaded previously.
result = model.predict(feature)
```
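
To show how the two handlers fit together, here is a minimal sketch of an entry script in the Azure Machine Learning style (`init()` plus `run(raw_data)`). It makes several assumptions that are not from this guide: the request body is JSON of the form `{"key": "..."}`, the Redis settings are already present as environment variables, `load_model()` is a hypothetical placeholder for your own model loading code, and the model accepts a batch of rows the way scikit-learn models do.

```python
import json

FEATURE_TABLE = "nycTaxiCITable"
FEATURE_NAMES = ["f_is_long_trip_distance", "f_day_of_week"]

client = None  # FeathrClient, created once in init()
model = None   # trained model, loaded once in init()


def load_model():
    """Hypothetical placeholder: replace with your own model loading code."""
    raise NotImplementedError("load and return your trained model here")


def predict_for_key(feathr_client, ml_model, key):
    """Fetch the online features for `key` and score them with `ml_model`."""
    features = feathr_client.get_online_features(
        feature_table=FEATURE_TABLE, key=key, feature_names=FEATURE_NAMES
    )
    # Assumption: the model expects a batch of rows, so wrap the single row in a list.
    prediction = ml_model.predict([features])
    return {"key": key, "prediction": list(prediction)}


def init():
    """Initialization handler: runs once when the container starts."""
    global client, model
    # Imported here so the module can be imported without feathr installed.
    from feathr import FeathrClient

    # Assumes online_store__redis__host, REDIS_PASSWORD, etc. are already set.
    client = FeathrClient()
    model = load_model()


def run(raw_data):
    """Inference handler: runs per request. Assumes a body like {"key": "2020-04-15"}."""
    key = json.loads(raw_data)["key"]
    return predict_for_key(client, model, key)
```

Keeping the lookup and scoring in `predict_for_key` means the logic can be exercised with stand-in client and model objects, without a live Redis instance.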