From 40a2dd588f91600ed431ef8ecb913064c08a8323 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Thu, 25 Aug 2022 03:16:14 -0700 Subject: [PATCH 01/15] Create consume-features.md --- docs/how-to-guides/consume-features.md | 48 ++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) create mode 100644 docs/how-to-guides/consume-features.md diff --git a/docs/how-to-guides/consume-features.md b/docs/how-to-guides/consume-features.md new file mode 100644 index 000000000..e9381e4ea --- /dev/null +++ b/docs/how-to-guides/consume-features.md @@ -0,0 +1,48 @@ +--- +layout: default +title: Consuming Features in ML Platforms +parent: How-to Guides +--- + +# Consuming Features in ML Platforms + +After you have materialized features in online store such as Redis, usually end users want to consume those features in production environment for model inference. + +With Feathr's online client, it is quite straightforward to do that. The sample code is as below, where users only need to configure the online store endpoint (if using Redis), and call `client.get_online_features()` to get the features for a particular key. + + +```python + +## put the section below into the initialization handler +import os +from feathr import FeathrClient + +# Set Redis endpoint +os.environ['online_store__redis__host'] = "xx.redis.cache.windows.net" +os.environ['online_store__redis__port'] = "6380" +os.environ['online_store__redis__ssl_enabled'] = "True" +os.environ['REDIS_PASSWORD'] = "" + +client = FeathrClient() + + +# put this section in the model inference handler +feature = client.get_online_features(feature_table="nycTaxiCITable", + key='2020-04-15', + feature_names=['f_is_long_trip_distance', 'f_day_of_week']) +# `res` will be an array representing the features of that particular key. + + +# `model` will be a ML model that is loaded previously. 
+result = model.predict(feature) +``` + +## Best Practices + +Usually for ML platforms such as Azure Machine Learning, Sagemaker, or DataRobot, there are options where you can "bring your own container" or using "container inference". Basically it requires end users to write an "entry script" and provide a few functions. In those cases, there are usually two handlers: + +- an initialization handler to allow users to load configurations. For example, in Azure Machine Learning, it is a function called `init()`, and in Sagemaker, it is +- a model inference handler to do the model inference. For example, `predict_fn()` + +In the initialization handler + From 8fe6a249bc2bb587c58438b37c12cd17fb341585 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Thu, 25 Aug 2022 03:18:30 -0700 Subject: [PATCH 02/15] Update consume-features.md --- docs/how-to-guides/consume-features.md | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/docs/how-to-guides/consume-features.md b/docs/how-to-guides/consume-features.md index e9381e4ea..1593043ba 100644 --- a/docs/how-to-guides/consume-features.md +++ b/docs/how-to-guides/consume-features.md @@ -41,8 +41,20 @@ result = model.predict(feature) Usually for ML platforms such as Azure Machine Learning, Sagemaker, or DataRobot, there are options where you can "bring your own container" or using "container inference". Basically it requires end users to write an "entry script" and provide a few functions. In those cases, there are usually two handlers: -- an initialization handler to allow users to load configurations. For example, in Azure Machine Learning, it is a function called `init()`, and in Sagemaker, it is -- a model inference handler to do the model inference. For example, `predict_fn()` +- an initialization handler to allow users to load configurations. For example, in Azure Machine Learning, it is a function called `init()`, and in Sagemaker, it is `model_fn()`. 
+- a model inference handler to do the model inference. For example, in Azure Machine Learning, it is called `run()`, and in Sagemaker, it is called `predict_fn()`. -In the initialization handler +In the initialization handler, initialize the environment variables and initialize `FeathrClient` as shown in the above script; in the inference handler, call this line: + +```python +# put this section in the model inference handler +feature = client.get_online_features(feature_table="nycTaxiCITable", + key='2020-04-15', + feature_names=['f_is_long_trip_distance', 'f_day_of_week']) +# `res` will be an array representing the features of that particular key. + + +# `model` will be a ML model that is loaded previously. +result = model.predict(feature) +``` From 0a3d06d7c0d10f4e8b4feca31fc3f3a847437512 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Thu, 25 Aug 2022 03:21:34 -0700 Subject: [PATCH 03/15] rename docs --- ...me-features.md => model-inference-with-feathr.md} | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) rename docs/how-to-guides/{consume-features.md => model-inference-with-feathr.md} (91%) diff --git a/docs/how-to-guides/consume-features.md b/docs/how-to-guides/model-inference-with-feathr.md similarity index 91% rename from docs/how-to-guides/consume-features.md rename to docs/how-to-guides/model-inference-with-feathr.md index 1593043ba..1782ccdf1 100644 --- a/docs/how-to-guides/consume-features.md +++ b/docs/how-to-guides/model-inference-with-feathr.md @@ -1,16 +1,15 @@ --- layout: default -title: Consuming Features in ML Platforms +title: Model Inference with Features from Feathr parent: How-to Guides --- -# Consuming Features in ML Platforms +# Model Inference with Features from Feathr -After you have materialized features in online store such as Redis, usually end users want to consume those features in production environment for model inference. 
+After you have materialized features in online store such as Redis, usually end users want to consume those features in production environment for model inference. With Feathr's online client, it is quite straightforward to do that. The sample code is as below, where users only need to configure the online store endpoint (if using Redis), and call `client.get_online_features()` to get the features for a particular key. - ```python ## put the section below into the initialization handler @@ -28,7 +27,7 @@ client = FeathrClient() # put this section in the model inference handler feature = client.get_online_features(feature_table="nycTaxiCITable", - key='2020-04-15', + key='2020-04-15', feature_names=['f_is_long_trip_distance', 'f_day_of_week']) # `res` will be an array representing the features of that particular key. @@ -49,7 +48,7 @@ In the initialization handler, initialize the environment variables and initiali ```python # put this section in the model inference handler feature = client.get_online_features(feature_table="nycTaxiCITable", - key='2020-04-15', + key='2020-04-15', feature_names=['f_is_long_trip_distance', 'f_day_of_week']) # `res` will be an array representing the features of that particular key. @@ -57,4 +56,3 @@ feature = client.get_online_features(feature_table="nycTaxiCITable", # `model` will be a ML model that is loaded previously. 
result = model.predict(feature) ``` - From dd55769a09c6d1418889ba5e2537bee7d93c5595 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Thu, 25 Aug 2022 03:22:35 -0700 Subject: [PATCH 04/15] Update model-inference-with-feathr.md --- docs/how-to-guides/model-inference-with-feathr.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/how-to-guides/model-inference-with-feathr.md b/docs/how-to-guides/model-inference-with-feathr.md index 1782ccdf1..de86a07ef 100644 --- a/docs/how-to-guides/model-inference-with-feathr.md +++ b/docs/how-to-guides/model-inference-with-feathr.md @@ -1,10 +1,10 @@ --- layout: default -title: Model Inference with Features from Feathr +title: Online Model Inference with Features from Feathr parent: How-to Guides --- -# Model Inference with Features from Feathr +# Online Model Inference with Features from Feathr After you have materialized features in online store such as Redis, usually end users want to consume those features in production environment for model inference. From 7e8e75a447702358031ba9f8ff625a7af7294c3c Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Thu, 25 Aug 2022 03:41:54 -0700 Subject: [PATCH 05/15] Update README.md --- docs/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/README.md b/docs/README.md index 6adcdd852..d22b07bb8 100644 --- a/docs/README.md +++ b/docs/README.md @@ -169,12 +169,12 @@ Follow the [quick start Jupyter Notebook](./samples/product_recommendation_demo. ## 🚀 Roadmap -For a complete roadmap with estimated dates, please [visit this page](https://github.com/linkedin/feathr/milestones?direction=asc&sort=title&state=open). 
- -- [x] Support streaming -- [x] Support common data sources +- [x] Support streaming features with transformation +- [x] Support common data sources and sinks - [x] Support feature store UI, including Lineage and Search functionalities +- [ ] Support a sandbox Feathr environment for better getting started experience - [ ] Support online transformation +- [ ] More online client libraries such as Java - [ ] Support feature versioning - [ ] Support feature monitoring - [ ] Support feature data deletion and retention From d8147ee998c98862e095a3965b3ab6d320b0a9fe Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Fri, 2 Sep 2022 01:49:50 -0700 Subject: [PATCH 06/15] update docs per feedback --- ...d-and-push-feathr-registry-docker-image.md | 36 +++++++++++++------ docs/dev_guide/cloud_resource_provision.md | 4 +-- docs/dev_guide/deploy-feathr-api-as-webapp.md | 28 ++++++++------- docs/dev_guide/feathr-core-code-structure.md | 2 +- docs/how-to-guides/azure-deployment-arm.md | 16 ++++++++- .../azure_resource_provision.json | 4 +-- 6 files changed, 61 insertions(+), 29 deletions(-) diff --git a/docs/dev_guide/build-and-push-feathr-registry-docker-image.md b/docs/dev_guide/build-and-push-feathr-registry-docker-image.md index 034b502df..a52612edf 100644 --- a/docs/dev_guide/build-and-push-feathr-registry-docker-image.md +++ b/docs/dev_guide/build-and-push-feathr-registry-docker-image.md @@ -6,7 +6,7 @@ parent: Developer Guides # How to build and push feathr registry docker image -This doc shows how to build feathr registry docker image locally and publish to registry. +This doc shows how to build feathr registry docker image locally and publish to DockerHub. ## Prerequisites @@ -28,32 +28,48 @@ Run **docker images** command, you will see newly created image listed in output docker images ``` -Run **docker run** command to test docker image locally: +Run **docker run** command to test docker image locally. 
+ +### Test SQL-based registry + +You need to setup the connection string `CONNECTION_STR` for the docker container, so that it knows which SQL-based registry is connected to. The connection string will be something like this: + +```bash +"Server=tcp:testregistry.database.windows.net,1433;Initial Catalog=testsql;Persist Security Info=False;User ID=feathr@feathrtestsql;Password=StrongPassword;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;" +``` + +Then you can test the docker locally by running this command: -### Test SQL registry ```bash docker run --env CONNECTION_STR= --env API_BASE=api/v1 -it --rm -p 3000:80 feathrfeaturestore/sql-registry ``` ### Test Purview registry + +You need to setup a few environment variables, include: + +- `PURVIEW_NAME` indicates the Purview service name +- `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET` indicates the service principal account to talk with Purview service. + ```bash docker run --env PURVIEW_NAME= --env AZURE_CLIENT_ID= --env AZURE_TENANT_ID= --env AZURE_CLIENT_SECRET= --env API_BASE=api/v1 -it --rm -p 3000:80 feathrfeaturestore/feathr-registry ``` ### Test SQL registry + RBAC + ```bash docker run --env REACT_APP_ENABLE_RBAC=true --env REACT_APP_AZURE_CLIENT_ID= --env REACT_APP_AZURE_TENANT_ID= --env CONNECTION_STR= --env API_BASE=api/v1 -it --rm -p 3000:80 feathrfeaturestore/feathr-registry ``` -After docker image launched, open web browser and navigate to ,verify both UI and backend api can work correctly. +After docker image launched, open web browser and navigate to ,verify both the Feathr UI and the registry backend (SQL/Purview) can work correctly. + +## Upload to DockerHub (For Feathr Release Manager) -## Upload to DockerHub Registry +The Feathr repository should have automatic CD pipelines to publish the docker container. -Login with feathrfeaturestore account and then run **docker push** command to publish docker image to DockerHub. 
Contact Feathr Team (@jainr, @blrchen) for credentials. +In case if the Feathr release manager wants to do it manually, login with feathrfeaturestore account and then run **docker push** command to publish docker image to DockerHub. Contact Feathr Team (@jainr, @blrchen) for credentials. ```bash docker login -docker push feathrfeaturestore/sql-registry -``` - - +docker push feathrfeaturestore/feathr-registry +``` \ No newline at end of file diff --git a/docs/dev_guide/cloud_resource_provision.md b/docs/dev_guide/cloud_resource_provision.md index 8ac07ac43..033030694 100644 --- a/docs/dev_guide/cloud_resource_provision.md +++ b/docs/dev_guide/cloud_resource_provision.md @@ -29,12 +29,12 @@ Invoke Deployment Script from GitHub Repo with parameter for Azure Region. Available regions can be checked with this command ```powershell - Get-AzLocation | select displayname,location +Get-AzLocation | select displayname,location ``` ```powershell - iwr https://raw.githubusercontent.com/linkedin/feathr/main/docs/how-to-guides/deployFeathr.ps1 -outfile ./deployFeathr.ps1; ./deployFeathr.ps1 -AzureRegion '{Assign Your Region}' +iwr https://raw.githubusercontent.com/linkedin/feathr/main/docs/how-to-guides/deployFeathr.ps1 -outfile ./deployFeathr.ps1; ./deployFeathr.ps1 -AzureRegion '{Assign Your Region}' ``` diff --git a/docs/dev_guide/deploy-feathr-api-as-webapp.md b/docs/dev_guide/deploy-feathr-api-as-webapp.md index a25abf817..8345597bf 100644 --- a/docs/dev_guide/deploy-feathr-api-as-webapp.md +++ b/docs/dev_guide/deploy-feathr-api-as-webapp.md @@ -6,6 +6,8 @@ parent: Developer Guides # Feathr REST API +> :warning: This document is out of date and will be updated in the future. + The REST API currently supports following functionalities: 1. Get Feature by Qualified Name @@ -38,7 +40,7 @@ Here are the steps to build the API as a docker container, push it to Azure Cont 1. 
Install Azure CLI by following instructions [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) -1. Create Azure Container Registry. First create the resource group. +2. Create Azure Container Registry. First create the resource group. ```bash az group create --name --location @@ -50,13 +52,13 @@ Here are the steps to build the API as a docker container, push it to Azure Cont az acr create --resource-group --name --sku Basic ``` -1. Login to your Azure container registry (ACR) account. +3. Login to your Azure container registry (ACR) account. ```bash $ az acr login --name ``` -1. Clone the repository and navigate to api folder +4. Clone the repository and navigate to api folder ```bash $ git clone git@github.com:linkedin/feathr.git @@ -65,14 +67,14 @@ Here are the steps to build the API as a docker container, push it to Azure Cont ``` -1. Build the docker container locally, you need to have docker installed locally and have it running. To set up docker on your machine follow the instructions [here](https://docs.docker.com/get-started/) +5. Build the docker container locally, you need to have docker installed locally and have it running. To set up docker on your machine follow the instructions [here](https://docs.docker.com/get-started/) **Note: Note: /image_name is not a mandatory format for specifying the name of the image.It’s just a useful convention to avoid tagging your image again when you need to push it to a registry. It can be anything you want in the format below** ```bash $ docker build -t feathr/api . ``` -1. Run docker images command and you will see your newly created image +6. Run docker images command and you will see your newly created image ```bash $ docker images @@ -81,15 +83,15 @@ Here are the steps to build the API as a docker container, push it to Azure Cont feathr/api latest a647ea749b9b 5 minutes ago 529MB ``` -1. 
Before you can push an image to your registry, you must tag it with the fully qualified name of your ACR login server. The login server name is in the format .azurecr.io (all lowercase), for example, mycontainerregistry007.azurecr.io. Tag the image +7. Before you can push an image to your registry, you must tag it with the fully qualified name of your ACR login server. The login server name is in the format .azurecr.io (all lowercase), for example, mycontainerregistry007.azurecr.io. Tag the image ```bash $ docker tag feathr/api:latest feathracr.azurecr.io/feathr/api:latest ``` -1. Push the image to the registry +8. Push the image to the registry ```bash $ docker push feathracr.azurecr.io/feathr/api:latest ``` -1. List the images from your registry to see your recently pushed image +9. List the images from your registry to see your recently pushed image ``` az acr repository list --name feathracr --output table ``` @@ -103,20 +105,20 @@ Here are the steps to build the API as a docker container, push it to Azure Cont ## Deploy image to Azure WebApp for Containers 1. Go to [Azure portal](https://portal.azure.com) and search for your container registry -1. Select repositories from the left pane and click latest tag. Click on the three dots on right side of the tag and select **Deploy to WebApp** option. If you see the **Deploy to WebApp** option greyed out, you would have to enable Admin User on the registry by Updating it. +2. Select repositories from the left pane and click latest tag. Click on the three dots on right side of the tag and select **Deploy to WebApp** option. If you see the **Deploy to WebApp** option greyed out, you would have to enable Admin User on the registry by Updating it. ![Container Image 1](../images/feathr_api_image_latest.png) ![Container Image 2](../images/feathr_api_image_latest_options.png) -1. Provide a name for the deployed webapp, along with the subscription to deploy app into, the resource group and the appservice plan +3. 
Provide a name for the deployed webapp, along with the subscription to deploy app into, the resource group and the appservice plan ![Container Image](../images/feathr_api_image_latest_deployment.png) -1. You will get the notification that your app has been successfully deployed, click on **Go to Resource** button. +4. You will get the notification that your app has been successfully deployed, click on **Go to Resource** button. -1. On the App overview page go to the URL (https://.azurewebsites.net/docs) for deployed app (it's under URL on the app overview page) and you should see the API documentation. +5. On the App overview page go to the URL (https://.azurewebsites.net/docs) for deployed app (it's under URL on the app overview page) and you should see the API documentation. ![API docs](../images/api-docs.png) -Congratulations you have successfully deployed the Feathr API. +Congratulations you have successfully deployed the Feathr REST API. diff --git a/docs/dev_guide/feathr-core-code-structure.md b/docs/dev_guide/feathr-core-code-structure.md index ab812f32e..acf0c8c93 100644 --- a/docs/dev_guide/feathr-core-code-structure.md +++ b/docs/dev_guide/feathr-core-code-structure.md @@ -1,6 +1,6 @@ --- layout: default -title: Documentation Guideline +title: Feathr Core Code Structure parent: Developer Guides --- diff --git a/docs/how-to-guides/azure-deployment-arm.md b/docs/how-to-guides/azure-deployment-arm.md index a06033bbe..42cea7ff1 100644 --- a/docs/how-to-guides/azure-deployment-arm.md +++ b/docs/how-to-guides/azure-deployment-arm.md @@ -17,7 +17,9 @@ The provided Azure Resource Manager (ARM) template deploys the following resourc 7. Azure Event Hub 8. Azure Redis -Please note, you need to have **owner access** in the resource group you are deploying this in. Owner access is required to assign role to managed identity within ARM template so it can access key vault and store secrets. 
+Please note, you need to have **owner access** in the resource group you are deploying this in. Owner access is required to assign role to managed identity within ARM template so it can access key vault and store secrets. If you don't have such permission, you might want to contact your IT admin to see if they can do that. + +Although we recommend end users deploy the resources using the ARM template, we understand that in many situations where users want to reuse existing resources instead of creating new resources; or users have many other permission issues. See [Manually connecting existing resources](#manually-connecting-existing-resources) for more details. ## Architecture @@ -147,3 +149,15 @@ Follow the quick start guide [here](https://linkedin.github.io/feathr/quickstart - [SQL Registry DB Schema](https://github.com/linkedin/feathr/blob/main/registry/sql-registry/scripts/schema.sql) - [RBAC DB Schema](https://github.com/linkedin/feathr/blob/main/registry/access_control/scripts/schema.sql) + + +## Manually connecting existing resources + + +Although we recommend end users deploy the resources using the ARM template, we understand that in many situations where users want to reuse existing resources instead of creating new resources; or users have many other permission issues. + +The "source of truth" deployment template is still the [JSON file describing the Azure resources (known as "ARM template")](azure_resource_provision.json). Essentially what we are doing below is just replicate the ARM template manually. + +The part that needs most attention is the Feathr registry deployment. Let's say the developer want to reuse an existing SQL database, there are a few steps: + +1. 
\ No newline at end of file diff --git a/docs/how-to-guides/azure_resource_provision.json b/docs/how-to-guides/azure_resource_provision.json index 827757b8c..58300fae4 100644 --- a/docs/how-to-guides/azure_resource_provision.json +++ b/docs/how-to-guides/azure_resource_provision.json @@ -35,13 +35,13 @@ "sqlAdminUsername": { "type": "String", "metadata": { - "description": "Specifies the username for admin" + "description": "Specifies the username for SQL Database admin" } }, "sqlAdminPassword": { "type": "SecureString", "metadata": { - "description": "Specifies the password for admin" + "description": "Specifies the password for SQL Database admin" } }, "registryBackend": { From 8aa79c18140aadeb18088fb1ba1e05888e2b7652 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Fri, 2 Sep 2022 01:57:24 -0700 Subject: [PATCH 07/15] Update streaming-source-ingestion.md --- .../streaming-source-ingestion.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/how-to-guides/streaming-source-ingestion.md b/docs/how-to-guides/streaming-source-ingestion.md index 4a59abc48..499efef5c 100644 --- a/docs/how-to-guides/streaming-source-ingestion.md +++ b/docs/how-to-guides/streaming-source-ingestion.md @@ -1,12 +1,12 @@ --- layout: default -title: Streaming Source Ingestion +title: Streaming Source Ingestion and Feature Definition parent: How-to Guides --- -# Streaming feature ingestion +# Streaming Source Ingestion and Feature Definition -Feathr supports defining features from a stream source (for example Kafka) and sink the features into an online store (such as Redis). This is very useful if you need up-to-date features for online store, for example when user clicks on the website, that web log event is usually sent to Kafka, and data scientists might need some features immediately, such as the browser used in this particular event. 
The steps are as below: +Feathr supports defining features from a stream source (for example Kafka) with transformations, and sink the features into an online store (such as Redis). This is very useful if you need up-to-date features for online store, for example when user clicks on the website, that web log event is usually sent to Kafka, and data scientists might need some features immediately, such as the browser used in this particular event. The steps are as below: ## Define Kafka streaming input source @@ -35,13 +35,13 @@ stream_source = KafKaSource(name="kafkaStreamingSource", ) ``` -You may need to produce data and send them into Kafka as this data source in advance. Please check [Kafka data source producer](../../feathr_project/test/prep_azure_kafka_test_data.py) as a reference. Also you should keep this producer running which means there are data stream keep coming into Kafka while calling the 'materialize_features' below. +You may need to produce data and send them into Kafka as this data source in advance. Please check [Kafka data source producer](https://github.com/linkedin/feathr/blob/main/feathr_project/test/prep_azure_kafka_test_data.py) as a reference. Also you should keep this producer running which means there are data stream keep coming into Kafka while calling the 'materialize_features' below. ## Define feature definition with the Kafka source You can then define features. They are mostly the same with the [regular feature definition](../concepts/feature-definition.md). -Note that for the `transform` part, only row level transformation is allowed in streaming anchor at the moment, i.e. the transformations listed in [Spark SQL Built-in Functions](https://spark.apache.org/docs/latest/api/sql/) are supported. Other transformations support are in the roadmap. +Note that for the `transform` part, only row level transformation is allowed in streaming anchor at the moment, i.e. 
the transformations listed in [Spark SQL Built-in Functions](https://spark.apache.org/docs/latest/api/sql/) are supported. Users can also define customized [Spark SQL functions](./feathr-spark-udf-advanced.md). For example, you can specify to do a row-level transformation like `trips_today + randn() * cos(trips_today)` for your input data. @@ -90,14 +90,14 @@ res = client.multi_get_online_features('kafkaSampleDemoFeature', ['1', '2'], ['f ``` -You can also refer to the [test case](../../feathr_project/test/test_azure_kafka_e2e.py) for more details. +You can also refer to the [test case](https://github.com/linkedin/feathr/blob/main/feathr_project/test/test_azure_kafka_e2e.py) for more details. ## Kafka configuration -Please refer to the [Feathr Configuration Doc](./feathr-configuration-and-env.md#kafkasasljaasconfig) for more details on the credentials. +Please refer to the [Feathr Configuration Doc](./feathr-configuration-and-env.md#KAFKA_SASL_JAAS_CONFIG) for more details on the credentials. -## Event Hub monitor +## Event Hub monitoring -Please check monitor panel on your 'Event Hub' overview page while running materialize to make sure there are both incoming and outgoing messages, like below graph. Otherwise, you may not get anything from 'get_online_features' since the source is empty. +If you feel something is wrong, you can check the monitor panel on your 'Event Hub' overview page while running the Feathr materialization job, to make sure there are both incoming and outgoing messages, like the graph below. Otherwise, you may not get anything from `get_online_features()` since the source is empty. 
![Kafka Monitor Page](../images/kafka-messages-monitor.png) \ No newline at end of file From 547c832d9b4d79aab272d75beb1fbbde8b934e0e Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Fri, 2 Sep 2022 11:12:24 -0700 Subject: [PATCH 08/15] update docs --- docs/README.md | 26 +++++++++---------- docs/how-to-guides/azure-deployment-arm.md | 6 ++--- .../model-inference-with-feathr.md | 2 +- 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/README.md b/docs/README.md index 03faad3dd..e68538251 100644 --- a/docs/README.md +++ b/docs/README.md @@ -155,26 +155,26 @@ Follow the [quick start Jupyter Notebook](./samples/product_recommendation_demo. ![Architecture Diagram](./images/architecture.png) -| Feathr component | Cloud Integrations | -| ------------------------------- | --------------------------------------------------------------------------- | -| Offline store – Object Store | Azure Blob Storage, Azure ADLS Gen2, AWS S3 | -| Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools, Azure SQL in VM, Snowflake | -| Streaming Source | Kafka, EventHub | -| Online store | Redis, Azure Cosmos DB (coming soon), Aerospike (coming soon) | -| Feature Registry and Governance | Azure Purview, ANSI SQL such as Azure SQL Server | -| Compute Engine | Azure Synapse Spark Pools, Databricks | -| Machine Learning Platform | Azure Machine Learning, Jupyter Notebook, Databricks Notebook | -| File Format | Parquet, ORC, Avro, JSON, Delta Lake, CSV | -| Credentials | Azure Key Vault | +| Feathr component | Cloud Integrations | +| ------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | +| Offline store – Object Store | Azure Blob Storage, Azure ADLS Gen2, AWS S3 | +| Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools, Azure SQL in VM, Snowflake | +| Streaming Source | Kafka, EventHub | +| Online store | Redis, [Azure 
Cosmos DB](https://linkedin.github.io/feathr/how-to-guides/jdbc-cosmos-notes.html#using-cosmosdb-as-the-online-store), Aerospike (coming soon) | +| Feature Registry and Governance | Azure Purview, ANSI SQL such as Azure SQL Server | +| Compute Engine | Azure Synapse Spark Pools, Databricks | +| Machine Learning Platform | Azure Machine Learning, Jupyter Notebook, Databricks Notebook | +| File Format | Parquet, ORC, Avro, JSON, Delta Lake, CSV | +| Credentials | Azure Key Vault | ## 🚀 Roadmap - [x] Support streaming features with transformation -- [x] Support common data sources and sinks +- [x] Support common data sources and sinks. Read more in the [Cloud Integrations and Architecture](#️-cloud-integrations-and-architecture) part. - [x] Support feature store UI, including Lineage and Search functionalities - [ ] Support a sandbox Feathr environment for better getting started experience - [ ] Support online transformation -- [ ] More online client libraries such as Java +- [ ] More Feathr online client libraries such as Java - [ ] Support feature versioning - [ ] Support feature monitoring - [ ] Support feature data deletion and retention diff --git a/docs/how-to-guides/azure-deployment-arm.md b/docs/how-to-guides/azure-deployment-arm.md index 42cea7ff1..8c11d579f 100644 --- a/docs/how-to-guides/azure-deployment-arm.md +++ b/docs/how-to-guides/azure-deployment-arm.md @@ -156,8 +156,8 @@ Follow the quick start guide [here](https://linkedin.github.io/feathr/quickstart Although we recommend end users deploy the resources using the ARM template, we understand that in many situations where users want to reuse existing resources instead of creating new resources; or users have many other permission issues. -The "source of truth" deployment template is still the [JSON file describing the Azure resources (known as "ARM template")](azure_resource_provision.json). Essentially what we are doing below is just replicate the ARM template manually. 
+The "source of truth" deployment template is still the [JSON file describing the Azure resources (known as "ARM template")](azure_resource_provision.json). Essentially what we are doing below is just replicate the ARM template manually. Also note that this requires some basic understanding -The part that needs most attention is the Feathr registry deployment. Let's say the developer want to reuse an existing SQL database, there are a few steps: +The part that needs most attention is the Feathr registry deployment which is a bit complex, so we will focus on that part. -1. \ No newline at end of file +There are two components from deployment point of view: An Azure WebApp hosting a container containing the Feathr UI and the REST API, and the actual backend storing the metadata of those features. \ No newline at end of file diff --git a/docs/how-to-guides/model-inference-with-feathr.md b/docs/how-to-guides/model-inference-with-feathr.md index de86a07ef..74dd7c68b 100644 --- a/docs/how-to-guides/model-inference-with-feathr.md +++ b/docs/how-to-guides/model-inference-with-feathr.md @@ -6,7 +6,7 @@ parent: How-to Guides # Online Model Inference with Features from Feathr -After you have materialized features in online store such as Redis, usually end users want to consume those features in production environment for model inference. +After you have materialized features in online store such as Redis or Azure Cosmos DB, usually end users want to consume those features in production environment for model inference. With Feathr's online client, it is quite straightforward to do that. The sample code is as below, where users only need to configure the online store endpoint (if using Redis), and call `client.get_online_features()` to get the features for a particular key. 
From e45bac992bdc9c1d11ad36b1d5754ddddd45abb0 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Sat, 3 Sep 2022 07:50:57 -0700 Subject: [PATCH 09/15] update docs --- ...d-and-push-feathr-registry-docker-image.md | 6 ++++- docs/how-to-guides/azure-deployment-arm.md | 25 +++++++++++++------ 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/docs/dev_guide/build-and-push-feathr-registry-docker-image.md b/docs/dev_guide/build-and-push-feathr-registry-docker-image.md index a52612edf..88df159d5 100644 --- a/docs/dev_guide/build-and-push-feathr-registry-docker-image.md +++ b/docs/dev_guide/build-and-push-feathr-registry-docker-image.md @@ -72,4 +72,8 @@ In case if the Feathr release manager wants to do it manually, login with feathr ```bash docker login docker push feathrfeaturestore/feathr-registry -``` \ No newline at end of file +``` + +## Published Feathr Registry Image + +The published feathr feature registry is located in [DockerHub here](https://hub.docker.com/r/feathrfeaturestore/feathr-registry). \ No newline at end of file diff --git a/docs/how-to-guides/azure-deployment-arm.md b/docs/how-to-guides/azure-deployment-arm.md index 8c11d579f..8ee17a955 100644 --- a/docs/how-to-guides/azure-deployment-arm.md +++ b/docs/how-to-guides/azure-deployment-arm.md @@ -17,7 +17,7 @@ The provided Azure Resource Manager (ARM) template deploys the following resourc 7. Azure Event Hub 8. Azure Redis -Please note, you need to have **owner access** in the resource group you are deploying this in. Owner access is required to assign role to managed identity within ARM template so it can access key vault and store secrets. If you don't have such permission, you might want to contact your IT admin to see if they can do that. +Please note, you need to have **owner access** in the resource group you are deploying this in. Owner access is required to assign role to managed identity within ARM template so it can access key vault and store secrets. 
If you don't have such permission, you might want to contact your IT admin to see if they can do that. Although we recommend end users deploy the resources using the ARM template, we understand that in many situations where users want to reuse existing resources instead of creating new resources; or users have many other permission issues. See [Manually connecting existing resources](#manually-connecting-existing-resources) for more details. @@ -113,7 +113,6 @@ https://{resource_prefix}webapp.azurewebsites.net ![feathr ui landing page](../images/feathr-ui-landingpage.png) - ### 5. Initialize RBAC access table (Optional) If you want to use RBAC access for your deployment, you also need to manually initialize the user access table. Replace `[your-email-account]` with the email account that you are currently using, and this email will be the global admin for Feathr feature registry. @@ -150,14 +149,26 @@ Follow the quick start guide [here](https://linkedin.github.io/feathr/quickstart - [RBAC DB Schema](https://github.com/linkedin/feathr/blob/main/registry/access_control/scripts/schema.sql) - ## Manually connecting existing resources +Although we recommend end users deploy the resources using the ARM template, we understand that in many situations where users want to reuse existing resources instead of creating new resources; or users have many other permission issues. -Although we recommend end users deploy the resources using the ARM template, we understand that in many situations where users want to reuse existing resources instead of creating new resources; or users have many other permission issues. - -The "source of truth" deployment template is still the [JSON file describing the Azure resources (known as "ARM template")](azure_resource_provision.json). Essentially what we are doing below is just replicate the ARM template manually. 
Also note that this requires some basic understanding +The "source of truth" deployment template is still the [JSON file describing the Azure resources (known as "ARM template")](azure_resource_provision.json). Essentially what we are doing below is just replicate the ARM template manually. Also note that this requires some basic understanding on how Azure works and some components within Azure. The part that needs most attention is the Feathr registry deployment which is a bit complex, so we will focus on that part. -There are two components from deployment point of view: An Azure WebApp hosting a container containing the Feathr UI and the REST API, and the actual backend storing the metadata of those features. \ No newline at end of file +There are two components from deployment point of view: An Azure WebApp hosting a container containing the Feathr UI and the REST API, and the actual backend storing the metadata of those features. + +### Deploy the WebApp and have the right settings + +The WebApp will be using the docker container [here](https://hub.docker.com/r/feathrfeaturestore/feathr-registry). Choose the one you want to deploy. After that, you will need to configure a few environment variables for the WebApp so that the containers knows some of the information. They include: + +DOCKER_REGISTRY_SERVER_URL, REACT_APP_AZURE_CLIENT_ID, REACT_APP_AZURE_TENANT_ID, API_BASE, CONNECTION_STR (for SQL databases), REACT_APP_ENABLE_RBAC, PURVIEW_NAME, AZURE_CLIENT_ID. + +The explanation for those environment variables can be found in the [ARM Template](azure_resource_provision.json). + +### Make sure SQL Databases are initialized correctly + +The second big chunk of work is to initialize the schema of SQL databases. There are two parts: + +1. 
\ No newline at end of file From e9a5c5f2bcbb981582019b7d24dc00b7bcce4308 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Tue, 13 Sep 2022 07:03:39 -0700 Subject: [PATCH 10/15] Update azure-deployment-arm.md --- docs/how-to-guides/azure-deployment-arm.md | 24 ---------------------- 1 file changed, 24 deletions(-) diff --git a/docs/how-to-guides/azure-deployment-arm.md b/docs/how-to-guides/azure-deployment-arm.md index 8ee17a955..9b9563067 100644 --- a/docs/how-to-guides/azure-deployment-arm.md +++ b/docs/how-to-guides/azure-deployment-arm.md @@ -148,27 +148,3 @@ Follow the quick start guide [here](https://linkedin.github.io/feathr/quickstart - [SQL Registry DB Schema](https://github.com/linkedin/feathr/blob/main/registry/sql-registry/scripts/schema.sql) - [RBAC DB Schema](https://github.com/linkedin/feathr/blob/main/registry/access_control/scripts/schema.sql) - -## Manually connecting existing resources - -Although we recommend end users deploy the resources using the ARM template, we understand that in many situations where users want to reuse existing resources instead of creating new resources; or users have many other permission issues. - -The "source of truth" deployment template is still the [JSON file describing the Azure resources (known as "ARM template")](azure_resource_provision.json). Essentially what we are doing below is just replicate the ARM template manually. Also note that this requires some basic understanding on how Azure works and some components within Azure. - -The part that needs most attention is the Feathr registry deployment which is a bit complex, so we will focus on that part. - -There are two components from deployment point of view: An Azure WebApp hosting a container containing the Feathr UI and the REST API, and the actual backend storing the metadata of those features. 
- -### Deploy the WebApp and have the right settings - -The WebApp will be using the docker container [here](https://hub.docker.com/r/feathrfeaturestore/feathr-registry). Choose the one you want to deploy. After that, you will need to configure a few environment variables for the WebApp so that the containers knows some of the information. They include: - -DOCKER_REGISTRY_SERVER_URL, REACT_APP_AZURE_CLIENT_ID, REACT_APP_AZURE_TENANT_ID, API_BASE, CONNECTION_STR (for SQL databases), REACT_APP_ENABLE_RBAC, PURVIEW_NAME, AZURE_CLIENT_ID. - -The explanation for those environment variables can be found in the [ARM Template](azure_resource_provision.json). - -### Make sure SQL Databases are initialized correctly - -The second big chunk of work is to initialize the schema of SQL databases. There are two parts: - -1. \ No newline at end of file From 873687f7f9a714000f2e630415e7cb9e7536ab50 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Tue, 13 Sep 2022 07:28:36 -0700 Subject: [PATCH 11/15] Update model-inference-with-feathr.md --- docs/how-to-guides/model-inference-with-feathr.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/how-to-guides/model-inference-with-feathr.md b/docs/how-to-guides/model-inference-with-feathr.md index 74dd7c68b..f9443cbd4 100644 --- a/docs/how-to-guides/model-inference-with-feathr.md +++ b/docs/how-to-guides/model-inference-with-feathr.md @@ -8,7 +8,7 @@ parent: How-to Guides After you have materialized features in online store such as Redis or Azure Cosmos DB, usually end users want to consume those features in production environment for model inference. -With Feathr's online client, it is quite straightforward to do that. The sample code is as below, where users only need to configure the online store endpoint (if using Redis), and call `client.get_online_features()` to get the features for a particular key. 
+With Feathr's [online client](https://feathr.readthedocs.io/en/latest/#feathr.FeathrClient.get_online_features), it is quite straightforward to do that. The sample code is as below, where users only need to configure the online store endpoint (if using Redis), and call `client.get_online_features()` to get the features for a particular key. ```python @@ -52,7 +52,6 @@ feature = client.get_online_features(feature_table="nycTaxiCITable", feature_names=['f_is_long_trip_distance', 'f_day_of_week']) # `res` will be an array representing the features of that particular key. - # `model` will be a ML model that is loaded previously. result = model.predict(feature) ``` From 0cbee7fcc430691b6d3cf924d25ba1fd01744c74 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Tue, 13 Sep 2022 07:30:09 -0700 Subject: [PATCH 12/15] add sign off message Signed-off-by: Xiaoyong Zhu xiaoyzhu@outlook.com --- docs/how-to-guides/model-inference-with-feathr.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/how-to-guides/model-inference-with-feathr.md b/docs/how-to-guides/model-inference-with-feathr.md index f9443cbd4..b9277555a 100644 --- a/docs/how-to-guides/model-inference-with-feathr.md +++ b/docs/how-to-guides/model-inference-with-feathr.md @@ -51,7 +51,6 @@ feature = client.get_online_features(feature_table="nycTaxiCITable", key='2020-04-15', feature_names=['f_is_long_trip_distance', 'f_day_of_week']) # `res` will be an array representing the features of that particular key. - # `model` will be a ML model that is loaded previously. 
result = model.predict(feature) ``` From 8f9bdbba72b85323b14342b740ce4f160b7a6ed5 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Tue, 13 Sep 2022 08:55:41 -0700 Subject: [PATCH 13/15] fix comments --- docs/README.md | 2 +- docs/dev_guide/build-and-push-feathr-registry-docker-image.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/README.md b/docs/README.md index e68538251..ae5748cfb 100644 --- a/docs/README.md +++ b/docs/README.md @@ -160,7 +160,7 @@ Follow the [quick start Jupyter Notebook](./samples/product_recommendation_demo. | Offline store – Object Store | Azure Blob Storage, Azure ADLS Gen2, AWS S3 | | Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools, Azure SQL in VM, Snowflake | | Streaming Source | Kafka, EventHub | -| Online store | Redis, [Azure Cosmos DB](https://linkedin.github.io/feathr/how-to-guides/jdbc-cosmos-notes.html#using-cosmosdb-as-the-online-store), Aerospike (coming soon) | +| Online store | Redis, [Azure Cosmos DB](https://feathr-ai.github.io/feathr/how-to-guides/jdbc-cosmos-notes.html#using-cosmosdb-as-the-online-store), Aerospike (coming soon) | | Feature Registry and Governance | Azure Purview, ANSI SQL such as Azure SQL Server | | Compute Engine | Azure Synapse Spark Pools, Databricks | | Machine Learning Platform | Azure Machine Learning, Jupyter Notebook, Databricks Notebook | diff --git a/docs/dev_guide/build-and-push-feathr-registry-docker-image.md b/docs/dev_guide/build-and-push-feathr-registry-docker-image.md index 88df159d5..873c6a141 100644 --- a/docs/dev_guide/build-and-push-feathr-registry-docker-image.md +++ b/docs/dev_guide/build-and-push-feathr-registry-docker-image.md @@ -65,7 +65,7 @@ After docker image launched, open web browser and navigate to Date: Tue, 13 Sep 2022 08:57:38 -0700 Subject: [PATCH 14/15] Delete deploy-feathr-api-as-webapp.md --- docs/dev_guide/deploy-feathr-api-as-webapp.md | 124 ------------------ 1 file changed, 124 deletions(-) delete mode 100644 
docs/dev_guide/deploy-feathr-api-as-webapp.md diff --git a/docs/dev_guide/deploy-feathr-api-as-webapp.md b/docs/dev_guide/deploy-feathr-api-as-webapp.md deleted file mode 100644 index 8345597bf..000000000 --- a/docs/dev_guide/deploy-feathr-api-as-webapp.md +++ /dev/null @@ -1,124 +0,0 @@ ---- -layout: default -title: Feathr REST API Deployment -parent: Developer Guides ---- - -# Feathr REST API - -> :warning: This document is out of date and will be updated in the future. - -The REST API currently supports following functionalities: - -1. Get Feature by Qualified Name -2. Get Feature by GUID -3. Get List of Features -4. Get Lineage for a Feature - -## Build and run locally - -### Install - -**NOTE:** You can run the following command in your local python environment or in your Azure Virtual machine. -You can install dependencies through the requirements file - -```bash -pip install -r requirements.txt -``` - -### Run - -This command will start the uvicorn server locally and will dynamically load your changes. - -```bash -uvicorn api:app --port 8080 --reload -``` - -## Build and deploy on Azure - -Here are the steps to build the API as a docker container, push it to Azure Container registry and then deploy it as webapp. The instructions below are for Mac/Linux but should work on Windows too. You might have to use sudo command or run docker as administrator on windows if you don't have right privileges. - -1. Install Azure CLI by following instructions [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) - -2. Create Azure Container Registry. First create the resource group. - - ```bash - az group create --name --location - ``` - - Then create the container registry - - ```bash - az acr create --resource-group --name --sku Basic - ``` - -3. Login to your Azure container registry (ACR) account. - - ```bash - $ az acr login --name - ``` - -4. 
Clone the repository and navigate to api folder - - ```bash - $ git clone git@github.com:linkedin/feathr.git - - $ cd feathr_project/feathr/api - - ``` - -5. Build the docker container locally, you need to have docker installed locally and have it running. To set up docker on your machine follow the instructions [here](https://docs.docker.com/get-started/) - **Note: Note: /image_name is not a mandatory format for specifying the name of the image.It’s just a useful convention to avoid tagging your image again when you need to push it to a registry. It can be anything you want in the format below** - - ```bash - $ docker build -t feathr/api . - ``` - -6. Run docker images command and you will see your newly created image - - ```bash - $ docker images - - REPOSITORY TAG IMAGE ID CREATED SIZE - feathr/api latest a647ea749b9b 5 minutes ago 529MB - ``` - -7. Before you can push an image to your registry, you must tag it with the fully qualified name of your ACR login server. The login server name is in the format .azurecr.io (all lowercase), for example, mycontainerregistry007.azurecr.io. Tag the image - ```bash - $ docker tag feathr/api:latest feathracr.azurecr.io/feathr/api:latest - ``` -8. Push the image to the registry - ```bash - $ docker push feathracr.azurecr.io/feathr/api:latest - ``` -9. List the images from your registry to see your recently pushed image - ``` - az acr repository list --name feathracr --output table - ``` - Output: - ``` - Result - ---------- - feathr/api - ``` - -## Deploy image to Azure WebApp for Containers - -1. Go to [Azure portal](https://portal.azure.com) and search for your container registry -2. Select repositories from the left pane and click latest tag. Click on the three dots on right side of the tag and select **Deploy to WebApp** option. If you see the **Deploy to WebApp** option greyed out, you would have to enable Admin User on the registry by Updating it. 
- - ![Container Image 1](../images/feathr_api_image_latest.png) - - ![Container Image 2](../images/feathr_api_image_latest_options.png) - -3. Provide a name for the deployed webapp, along with the subscription to deploy app into, the resource group and the appservice plan - - ![Container Image](../images/feathr_api_image_latest_deployment.png) - -4. You will get the notification that your app has been successfully deployed, click on **Go to Resource** button. - -5. On the App overview page go to the URL (https://.azurewebsites.net/docs) for deployed app (it's under URL on the app overview page) and you should see the API documentation. - - ![API docs](../images/api-docs.png) - -Congratulations you have successfully deployed the Feathr REST API. From cde63d2651fb3a5f6b72fb326cbe0970911336b3 Mon Sep 17 00:00:00 2001 From: Xiaoyong Zhu Date: Tue, 13 Sep 2022 09:22:56 -0700 Subject: [PATCH 15/15] Update model-inference-with-feathr.md --- docs/how-to-guides/model-inference-with-feathr.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/how-to-guides/model-inference-with-feathr.md b/docs/how-to-guides/model-inference-with-feathr.md index b9277555a..c2b5a8e7c 100644 --- a/docs/how-to-guides/model-inference-with-feathr.md +++ b/docs/how-to-guides/model-inference-with-feathr.md @@ -17,7 +17,7 @@ import os from feathr import FeathrClient # Set Redis endpoint -os.environ['online_store__redis__host'] = "xx.redis.cache.windows.net" +os.environ['online_store__redis__host'] = ".redis.cache.windows.net" os.environ['online_store__redis__port'] = "6380" os.environ['online_store__redis__ssl_enabled'] = "True" os.environ['REDIS_PASSWORD'] = ""