Update documentation for Feathr UI, registry, and architecture. #534

Merged
merged 23 commits into from
Aug 10, 2022
Changes from 20 commits
32 changes: 19 additions & 13 deletions docs/README.md
Expand Up @@ -33,7 +33,7 @@ Feathr automatically computes your feature values and joins them to your trainin

Feathr has native integrations with Databricks and Azure Synapse:

Follow the [Feathr ARM deployment guide ](https://linkedin.github.io/feathr/how-to-guides/azure-deployment-arm.html) to run Feathr on Azure. This allows you to quickly get started with automated deployment using Azure Resource Manager template.
Follow the [Feathr ARM deployment guide](https://linkedin.github.io/feathr/how-to-guides/azure-deployment-arm.html) to run Feathr on Azure. This allows you to quickly get started with automated deployment using an Azure Resource Manager template.

If you want to set up everything manually, you can check out the [Feathr CLI deployment guide](https://linkedin.github.io/feathr/how-to-guides/azure-deployment-cli.html) to run Feathr on Azure. This allows you to understand what is going on and set up one resource at a time.

Expand All @@ -60,9 +60,22 @@ Or use the latest code from GitHub:
pip install git+https://github.com/linkedin/feathr.git#subdirectory=feathr_project
```

## 🔡 Feathr Examples
## 🔡 Feathr Highlighted Capabilities

Please read [Feathr Capabilities](https://linkedin.github.io/feathr/concepts/feathr-capabilities.html) for more examples. Below are a few selected ones:
Please read [Feathr Full Capabilities](./concepts/feathr-capabilities.md) for more examples. Below are a few selected ones:

### Feathr UI

Feathr provides an intuitive UI so you can search and explore all the available features and their corresponding lineages.

You can use Feathr UI to search features, identify data sources, track feature lineages, and manage access controls. Check out the latest live demo [here](https://aka.ms/feathrdemo) to see what Feathr UI can do for you. Use one of the following accounts when you are prompted to log in:

- A work or school organization account, including Office 365 subscribers.
- A personal Microsoft account, that is, an account that can access Skype, Outlook.com, OneDrive, and Xbox LIVE.

![Feathr UI](./images/feathr-ui.png)

For more information on the Feathr UI and the registry behind it, please refer to [Feathr Feature Registry](./concepts/feature-registry.md).

### Rich UDF Support

Expand All @@ -81,7 +94,7 @@ batch_source = HdfsSource(name="nycTaxiBatchSource",
timestamp_format="yyyy-MM-dd HH:mm:ss")
```

### Defining Window Aggregation Features
### Defining Window Aggregation Features with Point-in-time correctness

```python
agg_features = [Feature(name="f_location_avg_fare",
Expand Down Expand Up @@ -131,13 +144,6 @@ Read [Point-in-time Correctness and Point-in-time Join in Feathr](https://linked

Follow the [quick start Jupyter Notebook](./samples/product_recommendation_demo.ipynb) to try it out. There is also a companion [quick start guide](https://linkedin.github.io/feathr/quickstart_synapse.html) containing a bit more explanation on the notebook.

### Feathr UI

You can use Feathr UI to search features, identify data sources, track feature lineages and manage access controls. Check out the latest live demo [here](https://aka.ms/feathrdemo) to see what Feathr UI can do for you. Use one of following accounts when you are prompted to login:

- A work or school organization account, includes Office 365 subscribers.
- Microsoft personal account, this means an account can access to Skype, Outlook.com, OneDrive, and Xbox LIVE.

## 🗣️ Tech Talks on Feathr

- [Introduction to Feathr - Beginner's guide](https://www.youtube.com/watch?v=gZg01UKQMTY)
Expand All @@ -155,7 +161,7 @@ You can use Feathr UI to search features, identify data sources, track feature l
| Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools, Azure SQL in VM, Snowflake |
| Streaming Source | Kafka, EventHub |
| Online store | Azure Cache for Redis |
| Feature Registry and Governance | Azure Purview |
| Feature Registry and Governance | Azure Purview, ANSI SQL such as Azure SQL Server |
| Compute Engine | Azure Synapse Spark Pools, Databricks |
| Machine Learning Platform | Azure Machine Learning, Jupyter Notebook, Databricks Notebook |
| File Format | Parquet, ORC, Avro, JSON, Delta Lake |
Expand All @@ -167,10 +173,10 @@ For a complete roadmap with estimated dates, please [visit this page](https://gi

- [x] Support streaming
- [x] Support common data sources
- [x] Support feature store UI, including Lineage and Search functionalities
- [ ] Support online transformation
- [ ] Support feature versioning
- [ ] Support feature monitoring
- [ ] Support feature store UI, including Lineage and Search functionalities
- [ ] Support feature data deletion and retention

## 👨‍👨‍👦‍👦 Community Guidelines
10 changes: 5 additions & 5 deletions docs/concepts/feature-definition.md
Expand Up @@ -60,11 +60,11 @@ request_anchor = FeatureAnchor(name="request_features",
features=features)
```

For the features field above, there are two different types, simple anchored features and window aggregation features.
The `features` field above accepts two types of features: anchor features without aggregations, and anchor features with window aggregations.

### Simple anchored features
### Anchor features without aggregations

1. For simple anchored features, see the example below:
For simple anchored features, see the example below:

```python
f_trip_time_duration = Feature(name="f_trip_time_duration",
Expand All @@ -74,9 +74,9 @@ f_trip_time_duration = Feature(name="f_trip_time_duration",

Note that in the `transform` section, you can put a simple expression to transform your features. For more information, please refer to [Feathr User Defined Functions (UDFs)](../how-to-guides/feathr-udfs.md).

### Window aggregation features
### Anchor features with aggregations

2. For window aggregation features, see the supported fields below:
For window aggregation features, see the supported fields below:

```python

117 changes: 117 additions & 0 deletions docs/concepts/feature-registry.md
@@ -0,0 +1,117 @@
---
layout: default
title: Feature Registry
parent: Feathr Concepts
---

# Feature Registry and Feathr UI

The feature registry is an important component of a feature store. This document covers the supported feature registry backends and how to use them.

## Introduction

Feathr UI and Feathr Registry are two optional components to use Feathr, but

## Deployment

Please follow the `Provision Azure Resources using ARM Template` part in the [Azure Resource Provisioning document](../how-to-guides/azure-deployment-arm.md#provision-azure-resources-using-arm-template) to provision the corresponding Azure resources. After completing those steps, you should have a set of resources that can be used for Feature Registry.

If you want a more customized setup, you can use [this Dockerfile](https://github.com/linkedin/feathr/blob/main/FeathrRegistry.Dockerfile) to deploy the REST API and UI. This Docker image is mainly for illustration purposes, and you can customize it further (for example, by building the REST API and UI as separate Docker images).

If you follow the [Azure Resource Provisioning document](../how-to-guides/azure-deployment-arm.md#provision-azure-resources-using-arm-template) to provision the resources, you should be able to access the UI at this URL:

```bash
https://{prefix}webapp.azurewebsites.net
```

And the corresponding REST API will be:

```bash
https://{prefix}webapp.azurewebsites.net/api/v1
```
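As a small illustration, both URLs above can be derived from the resource prefix. The helper name `registry_urls` and the example prefix `myfeathr` are hypothetical; only the URL patterns come from the text above.

```python
def registry_urls(prefix: str) -> dict[str, str]:
    """Build the Feathr UI and REST API URLs for an ARM-provisioned deployment.

    Assumes the `https://{prefix}webapp.azurewebsites.net` pattern shown above;
    `prefix` is the resource prefix chosen during provisioning.
    """
    if not prefix:
        raise ValueError("prefix must be a non-empty string")
    ui_url = f"https://{prefix}webapp.azurewebsites.net"
    return {"ui": ui_url, "api": f"{ui_url}/api/v1"}


urls = registry_urls("myfeathr")  # "myfeathr" is a made-up prefix
print(urls["ui"])
print(urls["api"])
```

The `api` value is the one to use wherever a registry API endpoint is requested.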

## Deployment Options

Feathr supports two types of backends for Feature Registry - Azure Purview (Apache Atlas compatible service) and ANSI SQL. Depending on your IT setup, you might choose either of those in the above deployment steps.

Note that if you choose to enable Role-based Access Control (RBAC), you still need to use a SQL service (such as Azure SQL) to store all the RBAC related information.
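The rule above can be written down as a tiny check. This is only a sketch: the function name `required_services` and the backend labels `"purview"` and `"sql"` are made up for illustration and are not Feathr configuration values.

```python
def required_services(backend: str, rbac_enabled: bool) -> set[str]:
    """Return the storage services a registry deployment needs.

    `backend` uses the hypothetical labels "purview" and "sql".
    """
    if backend not in {"purview", "sql"}:
        raise ValueError(f"unsupported registry backend: {backend!r}")
    services = {backend}
    if rbac_enabled:
        # RBAC information is stored in a SQL service regardless of the backend
        services.add("sql")
    return services


print(sorted(required_services("purview", rbac_enabled=True)))
```

In other words, a Purview-backed registry with RBAC enabled needs both services, while a SQL-backed registry needs only the SQL service either way.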

## Architecture

![Architecture Diagram](../images/architecture.png)

The architecture is as above. More specifically, there are three components for Feathr feature registry:

- Feathr UI, a React-based application
- Feathr REST API, which provides an abstraction over the different registry providers, as well as role-based access control (RBAC)
- Different feature registry backends. Currently, only Azure Purview and SQL-based registries are supported, but more registry providers from the community are welcome.

Both the Feathr UI and the Feathr Python Client interact with the Feathr REST API service. The REST API service then checks whether the user has the right access and routes the request to the corresponding registry provider.
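The check-then-route flow described here can be sketched in a few lines. This is an illustrative model only, not Feathr's actual implementation: the class names `RegistryProvider`, `InMemoryProvider`, and `RegistryApi`, the `readers` mapping, and the user names are all hypothetical.

```python
from typing import Protocol


class RegistryProvider(Protocol):
    """Minimal interface a registry backend would expose (hypothetical)."""

    def list_features(self, project: str) -> list[str]: ...


class InMemoryProvider:
    """Stand-in for a real backend such as Azure Purview or a SQL database."""

    def __init__(self, features: dict[str, list[str]]) -> None:
        self._features = features

    def list_features(self, project: str) -> list[str]:
        return list(self._features.get(project, []))


class RegistryApi:
    """Checks access first, then forwards the request to the configured provider."""

    def __init__(self, provider: RegistryProvider, readers: dict[str, set[str]]) -> None:
        self._provider = provider
        self._readers = readers  # project name -> users allowed to read it

    def list_features(self, user: str, project: str) -> list[str]:
        if user not in self._readers.get(project, set()):
            raise PermissionError(f"{user} cannot read project {project}")
        return self._provider.list_features(project)


provider = InMemoryProvider({"demo": ["f_location_avg_fare"]})
api = RegistryApi(provider, readers={"demo": {"alice"}})
print(api.list_features("alice", "demo"))
```

Because callers only talk to the API layer, a different backend can be swapped in without changing the UI or the client, which is the point of the abstraction.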

## Accessing Registry in Feathr Python Client

In the Feathr Python client, if you want to access the registry, you should set the `FEATURE_REGISTRY__API_ENDPOINT` environment variable. The full list of environment variables is [documented here](../how-to-guides/feathr-configuration-and-env.md#a-list-of-environment-variables-that-feathr-uses).

Alternatively, you can set the feature registry and the API endpoint in the configuration YAML file:

```yaml
feature_registry:
  # The API endpoint of the registry service
  api_endpoint: "https://feathr-sql-registry.azurewebsites.net/api/v1"
```
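For the environment variable route, a minimal sketch in Python follows. It assumes the naming convention suggested by `FEATURE_REGISTRY__API_ENDPOINT`: each segment of the YAML path upper-cased and joined with a double underscore. The helper `env_var_for` is hypothetical, and the endpoint is the sample value from the YAML above, so replace it with your own.

```python
import os


def env_var_for(config_path: str) -> str:
    """Map a dotted YAML path to an environment variable name.

    Assumed convention: upper-case each segment and join with a double underscore.
    """
    return "__".join(part.upper() for part in config_path.split("."))


name = env_var_for("feature_registry.api_endpoint")
# Sample endpoint taken from the YAML example; replace it with your own registry URL.
os.environ[name] = "https://feathr-sql-registry.azurewebsites.net/api/v1"
print(name, "=", os.environ[name])
```

Setting the variable before the client is created is the safer order, since the configuration is presumably read at construction time.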

### Registering and Listing Features

You can register your features in the centralized registry and share them with other team members who want to consume them. You can also use `list_registered_features` to verify that they have been registered successfully.

```python
client.build_features(anchor_list=[agg_anchor, request_anchor], derived_feature_list=derived_feature_list)
client.register_features()
all_features = client.list_registered_features(project_name=client.project_name)
```

### Reuse Features from Existing Registry

Feature consumers can reuse existing features from the registry. A whole project can be retrieved to the local environment by calling `client.get_features_from_registry` with a project name. This encourages feature reuse across organizations: end users only need to read the feature definitions from an existing project, pick the features they need, and join those features with a new dataset.

For example, in the [product recommendation demo notebook](./../samples/product_recommendation_demo.ipynb), other team members have already defined a few features, such as `feature_user_gift_card_balance` and `feature_user_has_valid_credit_card`. To reuse those features for anti-abuse purposes on a new dataset, call `get_features_from_registry` to get the features, then request the ones you want against your anti-abuse dataset:

```python
from feathr import FeatureQuery, ObservationSettings

# Retrieve every feature definition in the project from the registry
registered_features_dict = client.get_features_from_registry(client.project_name)

# Features that we want to request.
# `user_id` (the feature key) and `output_path` are assumed to be defined earlier.
feature_query = FeatureQuery(feature_list=["feature_user_gift_card_balance",
                                           "feature_user_has_valid_credit_card"],
                             key=user_id)
settings = ObservationSettings(
    observation_path="some_anti_abuse_dataset_path",
    event_timestamp_column="event_timestamp",
    timestamp_format="yyyy-MM-dd")
# Join the requested features onto the observation dataset
client.get_offline_features(observation_settings=settings,
                            feature_query=feature_query,
                            output_path=output_path)
```

## Accessing Feathr UI

Feathr UI should be straightforward to use. Currently there are a few pages, including:

### Feature Summary Page

You can see a list of all the features in a project:
![Feature Summary](../images/feature-summary.png)

### Feature Detailed Page

You can view the detailed information for a feature:
![Feature Detailed Page](../images/feature-details.png)

### Feature Lineage Page

You can view all the lineage info for features in a project.
![Feature Lineage Page](../images/feathr-ui.png)

### Access Control Management Page

If RBAC is enabled, admins can manage access control for users.
![Access Control Management Page](../images/access-control-management.png)