Commit

feat: Contrib azure provider with synapse/mssql offline store and Azure registry store (#3072)

* Broken state

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* working state

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix the lint issues

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Semi working state

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Remove print

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Run build-sphinx

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Add tutorials

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix?

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Begin configuring tests

Signed-off-by: Danny Chiao <danny@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Working version

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix azure

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint and address issues

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix integration tests

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint and address issues

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Revert

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix pyarrow

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* Fix lint

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>

* add requirements files

Signed-off-by: Danny Chiao <danny@tecton.ai>

* fix name of docs

Signed-off-by: Danny Chiao <danny@tecton.ai>

* fix offline store readme

Signed-off-by: Danny Chiao <danny@tecton.ai>

* fix offline store readme

Signed-off-by: Danny Chiao <danny@tecton.ai>

* fix

Signed-off-by: Danny Chiao <danny@tecton.ai>

* fix

Signed-off-by: Danny Chiao <danny@tecton.ai>

Signed-off-by: Kevin Zhang <kzhang@tecton.ai>
Signed-off-by: Danny Chiao <danny@tecton.ai>
Co-authored-by: Danny Chiao <danny@tecton.ai>
kevjumba and adchia committed Aug 19, 2022
1 parent 4310ed7 commit 9f7e557
Showing 65 changed files with 3,806 additions and 87 deletions.
35 changes: 29 additions & 6 deletions Makefile
@@ -81,7 +81,8 @@ test-python-integration-local:
python -m pytest -n 8 --integration \
-k "not gcs_registry and \
not s3_registry and \
- not test_lambda_materialization" \
+ not test_lambda_materialization and \
+ not test_snowflake" \
sdk/python/tests \
) || echo "This script uses Docker, and it isn't running - please start the Docker Daemon and try again!";

@@ -113,7 +114,8 @@ test-python-universal-spark:
not test_push_features_to_offline_store.py and \
not gcs_registry and \
not s3_registry and \
- not test_universal_types" \
+ not test_universal_types and \
+ not test_snowflake" \
sdk/python/tests

test-python-universal-trino:
@@ -136,9 +138,27 @@ test-python-universal-trino:
not test_push_features_to_offline_store.py and \
not gcs_registry and \
not s3_registry and \
- not test_universal_types" \
+ not test_universal_types and \
+ not test_snowflake" \
sdk/python/tests


# Note: to use this, you'll need to have Microsoft ODBC 17 installed.
# See https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/install-microsoft-odbc-driver-sql-server-macos?view=sql-server-ver15#17
test-python-universal-mssql:
PYTHONPATH='.' \
FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.offline_stores.contrib.mssql_repo_configuration \
PYTEST_PLUGINS=feast.infra.offline_stores.contrib.mssql_offline_store.tests \
FEAST_USAGE=False IS_TEST=True \
FEAST_LOCAL_ONLINE_CONTAINER=True \
python -m pytest -n 8 --integration \
-k "not gcs_registry and \
not s3_registry and \
not test_lambda_materialization and \
not test_snowflake" \
sdk/python/tests


#To use Athena as an offline store, you need to create an Athena database and an S3 bucket on AWS. https://docs.aws.amazon.com/athena/latest/ug/getting-started.html
#Modify environment variables ATHENA_DATA_SOURCE, ATHENA_DATABASE, ATHENA_S3_BUCKET_NAME if you want to change the data source, database, and bucket name of S3 to use.
#If tests fail with the pytest -n 8 option, change the number to 1.
@@ -161,7 +181,8 @@ test-python-universal-athena:
not test_historical_features_persisting and \
not test_historical_retrieval_fails_on_validation and \
not gcs_registry and \
- not s3_registry" \
+ not s3_registry and \
+ not test_snowflake" \
sdk/python/tests

test-python-universal-postgres-offline:
@@ -203,7 +224,8 @@ test-python-universal-postgres-online:
not test_push_features_to_offline_store and \
not gcs_registry and \
not s3_registry and \
- not test_universal_types" \
+ not test_universal_types and \
+ not test_snowflake" \
sdk/python/tests

test-python-universal-cassandra:
@@ -230,7 +252,8 @@ test-python-universal-cassandra-no-cloud-providers:
not test_apply_data_source_integration and \
not test_nullable_online_store and \
not gcs_registry and \
- not s3_registry" \
+ not s3_registry and \
+ not test_snowflake" \
sdk/python/tests

test-python-universal:
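
Each target above narrows the test run with pytest's `-k` option. As a rough mental model only (pytest's real matcher also considers markers, parent names, and full boolean expressions), the filters used here can be read as "deselect any test whose ID contains one of the excluded substrings". The helper name `is_selected` and the test IDs below are made up for illustration.

```python
# Simplified model of `pytest -k "not a and not b"`: a test is kept only if
# none of the excluded substrings occurs in its ID (case-insensitive).
# This is an illustration, not pytest's actual implementation.

EXCLUDED = [
    "gcs_registry",
    "s3_registry",
    "test_lambda_materialization",
    "test_snowflake",
]


def is_selected(test_id, excluded=EXCLUDED):
    """Return True if no excluded substring appears in the test ID."""
    lowered = test_id.lower()
    return not any(term.lower() in lowered for term in excluded)


# Hypothetical test IDs, made up for this example.
test_ids = [
    "test_historical_features[mssql]",
    "test_snowflake_materialization",
    "test_registry[s3_registry]",
]
selected = [t for t in test_ids if is_selected(t)]
print(selected)
```

Under this model, adding `not test_snowflake` to a target simply adds one more substring to the exclusion list.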
4 changes: 2 additions & 2 deletions README.md
@@ -152,7 +152,7 @@ The list below contains the functionality that contributors are planning to deve
* [x] [Redshift source](https://docs.feast.dev/reference/data-sources/redshift)
* [x] [BigQuery source](https://docs.feast.dev/reference/data-sources/bigquery)
* [x] [Parquet file source](https://docs.feast.dev/reference/data-sources/file)
- * [x] [Synapse source (community plugin)](https://github.com/Azure/feast-azure)
+ * [x] [Azure Synapse + Azure SQL source (contrib plugin)](https://docs.feast.dev/reference/data-sources/mssql)
* [x] [Hive (community plugin)](https://github.com/baineng/feast-hive)
* [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/data-sources/postgres)
* [x] [Spark (contrib plugin)](https://docs.feast.dev/reference/data-sources/spark)
@@ -161,7 +161,7 @@ The list below contains the functionality that contributors are planning to deve
* [x] [Snowflake](https://docs.feast.dev/reference/offline-stores/snowflake)
* [x] [Redshift](https://docs.feast.dev/reference/offline-stores/redshift)
* [x] [BigQuery](https://docs.feast.dev/reference/offline-stores/bigquery)
- * [x] [Synapse (community plugin)](https://github.com/Azure/feast-azure)
+ * [x] [Azure Synapse + Azure SQL (contrib plugin)](https://docs.feast.dev/reference/offline-stores/mssql.md)
* [x] [Hive (community plugin)](https://github.com/baineng/feast-hive)
* [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/offline-stores/postgres)
* [x] [Trino (contrib plugin)](https://github.com/Shopify/feast-trino)
4 changes: 4 additions & 0 deletions docs/SUMMARY.md
@@ -71,6 +71,7 @@
* [Spark (contrib)](reference/data-sources/spark.md)
* [PostgreSQL (contrib)](reference/data-sources/postgres.md)
* [Trino (contrib)](reference/data-sources/trino.md)
* [Azure Synapse + Azure SQL (contrib)](reference/data-sources/mssql.md)
* [Offline stores](reference/offline-stores/README.md)
* [Overview](reference/offline-stores/overview.md)
* [File](reference/offline-stores/file.md)
@@ -80,17 +81,20 @@
* [Spark (contrib)](reference/offline-stores/spark.md)
* [PostgreSQL (contrib)](reference/offline-stores/postgres.md)
* [Trino (contrib)](reference/offline-stores/trino.md)
* [Azure Synapse + Azure SQL (contrib)](reference/offline-stores/mssql.md)
* [Online stores](reference/online-stores/README.md)
* [SQLite](reference/online-stores/sqlite.md)
* [Snowflake](reference/online-stores/snowflake.md)
* [Redis](reference/online-stores/redis.md)
* [Datastore](reference/online-stores/datastore.md)
* [DynamoDB](reference/online-stores/dynamodb.md)
* [PostgreSQL (contrib)](reference/online-stores/postgres.md)
* [Cassandra + Astra DB (contrib)](reference/online-stores/cassandra.md)
* [Providers](reference/providers/README.md)
* [Local](reference/providers/local.md)
* [Google Cloud Platform](reference/providers/google-cloud-platform.md)
* [Amazon Web Services](reference/providers/amazon-web-services.md)
* [Azure](reference/providers/azure.md)
* [Feature repository](reference/feature-repository/README.md)
* [feature\_store.yaml](reference/feature-repository/feature-store-yaml.md)
* [.feastignore](reference/feature-repository/feast-ignore.md)
22 changes: 11 additions & 11 deletions docs/getting-started/concepts/registry.md
@@ -1,15 +1,15 @@
# Registry

- Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). The registry exposes
+ Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). The registry exposes
methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations.

### Options for registry implementations

#### File-based registry
- By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as
- a serialized file. This registry file can be stored in a local file system, or in cloud storage (in, say, S3 or GCS).
+ By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as
+ a serialized file. This registry file can be stored in a local file system, or in cloud storage (in, say, S3 or GCS, or Azure).

- The quickstart guides that use `feast init` will use a registry on a local file system. To allow Feast to configure
+ The quickstart guides that use `feast init` will use a registry on a local file system. To allow Feast to configure
a remote file registry, you need to create a GCS / S3 bucket that Feast can understand:
{% tabs %}
{% tab title="Example S3 file registry" %}
@@ -35,9 +35,9 @@ offline_store:
{% endtab %}
{% endtabs %}

- However, there are inherent limitations with a file-based registry, since changing a single field in the registry
- requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or
- bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for
+ However, there are inherent limitations with a file-based registry, since changing a single field in the registry
+ requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or
+ bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for
multiple feature views or time ranges concurrently).
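
The data-loss risk described here is the classic lost-update problem for whole-file read-modify-write. A minimal sketch follows, using a JSON file as a stand-in for the serialized protobuf registry. The file layout and the `load`/`save` helpers are assumptions for illustration, not Feast's implementation.

```python
import json
import os
import tempfile


def load(path):
    """Read the whole registry file into memory."""
    with open(path) as f:
        return json.load(f)


def save(path, registry):
    """Rewrite the whole registry file, even for a one-field change."""
    with open(path, "w") as f:
        json.dump(registry, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "registry.json")
    save(path, {"feature_views": []})

    # Two writers read the same starting state ...
    writer_a = load(path)
    writer_b = load(path)

    # ... each adds a different object to its private copy ...
    writer_a["feature_views"].append("driver_stats")
    writer_b["feature_views"].append("customer_stats")

    # ... and each rewrites the full file. The second write wins.
    save(path, writer_a)
    save(path, writer_b)

    final = load(path)

print(final["feature_views"])
```

Serializing the writers (for example with a lock) avoids the lost update, but it also makes the registry file the write bottleneck that the paragraph mentions.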

#### SQL Registry
@@ -47,14 +47,14 @@ This supports any SQLAlchemy compatible database as a backend. The exact schema

### Updating the registry

- We recommend users store their Feast feature definitions in a version controlled repository, which then via CI/CD
- automatically stays synced with the registry. Users will often also want multiple registries to correspond to
- different environments (e.g. dev vs staging vs prod), with staging and production registries with locked down write
+ We recommend users store their Feast feature definitions in a version controlled repository, which then via CI/CD
+ automatically stays synced with the registry. Users will often also want multiple registries to correspond to
+ different environments (e.g. dev vs staging vs prod), with staging and production registries with locked down write
access since they can impact real user traffic. See [Running Feast in Production](../../how-to-guides/running-feast-in-production.md#1.-automatically-deploying-changes-to-your-feature-definitions) for details on how to set this up.

### Accessing the registry from clients

- Users can specify the registry through a `feature_store.yaml` config file, or programmatically. We often see teams
+ Users can specify the registry through a `feature_store.yaml` config file, or programmatically. We often see teams
preferring the programmatic approach because it makes notebook driven development very easy:

#### Option 1: programmatically specifying the registry
3 changes: 2 additions & 1 deletion docs/how-to-guides/adding-or-reusing-tests.md
@@ -241,7 +241,8 @@ def test_historical_features(environment, universal_data_sources, full_feature_n
validate_dataframes(
expected_df,
table_from_df_entities,
- keys=[event_timestamp, "order_id", "driver_id", "customer_id"],
+ sort_by=[event_timestamp, "order_id", "driver_id", "customer_id"],
+ event_timestamp = event_timestamp,
)
# ... more test code
```
8 changes: 6 additions & 2 deletions docs/reference/data-sources/README.md
@@ -35,9 +35,13 @@ Please see [Data Source](../../getting-started/concepts/data-ingestion.md) for a
{% endcontent-ref %}

{% content-ref url="postgres.md" %}
- [postgres.md]([postgres].md)
+ [postgres.md](postgres.md)
{% endcontent-ref %}

{% content-ref url="trino.md" %}
- [trino.md]([trino].md)
+ [trino.md](trino.md)
{% endcontent-ref %}

{% content-ref url="mssql.md" %}
[mssql.md](mssql.md)
{% endcontent-ref %}
29 changes: 29 additions & 0 deletions docs/reference/data-sources/mssql.md
@@ -0,0 +1,29 @@
# MsSQL source (contrib)

## Description

MsSQL data sources are Microsoft SQL Server table sources.
These can be specified either by a table reference or a SQL query.

## Disclaimer

The MsSQL data source does not achieve full test coverage.
Please do not assume complete stability.

## Examples

Defining a MsSQL source:

```python
from feast.infra.offline_stores.contrib.mssql_offline_store.mssqlserver_source import (
    MsSqlServerSource,
)

driver_hourly_table = "driver_hourly"

driver_source = MsSqlServerSource(
    table_ref=driver_hourly_table,
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)
```
4 changes: 4 additions & 0 deletions docs/reference/offline-stores/README.md
@@ -35,3 +35,7 @@ Please see [Offline Store](../../getting-started/architecture-and-components/off
{% content-ref url="trino.md" %}
[trino.md](trino.md)
{% endcontent-ref %}

{% content-ref url="mssql.md" %}
[mssql.md](mssql.md)
{% endcontent-ref %}
59 changes: 59 additions & 0 deletions docs/reference/offline-stores/mssql.md
@@ -0,0 +1,59 @@
# MsSQL/Synapse offline store (contrib)

## Description

The MsSQL offline store provides support for reading [MsSQL Sources](../data-sources/mssql.md). Specifically, it is developed to read from [Synapse SQL](https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/overview-features) on Microsoft Azure.

* Entity dataframes can be provided as a SQL query or can be provided as a Pandas dataframe.

## Disclaimer

The MsSQL offline store does not achieve full test coverage.
Please do not assume complete stability.

## Example

{% code title="feature_store.yaml" %}
```yaml
registry:
  registry_store_type: AzureRegistryStore
  path: ${REGISTRY_PATH} # Environment Variable
project: production
provider: azure
online_store:
  type: redis
  connection_string: ${REDIS_CONN} # Environment Variable
offline_store:
  type: mssql
  connection_string: ${SQL_CONN} # Environment Variable
```
{% endcode %}
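
The `${...}` placeholders in the example are filled from environment variables. Here is a minimal sketch of that kind of substitution using only the standard library. `string.Template` is used purely for illustration and is not necessarily how Feast resolves these values, and the connection value is made up.

```python
import os
from string import Template

# Made-up value for illustration; in practice it comes from the shell
# or a secret store, never from source code.
os.environ["REDIS_CONN"] = "localhost:6379"

yaml_text = """\
online_store:
  type: redis
  connection_string: ${REDIS_CONN}
"""

# substitute() raises KeyError if a referenced variable is not set,
# which surfaces a missing secret early instead of passing "${...}" through.
expanded = Template(yaml_text).substitute(os.environ)
print(expanded)
```

Keeping connection strings in the environment means the same `feature_store.yaml` can be committed and reused across environments without embedding credentials.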

## Functionality Matrix

The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the MsSQL offline store.

| | MsSql |
| :-------------------------------- | :-- |
| `get_historical_features` (point-in-time correct join) | yes |
| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
| `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
| `offline_write_batch` (persist dataframes to offline store) | no |
| `write_logged_features` (persist logged features to offline store) | no |

Below is a matrix indicating which functionality is supported by `MsSqlServerRetrievalJob`.

| | MsSql |
| --------------------------------- | --- |
| export to dataframe | yes |
| export to arrow table | yes |
| export to arrow batches | no |
| export to SQL | no |
| export to data lake (S3, GCS, etc.) | no |
| export to data warehouse | no |
| local execution of Python-based on-demand transforms | no |
| remote execution of Python-based on-demand transforms | no |
| persist results in the offline store | yes |

To compare this set of functionality against other offline stores, please see the full [functionality matrix](overview.md#functionality-matrix).
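
As a rough illustration of what "point-in-time correct join" means in the first matrix: for each entity row, take the most recent feature value whose timestamp is not later than the row's event timestamp. Below is a minimal pure-Python sketch with made-up data. The offline store performs this in SQL, and details such as TTL handling and tie-breaking on the created timestamp are omitted here.

```python
from bisect import bisect_right
from datetime import datetime

# Feature rows per driver, sorted by event timestamp (made-up values).
feature_rows = {
    1001: [
        (datetime(2022, 8, 1, 10), 0.50),
        (datetime(2022, 8, 1, 12), 0.75),
    ],
}

# Entity rows: (driver_id, event_timestamp) pairs we want features for.
entity_rows = [
    (1001, datetime(2022, 8, 1, 9)),   # before any feature value exists
    (1001, datetime(2022, 8, 1, 11)),  # between the two feature rows
    (1001, datetime(2022, 8, 1, 12)),  # exactly at the second feature row
]


def point_in_time_value(driver_id, ts):
    """Latest feature value with timestamp <= ts, or None if there is none."""
    rows = feature_rows.get(driver_id, [])
    timestamps = [row_ts for row_ts, _ in rows]
    idx = bisect_right(timestamps, ts)  # number of rows with timestamp <= ts
    return rows[idx - 1][1] if idx else None


joined = [(d, ts, point_in_time_value(d, ts)) for d, ts in entity_rows]
for row in joined:
    print(row)
```

The key property is that no entity row ever sees a feature value recorded after its own timestamp, which is what prevents leakage of future data into training sets.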
1 change: 1 addition & 0 deletions docs/reference/online-stores/README.md
@@ -29,3 +29,4 @@ Please see [Online Store](../../getting-started/architecture-and-components/onli
{% content-ref url="cassandra.md" %}
[cassandra.md](cassandra.md)
{% endcontent-ref %}

2 changes: 1 addition & 1 deletion docs/reference/online-stores/cassandra.md
@@ -1,4 +1,4 @@
- # Cassandra / Astra DB online store
+ # Cassandra + Astra DB online store (contrib)

## Description

2 changes: 2 additions & 0 deletions docs/reference/providers/README.md
@@ -7,3 +7,5 @@ Please see [Provider](../../getting-started/architecture-and-components/provider
{% page-ref page="google-cloud-platform.md" %}

{% page-ref page="amazon-web-services.md" %}

{% page-ref page="azure.md" %}
26 changes: 26 additions & 0 deletions docs/reference/providers/azure.md
@@ -0,0 +1,26 @@
# Azure (contrib)

## Description

* Offline Store: Uses the **MsSql** offline store by default. Also supports File as the offline store.
* Online Store: Uses the **Redis** online store by default. Also supports Sqlite as an online store.

## Disclaimer

The Azure provider does not achieve full test coverage.
Please do not assume complete stability.

## Example

{% code title="feature_store.yaml" %}
```yaml
registry:
  registry_store_type: AzureRegistryStore
  path: ${REGISTRY_PATH} # Environment Variable
project: production
provider: azure
online_store:
  type: redis
  connection_string: ${REDIS_CONN} # Environment Variable
```
{% endcode %}
4 changes: 2 additions & 2 deletions docs/roadmap.md
@@ -10,7 +10,7 @@ The list below contains the functionality that contributors are planning to deve
* [x] [Redshift source](https://docs.feast.dev/reference/data-sources/redshift)
* [x] [BigQuery source](https://docs.feast.dev/reference/data-sources/bigquery)
* [x] [Parquet file source](https://docs.feast.dev/reference/data-sources/file)
- * [x] [Synapse source (community plugin)](https://github.com/Azure/feast-azure)
+ * [x] [Azure Synapse + Azure SQL source (contrib plugin)](https://docs.feast.dev/reference/data-sources/mssql)
* [x] [Hive (community plugin)](https://github.com/baineng/feast-hive)
* [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/data-sources/postgres)
* [x] [Spark (contrib plugin)](https://docs.feast.dev/reference/data-sources/spark)
@@ -19,7 +19,7 @@ The list below contains the functionality that contributors are planning to deve
* [x] [Snowflake](https://docs.feast.dev/reference/offline-stores/snowflake)
* [x] [Redshift](https://docs.feast.dev/reference/offline-stores/redshift)
* [x] [BigQuery](https://docs.feast.dev/reference/offline-stores/bigquery)
- * [x] [Synapse (community plugin)](https://github.com/Azure/feast-azure)
+ * [x] [Azure Synapse + Azure SQL (contrib plugin)](https://docs.feast.dev/reference/offline-stores/mssql.md)
* [x] [Hive (community plugin)](https://github.com/baineng/feast-hive)
* [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/offline-stores/postgres)
* [x] [Trino (contrib plugin)](https://github.com/Shopify/feast-trino)