# Vertex AI - Feature Store

**Prerequisites**
- `00 - Initial Setup`
- `01 - BigQuery - Data`
    
**Resources**
- Based on:
    - https://cloud.google.com/vertex-ai/docs/featurestore/managing-featurestores
- API Documentation:
    - https://googleapis.dev/python/aiplatform/latest/aiplatform_v1beta1/services.html
    - https://googleapis.dev/python/aiplatform/latest/aiplatform_v1beta1/featurestore_service.html

**Overview**

<img src="architectures/statmike-mlops-07.png">


---
## Setup

Update the Python library for aiplatform (Vertex AI)
- After the install finishes, restart the kernel: Menus > Kernel > Restart Kernel

In [2]:
!pip install --upgrade git+https://github.com/googleapis/python-aiplatform.git@main-test -q

Import Libraries

In [264]:
from google.cloud.aiplatform_v1beta1 import (FeaturestoreOnlineServingServiceClient, FeaturestoreServiceClient, types)
from google.protobuf.duration_pb2 import Duration
from google.protobuf.timestamp_pb2 import Timestamp
from google.protobuf.field_mask_pb2 import FieldMask

Set up GCP Parameters

In [265]:
# Locations
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
PARENT = "projects/" + PROJECT_ID + "/locations/" + REGION

Clients for Feature Store:

In [266]:
API_ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)
client_options = {"api_endpoint": API_ENDPOINT}
clients = {}

In [267]:
clients['fs'] = FeaturestoreServiceClient(client_options=client_options)

In [268]:
clients['fs_olserve'] = FeaturestoreOnlineServingServiceClient(client_options=client_options)

In [269]:
BASE_RESOURCE_PATH = clients['fs'].common_location_path(PROJECT_ID, REGION)

---
## Feature Store Data model
Feature Store organizes data in a three-level hierarchy:

Featurestore -> EntityType -> Feature

- **Featurestore**: the place to store your features
    - **EntityType**: under a Featurestore, an EntityType describes an object to be modeled, real or virtual.
        - **Feature**: under an EntityType, a feature describes an attribute of the EntityType

For the digits data used in these examples, the feature store is named `digits_featurestore`. It has one entity type, `image`. The features are the pixel values and the target values.
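
The hierarchy also shows up in resource names: each level nests under its parent, which is why the path helpers used later (`featurestore_path`, `entity_type_path`, `feature_path`) take one more argument per level. A minimal sketch of that naming pattern using plain string formatting; the three functions below are illustrative stand-ins, not the client library's own implementation:

```python
# Illustrative stand-ins for the client's *_path helpers (assumption: plain string nesting).
def featurestore_name(project, region, featurestore_id):
    return "projects/{}/locations/{}/featurestores/{}".format(project, region, featurestore_id)

def entity_type_name(project, region, featurestore_id, entity_type_id):
    parent = featurestore_name(project, region, featurestore_id)
    return "{}/entityTypes/{}".format(parent, entity_type_id)

def feature_name(project, region, featurestore_id, entity_type_id, feature_id):
    parent = entity_type_name(project, region, featurestore_id, entity_type_id)
    return "{}/features/{}".format(parent, feature_id)

example = feature_name('statmike-mlops', 'us-central1', 'digits_featurestore', 'image', 'p0')
print(example)
```

Names returned by the service in this notebook show the project number (691911073727) where this sketch uses the project ID.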

---
## Create Feature Store

In [284]:
FEATURESTORE_ID = 'digits_featurestore'

In [285]:
featurestore_lro = clients['fs'].create_featurestore(
    types.featurestore_service.CreateFeaturestoreRequest(
        parent = BASE_RESOURCE_PATH,
        featurestore_id = FEATURESTORE_ID,
        featurestore=types.featurestore.Featurestore(
            display_name="Featurestore for handwritten digits",
            online_serving_config=types.featurestore.Featurestore.OnlineServingConfig(
                fixed_node_count=3
            ),
        ),
    )
)

In [286]:
featurestore_lro.result()

name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore"

Use `get_featurestore` to see details of specified feature store:

In [287]:
clients['fs'].get_featurestore(name=clients['fs'].featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID))

name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore"
create_time {
  seconds: 1625740410
  nanos: 217675000
}
update_time {
  seconds: 1625740410
  nanos: 323701000
}
etag: "AMEw9yOfaiqWCVy1WUDYkSNaJ2vMNDQfxzdmYQuZH-uN1QUI0mW4ckRc8l3Feyj2ztP5"
online_serving_config {
  fixed_node_count: 3
}
state: STABLE

Use `list_featurestores` to see details of all feature stores:

In [288]:
clients['fs'].list_featurestores(parent=PARENT)

ListFeaturestoresPager<featurestores {
  name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore"
  create_time {
    seconds: 1625740410
    nanos: 217675000
  }
  update_time {
    seconds: 1625740410
    nanos: 323701000
  }
  etag: "AMEw9yM7U-W2SAV_EzUOOqckop01PKD8ot08yCqUv1AXB9ZjhBitTa-2I4k7o1TLPC1Z"
  online_serving_config {
    fixed_node_count: 3
  }
  state: STABLE
}
>

---
## Create Entity Type

In [289]:
ENTITYTYPE_ID = 'image'

In [290]:
entitytype_lro = clients['fs'].create_entity_type(
    types.featurestore_service.CreateEntityTypeRequest(
        parent=clients['fs'].featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID),
        entity_type_id = ENTITYTYPE_ID,
        entity_type=types.entity_type.EntityType(
            description="image entity for digits",
            monitoring_config=types.featurestore_monitoring.FeaturestoreMonitoringConfig(
                snapshot_analysis=types.featurestore_monitoring.FeaturestoreMonitoringConfig.SnapshotAnalysis(
                    monitoring_interval=Duration(seconds=1800),  # 30 minutes
                ),
            ),
        ),
    )
)

In [291]:
entitytype_lro.result()

name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image"
etag: "AMEw9yPVjeSXnHrtAReP42kEG00yQd4a6BF0BfDz35y8KlT6EqZE"

Use `list_entity_types` to see details of all entity types:

In [292]:
clients['fs'].list_entity_types(parent=PARENT+'/featurestores/{}'.format(FEATURESTORE_ID))

ListEntityTypesPager<entity_types {
  name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image"
  description: "image entity for digits"
  create_time {
    seconds: 1625740434
    nanos: 454728000
  }
  update_time {
    seconds: 1625740434
    nanos: 454728000
  }
  etag: "AMEw9yNwwHsG--my1E-dcO7Jn9CKFhJipRsgSuBgKgIBDBvjPOYxed8KhFwKT7TB70Et"
  monitoring_config {
    snapshot_analysis {
      monitoring_interval {
        seconds: 86400
      }
    }
  }
}
>

---
## Create Features

Get the schema of the data source for new features:

In [293]:
%%bigquery schema
SELECT * 
FROM `statmike-mlops.digits.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name = 'digits_source'

Query complete after 0.00s: 100%|██████████| 1/1 [00:00<00:00, 437.23query/s]                          
Downloading: 100%|██████████| 66/66 [00:01<00:00, 55.18rows/s]


In [294]:
schema

| | table_catalog | table_schema | table_name | column_name | field_path | data_type | description |
|---|---|---|---|---|---|---|---|
| 0 | statmike-mlops | digits | digits_source | p0 | p0 | FLOAT64 | |
| 1 | statmike-mlops | digits | digits_source | p1 | p1 | FLOAT64 | |
| 2 | statmike-mlops | digits | digits_source | p2 | p2 | FLOAT64 | |
| 3 | statmike-mlops | digits | digits_source | p3 | p3 | FLOAT64 | |
| 4 | statmike-mlops | digits | digits_source | p4 | p4 | FLOAT64 | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 61 | statmike-mlops | digits | digits_source | p61 | p61 | FLOAT64 | |
| 62 | statmike-mlops | digits | digits_source | p62 | p62 | FLOAT64 | |
| 63 | statmike-mlops | digits | digits_source | p63 | p63 | FLOAT64 | |
| 64 | statmike-mlops | digits | digits_source | target | target | INT64 | |


Prepare a request for `batch_create_features`:
- one specification per feature, with its value type and description

In [295]:
import pandas as pd

# Map BigQuery column types to Feature Store value types
VALUE_TYPES = {
    'STRING': types.feature.Feature.ValueType.STRING,
    'INT64': types.feature.Feature.ValueType.INT64,
    'FLOAT64': types.feature.Feature.ValueType.DOUBLE,
}

REQUESTS = []
for i in range(schema.shape[0]):
    column_name = schema['column_name'][i]
    data_type = schema['data_type'][i]

    # Fail loudly on an unmapped type rather than reusing the previous column's value type
    if data_type not in VALUE_TYPES:
        raise ValueError('Unsupported data type {} for column {}'.format(data_type, column_name))

    # Fall back to the column name when the source column has no description (None or NaN)
    description = schema['description'][i]
    if pd.isna(description): description = column_name

    REQUESTS.append(
        types.featurestore_service.CreateFeatureRequest(
            feature=types.feature.Feature(
                value_type = VALUE_TYPES[data_type],
                description = description,
                # optional, monitoring_config here as override, otherwise it inherits from entity_type
            ),
            feature_id = column_name.lower(),
        )
    )
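
The loop above lowercases each column name to form the feature ID. The API reference describes feature IDs as up to 60 characters drawn from `[a-z0-9_]` and not starting with a digit; treat that rule as an assumption to check against the current documentation. A minimal sketch of a hypothetical helper, `to_feature_id`, that normalizes an arbitrary column name under that assumed rule:

```python
import re

def to_feature_id(column_name, max_length=60):
    """Normalize a column name to the assumed feature ID rule (hypothetical helper)."""
    # Lowercase, then replace anything outside [a-z0-9_] with an underscore
    feature_id = re.sub(r'[^a-z0-9_]', '_', column_name.lower())
    # Prefix an underscore when the result is empty or starts with a digit
    if not feature_id or feature_id[0].isdigit():
        feature_id = '_' + feature_id
    return feature_id[:max_length]

print(to_feature_id('target_OE'))   # target_oe
print(to_feature_id('2nd pixel'))   # _2nd_pixel
```

For the digits columns (`p0` to `p63`, `target`, `target_OE`), lowercasing alone already satisfies the rule, so the notebook's simpler `.lower()` call is enough here.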

In [296]:
batchfeatures = clients['fs'].batch_create_features(
    parent = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
    requests = REQUESTS,
)

In [329]:
list(item.name for item in batchfeatures.result().features)

['projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p0',
 'projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p1',
 'projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p2',
 'projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p3',
 'projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p4',
 'projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p5',
 'projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p6',
 'projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p7',
 'projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/featur

---
## Search Features
Search runs across all feature stores and entity types in the given location.

To list every feature of a single entity type instead, use `list_features`.

In [330]:
# return the first feature:
list(clients['fs'].search_features(location=BASE_RESOURCE_PATH))[0]

name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p0"
description: "p0"
create_time {
  seconds: 1625740476
  nanos: 662391000
}
update_time {
  seconds: 1625740476
  nanos: 662391000
}

In [331]:
# find all features with INT64 value type
list(clients['fs'].search_features(types.featurestore_service.SearchFeaturesRequest(location=BASE_RESOURCE_PATH, query="value_type=INT64")))

[name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/target"
 description: "target"
 create_time {
   seconds: 1625740476
   nanos: 983494000
 }
 update_time {
   seconds: 1625740476
   nanos: 983494000
 }]

In [332]:
# find all features of the form p6* with DOUBLE value type
list(clients['fs'].search_features(types.featurestore_service.SearchFeaturesRequest(location=BASE_RESOURCE_PATH, query="feature_id:p6* AND value_type=DOUBLE")))

[name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p6"
 description: "p6"
 create_time {
   seconds: 1625740476
   nanos: 692939000
 }
 update_time {
   seconds: 1625740476
   nanos: 692939000
 },
 name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p60"
 description: "p60"
 create_time {
   seconds: 1625740476
   nanos: 976181000
 }
 update_time {
   seconds: 1625740476
   nanos: 976181000
 },
 name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p61"
 description: "p61"
 create_time {
   seconds: 1625740476
   nanos: 977883000
 }
 update_time {
   seconds: 1625740476
   nanos: 977883000
 },
 name: "projects/691911073727/locations/us-central1/featurestores/digits_featurestore/entityTypes/image/features/p62"
 description: "p62"
 create_time {
   seconds: 1625740476
   nanos: 979383000
 }
 update_time {
   se
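
The query strings used above combine `field:value` terms (a match, with `*` as a wildcard as in `p6*`) and `field=value` terms (equality), joined with `AND`. A small sketch of a hypothetical helper that assembles such strings; it only covers the two operators used in this notebook and does no escaping:

```python
def build_search_query(matches=None, equals=None):
    """Join field:value and field=value terms with AND (hypothetical helper)."""
    terms = []
    for field, value in (matches or {}).items():
        terms.append('{}:{}'.format(field, value))
    for field, value in (equals or {}).items():
        terms.append('{}={}'.format(field, value))
    return ' AND '.join(terms)

query = build_search_query(matches={'feature_id': 'p6*'}, equals={'value_type': 'DOUBLE'})
print(query)  # feature_id:p6* AND value_type=DOUBLE
```

The resulting string can be passed as the `query` argument of `SearchFeaturesRequest`, as in the cells above.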

---
## Import Feature Values
Feature values can be imported from:
- BigQuery (used in this demo)
- Avro
- CSV

Prepare a source table with a timestamp (update_time) and a unique ID (image_id) for each entity:

In [333]:
%%bigquery
CREATE OR REPLACE TABLE `statmike-mlops.digits.digits_featurestore_import` AS
SELECT GENERATE_UUID() image_id, target_OE as target_oe, CURRENT_TIMESTAMP AS update_time, * EXCEPT(target_OE)
FROM `statmike-mlops.digits.digits_source`

Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 1546.38query/s]                        


Create Feature specification for each feature in the input source:

In [334]:
FEATURE_SPECS = []
for i in range(schema.shape[0]):
    FEATURE_SPECS.append(types.featurestore_service.ImportFeatureValuesRequest.FeatureSpec(id=schema['column_name'][i].lower()))

In [335]:
import_request = types.featurestore_service.ImportFeatureValuesRequest(
    entity_type = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
    bigquery_source = types.BigQuerySource(input_uri='bq://statmike-mlops.digits.digits_featurestore_import'),
    # feature_time_field and feature_time are alternatives; use the per-row timestamp column.
    # (Timestamp().GetCurrentTime() fills the message in place and returns None,
    #  so it cannot be passed inline as feature_time.)
    feature_time_field = "update_time",
    entity_id_field = "image_id",
    feature_specs = FEATURE_SPECS,
    worker_count = 4,
)

In [336]:
importjob = clients['fs'].import_feature_values(import_request)

In [338]:
importjob.result()

imported_entity_count: 1797
imported_feature_value_count: 118602
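
The two counts in the import result can be cross-checked: each entity should receive one value per imported feature. With the 64 pixel columns (`p0` to `p63`) plus `target` and `target_oe`, that is 66 features per entity, assuming no missing values in the source table:

```python
imported_entity_count = 1797
features_per_entity = 64 + 2  # p0..p63 plus target and target_oe

expected_value_count = imported_entity_count * features_per_entity
print(expected_value_count)  # 118602
```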

---
## Serving Features

Retrieve a list of entity IDs:

In [339]:
%%bigquery image_id
SELECT image_id FROM `statmike-mlops.digits.digits_featurestore_import`

Query complete after 0.00s: 100%|██████████| 1/1 [00:00<00:00, 452.80query/s]                          
Downloading: 100%|██████████| 1797/1797 [00:01<00:00, 1343.11rows/s]


In [340]:
list(image_id['image_id'])[0:5]

['fd2b54ed-c182-4bb0-9f5e-d75982bd59d3',
 '878d983a-5b60-493c-bd83-02b03c2ec0c2',
 '027cead0-3f25-494b-9a54-00da0d2b82e3',
 'e9671135-e077-46cb-a3a7-144ab9e63556',
 '8f85189f-44a7-447e-a37c-8d398e1d95e2']

### Online - One

In [341]:
single = clients['fs_olserve'].read_feature_values(
    types.featurestore_online_service.ReadFeatureValuesRequest(
        entity_type = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
        entity_id="fd2b54ed-c182-4bb0-9f5e-d75982bd59d3",
        feature_selector = types.FeatureSelector(id_matcher=types.IdMatcher(ids=['*'])),
    )
)

In [348]:
print(list(item.id for item in single.header.feature_descriptors))

['p60', 'p12', 'p46', 'p43', 'p4', 'p55', 'p10', 'p20', 'p35', 'p57', 'p2', 'p54', 'p9', 'p42', 'p3', 'p8', 'p37', 'p26', 'p28', 'p17', 'p56', 'p19', 'p15', 'p6', 'p48', 'p63', 'target_oe', 'p23', 'p62', 'p58', 'p31', 'p41', 'p30', 'p49', 'p38', 'p1', 'p59', 'p27', 'p45', 'p34', 'p22', 'p25', 'p52', 'p44', 'p53', 'p18', 'p39', 'p33', 'p16', 'p11', 'p5', 'p0', 'target', 'p50', 'p21', 'p36', 'p61', 'p24', 'p14', 'p7', 'p51', 'p40', 'p47', 'p32', 'p29', 'p13']


In [350]:
print(list(item.value.double_value for item in single.entity_view.data))

[13.0, 14.0, 4.0, 1.0, 13.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 13.0, 10.0, 0.0, 15.0, 16.0, 0.0, 3.0, 0.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 5.0, 0.0, 4.0, 0.0, 11.0, 6.0, 16.0, 16.0, 1.0, 5.0, 16.0, 5.0, 14.0, 16.0, 0.0, 6.0, 0.0, 16.0, 3.0, 0.0, 0.0, 10.0, 14.0, 0.0, 6.0, 0.0, 0.0, 0.0, 16.0, 0.0, 0.0, 0.0, 14.0, 12.0]
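
The feature IDs from `single.header.feature_descriptors` and the values from `single.entity_view.data` are printed above as two parallel lists, in no particular feature order. A sketch of one way to line them up and recover the pixels in `p0`..`p63` order, written against plain Python lists so it runs without a Feature Store; `order_pixels` is a hypothetical helper and the short input lists are made-up illustration data:

```python
def order_pixels(feature_ids, values):
    """Return pixel values sorted by pixel index, skipping non-pixel features."""
    by_id = dict(zip(feature_ids, values))
    # Keep only IDs of the form p<number>, then sort by that number
    pixel_ids = sorted(
        (fid for fid in by_id if fid[:1] == 'p' and fid[1:].isdigit()),
        key=lambda fid: int(fid[1:]),
    )
    return [by_id[fid] for fid in pixel_ids]

ids = ['p2', 'target', 'p0', 'p1']
vals = [3.0, 0.0, 1.0, 2.0]
print(order_pixels(ids, vals))  # [1.0, 2.0, 3.0]
```

With the real response, the inputs would be `[d.id for d in single.header.feature_descriptors]` and `[d.value.double_value for d in single.entity_view.data]`. Since `target` is an INT64 feature, its value is expected in `int64_value`; reading `double_value` for it would give the field default of 0.0.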


### Online - Multi

In [360]:
multi = clients['fs_olserve'].streaming_read_feature_values(
    types.featurestore_online_service.StreamingReadFeatureValuesRequest(
        entity_type = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
        entity_ids = list(image_id['image_id'])[0:3],
        feature_selector = types.FeatureSelector(id_matcher=types.IdMatcher(ids=['*'])),
    )
)

In [361]:
for i in multi:
    print(i.entity_view.entity_id)
    print(list(item.value.double_value for item in i.entity_view.data))


[]
027cead0-3f25-494b-9a54-00da0d2b82e3
[0.0, 14.0, 0.0, 0.0, 0.0, 0.0, 15.0, 13.0, 16.0, 2.0, 11.0, 4.0, 16.0, 3.0, 0.0, 3.0, 0.0, 1.0, 0.0, 0.0, 0.0, 12.0, 0.0, 10.0, 0.0, 14.0, 0.0, 0.0, 0.0, 14.0, 11.0, 16.0, 6.0, 7.0, 0.0, 0.0, 0.0, 0.0, 16.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 7.0, 16.0, 9.0, 0.0, 1.0, 4.0, 11.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 16.0, 0.0, 1.0, 0.0]
878d983a-5b60-493c-bd83-02b03c2ec0c2
[0.0, 16.0, 0.0, 11.0, 0.0, 0.0, 13.0, 15.0, 14.0, 0.0, 10.0, 5.0, 13.0, 8.0, 0.0, 0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 16.0, 0.0, 15.0, 0.0, 15.0, 0.0, 0.0, 0.0, 12.0, 8.0, 10.0, 15.0, 4.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 0.0, 0.0, 0.0, 3.0, 0.0, 5.0, 0.0, 0.0, 16.0, 16.0, 8.0, 0.0, 5.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 6.0, 11.0, 3.0, 1.0, 0.0]
fd2b54ed-c182-4bb0-9f5e-d75982bd59d3
[0.0, 14.0, 0.0, 1.0, 0.0, 0.0, 13.0, 14.0, 16.0, 2.0, 12.0, 6.0, 16.0, 6.0, 0.0, 0.0, 0.0, 6.0, 0.0, 3.0, 0.0, 16.0, 0.0, 16.0, 0.0, 14.0, 0.0, 0.0, 0.0, 13.0, 11.0, 10.0, 13.0, 5.0, 0.0
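
The blank line and `[]` at the top of the output above come from the first item in the stream, which has no entity ID and no data; it appears to carry only the response header. A sketch of filtering such items out, using `SimpleNamespace` objects as stand-ins for the streamed responses (the stand-ins and the `entity_rows` helper are assumptions for illustration, not part of the client library):

```python
from types import SimpleNamespace

def entity_rows(responses):
    """Yield (entity_id, values) for responses that carry entity data."""
    for response in responses:
        view = response.entity_view
        if not view.entity_id:
            continue  # header-only item: no entity to report
        yield view.entity_id, [item.value.double_value for item in view.data]

def fake_response(entity_id, doubles):
    """Build a stand-in object shaped like a streamed response."""
    data = [SimpleNamespace(value=SimpleNamespace(double_value=d)) for d in doubles]
    return SimpleNamespace(entity_view=SimpleNamespace(entity_id=entity_id, data=data))

stream = [fake_response('', []), fake_response('027cead0', [0.0, 14.0])]
rows = list(entity_rows(stream))
print(rows)  # [('027cead0', [0.0, 14.0])]
```

Against the real service, `entity_rows(multi)` would replace the fake `stream`.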

### Batch (For training or large scale prediction)

In [362]:
from google.cloud import bigquery

DESTINATION_DATASET = 'digits_training'

clients['bq'] = bigquery.Client()
dataset_id = "{}.{}".format(clients['bq'].project, DESTINATION_DATASET)
dataset = bigquery.Dataset(dataset_id)
dataset.location = REGION
dataset = clients['bq'].create_dataset(dataset, exists_ok = True)

In [363]:
batch_request = types.featurestore_service.ExportFeatureValuesRequest(
    entity_type = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
    # snapshot_time is left unset so values are exported as of now.
    # (Timestamp().GetCurrentTime() fills the message in place and returns None,
    #  so it cannot be passed inline as snapshot_time.)
    snapshot_export = types.ExportFeatureValuesRequest.SnapshotExport(),
    destination = types.FeatureValueDestination(bigquery_destination = types.BigQueryDestination(output_uri='bq://statmike-mlops.digits_training.training')),
    feature_selector = types.FeatureSelector(id_matcher=types.IdMatcher(ids=['*']))
)

In [364]:
batchjob = clients['fs'].export_feature_values(batch_request)

In [365]:
batchjob.result()



---
## Clean Up

In [256]:
clients['fs'].delete_feature(
    name = PARENT + '/featurestores/{}/entityTypes/{}/features/p0'.format(FEATURESTORE_ID,ENTITYTYPE_ID)
).result()



In [257]:
clients['fs'].delete_feature(
    name = clients['fs'].feature_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID, 'p1')
).result()



In [282]:
clients['fs'].delete_entity_type(
    request = types.DeleteEntityTypeRequest(
        name = clients['fs'].entity_type_path(PROJECT_ID, REGION, FEATURESTORE_ID, ENTITYTYPE_ID),
        force = True)
).result()



In [283]:
clients['fs'].delete_featurestore( name = clients['fs'].featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID))

<google.api_core.operation.Operation at 0x7f9b00a440d0>

In [262]:
clients['bq'].delete_dataset(dataset, delete_contents = True)
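
The cells above delete individual features, then the entity type, then the feature store. As an alternative, the delete request for a feature store accepts a `force` flag, analogous to the one used for the entity type above, that is documented to remove the contained entity types and features as well. A hedged sketch that passes the request as a dict; `delete_featurestore_tree` is a hypothetical helper, and the fake client only stands in for `clients['fs']` so the snippet runs without GCP access:

```python
def delete_featurestore_tree(client, name):
    """Delete a feature store and everything under it (hypothetical helper)."""
    operation = client.delete_featurestore(request={'name': name, 'force': True})
    return operation.result()  # block until the long-running delete finishes

# Stand-ins used only to exercise the helper locally
class FakeOperation:
    def result(self):
        return None

class FakeClient:
    def __init__(self):
        self.requests = []

    def delete_featurestore(self, request):
        self.requests.append(request)
        return FakeOperation()

fake = FakeClient()
delete_featurestore_tree(fake, 'projects/p/locations/r/featurestores/fs')
print(fake.requests)
```

With the real client, the call would be `delete_featurestore_tree(clients['fs'], clients['fs'].featurestore_path(PROJECT_ID, REGION, FEATURESTORE_ID))`.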