# Feature Store - A unified storage of curated features

This notebook helps you get started with Feature Store in the H2O AI Cloud using Python.

* **Product Documentation:** https://docs.h2o.ai/feature-store/latest-stable/docs/index.html

## Prerequisites

Update the `h2o_ai_cloud.py` file with the connection parameters for your H2O AI Cloud environment:
1. Log in to your H2O AI Cloud environment
1. Click your username or avatar in the H2O AI Cloud navigation bar
1. Navigate to `CLI & API Access`
1. Use the variables from the `Accessing H2O AI Cloud APIs` section to populate the parameters

In [3]:
from getpass import getpass

from h2o_ai_cloud import token_provider, fs_client
from featurestore import CSVFile, Schema

from pyspark.sql import SparkSession

## Securely connect to the platform
We first use our platform token to create a token provider object for the H2O AI Cloud. We then use this object to log in to Feature Store.

In [5]:
client = fs_client(token_provider())

## Understand the environment

In [None]:
client.get_version()

## Configure Spark for Feature Store

In [None]:
spark = SparkSession.builder \
    .master("local") \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.1,com.amazonaws:aws-java-sdk-bundle:1.12.238,io.delta:delta-core_2.12:1.2.1") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()

spark.sparkContext.setLogLevel("ERROR")
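
The `spark.jars.packages` value above is one long comma-separated string of Maven coordinates, which is hard to scan and easy to break when editing. As a minimal sketch, the same string can be assembled from a list in plain Python. The coordinates are copied from the cell above; the names `packages` and `packages_conf` are illustrative only.

```python
# Maven coordinates copied from the Spark configuration above.
packages = [
    "org.apache.hadoop:hadoop-aws:3.3.1",
    "com.amazonaws:aws-java-sdk-bundle:1.12.238",
    "io.delta:delta-core_2.12:1.2.1",
]

# Spark expects a single comma-separated string with no spaces.
packages_conf = ",".join(packages)
print(packages_conf)
```

The resulting string could then be passed as `.config("spark.jars.packages", packages_conf)` when building the session.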

## Define data source
Feature Store supports several types of data sources. See the full list at https://docs.h2o.ai/feature-store/latest-stable/docs/supported_data_sources.html

In [27]:
source = CSVFile("s3a://h2o-public-test-data/smalldata/gbm_test/titanic.csv")

## Extract schema from data source
The schema describes the features of the feature set.

In [4]:
schema = client.extract_schema_from_source(source)

## Create a project
Project names should follow the naming conventions described at https://docs.h2o.ai/feature-store/latest-stable/docs/api/naming_conventions.html

In [29]:
project = client.projects.create("sample_project")

## Create a feature set

In [30]:
feature_set = project.feature_sets.register(schema, "sample_fs")

## Ingest data from source
Upload the data from the source into Feature Store.

In [5]:
feature_set.ingest(source)

## Retrieve the data

In [32]:
reference = feature_set.retrieve()

## Download features
Download the feature set files from Feature Store.

In [6]:
reference.download()

## Obtain data as a Spark Frame
Load the features as a Spark DataFrame.

In [7]:
reference.as_spark_frame(spark).show()

### Prepare a schema from a string
A schema can also be created from a string.

In [35]:
schema_str = "id integer, value string"
schema = Schema.create_from(schema_str)
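
To make the layout of the schema string concrete, here is a minimal sketch in plain Python that splits a string like the one above into column names and types. It only illustrates the `name type, name type` layout seen in the example. It is not how `Schema.create_from` parses its input, and the helper `parse_schema_string` is hypothetical.

```python
def parse_schema_string(schema_str):
    """Split 'name type, name type' into a list of (name, type) tuples.

    Hypothetical helper for illustration; not part of the Feature Store client.
    """
    columns = []
    for part in schema_str.split(","):
        # The first whitespace-separated token is the column name,
        # the remainder is the data type.
        name, dtype = part.strip().split(None, 1)
        columns.append((name, dtype.strip()))
    return columns


columns = parse_schema_string("id integer, value string")
print(columns)
```

This naive split on commas would not handle type names that themselves contain commas, so treat it as a reading aid only.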

### Create another feature set

In [36]:
fs_online = project.feature_sets.register(schema, "sample_fs_online", primary_key="id")

## Ingest data into Online Feature Store

In [37]:
fs_online.ingest_online('{"id": 1, "value": "test"}')
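
The record above is passed as a hand-written JSON string. As a sketch, the same string can be produced from a Python dictionary with the standard-library `json` module, which avoids quoting mistakes as records grow. The names `record` and `payload` are illustrative only.

```python
import json

# The same record as in the cell above, expressed as a Python dict.
# The keys match the "id integer, value string" schema.
record = {"id": 1, "value": "test"}

# Serialize to a JSON string.
payload = json.dumps(record)
print(payload)
```

The resulting `payload` could then be passed in place of the literal, as in `fs_online.ingest_online(payload)`.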

## Retrieve data from Online Feature Store

In [10]:
fs_online.retrieve_online(1)

## Delete a feature set

In [8]:
fs_online.delete()

## Delete a project

In [9]:
project.delete()

## Clean up

In [2]:
!rm 