# Getting Started with Feature Store

This notebook is a getting-started tutorial that shows how to securely connect to Feature Store from a local workstation and then perform common tasks.


## Notebook Setup

Follow these steps to install the Feature Store client on your local machine.

1. Download the Feature Store Python client wheel: https://demo.feature-store.h2o.ai/
2. Install the library: `pip install featurestore-0.0.33-py2.py3-none-any.whl`
3. Download the PEM file (employees only): https://h2oai.slack.com/files/UR8D9HBPZ/F02G69RJU8N/isrgrootx1.pem

In [1]:
import getpass

from featurestore import Client, Column, CSVFile, Schema
from featurestore.core.data_types import STRING
from featurestore.core.filter import FilterBuilder, Case
from featurestore.core.filter.collections import FeatureSet, Feature
from featurestore.core.job_types import INGEST, RETRIEVE, EXTRACT_SCHEMA


## Securely Connect

First, set the variable below to the path where you saved the PEM file.

In [2]:
location_of_root_certificate = '/Users/admin/Downloads/ISRGRootX1.pem'
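Before connecting, it can help to confirm that the file at that path really is a PEM-encoded certificate. The following is a minimal sketch using only the Python standard library; the helper names `first_certificate_der` and `check_root_certificate` are hypothetical and not part of the Feature Store client. It only checks that the file exists and that the certificate block decodes as base64. It does not verify that the certificate is valid or trusted.

```python
import ssl
from pathlib import Path

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def first_certificate_der(pem_text):
    """Return the DER bytes of the first certificate block in a PEM string."""
    start = pem_text.find(BEGIN)
    stop = pem_text.find(END, start)
    if start == -1 or stop == -1:
        raise ValueError("no PEM certificate block found")
    # ssl.PEM_cert_to_DER_cert base64-decodes the body between the markers.
    return ssl.PEM_cert_to_DER_cert(pem_text[start:stop + len(END)])


def check_root_certificate(path):
    """Read a PEM file and decode its first certificate.

    Raises FileNotFoundError if the file is missing and ValueError if it
    does not contain a PEM certificate block.
    """
    pem_text = Path(path).expanduser().read_text(encoding="ascii")
    return first_certificate_der(pem_text)
```

Calling `check_root_certificate(location_of_root_certificate)` before creating the client surfaces a wrong path or a corrupted download early, rather than as a connection error.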

Now, let's connect to the Feature Store server.

In [24]:
# connect
client = Client(
    "demo-api.feature-store.h2o.ai", 
    True, 
    root_certificates=location_of_root_certificate
)

Connecting to the server demo-api.feature-store.h2o.ai ...


To get a personal access token, run the cell below and click the URL that it prints. Then copy the token and paste it into the prompt.

In [25]:
# login
print(f"Click this URL to get your token: {client.auth.get_login_url()}")

client.auth.set_auth_token(getpass.getpass("Paste your token"))

Click this URL to get your token: https://login.microsoftonline.com/840229f2-c911-49e6-a73d-5b3a4311835a/oauth2/v2.0/authorize?client_id=181404ad-fa85-47bc-9584-a02b2efae0d0&code_challenge=27Fbwa5MdeWEolES9HgGtvSpMh_05eaWlt_GgfCsYJ4&code_challenge_method=S256&redirect_uri=https://demo.feature-store.h2o.ai/Callback&response_type=code&scope=openid%20offline_access%20api://featurestore-demo/user_impersonation&state=E25PVg4b7Z
Paste your token········
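For scripted or repeated runs, pasting the token by hand each time can be inconvenient. The sketch below reads the token from an environment variable and falls back to an interactive prompt. The variable name `FEATURE_STORE_TOKEN` and the helper `read_token` are assumptions made for illustration; the Feature Store client does not define them.

```python
import getpass
import os


def read_token(env_var="FEATURE_STORE_TOKEN", prompt=getpass.getpass):
    """Return a token from the environment, or prompt for one if unset.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fall back to a hidden interactive prompt, as in the cell above.
        token = prompt("Paste your token: ").strip()
    if not token:
        raise ValueError("no access token provided")
    return token
```

The result can then be passed to the call already shown above, for example `client.auth.set_auth_token(read_token())`.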


In [26]:
# test connection
client.auth.get_active_user()

id: "384a71e9-d5a4-434b-856e-c546314f7f9a"
name: "jeffrey.canisius"
email: "jeffrey.canisius@h2o.ai"

## Projects API

### List all projects

In [6]:
client.projects.list()

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = ""
	debug_error_string = "{"created":"@1637249074.495122000","description":"Error received from peer ipv4:52.191.34.108:443","file":"src/core/lib/surface/call.cc","file_line":1070,"grpc_message":"","grpc_status":2}"
>

### Create a project
If the `secret` argument is set to `True`, the project is visible only to its owner. Other users will not see the project in the list of projects and cannot view its details.

If the `locked` argument is set to `True`, only users with consumer permission can list and get feature sets from this project.

In [19]:
project = client.projects.create(project_name="test_project", description="description", secret=False, locked=None)

Project 'test_project' already exists.


### Get an existing project

In [21]:
project = client.projects.get("test_project")
project

name: "test_project"
description: "description"
owner {
  id: "384a71e9-d5a4-434b-856e-c546314f7f9a"
  name: "jeffrey.canisius"
  email: "jeffrey.canisius@h2o.ai"
}
created_date_time {
  2021-11-18T15:27:04.862000
}
last_update_date_time {
  2021-11-18T15:28:32.186000
}

### Delete a project
Note that deleting a project also removes all feature sets and features stored in it.

In [9]:
project.delete()

Waiting for project 'test_project' deletion
Waiting for project 'test_project' deletion
Waiting for project 'test_project' deletion
Waiting for project 'test_project' deletion


### Change visibility of project

In [20]:
project.secret = False
project.update_metadata()

## Schema API

### Create schema from string

In [None]:
schema = "col1 integer, col2 string, col3 integer"
schema = Schema.from_string(schema)
schema
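To make the expected string format concrete, here is a small standalone sketch that splits a schema string of the form `"name type, name type"` into `(name, type)` pairs. It is an illustration of the input format only, not how `Schema.from_string` is implemented, and the function name `parse_schema_string` is hypothetical. It assumes simple type names that contain no commas.

```python
def parse_schema_string(schema_string):
    """Split "name type, name type" into a list of (name, type) tuples.

    Illustration only: assumes simple type names that contain no commas.
    """
    columns = []
    for part in schema_string.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        fields = part.split(None, 1)
        if len(fields) != 2:
            raise ValueError(f"column {part!r} has no data type")
        name, data_type = fields
        columns.append((name, data_type.strip()))
    return columns


print(parse_schema_string("col1 integer, col2 string, col3 integer"))
# [('col1', 'integer'), ('col2', 'string'), ('col3', 'integer')]
```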

### Change data type of current schema

In [None]:
schema = schema.update_column("col1", STRING)
schema

### Add new columns to current schema

In [None]:
new_column = Column("new_column", "STRING")

# Append
schema = schema.append(new_column)  # Append to the end
schema = schema.append(new_column, "col2")  # Append after col2
# Prepend
schema = schema.prepend(new_column)  # Prepend to the beginning
schema = schema.prepend(new_column, "col2")  # Prepend before col2

schema
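The cell above shows four alternative placements for a new column. The sketch below models the same placements on a plain Python list of column names, so the resulting order is easy to see. It is a toy model with hypothetical helper names, not the `Schema` implementation, and each call starts from the same three-column list.

```python
def append_column(columns, new, after=None):
    """Return a copy with `new` at the end, or right after column `after`."""
    result = list(columns)
    index = len(result) if after is None else result.index(after) + 1
    result.insert(index, new)
    return result


def prepend_column(columns, new, before=None):
    """Return a copy with `new` at the start, or right before column `before`."""
    result = list(columns)
    index = 0 if before is None else result.index(before)
    result.insert(index, new)
    return result


cols = ["col1", "col2", "col3"]
print(append_column(cols, "new_column"))
# ['col1', 'col2', 'col3', 'new_column']
print(append_column(cols, "new_column", after="col2"))
# ['col1', 'col2', 'new_column', 'col3']
print(prepend_column(cols, "new_column"))
# ['new_column', 'col1', 'col2', 'col3']
print(prepend_column(cols, "new_column", before="col2"))
# ['col1', 'new_column', 'col2', 'col3']
```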

### Load schema from a feature set

In [None]:
load_schema = feature_set.get_schema()

### Save schema as string

In [None]:
str_schema = schema.to_string()

str_schema

## Feature Set API

### Register a feature set

In [None]:
project.feature_sets.register(schema,
                              "feature_set_name",
                              description="",
                              primary_key=None,
                              secondary_key=None,
                              time_travel_column=None,
                              time_travel_column_format="yyyy-MM-dd HH:mm:ss",
                              secret=False,
                              masked_features=None)
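The `time_travel_column_format` value above, `yyyy-MM-dd HH:mm:ss`, looks like a Java/Spark-style date pattern; that reading is an assumption, since this notebook does not say which pattern syntax the server uses. Under that assumption, the equivalent Python `strptime` format is `%Y-%m-%d %H:%M:%S`, and the following sketch checks whether a sample value from your time travel column matches it before you register the feature set. The helper name is hypothetical.

```python
from datetime import datetime

# Assumed Python equivalent of the pattern "yyyy-MM-dd HH:mm:ss".
PYTHON_FORMAT = "%Y-%m-%d %H:%M:%S"


def matches_time_travel_format(value, fmt=PYTHON_FORMAT):
    """Return True if `value` parses with the given strptime format."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


print(matches_time_travel_format("2021-11-18 15:27:04"))  # True
print(matches_time_travel_format("2021-11-18T15:27:04"))  # False
```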

### List feature sets within a project
You can also pass an array of tags to the method to list only the feature sets that have specific tags.

In [None]:
project.feature_sets.list()

### Obtain a feature set

In [None]:
fs = project.feature_sets.get("feature_set_name", version=None)
fs

### List all versions of a feature set

In [None]:
fs.list_versions()

### Delete a feature set

In [None]:
fs.delete()

### Update metadata on a feature set
For the full list of fields that can be updated on a feature set object, see https://dev.feature-store.h2o.ai/doc/api/feature_set_api.html#updating-meta-data-on-feature-set

In [None]:
fs.secret = False
# Replace all existing tags with the given list
fs.tags = ["new tag 1", "new tag 2"]
# Add one more tag to the existing list
fs.tags.append("new tag")
fs.update_metadata()

### Ingest API
To prepare the data and ingest it into the feature store, run:

In [None]:
source = CSVFile("wasbs://benchmarkdata@featurestoretestdata.blob.core.windows.net/small_size_data/customer_churn_data_based_on_dates.csv")

In [None]:
fs.ingest(source, start_date_time=None, end_date_time=None, new_version_on_schema_change=False)

You can revert the last ingest by running:

In [None]:
fs.revert_last_ingest()

### Retrieve API
To retrieve the data, run:

In [None]:
ref = fs.retrieve(start_date_time=None, end_date_time=None)

### Preview Data
By default, the preview shows 10 rows of data. To see more rows, up to a maximum of 100, pass `num_rows` as an argument (e.g. `num_rows=20`).

In [None]:
ref.preview()

### Download files from feature store
You can download the data to your local machine by running the following line:

In [None]:
# Use a name other than `dir` to avoid shadowing the Python builtin
download_dir = ref.download()
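After a download, you may want to see which files arrived. The sketch below lists every file under a directory using only the standard library. It assumes that the value returned by `ref.download()` is the path of a local directory, which the notebook suggests but does not state; `list_downloaded_files` is a hypothetical helper.

```python
from pathlib import Path


def list_downloaded_files(download_dir):
    """Return the sorted relative paths of all files under `download_dir`."""
    root = Path(download_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

Passing the returned download path to `list_downloaded_files` gives a quick inventory before loading the data with a tool of your choice.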

## Jobs API

### List jobs
You can also pass `active=False` to return all jobs, and `job_type` to return only jobs of a specific type (e.g. `job_type=INGEST`). The three job types are `INGEST`, `RETRIEVE` and `EXTRACT_SCHEMA`.

In [None]:
client.jobs.list()

You can also list the jobs that are currently processing for a specific feature set:

In [None]:
fs.get_active_jobs()
fs.get_active_jobs(job_type=INGEST)

### Get a job

In [None]:
job = client.jobs.get("job_id")

### Check job status

In [None]:
job.is_done()
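Because `is_done()` reports the status at a single point in time, a long-running job usually needs to be polled. The following is a generic polling sketch with a timeout. `wait_until_done` is a hypothetical helper, not part of the Feature Store client, and it accepts any zero-argument callable that returns a boolean, so `job.is_done` can be passed in directly.

```python
import time


def wait_until_done(is_done, timeout=300.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `is_done()` until it returns True or `timeout` seconds elapse.

    Returns True if the job finished in time, False on timeout.
    `sleep` and `clock` are injectable to make the helper easy to test.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until_done(job.is_done, timeout=600)` would check the job every two seconds for up to ten minutes before giving up.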

### Get job result

In [None]:
job.get_result()

### Download data

In [None]:
retrieve_job = client.jobs.get("job_id")
data_path = retrieve_job.download()

## Feature Set Search API

### Build filter conditions

In [None]:
filter_builder = FilterBuilder().add(FeatureSet.tags == "new tag")

### Search feature sets based on filters

In [None]:
project.feature_sets.list(filters=filter_builder)

## Create New Feature Set Version