# <span style='color:#ff5f27'> 👨🏻‍🏫 Snowflake as a Source for Feature Groups in Hopsworks </span>

Follow this [guide](https://docs.hopsworks.ai/latest/user_guides/fs/storage_connector/creation/snowflake/) to set up a Snowflake connector in Hopsworks.


In [1]:
import hopsworks
from hsfs.feature import Feature
import snowflake.connector

proj = hopsworks.login()
fs = proj.get_feature_store()


Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/17565
Connected. Call `.close()` to terminate connection gracefully.


## <span style='color:#ff5f27'> 🔮 Retrieve a Connector </span>

First, connect to the feature store, then retrieve your **Snowflake storage connector**.

Replace `my_storage_connector_name` with your Snowflake storage connector name.

In Snowflake, open the Marketplace and get the [Chicago Divvy Bike Status dataset](https://app.snowflake.com/marketplace/listing/GZSTZBWGAEV/ahead-chicago-divvy-bike-station-status?search=chicago%20bike).
Add the dataset to a schema called `PUBLIC` (or change the database and schema in the connection function below).

In [2]:
connector = fs.get_storage_connector("my_storage_connector_name")

In [4]:
def get_connection():
    """Open a Snowflake connection using the credentials stored in the Hopsworks storage connector."""
    conn = snowflake.connector.connect(
        user=connector.user,
        password=connector.password,
        account=connector.account,
        warehouse=connector.warehouse,
        database="CHICAGO_DIVVY_BIKE_STATION_STATUS",
        schema="PUBLIC"
    )
    return conn

## <span style='color:#ff5f27'> 📝 Read Data </span>

You can retrieve your data by passing a SQL query as a string to a cursor opened on the Snowflake connection.

In [5]:
conn = get_connection()

# SQL query to fetch the data
query = "SELECT * FROM STATION_INFO_FLATTEN"

# Execute the query
cur = conn.cursor()
cur.execute(query)
rows = cur.fetchall()

# Convert to DataFrame
import pandas as pd
df = pd.DataFrame(rows, columns=[x[0] for x in cur.description])

# Close the cursor and connection
cur.close()
conn.close()
df

Unnamed: 0,SHORT_NAME,STATION_TYPE,NAME,LON,ELECTRIC_BIKE_SURCHARGE_WAIVER,EXTERNAL_ID,LEGACY_ID,CAPACITY,HAS_KIOSK,STATION_ID,REGION_ID,EIGHTD_STATION_SERVICES,LAT
0,"""TA1309000064""","""classic""","""Wolcott Ave & Polk St""",-87.673688,false,"""a3ab86b6-a135-11e9-9cda-0a87ae2ba916""","""342""",23,true,"""a3ab86b6-a135-11e9-9cda-0a87ae2ba916""","""0""",[],41.871262
1,"""15575""","""classic""","""Broadway & Thorndale Ave""",-87.6601406209,false,"""a3af2c5f-a135-11e9-9cda-0a87ae2ba916""","""458""",19,true,"""a3af2c5f-a135-11e9-9cda-0a87ae2ba916""","""0""",[],41.98974251144
2,"""KA1503000065""","""classic""","""Woodlawn Ave & Lake Park Ave""",-87.5970051479,false,"""a3ad4d1b-a135-11e9-9cda-0a87ae2ba916""","""413""",15,true,"""a3ad4d1b-a135-11e9-9cda-0a87ae2ba916""","""0""",[],41.81409271048
3,"""15491""","""classic""","""63rd St Beach""",-87.57632374763489,false,"""a3a547b8-a135-11e9-9cda-0a87ae2ba916""","""101""",15,true,"""a3a547b8-a135-11e9-9cda-0a87ae2ba916""","""0""",[],41.78091096424803
4,"""13292""","""classic""","""Kedzie Ave & Palmer Ct""",-87.707322,false,"""a3a9f76a-a135-11e9-9cda-0a87ae2ba916""","""290""",15,true,"""a3a9f76a-a135-11e9-9cda-0a87ae2ba916""","""0""",[],41.921525
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1365,,"""lightweight""","""Michigan Ave & 102nd St""",-87.61984,false,"""motivate_CHI_1674190492950080350""","""1674190492950080350""",10,false,"""1674190492950080350""",,[],41.7083
1366,,"""lightweight""","""Pullman - Planet Fitness""",-87.59779,false,"""motivate_CHI_1677249879663712418""","""1677249879663712418""",10,false,"""1677249879663712418""",,[],41.69782
1367,,"""lightweight""","""Lamon Ave & Belmont Ave""",-87.7492834,false,"""motivate_CHI_1563698701206292480""","""1563698701206292480""",9,false,"""1563698701206292480""",,[],41.9390108
1368,,"""lightweight""","""Racine Ave & 76th""",-87.654054,false,"""motivate_CHI_1674190591734328324""","""1674190591734328324""",10,false,"""1674190591734328324""",,[],41.755786
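
The cell above relies on two parts of the Python DB-API (PEP 249), which the Snowflake connector follows: `cursor.fetchall()` returns the rows, and `cursor.description` holds one entry per column with the column name first. The sketch below shows the same pattern wrapped in `contextlib.closing`, so the cursor is closed even if the query fails. It uses the standard library's `sqlite3` as a stand-in for Snowflake; the `stations` table and the `fetch_rows` helper are made up for illustration.

```python
import sqlite3
from contextlib import closing


def fetch_rows(conn, query):
    """Run a query and return (column_names, rows) using plain DB-API calls."""
    # closing() guarantees cur.close() runs, even if execute() raises
    with closing(conn.cursor()) as cur:
        cur.execute(query)
        rows = cur.fetchall()
        # Each description entry is a sequence whose first item is the column name
        columns = [col[0] for col in cur.description]
    return columns, rows


# sqlite3 stands in for Snowflake here; the table and its rows are illustrative
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (STATION_ID TEXT, CAPACITY INTEGER)")
conn.executemany("INSERT INTO stations VALUES (?, ?)", [("a", 23), ("b", 19)])

columns, rows = fetch_rows(conn, "SELECT * FROM stations ORDER BY STATION_ID")
conn.close()

print(columns)
print(rows)
```

The resulting `columns` and `rows` can be passed to `pd.DataFrame(rows, columns=columns)`, as in the cell above.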


## <span style='color:#ff5f27'> 📝 Write Data to Hopsworks </span>

Create a feature group and write the Pandas DataFrame to the Feature Group.
Hopsworks will automatically lowercase the uppercase column names.

In [6]:
bike_stations = fs.get_or_create_feature_group(name="chicago_bike_stations",
                                    version=1,
                                    description="Chicago bike station details",
                                    primary_key=["station_id"]
                                   )
bike_stations.insert(df)



Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/17565/fs/17485/fg/730480


Uploading Dataframe: 100.00% |█████████████████████████████████| Rows 1370/1370 | Elapsed Time: 00:07 | Remaining Time: 00:00


Launching job: chicago_bike_stations_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/17565/jobs/named/chicago_bike_stations_1_offline_fg_materialization/executions


(<hsfs.core.job.Job at 0x7f99db323040>, None)

## <span style='color:#ff5f27'> 📝 Read Data </span>

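This time, we read from a table with a timestamp column. The query returns 50,000 rows ordered by `last_updated` in ascending order, so these are the earliest records in the table; change the ordering or the limit if you want a different slice.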

In [7]:
conn = get_connection()

query = """
    SELECT STATION_ID as id
        , STATION_STATUS as station_status
        , NUM_BIKES_AVAILABLE as num_bikes_available
        , NUM_EBIKES_AVAILABLE as num_ebikes_available
        , LAST_UPDATED as last_updated
    FROM STATION_STATUS_FLATTEN_FULL ORDER BY last_updated LIMIT 50000 
"""
# Execute the query
cur = conn.cursor()
cur.execute(query)
rows = cur.fetchall()

# Convert to DataFrame
import pandas as pd
df2 = pd.DataFrame(rows, columns=[x[0] for x in cur.description])

# Close the cursor and connection
cur.close()
conn.close()
df2

Unnamed: 0,ID,STATION_STATUS,NUM_BIKES_AVAILABLE,NUM_EBIKES_AVAILABLE,LAST_UPDATED
0,"""418""","""active""",2,0,2021-10-20 19:52:20
1,"""565""","""active""",2,2,2021-10-20 19:52:20
2,"""588""","""active""",7,5,2021-10-20 19:52:20
3,"""545""","""active""",1,0,2021-10-20 19:52:20
4,"""153""","""active""",8,1,2021-10-20 19:52:20
...,...,...,...,...,...
49995,"""682""","""active""",0,0,2021-10-20 20:55:35
49996,"""1594046362333434512""","""active""",5,5,2021-10-20 20:55:35
49997,"""57""","""active""",5,3,2021-10-20 20:55:35
49998,"""1448642183732401786""","""active""",5,5,2021-10-20 20:55:35
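
To fetch the most recent rows instead of the earliest ones, the sort direction has to be reversed with `DESC` before `LIMIT` is applied. A small sketch of the difference, using `sqlite3` as a stand-in for Snowflake; the `status` table and its three rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (id TEXT, last_updated TEXT)")
conn.executemany(
    "INSERT INTO status VALUES (?, ?)",
    [
        ("418", "2021-10-20 19:52:20"),
        ("565", "2021-10-20 20:55:35"),
        ("588", "2021-10-21 08:00:00"),
    ],
)

# Ascending is the default sort order, so LIMIT keeps the earliest rows
earliest = conn.execute(
    "SELECT id FROM status ORDER BY last_updated LIMIT 2"
).fetchall()

# DESC reverses the order, so LIMIT keeps the most recent rows
latest = conn.execute(
    "SELECT id FROM status ORDER BY last_updated DESC LIMIT 2"
).fetchall()
conn.close()

print(earliest)
print(latest)
```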


In [8]:
from great_expectations.core import ExpectationSuite, ExpectationConfiguration

# Create an Expectation Suite
expectation_suite = ExpectationSuite(
    expectation_suite_name="transaction_suite")

expectation_suite.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_not_be_null",
        kwargs={"column":"id"}
    )
)

{"kwargs": {"column": "id"}, "meta": {}, "expectation_type": "expect_column_values_to_not_be_null"}
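
The expectation above requires that no value in the `id` column is null. As a rough illustration of what such a check computes (this is not how Great Expectations implements it), the plain-Python sketch below counts null values and reports them using field names borrowed from the validation report that Hopsworks prints on insert. The `check_not_null` helper is hypothetical, and it treats only `None` as null.

```python
def check_not_null(values):
    """Summarise null values in a column, mirroring the fields of a validation result."""
    element_count = len(values)
    # Only None counts as null in this sketch
    unexpected_count = sum(1 for v in values if v is None)
    unexpected_percent = (
        100.0 * unexpected_count / element_count if element_count else 0.0
    )
    return {
        "success": unexpected_count == 0,
        "element_count": element_count,
        "unexpected_count": unexpected_count,
        "unexpected_percent": unexpected_percent,
    }


clean = check_not_null(["418", "565", "588"])
dirty = check_not_null(["418", None, "588", None])

print(clean)
print(dirty)
```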

## <span style='color:#ff5f27'> 📝 Create Feature Group </span>

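This time, the feature group has an event-time column (`last_updated`), is enabled for online serving, and has the expectation suite attached, so the data is validated when it is inserted.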

In [9]:
bike_station_status = fs.get_or_create_feature_group(name="chicago_bike_station_status",
                                    version=1,
                                    description="Chicago bike station status",
                                    primary_key=["id"],
                                    event_time="last_updated",
                                    online_enabled=True,
                                    expectation_suite=expectation_suite
                                   )
bike_station_status.insert(df2)



Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/17565/fs/17485/fg/729472
Validation succeeded.
Validation Report saved successfully, explore a summary at https://c.app.hopsworks.ai:443/p/17565/fs/17485/fg/729472


Uploading Dataframe: 100.00% |███████████████████████████████| Rows 50000/50000 | Elapsed Time: 00:09 | Remaining Time: 00:00


Launching job: chicago_bike_station_status_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/17565/jobs/named/chicago_bike_station_status_1_offline_fg_materialization/executions


(<hsfs.core.job.Job at 0x7f99dc6c9270>,
 {
   "success": true,
   "evaluation_parameters": {},
   "statistics": {
     "evaluated_expectations": 1,
     "successful_expectations": 1,
     "unsuccessful_expectations": 0,
     "success_percent": 100.0
   },
   "meta": {
     "great_expectations_version": "0.15.12",
     "expectation_suite_name": "transaction_suite",
     "run_id": {
       "run_time": "2024-04-18T06:00:48.814759+00:00",
       "run_name": null
     },
     "batch_kwargs": {
       "ge_batch_id": "011c36b6-fd49-11ee-98f3-00155d1167e0"
     },
     "batch_markers": {},
     "batch_parameters": {},
     "validation_time": "20240418T060048.814651Z",
     "expectation_suite_meta": {
       "great_expectations_version": "0.15.12"
     }
   },
   "results": [
     {
       "success": true,
       "result": {
         "element_count": 50000,
         "unexpected_count": 0,
         "unexpected_percent": 0.0,
         "unexpected_percent_total": 0.0,
         "partial_unexpected_

## <span style='color:#ff5f27'> 📝 Create a Feature View and Training Data </span>

Join features from the feature group without an event time (`bike_stations`) with features from the feature group that has one (`bike_station_status`), matching `id` in the latter to `station_id` in the former.

In [13]:
# select the features for your model
selected_features = bike_station_status.select(['station_status','num_bikes_available']).join(bike_stations.select(['station_type', 'capacity', 'has_kiosk']), left_on="id", right_on="station_id")
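
As a rough mental model only, the join above pairs each status row with the station row whose `station_id` equals its `id`, and keeps the selected columns from both sides. The plain-Python sketch below uses made-up rows and assumes inner-join semantics for simplicity; the join type Hopsworks actually applies depends on the library's defaults and is not shown in this notebook.

```python
# Made-up rows standing in for the two feature groups
status_rows = [
    {"id": "s1", "station_status": "active", "num_bikes_available": 5},
    {"id": "s2", "station_status": "planned", "num_bikes_available": 0},
    {"id": "s9", "station_status": "active", "num_bikes_available": 2},
]
station_rows = [
    {"station_id": "s1", "station_type": "lightweight", "capacity": 10, "has_kiosk": False},
    {"station_id": "s2", "station_type": "classic", "capacity": 23, "has_kiosk": True},
]

# Index the right-hand side by its key so each lookup is a dictionary access
stations_by_id = {row["station_id"]: row for row in station_rows}

joined = []
for status in status_rows:
    station = stations_by_id.get(status["id"])
    if station is None:
        continue  # inner join (an assumption here): drop rows without a match
    joined.append({
        "station_status": status["station_status"],
        "num_bikes_available": status["num_bikes_available"],
        "station_type": station["station_type"],
        "capacity": station["capacity"],
        "has_kiosk": station["has_kiosk"],
    })

print(joined)
```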

In [14]:
fv = fs.get_or_create_feature_view(name="chicago_bike_availability", 
                                   version=1,
                                   description="Predict bike availability",
                                   query=selected_features,
                                   labels=["num_bikes_available"]
                                  )

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/17565/fs/17485/fv/chicago_bike_availability/version/1


In [15]:
X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.1)

Finished: Reading data from Hopsworks, using ArrowFlight (3.09s) 
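
The call above returns four objects: features and labels for the training set, and features and labels for the test set, with `test_size=0.1` reserving roughly 10% of the rows for testing. The sketch below illustrates that idea in plain Python on made-up rows. The `split_rows` helper, the fixed seed, and the purely random split are assumptions for illustration, not a description of how Hopsworks selects rows.

```python
import random


def split_rows(rows, label, test_size=0.1, seed=42):
    """Randomly split rows into train/test sets and separate the label column."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    test, train = shuffled[:n_test], shuffled[n_test:]

    def features(part):
        # Everything except the label column is a feature
        return [{k: v for k, v in r.items() if k != label} for r in part]

    def labels(part):
        return [r[label] for r in part]

    return features(train), features(test), labels(train), labels(test)


# Made-up rows: one feature column and one label column
rows = [{"capacity": 5 + i % 10, "num_bikes_available": i % 4} for i in range(100)]
X_train, X_test, y_train, y_test = split_rows(rows, label="num_bikes_available")

print(len(X_train), len(X_test))
```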




In [16]:
X_train

Unnamed: 0,station_status,station_type,capacity,has_kiosk
0,"""active""","""lightweight""",10,false
2,"""active""","""lightweight""",9,false
3,"""active""","""lightweight""",9,false
4,"""active""","""lightweight""",8,false
5,"""active""","""lightweight""",4,false
...,...,...,...,...
7193,"""planned""","""lightweight""",6,false
7194,"""active""","""lightweight""",9,false
7195,"""active""","""lightweight""",9,false
7196,"""active""","""lightweight""",6,false
