In [1]:
import pandas as pd

data = { 
    'credit_card_number': ['1111 2222 3333 4444', '1111 2222 3333 4444','1111 2222 3333 4444','1111 2222 3333 4444'],
    'trans_datetime': ['2022-01-01 08:44', '2022-01-01 19:44', '2022-01-01 20:44', '2022-01-01 20:55'],
    'amount': [142.34, 12.34, 66.29, 112.33],
    'location': ['Sao Paolo', 'Rio De Janeiro', 'Stockholm', 'Stockholm'],
    'fraud': [False, False, True, True] 
}

df = pd.DataFrame.from_dict(data)
df['trans_datetime'] = pd.to_datetime(df['trans_datetime'])
df

,credit_card_number,trans_datetime,amount,location,fraud
0,1111 2222 3333 4444,2022-01-01 08:44:00,142.34,Sao Paolo,False
1,1111 2222 3333 4444,2022-01-01 19:44:00,12.34,Rio De Janeiro,False
2,1111 2222 3333 4444,2022-01-01 20:44:00,66.29,Stockholm,True
3,1111 2222 3333 4444,2022-01-01 20:55:00,112.33,Stockholm,True


In [2]:
import hopsworks
proj = hopsworks.login()
fs = proj.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/398
Connected. Call `.close()` to terminate connection gracefully.


### Create a Feature Group

Hopsworks has comprehensive documentation on Feature Groups. Follow these links to learn more.

* [Feature Group Concept](https://docs.hopsworks.ai/3.0/concepts/fs/feature_group/fg_overview/)
* [Feature Group Creation Guide](https://docs.hopsworks.ai/3.0/user_guides/fs/feature_group/create/)
* [Feature Group API Docs](https://docs.hopsworks.ai/feature-store-api/3.0/generated/api/feature_group_api/)

In [3]:
fg = fs.get_or_create_feature_group(
    name="credit_card_transactions",
    version=1,
    description="Credit Card Transaction data",
    primary_key=['credit_card_number'],
    event_time='trans_datetime'
)

### Write your DataFrame to the Feature Group
When you write your DataFrame to the feature group, the DataFrame is first copied to Hopsworks.
A backfill ingestion job then runs on Hopsworks to insert/append the DataFrame to the Feature Group.
The job is a Spark job, and the data is stored in an Apache Hudi table in Hopsworks.

The ingestion job takes about 1 minute to complete.
If you don't want to wait, you can run the ingestion job in the background with:


    fg.insert(df, write_options={"wait_for_job": False})
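
Because the ingestion job takes a while, it can be worth checking the DataFrame locally before uploading it. The sketch below uses only pandas. `check_before_insert` is a hypothetical helper and not part of the Hopsworks API; it assumes the primary key and event-time column names used in the feature group definition above.

```python
import pandas as pd

def check_before_insert(df, primary_key, event_time):
    """Hypothetical pre-insert sanity check (not part of Hopsworks).

    Returns a list of human-readable problems; an empty list means
    the key and event-time columns look usable.
    """
    problems = []
    for col in list(primary_key) + [event_time]:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif df[col].isna().any():
            problems.append(f"null values in: {col}")
    # The event-time column should already be parsed with pd.to_datetime.
    if event_time in df.columns and not pd.api.types.is_datetime64_any_dtype(df[event_time]):
        problems.append(f"{event_time} is not a datetime column")
    return problems
```

Calling `check_before_insert(df, ['credit_card_number'], 'trans_datetime')` on the DataFrame built earlier should return an empty list, since `trans_datetime` was converted with `pd.to_datetime`.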

In [4]:
fg.insert(df)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/398/fs/335/fg/917


Uploading Dataframe: 0.00% |          | Rows 0/4 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/398/jobs/named/credit_card_transactions_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f2a1b8fb9d0>, None)

In [5]:
query = fg.select_all()

In [6]:
fv = fs.create_feature_view(name="credit_card_transactions",
                            version=1,
                            description="Features from the credit_card_transactions FG",
                            labels=["fraud"],
                            query=query)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/398/fs/335/fv/credit_card_transactions/version/1


In [7]:
# train_test_split returns the feature frames first, then the label frames
X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.5)

2022-09-15 12:05:04,113 INFO: USE `dowlingj_featurestore`
2022-09-15 12:05:05,352 INFO: SELECT `fg0`.`credit_card_number` `credit_card_number`, `fg0`.`trans_datetime` `trans_datetime`, `fg0`.`amount` `amount`, `fg0`.`location` `location`, `fg0`.`fraud` `fraud`
FROM `dowlingj_featurestore`.`credit_card_transactions_1` `fg0`
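
To make the shape of a 50/50 split concrete, here is a pandas-only sketch of a random split that separates the `fraud` label from the features. It is an illustration, not how Hopsworks implements `train_test_split`; `split_features_labels`, the `seed` parameter, and the small `sample` DataFrame are all hypothetical.

```python
import pandas as pd

def split_features_labels(df, label, test_size=0.5, seed=42):
    """Hypothetical helper: random row split, then label/feature separation."""
    test = df.sample(frac=test_size, random_state=seed)  # rows held out for testing
    train = df.drop(test.index)                          # all remaining rows
    X_train, y_train = train.drop(columns=[label]), train[[label]]
    X_test, y_test = test.drop(columns=[label]), test[[label]]
    return X_train, X_test, y_train, y_test

# Illustrative data with the same amount and fraud values as the notebook's DataFrame
sample = pd.DataFrame({
    'amount': [142.34, 12.34, 66.29, 112.33],
    'fraud': [False, False, True, True],
})
X_train, X_test, y_train, y_test = split_features_labels(sample, label='fraud')
```

With four rows and `test_size=0.5`, each half holds two rows, and the label column appears only in the `y_*` frames.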


