## Feast demo - local version (simplest)

#### 1. Install libraries. After this step, restart the runtime.

In [1]:
!pip install feast pandas

Collecting feast
  Downloading feast-0.21.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.3 MB)
Collecting grpcio-reflection<2,>=1.34.0
  Downloading grpcio_reflection-1.46.3-py3-none-any.whl (16 kB)
Collecting PyYAML<7,>=5.4.*
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting pydantic<2,>=1
  Downloading pydantic-1.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.1 MB)
Collecting fastavro<2,>=1.1.0
  Downloading fastavro-1.4.12-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB)
Collecting mmh3
  Downloading mmh3-3.0.0-cp37-cp37m-manylinux2010_x86_64.whl (50 kB)

#### 2. Check feast version

In [2]:
!feast version 

Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
  from numpy.dual import register_func
  supported_dtypes = [np.typeDict[x] for x in supported_dtypes]
Feast SDK Version: "feast 0.21.2"
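
The same check can be made from Python, which is convenient when a later cell should stop early if the install step was skipped. This is a minimal sketch using only the standard library; the helper name `installed_version` is an assumption of this example and not part of Feast, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "feast" is the distribution installed in step 1.
print("feast:", installed_version("feast"))
```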


#### 3. Create feast repo

In [3]:
!feast init feature_repo

Creating a new Feast repository in /content/feature_repo.



#### 4. Review code (menu on the left)

Open the file browser in the left-hand menu and review the files created in _feature_repo_.
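
If the file browser is not at hand, the repository contents can also be listed from a cell. The sketch below is only an illustration: `list_repo_files` is a hypothetical helper, and the directory name `feature_repo` is taken from the `feast init` call in step 3.

```python
from pathlib import Path
from typing import List


def list_repo_files(repo_dir: str) -> List[str]:
    """Return every file below `repo_dir` as a sorted, repo-relative POSIX path."""
    root = Path(repo_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Only list the repo if it exists in the current working directory.
if Path("feature_repo").is_dir():
    for name in list_repo_files("feature_repo"):
        print(name)
```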

#### 5. Download code from repo (if it's not done yet).

In [4]:
!git clone https://github.com/juskuz/feast-basic-demo-aitech.git

Cloning into 'feast-basic-demo-aitech'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 9 (delta 0), reused 9 (delta 0), pack-reused 0
Unpacking objects: 100% (9/9), done.


#### 6. Generate sample data 


In [5]:
!python feast-basic-demo-aitech/data_generator.py 50 generated_data

INFO:root:Generating synthetic data from 2022-05-09 to 2022-05-19 23:00:00...
INFO:root:9900 stats rows generated
INFO:root:528 payments rows generated
INFO:root:Saving player stats to /content/generated_data/player_stats
INFO:root:Saving player payments to /content/generated_data/player_payments


In [6]:
# new directory appeared
!ls

feast-basic-demo-aitech  feature_repo  generated_data  sample_data


#### 7. Show generated data (_*.parquet_ file)


The file can also be downloaded and opened with any tool that reads Parquet files (e.g. ParquetViewer or Power BI).

In [None]:
# either run the script (its output is trimmed) or copy its content into a notebook cell (shows more)
# !python feast-basic-demo-aitech/show_generated_data.py

In [7]:
import pandas as pd

df = pd.read_parquet("generated_data/player_payments")
print()
print("----------------------PAYMENTS DF----------------------")
print(df)
print("Length of df:", len(df))
print("Types:")
print(df.dtypes)

df = pd.read_parquet("generated_data/player_stats")
print()
print("----------------------STATS DF----------------------")
print(df)
print("Length of df:", len(df))
print("Types:")
print(df.dtypes)


----------------------PAYMENTS DF----------------------
     index player_id                  ts  amount  transactions
0        0       0QG 2022-05-09 00:00:00  163.48           4.0
1        1       0QG 2022-05-09 01:00:00  820.94           8.0
2        2       0QG 2022-05-09 02:00:00  942.34           2.0
3        3       0QG 2022-05-09 03:00:00  622.34           8.0
4        4       0QG 2022-05-09 04:00:00  469.93           3.0
..     ...       ...                 ...     ...           ...
523  13195       ZA9 2022-05-19 19:00:00  586.61           8.0
524  13196       ZA9 2022-05-19 20:00:00  468.32          10.0
525  13197       ZA9 2022-05-19 21:00:00   44.81           8.0
526  13198       ZA9 2022-05-19 22:00:00  451.02           6.0
527  13199       ZA9 2022-05-19 23:00:00  999.38           5.0

[528 rows x 5 columns]
Length of df: 528
Types:
index                    int64
player_id               object
ts              datetime64[ns]
amount                 float64
transactions  

#### 8. Feast (creating features, applying them, moving data from offline to online)

After creating the Feast repo, a new file _example.py_ appeared. It contains sample feature definitions that you can use as a template for your own file. Below is the code for the newly generated data. Save it as _feature_repo/features.py_. You can change the paths if needed, but then the later steps must be adjusted to match.

In [8]:
from datetime import timedelta

from feast import Entity, Feature, FeatureView, FileSource, ValueType

payments_source = FileSource(
    path="/content/generated_data/player_payments", # set correct path if needed
    event_timestamp_column="ts",
)

player_stats = FileSource(
    path="/content/generated_data/player_stats", # set correct path if needed
    event_timestamp_column="ts",
)

player = Entity(name="player_id", value_type=ValueType.STRING, description="player id")

payments_fv = FeatureView(
    name="payments",
    entities=["player_id"],
    ttl=timedelta(hours=6),
    features=[
        Feature("amount", ValueType.FLOAT),
        Feature("transactions", ValueType.INT32),
    ],
    batch_source=payments_source
)

stats_fv = FeatureView(
    name="stats",
    entities=["player_id"],
    ttl=timedelta(hours=6),
    features=[
        Feature("win_loss_ratio", ValueType.FLOAT),
        Feature("games_played", ValueType.INT32),
        Feature("time_in_game", ValueType.FLOAT),
    ],
    batch_source=player_stats
)



Remove the _example.py_ file from the repo, then run `feast apply`.
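
Both file operations (saving the definitions as _features.py_ and deleting _example.py_) can be scripted. The following is a sketch under the assumption that the repo lives in `feature_repo`; the function name `install_feature_definitions` is made up for this example.

```python
from pathlib import Path


def install_feature_definitions(repo_dir: str, source_code: str) -> Path:
    """Write `source_code` to <repo_dir>/features.py and delete the sample example.py."""
    repo = Path(repo_dir)
    target = repo / "features.py"
    target.write_text(source_code, encoding="utf-8")

    example = repo / "example.py"
    if example.exists():  # the sample file is only there right after `feast init`
        example.unlink()
    return target


# Hypothetical usage, with FEATURES_SOURCE holding the text of cell In [8]:
# install_feature_definitions("feature_repo", FEATURES_SOURCE)
```

Deleting _example.py_ matters because `feast apply` registers the definitions it finds in the repo directory, so the sample definitions would otherwise be applied as well.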

In [22]:
# `!cd` runs in its own subshell, so it does not change the notebook's working directory
!ls
!cd feature_repo/
!ls

feast-basic-demo-aitech  feature_repo  generated_data  sample_data
feast-basic-demo-aitech  feature_repo  generated_data  sample_data


In [23]:
# these versions also won't work:
#1
!cd feature_repo/
!ls

# #2
# !cd feature_repo/ | ls

# #3 
# !cd feature_repo/ | feast apply
# # prints: Can't find feature_store.yaml at /content. Make sure you're running feast from an initialized feast repository.


feast-basic-demo-aitech  feature_repo  generated_data  sample_data
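
The reason is that every `!` line is executed in a separate child process, and a directory change made by a child never reaches its parent. The sketch below reproduces this with `subprocess` and shows one way around it, passing `cwd=` to the child. The helper name `cwd_seen_by_child` is an assumption of this example.

```python
import os
import subprocess
import sys


def cwd_seen_by_child(cwd=None) -> str:
    """Start a child Python process and return the working directory it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


before = os.getcwd()
# The `cd` happens inside a child shell and ends with it; the parent is unaffected.
subprocess.run("cd /", shell=True, check=True)
print("parent directory unchanged:", os.getcwd() == before)

# A child started with cwd= runs in the requested directory instead.
print("child started in:", cwd_seen_by_child(cwd=before))
```

In a notebook, the `%cd feature_repo` magic changes the directory for all following cells; passing the repo path to the command itself, as the next cell does with `-c`, avoids changing directories at all.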


In [9]:
# this works
!feast -c feature_repo/ apply

Created entity player_id
Created feature view stats
Created feature view payments

Created sqlite table feature_repo_payments
Created sqlite table feature_repo_stats



In [None]:
# run script or copy content and run in notebook
# !python feast-basic-demo-aitech/show_stores.py

In [10]:
import pandas as pd
from feast import FeatureStore


def show_stores():
    PLAYER_ID1 = "0QG"
    PLAYER_ID2 = "ZA9"

    fs = FeatureStore("feature_repo")
    payments = pd.read_parquet("generated_data/player_payments")[["player_id", "ts"]]
    payments = payments[payments["player_id"].isin([PLAYER_ID1, PLAYER_ID2])]
    stats = pd.read_parquet("generated_data/player_stats")[["player_id", "ts"]]
    stats = stats[stats["player_id"].isin([PLAYER_ID1, PLAYER_ID2])]
    entity_df = pd.concat([payments, stats]).sort_index().drop_duplicates().reset_index()

    print()
    print("----------------------HIST DATA FRAME----------------------")
    print(entity_df)
    
    training_df = fs.get_historical_features(
        entity_df=entity_df,
        features=[
            "payments:amount",
            "payments:transactions",
            "stats:win_loss_ratio",
            "stats:games_played",
            "stats:time_in_game",
        ]
    ).to_df()

    print()
    print("----------------------OFFLINE DATA FRAME PLAYER1----------------------")
    print(training_df[training_df["player_id"] == PLAYER_ID1].reset_index().drop(columns=["level_0"]))

    print()
    print("----------------------OFFLINE DATA FRAME PLAYER2----------------------")
    print(training_df[training_df["player_id"] == PLAYER_ID2].reset_index().drop(columns=["level_0"]))

    entity_rows = [{"player_id": PLAYER_ID1}, {"player_id": PLAYER_ID2}]
    online_df = fs.get_online_features(
        features=[
            "payments:amount",
            "payments:transactions",
            "stats:win_loss_ratio",
            "stats:games_played",
            "stats:time_in_game",
        ],
        entity_rows=entity_rows
    ).to_df()

    print()
    print("----------------------ONLINE DATA FRAME----------------------")
    print(online_df[["player_id", "amount", "transactions", "win_loss_ratio", "games_played", "time_in_game"]])
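
The function above passes the same list of `view:feature` references to both the historical and the online lookup. One possible way to avoid repeating it is to build the list once from a mapping. This is a sketch only; `feature_refs` and `FEATURES` are names chosen for this example.

```python
from typing import Dict, List


def feature_refs(views: Dict[str, List[str]]) -> List[str]:
    """Expand {view: [features]} into "view:feature" reference strings."""
    return [f"{view}:{feature}" for view, features in views.items() for feature in features]


# View and feature names as defined in features.py.
FEATURES = feature_refs({
    "payments": ["amount", "transactions"],
    "stats": ["win_loss_ratio", "games_played", "time_in_game"],
})
print(FEATURES)
```

`FEATURES` could then be passed as the `features=` argument in both calls.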



In [11]:
show_stores()


----------------------HIST DATA FRAME----------------------
     index player_id                  ts
0        0       0QG 2022-05-09 00:00:00
1        1       0QG 2022-05-09 01:00:00
2        2       0QG 2022-05-09 02:00:00
3        3       0QG 2022-05-09 03:00:00
4        4       0QG 2022-05-09 04:00:00
..     ...       ...                 ...
523    523       ZA9 2022-05-19 19:00:00
524    524       ZA9 2022-05-19 20:00:00
525    525       ZA9 2022-05-19 21:00:00
526    526       ZA9 2022-05-19 22:00:00
527    527       ZA9 2022-05-19 23:00:00

[528 rows x 3 columns]
Using ts as the event timestamp. To specify a column explicitly, please name it event_timestamp.

----------------------OFFLINE DATA FRAME PLAYER1----------------------
     index player_id                        ts  amount  transactions  \
0        0       0QG 2022-05-09 00:00:00+00:00  163.48           4.0   
1        1       0QG 2022-05-09 01:00:00+00:00  820.94           8.0   
2        2       0QG 2022-05-09 02:00:

Nothing appeared in the ONLINE part because we didn't run `feast materialize`. This command needs a start and an end timestamp. In a terminal they can be bound to shell variables like this:
`MAT_START_TIME=$(date -u +"%Y-%m-%dT06:00:00")`
`MAT_END_TIME=$(date -u +"%Y-%m-%dT08:00:00")`


In [12]:
!MAT_START_TIME=$(date -u +"%Y-%m-%dT06:00:00")
!MAT_END_TIME=$(date -u +"%Y-%m-%dT08:00:00")

!feast materialize $MAT_START_TIME $MAT_END_TIME

Usage: feast materialize [OPTIONS] START_TS END_TS
Try 'feast materialize --help' for help.

Error: Missing argument 'START_TS'.


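The cell fails because each `!` line runs in its own shell, so the variables set on the first two lines no longer exist when `feast materialize` runs. In addition, `feast materialize` must be run from the directory that contains _feature_store.yaml_, or the repo path must be passed with the `-c` option.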
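
Since Python variables do persist between cells, the timestamps can be computed in Python instead of the shell. The sketch below mirrors the two `date -u` calls and assembles the `feast -c ... materialize` command line used in this notebook; `todays_window` and `materialize_command` are hypothetical helper names, and nothing is executed unless the list is passed to `subprocess.run`.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple

TS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def todays_window(start_hour: int, end_hour: int, now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (start, end) on today's UTC date, like `date -u +"%Y-%m-%dT06:00:00"`."""
    now = now or datetime.now(timezone.utc)
    day = now.replace(minute=0, second=0, microsecond=0)
    return (
        day.replace(hour=start_hour).strftime(TS_FORMAT),
        day.replace(hour=end_hour).strftime(TS_FORMAT),
    )


def materialize_command(repo_dir: str, start: str, end: str) -> List[str]:
    """Build the argument list for `feast -c <repo_dir> materialize <start> <end>`."""
    return ["feast", "-c", repo_dir, "materialize", start, end]


start, end = todays_window(6, 8)
print(" ".join(materialize_command("feature_repo/", start, end)))
# To run it: subprocess.run(materialize_command("feature_repo/", start, end), check=True)
```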

In [13]:
!(date -u +"%Y-%m-%dT06:00:00")
!(date -u +"%Y-%m-%dT08:00:00")

2022-05-20T06:00:00
2022-05-20T08:00:00


In [14]:
!feast -c feature_repo/ materialize 2022-05-20T06:00:00 2022-05-20T08:00:00

Materializing 2 feature views from 2022-05-20 06:00:00+00:00 to 2022-05-20 08:00:00+00:00 into the sqlite online store.

stats:
0it [00:00, ?it/s]
payments:
0it [00:00, ?it/s]


Check whether the data appeared in the online store.

In [15]:
show_stores()


----------------------HIST DATA FRAME----------------------
     index player_id                  ts
0        0       0QG 2022-05-09 00:00:00
1        1       0QG 2022-05-09 01:00:00
2        2       0QG 2022-05-09 02:00:00
3        3       0QG 2022-05-09 03:00:00
4        4       0QG 2022-05-09 04:00:00
..     ...       ...                 ...
523    523       ZA9 2022-05-19 19:00:00
524    524       ZA9 2022-05-19 20:00:00
525    525       ZA9 2022-05-19 21:00:00
526    526       ZA9 2022-05-19 22:00:00
527    527       ZA9 2022-05-19 23:00:00

[528 rows x 3 columns]
Using ts as the event timestamp. To specify a column explicitly, please name it event_timestamp.

----------------------OFFLINE DATA FRAME PLAYER1----------------------
     index player_id                        ts  amount  transactions  \
0        0       0QG 2022-05-09 00:00:00+00:00  163.48           4.0   
1        1       0QG 2022-05-09 01:00:00+00:00  820.94           8.0   
2        2       0QG 2022-05-09 02:00:

No data is available for the selected time range: the generated data ends at 2022-05-19 23:00:00, before the window starts. Choose a wider range.
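
A quick sanity check before materializing is to compare the requested window with the time span of the generated data. The sketch below uses the span reported by the data generator in step 6 and the two windows tried in this notebook. It checks only a necessary condition (the intervals overlap) and does not model how Feast selects rows inside the window; `window_overlaps_data` is a name made up for this example.

```python
from datetime import datetime


def window_overlaps_data(data_start, data_end, win_start, win_end) -> bool:
    """True if [win_start, win_end] shares at least one instant with [data_start, data_end]."""
    return win_start <= data_end and data_start <= win_end


# Span reported by data_generator.py in step 6.
data_start = datetime(2022, 5, 9, 0, 0)
data_end = datetime(2022, 5, 19, 23, 0)

narrow = (datetime(2022, 5, 20, 6, 0), datetime(2022, 5, 20, 8, 0))
wide = (datetime(2022, 5, 1, 6, 0), datetime(2022, 5, 20, 8, 0))

print(window_overlaps_data(data_start, data_end, *narrow))  # False
print(window_overlaps_data(data_start, data_end, *wide))    # True
```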

In [16]:
!feast -c feature_repo/ materialize 2022-05-01T06:00:00 2022-05-20T08:00:00

Materializing 2 feature views from 2022-05-01 06:00:00+00:00 to 2022-05-20 08:00:00+00:00 into the sqlite online store.

stats:
100%|█████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 3396.75it/s]
payments:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 247.61it/s]


In [17]:
# optional: incremental materialization (only an end timestamp is given)
# !feast -c feature_repo/ materialize-incremental "2022-05-22T08:00:00"

In [18]:
# !python feast-basic-demo-aitech/show_stores.py
show_stores()


----------------------HIST DATA FRAME----------------------
     index player_id                  ts
0        0       0QG 2022-05-09 00:00:00
1        1       0QG 2022-05-09 01:00:00
2        2       0QG 2022-05-09 02:00:00
3        3       0QG 2022-05-09 03:00:00
4        4       0QG 2022-05-09 04:00:00
..     ...       ...                 ...
523    523       ZA9 2022-05-19 19:00:00
524    524       ZA9 2022-05-19 20:00:00
525    525       ZA9 2022-05-19 21:00:00
526    526       ZA9 2022-05-19 22:00:00
527    527       ZA9 2022-05-19 23:00:00

[528 rows x 3 columns]
Using ts as the event timestamp. To specify a column explicitly, please name it event_timestamp.

----------------------OFFLINE DATA FRAME PLAYER1----------------------
     index player_id                        ts  amount  transactions  \
0        0       0QG 2022-05-09 00:00:00+00:00  163.48           4.0   
1        1       0QG 2022-05-09 01:00:00+00:00  820.94           8.0   
2        2       0QG 2022-05-09 02:00:

#### Add new data

If you need more data, you can run the generator again, this time in `curr` mode.

In [19]:
!python feast-basic-demo-aitech/data_generator.py 50 generated_data --mode curr

INFO:root:Generating synthetic data from 2022-05-20 to 2022-05-20 21:15:35.620883...
INFO:root:825 stats rows generated
INFO:root:44 payments rows generated
INFO:root:Appending player stats to /content/generated_data/player_stats
INFO:root:Appending player payments to /content/generated_data/player_payments


After adding new data, it's necessary to run `materialize` again to move it from the offline store to the online store.

In [20]:
!feast -c feature_repo/ materialize 2022-05-01T06:00:00 2022-05-20T15:00:00

Materializing 2 feature views from 2022-05-01 06:00:00+00:00 to 2022-05-20 15:00:00+00:00 into the sqlite online store.

stats:
100%|█████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 3417.28it/s]
payments:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 232.93it/s]


In [21]:
# !python feast-basic-demo-aitech/show_stores.py
show_stores()


----------------------HIST DATA FRAME----------------------
     index player_id                  ts
0        0       0QG 2022-05-20 00:00:00
1        0       0QG 2022-05-09 00:00:00
2        1       0QG 2022-05-20 01:00:00
3        2       0QG 2022-05-20 02:00:00
4        2       0QG 2022-05-09 02:00:00
..     ...       ...                 ...
567    567       ZA9 2022-05-19 19:00:00
568    568       ZA9 2022-05-19 20:00:00
569    569       ZA9 2022-05-19 21:00:00
570    570       ZA9 2022-05-19 22:00:00
571    571       ZA9 2022-05-19 23:00:00

[572 rows x 3 columns]
Using ts as the event timestamp. To specify a column explicitly, please name it event_timestamp.

----------------------OFFLINE DATA FRAME PLAYER1----------------------
     index player_id                        ts  amount  transactions  \
0        0       0QG 2022-05-09 00:00:00+00:00  163.48           4.0   
1       45       0QG 2022-05-09 01:00:00+00:00  820.94           8.0   
2        2       0QG 2022-05-09 02:00: