# Iris Data Feature Store

## Install all the required packages

In [1]:
conda install -c conda-forge -q numpy scipy scikit-learn

Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - numpy
    - scikit-learn
    - scipy


The following NEW packages will be INSTALLED:

  joblib             conda-forge/noarch::joblib-1.5.1-pyhd8ed1ab_0 
  libblas            conda-forge/linux-64::libblas-3.9.0-32_h59b9bed_openblas 
  libcblas           conda-forge/linux-64::libcblas-3.9.0-32_he106b2a_openblas 
  libgfortran        conda-forge/linux-64::libgfortran-15.1.0-h69a702a_2 
  libgfortran5       conda-forge/linux-64::libgfortran5-15.1.0-hcea5267_2 
  liblapack          conda-forge/linux-64::liblapack-3.9.0-32_h7ac8fdf_openblas 
  libopenblas        conda-forge/linux-64::libopenblas-0.3.30-pthreads_h94d23a6_0 
  numpy              conda-forge/linux-64::numpy-2.2.6-py310hefbff90_0 
  scikit-learn       conda-forge/linux-64::scikit-learn-1.7.0-py31

In [2]:
! pip3 install --upgrade --quiet  google-cloud-aiplatform

In [3]:
!pip install --quiet "feast[gcp,redis]" google-cloud-bigquery pandas scikit-learn numpy

## Initialize a blank Feast repo

Run this step only once. For this demo the repo is already initialized, and `feature_store.yaml` and the corresponding repo file are already set up, so the command below is left commented out.

In [None]:
#!feast init -t gcp iris_project

## Generate the Enhanced Iris Dataset

The original Iris dataset lacks a unique identifier and timestamp columns, both of which are essential for a dataset that backs a feature store.
The following code enhances the existing dataset by adding these required columns.

In [4]:
import random, os
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

df = pd.read_csv("data/iris.csv")

df["iris_id"] = df.index.astype(int)

# Generate random event timestamps (within the last 30 days).
# No seed is set, so the timestamps differ on every run.
now = datetime.utcnow()
df["event_timestamp"] = [
    now - timedelta(days=random.randint(0, 30)) for _ in range(len(df))
]

# Record when the rows were created (the same instant for every row)
df["created"] = pd.to_datetime(now)

# Reorder columns
cols = [
    "iris_id",
    "sepal_length",
    "sepal_width",
    "petal_length",
    "petal_width",
    "species",
    "event_timestamp",
    "created",
]
df = df[cols]

# Save as Parquet (read by Feast) and as CSV
output_path = "data/"
df.to_parquet(output_path + "iris_data.parquet", index=False)
df.to_parquet("iris_project/feature_repo/data/iris_data.parquet", index=False)
df.to_csv(output_path + "iris_data.csv", index=False)

print(f"Feast-ready parquet data saved to: {output_path}iris_data.parquet")
print(f"Feast-ready csv data saved to: {output_path}iris_data.csv")

Feast-ready parquet data saved to: data/iris_data.parquet
Feast-ready csv data saved to: data/iris_data.csv
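
The cell above draws timestamps from the unseeded global `random` module, so each run produces a different dataset. The following is a minimal sketch of a reproducible, timezone-aware variant, assuming a fixed seed is acceptable for the demo. The helper `make_event_timestamps` and its parameters are hypothetical names, not part of the notebook.

```python
import random
from datetime import datetime, timedelta, timezone


def make_event_timestamps(n_rows, max_days_back=30, seed=42, now=None):
    """Return n_rows timestamps, each 0..max_days_back whole days before `now`.

    Hypothetical helper: a private, seeded random.Random makes the result
    repeatable without touching the global random state.
    """
    rng = random.Random(seed)
    # Timezone-aware UTC "now" (datetime.utcnow() is deprecated since Python 3.12)
    now = now or datetime.now(timezone.utc)
    return [now - timedelta(days=rng.randint(0, max_days_back)) for _ in range(n_rows)]


# Fixed reference time so the example itself is deterministic
reference = datetime(2025, 6, 22, tzinfo=timezone.utc)
stamps = make_event_timestamps(150, now=reference)
print(len(stamps), min(stamps) >= reference - timedelta(days=30))
```

In the notebook this could be used as `df["event_timestamp"] = make_event_timestamps(len(df))`.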


In [None]:
# %%bash
# cd ~/iris_project/feature_repo/
# feast init .

In [5]:
%%bash
cd ~/iris_project/feature_repo/
feast apply



No project found in the repository. Using project name iris_project defined in feature_store.yaml
Applying changes for project iris_project
Created project iris_project
Created entity iris
Created feature view iris_features
Created feature service iris_feature_service_v1

Created sqlite table iris_project_iris_features



In [6]:
%%bash
cd ~/iris_project/feature_repo/
feast materialize 2025-01-01 2025-06-22



Materializing 1 feature views from 2025-01-01 00:00:00+00:00 to 2025-06-22 00:00:00+00:00 into the sqlite online store.

iris_features:


100%|███████████████████████████████████████████████████████████| 139/139 [00:00<00:00, 8550.01it/s]
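
The progress bar reports 139 rows, while the lookup later in this notebook requests 150 entity ids. One plausible explanation is that some of the randomly generated `event_timestamp` values fall after the end of the materialization window. The sketch below illustrates that effect on made-up timestamps; `rows_in_window` is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone


def rows_in_window(timestamps, start, end):
    """Return the timestamps t with start <= t <= end (inclusive bounds assumed)."""
    return [t for t in timestamps if start <= t <= end]


# Made-up data: ten rows, one per day, counting back from 2025-06-25
base = datetime(2025, 6, 25, tzinfo=timezone.utc)
timestamps = [base - timedelta(days=d) for d in range(10)]

# Same window as the `feast materialize` call above
start = datetime(2025, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 6, 22, tzinfo=timezone.utc)

kept = rows_in_window(timestamps, start, end)
print(f"{len(kept)} of {len(timestamps)} rows fall inside the window")
```

Here the three rows dated after 2025-06-22 are left out. Choosing an end date at or after the latest `event_timestamp` in the data would avoid dropping rows this way.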


In [7]:
from feast import FeatureStore
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

store = FeatureStore(repo_path="iris_project/feature_repo")
entity_rows = [{"iris_id": i} for i in range(150)]

features_df = store.get_online_features(
    features=[
        "iris_features:sepal_length",
        "iris_features:sepal_width",
        "iris_features:petal_length",
        "iris_features:petal_width",
    ],
    entity_rows=entity_rows,
).to_df()

# iris_data.csv already contains iris_id, so it can be joined directly to recover the labels
iris_original = pd.read_csv("data/iris_data.csv")

merged_df = pd.merge(features_df, iris_original[["iris_id", "species"]], on="iris_id")

train, test = train_test_split(merged_df, test_size = 0.4, stratify = merged_df['species'], random_state = 42)
X_train = train[['sepal_length','sepal_width','petal_length','petal_width']]
y_train = train.species
X_test = test[['sepal_length','sepal_width','petal_length','petal_width']]
y_test = test.species

mod_dt = DecisionTreeClassifier(max_depth=3, random_state=1)
mod_dt.fit(X_train, y_train)
prediction = mod_dt.predict(X_test)
# accuracy_score expects (y_true, y_pred)
print(f"The accuracy of the Decision Tree is {metrics.accuracy_score(y_test, prediction):.3f}")




The accuracy of the Decision Tree is 0.917
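
The lookup above asks for 150 entity ids, but the materialize step reported 139 rows, so some rows returned by the online store may carry missing feature values. The sketch below shows one way to find such entities before training. It works on a plain dict of equal-length lists, which is the shape assumed here for `get_online_features(...).to_dict()`. The helper name and the sample data are hypothetical.

```python
def entities_with_missing_features(response, key="iris_id"):
    """Return the entity ids whose row has at least one None feature value.

    `response` is a dict of equal-length lists keyed by column name.
    """
    feature_names = [name for name in response if name != key]
    missing = []
    for i, entity_id in enumerate(response[key]):
        if any(response[name][i] is None for name in feature_names):
            missing.append(entity_id)
    return missing


# Hand-written sample data, not real output from the store
sample = {
    "iris_id": [0, 1, 2, 3],
    "sepal_length": [5.1, None, 4.7, 4.6],
    "petal_width": [0.2, None, 0.2, None],
}
missing_ids = entities_with_missing_features(sample)
print(missing_ids)  # [1, 3]
```

On the DataFrame used in this notebook, `features_df.dropna()` before the merge would be the pandas equivalent of discarding those rows.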


