# Core 7.4 Feature Store - Basic Retrieval

In this section, we will create a `Feature Vector` to retrieve the 3 `Feature Sets` created previously.

---

### References

Much of the following content is derived from the official documentation:
- [Creating and using feature vectors](https://docs.mlrun.org/en/latest/feature-store/feature-vectors.html)

---

### Example Overview

In this example, we will join the 3 previous `Feature Sets` into a single `Feature Vector`. This will allow us to retrieve the features both in batch and in real time.

---

### Setup

In [17]:
import pandas as pd
import mlrun
import mlrun.feature_store as fstore
from mlrun.datastore.targets import ParquetTarget

project = mlrun.get_or_create_project("iguazio-academy", context="./")

> 2022-04-22 22:11:55,036 [info] loaded project iguazio-academy from MLRun DB


---

### Define Feature Vector

Our `Feature Vector` will allow us to join and retrieve the 3 previous `Feature Sets` together as a single dataset. We can define it like so:

In [2]:
fvec = fstore.FeatureVector(
    name="heart-disease-vector",
    features=["heart-disease-categorical.*", "heart-disease-continuous.*"],
    description="Heart disease dataset",
    label_feature="heart-disease-target.target"
)
fvec.save()

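Note the additional `label_feature` field. This allows us to specify a certain column as the target for the training set. Here the label comes from the third `Feature Set`, `heart-disease-target`, while `features` pulls every column (`.*`) from the other two.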
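To make the reference syntax concrete, here is a minimal plain-Python sketch of how `<set>.*` and `<set>.<feature>` strings could be resolved into individual columns. It is an illustration only, not MLRun's implementation: `FEATURE_SETS` and `expand_feature_refs` are hypothetical names, and the split of columns between the categorical and continuous sets is assumed (only the column names themselves come from the output in this section).

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: which columns live in which feature set.
# The column names appear in this section's output; the grouping is assumed.
FEATURE_SETS: Dict[str, List[str]] = {
    "heart-disease-categorical": ["sex", "cp", "exang", "fbs", "slope", "thal"],
    "heart-disease-continuous": [
        "age", "trestbps", "chol", "restecg", "thalach", "oldpeak", "ca",
    ],
    "heart-disease-target": ["target"],
}


def expand_feature_refs(
    refs: List[str], catalog: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Resolve '<set>.<feature>' and '<set>.*' into (set, feature) pairs."""
    resolved: List[Tuple[str, str]] = []
    for ref in refs:
        # Split on the first dot: set name on the left, feature on the right.
        set_name, _, feature = ref.partition(".")
        if set_name not in catalog:
            raise KeyError(f"unknown feature set: {set_name}")
        if feature == "*":
            # Wildcard: take every column of the feature set, in order.
            resolved.extend((set_name, col) for col in catalog[set_name])
        elif feature in catalog[set_name]:
            resolved.append((set_name, feature))
        else:
            raise KeyError(f"unknown feature: {ref}")
    return resolved


features = expand_feature_refs(
    ["heart-disease-categorical.*", "heart-disease-continuous.*"], FEATURE_SETS
)
label = expand_feature_refs(["heart-disease-target.target"], FEATURE_SETS)
print(len(features), label)
```

The wildcard keeps the vector definition short, while the explicit `<set>.<feature>` form picks out a single column such as the label.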

---

### Retrieve Feature Vector as Dataframe

To retrieve our `Feature Vector`, we just need to reference it by project and name, in the form `<project>/<name>`.

One of the most common ways to retrieve your dataset is as a dataframe. This is very simple to do:

In [6]:
df = fstore.get_offline_features("iguazio-academy/heart-disease-vector").to_dataframe()
df.head()

,age,sex,cp,exang,fbs,slope,thal,trestbps,chol,restecg,thalach,oldpeak,ca,target
0,52,male,typical_angina,no,False,downsloping,normal,125,212,1,168,1.0,2.0,0
1,53,male,typical_angina,yes,True,upsloping,normal,140,203,0,155,3.1,0.0,0
2,70,male,typical_angina,yes,False,upsloping,normal,145,174,1,125,2.6,0.0,0
3,61,male,typical_angina,no,False,downsloping,normal,148,203,1,161,0.0,1.0,0
4,62,female,typical_angina,no,True,flat,reversable_defect,138,294,1,106,1.9,3.0,0


This has joined all 3 of our `Feature Sets` into a single dataset using the defined `Entity`. If you would like to explicitly see the `Entity` during retrieval, you can do the following:

In [8]:
df = fstore.get_offline_features("iguazio-academy/heart-disease-vector", with_indexes=True).to_dataframe()
df.head()

patient_id,age,sex,cp,exang,fbs,slope,thal,trestbps,chol,restecg,thalach,oldpeak,ca,target
e443544b-8d9e-4f6c-9623-e24b6139aae0,52,male,typical_angina,no,False,downsloping,normal,125,212,1,168,1.0,2.0,0
8227d3df-16ab-4452-8ea5-99472362d982,53,male,typical_angina,yes,True,upsloping,normal,140,203,0,155,3.1,0.0,0
10c4b4ba-ab40-44de-8aba-6bdb062192c4,70,male,typical_angina,yes,False,upsloping,normal,145,174,1,125,2.6,0.0,0
f0acdc22-7ee6-4817-a671-e136211bc0a6,61,male,typical_angina,no,False,downsloping,normal,148,203,1,161,0.0,1.0,0
2d6b3bca-4841-4618-9a8c-ca902010b009,62,female,typical_angina,no,True,flat,reversable_defect,138,294,1,106,1.9,3.0,0
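Conceptually, joining on the `Entity` means matching rows from each `Feature Set` that share the same `patient_id`. The sketch below shows that idea in plain Python with two of the records above. It is not how MLRun performs the join: `join_on_entity` is a hypothetical helper, the assignment of columns to sets is assumed, and inner-join behavior is chosen purely for illustration (MLRun's actual join semantics may differ).

```python
from typing import Dict, List

Row = Dict[str, object]


def join_on_entity(feature_sets: List[Dict[str, Row]]) -> Dict[str, Row]:
    """Inner-join several {entity_key: row} mappings on their shared keys."""
    if not feature_sets:
        return {}
    # Keep only entity keys that are present in every feature set.
    shared = set(feature_sets[0])
    for fs in feature_sets[1:]:
        shared &= set(fs)
    joined: Dict[str, Row] = {}
    for key in feature_sets[0]:  # preserve the first set's ordering
        if key in shared:
            row: Row = {}
            for fs in feature_sets:
                row.update(fs[key])  # merge this set's columns into the row
            joined[key] = row
    return joined


categorical = {
    "e443544b-8d9e-4f6c-9623-e24b6139aae0": {"sex": "male", "cp": "typical_angina"},
    "8227d3df-16ab-4452-8ea5-99472362d982": {"sex": "male", "cp": "typical_angina"},
}
continuous = {
    "e443544b-8d9e-4f6c-9623-e24b6139aae0": {"age": 52, "chol": 212},
    "8227d3df-16ab-4452-8ea5-99472362d982": {"age": 53, "chol": 203},
}
target = {
    "e443544b-8d9e-4f6c-9623-e24b6139aae0": {"target": 0},
    "8227d3df-16ab-4452-8ea5-99472362d982": {"target": 0},
}

joined = join_on_entity([categorical, continuous, target])
for patient_id, row in joined.items():
    print(patient_id, row)
```

Each joined row carries the columns of all three sets, keyed by the same `patient_id` that `with_indexes=True` exposes.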


---

### Retrieve Feature Vector as Parquet

Another common use case is to materialize the joined dataset on the filesystem using a target such as the `ParquetTarget` or `CSVTarget`. You can do this like so:

In [13]:
resp = fstore.get_offline_features("iguazio-academy/heart-disease-vector", target=ParquetTarget())

> 2022-04-22 22:10:55,709 [info] wrote target: {'name': 'parquet', 'kind': 'parquet', 'path': 'v3io:///projects/iguazio-academy/FeatureStore/heart-disease-vector/parquet/vectors/heart-disease-vector-latest.parquet', 'status': 'ready', 'updated': '2022-04-22T22:10:55.709201+00:00', 'size': 16818}


This will create a parquet file on the filesystem located here:

In [12]:
!ls /v3io/projects/iguazio-academy/FeatureStore/heart-disease-vector/parquet/vectors

heart-disease-vector-latest.parquet


You can even read the parquet file without any use of the Feature Store if desired:

In [18]:
df = pd.read_parquet("/v3io/projects/iguazio-academy/FeatureStore/heart-disease-vector/parquet/vectors/heart-disease-vector-latest.parquet")
df.head()

,age,sex,cp,exang,fbs,slope,thal,trestbps,chol,restecg,thalach,oldpeak,ca,target
0,52,male,typical_angina,no,False,downsloping,normal,125,212,1,168,1.0,2.0,0
1,53,male,typical_angina,yes,True,upsloping,normal,140,203,0,155,3.1,0.0,0
2,70,male,typical_angina,yes,False,upsloping,normal,145,174,1,125,2.6,0.0,0
3,61,male,typical_angina,no,False,downsloping,normal,148,203,1,161,0.0,1.0,0
4,62,female,typical_angina,no,True,flat,reversable_defect,138,294,1,106,1.9,3.0,0
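Once the joined dataset is loaded, a common next step is to separate the label column from the features before training. The following is a minimal plain-Python sketch using a subset of columns from two of the rows above; `split_features_and_label` is a hypothetical helper and is not part of MLRun or pandas.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_features_and_label(
    rows: List[Row], label: str
) -> Tuple[List[Row], List[Any]]:
    """Split each row into its feature columns and its label value."""
    # Everything except the label column is treated as a feature.
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


rows = [
    {"age": 52, "sex": "male", "chol": 212, "target": 0},
    {"age": 53, "sex": "male", "chol": 203, "target": 0},
]

X, y = split_features_and_label(rows, label="target")
print(X)
print(y)
```

With a pandas dataframe, the same split is usually done by dropping the label column from the features and selecting it separately.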


---

### Retrieve Records in Real-Time

In addition to retrieving the entire dataset in batch, you can also retrieve individual records in real-time using their `Entity`. This is useful for enriching incoming records and imputing missing values.

To retrieve records in real-time, we can create an online feature service like so:

In [19]:
feature_service = fstore.get_online_feature_service(feature_vector="iguazio-academy/heart-disease-vector")

Then, we can retrieve one or more records by passing their `Entity` keys:

In [24]:
feature_service.get(
    [
        {"patient_id" : "e443544b-8d9e-4f6c-9623-e24b6139aae0"},
        {"patient_id" : "8227d3df-16ab-4452-8ea5-99472362d982"}
    ]
)

[{'age': 52,
  'sex': 'male',
  'cp': 'typical_angina',
  'exang': 'no',
  'fbs': False,
  'slope': 'downsloping',
  'thal': 'normal',
  'trestbps': 125,
  'chol': 212,
  'restecg': 1,
  'thalach': 168,
  'oldpeak': 1.0,
  'ca': 2.0},
 {'age': 53,
  'sex': 'male',
  'cp': 'typical_angina',
  'exang': 'yes',
  'fbs': True,
  'slope': 'upsloping',
  'thal': 'normal',
  'trestbps': 140,
  'chol': 203,
  'restecg': 0,
  'thalach': 155,
  'oldpeak': 3.1,
  'ca': 0.0}]
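The enrichment and imputation use case mentioned above can be pictured as filling the gaps of an incoming event with the stored feature values for the same `Entity`. This is a minimal plain-Python sketch under stated assumptions: `enrich_event` is a hypothetical helper, the incoming event is invented for illustration, and the stored values are a subset of the first record returned above. It does not describe how MLRun imputes values internally.

```python
from typing import Any, Dict


def enrich_event(event: Dict[str, Any], stored: Dict[str, Any]) -> Dict[str, Any]:
    """Fill fields that are absent or None in the event from stored features."""
    enriched = dict(stored)  # start from the stored feature values
    # Values actually present in the event take precedence over stored ones.
    enriched.update({k: v for k, v in event.items() if v is not None})
    return enriched


# Subset of the first record returned by the online feature service above.
stored_features = {"age": 52, "sex": "male", "trestbps": 125, "chol": 212}

# Hypothetical incoming event: a fresh trestbps reading, chol missing.
incoming_event = {
    "patient_id": "e443544b-8d9e-4f6c-9623-e24b6139aae0",
    "trestbps": 130,
    "chol": None,
}

enriched = enrich_event(incoming_event, stored_features)
print(enriched)
```

Letting event values win keeps fresh measurements intact, while the stored features only fill what the event lacks.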

---