# Fraud Detection with Feature Store on OpenShift AI 3.2

This notebook demonstrates the use of **Feature Store** (Feast) on **Red Hat OpenShift AI 3.2** for **bank fraud detection**.

## Prerequisites

- Workbench created in the `fraud-detection-ml` project
- Feature Store `fraud_detection` selected in the workbench configuration
- Data Connection to MinIO configured (for historical features)

## Workflow
1. Connect to the Feature Store (using the auto-mounted client config)
2. Explore registered features
3. Retrieve historical features for training
4. Train a fraud detection model
5. Predict in real time via the online store

## 1. Install dependencies

In [None]:
!pip install -q "feast[postgres]" scikit-learn pandas pyarrow s3fs boto3 matplotlib

## 2. Connect to the Feature Store

The Feast client configuration is **auto-mounted** by RHOAI when you select the Feature Store in the workbench settings. The config file is at `/opt/app-root/src/feast-config/<project_name>`.
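
If the mount ever contains more than one file, it helps to choose the client config deterministically. The helper below is a minimal sketch under that assumption: `pick_feast_config` is a hypothetical name, not part of Feast or RHOAI, and the demo runs against a temporary directory rather than the real mount.

```python
import tempfile
from pathlib import Path


def pick_feast_config(config_dir):
    """Return the first regular file in config_dir (sorted by name), or None.

    Hypothetical helper: hidden entries (names starting with ".") are skipped
    on the assumption that they are mount bookkeeping, not client configs.
    """
    root = Path(config_dir)
    if not root.is_dir():
        return None
    candidates = sorted(
        p for p in root.iterdir() if p.is_file() and not p.name.startswith(".")
    )
    return candidates[0] if candidates else None


# Demo against a throwaway directory standing in for the mounted path.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "fraud_detection").write_text("project: fraud_detection\n")
    (Path(tmp) / "zz_other").write_text("project: other\n")
    picked = pick_feast_config(tmp)
    picked_name = picked.name if picked else None
    print(picked_name)

# A missing directory yields None instead of raising.
missing_dir_result = pick_feast_config("/path/that/does/not/exist")
```

In the notebook, the result of `pick_feast_config("/opt/app-root/src/feast-config")` could be passed to `FeatureStore(fs_yaml_file=...)` after checking it is not `None`.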

In [None]:
import os

from feast import FeatureStore

# The Feast client config is auto-mounted by RHOAI when you select
# the Feature Store in the workbench settings.
feast_config_dir = "/opt/app-root/src/feast-config"

if os.path.isdir(feast_config_dir):
    # Sort so that the choice of config_files[0] below is deterministic
    config_files = sorted(
        os.path.join(feast_config_dir, f)
        for f in os.listdir(feast_config_dir)
        if os.path.isfile(os.path.join(feast_config_dir, f))
    )
else:
    config_files = []

if config_files:
    fs_yaml = config_files[0]
    print(f"Using auto-mounted config: {fs_yaml}")
    with open(fs_yaml) as f:
        print(f.read())
    store = FeatureStore(fs_yaml_file=fs_yaml)
else:
    raise FileNotFoundError(
        f"No Feast config found in {feast_config_dir}. "
        "Make sure you selected the Feature Store when creating the workbench."
    )

print(f"\nProject: {store.project}")

## 3. Explore registered features

The entities, feature views, and features were registered with `feast apply` during deployment, so this section only reads them back from the registry.

In [None]:
print("=== Entities ===")
for entity in store.list_entities():
    print(f"  - {entity.name}")

print("\n=== Feature Views ===")
for fv in store.list_feature_views():
    print(f"  - {fv.name} ({len(fv.features)} features, TTL={fv.ttl})")
    for feature in fv.features:
        print(f"      {feature.name}: {feature.dtype}")

print("\n=== On-Demand Feature Views ===")
for odfv in store.list_on_demand_feature_views():
    print(f"  - {odfv.name}")
    for feature in odfv.features:
        print(f"      {feature.name}: {feature.dtype}")

print("\n=== Data Sources ===")
for ds in store.list_data_sources():
    print(f"  - {ds.name} ({type(ds).__name__})")

## 4. Retrieve historical features (Training)

We retrieve features from the **offline store** (Parquet files on MinIO/S3) to train a fraud detection model.

This requires the MinIO Data Connection configured in the workbench (provides `AWS_*` environment variables).
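
Before calling the offline store, it can save time to confirm that the credentials are actually present in the environment. The sketch below assumes the Data Connection injects `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_S3_ENDPOINT`; those names are an assumption to verify against your workbench, and `missing_env_vars` is a hypothetical helper.

```python
import os

# Assumed variable names: adjust to whatever your Data Connection injects.
ASSUMED_S3_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_S3_ENDPOINT")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Demo with an explicit dict so the example does not depend on the real environment.
demo_env = {"AWS_ACCESS_KEY_ID": "demo-key", "AWS_SECRET_ACCESS_KEY": ""}
demo_missing = missing_env_vars(ASSUMED_S3_VARS, demo_env)
print(demo_missing)
```

In the workbench, `missing_env_vars(ASSUMED_S3_VARS)` with no second argument checks the real environment; an empty list means all assumed variables are set.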

In [ ]:
import numpy as np
import pandas as pd

np.random.seed(42)

now = pd.Timestamp.now()
customer_ids = [f"C{str(i).zfill(5)}" for i in range(1, 51)]
N_TRANSACTIONS = 2000

# Simulate labeled transaction data
entity_df = pd.DataFrame({
    "customer_id": np.random.choice(customer_ids, N_TRANSACTIONS),
    "event_timestamp": [now - pd.Timedelta(hours=np.random.randint(1, 24)) for _ in range(N_TRANSACTIONS)],
    "transaction_amount": np.round(np.random.exponential(200, N_TRANSACTIONS), 2),
    "is_foreign_transaction": np.random.choice([0, 1], N_TRANSACTIONS, p=[0.85, 0.15]),
})

# Features to retrieve
feature_refs = [
    "customer_profile:age",
    "customer_profile:account_age_days",
    "customer_profile:credit_limit",
    "customer_profile:num_cards",
    "transaction_stats:avg_transaction_amount_30d",
    "transaction_stats:num_transactions_7d",
    "transaction_stats:num_transactions_1d",
    "transaction_stats:max_transaction_amount_7d",
    "transaction_stats:num_foreign_transactions_30d",
    "transaction_stats:num_declined_transactions_7d",
    "fraud_risk_features:amount_ratio_to_avg",
    "fraud_risk_features:amount_ratio_to_max",
    "fraud_risk_features:risk_score",
]

# Retrieve historical features from the offline store (Parquet on S3)
print("Retrieving historical features from S3...")
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=feature_refs,
).to_df()

print(f"Training dataset: {training_df.shape[0]} rows, {training_df.shape[1]} columns")
training_df.head(10)
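
`get_historical_features` joins each row of `entity_df` to feature values by entity key and `event_timestamp`. The sketch below illustrates the general idea of such a point-in-time lookup in plain Python: for each request it takes the latest feature value recorded at or before the request timestamp, so no future data leaks into training. It is a conceptual illustration with made-up values, not Feast's implementation, and it ignores TTL.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(feature_rows, customer_id, as_of):
    """Return the latest feature value recorded at or before `as_of`, or None.

    feature_rows maps customer_id -> list of (timestamp, value), sorted by timestamp.
    """
    rows = feature_rows.get(customer_id, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)  # number of rows with ts <= as_of
    return rows[idx - 1][1] if idx else None


# Made-up history of a single feature for a single customer.
history = {
    "C00001": [
        (datetime(2024, 1, 1), 120.0),
        (datetime(2024, 1, 8), 150.0),
        (datetime(2024, 1, 15), 90.0),
    ]
}

# A request on Jan 10 sees the Jan 8 value, never the later Jan 15 one.
mid_value = point_in_time_lookup(history, "C00001", datetime(2024, 1, 10))
# A request before any recorded value gets nothing.
early_value = point_in_time_lookup(history, "C00001", datetime(2023, 12, 31))
print(mid_value, early_value)
```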

## 5. Train a fraud detection model

In [ ]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Generate simulated fraud labels
# Transactions with a high risk_score are more likely to be fraudulent
training_df = training_df.dropna()
fraud_probability = 1 / (1 + np.exp(-(training_df["risk_score"] - 1.5) * 3))
training_df["is_fraud"] = (np.random.random(len(training_df)) < fraud_probability).astype(int)

print("Fraud distribution:")
print(training_df["is_fraud"].value_counts())
print(f"Fraud rate: {training_df['is_fraud'].mean():.2%}")

# Prepare features for the model
model_features = [
    "age", "account_age_days", "credit_limit", "num_cards",
    "avg_transaction_amount_30d", "num_transactions_7d", "num_transactions_1d",
    "max_transaction_amount_7d", "num_foreign_transactions_30d",
    "num_declined_transactions_7d",
    "transaction_amount", "is_foreign_transaction",
    "amount_ratio_to_avg", "amount_ratio_to_max", "risk_score",
]

X = training_df[model_features]
y = training_df["is_fraud"]

# Stratify so that both splits keep the same fraud rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train a Random Forest
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate
y_pred = clf.predict(X_test)
print("\n=== Classification Report ===")
print(classification_report(y_test, y_pred, target_names=["Legitimate", "Fraud"]))

print("=== Confusion Matrix ===")
print(confusion_matrix(y_test, y_pred))
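
The simulated labels come from a logistic curve centred on a `risk_score` of 1.5 with a steepness of 3. The short sketch below reuses the formula from the cell above to show how quickly the fraud probability rises around that midpoint; the parameter names `midpoint` and `steepness` are introduced here for illustration only.

```python
import math


def fraud_probability(risk_score, midpoint=1.5, steepness=3.0):
    """Logistic curve: 1 / (1 + exp(-(risk_score - midpoint) * steepness))."""
    return 1.0 / (1.0 + math.exp(-(risk_score - midpoint) * steepness))


# Print the probability at a few risk scores on either side of the midpoint.
for score in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"risk_score={score:.1f} -> P(fraud)={fraud_probability(score):.3f}")
```

A score exactly at the midpoint gives a probability of 0.5, and the curve is symmetric around it, so scores of 1.0 and 2.0 yield complementary probabilities.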

In [None]:
# Feature importance
import matplotlib.pyplot as plt

importances = pd.Series(clf.feature_importances_, index=model_features).sort_values(ascending=True)

fig, ax = plt.subplots(figsize=(10, 6))
importances.plot(kind="barh", ax=ax)
ax.set_title("Feature importance for fraud detection")
ax.set_xlabel("Importance")
plt.tight_layout()
plt.show()

## 6. Real-time prediction via the Online Store

Simulate an incoming transaction: retrieve the customer's features from the **online store** (PostgreSQL) in real time, then apply the model.

In [None]:
# Simulate a suspicious transaction
test_customer = "C00042"
transaction = {
    "transaction_amount": 4500.00,  # high amount
    "is_foreign_transaction": 1,     # foreign transaction
}

print(f"Incoming transaction for customer {test_customer}:")
print(f"  Amount: {transaction['transaction_amount']} EUR")
print(f"  Foreign transaction: {'Yes' if transaction['is_foreign_transaction'] else 'No'}")

# Retrieve features in real-time from PostgreSQL (online store)
online_features = store.get_online_features(
    entity_rows=[
        {
            "customer_id": test_customer,
            **transaction,
        }
    ],
    features=feature_refs,
).to_dict()

print("\nFeatures retrieved from the online store (PostgreSQL):")
for key, values in online_features.items():
    if key != "customer_id":
        print(f"  {key}: {values[0]}")

In [ ]:
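# Build the feature vector for prediction.
# The online response stores each value as a one-element list.
feature_vector = dict(online_features)
feature_vector.update({k: [v] for k, v in transaction.items()})

predict_df = pd.DataFrame(feature_vector)

# Select the model features in training order. Fail loudly if any are
# missing or null instead of silently passing a narrower input to the model.
missing = [f for f in model_features if f not in predict_df.columns]
if missing:
    raise KeyError(f"Missing features in the online response: {missing}")
predict_input = predict_df[model_features]
if predict_input.isna().any().any():
    raise ValueError(f"Online store returned null features for customer {test_customer}")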

# Predict
prediction = clf.predict(predict_input)[0]
probability = clf.predict_proba(predict_input)[0]

print("\n" + "=" * 50)
if prediction == 1:
    print(f"FRAUD ALERT - Probability: {probability[1]:.1%}")
    print("Action: Transaction blocked for review")
else:
    print(f"Legitimate transaction - Fraud probability: {probability[1]:.1%}")
    print("Action: Transaction approved")
print("=" * 50)
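
For a binary problem, `clf.predict` effectively flags a transaction once the predicted fraud probability exceeds 0.5. Fraud teams often want a different trade-off between missed fraud and false alarms, so the sketch below makes the cut-off explicit. The function name and the 0.3 threshold are illustrative assumptions, not values from this notebook.

```python
def decide(fraud_probability, threshold=0.3):
    """Map a fraud probability to an action using an explicit cut-off."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    return "block" if fraud_probability >= threshold else "approve"


# Three example probabilities: well below, exactly at, and above the threshold.
decisions = [decide(p) for p in (0.05, 0.30, 0.72)]
print(decisions)
```

With the model above, this would be called as `decide(probability[1])`, since index 1 holds the probability of the fraud class.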

## 7. Architecture Summary

```
                         OpenShift AI 3.2
    +---------------------------------------------------------+
    |                                                         |
    |   +-------------+   +--------------------+              |
    |   | Notebook    |   |  Feature Store     |              |
    |   | (Workbench) |-->|  (Feast Operator)  |              |
    |   +-------------+   +---------+----------+              |
    |                               |                         |
    |            +------------------+--------------+          |
    |            v                  v              v          |
    |   +-----------------+ +--------------+ +------------+   |
    |   | Offline Store   | | Online Store | | S3 (MinIO) |   |
    |   | (Parquet/MinIO) | | (PostgreSQL) | | Registry   |   |
    |   | Training        | | Serving      | | + Data     |   |
    |   +-----------------+ +--------------+ +------------+   |
    |                                                         |
    +---------------------------------------------------------+
```

| Component | Technology | Usage |
|-----------|------------|-------|
| Offline Store | Parquet on S3 (MinIO) | Historical features for training |
| Online Store | PostgreSQL | Real-time features for inference |
| Registry | S3 (MinIO) | Feature metadata |
| Feature Server | Feast (RHOAI Operator) | gRPC/REST API to serve features |
| Notebook | RHOAI Workbench | Development and experimentation |
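
Outside the notebook, a service would typically call the feature server over REST rather than use the Python SDK. Feast's Python feature server exposes a `/get-online-features` endpoint that accepts a JSON body with `features` and `entities`. The sketch below only builds such a request; the server URL is hypothetical, and whether the operator-managed server is reachable from your client and what authentication it requires are assumptions to check in your cluster.

```python
import json
from urllib import request

# Hypothetical address; replace with the URL of your feature server.
FEATURE_SERVER_URL = "https://feast-server.example.invalid"


def build_online_request(base_url, features, entities):
    """Build (but do not send) a POST request for /get-online-features."""
    payload = {"features": list(features), "entities": entities}
    return request.Request(
        url=f"{base_url.rstrip('/')}/get-online-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_online_request(
    FEATURE_SERVER_URL,
    ["transaction_stats:num_transactions_1d", "customer_profile:credit_limit"],
    {"customer_id": ["C00042"]},
)
body = json.loads(req.data)
print(req.get_method(), req.full_url)
print(body)
# Sending it would be request.urlopen(req), given network access and any required auth.
```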