# Feature Store: Manage Features and Generate Training Data

This notebook demonstrates the `snowflakeR` package interface to the Snowflake Feature Store.
You'll learn how to define entities, create feature views, generate training data with
point-in-time correct joins, and tie it all together with the Model Registry.

This notebook is for **local R environments** (RStudio, Posit Workbench, JupyterLab with R kernel).
For Snowflake Workspace Notebooks, use `workspace_feature_store.ipynb`.

**Before you start:** Copy `notebook_config.yaml.template` to `notebook_config.yaml`
and edit it with your account, warehouse, database, and schema.

**Sections:**
1. Setup
2. Connect & Feature Store Context
3. Entities
4. Feature Views
5. Training Data Generation
6. Retrieve Features for Inference
7. End-to-End: Feature Store + Model Registry
8. Cleanup

---

## 1. Setup

Install `snowflakeR` and set up its Python dependencies. This is a one-time step: uncomment and run the lines below once.

```r
# install.packages("pak")
# pak::pak("Snowflake-Labs/snowflakeR")
# snowflakeR::sfr_install_python_deps()
```

In [None]:
library(snowflakeR)

---

## 2. Connect & Feature Store Context

`sfr_load_notebook_config()` reads `notebook_config.yaml` and runs
`USE WAREHOUSE / DATABASE / SCHEMA` to set the execution context.

All table references use fully qualified names via `sfr_fqn()` for
consistency with Workspace Notebooks.
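
As a rough illustration of what a fully qualified name looks like once it is spliced into SQL with `paste()`, the sketch below uses a hypothetical helper, `fqn_sketch()`, with placeholder database and schema names. It is not the implementation of `sfr_fqn()`, which presumably reads the database and schema from `conn` and may also handle identifier quoting.

```r
# Hypothetical stand-in for sfr_fqn(): joins database, schema, and object
# name with dots. This is an assumption for illustration, not package code.
fqn_sketch <- function(database, schema, name) {
  paste(database, schema, name, sep = ".")
}

# "MY_DB" and "MY_SCHEMA" are placeholder names.
orders_fqn <- fqn_sketch("MY_DB", "MY_SCHEMA", "SFR_DEMO_ORDERS")

# paste() separates its arguments with a single space by default, so the
# name drops into the statement without extra padding.
sql <- paste("SELECT COUNT(*) FROM", orders_fqn)
print(sql)
```

The same `paste("... FROM", sfr_fqn(conn, "TABLE"))` pattern is used for every SQL string in this notebook.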

In [None]:
# Connect (reads ~/.snowflake/connections.toml or config connection params)
conn <- sfr_connect()

# Load config and set execution context
conn <- sfr_load_notebook_config(conn)
conn

In [None]:
# Create a Feature Store context targeting the configured schema
# `create = TRUE` creates the schema and required tags if they don't exist
fs <- sfr_feature_store(
  conn,
  database  = conn$database,
  schema    = conn$schema,
  warehouse = conn$warehouse,
  create    = TRUE
)

fs

### What is an `sfr_feature_store` object?

It holds the connection and the target database/schema/warehouse for all Feature Store
operations. Pass it as the first argument to all `sfr_*_entity()` and `sfr_*_feature_view()` functions.

---

## 3. Entities

**Entities** define join keys that link features to business objects (customers, products, etc.).
They are the foundation for Feature Views.
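
The role of a join key can be sketched locally with base R's `merge()`. The two toy data frames below are illustrative only; in the Feature Store the equivalent join runs in Snowflake on the entity's `join_keys`.

```r
# Toy tables sharing the join key customer_id (values are illustrative)
labels   <- data.frame(customer_id = c(1, 2, 3), churned = c(0, 1, 0))
features <- data.frame(customer_id = c(3, 1, 2), order_count = c(3, 3, 2))

# The join key lines rows up regardless of their order in either table
joined <- merge(labels, features, by = "customer_id")
print(joined)
```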

### Create sample data

First, let's create some sample tables to work with.

In [None]:
# Create sample order data
sfr_execute(conn, paste("
  CREATE OR REPLACE TABLE", sfr_fqn(conn, "SFR_DEMO_ORDERS"), "(
    customer_id INT,
    order_date  DATE,
    order_total DOUBLE
  )
"))

sfr_execute(conn, paste("
  INSERT INTO", sfr_fqn(conn, "SFR_DEMO_ORDERS"), "VALUES
    (1, '2025-01-15', 45.50),
    (1, '2025-02-20', 82.30),
    (1, '2025-03-10', 15.00),
    (2, '2025-01-22', 120.00),
    (2, '2025-03-05', 55.75),
    (3, '2025-02-01', 200.00),
    (3, '2025-02-15', 30.25),
    (3, '2025-03-20', 95.50)
"))

# Create sample label data
sfr_execute(conn, paste("
  CREATE OR REPLACE TABLE", sfr_fqn(conn, "SFR_DEMO_LABELS"), "(
    customer_id INT,
    churned     INT
  )
"))

sfr_execute(conn, paste("
  INSERT INTO", sfr_fqn(conn, "SFR_DEMO_LABELS"), "VALUES (1, 0), (2, 1), (3, 0)
"))

cat("Sample tables created.\n")

### Create and manage entities

In [None]:
# Create a customer entity
customer <- sfr_create_entity(
  fs,
  name      = "SFR_DEMO_CUSTOMER",
  join_keys = "CUSTOMER_ID",
  desc      = "Demo customer entity"
)

customer

In [None]:
# List all entities
entities <- sfr_list_entities(fs)
entities

In [None]:
# Get a specific entity
customer <- sfr_get_entity(fs, "SFR_DEMO_CUSTOMER")
customer

In [None]:
# Update the description
sfr_update_entity(fs, "SFR_DEMO_CUSTOMER", desc = "Primary customer entity for demo")

---

## 4. Feature Views

**Feature Views** define the SQL transformation that produces features.
They can be:
- **Managed:** Automatically refreshed as a dynamic table (`refresh_freq` specified)
- **External:** Manually maintained (no `refresh_freq`)

### Create a Feature View from SQL

In [None]:
# One-step creation: SQL-based Feature View
fv <- sfr_create_feature_view(
  fs,
  name     = "SFR_DEMO_CUST_FEATURES",
  version  = "v1",
  entities = customer,
  features = paste("
    SELECT
      customer_id,
      AVG(order_total)   AS avg_order_total,
      COUNT(*)           AS order_count,
      SUM(order_total)   AS total_spend,
      MAX(order_date)    AS last_order_date
    FROM", sfr_fqn(conn, "SFR_DEMO_ORDERS"), "
    GROUP BY customer_id
  "),
  desc = "Customer aggregate features from orders"
)

fv
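
To see what the SQL above computes, the same aggregates can be reproduced locally in base R on the eight sample rows. This is only a cross-check sketch: `local_features` is a hypothetical name, and the Feature View itself is evaluated in Snowflake, not in R.

```r
# The eight sample orders inserted earlier in this notebook
orders <- data.frame(
  customer_id = c(1, 1, 1, 2, 2, 3, 3, 3),
  order_date  = as.Date(c("2025-01-15", "2025-02-20", "2025-03-10",
                          "2025-01-22", "2025-03-05",
                          "2025-02-01", "2025-02-15", "2025-03-20")),
  order_total = c(45.50, 82.30, 15.00, 120.00, 55.75, 200.00, 30.25, 95.50)
)

# Split the rows by customer, then compute each aggregate per group
by_customer <- split(orders, orders$customer_id)

local_features <- data.frame(
  customer_id     = as.numeric(names(by_customer)),
  avg_order_total = sapply(by_customer, function(d) mean(d$order_total)),
  order_count     = sapply(by_customer, nrow),
  total_spend     = sapply(by_customer, function(d) sum(d$order_total)),
  last_order_date = as.Date(sapply(by_customer, function(d) format(max(d$order_date)))),
  row.names = NULL
)

print(local_features)
```

Comparing this with the output of `sfr_read_feature_view()` is a quick sanity check that the Feature View query does what you expect.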

### Alternative: Two-step (draft then register)

This mirrors the Python API and is useful when you want to inspect the draft:

```r
# Step 1: Create a local draft
fv_draft <- sfr_feature_view(
  name         = "MY_FEATURES",
  entities     = customer,
  features     = "SELECT ... FROM ...",
  refresh_freq = "1 hour"
)

# Step 2: Register (materialise)
fv <- sfr_register_feature_view(fs, fv_draft, version = "v1")
```

### Alternative: dbplyr-based features

Use `sfr_dbi_connection()` to get an RSnowflake DBI connection for dbplyr:

```r
library(dplyr); library(dbplyr)

dbi_con <- sfr_dbi_connection(conn)
orders_tbl <- tbl(dbi_con, I(sfr_fqn(conn, "SFR_DEMO_ORDERS")))
features_query <- orders_tbl |>
  group_by(CUSTOMER_ID) |>
  summarise(avg_total = mean(ORDER_TOTAL), order_count = n())

fv <- sfr_create_feature_view(
  fs, "CUST_FV_DBPLYR", "v1",
  entities = customer,
  features = features_query   # dbplyr lazy table -> SQL
)
```

### Manage Feature Views

In [None]:
# List all Feature Views
fvs <- sfr_list_feature_views(fs)
fvs

In [None]:
# Get a specific version
fv <- sfr_get_feature_view(fs, "SFR_DEMO_CUST_FEATURES", "v1")
fv

In [None]:
# Read feature data directly
feature_data <- sfr_read_feature_view(fs, "SFR_DEMO_CUST_FEATURES", "v1")
feature_data

### Refresh management (for managed Feature Views)

```r
# Manually trigger a refresh
sfr_refresh_feature_view(fs, "MY_FV", "v1")

# Check refresh history
sfr_get_refresh_history(fs, "MY_FV", "v1")

# Pause/resume automatic refresh
sfr_suspend_feature_view(fs, "MY_FV", "v1")
sfr_resume_feature_view(fs, "MY_FV", "v1")
```

---

## 5. Training Data Generation

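Join **spine** (label) data with Feature Views. When both the spine and the Feature View carry a timestamp column,
the join is point-in-time correct: each label row receives the feature values as of its own timestamp, which prevents data leakage.
The demo tables used here have no timestamp column, so this example is a plain join on `customer_id`.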

In [None]:
# Generate training data by joining labels with features
training_data <- sfr_generate_training_data(
  fs,
  spine = paste("SELECT customer_id, churned FROM", sfr_fqn(conn, "SFR_DEMO_LABELS")),
  features = list(
    list(name = "SFR_DEMO_CUST_FEATURES", version = "v1")
  ),
  spine_label_cols = "churned"
)

training_data

The result is a regular R data.frame -- ready for `lm()`, `glm()`, `randomForest()`, etc.
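
The as-of idea behind a point-in-time join can be sketched in base R. The toy tables and the column names `label_ts` and `feature_ts` are assumptions made for illustration; this is not how snowflakeR or Snowflake executes the join.

```r
# Toy spine: one label per customer, observed at label_ts (hypothetical column)
spine <- data.frame(
  customer_id = c(1, 2),
  label_ts    = as.Date(c("2025-02-28", "2025-02-28")),
  churned     = c(0, 1)
)

# Toy feature history: several snapshots per customer over time
feature_history <- data.frame(
  customer_id = c(1, 1, 1, 2, 2),
  feature_ts  = as.Date(c("2025-01-31", "2025-02-25", "2025-03-31",
                          "2025-01-31", "2025-03-31")),
  order_count = c(1, 2, 3, 1, 2)
)

# For each spine row, keep the most recent snapshot at or before label_ts.
# Assumes every spine row has at least one such snapshot.
as_of_join <- function(spine, history) {
  rows <- lapply(seq_len(nrow(spine)), function(i) {
    s <- spine[i, ]
    candidates <- history[history$customer_id == s$customer_id &
                            history$feature_ts <= s$label_ts, ]
    latest <- candidates[which.max(as.numeric(candidates$feature_ts)), ]
    cbind(s, latest[, c("feature_ts", "order_count"), drop = FALSE])
  })
  do.call(rbind, rows)
}

pit <- as_of_join(spine, feature_history)
print(pit)
```

The March snapshots are never selected because they postdate the labels; using them would leak future information into training.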

---

## 6. Retrieve Features for Inference

At inference time, fetch the **latest** feature values. No labels and no point-in-time (PIT) logic are needed.

In [None]:
# Get current features for all customers
inference_features <- sfr_retrieve_features(
  fs,
  spine = paste("SELECT DISTINCT customer_id FROM", sfr_fqn(conn, "SFR_DEMO_ORDERS")),
  features = list(
    list(name = "SFR_DEMO_CUST_FEATURES", version = "v1")
  )
)

inference_features

---

## 7. End-to-End: Feature Store + Model Registry

Tie everything together: generate training data from the Feature Store,
train a model in R, register it, and score new customers.

In [None]:
# 1. Generate training data from Feature Store
training <- sfr_generate_training_data(
  fs,
  spine = paste("SELECT customer_id, churned FROM", sfr_fqn(conn, "SFR_DEMO_LABELS")),
  features = list(
    list(name = "SFR_DEMO_CUST_FEATURES", version = "v1")
  ),
  spine_label_cols = "churned"
)

cat("Training data:\n")
str(training)

In [None]:
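# 2. Train a model in R
# Note: the demo has only three rows, fewer than the four coefficients in this
# formula, so expect NA estimates and glm warnings. The point here is the
# workflow, not the quality of the fit.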
model <- glm(
  churned ~ avg_order_total + order_count + total_spend,
  data   = training,
  family = binomial
)

summary(model)

In [None]:
# 3. Test locally
test_input <- training[, c("avg_order_total", "order_count", "total_spend")]
preds <- sfr_predict_local(model, test_input)
cbind(training[, c("customer_id", "churned")], preds)
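
For comparison, base R's `predict()` on a `glm` returns probabilities when called with `type = "response"`. The sketch below uses simulated toy data (not the demo tables) so that the fit is well behaved, and it makes no assumption about how `sfr_predict_local()` is implemented.

```r
set.seed(42)

# Simulated toy data: churn probability falls as order_count rises
toy <- data.frame(order_count = rep(1:10, times = 4))
toy$churned <- rbinom(nrow(toy), size = 1,
                      prob = plogis(1.5 - 0.4 * toy$order_count))

toy_model <- glm(churned ~ order_count, data = toy, family = binomial)

# type = "response" returns probabilities; the default returns log-odds
new_customers <- data.frame(order_count = c(1, 5, 10))
probs <- predict(toy_model, newdata = new_customers, type = "response")
print(probs)
```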

In [None]:
# 4. Register to Model Registry
reg <- sfr_model_registry(conn)

mv <- sfr_log_model(
  reg,
  model      = model,
  model_name = "SFR_DEMO_CHURN",
  input_cols = list(
    avg_order_total = "double",
    order_count     = "double",
    total_spend     = "double"
  ),
  output_cols = list(prediction = "double"),
  comment = "Logistic regression for customer churn"
)

mv

In [None]:
# 5. Score new customers using Feature Store features
new_features <- sfr_retrieve_features(
  fs,
  spine = paste("SELECT DISTINCT customer_id FROM", sfr_fqn(conn, "SFR_DEMO_ORDERS")),
  features = list(
    list(name = "SFR_DEMO_CUST_FEATURES", version = "v1")
  )
)

# Local prediction (or use sfr_predict for remote)
scores <- sfr_predict_local(
  model,
  new_features[, c("avg_order_total", "order_count", "total_spend")]
)

cbind(new_features[, "customer_id", drop = FALSE], churn_score = scores$prediction)

---

## 8. Cleanup

In [None]:
# Uncomment to clean up demo objects
# (commented out to avoid accidental deletion on Run All)
#
# sfr_delete_model(reg, "SFR_DEMO_CHURN")
# sfr_delete_feature_view(fs, "SFR_DEMO_CUST_FEATURES", "v1")
# sfr_delete_entity(fs, "SFR_DEMO_CUSTOMER")
#
# sfr_execute(conn, paste("DROP TABLE IF EXISTS", sfr_fqn(conn, "SFR_DEMO_ORDERS")))
# sfr_execute(conn, paste("DROP TABLE IF EXISTS", sfr_fqn(conn, "SFR_DEMO_LABELS")))
#
# sfr_disconnect(conn)
# cat("All demo objects cleaned up.\n")

---

## Next steps

- **Full Feature Store API:** `vignette("feature-store", package = "snowflakeR")`
- **Model Registry details:** `vignette("model-registry", package = "snowflakeR")`
- **Workspace Notebook tips:** `vignette("workspace-notebooks", package = "snowflakeR")`