# Setup Notebook

## Environment Configuration for Cost-Aware AI Decision System

This notebook initializes the Databricks environment in three steps:
1. **Creates the catalog** - Top-level Unity Catalog container for centralized data governance
2. **Creates the schema** - Logical grouping for the project's tables
3. **Creates the volume** - Governed storage for raw data files

### Prerequisites
- Databricks workspace with Unity Catalog enabled
- Privileges to create catalogs, schemas, and volumes (for example, `CREATE CATALOG` on the metastore)

### Project Architecture
```
Catalog: cost_aware_capstone
    └── Schema: risk_decisioning
        ├── Volume: raw_data (CSV files)
        ├── Table: bronze_cost_aware_cases
        ├── Table: silver_cost_aware_features
        ├── Table: ml_risk_predictions
        └── Table: gold_decision_recommendations
```

---
### Project Constants

In [0]:
PROJECT_CATALOG = "cost_aware_capstone"
PROJECT_SCHEMA = "risk_decisioning"
RAW_VOLUME = "raw_data"

print("Project catalog:", PROJECT_CATALOG)
print("Project schema:", PROJECT_SCHEMA)
print("Raw data volume:", RAW_VOLUME)

### Create Catalog

In [0]:
%sql
CREATE CATALOG IF NOT EXISTS cost_aware_capstone;

### Create Schema

In [0]:
%sql

USE CATALOG cost_aware_capstone;

CREATE SCHEMA IF NOT EXISTS risk_decisioning;
USE SCHEMA risk_decisioning;

### Create Unity Catalog Volume

In [0]:
%sql
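-- Fully qualified so this cell does not depend on the USE CATALOG / USE SCHEMA cell having run first
CREATE VOLUME IF NOT EXISTS cost_aware_capstone.risk_decisioning.raw_data;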
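
The SQL cells above repeat the names already held in the project constants. As a minimal sketch, the same statements could be generated from those constants so the two cannot drift apart. The helper `build_setup_statements` is hypothetical (not part of the original notebook), and the commented-out `spark.sql` call assumes the `spark` session that Databricks notebooks normally provide.

```python
PROJECT_CATALOG = "cost_aware_capstone"
PROJECT_SCHEMA = "risk_decisioning"
RAW_VOLUME = "raw_data"


def build_setup_statements(catalog: str, schema: str, volume: str) -> list[str]:
    """Return idempotent, fully qualified DDL for the catalog, schema, and volume.

    Hypothetical helper for illustration; the order matters because each
    object must exist before the one nested inside it can be created.
    """
    return [
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}",
        f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.{volume}",
    ]


statements = build_setup_statements(PROJECT_CATALOG, PROJECT_SCHEMA, RAW_VOLUME)
for statement in statements:
    print(statement)
    # In a Databricks notebook, uncomment to execute
    # (assumes the built-in `spark` session is available):
    # spark.sql(statement)
```

Because every statement uses `IF NOT EXISTS` and a fully qualified name, the loop can be re-run safely and does not rely on a prior `USE CATALOG` or `USE SCHEMA`.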

### Upload Data Using the Databricks UI

In [0]:
# Expected location of the raw CSV once it has been uploaded to the Volume
file_path = "/Volumes/cost_aware_capstone/risk_decisioning/raw_data/cost_aware_cases.csv"
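
Since the upload is a manual step, it can help to confirm the file is in place before moving on. The sketch below derives the path from the project constants and checks it with the standard library. It assumes Volume paths under `/Volumes/` are reachable through ordinary Python file APIs on the attached compute; `volume_file_path` and `is_uploaded` are hypothetical helper names, not part of the original notebook.

```python
import os

PROJECT_CATALOG = "cost_aware_capstone"
PROJECT_SCHEMA = "risk_decisioning"
RAW_VOLUME = "raw_data"


def volume_file_path(catalog: str, schema: str, volume: str, filename: str) -> str:
    """Build the /Volumes/<catalog>/<schema>/<volume>/<file> path for a Volume file."""
    return f"/Volumes/{catalog}/{schema}/{volume}/{filename}"


def is_uploaded(path: str) -> bool:
    """Return True only if the path is an existing, non-empty regular file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


file_path = volume_file_path(
    PROJECT_CATALOG, PROJECT_SCHEMA, RAW_VOLUME, "cost_aware_cases.csv"
)

if is_uploaded(file_path):
    print("Found raw data file:", file_path)
else:
    print("Raw data file not found yet; upload it to:", file_path)
```

Outside Databricks the check simply reports that the file is missing, so the cell is safe to run anywhere.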

---
## Setup Complete

**Next Steps**:
1. Upload `cost_aware_cases.csv` to the Volume using the Databricks UI
2. Run `01_Bronze_Ingestion.ipynb` to load the data into a Delta table

**Expected Data Schema**:

| Column | Type | Description |
|--------|------|-------------|
| case_id | STRING | Unique identifier |
| transaction_amount | DOUBLE | Transaction value |
| tx_velocity_24h | INT | Transaction count in 24h |
| unusual_location_flag | INT | Location anomaly indicator |
| device_change_flag | INT | New device indicator |
| account_age_days | INT | Account tenure |
| investigation_cost | DOUBLE | Cost to investigate |
| fraud_loss_if_missed | DOUBLE | Potential loss if fraud |
| label_fraud | INT | Ground truth label |
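
The table above can also be kept as data, which makes it possible to check an uploaded file's header before ingestion. This is a minimal sketch using only the standard library: `EXPECTED_SCHEMA` mirrors the table, while `to_ddl`, `compare_header`, and the sample header line are hypothetical illustrations rather than part of the original notebooks.

```python
import csv
import io

# Column name -> type, copied from the expected schema table.
EXPECTED_SCHEMA = {
    "case_id": "STRING",
    "transaction_amount": "DOUBLE",
    "tx_velocity_24h": "INT",
    "unusual_location_flag": "INT",
    "device_change_flag": "INT",
    "account_age_days": "INT",
    "investigation_cost": "DOUBLE",
    "fraud_loss_if_missed": "DOUBLE",
    "label_fraud": "INT",
}


def to_ddl(schema: dict[str, str]) -> str:
    """Render the schema as a DDL-style string, e.g. 'case_id STRING, ...'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in schema.items())


def compare_header(csv_file, expected: dict[str, str]) -> tuple[list[str], list[str]]:
    """Read the first CSV row and return (missing, unexpected) column names."""
    header = [name.strip() for name in next(csv.reader(csv_file))]
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


# Made-up header used only to demonstrate the check; a real run would pass
# an open file handle for the uploaded CSV instead.
sample = io.StringIO("case_id,transaction_amount,tx_velocity_24h,extra_col\n")
missing, unexpected = compare_header(sample, EXPECTED_SCHEMA)

print("DDL:", to_ddl(EXPECTED_SCHEMA))
print("Missing columns:", missing)
print("Unexpected columns:", unexpected)
```

The DDL string is in the form that PySpark's `DataFrameReader.schema` accepts, so a downstream ingestion step could pass it to `spark.read.schema(...)` instead of inferring types; whether `01_Bronze_Ingestion.ipynb` does so is not stated here.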

---