# Risk Rule Engine — Notebook Template

This notebook is the **working template** for the Risk Rule Engine project, structured as a **two-stage engine**:

1. **Feature Selection Engine** (`feature_selection_py/`)  
   Produces a *rule-ready* feature specification (`variable_spec`) and diagnostics artifacts (IV, correlation, VIF, worst-tail ranking).

2. **Rule Construction Engine** (`rules_py/`)  
   Generates candidate rules (1D / 2D / 3D), evaluates each rule with a **single scalar metric** (e.g., **G:B ratio** or **Bad Balance BR-times**), and provides deep-dive impact / overlap / visualisations.

---

## Folder structure assumed

- `feature_selection_py/`
  - `feature_selection_pipeline.py`  ← main entrypoint for Part 1
  - `fs_utils.py`, `fs_iv.py`, `fs_filters.py`, `fs_plots.py`
  - `feature_engineering.py` (optional)

- `rules_py/`
  - `rule_metrics.py`
  - `rules_searching.py`
  - `rule_impact_analysis.py`
  - `rule_overlap_analysis.py`
  - `rule_visualisation.py`

_Last updated: 2026-01-24_


## 0. Setup

Run this once per session. Adjust `PROJECT_ROOT` if needed.


In [None]:
import sys
from pathlib import Path

# If you open the notebook from repo root, this is fine:
PROJECT_ROOT = Path.cwd()

# If you open from a subfolder, uncomment and point to repo root:
# PROJECT_ROOT = Path("/workspaces/NextGen_Rule_Engine")

sys.path.insert(0, str(PROJECT_ROOT))

print("PROJECT_ROOT:", PROJECT_ROOT)


In [None]:
import numpy as np
import pandas as pd

pd.set_option("display.max_columns", 200)
pd.set_option("display.width", 120)


## 1. Load data

Replace this section with your own data-loading logic.

Required columns (the variables named here are set later in this section and hold the actual column names in your dataset):
- `BAD_FLAG`: 1/0 indicator (bad)
- `TOTAL_BAL`: total exposure / balance
- `BAD_BAL`: bad exposure / balance
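
If no real dataset is at hand yet, a small synthetic frame can stand in while wiring up the notebook. The sketch below is only an illustration: the helper `make_toy_portfolio`, the extra feature `x`, the 8% bad rate, and the assumption that a bad account's whole balance counts as bad balance are all hypothetical choices, not part of the project.

```python
import numpy as np
import pandas as pd


def make_toy_portfolio(n_rows: int = 1_000, seed: int = 0) -> pd.DataFrame:
    """Build a synthetic portfolio with the three required columns (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    bad_flag = rng.binomial(1, 0.08, size=n_rows)                  # roughly 8% bads (assumed)
    total_bal = rng.gamma(shape=2.0, scale=2_500.0, size=n_rows).round(2)
    bad_bal = np.where(bad_flag == 1, total_bal, 0.0)              # assumption: full balance is bad
    return pd.DataFrame(
        {
            "bad_flag": bad_flag,
            "written_amount": total_bal,
            "bad_balance": bad_bal,
            "x": rng.normal(size=n_rows),                          # one illustrative numeric feature
        }
    )


toy = make_toy_portfolio()
```

Assigning `data = make_toy_portfolio()` would let the later cells run end to end on placeholder data; results on such data carry no business meaning.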


In [None]:
# TODO: replace with your data source
# Example:
# data = pd.read_parquet(PROJECT_ROOT / "data" / "dataset.parquet")

data = pd.DataFrame()  # placeholder
data.head()


In [None]:
# Define your column names here
BAD_FLAG = "bad_flag"
TOTAL_BAL = "written_amount"
BAD_BAL = "bad_balance"

required_cols = [BAD_FLAG, TOTAL_BAL, BAD_BAL]
missing = [c for c in required_cols if c not in data.columns]
if missing:
    print("⚠️ Missing columns:", missing)
else:
    print("✅ Required columns present")


# Part 1 — Feature Selection Engine

**Goal:** produce a rule-ready feature specification (`variable_spec`) that will be consumed by the rule construction engine.

Outputs:
- `selected_num`, `selected_cat`
- `variable_spec` (DataFrame)
- `artifacts` (IV, correlation pairs, VIF, worst-tail ranking tables)
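
For readers unfamiliar with IV (information value), the sketch below shows one common way to compute it for a single numeric feature using quantile bins and weight of evidence. It is an assumption-laden illustration: the function name `information_value`, the bin count, and the additive smoothing constant `eps` are hypothetical, and the project's own implementation in `fs_iv.py` may differ.

```python
import numpy as np
import pandas as pd


def information_value(
    feature: pd.Series, bad_flag: pd.Series, n_bins: int = 5, eps: float = 0.5
) -> float:
    """Information value of a numeric feature against a 1/0 bad flag (illustrative sketch)."""
    bins = pd.qcut(feature, q=n_bins, duplicates="drop")
    counts = pd.crosstab(bins, bad_flag.astype(int)).reindex(columns=[0, 1], fill_value=0)
    good = counts[0] + eps          # smoothing avoids log(0) in bins with no goods or no bads
    bad = counts[1] + eps
    good_dist = good / good.sum()
    bad_dist = bad / bad.sum()
    woe = np.log(good_dist / bad_dist)
    return float(((good_dist - bad_dist) * woe).sum())
```

A feature whose bins all share the same good/bad mix scores zero, while a feature that separates goods from bads scores higher; a cut-off such as `iv_threshold` is then applied to that score.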


## 2. Import feature selection pipeline

In [None]:
from feature_selection_py.feature_selection_pipeline import (
    run_feature_selection_pipeline,
    FeatureSelectionConfig,
)

print("Feature selection imports OK")


## 3. (Optional) Load data dictionary

If you have a data dictionary (variable/definition/direction), load it here.


In [None]:
# Example:
# data_dictionary = pd.read_csv(PROJECT_ROOT / "data" / "data_dictionary.csv")

data_dictionary = None  # or a DataFrame
data_dictionary


## 4. Run feature selection pipeline

Tune thresholds in `FeatureSelectionConfig` depending on your dataset and business need.
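
To make the role of a correlation threshold concrete, here is a minimal sketch of greedy correlation pruning: walk through the features in priority order (for example, by descending IV) and keep a feature only if it is not too correlated with anything already kept. The function `prune_correlated` is hypothetical; whether `fs_filters.py` works this way is an assumption, not something the notebook states.

```python
import pandas as pd


def prune_correlated(
    df: pd.DataFrame, ranked_features: list[str], corr_threshold: float = 0.85
) -> list[str]:
    """Greedy pruning: keep features in rank order, dropping near-duplicates (sketch)."""
    corr = df[ranked_features].corr().abs()
    kept: list[str] = []
    for feat in ranked_features:
        # Keep the feature only if every already-kept feature is at or below the threshold.
        if all(corr.loc[feat, k] <= corr_threshold for k in kept):
            kept.append(feat)
    return kept
```

A lower threshold prunes more aggressively; a higher one keeps more near-duplicate features, which tends to produce near-duplicate rules later.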


In [None]:
cfg = FeatureSelectionConfig(
    iv_threshold=0.02,
    corr_threshold=0.85,
    use_vif=False,             # set True if you want VIF pruning
    run_worst_tail_rank=True,
    worst_pct=0.05,
    max_features=None,         # set an int to cap
)

if len(data) > 0 and not missing:
    fs_out = run_feature_selection_pipeline(
        data,
        bad_flag=BAD_FLAG,
        bal_variable=TOTAL_BAL,
        bad_bal_variable=BAD_BAL,
        data_dictionary=data_dictionary,
        config=cfg,
    )

    selected_num = fs_out["selected_num"]
    selected_cat = fs_out["selected_cat"]
    variable_spec = fs_out["variable_spec"]
    fs_artifacts = fs_out["artifacts"]

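    print("Selected numeric:", len(selected_num))
    print("Selected categorical:", len(selected_cat))
    # A bare expression inside an if-block is not rendered by the notebook, so display it explicitly.
    display(variable_spec.head())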


## 5. Inspect feature selection artifacts (optional)

In [None]:
if "fs_artifacts" in globals():
    print(list(fs_artifacts.keys()))


In [None]:
# IV table
if "fs_artifacts" in globals() and fs_artifacts.get("iv_table") is not None:
    display(fs_artifacts["iv_table"].head(20))


In [None]:
# Worst-tail ranking (bad volume)
if "fs_artifacts" in globals() and fs_artifacts.get("worst_tail_bad_vol") is not None:
    display(fs_artifacts["worst_tail_bad_vol"].head(20))


## 6. Export `variable_spec` for rule construction (optional)

In [None]:
from pathlib import Path

OUT_DIR = Path(PROJECT_ROOT) / "results"
OUT_DIR.mkdir(exist_ok=True)

if "variable_spec" in globals():
    out_path = OUT_DIR / "variable_spec.csv"
    variable_spec.to_csv(out_path, index=False)
    print("Saved:", out_path)


# Part 2 — Rule Construction Engine

**Goal:** generate candidate rules (1D / 2D / 3D), score each rule with a **single metric**, and produce ranked tables.

Metrics:
- `G_to_B` (G:B ratio)
- `BR_Bal_Times` (bad-balance BR-times; the exact definition depends on your implementation in `rule_metrics.py`)
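
The two metrics can be sketched as follows, under assumed definitions: G:B as goods divided by bads among the accounts a rule flags, and BR-times as the flagged population's balance bad rate divided by the whole-portfolio balance bad rate. Both definitions, the function names `gb_ratio` and `bal_bad_rate_times`, and the default column names are illustrative assumptions; the authoritative versions are whatever `rule_metrics.py` implements.

```python
import pandas as pd


def gb_ratio(df: pd.DataFrame, mask: pd.Series, bad_flag: str = "bad_flag") -> float:
    """Goods per bad among rows flagged by the rule mask (assumed definition)."""
    hit = df.loc[mask, bad_flag]
    bads = int(hit.sum())
    goods = int(len(hit)) - bads
    return float("inf") if bads == 0 else goods / bads


def bal_bad_rate_times(
    df: pd.DataFrame,
    mask: pd.Series,
    total_bal: str = "written_amount",
    bad_bal: str = "bad_balance",
) -> float:
    """Balance bad rate inside the rule as a multiple of the baseline rate (assumed definition)."""
    base_rate = df[bad_bal].sum() / df[total_bal].sum()
    rule_rate = df.loc[mask, bad_bal].sum() / df.loc[mask, total_bal].sum()
    return rule_rate / base_rate
```

Under these definitions a rule that targets bads looks better with a lower G:B and a higher BR-times.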


## 7. Import rule engine modules

In [None]:
from rules_py.rule_metrics import (
    compute_baseline_stats,
    rule_metric_summary,
    combine_checking_gb_ratio,
    combine_checking_bal_br_times,
)

from rules_py.rules_searching import (
    Rules_Optimisation_Search_Algorithm_1D,
    Rules_Optimisation_Search_Algorithm_2D,
    Rules_Optimisation_Search_Algorithm_3D,
)

from rules_py.rule_impact_analysis import (
    new_baseline_performance_after_rule,
    build_new_baseline_table_from_rule_list,
    # If you implemented your multi-rule impact-table builder, import it too:
    # build_rule_impact_table_from_masks,
)

from rules_py.rule_overlap_analysis import (
    two_rules_redundancy,
    three_rules_redundancy,
)

from rules_py.rule_visualisation import (
    group_performance_one_rule,
    group_performance_two_rules,
)

print("Rule engine imports OK")


## 8. Baseline stats

In [None]:
if len(data) > 0 and not missing:
    baseline_stats = compute_baseline_stats(data, BAD_FLAG, TOTAL_BAL, BAD_BAL)
    display(baseline_stats)


## 9. Candidate rule search (1D / 2D / 3D)

Uses `variable_spec` from Part 1 as the input spec for rule construction.
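
As a rough picture of what a one-dimensional search involves, the sketch below scans a handful of quantile cut-offs on one numeric feature, tries both directions, discards candidates with too few bads, and ranks the rest by G:B (goods per bad, lower first). It is not the implementation of `Rules_Optimisation_Search_Algorithm_1D`; the function `scan_thresholds_1d`, the quantile grid, and the ranking direction are assumptions for illustration.

```python
import pandas as pd


def scan_thresholds_1d(
    df: pd.DataFrame,
    feature: str,
    bad_flag: str = "bad_flag",
    min_bads: int = 10,
    quantiles: tuple[float, ...] = (0.05, 0.1, 0.2, 0.8, 0.9, 0.95),
) -> pd.DataFrame:
    """Rank simple threshold rules on one feature by goods-per-bad (illustrative sketch)."""
    rows = []
    for q in quantiles:
        cut = df[feature].quantile(q)
        for op, mask in ((">", df[feature] > cut), ("<=", df[feature] <= cut)):
            n = int(mask.sum())
            bads = int(df.loc[mask, bad_flag].sum())
            if bads < min_bads:
                continue  # too few bads for the metric to be stable
            goods = n - bads
            rows.append(
                {"rule": f"{feature} {op} {cut:.4g}", "n": n, "bads": bads,
                 "goods": goods, "G_to_B": goods / bads}
            )
    cols = ["rule", "n", "bads", "goods", "G_to_B"]
    return pd.DataFrame(rows, columns=cols).sort_values("G_to_B").reset_index(drop=True)
```

The `min_bads` guard plays the same role as the `min_bads` argument passed to the search functions: it keeps rules supported by only a few bad accounts out of the ranking.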


In [None]:
Metric_name = "G_to_B"   # or "BR_Bal_Times"
min_bads = 10

if "variable_spec" in globals() and len(data) > 0 and not missing and len(variable_spec) > 0:
    df_1d = Rules_Optimisation_Search_Algorithm_1D(
        data=data,
        variable_dateframe=variable_spec,
        bad_flag=BAD_FLAG,
        bad_bal=BAD_BAL,
        total_bal=TOTAL_BAL,
        selected_function=None,
        Metric_name=Metric_name,
        min_bads=min_bads,
    )
    display(df_1d.head(20))


In [None]:
if "variable_spec" in globals() and len(data) > 0 and not missing and len(variable_spec) > 0:
    df_2d = Rules_Optimisation_Search_Algorithm_2D(
        data=data,
        variable_dateframe=variable_spec,
        bad_flag=BAD_FLAG,
        bad_bal=BAD_BAL,
        total_bal=TOTAL_BAL,
        selected_function=None,
        Metric_name=Metric_name,
        min_bads=min_bads,
    )
    display(df_2d.head(20))


In [None]:
if "variable_spec" in globals() and len(data) > 0 and not missing and len(variable_spec) > 0:
    df_3d = Rules_Optimisation_Search_Algorithm_3D(
        data=data,
        variable_dateframe=variable_spec,
        bad_flag=BAD_FLAG,
        bad_bal=BAD_BAL,
        total_bal=TOTAL_BAL,
        selected_function=None,
        Metric_name=Metric_name,
        min_bads=min_bads,
        include_mixed_3d=True,
    )
    display(df_3d.head(20))


## 10. Deep-dive impact for selected rules

Evaluate the **new baseline** after implementing a chosen rule.
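
The idea of a new baseline can be sketched as a before/after comparison, assuming that implementing a rule means excluding every account it flags. The helper `baseline_after_rule` and its output layout are hypothetical and only illustrate the concept; `new_baseline_performance_after_rule` may compute more or define things differently.

```python
import pandas as pd


def baseline_after_rule(
    df: pd.DataFrame,
    mask: pd.Series,
    bad_flag: str = "bad_flag",
    total_bal: str = "written_amount",
    bad_bal: str = "bad_balance",
) -> pd.DataFrame:
    """Compare portfolio stats before and after removing rows flagged by a rule (sketch)."""

    def summarise(frame: pd.DataFrame) -> dict:
        return {
            "n": len(frame),
            "bad_rate": frame[bad_flag].mean(),
            "bal_bad_rate": frame[bad_bal].sum() / frame[total_bal].sum(),
        }

    # Columns: "before" (full portfolio) and "after" (rows the rule does not flag).
    return pd.DataFrame({"before": summarise(df), "after": summarise(df.loc[~mask])})
```

Comparing the two columns shows how much volume the rule gives up against how far the remaining portfolio's bad rate moves.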


In [None]:
# Example: deep dive on a single manual rule mask (replace with your chosen mask)
# rule = (data["x"] > 5)
# impact = new_baseline_performance_after_rule(data, rule, BAD_FLAG, TOTAL_BAL, BAD_BAL)
# impact

print("TODO: select rule mask and run impact analysis")


## 11. Export ranked tables (optional)

In [None]:
OUT_DIR = Path(PROJECT_ROOT) / "results"
OUT_DIR.mkdir(exist_ok=True)

# Uncomment when tables exist
# if "df_1d" in globals(): df_1d.to_csv(OUT_DIR / "ranked_rules_1d.csv", index=False)
# if "df_2d" in globals(): df_2d.to_csv(OUT_DIR / "ranked_rules_2d.csv", index=False)
# if "df_3d" in globals(): df_3d.to_csv(OUT_DIR / "ranked_rules_3d.csv", index=False)

print("Exports folder:", OUT_DIR)
