# Credit Risk Pipeline Quickstart

This notebook runs the **Unified Risk Pipeline** end-to-end on the bundled synthetic dataset, one pipeline stage per section. Each section can be rerun on its own, and every stage calls the full pipeline rather than a simplified variant.

## 1. Environment & Data Preparation

In [None]:
import sys
import importlib
import importlib.util
import subprocess
import site
import shutil
from pathlib import Path

NOTEBOOK_FLAGS = globals().setdefault('_NOTEBOOK_FLAGS', {})

PROJECT_ROOT = Path.cwd().resolve()
SRC_PATH = PROJECT_ROOT / "src"
if SRC_PATH.exists() and str(SRC_PATH) not in sys.path:
    sys.path.insert(0, str(SRC_PATH))


def _force_reinstall_from_dev() -> None:
    # Uninstall existing risk-pipeline packages and reinstall the development branch
    python_exe = sys.executable
    for name in ('risk-pipeline', 'risk_pipeline'):
        try:
            subprocess.run([python_exe, '-m', 'pip', 'uninstall', '-y', name], check=False)
        except Exception:
            pass
    try:
        for sp in site.getsitepackages() + [site.getusersitepackages()]:
            package_path = Path(sp) / 'risk_pipeline'
            if package_path.exists():
                shutil.rmtree(package_path, ignore_errors=True)
    except Exception:
        pass
    try:
        subprocess.run([python_exe, '-m', 'pip', 'cache', 'purge'], check=False)
    except Exception:
        pass
    git_url = 'git+https://github.com/selimoksuz/risk-model-pipeline.git@development'
    subprocess.run([python_exe, '-m', 'pip', 'install', '--no-cache-dir', '--force-reinstall', '-U', git_url], check=True)


def _import_risk_pipeline():
    try:
        import risk_pipeline as rp
        return rp
    except Exception as exc:
        message = repr(exc)
        print('risk_pipeline import failed:', message)
        if 'IndentationError' in message or 'data_processor.py' in message:
            print('Attempting clean reinstall from development branch...')
            _force_reinstall_from_dev()
            # Refresh the import system's caches so the freshly installed package is found.
            importlib.invalidate_caches()
            return importlib.import_module('risk_pipeline')
        print('Manual install hint:')
        print('  pip install -U "git+https://github.com/selimoksuz/risk-model-pipeline.git@development"')
        raise


TSFRESH_AVAILABLE = importlib.util.find_spec('tsfresh') is not None
if TSFRESH_AVAILABLE:
    print('tsfresh available (advanced time-series features can be enabled via config).')
else:
    print('tsfresh is not installed; pipeline will proceed without time-series feature mining.')

risk_pipeline = _import_risk_pipeline()
NOTEBOOK_FLAGS['tsfresh_available'] = TSFRESH_AVAILABLE

if 'pipe' not in globals():
    pipe = None
if 'results' not in globals():
    results = {}
if 'full_results' not in globals():
    full_results = {}


In [None]:
from pathlib import Path
import pandas as pd
import numpy as np
from IPython.display import display

from risk_pipeline.data.sample import load_credit_risk_sample

sample = load_credit_risk_sample()
OUTPUT_DIR = Path('notebooks/outputs/credit_risk_sample_notebook')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

NOTEBOOK_CONTEXT = globals().setdefault('_NOTEBOOK_CONTEXT', {
    'data': {},
    'artifacts': {},
    'paths': {},
    'options': {}
})

NOTEBOOK_CONTEXT['paths']['root'] = PROJECT_ROOT
NOTEBOOK_CONTEXT['paths']['output'] = OUTPUT_DIR

NOTEBOOK_CONTEXT['data']['development'] = sample.development.copy()
NOTEBOOK_CONTEXT['data']['calibration_longrun'] = sample.calibration_longrun.copy()
NOTEBOOK_CONTEXT['data']['calibration_recent'] = sample.calibration_recent.copy()
NOTEBOOK_CONTEXT['data']['scoring_future'] = sample.scoring_future.copy()
NOTEBOOK_CONTEXT['data']['data_dictionary'] = sample.data_dictionary.copy()

NOTEBOOK_CONTEXT['artifacts'].setdefault('processed', None)
NOTEBOOK_CONTEXT['artifacts'].setdefault('splits', None)
NOTEBOOK_CONTEXT['artifacts'].setdefault('woe', None)
NOTEBOOK_CONTEXT['artifacts'].setdefault('selection', {})
NOTEBOOK_CONTEXT['artifacts'].setdefault('modeling', {})
NOTEBOOK_CONTEXT['artifacts'].setdefault('calibration', {})
NOTEBOOK_CONTEXT['artifacts'].setdefault('risk_bands', {})
NOTEBOOK_CONTEXT['artifacts'].setdefault('reports', {})

NOTEBOOK_CONTEXT['options']['random_state'] = 42


MAX_DEV_ROWS = 50000
if len(NOTEBOOK_CONTEXT['data']['development']) > MAX_DEV_ROWS:
    NOTEBOOK_CONTEXT['data']['development'] = NOTEBOOK_CONTEXT['data']['development'].sample(n=MAX_DEV_ROWS, random_state=42).reset_index(drop=True)
    print(f'Warning: downsampled development data to {MAX_DEV_ROWS:,} rows to avoid memory pressure.')

MAX_SCORE_ROWS = 50000
if len(NOTEBOOK_CONTEXT['data']['scoring_future']) > MAX_SCORE_ROWS:
    NOTEBOOK_CONTEXT['data']['scoring_future'] = NOTEBOOK_CONTEXT['data']['scoring_future'].sample(n=MAX_SCORE_ROWS, random_state=42).reset_index(drop=True)
    print(f'Warning: downsampled scoring data to {MAX_SCORE_ROWS:,} rows to avoid memory pressure.')

DEV_DF = NOTEBOOK_CONTEXT['data']['development']
CAL_LONG_DF = NOTEBOOK_CONTEXT['data']['calibration_longrun']
CAL_RECENT_DF = NOTEBOOK_CONTEXT['data']['calibration_recent']
SCORE_DF = NOTEBOOK_CONTEXT['data']['scoring_future']
DATA_DICTIONARY = NOTEBOOK_CONTEXT['data']['data_dictionary']

dev_df = DEV_DF
cal_long_df = CAL_LONG_DF
cal_recent_df = CAL_RECENT_DF
score_df = SCORE_DF
data_dictionary = DATA_DICTIONARY

print(f"Development dataset: {dev_df.shape[0]:,} rows, {dev_df.shape[1]} columns")
print(f"Stage 1 calibration dataset: {cal_long_df.shape[0]:,} rows")
print(f"Stage 2 calibration dataset: {cal_recent_df.shape[0]:,} rows")
print(f"Scoring dataset: {score_df.shape[0]:,} rows")

display(dev_df.head())


## 2. Pipeline Configuration

In [None]:
from risk_pipeline.core.config import Config
from risk_pipeline.unified_pipeline import UnifiedRiskPipeline

cfg = Config(
    target_column='target',
    id_column='customer_id',
    time_column='app_dt',

    create_test_split=True,
    stratify_test=True,
    group_split_by_id=True,
    train_ratio=0.8,
    test_ratio=0.2,
    oot_months=3,

    output_folder=str(OUTPUT_DIR),
    output_excel_path=str(OUTPUT_DIR / 'risk_pipeline_report.xlsx'),

    enable_tsfresh_features=NOTEBOOK_FLAGS.get('tsfresh_available', False),
    enable_tsfresh_rolling=False,
    tsfresh_feature_set='efficient',
    tsfresh_n_jobs=4,
    tsfresh_window_months=12,
    tsfresh_min_events=1,
    tsfresh_min_unique_months=1,
    tsfresh_min_coverage_ratio=1.0,
    tsfresh_include_current_record=False,

    selection_steps=[
        'univariate',
        'psi',
        'vif',
        'correlation',
        'iv',
        'boruta',
        'stepwise',
    ],
    min_univariate_gini=0.05,
    psi_threshold=0.25,
    monthly_psi_threshold=0.15,
    oot_psi_threshold=0.25,
    test_psi_threshold=0.25,
    vif_threshold=5.0,
    correlation_threshold=0.9,
    iv_threshold=0.02,
    stepwise_method='forward',
    stepwise_max_features=25,
    psi_compare_axes=['monthly', 'oot', 'test'],
    psi_decision='all',

    algorithms=[
        'logistic',
        'lightgbm',
        'xgboost',
        'catboost',
        'randomforest',
        'extratrees',
        'woe_boost',
        'woe_li',
        'shao',
        'xbooster',
    ],
    model_selection_method='gini_oot',
    model_stability_weight=0.2,
    min_gini_threshold=0.5,
    max_train_oot_gap=0.03,
    use_optuna=True,
    hpo_trials=1,
    hpo_timeout_sec=1800,
    cv_folds=5,
    early_stopping_rounds=200,

    use_noise_sentinel=True,
    enable_dual=True,
    enable_woe_boost_scorecard=True,
    calculate_shap=True,
    shap_sample_size=2000,
    enable_scoring=True,
    score_model_name='best',
    enable_stage2_calibration=True,

    optimize_risk_bands=True,
    n_risk_bands=10,
    risk_band_method='pd_constraints',
    risk_band_min_bins=7,
    risk_band_max_bins=10,
    risk_band_micro_bins=1000,
    risk_band_min_sample_size=0,
    risk_band_min_weight=0.05,
    risk_band_max_weight=0.30,
    risk_band_hhi_threshold=0.15,
    risk_band_binomial_pass_weight=0.85,
    risk_band_max_iterations=100,
    risk_band_max_phase_iterations=50,
    risk_band_early_stop_rounds=10,

    random_state=42,
    n_jobs=-1,
    log_level='INFO',
)

cfg.excel_overwrite = True
cfg.cv_enable = True
cfg.class_weight = 'balanced'
cfg.sample_weight_column = None
cfg.freeze_config = True
cfg.persist_artifacts = True
cfg.save_model = True
cfg.run_id = 'quickstart_pipeline'

# Build the pipeline only after every override is applied, so it cannot start from a stale config.
pipe = UnifiedRiskPipeline(cfg)
NOTEBOOK_CONTEXT['artifacts']['pipeline'] = pipe
NOTEBOOK_CONTEXT['artifacts']['config'] = cfg
print('Pipeline configured. enable_dual:', cfg.enable_dual, '| tsfresh enabled:', cfg.enable_tsfresh_features)


### Configuration Reference (Summary)

- Core: target_column, id_column, time_column (snapshot_column is set automatically to snapshot_month)
- Splitting: create_test_split, stratify_test, group_split_by_id, train_ratio, test_ratio, oot_ratio, oot_months
- Output: output_folder, output_excel_path, excel_overwrite
- TSFresh: enable_tsfresh_features, enable_tsfresh_rolling, tsfresh_window_months, tsfresh_min_events, tsfresh_min_unique_months, tsfresh_min_coverage_ratio, tsfresh_include_current_record
- Selection: selection_steps, min_univariate_gini, psi_threshold, monthly_psi_threshold, oot_psi_threshold, test_psi_threshold, vif_threshold, correlation_threshold, stepwise_method, stepwise_max_features
- PSI: psi_compare_axes, psi_decision, psi_bucketing_mode_woe, psi_bucketing_mode_raw
- Model: algorithms, model_selection_method, model_stability_weight, min_gini_threshold, max_train_oot_gap
- HPO/CV: use_optuna, hpo_trials, hpo_timeout_sec, cv_enable, cv_folds, early_stopping_rounds
- Weights: class_weight, sample_weight_column
- Flags: use_noise_sentinel, calculate_shap, enable_dual, enable_woe_boost_scorecard, enable_stage2_calibration, enable_scoring, score_model_name
- Calibration: stage1 and stage2 chain (isotonic -> stage2 target adjustment), with fields ready for the stage2 target rate
- Risk bands: n_risk_bands, risk_band_method, risk_band_min_bins, risk_band_max_bins, risk_band_micro_bins, risk_band_min_sample_size, risk_band_min_weight, risk_band_max_weight, risk_band_hhi_threshold, risk_band_binomial_pass_weight, risk_band_max_iterations, risk_band_max_phase_iterations, risk_band_early_stop_rounds
- Runtime: random_state, n_jobs, freeze_config, persist_artifacts, save_model, run_id, log_level
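
The PSI thresholds listed above (psi_threshold, monthly_psi_threshold, oot_psi_threshold, test_psi_threshold) bound a Population Stability Index. This notebook does not show how risk_pipeline buckets values or computes PSI internally, so the sketch below uses the common textbook definition on pre-binned counts. The function name, the `eps` floor, and the sample counts are illustrative assumptions, not part of the package.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between two distributions given as per-bin counts over the same bins."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor each share so an empty bin cannot trigger log(0) or division by zero.
        expected_share = max(expected / expected_total, eps)
        actual_share = max(actual / actual_total, eps)
        psi += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return psi


# Hypothetical bin counts for one feature on the train and OOT samples.
train_bins = [100, 200, 400, 200, 100]
oot_bins = [80, 180, 400, 220, 120]
psi_value = population_stability_index(train_bins, oot_bins)
print(f"PSI = {psi_value:.4f}")
```

Under the usual reading, a feature whose PSI stays below the configured threshold (0.25 for psi_threshold in this notebook) counts as stable. Whether the pipeline applies the comparison in exactly this way is an assumption.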


## 3. TSFresh Feature Extraction

In [None]:
processed = pipe.run_process(dev_df, create_map=True, force=True)
NOTEBOOK_CONTEXT['artifacts']['processed'] = processed
print(f"Processed feature space: {processed.shape[1]} columns")

meta = pipe.data_.get('tsfresh_metadata')
if isinstance(meta, pd.DataFrame) and not meta.empty:
    display(meta.head())
else:
    status = 'disabled via config' if not cfg.enable_tsfresh_features else 'not generated'
    print(f'No TSFresh feature metadata available ({status}).')


## 4. Raw Numeric Processing

In [None]:
splits = pipe.run_split(processed, force=True)
NOTEBOOK_CONTEXT['artifacts']['splits'] = splits
raw_layers = pipe.results_.get('raw_numeric_layers', {})
print(f"Identified numeric features: {len(pipe.data_.get('numeric_features', []))}")
if raw_layers:
    train_raw = raw_layers.get('train_raw_prepped')
    if isinstance(train_raw, pd.DataFrame):
        preview_cols = pipe.data_.get('numeric_features', [])[:10]
        if preview_cols:
            display(train_raw[preview_cols].head())

impute_stats = getattr(pipe.data_processor, 'imputation_stats_', {})
if impute_stats:
    display(pd.DataFrame(impute_stats).T.head())

config_summary = pd.DataFrame([
    ("Target column", cfg.target_column),
    ("ID column", cfg.id_column),
    ("Time column", cfg.time_column),
    ("Train/Test/OOT split", f"{cfg.train_ratio:.0%}/{cfg.test_ratio:.0%}/{(getattr(cfg, 'oot_ratio', None) or 0):.0%}"),
    ("OOT holdout months", cfg.oot_months),
    ("Risk band method", cfg.risk_band_method),
    ("Calibration chain", 'stage1 -> stage2'),
], columns=['Parameter', 'Configured value'])
display(config_summary)

flag_toggles = pd.DataFrame({
    'Feature': [
        'Dual RAW+WOE flow',
        'TSFresh feature mining',
        'Noise sentinel checks',
        'Optuna HPO',
        'Stage 2 calibration',
        'Scoring pipeline',
        'SHAP importance',
    ],
    'Enabled': [
        cfg.enable_dual,
        cfg.enable_tsfresh_features,
        cfg.use_noise_sentinel,
        cfg.use_optuna,
        cfg.enable_stage2_calibration,
        cfg.enable_scoring,
        cfg.calculate_shap,
    ],
})
flag_toggles['Enabled'] = flag_toggles['Enabled'].map({True: 'Yes', False: 'No'})
display(flag_toggles)

thresholds = pd.DataFrame({
    'Threshold': [
        'PSI master',
        'Monthly PSI',
        'OOT PSI',
        'Test PSI',
        'IV floor',
        'Univariate Gini floor',
        'Correlation ceiling',
        'VIF ceiling',
        '|Train-OOT| Gini gap',
    ],
    'Value': [
        cfg.psi_threshold,
        cfg.monthly_psi_threshold,
        cfg.oot_psi_threshold,
        cfg.test_psi_threshold,
        cfg.iv_threshold,
        cfg.min_univariate_gini,
        cfg.correlation_threshold,
        cfg.vif_threshold,
        cfg.max_train_oot_gap,
    ],
})
display(thresholds)

selection_order = pd.DataFrame({'Selection step': cfg.selection_steps})
selection_order.index = selection_order.index + 1
display(selection_order)

algorithms = pd.DataFrame({'Algorithm': cfg.algorithms})
algorithms.index = algorithms.index + 1
display(algorithms)


## 5. WOE Transformation

In [None]:
woe_results = pipe.run_woe(splits, force=True)
NOTEBOOK_CONTEXT['artifacts']['woe'] = woe_results
values = woe_results.get('woe_values', {})
print(f"WOE maps generated for {len(values)} variables")
if values:
    preview_rows = [
        {
            'variable': name,
            'type': info.get('type'),
            'iv': info.get('iv'),
        }
        for name, info in list(values.items())[:10]
    ]
    if preview_rows:
        display(pd.DataFrame(preview_rows))


## 6. Feature Selection

In [None]:
selection_raw = pipe.run_selection(mode='RAW', splits=splits, woe_results=woe_results, force=True)
selection_woe = pipe.run_selection(mode='WOE', splits=splits, woe_results=woe_results, force=True)

NOTEBOOK_CONTEXT['artifacts']['selection']['RAW'] = selection_raw
NOTEBOOK_CONTEXT['artifacts']['selection']['WOE'] = selection_woe

summary = pd.DataFrame([
    {'mode': 'RAW', 'selected_features': len(selection_raw.get('selected_features', []) or [])},
    {'mode': 'WOE', 'selected_features': len(selection_woe.get('selected_features', []) or [])},
])
display(summary)


## 7. Modeling (RAW vs WOE)

In [None]:
def _best_auc(payload):
    scores = payload.get('scores', {}) or {}
    preferred = payload.get('best_model_name')

    def _score(metrics):
        if not metrics:
            return float('-inf')
        for key in ('oot_auc', 'test_auc', 'train_auc'):
            value = metrics.get(key)
            if value is not None:
                return value
        return float('-inf')

    if preferred and preferred in scores:
        return _score(scores[preferred])
    if scores:
        return max((_score(metrics) for metrics in scores.values()), default=float('-inf'))
    return float('-inf')

models_raw = pipe.run_modeling(mode='RAW', splits=splits, selection_results=selection_raw, force=True)
models_woe = pipe.run_modeling(mode='WOE', splits=splits, selection_results=selection_woe, force=True)

NOTEBOOK_CONTEXT['artifacts']['modeling']['RAW'] = models_raw
NOTEBOOK_CONTEXT['artifacts']['modeling']['WOE'] = models_woe

frames = []
for mode_label, payload in [('RAW', models_raw), ('WOE', models_woe)]:
    scores = payload.get('scores', {})
    if scores:
        frame = pd.DataFrame(scores).T
        frame['mode'] = mode_label
        frames.append(frame)

if frames:
    combined = pd.concat(frames).reset_index().rename(columns={'index': 'model'})
    if 'status' in combined.columns:
        combined = combined.drop(columns=['status'])
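    # Sort by OOT AUC within each mode when that metric was reported.
    sort_cols = ['mode', 'oot_auc'] if 'oot_auc' in combined.columns else ['mode']
    display(combined.sort_values(sort_cols, ascending=[True, False][:len(sort_cols)]))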
else:
    print('No models trained for either mode.')

flows = {
    'RAW': {
        'selection_results': selection_raw,
        'model_results': models_raw,
        'best_auc': _best_auc(models_raw),
    },
    'WOE': {
        'selection_results': selection_woe,
        'model_results': models_woe,
        'best_auc': _best_auc(models_woe),
    },
}

best_mode, best_flow = max(flows.items(), key=lambda item: item[1]['best_auc'])
pipe.results_['flows'] = flows
pipe.results_['best_mode'] = best_mode
pipe.results_['selection_results'] = best_flow['selection_results']
pipe.results_['model_results'] = best_flow['model_results']
pipe.config.enable_woe = (best_mode == 'WOE')
pipe.models_ = best_flow['model_results'].get('models', {})
pipe.selected_features_ = best_flow['selection_results'].get('selected_features', [])

print(f"Best mode: {best_mode} | Best model: {best_flow['model_results'].get('best_model_name')}")


## 8. Calibration

In [None]:
stage1 = pipe.run_stage1_calibration(model_results=pipe.results_['model_results'], calibration_df=cal_long_df, force=True)
stage2 = pipe.run_stage2_calibration(stage1_results=stage1, recent_df=cal_recent_df, force=True)

NOTEBOOK_CONTEXT['artifacts']['calibration']['stage1'] = stage1
NOTEBOOK_CONTEXT['artifacts']['calibration']['stage2'] = stage2

if isinstance(stage1, dict) and stage1.get('calibration_metrics'):
    print('Stage 1 calibration metrics:')
    display(pd.DataFrame([stage1['calibration_metrics']]))
else:
    print('Stage 1 calibration metrics unavailable.')

if isinstance(stage2, dict) and stage2.get('stage2_metrics'):
    print('Stage 2 calibration metrics:')
    display(pd.DataFrame([stage2['stage2_metrics']]))
else:
    print('Stage 2 calibration metrics unavailable.')


## 9. Risk Band Optimisation

In [None]:
bands = pipe.run_risk_bands(stage1_results=stage1, stage2_results=stage2, splits=pipe.results_.get('splits'), force=True)
NOTEBOOK_CONTEXT['artifacts']['risk_bands'] = bands

if isinstance(bands, dict):
    metrics = bands.get('metrics')
    band_stats = bands.get('band_stats') or bands.get('bands')
    if isinstance(metrics, dict):
        print('Risk band metrics:')
        display(pd.DataFrame(metrics, index=['value']).T)
    if isinstance(band_stats, pd.DataFrame) and not band_stats.empty:
        display(band_stats.head(20))
else:
    print('Risk band optimisation output not available.')


## 10. Reporting & Diagnostics

In [None]:
reports = pipe.run_reporting(force=True)
NOTEBOOK_CONTEXT['artifacts']['reports'] = reports

excel_path = Path(cfg.output_excel_path)
if excel_path.exists():
    print(f"Excel workbook generated: {excel_path}")
    try:
        xls = pd.ExcelFile(excel_path)
        print('Workbook sheets:')
        for sheet in xls.sheet_names:
            print(' -', sheet)
    except Exception as exc:
        print('Excel inspection failed:', exc)
else:
    print('Excel workbook has not been created.')

available_reports = sorted((reports or {}).keys())
if available_reports:
    display(pd.DataFrame({'report_key': available_reports}))


## 11. Consolidated Pipeline Run

In [None]:
full_pipe = UnifiedRiskPipeline(cfg)
full_results = full_pipe.fit(
    dev_df,
    data_dictionary=data_dictionary,
    calibration_df=cal_long_df,
    stage2_df=cal_recent_df,
    score_df=score_df,
)

NOTEBOOK_CONTEXT['artifacts']['full_run'] = full_results
print(f"Best mode: {full_results.get('best_model_mode')} | Best model: {full_results.get('best_model_name')}")
registry = pd.DataFrame(full_results.get('model_registry', []))
if not registry.empty:
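    # Sort only by the columns the registry actually contains.
    sort_cols = [c for c in ('mode', 'oot_auc') if c in registry.columns]
    if sort_cols:
        registry = registry.sort_values(sort_cols, ascending=[c == 'mode' for c in sort_cols])
    display(registry.head(20))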
else:
    print('Model registry is empty.')


## 12. Scoring on Recent Data

In [None]:
scoring_output = pipe.run_scoring(score_df, force=True)
NOTEBOOK_CONTEXT['artifacts']['scoring'] = scoring_output

scored_df = scoring_output.get('dataframe') if isinstance(scoring_output, dict) else None
if isinstance(scored_df, pd.DataFrame):
    display(scored_df.head(10))
else:
    print('Scored dataframe unavailable.')

metrics = scoring_output.get('metrics') if isinstance(scoring_output, dict) else None
if isinstance(metrics, dict) and metrics:
    display(pd.DataFrame([metrics]).T)


## 13. Detailed Report Reference

**Model Summary & Performance**
- models_summary
- best_model
- best_model_vars_df
- confusion_matrix
- performance_report (AUC, Gini, KS, Brier, LogLoss, F1, Precision, Recall)
- lift_table (band, records, events, mean_score, lift, ks, capture_rate)
- baseline_metrics and baseline_lift_table
- shap_importance
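
The performance_report entry lists AUC, Gini, and KS together, and the config sets min_gini_threshold and max_train_oot_gap in Gini terms. As a minimal sketch of how the three metrics relate, the example below computes them in plain Python on a tiny hand-made sample. The function name and data are hypothetical, and the pairwise loop is quadratic, so it is meant for illustration rather than for production-sized data.

```python
def auc_gini_ks(y_true, y_score):
    """Return (AUC, Gini, KS) for binary labels (1 = event) and model scores."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    # AUC: probability that a random event outscores a random non-event (ties count half).
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    auc = wins / (len(events) * len(non_events))
    gini = 2.0 * auc - 1.0

    # KS: largest gap between the cumulative score distributions of the two classes.
    ks = 0.0
    for threshold in sorted(set(y_score)):
        cdf_events = sum(s <= threshold for s in events) / len(events)
        cdf_non_events = sum(s <= threshold for s in non_events) / len(non_events)
        ks = max(ks, abs(cdf_events - cdf_non_events))
    return auc, gini, ks


# Hypothetical labels and scores, ordered by score for readability.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
auc, gini, ks = auc_gini_ks(labels, scores)
print(f"AUC={auc:.4f} Gini={gini:.4f} KS={ks:.4f}")
```

Because Gini is a linear rescaling of AUC, a Gini floor of 0.5 corresponds to an AUC of 0.75.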

**Variable Analysis**
- final_vars
- top20_iv
- top50_univariate
- selection_history
- correlation_clusters
- vif_summary
- noise_sentinel_check
- variable_dictionary
- shap_summary

**WOE & Binning**
- woe_mapping (numeric and categorical detail)
- woe_bins (per-variable bin statistics)
- best_model_details (bin_number, bin_range, woe, counts, event_rate, iv_contrib, importance)
- woe_degradation (Gini difference between raw and WOE)
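
The woe and iv_contrib columns above come from a Weight of Evidence calculation per bin. The package's own binning and sign convention are not shown in this notebook, so the sketch below assumes the common convention WOE = ln(non-event share / event share) with additive smoothing. The function name, the `smoothing` parameter, and the bin counts are hypothetical.

```python
import math


def woe_and_iv(bins, smoothing=0.5):
    """bins: list of (n_events, n_non_events) per bin. Returns (woe_per_bin, total_iv)."""
    total_events = sum(events for events, _ in bins)
    total_non_events = sum(non_events for _, non_events in bins)
    n_bins = len(bins)
    woe_values = []
    information_value = 0.0
    for events, non_events in bins:
        # Additive smoothing keeps the logarithm finite when a bin has no events or no non-events.
        event_share = (events + smoothing) / (total_events + smoothing * n_bins)
        non_event_share = (non_events + smoothing) / (total_non_events + smoothing * n_bins)
        woe = math.log(non_event_share / event_share)
        woe_values.append(woe)
        # Each bin's IV contribution is the share gap weighted by its WOE.
        information_value += (non_event_share - event_share) * woe
    return woe_values, information_value


# Hypothetical bins with a falling event rate: 15%, about 6.7%, 2%.
example_bins = [(30, 170), (20, 280), (10, 490)]
woe_values, iv = woe_and_iv(example_bins)
print([round(w, 3) for w in woe_values], round(iv, 3))
```

With this sign convention, WOE rises as the bin's event rate falls, and a variable whose bins all share the same event rate gets an IV of zero.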

**Stability & Validation**
- psi_summary, psi_dropped_features (train vs test, train vs oot)
- WOE_PSI, Score_PSI, Quantile_PSI
- run_meta (run parameters)
- monitor_report (score_psi, feature_psi)

**Calibration**
- calibration_metrics (Brier, ECE, MCE, Spiegelhalter z/p)
- calibration_tables (per-band observed_pd, predicted_pd, confidence intervals, binomial p)
- stage1 and stage2 output
- hosmer_lemeshow result
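
Three of the calibration_metrics named above (Brier, ECE, MCE) can be illustrated in a few lines. This is a sketch under stated assumptions: it uses equal-width probability bins, which may differ from the binning the package applies, and the function name and sample data are hypothetical.

```python
def brier_ece_mce(y_true, y_prob, n_bins=10):
    """Return (Brier score, expected calibration error, maximum calibration error)."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n

    # Group predictions into equal-width bins on [0, 1]; p == 1.0 goes to the last bin.
    buckets = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)
        buckets[index].append((y, p))

    ece = 0.0
    mce = 0.0
    for bucket in buckets:
        if not bucket:
            continue
        observed = sum(y for y, _ in bucket) / len(bucket)
        predicted = sum(p for _, p in bucket) / len(bucket)
        gap = abs(observed - predicted)
        ece += (len(bucket) / n) * gap  # gap weighted by the bin's share of records
        mce = max(mce, gap)             # worst single-bin gap
    return brier, ece, mce


# Hypothetical outcomes and predicted probabilities of default.
outcomes = [0, 0, 1, 1]
predicted_pd = [0.1, 0.1, 0.9, 0.9]
brier, ece, mce = brier_ece_mce(outcomes, predicted_pd)
print(f"Brier={brier:.3f} ECE={ece:.3f} MCE={mce:.3f}")
```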

**Risk Bands & Scoring**
- risk_bands (min_score, max_score, mean_score, n_samples, n_events, event_rate, sample_pct)
- band_tests: binomial_test, hosmer_lemeshow_test, herfindahl_index, monotonicity, band_psi
- risk_band_summary (scaled_score 300-850, risk_level)
- risk_score_mapping (SQL/Python generation)
- scoring_summary and scored_data
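
The herfindahl_index test above, together with the risk_band_min_weight, risk_band_max_weight, and risk_band_hhi_threshold settings in the config, concerns how evenly records spread across bands. The sketch below shows the standard Herfindahl-Hirschman calculation on band counts. The function name and counts are hypothetical, and how the pipeline combines these checks is not shown in this notebook.

```python
def band_concentration(band_counts, min_weight=0.05, max_weight=0.30):
    """Return (HHI, indices of bands whose population share falls outside the weight limits)."""
    total = sum(band_counts)
    weights = [count / total for count in band_counts]
    # HHI is the sum of squared shares: 1/k for k equal bands, 1.0 if one band holds everything.
    hhi = sum(weight ** 2 for weight in weights)
    out_of_range = [
        index for index, weight in enumerate(weights)
        if not (min_weight <= weight <= max_weight)
    ]
    return hhi, out_of_range


# Ten equally populated bands versus one dominant band (hypothetical counts).
even_hhi, even_flags = band_concentration([100] * 10)
skewed_hhi, skewed_flags = band_concentration([700, 100, 100, 100])
print(round(even_hhi, 3), even_flags)
print(round(skewed_hhi, 3), skewed_flags)
```

With the quickstart values, ten even bands give an HHI of about 0.10, which sits under the 0.15 threshold, while the skewed example far exceeds it and flags the first band for breaching the 0.30 weight cap.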

**PD Binning Optimisation**
- micro_bins
- evaluate_bins (CI, DR-PD gap, HHI, weight, monotonicity, binomial pass weight)
- calculate_penalty (penalty components: CI overlap, binomial fail/pass, DR-PD, HHI, weight, monotonicity)
- multi_start_optimization -> stepwise_optimize (phases: CI, Binomial, fine-tuning)
- report_results (optimal thresholds and detail table)
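
The "binomial pass weight" mentioned in evaluate_bins, and the risk_band_binomial_pass_weight setting of 0.85, suggest a share of the portfolio sitting in bands that pass a binomial test. The exact test the package runs is not documented here, so the sketch below assumes a one-sided exact test of observed defaults against the band's predicted PD at a 5% significance level. All names and figures are hypothetical.

```python
import math


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_pass_weight(bands, alpha=0.05):
    """bands: list of (n_samples, n_events, predicted_pd). Returns the population share that passes."""
    total = sum(n for n, _, _ in bands)
    # A band passes when its observed defaults are not significantly above the predicted PD.
    passed = sum(
        n for n, events, predicted_pd in bands
        if binomial_upper_tail(n, events, predicted_pd) >= alpha
    )
    return passed / total


# Hypothetical bands: the third one defaults far more often than its predicted PD.
example_bands = [(100, 2, 0.02), (100, 5, 0.05), (100, 30, 0.10)]
pass_weight = binomial_pass_weight(example_bands)
print(f"Binomial pass weight = {pass_weight:.3f}")
```

Under this reading, the example would fall short of a 0.85 target because only two of the three equally sized bands pass.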

**CLI / Notebook / Process**
- Single pipeline flow (dual options via config)
- Train + OOT are required; a data dictionary, if provided, is added to the report
- Selection flow: PSI -> IV -> Correlation/VIF -> Boruta -> Stepwise (+noise sentinel)
- Algorithms: Logistic, GAM, CatBoost, LightGBM, XGBoost, RandomForest, ExtraTrees
- Notebook: end-to-end run, score generation, and prediction

**Operations & Improvement**
- README and license checks
- Track test coverage and the CI/CD workflow
- Versioning, pre-commit, drift monitoring (PSI/KS automation, alerting mechanisms)

**Git Workflow**
- Keep the development branch up to date (fetch/checkout/pull)
- Keep tracking state with git status and git remote show origin
