# Volatility Scaling & Portfolio Analysis

This notebook is organized as follows:
1. Imports, data loader, and risk-free (Rf) detector
2. Fund selection (monthly-period logic)
3. Weight preparation
4. Core statistics and analysis run
5. Export of in-sample and out-of-sample results to Excel, with formatting
6. Widgets / UI

In [1]:
# Legacy helper and metric functions replaced by modules
# See trend_analysis.metrics and run_analysis.py


In [2]:
import pandas as pd
import numpy as np
import logging
import inspect
from collections import namedtuple
from typing import Dict, Optional, Callable
import ipywidgets as widgets
from ipyfilechooser import FileChooser
from IPython.display import display, clear_output
from trend_analysis.data import load_csv, identify_risk_free_fund
from trend_analysis.core.rank_selection import (
    FundSelectionConfig,
    RiskStatsConfig,
    select_funds,
    register_metric,
    METRIC_REGISTRY,
)
from trend_analysis.export import make_summary_formatter, export_to_excel


## 2. Select Funds

In [3]:
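# ===============================================================
# 2 · SELECT_FUNDS  (restored ≤ 3-missing-months rule)
# NOTE: the local select_funds defined below shadows the version
# imported from trend_analysis.core.rank_selection in cell [2].
# ===============================================================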

cfg = FundSelectionConfig(
    max_missing_months           = 3,
    max_consecutive_month_gap    = 6,
    outlier_threshold            = 0.5,
    zero_return_threshold        = 0.2,
    enforce_monotonic_index      = True,
    allow_duplicate_dates        = False,
    max_missing_ratio            = 0.05,
    max_drawdown                 = 0.3,
    min_volatility               = 0.05,
    max_volatility               = 1.0,
    min_avg_return               = 0.0,
    max_skewness                 = 3.0,
    max_kurtosis                 = 10.0,
    expected_freq                = "B",
    max_gap_days                 = 3,
    min_aum_usd                  = 1e7,
)

def select_funds(
    df: pd.DataFrame,
    rf_col: str,
    fund_columns: list[str],
    in_sdate: str,
    in_edate: str,
    out_sdate: str,
    out_edate: str,
    cfg: FundSelectionConfig,
    selection_mode: str = "all",
    random_n: int | None = None
) -> list[str]:
    """
    Select eligible funds with additional data-validity and coverage checks driven by FundSelectionConfig.
    """
    # Sort rows by Date (sort_values returns a copy, so the caller's frame is not mutated)
    df = df.sort_values("Date")

    # Prepare monthly periods within analysis window
    df["Month"] = df["Date"].dt.to_period("M")
    span = pd.period_range(
        pd.Period(in_sdate, "M"), pd.Period(out_edate, "M"), freq="M"
    )

    eligible_funds: list[str] = []
    for f in fund_columns:
        try:
            ser = df.set_index("Date")[f]
            clean = ser.dropna()

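            # 1. Implausible value limits
            #    (implausible_value_limit is not passed to cfg above, so it must
            #    come from a FundSelectionConfig default)
            if not clean.between(-cfg.implausible_value_limit, cfg.implausible_value_limit).all():
                raise ValueError(f"Values outside ±{cfg.implausible_value_limit}")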

            # 2. Extreme outlier threshold
            if (clean.abs() > cfg.outlier_threshold).any():
                raise ValueError(f"Outliers beyond ±{cfg.outlier_threshold}")

            # 3. Excessive zero-return rate
            if (clean == 0).mean() > cfg.zero_return_threshold:
                raise ValueError(f"Zero-return proportion > {cfg.zero_return_threshold}")

            # 4. Monotonic date index
            if cfg.enforce_monotonic_index and not clean.index.is_monotonic_increasing:
                raise ValueError("Date index not monotonically increasing")

            # 5. Duplicate dates
            if not cfg.allow_duplicate_dates and clean.index.duplicated().any():
                raise ValueError("Duplicate dates detected in index")

            # 6. Coverage checks using config thresholds
            m_ok = df.groupby("Month")[f].apply(lambda col: col.notna().any())
            mask = m_ok.reindex(span, fill_value=False).to_numpy()

            # tolerance for missing months per-cfg
            missing_count = (~mask).sum()
            if missing_count > cfg.max_missing_months:
                raise ValueError(f"Missing-month count {missing_count} exceeds {cfg.max_missing_months}")

            # maximum run of consecutive missing months per-cfg with guard
            temp = np.flatnonzero(np.r_[True, mask, True])
            if temp.size <= 1:
                gap = 0
            else:
                gap = np.diff(temp).max() - 1
            if gap > cfg.max_consecutive_month_gap:
                raise ValueError(f"Consecutive-missing gap {gap} exceeds {cfg.max_consecutive_month_gap}")

            eligible_funds.append(f)

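        except (ValueError, KeyError) as exc:
            # Fund failed a validity/coverage check, or its column is missing: skip it
            logging.debug("Skipping %s: %s", f, exc)
            continue
        except Exception:
            # Still skip the fund, but record the traceback so real bugs are not hidden
            logging.exception("Unexpected error while screening %s", f)
            continue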

    # Final selection-mode logic: reject unknown modes before any early return
    if selection_mode not in ("all", "random"):
        raise ValueError(f"Unsupported selection_mode '{selection_mode}'")

    # "random" without random_n falls back to the full eligible pool
    if selection_mode == "all" or random_n is None:
        return eligible_funds

    if random_n > len(eligible_funds):
        raise ValueError(
            f"random_n exceeds eligible pool: {random_n} > {len(eligible_funds)}"
        )
    # .tolist() converts NumPy strings back to plain Python str
    return np.random.choice(eligible_funds, random_n, replace=False).tolist()
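
The coverage check in step 6 is easier to follow on a small case. The sketch below reproduces only the missing-month count and the longest-consecutive-gap calculation on toy data; the window and the list of reported months are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 8-month analysis window
span = pd.period_range("2020-01", "2020-08", freq="M")

# Months in which the fund reported at least one return (toy data)
present = pd.PeriodIndex(["2020-01", "2020-02", "2020-05", "2020-08"], freq="M")
m_ok = pd.Series(True, index=present)

# Align to the full window; months with no data become False
mask = m_ok.reindex(span, fill_value=False).to_numpy()

# Total number of months without data
missing_count = int((~mask).sum())

# Pad with True sentinels so gaps at either edge are measured too, then take
# the widest distance between consecutive "present" positions, minus one.
present_pos = np.flatnonzero(np.r_[True, mask, True])
longest_gap = int(np.diff(present_pos).max() - 1)

print(missing_count, longest_gap)  # 4 missing months, longest run of 2
```

With `max_missing_months = 3`, this toy fund would be rejected on the total count even though its longest gap is well under `max_consecutive_month_gap = 6`.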



## 3. Weight Prep

In [4]:
# ───────────────────────────────────────────────────────────────
#  3 · WEIGHT PREP
# ───────────────────────────────────────────────────────────────
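def prepare_weights(selected: list[str],
                    custom: Dict[str, float] | None) -> tuple[Dict[str, float], np.ndarray]:
    """Return (weights by fund, weight vector ordered like `selected`).

    `custom` maps fund name -> percentage weight; the percentages for the
    selected funds must sum to 100. If `custom` is empty, equal weights are used.
    """
    if not selected:
        raise ValueError("No funds selected.")
    if not custom:
        w = {f: 1 / len(selected) for f in selected}
    else:
        missing = [f for f in selected if f not in custom]
        if missing:
            raise ValueError(f"Missing weights for {missing}")
        # Use only the selected funds so extra keys in `custom` cannot skew the sum
        w = {f: custom[f] / 100 for f in selected}
        if abs(sum(w.values()) - 1) > 1e-6:
            raise ValueError("Custom weights must sum to 100.")
    vec = np.array([w[f] for f in selected])
    return w, vec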
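
As a rough illustration of how the weight vector is meant to be used, the sketch below converts percentage weights to a vector ordered like `selected` and applies it to a returns table. The fund names and return values are hypothetical, and the matrix product is just one straightforward way to form a weighted portfolio return.

```python
import numpy as np
import pandas as pd

# Toy monthly decimal returns for three hypothetical funds
returns = pd.DataFrame(
    {
        "FundA": [0.01, -0.02, 0.03],
        "FundB": [0.00, 0.01, 0.02],
        "FundC": [0.02, 0.02, -0.01],
    },
    index=pd.period_range("2021-01", periods=3, freq="M"),
)
selected = ["FundA", "FundB", "FundC"]

# Percentage weights as a user might enter them (sum to 100)
custom_pct = {"FundA": 50, "FundB": 30, "FundC": 20}
vec = np.array([custom_pct[f] / 100 for f in selected])

# Weighted portfolio return per month; columns are ordered like `selected`
# so that each weight lines up with the right fund.
portfolio = pd.Series(
    returns[selected].to_numpy() @ vec,
    index=returns.index,
    name="portfolio",
)
print(portfolio)
```

Ordering the columns with `returns[selected]` matters: the vector carries no labels, so a mismatch in column order would silently apply weights to the wrong funds.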


## 4. Analysis (In-Sample & Out-of-Sample)
Use `trend_analysis.pipeline.run_analysis` for the analysis. Previous helper functions were removed in favor of the module version.
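
For readers unfamiliar with volatility scaling, here is a minimal standalone sketch of the idea. It assumes that a single scale factor is estimated from in-sample volatility and then reused out-of-sample; `scale_to_target_vol` is a hypothetical helper written for this example, not part of `trend_analysis`, and `run_analysis` may work differently.

```python
import numpy as np
import pandas as pd


def scale_to_target_vol(returns, in_start, in_end, out_start, out_end,
                        target_vol=0.10, periods_per_year=12):
    """Hypothetical helper: scale a return series to a target annualised volatility."""
    in_sample = returns.loc[in_start:in_end]
    out_sample = returns.loc[out_start:out_end]

    # Annualised in-sample volatility (sample standard deviation)
    realised_vol = in_sample.std(ddof=1) * np.sqrt(periods_per_year)
    if not realised_vol > 0:
        raise ValueError("In-sample volatility must be positive to scale.")

    # One scale factor, estimated in-sample and reused out-of-sample
    scale = target_vol / realised_vol
    return scale, in_sample * scale, out_sample * scale


# Toy monthly decimal returns (random, for illustration only)
rng = np.random.default_rng(0)
idx = pd.period_range("2019-01", "2020-12", freq="M")
rets = pd.Series(rng.normal(0.005, 0.03, len(idx)), index=idx)

scale, in_scaled, out_scaled = scale_to_target_vol(
    rets, "2019-01", "2019-12", "2020-01", "2020-12", target_vol=0.10
)
print(f"scale factor: {scale:.3f}")
```

By construction the scaled in-sample series hits the target exactly, while the out-of-sample series generally will not, which is what the in-sample versus out-of-sample comparison is meant to reveal. The sketch ignores financing and trading costs.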

## 5. Excel Export
Creates an Excel file covering the in-sample and out-of-sample periods for both the equal-weight and user-weight portfolios.

In [6]:
# Deprecated local export_to_excel; use trend_analysis.export.export_to_excel

## 6. Run Parameters, Widgets & User Inputs
This section defines IPython widgets for the in-sample and out-of-sample dates, target volatility, monthly cost, and related run parameters. It also supports custom weights.

### Using This Notebook
1. Run all cells.
2. Call `demo_run()` in a new cell to see a quick example with dummy data.
3. To use your own data, load it into a DataFrame with a `Date` column and decimal returns (for example, 0.01 for 1%) in the remaining columns, then call `pipeline.run_analysis()` and `export_to_excel()`.
4. For interactive selection, do:
   ```python
   display(ui_inputs)
   ```
   Then wire the `apply_button` to a callback function that reads the widget values and runs `pipeline.run_analysis()`.
5. For custom weights, call:
   ```python
   my_weights = get_custom_weights(selected_funds)
   ```
   Then pass `my_weights` into your logic.
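
The input shape described in step 3 can be checked before running anything else. The sketch below builds a small frame in that shape and applies a simple sanity check; `check_returns_frame` is a hypothetical helper, and the rule that any absolute value above 1 signals percent-style input is only a heuristic assumed for this example.

```python
import pandas as pd

# Toy frame in the expected shape: a Date column plus decimal returns per fund
df = pd.DataFrame(
    {
        "Date": pd.date_range("2020-01-01", periods=6, freq="MS"),
        "FundA": [0.012, -0.004, 0.008, 0.015, -0.021, 0.003],
        "FundB": [0.005, 0.007, -0.002, 0.001, 0.010, -0.006],
    }
)


def check_returns_frame(frame: pd.DataFrame) -> list[str]:
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    if "Date" not in frame.columns:
        problems.append("missing 'Date' column")
    elif not pd.api.types.is_datetime64_any_dtype(frame["Date"]):
        problems.append("'Date' column is not datetime; try pd.to_datetime")

    for col in frame.columns.drop("Date", errors="ignore"):
        # Heuristic: monthly decimal returns rarely exceed 100% in magnitude
        if frame[col].abs().max() > 1:
            problems.append(f"{col}: values look like percentages, not decimals")
    return problems


print(check_returns_frame(df))  # no problems for the toy frame

# The same data entered as percentages is flagged
percent_df = df.assign(FundA=df["FundA"] * 100)
print(check_returns_frame(percent_df))
```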


In [None]:
from trend_analysis.core.rank_selection import build_ui
from IPython.display import display

display(build_ui())