
## Risk Scoring Formula (Row Level)

| Formula              | Description                                                                                             | Best Suited Risk Philosophy                                                                                                                                                               |
| -------------------- | ------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Weighted Average** | Weighted blend of the scores; lets you emphasize some factors (e.g., impact over exploitability).      | **Balanced / Customizable**: Use when you want to reflect organizational priorities (e.g., “We care more about impact than exploitability”). Good for nuanced, practical risk management. |
| **Multiplicative**   | All factors must be high for a high score; a low score in any factor sharply reduces the overall score. | **Conservative**: Highlights only those risks where every dimension is bad (e.g., both easily exploited and highly impactful). Good if you only want to flag “slam dunk” threats.         |
| **Worst Case (Max)** | Takes the highest value from all factors.                                                               | **Worst-case**: Suitable for organizations that want to act on the most severe aspect of any vulnerability, regardless of other mitigating factors.                                       |
| **Simple Mean**      | Unweighted average of the factors.                                                                      | **Neutral / Cumulative**: Treats all dimensions as equally important and gives a broad overall risk estimate. Good as a general baseline or when you don’t have a strong preference.     |
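To make the differences concrete, here is a minimal sketch that applies each formula to a single hypothetical CVE row. The row values and helper names are invented for illustration; the weights mirror the defaults of the module's `weighted_average_score`, and all three factors are assumed to share a 0 to 10 scale.

```python
from statistics import mean

# Hypothetical CVE row; all three factors assumed to be on a 0-10 scale.
row = {"baseScore": 8.0, "exploitabilityScore": 4.0, "impactScore": 6.0}
FACTORS = ["baseScore", "exploitabilityScore", "impactScore"]

def weighted_average(r, weights=None):
    # Same default weights as the module: base score counts double.
    weights = weights or {"baseScore": 0.5, "exploitabilityScore": 0.25, "impactScore": 0.25}
    return round(sum(r[c] * w for c, w in weights.items()) / sum(weights.values()), 2)

def multiplicative(r):
    # Normalize each factor to 0-1, multiply, then rescale to 0-10.
    product = 1.0
    for c in FACTORS:
        product *= r[c] / 10.0
    return round(product * 10, 2)

def worst_case(r):
    # Highest single factor wins.
    return max(r[c] for c in FACTORS)

def simple_mean(r):
    # Every factor counts equally.
    return round(mean(r[c] for c in FACTORS), 2)

scores = {
    "Weighted Average": weighted_average(row),
    "Multiplicative": multiplicative(row),
    "Worst Case (Max)": worst_case(row),
    "Simple Mean": simple_mean(row),
}
for name, value in scores.items():
    print(f"{name:<18} {value}")
```

With these inputs the four formulas give 6.5, 1.92, 8.0 and 6.0. The multiplicative score collapses because the moderate exploitability factor (4.0) drags the whole product down, which is the "all factors must be high" behavior described in the table.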

## Aggregation Method (Asset Level)

| Aggregation | Description                                          | Best Suited Risk Philosophy                                                                                                                                                    |
| ----------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Max**     | Highest risk score among all CVEs for the asset.     | **Worst-case / “Red flag”**: Use if you want to know the single scariest risk for each asset—i.e., “What’s the worst that could happen?”                                       |
| **Mean**    | Average risk score across all CVEs for the asset.    | **Balanced / Generalized**: Captures the average risk profile. Good if you want to monitor overall risk climate for assets, not just outliers.                                 |
| **Median**  | Middle value; less influenced by extreme values.     | **Robust / Outlier-resistant**: Useful when your dataset has outliers or you want to avoid over-reacting to rare extreme cases.                                                |
| **Sum**     | Total of all risk scores for all CVEs for the asset. | **Cumulative / Exposure-focused**: Reflects the total “risk load.” Good for asset owners/managers who want to prioritize based on total exposure (“death by a thousand cuts”). |
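The sketch below shows how the four aggregations can rank the same assets differently. The asset names and per-CVE scores are hypothetical, and the grouping is a plain-Python stand-in for the module's `df.groupby('Title')['riskScore']`.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical per-CVE risk scores as (asset, riskScore) pairs,
# e.g. the output of one of the row-level formulas.
rows = [
    ("web-server", 9.5), ("web-server", 2.0), ("web-server", 2.0),
    ("db-server", 5.0), ("db-server", 5.0), ("db-server", 5.0),
    ("db-server", 5.0), ("db-server", 5.0),
]

AGGREGATIONS = {"Max": max, "Mean": mean, "Median": median, "Sum": sum}

# Group scores by asset.
by_asset = defaultdict(list)
for asset, score in rows:
    by_asset[asset].append(score)

for name, func in AGGREGATIONS.items():
    totals = {asset: func(scores) for asset, scores in by_asset.items()}
    ranking = sorted(totals, key=totals.get, reverse=True)
    print(f"{name:<7} {totals}  ->  riskiest: {ranking[0]}")
```

Max flags `web-server` because of its single 9.5, while Mean (4.5 vs 5.0), Median (2.0 vs 5.0) and Sum (13.5 vs 25.0) all put `db-server` first.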

## Risk Philosophy Table

| Risk Philosophy        | Scoring Formula(s)            | Aggregation(s) | When to Use                                               |
| ---------------------- | ----------------------------- | -------------- | --------------------------------------------------------- |
| **Conservative**       | Multiplicative                | Max            | To act only on high-confidence, multi-dimensional threats |
| **Worst-case**         | Worst Case (Max)              | Max            | To flag any asset with a single severe vulnerability      |
| **Balanced/Pragmatic** | Weighted Average, Simple Mean | Mean, Median   | For realistic, overall asset risk monitoring              |
| **Cumulative**         | Simple Mean, Weighted Average | Sum            | When interested in total risk exposure per asset          |
| **Outlier-resistant**  | Simple Mean, Weighted Average | Median         | To ignore rare extremes and focus on typical risks        |
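One way to put this table to work is to encode it as a small preset lookup, for instance to pre-select the notebook's dropdowns. This is a hypothetical sketch: `PHILOSOPHY_PRESETS`, `default_settings` and `philosophies_for` are illustrative names, and treating the first-listed option as the default is an assumption, not something the table prescribes. The labels match the dropdown keys used by the module below.

```python
# Hypothetical encoding of the Risk Philosophy Table.
PHILOSOPHY_PRESETS = {
    "Conservative":       {"formulas": ["Multiplicative"], "aggregations": ["Max"]},
    "Worst-case":         {"formulas": ["Worst Case (Max)"], "aggregations": ["Max"]},
    "Balanced/Pragmatic": {"formulas": ["Weighted Average", "Simple Mean"], "aggregations": ["Mean", "Median"]},
    "Cumulative":         {"formulas": ["Simple Mean", "Weighted Average"], "aggregations": ["Sum"]},
    "Outlier-resistant":  {"formulas": ["Simple Mean", "Weighted Average"], "aggregations": ["Median"]},
}

def default_settings(philosophy):
    """Return the first-listed (formula, aggregation) pair for a philosophy (assumed default)."""
    preset = PHILOSOPHY_PRESETS[philosophy]
    return preset["formulas"][0], preset["aggregations"][0]

def philosophies_for(formula, aggregation):
    """Reverse lookup: which philosophies list this formula/aggregation combination?"""
    return [name for name, p in PHILOSOPHY_PRESETS.items()
            if formula in p["formulas"] and aggregation in p["aggregations"]]

print(default_settings("Balanced/Pragmatic"))
print(philosophies_for("Simple Mean", "Median"))
```

The reverse lookup shows that a single combination can serve several philosophies: Simple Mean with Median aggregation satisfies both the Balanced/Pragmatic and the Outlier-resistant rows.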

### Examples
* A highly regulated environment (finance, healthcare):\
_Worst Case scoring + Max aggregation_ (err on the side of caution).

* Resource-constrained org (must prioritize what to fix):\
_Weighted Average_ scoring (tune to what matters for you),\
_Sum or Mean_ aggregation (total or average burden per asset).

* Security-mature org (wants to “right-size” response):\
_Simple Mean_ scoring + _Median_ aggregation (focus on central tendency).
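"Tune to what matters for you" in the resource-constrained case means choosing the Weighted Average weights. The sketch below uses two hypothetical CVEs with opposite profiles and three hypothetical weightings to show how the choice of weights alone can reorder priorities. Dividing by the weight total is an assumption that lets weights not sum to 1.

```python
def weighted_average(row, weights):
    # Weighted mean; dividing by the weight total means weights need not sum to 1.
    total = sum(weights.values())
    return round(sum(row[c] * w for c, w in weights.items()) / total, 2)

# Two hypothetical CVEs with the same base score but opposite profiles.
cves = {
    "CVE-A (high impact)":      {"baseScore": 7.0, "exploitabilityScore": 2.0, "impactScore": 9.0},
    "CVE-B (easily exploited)": {"baseScore": 7.0, "exploitabilityScore": 9.0, "impactScore": 2.0},
}

# Hypothetical weightings an organization might choose.
weightings = {
    "default":       {"baseScore": 0.5, "exploitabilityScore": 0.25, "impactScore": 0.25},
    "impact-first":  {"baseScore": 0.4, "exploitabilityScore": 0.1,  "impactScore": 0.5},
    "exploit-first": {"baseScore": 0.4, "exploitabilityScore": 0.5,  "impactScore": 0.1},
}

for label, weights in weightings.items():
    scored = {cve: weighted_average(row, weights) for cve, row in cves.items()}
    print(f"{label:<14} {scored}")
```

With the default weights both CVEs tie at 6.25; impact-first ranks CVE-A higher (7.5 vs 4.7), and exploit-first reverses the order.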

```python
# interactive_risk_scoring_module.py
"""
Interactive Risk Scoring Module with User Input/Dropdown for Formula & Aggregation
- Supports multiple risk formulas (weighted, multiplicative, worst-case, mean)
- Aggregates by (max, mean, median, sum, count high-risk CVEs)
- Designed for Jupyter notebook (with ipywidgets dropdowns)
"""

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact, Dropdown
from IPython.display import display


# --- CONFIG ---
input_file = "../data/vuln_catalogue_v2.csv"  # Change to your path if needed

# --- LOAD DATA ---
def load_vuln_data(file_path):
    df = pd.read_csv(file_path)
    return df

# --- RISK FORMULAS ---
def weighted_average_score(row, weights=None):
    # Weighted mean of the available factors; missing factors are dropped
    # and the remaining weights renormalized.
    if weights is None:
        weights = {'baseScore': 0.5, 'exploitabilityScore': 0.25, 'impactScore': 0.25}
    vals = [(row.get(col), w) for col, w in weights.items() if pd.notnull(row.get(col))]
    if not vals:
        return np.nan
    score = sum(v * w for v, w in vals)
    total_weight = sum(w for _, w in vals)
    return round(score / total_weight, 2)

def multiplicative_risk_score(row):
    # Divides each factor by 10 (assumes a 0-10 scale), multiplies, then rescales to 0-10.
    # If exploitability/impact are CVSS v3.x subscores (max 3.9 and 6.0), results stay well below 10.
    # Returns NaN if any factor is missing.
    vals = [row.get(col) for col in ['baseScore', 'exploitabilityScore', 'impactScore']]
    if any(pd.isnull(v) for v in vals):
        return np.nan
    vals_norm = [v / 10.0 for v in vals]
    score = np.prod(vals_norm) * 10
    return round(score, 2)

def worst_case_score(row):
    # Highest available factor; missing factors are ignored.
    vals = [row.get(col) for col in ['baseScore', 'exploitabilityScore', 'impactScore']]
    vals = [v for v in vals if pd.notnull(v)]
    if not vals:
        return np.nan
    return max(vals)

def simple_mean_score(row):
    # Unweighted mean of the available factors; missing factors are ignored.
    vals = [row.get(col) for col in ['baseScore', 'exploitabilityScore', 'impactScore']]
    vals = [v for v in vals if pd.notnull(v)]
    if not vals:
        return np.nan
    return round(np.mean(vals), 2)

formula_map = {
    'Weighted Average': weighted_average_score,
    'Multiplicative': multiplicative_risk_score,
    'Worst Case (Max)': worst_case_score,
    'Simple Mean': simple_mean_score,
}

agg_map = {
    'Max': 'max',
    'Mean': 'mean',
    'Median': 'median',
    'Sum': 'sum',
}

def count_high_risk(series, threshold=7.0):
    # Number of scores at or above the threshold (NaN never counts).
    return (series >= threshold).sum()

# --- INTERACTIVE FUNCTION ---
def interactive_risk_scoring(input_file=input_file):
    df = load_vuln_data(input_file)
    # Coerce score columns to numeric; create missing ones as all-NaN
    for col in ['baseScore', 'exploitabilityScore', 'impactScore']:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')
        else:
            df[col] = np.nan

    def update_scoring(formula, aggregation, highrisk_threshold):
        # Row-level risk score using the selected formula
        df['riskScore'] = df.apply(formula_map[formula], axis=1)
        # Aggregate per asset (Title)
        group = df.groupby('Title')
        score_col = f'{aggregation}RiskScore'
        agg_df = group['riskScore'].agg(agg_map[aggregation]).reset_index()
        agg_df = agg_df.rename(columns={'riskScore': score_col})
        # Count high-risk CVEs per asset (riskScore >= threshold)
        count_col = f'countHighRiskCVEs (>={highrisk_threshold})'
        highrisk_df = group['riskScore'].agg(lambda s: count_high_risk(s, highrisk_threshold)).reset_index()
        highrisk_df = highrisk_df.rename(columns={'riskScore': count_col})
        # Merge and sort by the selected aggregate column
        summary = (pd.merge(agg_df, highrisk_df, on='Title', how='left')
                   .sort_values(by=score_col, ascending=False))
        # Show the top assets and the highest-scoring vulnerabilities for inspection
        print("\nAsset-level Risk Summary:")
        display(summary.head(10))
        print("\nSample vulnerabilities (with riskScore):")
        display(df[['Title', 'cveID', 'riskScore']].sort_values(by='riskScore', ascending=False).head(20))
        # Pie chart of severity levels
        if 'baseSeverity' in df.columns:
            severity_counts = df['baseSeverity'].value_counts()
            severity_counts.plot(kind='pie', autopct='%1.1f%%', startangle=140, figsize=(6, 6))
            plt.title("Distribution of Severity Levels")
            plt.ylabel("")
            plt.show()
        # Time series: monthly count of new CVEs per asset
        if 'published' in df.columns:
            df['published'] = pd.to_datetime(df['published'], errors='coerce')
            df['month'] = df['published'].dt.to_period('M')
            monthly_cves = df.groupby(['month', 'Title'])['cveID'].nunique().unstack(fill_value=0)
            monthly_cves.plot(figsize=(14, 7))
            plt.title("Monthly Count of New CVEs per Asset")
            plt.ylabel("Number of New CVEs")
            plt.xlabel("Month")
            plt.legend(title='Asset', bbox_to_anchor=(1.05, 1), loc='upper left')
            plt.tight_layout()
            plt.show()

    interact(
        update_scoring,
        formula=Dropdown(options=list(formula_map.keys()), value='Weighted Average', description='Risk Formula:'),
        aggregation=Dropdown(options=list(agg_map.keys()), value='Max', description='Aggregation:'),
        highrisk_threshold=widgets.FloatSlider(value=7.0, min=0.0, max=10.0, step=0.1, description='High Risk CVE:')
    )



# --- MAIN EXECUTION ---
# In a notebook cell, __name__ is "__main__", so this runs when the cell executes.
if __name__ == "__main__":
    interactive_risk_scoring(input_file)
```

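The four row-level functions in the module treat missing factors differently, which matters when the catalogue has gaps (for example, CVEs without CVSS subscores). The stdlib sketch below mirrors that behavior on one hypothetical row whose `impactScore` is NaN, as `pd.to_numeric(..., errors='coerce')` would produce. The helper names are hypothetical re-implementations, not the module's own functions.

```python
import math

FACTORS = ["baseScore", "exploitabilityScore", "impactScore"]
DEFAULT_WEIGHTS = {"baseScore": 0.5, "exploitabilityScore": 0.25, "impactScore": 0.25}

# Hypothetical row with a missing impactScore.
row = {"baseScore": 8.0, "exploitabilityScore": 4.0, "impactScore": math.nan}

def present(r):
    # Keep only the non-missing factors.
    return {c: r[c] for c in FACTORS if not math.isnan(r[c])}

def weighted_average(r, weights=DEFAULT_WEIGHTS):
    # Missing factors are dropped and the remaining weights renormalized.
    vals = present(r)
    if not vals:
        return math.nan
    total = sum(weights[c] for c in vals)
    return round(sum(v * weights[c] for c, v in vals.items()) / total, 2)

def multiplicative(r):
    # Any missing factor makes the whole score missing.
    if len(present(r)) < len(FACTORS):
        return math.nan
    return round(math.prod(r[c] / 10.0 for c in FACTORS) * 10, 2)

def worst_case(r):
    vals = present(r)
    return max(vals.values()) if vals else math.nan

def simple_mean(r):
    vals = present(r)
    return round(sum(vals.values()) / len(vals), 2) if vals else math.nan

for fn in (weighted_average, multiplicative, worst_case, simple_mean):
    print(f"{fn.__name__:<17} {fn(row)}")
```

Here the weighted average rises to 6.67 because the missing impact weight is redistributed to the base and exploitability scores, while the multiplicative score becomes NaN. NaN scores are skipped by pandas' groupby max/mean/median/sum and never satisfy `>= threshold`, so under the Multiplicative formula such CVEs effectively drop out of the asset-level summary.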