# F3. Risk Scoring

>>_The CSF’s use will vary based on an organization’s unique mission and risks. With an understanding of stakeholder expectations and risk appetite and tolerance (as outlined in GOVERN), an organization can prioritize cybersecurity activities to make informed decisions about cybersecurity expenditures and actions. An organization may choose to handle risk in one or more ways — including mitigating, transferring, avoiding, or accepting negative risks and realizing, sharing, enhancing, or accepting positive risks — depending on the potential impacts and likelihoods. Importantly, an organization can use the CSF both internally to manage its cybersecurity capabilities and externally to oversee or communicate with third parties._\
\
\- _[National Institute of Standards and Technology (NIST)](https://nvlpubs.nist.gov/nistpubs/CSWP/NIST.CSWP.29.pdf)_

In [1]:
import pandas as pd
philosophy = pd.read_csv('../data/risk_philosophy.csv')
philosophy

Unnamed: 0,Risk Philosophy,Scoring Formula(s),Aggregation(s),When to Use
0,Conservative,Multiplicative,Max,"To only act on high-confidence, multi-dimensio..."
1,Worst-case,Worst Case (Max),Max,To flag any asset with a single severe vulnera...
2,Balanced/Pragmatic,"Weighted Average, Simple Mean","Mean, Median","For realistic, overall asset risk monitoring"
3,Cumulative,"Simple Mean, Weighted Average",Sum,When interested in total risk exposure per asset
4,Outlier-resistant,"Simple Mean, Weighted Average",Median,To ignore rare extremes and focus on typical r...
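
The aggregations named in the table above can rank the same assets quite differently. The sketch below is illustrative only: the asset names and scores are made-up values, not rows from the vulnerability catalogue.

```python
import pandas as pd

# Hypothetical per-CVE risk scores for two assets (illustrative values only).
scores = pd.DataFrame({
    "Title": ["Asset A"] * 4 + ["Asset B"] * 2,
    "riskScore": [9.8, 5.0, 4.0, 3.2, 6.0, 6.5],
})

# One column per aggregation offered in the dashboard.
by_asset = scores.groupby("Title")["riskScore"].agg(["max", "mean", "median", "sum"])
print(by_asset.round(2))
```

With these values, Max and Sum put Asset A first because of its single 9.8 score and its larger CVE count, while Mean and Median put Asset B first because its typical CVE scores higher.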


## Intended Purpose of Code

The risk scoring module is an interactive dashboard that computes risk scores from user-selected settings and summarizes the results in tables and graphs. Some graphs are built from data that is not used to calculate risk scores, so they do not change with user input. These 'static' graphs supplement the risk score analysis.

### Key Features

#### Risk-Scoring
* Interactive risk scoring module with dropdown and slider inputs for:
    * Risk Formula
        * Supports four risk formulas (weighted average, multiplicative, worst-case, simple mean)
    * Aggregation Method
        * Aggregates CVE risk scores per asset by max, mean, median, or sum
    * High-Risk Threshold Slider
        * Sets the risk score (0.0 to 10.0) at or above which a CVE is counted as high-risk for each asset
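
To make the formula choices concrete, the sketch below applies all four to one CVE. The component scores are those of the CVE-2021-39836 record shown later in this notebook, and the weights are the defaults in `weighted_average_score`; the variable names are only for illustration.

```python
import numpy as np

# CVSS component scores for a single CVE (from the CVE-2021-39836 record).
cve = {"baseScore": 7.8, "exploitabilityScore": 1.8, "impactScore": 5.9}
# Default weights used by weighted_average_score in this notebook.
weights = {"baseScore": 0.5, "exploitabilityScore": 0.25, "impactScore": 0.25}
vals = list(cve.values())

results = {
    # Weighted sum divided by the total weight.
    "Weighted Average": sum(cve[k] * w for k, w in weights.items()) / sum(weights.values()),
    # Each component scaled to 0-1, multiplied together, then scaled back to 0-10.
    "Multiplicative": float(np.prod([v / 10.0 for v in vals]) * 10),
    # Largest of the three components.
    "Worst Case (Max)": max(vals),
    # Unweighted average of the three components.
    "Simple Mean": float(np.mean(vals)),
}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

The multiplicative score comes out far lower than the others because a single low component (here, exploitability) pulls the whole product down, which fits the 'Conservative' philosophy of acting only when every dimension is high.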

#### Analysis & Visualization
* Generates the following in response to user inputs in the interactive risk scoring module:
    * Summary Tables
        * Asset-level Risk Summary
        * CVE-level Vulnerabilities Summary
    * Heatmap
        * asset vs riskScore
    * Time Series:
        * monthly count of new CVEs per asset
            * Future Enhancement: multiple choice legend allowing users to filter any combination of assets
 

#### Static Visualizations
* Pie Chart
    * distribution of severity levels (Critical/High/Medium/Low)

### Known Issues

* Save buttons overwrite existing files instead of saving a unique file
    * Appending a version number to the end of the file with each click could resolve this
* No way to sort summary tables
    * _Needs more thought..._
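
One possible way to address the overwrite issue is sketched below. `next_versioned_path` is a hypothetical helper, not part of the notebook: it returns the requested path if it is free and otherwise the first unused `_v<N>` variant. The `_v<N>` naming scheme is an assumption.

```python
from pathlib import Path
import tempfile

def next_versioned_path(path):
    """Return `path` if unused, else the first free '<stem>_v<N><suffix>' sibling."""
    path = Path(path)
    if not path.exists():
        return path
    version = 2
    while True:
        candidate = path.with_name(f"{path.stem}_v{version}{path.suffix}")
        if not candidate.exists():
            return candidate
        version += 1

# Demonstration in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "asset_risk_summary.csv"
    saved = []
    for _ in range(3):  # simulate three clicks on the save button
        out = next_versioned_path(target)
        out.write_text("Title,riskScore\n")
        saved.append(out.name)

print(saved)
```

A save handler could call such a helper just before `df.to_csv(...)` so that each click writes a new file.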

_The interactive components of the below code were AI generated to tailor analysis to individual user needs._

[Check out the streamlit app of this dashboard here!](https://f3-risk-scoring.streamlit.app/)

In [1]:

import ipywidgets as widgets
from IPython.display import display, FileLink
import matplotlib.pyplot as plt
import numpy as np
from ipywidgets import interact, Dropdown
import seaborn as sns
import pandas as pd

# --- Helper Functions for Saving Tables and Charts ---
def save_table_to_ass_csv(df, filename="../data/asset_risk_summary.csv"):
    df.to_csv(filename, index=False)
    print(f"Table saved as '{filename}'")
    display(FileLink(filename))

def save_table_to_vul_csv(df, filename="../data/cve_vuln_summary.csv"):
    df.to_csv(filename, index=False)
    print(f"Table saved as '{filename}'")
    display(FileLink(filename))

def add_ass_table_save_buttons(df, table_label="ass_table"):
    save_ass_csv_button = widgets.Button(description=f"Save {table_label}")
    def on_save_ass_csv_clicked(b):
        save_table_to_ass_csv(df, filename="../data/asset_risk_summary.csv")
    save_ass_csv_button.on_click(on_save_ass_csv_clicked)
    display(save_ass_csv_button)

def add_vul_table_save_buttons(df, table_label="vul_table"):
    save_vul_csv_button = widgets.Button(description=f"Save {table_label}")
    def save_vul_csv_clicked(b):
        save_table_to_vul_csv(df, filename="../data/cve_vuln_summary.csv")
    save_vul_csv_button.on_click(save_vul_csv_clicked)
    display(save_vul_csv_button)

# Configuration
input_file = "../data/vuln_catalogue_v2.csv"

def load_vuln_data(file_path):
    df = pd.read_csv(file_path)
    return df

# --- RISK FORMULAS ---
def weighted_average_score(row, weights=None):
    if weights is None:
        weights = {'baseScore': 0.5, 'exploitabilityScore': 0.25, 'impactScore': 0.25}
    vals = [(row.get(col), w) for col, w in weights.items() if pd.notnull(row.get(col))]
    if not vals:
        return np.nan
    score = sum(v * w for v, w in vals)
    total_weight = sum(w for _, w in vals)
    return round(score / total_weight, 2)

def multiplicative_risk_score(row):
    vals = [row.get(col) for col in ['baseScore', 'exploitabilityScore', 'impactScore']]
    if any(pd.isnull(v) for v in vals):
        return np.nan
    vals_norm = [v / 10.0 for v in vals]
    score = np.prod(vals_norm) * 10
    return round(score, 2)

def worst_case_score(row):
    vals = [row.get(col) for col in ['baseScore', 'exploitabilityScore', 'impactScore']]
    vals = [v for v in vals if pd.notnull(v)]
    if not vals:
        return np.nan
    return max(vals)

def simple_mean_score(row):
    vals = [row.get(col) for col in ['baseScore', 'exploitabilityScore', 'impactScore']]
    vals = [v for v in vals if pd.notnull(v)]
    if not vals:
        return np.nan
    return round(np.mean(vals), 2)

formula_map = {
    'Weighted Average': weighted_average_score,
    'Multiplicative': multiplicative_risk_score,
    'Worst Case (Max)': worst_case_score,
    'Simple Mean': simple_mean_score,
}

agg_map = {
    'Max': 'max',
    'Mean': 'mean',
    'Median': 'median',
    'Sum': 'sum',
}

def count_high_risk(series, threshold=7.0):
    return (series >= threshold).sum()

# --- Interactive Risk Scoring ---


def filter_monthly_cves_by_severity(df, year_range):
    asset_col = 'Title'
    min_year, max_year = year_range
    df_filtered = df[(df['year'] >= min_year) & (df['year'] <= max_year)]
    grouped = df_filtered.groupby(['month', asset_col, 'baseSeverity'])['cveID'].nunique().reset_index()
    pivoted = grouped.pivot_table(index=['month', asset_col], columns='baseSeverity', values='cveID', fill_value=0).reset_index()

    ordered_severities = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW']
    severity_columns = [col for col in ordered_severities if col in pivoted.columns]
    pivoted['New CVEs'] = pivoted[severity_columns].sum(axis=1)

    for col in severity_columns + ['New CVEs']:
        pivoted[col] = pivoted[col].round().astype(int)

    pivoted = pivoted.sort_values(by=['Title', 'month'])

    mom_cols = ['New CVEs'] + severity_columns
    mom_data = {}
    for col in mom_cols:
        change_col = col + ' MoM %'
        mom_data[change_col] = pivoted.groupby('Title')[col].pct_change().fillna(0).replace([float('inf'), -float('inf')], 0)

    for col, series in mom_data.items():
        pivoted[col] = (series * 100).round().astype(int).astype(str) + '%'

    base_cols = ['month', asset_col, 'New CVEs'] + severity_columns
    final_cols = []
    for col in base_cols:
        final_cols.append(col)
        if col != 'month' and col != asset_col:
            final_cols.append(col + ' MoM %')

    global mom
    mom = pivoted[final_cols]
    
    return mom


def display_synchronized_widgets(df):
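    # dt.year yields floats when any date is missing, so cast to int for the IntRangeSlider
    years = sorted(int(y) for y in df['year'].dropna().unique())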
    if not years:
        print("No valid years found in the data.")
        return

    year_slider = widgets.IntRangeSlider(
        value=[min(years), max(years)],
        min=min(years),
        max=max(years),
        step=1,
        description='Year Range:',
        continuous_update=False
    )

    def plot_monthly_cves(year_range):
        min_year, max_year = year_range
        df_filtered = df[(df['year'] >= min_year) & (df['year'] <= max_year)]
        monthly_cves = df_filtered.groupby(['month', 'Title'])['cveID'].nunique().unstack(fill_value=0)
        if monthly_cves.empty:
            print(f"No data available for the selected year range: {min_year}-{max_year}")
            return
        monthly_cves.plot(figsize=(14,7))
        plt.title(f"Monthly Count of New CVEs per Asset ({min_year}-{max_year})")
        plt.ylabel("Number of New CVEs")
        plt.xlabel("Month")
        plt.legend(title='Asset', bbox_to_anchor=(1.05, 1), loc='upper left')
        plt.tight_layout()
        plt.show()

    def update_table(year_range):
        table = filter_monthly_cves_by_severity(df, year_range)
        display(table if not table.empty else "No data available for the selected year range.")

    display(widgets.HTML(value="<b>Year Range Filter</b>"))
    display(year_slider)
    display(widgets.HTML(value="<b>Interactive Time Series Graph</b>"))
    display(widgets.interactive_output(plot_monthly_cves, {'year_range': year_slider}))
    display(widgets.HTML(value="<b>Interactive CVE Severity Table</b>"))
    display(widgets.interactive_output(update_table, {'year_range': year_slider}))

def interactive_risk_scoring(input_file=input_file):
    df = load_vuln_data(input_file)
    for col in ['baseScore', 'exploitabilityScore', 'impactScore']:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')
        else:
            df[col] = np.nan

    if 'published' in df.columns:
        df['published'] = pd.to_datetime(df['published'], errors='coerce')
        df['month'] = df['published'].dt.to_period('M').astype(str)
        df['year'] = df['published'].dt.year
    else:
        raise KeyError("No 'published' column found in DataFrame.")

    def update_scoring(formula, aggregation, highrisk_threshold):
        df['riskScore'] = df.apply(formula_map[formula], axis=1)

        group = df.groupby(['Title','cpeName'])
        agg_df = group['riskScore'].agg(agg_map[aggregation]).reset_index()
        agg_df = agg_df.rename(columns={'riskScore': f'{aggregation}RiskScore'})

        highrisk_df = group['riskScore'].apply(lambda x: (x >= highrisk_threshold).sum()).reset_index()
        highrisk_df = highrisk_df.rename(columns={'riskScore': f'countHighRiskCVEs (>{highrisk_threshold})'})

        # Merge on both grouping keys so 'Title' is not duplicated into Title_x/Title_y
        summary = pd.merge(agg_df, highrisk_df, on=['Title', 'cpeName'], how='left')

        global d1
        d1 = summary
        print("\nAsset-level Risk Summary:")
        display(d1)
        add_ass_table_save_buttons(d1, table_label="Summary")

        global d2
        d2 = df[['Title', 'cpeName', 'cveID', 'riskScore']].sort_values(by='riskScore', ascending=False)
        print("\nCVE-level Vulnerabilities Summary:")
        display(d2.head(20))
        add_vul_table_save_buttons(df[['Title','cpeName', 'cveID', 'riskScore']], table_label="Summary")

        severity_counts = df['baseSeverity'].value_counts()
        severity_counts.plot(kind='pie', autopct='%1.1f%%', startangle=140, figsize=(6,6))
        plt.title("Distribution of Severity Levels")
        plt.ylabel("")
        plt.show()

        # --- PIE CHART DATA TABLE VIEW ---
        global severity_counts_df
        severity_counts_df = severity_counts.reset_index()
        severity_counts_df.columns = ['Severity', 'Count']
        print("\nUnderlying Data for Severity Pie Chart:")
        display(severity_counts_df)

        top_assets = summary.sort_values(by=f'{aggregation}RiskScore', ascending=False).head(20)
        heatmap_data = top_assets.set_index('Title')[[f'{aggregation}RiskScore']]
        plt.figure(figsize=(2, 10))
        sns.heatmap(heatmap_data, annot=True, cmap='YlOrRd', cbar=True)
        plt.title(f"Heatmap: Top 20 Assets by {aggregation} Risk Score\n(Formula: {formula})")
        plt.xlabel(f"{aggregation} Risk Score")
        plt.ylabel("Asset (Title)")
        plt.xticks(rotation=0)
        plt.show()

        # --- HEATMAP DATA TABLE VIEW ---
        global heatmap_table
        heatmap_table = heatmap_data.reset_index()
        heatmap_table.columns = ['Title', f'{aggregation}RiskScore']
        print("\nUnderlying Data for Heatmap:")
        display(heatmap_table)

        display_synchronized_widgets(df)

    interact(
        update_scoring,
        formula=Dropdown(options=list(formula_map.keys()), value='Weighted Average', description='Risk Formula:'),
        aggregation=Dropdown(options=list(agg_map.keys()), value='Max', description='Aggregation:'),
        highrisk_threshold=widgets.FloatSlider(value=7.0, min=0.0, max=10.0, step=0.1, description='High Risk CVE:')
    )

# --- MAIN EXECUTION ---
if __name__ == "__main__":
    interactive_risk_scoring(input_file)


In [3]:
mom.columns

Index(['month', 'Title', 'New CVEs', 'New CVEs MoM %', 'CRITICAL',
       'CRITICAL MoM %', 'HIGH', 'HIGH MoM %', 'MEDIUM', 'MEDIUM MoM %', 'LOW',
       'LOW MoM %'],
      dtype='object', name='baseSeverity')
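
The `MoM %` columns listed above come from a per-asset percentage change. The sketch below reproduces that step on made-up counts (asset names and values are illustrative only) to show how the first month and a jump from zero are handled.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly CVE counts for two assets (illustrative values only).
counts = pd.DataFrame({
    "Title": ["Asset A"] * 3 + ["Asset B"] * 3,
    "month": ["2024-01", "2024-02", "2024-03"] * 2,
    "New CVEs": [4, 6, 3, 0, 2, 2],
}).sort_values(["Title", "month"])

# Percentage change within each asset; the first month has no predecessor (NaN)
# and a change from 0 is infinite, so both are mapped to 0 as in the notebook.
change = (
    counts.groupby("Title")["New CVEs"]
    .pct_change()
    .replace([np.inf, -np.inf], 0)
    .fillna(0)
)
counts["New CVEs MoM %"] = (change * 100).round().astype(int).astype(str) + "%"
print(counts)
```

Because an increase from 0 to 2 is reported as `0%`, the percentage columns are best read alongside the raw counts.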

In [62]:
# UI Dataframes: d1, d2, mom, heatmap_table, severity_counts_df
scores = pd.merge(d1,d2,how='inner',on=['cpeName','Title'])
scores.sort_values(by='riskScore').head(10)

Unnamed: 0,Title,cpeName,MaxRiskScore,countHighRiskCVEs (>7.0),cveID,riskScore
381,Oracle Database 19c Enterprise Edition,cpe:2.3:a:oracle:database:19c:*:*:*:enterprise...,6.82,0,CVE-2021-2207,1.7
424,Oracle Database Server 19c,cpe:2.3:a:oracle:database_server:19c:*:*:*:*:*...,7.47,3,CVE-2020-2734,1.77
423,Oracle Database Server 19c,cpe:2.3:a:oracle:database_server:19c:*:*:*:*:*...,7.47,3,CVE-2021-2000,1.77
422,Oracle Database Server 19c,cpe:2.3:a:oracle:database_server:19c:*:*:*:*:*...,7.47,3,CVE-2020-2516,1.77
314,Adobe Acrobat Reader 20.004.30006 Classic Edition,cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:...,5.82,0,CVE-2021-44714,1.85
436,Oracle Database Vault 19c,cpe:2.3:a:oracle:database_vault:19c:*:*:*:*:*:*:*,2.0,0,CVE-2021-2326,2.0
421,Oracle Database Server 19c,cpe:2.3:a:oracle:database_server:19c:*:*:*:*:*...,7.47,3,CVE-2021-2175,2.0
420,Oracle Database Server 19c,cpe:2.3:a:oracle:database_server:19c:*:*:*:*:*...,7.47,3,CVE-2022-21247,2.0
379,Oracle Database 19c Enterprise Edition,cpe:2.3:a:oracle:database:19c:*:*:*:enterprise...,6.82,0,CVE-2022-21432,2.0
380,Oracle Database 19c Enterprise Edition,cpe:2.3:a:oracle:database:19c:*:*:*:enterprise...,6.82,0,CVE-2021-2245,2.0


In [15]:
cves = pd.read_csv('../data/vuln_catalogue_v2.csv')
df = pd.merge(cves,scores, how='inner', on=['Title','cpeName','cveID'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 442 entries, 0 to 441
Data columns (total 29 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Unnamed: 0                442 non-null    int64  
 1   sid                       442 non-null    int64  
 2   WrittenAt                 442 non-null    object 
 3   Title                     442 non-null    object 
 4   cpeName                   442 non-null    object 
 5   cveID                     440 non-null    object 
 6   published                 440 non-null    object 
 7   last_modified             440 non-null    object 
 8   vectorString              421 non-null    object 
 9   baseScore                 421 non-null    float64
 10  exploitabilityScore       421 non-null    float64
 11  impactScore               421 non-null    float64
 12  baseSeverity              421 non-null    object 
 13  attackVector              421 non-null    object 
 14  attackComp

In [29]:
risks = df.copy()

In [30]:
risks.columns

Index(['Unnamed: 0', 'sid', 'WrittenAt', 'Title', 'cpeName', 'cveID',
       'published', 'last_modified', 'vectorString', 'baseScore',
       'exploitabilityScore', 'impactScore', 'baseSeverity', 'attackVector',
       'attackComplexity', 'privilegesRequired', 'userInteraction', 'scope',
       'confidentialityImpact', 'integrityImpact', 'availabilityImpact',
       'cwes', 'description', 'references', 'tags', 'full_json',
       'MaxRiskScore', 'countHighRiskCVEs (>7.0)', 'riskScore'],
      dtype='object')

In [31]:
risks.drop(columns=['Unnamed: 0', 'sid', 'WrittenAt','last_modified', 'vectorString', 
           'privilegesRequired', 'userInteraction', 'scope', 'integrityImpact', 
           'cwes', 'references', 'tags', 'full_json'], inplace=True)
# Note: head(10) keeps the first 10 rows in merge order; it does not rank by riskScore
top10 = risks.head(10)
top10.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Title                     10 non-null     object 
 1   cpeName                   10 non-null     object 
 2   cveID                     9 non-null      object 
 3   published                 9 non-null      object 
 4   baseScore                 9 non-null      float64
 5   exploitabilityScore       9 non-null      float64
 6   impactScore               9 non-null      float64
 7   baseSeverity              9 non-null      object 
 8   attackVector              9 non-null      object 
 9   attackComplexity          9 non-null      object 
 10  confidentialityImpact     9 non-null      object 
 11  availabilityImpact        9 non-null      object 
 12  description               10 non-null     object 
 13  MaxRiskScore              9 non-null      float64
 14  countHighRisk
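
The summary above shows one row with no CVE and no risk score, because the selection takes rows in their existing order. If the intent is the ten highest-scoring CVEs, a ranking step along the following lines might be used. `top_n_by_risk` and the demo frame are hypothetical, and the CVE IDs are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged `risks` frame (illustrative values only).
risks_demo = pd.DataFrame({
    "Title": ["Asset A", "Asset B", "Asset B", "Asset C"],
    "cveID": [np.nan, "CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0003"],
    "riskScore": [np.nan, 5.8, 9.1, 7.4],
})

def top_n_by_risk(frame, n=10, score_col="riskScore"):
    """Return the n rows with the highest score, skipping rows without a score."""
    return frame.dropna(subset=[score_col]).nlargest(n, score_col).reset_index(drop=True)

top2 = top_n_by_risk(risks_demo, n=2)
print(top2)
```

Dropping rows without a score first keeps assets with no known CVEs out of a list that is later described to the model as "highest-risk".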

In [32]:
records = top10.to_dict(orient='records')
records

[{'Title': 'Tableau Desktop 2021.1',
  'cpeName': 'cpe:2.3:a:tableau:tableau_desktop:2021.1:*:*:*:*:*:*:*',
  'cveID': nan,
  'published': nan,
  'baseScore': nan,
  'exploitabilityScore': nan,
  'impactScore': nan,
  'baseSeverity': nan,
  'attackVector': nan,
  'attackComplexity': nan,
  'confidentialityImpact': nan,
  'availabilityImpact': nan,
  'description': 'NO CVEs FOUND FOR THIS ASSET',
  'MaxRiskScore': nan,
  'countHighRiskCVEs (>7.0)': 0,
  'riskScore': nan},
 {'Title': 'Adobe Acrobat Reader 20.004.30006 Classic Edition',
  'cpeName': 'cpe:2.3:a:adobe:acrobat_reader:20.004.30006:*:*:*:classic:*:*:*',
  'cveID': 'CVE-2021-39836',
  'published': '2021-09-29T16:15:08.513',
  'baseScore': 7.8,
  'exploitabilityScore': 1.8,
  'impactScore': 5.9,
  'baseSeverity': 'HIGH',
  'attackVector': 'LOCAL',
  'attackComplexity': 'LOW',
  'confidentialityImpact': 'HIGH',
  'availabilityImpact': 'HIGH',
  'description': 'Acrobat Reader DC versions 2021.005.20060 (and earlier), 2020.004.3000
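
The records above contain `nan` values, which `json.dumps` writes as bare `NaN` tokens; these are not valid in strict JSON. If strict JSON is wanted for the prompt or for storage, one option is to map NaN to `None` first, as in this sketch. The helper name and demo data are assumptions.

```python
import json
import math

import numpy as np
import pandas as pd

# Hypothetical two-row frame with missing values (illustrative only).
frame = pd.DataFrame({
    "Title": ["Asset A", "Asset B"],
    "cveID": [np.nan, "CVE-0000-0001"],
    "baseScore": [np.nan, 7.8],
})

def nan_to_none(value):
    """Replace float NaN with None; leave every other value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

records_demo = [
    {key: nan_to_none(value) for key, value in row.items()}
    for row in frame.to_dict(orient="records")
]

# allow_nan=False makes json.dumps raise if any NaN slipped through.
payload = json.dumps(records_demo, indent=2, allow_nan=False)
print(payload)
```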

In [33]:
from openai import OpenAI
import os
import pandas as pd
from IPython.display import Markdown, display
import json

# Config
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("API key not set.")
client = OpenAI(api_key=api_key)

In [34]:
def explain_top_risks(records, client):
    # Serialize the records passed in (NaN values are written as bare NaN tokens)
    risks_json = json.dumps(records, indent=2)

    # Construct the system and user prompts
    system_prompt = (
        "You are a cybersecurity expert specializing in vulnerability risk analysis "
        "and mitigation planning with a focus on NIST's Cybersecurity Framework 2.0."
    )

    user_prompt = (
        "Below is a list of the 10 highest-risk vulnerabilities (CVEs) affecting my assets. "
        "For each item, do the following:\n"
        "1. Briefly explain why this CVE poses a high risk based on its details.\n"
        "2. Suggest concise, actionable mitigation steps or best practices.\n"
        "Generate a report memo with only the subject line in the memo header. "
        "Present the output as a numbered list. Here are the vulnerabilities:\n"
        f"{risks_json}"
    )

    # Send to OpenAI
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        max_tokens=1000,
        temperature=0.2
    )
    return response.choices[0].message.content

# Call the function with the records built above and store the result
top_risks = explain_top_risks(records, client)

with open("../markdown/top_risks.md", "w", encoding="utf-8") as f:
    f.write(top_risks)

with open("../markdown/top_risks.md", "r", encoding="utf-8") as f:
    md_content = f.read()

display(Markdown(md_content))

**Subject: Vulnerability Risk Analysis and Mitigation Report**

1. **Tableau Desktop 2021.1**
   - **Risk Explanation:** No CVEs were found for this asset, indicating no known vulnerabilities are currently associated with this version.
   - **Mitigation Steps:** Regularly update the software to the latest version and monitor for any new vulnerability disclosures.

2. **CVE-2021-39836 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
   - **Risk Explanation:** This use-after-free vulnerability allows arbitrary code execution if a user opens a malicious file, posing a high risk due to its potential impact on confidentiality and availability.
   - **Mitigation Steps:** Update Adobe Acrobat Reader to the latest version where this vulnerability is patched. Educate users on the risks of opening files from untrusted sources.

3. **CVE-2021-39837 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
   - **Risk Explanation:** Similar to CVE-2021-39836, this use-after-free vulnerability can lead to arbitrary code execution, requiring user interaction.
   - **Mitigation Steps:** Apply the latest security patches from Adobe. Implement user training to avoid opening suspicious files.

4. **CVE-2021-39838 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
   - **Risk Explanation:** This vulnerability also involves a use-after-free condition, allowing arbitrary code execution upon opening a malicious file.
   - **Mitigation Steps:** Ensure Adobe Acrobat Reader is updated to a version that addresses this issue. Reinforce user awareness regarding file safety.

5. **CVE-2021-39839 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
   - **Risk Explanation:** Another use-after-free vulnerability leading to arbitrary code execution, with high impact on confidentiality and availability.
   - **Mitigation Steps:** Update to the latest Adobe Acrobat Reader version. Conduct regular security awareness training for users.

6. **CVE-2021-39840 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
   - **Risk Explanation:** This vulnerability allows arbitrary code execution through a use-after-free condition, requiring user interaction.
   - **Mitigation Steps:** Patch Adobe Acrobat Reader to the latest version. Educate users on identifying and avoiding malicious files.

7. **CVE-2021-39841 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
   - **Risk Explanation:** A type confusion vulnerability that could lead to arbitrary code execution if a malicious file is opened.
   - **Mitigation Steps:** Update Adobe Acrobat Reader to mitigate this vulnerability. Implement strict policies on file handling and user training.

8. **CVE-2021-39842 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
   - **Risk Explanation:** This use-after-free vulnerability can result in arbitrary code execution, posing a significant risk.
   - **Mitigation Steps:** Ensure all systems have the latest Adobe updates. Train users to recognize and avoid phishing attempts.

9. **CVE-2021-39843 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
   - **Risk Explanation:** An out-of-bounds write vulnerability that could lead to arbitrary code execution, requiring user interaction.
   - **Mitigation Steps:** Apply the latest security updates from Adobe. Conduct regular security training sessions for users.

10. **CVE-2021-39844 (Adobe Acrobat Reader 20.004.30006 Classic Edition)**
    - **Risk Explanation:** This out-of-bounds read vulnerability could lead to the disclosure of arbitrary memory information, though it has a lower severity.
    - **Mitigation Steps:** Update Adobe Acrobat Reader to the latest version. Maintain a robust security awareness program to minimize risks from user actions.

For all vulnerabilities, it is crucial to maintain a proactive patch management process and continuously educate users on cybersecurity best practices to mitigate risks effectively.

In [61]:
# Define functions
def get_trend_summary(df, freq='Q'):
    if freq not in ('Q', 'Y'):
        raise ValueError("freq must be 'Q' for quarter or 'Y' for year.")
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['published'] = pd.to_datetime(df['published'])
    df['period'] = df['published'].dt.to_period(freq)
    # Count CVEs per period and asset, with one column per severity level
    trend = df.groupby(['period', 'cpeName', 'baseSeverity']).size().unstack(fill_value=0)
    return trend

def calculate_qoq_change(trend_df):
    trend_df = trend_df.sort_index(level=0)
    pct_change = trend_df.groupby('cpeName').pct_change().replace([float('inf'), -float('inf')], 0).fillna(0) * 100
    return pct_change.round(1)

def generate_trend_narrative_prompt(trend_df, pct_change_df, start_period, end_period):
    filtered_trend = trend_df.loc[start_period:end_period].reset_index()
    filtered_pct_change = pct_change_df.loc[start_period:end_period].reset_index()

    for df in (filtered_trend, filtered_pct_change):
        if 'period' in df.columns:
            df['period'] = df['period'].astype(str)

    trend_data = filtered_trend.to_dict(orient='records')
    change_data = filtered_pct_change.to_dict(orient='records')

    prompt = (
        "You are a cybersecurity risk analyst. Below are two tables:\n\n"
        "1. Raw counts of vulnerabilities (CVEs) by asset and severity for each quarter.\n"
        "2. The corresponding quarter-over-quarter percentage change for each asset and severity.\n\n"
        "Analyze the data, highlighting emerging risk patterns. Narrate where there were significant spikes or drops,\n"
        "and call out assets with the most notable changes. Use percentages and time periods\n"
        "(e.g., 'Web servers saw a 40% spike in Critical CVEs in Q1 2025').\n"
        "Your response should be titled: Risk Trends Insights.\n"
        "Finish by suggesting which assets or severities require immediate attention due to recent trends.\n\n"
        f"Raw trend data:\n{json.dumps(trend_data, indent=2)}\n\n"
        f"Quarterly percent change data:\n{json.dumps(change_data, indent=2)}\n"
    )
    return prompt

# Generate trends and changes
quarterly_trend = get_trend_summary(df, freq='Q')
quarterly_change = calculate_qoq_change(quarterly_trend)

# --- User input for date range ---
start_period = input("Enter start quarter (e.g., 2023Q1): ")
end_period = input("Enter end quarter (e.g., 2024Q4): ")

# --- Generate prompt and call OpenAI ---
prompt = generate_trend_narrative_prompt(quarterly_trend, quarterly_change, start_period, end_period)
system_prompt = "You are an expert at analyzing vulnerability data for security reporting with a particular interest in NIST's Cybersecurity Framework 2.0."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ],
    max_tokens=1000,
    temperature=0.3
)

risk_trend_narrative = response.choices[0].message.content

# Save and display
with open("../markdown/risk_trend_narrative.md", "w", encoding="utf-8") as f:
    f.write(risk_trend_narrative)

with open("../markdown/risk_trend_narrative.md", "r", encoding="utf-8") as f:
    md_content = f.read()

display(Markdown(md_content))

Enter start quarter (e.g., 2023Q1):  2024Q1
Enter end quarter (e.g., 2024Q4):  2025Q1


### Risk Trends Insights

Upon analyzing the vulnerability data across various assets and severities, several key trends and risk patterns have emerged over the observed quarters.

#### Adobe Acrobat Reader
- **Q1 2024 to Q2 2024**: Adobe Acrobat Reader experienced a significant spike in High severity vulnerabilities, with an increase of 128.6%. This sharp rise indicates a potential increase in risk exposure during this period, necessitating close monitoring and potential mitigation strategies.
- **Q2 2024 to Q3 2024**: There was a notable decrease in High severity vulnerabilities by 31.2%, suggesting some mitigation efforts or patching may have been effective. However, Medium severity vulnerabilities increased by 25%, indicating a shift in the nature of vulnerabilities being exploited or discovered.
- **Q3 2024 to Q4 2024**: High severity vulnerabilities continued to decline by 45.5%, but Medium severity vulnerabilities doubled (100% increase), suggesting a persistent risk that might not be adequately addressed by current security measures.
- **Q4 2024 to Q1 2025**: The trend of decreasing High severity vulnerabilities stabilized, but Medium severity vulnerabilities saw a significant drop of 70%. This indicates a potential improvement in handling medium-level threats, possibly through better patch management or security practices.

#### Microsoft Exchange Server
- **2024Q1**: Microsoft Exchange Server (cumulative update 14) had a single Critical vulnerability, which was not present in subsequent quarters, indicating a resolution or patching of this particular issue.
- **2024Q4**: The base version of Microsoft Exchange Server maintained a consistent count of High severity vulnerabilities (1 per quarter), with no significant changes in other severity levels. This stability suggests a consistent but low-level risk that should be monitored for any emerging threats.

### Emerging Risk Patterns
- **Adobe Acrobat Reader**: The fluctuations in vulnerability counts, particularly the significant increase in High severity vulnerabilities in Q2 2024 and the persistent Medium severity vulnerabilities, highlight Adobe Acrobat Reader as a critical asset requiring immediate attention. The variability suggests potential gaps in security controls or patch management processes.
- **Microsoft Exchange Server**: The consistent presence of High severity vulnerabilities, albeit low in number, suggests a need for ongoing vigilance. While the Critical vulnerabilities were resolved, the potential for new threats remains.

### Recommendations for Immediate Attention
- **Focus on Adobe Acrobat Reader**: Given the significant fluctuations and spikes in vulnerabilities, particularly in High and Medium severities, Adobe Acrobat Reader should be prioritized for security reviews. Implementing robust patch management and monitoring processes can help mitigate these risks.
- **Monitor Microsoft Exchange Server**: Although the vulnerabilities are fewer, the consistent presence of High severity issues warrants continuous monitoring and proactive threat intelligence to prevent exploitation.

In conclusion, while both assets require attention, Adobe Acrobat Reader poses a more dynamic risk landscape that necessitates immediate and sustained focus to ensure vulnerabilities are managed effectively.
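
The start and end quarters for this narrative are typed in by hand, so a mistyped value or a reversed range only surfaces when the trend table is sliced. A small validation step could catch that earlier. The sketch below is one possibility; `parse_quarter` and `parse_quarter_range` are hypothetical helpers, not part of the notebook.

```python
import pandas as pd

def parse_quarter(text):
    """Parse a string such as '2024Q1' into a quarterly pandas Period."""
    try:
        return pd.Period(text.strip(), freq="Q")
    except ValueError as exc:
        # Raised when pandas cannot parse the string at all.
        raise ValueError(f"Expected a quarter like 2024Q1, got {text!r}") from exc

def parse_quarter_range(start_text, end_text):
    """Parse both ends and reject a range that runs backwards."""
    start, end = parse_quarter(start_text), parse_quarter(end_text)
    if start > end:
        raise ValueError("Start quarter must not be after end quarter.")
    return start, end

start, end = parse_quarter_range(" 2024Q1", " 2025Q1")
print(start, end)
```

Note that pandas also accepts other date-like strings (a month, for example) and maps them to the quarter that contains them, so this check rejects unparseable input rather than enforcing the exact `YYYYQn` format.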