# Credit Risk & Loan Performance: Risk Reporting and Quantification

#### Author: Satveer Kaur
#### Date: 2025-10-28
#### Notebook Purpose:
The primary goal of this notebook is to transition from visual insight generation (EDA) to quantifiable risk reporting. Using the final, validated, and optimized dataset (`clean_data_for_reporting.csv`), the analysis will structure the findings into professional risk reports to achieve the following:

1. **Quantify Risk Separation:** Translate the visual trends confirmed in Notebook 3 (FICO, DTI, Income) into precise metrics: **Total Loans (Volume), Default Count, and the Observed Default Rate (ODR)** for every segment.

2. **Generate Segmentation Tables:** Create structured, formatted tables that quantify risk exposure and default frequency across the key borrower segments.

3. **Produce Reporting Deliverables:** Export the final, auditable risk tables (segmentation matrices) to external files (e.g., CSV/Excel) for use by management, credit policy teams, and regulatory reporting.

This notebook serves as the **final deliverable** of the data analytics phase, providing the quantitative evidence needed to inform underwriting policy and portfolio strategy.

#### 1. Setup and Data Loading

In [67]:
# Importing Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

df_report = pd.read_csv('../data/clean_data/clean_data_for_reporting.csv', low_memory=False, parse_dates=['issue_date'])
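
Before any aggregation, it may be worth confirming that the loaded frame has the columns the later sections rely on. The sketch below is one way to do that. `validate_report_frame` is a hypothetical helper that is not part of this notebook, and the toy frame only stands in for `df_report`; the column names are the ones the segmentation sections use.

```python
import pandas as pd

# Columns the segmentation sections read from the reporting dataset.
REQUIRED_COLUMNS = ["is_default", "fico_bin", "dti_quintile", "income_brackets"]


def validate_report_frame(df):
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # ODR is computed as the mean of is_default, so the flag must be 0/1 with no gaps.
    if "is_default" in df.columns:
        flag = df["is_default"]
        if flag.isna().any():
            problems.append("is_default contains missing values")
        if not flag.dropna().isin([0, 1]).all():
            problems.append("is_default contains values other than 0 and 1")
    return problems


# Toy stand-in for df_report (illustrative values only).
toy = pd.DataFrame({
    "is_default": [0, 1, 0],
    "fico_bin": ["Good (670-739)", "Excellent (800+)", "Good (670-739)"],
    "dti_quintile": ["Q1 (Lowest DTI)", "Q2", "Q3"],
    "income_brackets": ["< $50k", "< $50k", "< $50k"],
})
print(validate_report_frame(toy))
```

An empty list means no problems were detected; in the notebook the same call could be made on `df_report` right after loading.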

#### 2. Risk Quantification Function
**Purpose:** To efficiently generate the three core metrics required for any risk segmentation table: **Total Loans (Volume), Default Count, and Observed Default Rate (ODR).**

In [68]:
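def create_segmentation_table(df, segment_col):
    """Return loan volume, default count, and ODR for each value of segment_col."""
    # is_default is a 0/1 flag, so count = volume, sum = defaults, mean = default rate
    agg_df = df.groupby(segment_col).agg(
        total_loans = ('is_default','count'),
        default_count = ('is_default', 'sum'),
        ODR = ('is_default', 'mean')
    )
    return agg_df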
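
To make the three metrics concrete, here is a small worked example on a made-up portfolio. The aggregation is repeated so the block runs on its own; the `segment` column and the toy values are assumptions for illustration only. Because `is_default` is a 0/1 flag, its mean equals `default_count / total_loans`.

```python
import pandas as pd


def create_segmentation_table(df, segment_col):
    # Same aggregation as the notebook's function, repeated so the example is self-contained.
    return df.groupby(segment_col).agg(
        total_loans=("is_default", "count"),
        default_count=("is_default", "sum"),
        ODR=("is_default", "mean"),
    )


# Toy portfolio: 4 loans in segment A (1 default), 2 loans in segment B (1 default).
toy = pd.DataFrame({
    "segment": ["A", "A", "A", "A", "B", "B"],
    "is_default": [0, 0, 0, 1, 1, 0],
})

table = create_segmentation_table(toy, "segment")
print(table)
# Segment A: 1 / 4 = 0.25, segment B: 1 / 2 = 0.50
```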

#### 3. FICO Risk Segmentation Table
**Purpose:** Generate the final, formatted table for the FICO bins.

In [69]:
df_report['fico_bin'].unique()

array(['Good (670-739)', 'Subprime/Poor (<670)', 'Very Good (740-799)',
       'Excellent (800+)'], dtype=object)

In [70]:
# creating segmentation table for FICO
fico_report = create_segmentation_table(df_report, 'fico_bin')

# setting order for fico
fico_order = ['Excellent (800+)', 'Very Good (740-799)','Good (670-739)', 'Subprime/Poor (<670)']

# reset index
fico_report.reset_index(inplace=True)

# rename columns for professional display
fico_report = fico_report.rename(columns={
    'fico_bin': 'FICO Bin',
    'total_loans': 'Total Loans',
    'default_count':'Default Count'
})

# Convert fico_bin to categorical type with specified order
fico_report['FICO Bin'] = pd.Categorical(
    fico_report['FICO Bin'],
    categories=fico_order,
    ordered=True
)

# sort fico_report using the categorical order
fico_report = fico_report.sort_values(by='FICO Bin')

# format the columns for final display
fico_report_styled=fico_report.style.format(
    {
        'Total Loans': '{:,.0f}'.format,
        'Default Count' : '{:,.0f}'.format,
        'ODR':'{:.2%}'.format
    }
).set_caption('FICO Risk Segmentation Table')


In [71]:
fico_report_styled

| | FICO Bin | Total Loans | Default Count | ODR |
|---|---|---|---|---|
| 0 | Excellent (800+) | 1526 | 102 | 6.68% |
| 3 | Very Good (740-799) | 12985 | 1303 | 10.03% |
| 1 | Good (670-739) | 96065 | 19224 | 20.01% |
| 2 | Subprime/Poor (<670) | 24035 | 6307 | 26.24% |
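
The categorical conversion in this section matters because a plain sort on the bin labels is alphabetical, which does not follow credit quality. The sketch below contrasts the two sorts using the bin labels from this notebook. It is an illustration of the technique, not a replacement for the cell above.

```python
import pandas as pd

labels = pd.Series(["Good (670-739)", "Subprime/Poor (<670)",
                    "Very Good (740-799)", "Excellent (800+)"])

# Plain string sort: alphabetical, which mixes up the risk order.
alphabetical = labels.sort_values().tolist()

# Ordered categorical sort: follows the order given in `categories`.
fico_order = ["Excellent (800+)", "Very Good (740-799)",
              "Good (670-739)", "Subprime/Poor (<670)"]
ordered = pd.Series(pd.Categorical(labels, categories=fico_order, ordered=True))
by_risk = ordered.sort_values().astype(str).tolist()

print(alphabetical)
print(by_risk)

# A label missing from `categories` silently becomes NaN, so a quick check helps.
assert ordered.notna().all(), "some labels are not listed in fico_order"
```

The final check is worth keeping in mind whenever the bin labels change upstream, since a mismatched label would otherwise drop out of the sorted table without an error.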


#### 4. DTI Risk Segmentation Table
**Purpose:** Generate the final, formatted table for the DTI Quintile.

In [72]:
# create segmentation table for DTI
dti_report = create_segmentation_table(df_report,'dti_quintile')

# Reset index
dti_report.reset_index(inplace=True)

# rename columns for professional display
dti_report = dti_report.rename(columns={
    'dti_quintile': 'DTI Quintile',
    'total_loans': 'Total Loans',
    'default_count':'Default Count'
})

# format the columns for final display
dti_report_styled=dti_report.style.format(
    {
        'Total Loans': '{:,.0f}'.format,
        'Default Count' : '{:,.0f}'.format,
        'ODR':'{:.2%}'.format
    }
).set_caption('DTI Risk Segmentation Table')

dti_report_styled

| | DTI Quintile | Total Loans | Default Count | ODR |
|---|---|---|---|---|
| 0 | Q1 (Lowest DTI) | 26933 | 4119 | 15.29% |
| 1 | Q2 | 26939 | 4483 | 16.64% |
| 2 | Q3 | 26918 | 5127 | 19.05% |
| 3 | Q4 | 26905 | 5932 | 22.05% |
| 4 | Q5 (Highest DTI) | 26881 | 7264 | 27.02% |
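
One way to put a number on the risk separation across DTI quintiles is to compare the lowest and highest quintile directly. The sketch below recomputes ODR from the counts printed in this section and derives a spread (in percentage points) and a ratio. The variable names are illustrative, and these two summary measures are a suggestion rather than part of the original report.

```python
import pandas as pd

# Counts copied from the DTI segmentation output in this section.
dti = pd.DataFrame({
    "DTI Quintile": ["Q1 (Lowest DTI)", "Q2", "Q3", "Q4", "Q5 (Highest DTI)"],
    "Total Loans": [26933, 26939, 26918, 26905, 26881],
    "Default Count": [4119, 4483, 5127, 5932, 7264],
})
dti["ODR"] = dti["Default Count"] / dti["Total Loans"]

lowest, highest = dti["ODR"].iloc[0], dti["ODR"].iloc[-1]
spread_pp = (highest - lowest) * 100   # difference in percentage points
ratio = highest / lowest               # how many times riskier Q5 is than Q1
rises_every_step = dti["ODR"].is_monotonic_increasing

print(f"spread: {spread_pp:.2f} pp, ratio: {ratio:.2f}x, monotonic: {rises_every_step}")
```

On these figures the default rate rises at every step, with Q5 defaulting roughly 1.77 times as often as Q1.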


#### 5. Income Segmentation Table
**Purpose:** Generate the final, formatted table for the Income Brackets.

In [73]:
# create segmentation table for income brackets
income_report = create_segmentation_table(df_report,'income_brackets')

# Reset index
income_report.reset_index(inplace=True)

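# order for income brackets; labels must match the stored values exactly
# (including any leading spaces), otherwise pd.Categorical turns them into NaN
income_order = ['< $50k',' $50k - $100k',' $100k - $150k', ' > $150k']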

# converting income_bracket to categorical type with specified order
income_report['income_brackets']= pd.Categorical(
    income_report['income_brackets'],
    categories=income_order,
    ordered=True
)

# sort income_report using categorical order
income_report = income_report.sort_values(by='income_brackets')

# rename columns for professional display
income_report = income_report.rename(columns={
    'income_brackets': 'Annual Income Bracket',
    'total_loans': 'Total Loans',
    'default_count':'Default Count'
})

# format the columns for final display
income_report_styled=income_report.style.format(
    {
        'Total Loans': '{:,.0f}'.format,
        'Default Count' : '{:,.0f}'.format,
        'ODR':'{:.2%}'.format
    }
).set_caption('Annual Income Risk Segmentation Table')

income_report_styled

| | Annual Income Bracket | Total Loans | Default Count | ODR |
|---|---|---|---|---|
| 3 | < $50k | 38640 | 8872 | 22.96% |
| 1 | $50k - $100k | 68170 | 13628 | 19.99% |
| 0 | $100k - $150k | 19411 | 3154 | 16.25% |
| 2 | > $150k | 8390 | 1282 | 15.28% |
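
A quick audit step before export is to check that the three segmentation tables describe the same portfolio. The sketch below sums the counts printed in Sections 3 to 5. On those figures, the FICO and Income tables agree, while the DTI table comes up 35 loans and 11 defaults short. One possible explanation, an assumption rather than something this notebook confirms, is that a few rows have a missing `dti_quintile`, since `groupby` drops missing keys by default.

```python
# Counts copied from the three segmentation outputs in this notebook.
tables = {
    "FICO": {"loans": [1526, 12985, 96065, 24035],
             "defaults": [102, 1303, 19224, 6307]},
    "DTI": {"loans": [26933, 26939, 26918, 26905, 26881],
            "defaults": [4119, 4483, 5127, 5932, 7264]},
    "Income": {"loans": [38640, 68170, 19411, 8390],
               "defaults": [8872, 13628, 3154, 1282]},
}

# Portfolio-level totals implied by each table.
totals = {name: (sum(t["loans"]), sum(t["defaults"])) for name, t in tables.items()}
for name, (loans, defaults) in totals.items():
    print(f"{name:<7} loans={loans:,} defaults={defaults:,}")

# Gap of each table relative to the FICO totals (loans, defaults).
reference = totals["FICO"]
gaps = {name: (reference[0] - loans, reference[1] - defaults)
        for name, (loans, defaults) in totals.items()}
print(gaps)
```

If the gap does come from missing quintiles, passing `dropna=False` to `groupby` would surface those rows as their own segment instead of dropping them.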


#### 6. Conclusion and Export
**Purpose:** Finalize the analysis by structuring the quantified risk insights into a primary deliverable. This section exports the final, auditable FICO, DTI, and Income segmentation tables into a single Excel report.

In [74]:
# map each Excel sheet name to the (unstyled) table written to it
report_dataframes = {
    'FICO_Segmentation': fico_report,
    'DTI_Segmentation': dti_report,
    'Income_Segmentation': income_report
}

# make folder to store the segmentation tables
os.makedirs('../data/processed', exist_ok=True)

output_file = '../data/processed/risk_segmentation_report.xlsx'

# write one sheet per table; the context manager saves and closes the workbook
with pd.ExcelWriter(output_file, engine='xlsxwriter') as writer:
    for sheet_name, df_to_write in report_dataframes.items():
        df_to_write.to_excel(writer, sheet_name=sheet_name, index=False)

#### 7. Summary and Next Steps

##### Summary
This notebook successfully concluded the quantitative analysis and data preparation phase by:

1. **Quantifying Risk:** Generated three segmented reports (FICO, DTI, Income) detailing the Observed Default Rate (ODR), Total Loans, and Default Count for every risk segment.
2. **Professional Formatting:** Applied correct column renaming, categorical ordering, and professional number formatting for clarity.
3. **Data Checkpoint:** Successfully exported all three critical reports into a single, clean Excel workbook (`risk_segmentation_report.xlsx`), making the data consumption-ready for the next step.
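
The export checkpoint in item 3 can be spot-checked by reading a file back and comparing it with the frame that was written. The sketch below is an assumption-laden illustration: it uses a CSV round trip (the other export format named in the notebook purpose) rather than the Excel workbook, a two-row stand-in for `fico_report`, and a temporary folder with a hypothetical file name.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for one of the report tables (figures from the FICO table).
fico_report = pd.DataFrame({
    "FICO Bin": ["Excellent (800+)", "Very Good (740-799)"],
    "Total Loans": [1526, 12985],
    "Default Count": [102, 1303],
})
fico_report["ODR"] = fico_report["Default Count"] / fico_report["Total Loans"]

# Hypothetical output location; a temporary folder keeps the example side-effect free.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "fico_segmentation.csv")
fico_report.to_csv(csv_path, index=False)

# Read the file back and confirm the contents match what was written.
round_trip = pd.read_csv(csv_path)
pd.testing.assert_frame_equal(round_trip, fico_report)
print("round trip OK:", csv_path)
```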

##### Project Status: Ready for Visualization
The entire analytical project, from data ingestion (Notebook 1) through reporting (Notebook 4), is complete. The results are locked down and saved in the Excel deliverable. The project is now moving from data preparation to visual data storytelling.

##### Next Steps: Tableau Dashboard Creation
The final deliverable is the interactive dashboard. The next phase will use the output of this notebook directly in Tableau:

1. **Data Source Connection:** Import the `risk_segmentation_report.xlsx` file into Tableau. Since the data is already aggregated, this single file serves as the definitive source for the dashboard.
2. **Visualization Design:** Create the required credit risk charts:
   - **Trend Visualizations:** Line or bar charts showing ODR vs. FICO and ODR vs. DTI.
   - **Volume Metrics:** Bar charts showing loan volume (Total Loans) to provide context alongside risk rates.
   - **Summary Table:** An interactive table using the final formatted data to allow users to quickly view all metrics per segment.
3. **Dashboard Assembly:** Integrate the sheets into a final, interactive dashboard with clear titles, filters, and annotations that communicate the key risk drivers to stakeholders.
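
For the summary table and shared filters, one possible layout is a single long table that stacks the three segmentation tables and tags each row with its risk dimension. This is a sketch of an alternative layout, not what the notebook exports. `to_long` and the `Dimension` and `Segment` column names are hypothetical, and the two-row frames are stand-ins built from figures in the tables above.

```python
import pandas as pd

# Two-row stand-ins for the exported tables (figures from this notebook's outputs).
fico_report = pd.DataFrame({"FICO Bin": ["Excellent (800+)", "Subprime/Poor (<670)"],
                            "Total Loans": [1526, 24035],
                            "Default Count": [102, 6307]})
dti_report = pd.DataFrame({"DTI Quintile": ["Q1 (Lowest DTI)", "Q5 (Highest DTI)"],
                           "Total Loans": [26933, 26881],
                           "Default Count": [4119, 7264]})
income_report = pd.DataFrame({"Annual Income Bracket": ["< $50k", "> $150k"],
                              "Total Loans": [38640, 8390],
                              "Default Count": [8872, 1282]})


def to_long(df, dimension):
    """Rename the first column to 'Segment' and tag every row with the dimension name."""
    out = df.rename(columns={df.columns[0]: "Segment"})
    out.insert(0, "Dimension", dimension)
    return out


combined = pd.concat(
    [to_long(fico_report, "FICO"),
     to_long(dti_report, "DTI"),
     to_long(income_report, "Income")],
    ignore_index=True,
)
# Recompute ODR from the counts so the stacked table stays internally consistent.
combined["ODR"] = combined["Default Count"] / combined["Total Loans"]
print(combined)
```

With this shape, a single filter on `Dimension` could switch a chart between FICO, DTI, and Income views.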