# IFRS 9 ECL Project - Data Loading & Exploration

**Author:** Sebastian Gonzalez  
**Date:** January 2025  

## Project Context

This notebook is part of a multi-step project to demonstrate an **IFRS 9** approach to credit risk using a publicly available **Credit Risk Dataset**.  
I'll ultimately show how to calculate **Expected Credit Loss (ECL)**, relying on:
1. **Probability of Default (PD)** modeling with machine learning.
2. **Loss Given Default (LGD)** assumption or derivation.
3. **Exposure at Default (EAD)** from the loan amount.
4. A simplified **macro scenario** to illustrate forward-looking requirements.
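
In its simplest single-period form (ignoring discounting and staging), the four components above combine as ECL = PD × LGD × EAD, with the macro scenario entering as a probability-weighted average over scenario-specific PDs. The snippet below is a minimal sketch of that arithmetic only. The PD values, the 45% LGD, the scenario weights, and the PD multipliers are illustrative assumptions, not values derived from the dataset or from later notebooks.

```python
import numpy as np

def expected_credit_loss(pd_, lgd, ead):
    """Single-period ECL = PD * LGD * EAD, computed element-wise per loan."""
    return np.asarray(pd_) * np.asarray(lgd) * np.asarray(ead)

# Hypothetical inputs for two loans (assumed, not taken from the dataset)
pd_base = np.array([0.02, 0.10])      # probability of default per loan
lgd = 0.45                            # assumed flat loss given default
ead = np.array([10_000.0, 35_000.0])  # exposure at default (loan amount)

ecl_base = expected_credit_loss(pd_base, lgd, ead)

# Hypothetical macro scenarios: (scenario weight, PD multiplier)
scenarios = {"base": (0.6, 1.0), "downturn": (0.4, 1.5)}

# Probability-weighted ECL across scenarios; PDs are capped at 1
ecl_weighted = sum(
    weight * expected_credit_loss(np.clip(pd_base * mult, 0.0, 1.0), lgd, ead)
    for weight, mult in scenarios.values()
)

print("Base ECL per loan:    ", ecl_base)
print("Weighted ECL per loan:", ecl_weighted)
```

The scenario weights sum to 1, so the weighted ECL is a proper expectation over the assumed scenarios.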

### Goals of this Notebook
1. **Load** the CSV (`credit_risk_dataset.csv`) from the `data/` folder.
2. **Inspect** columns, data types, and missing values.
3. **Conduct** a brief exploratory data analysis (EDA), e.g., basic distributions.
4. **Plan** how to handle any data cleaning or feature transformations for IFRS 9 calculations.


In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Configure visualizations
%matplotlib inline
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)

# Show up to 30 columns when displaying DataFrames
pd.set_option('display.max_columns', 30)

# Load dataset
df = pd.read_csv("../data/credit_risk_dataset.csv")

## Dataset Overview

In [7]:
# Basic checks
print('Data shape:', df.shape)
print('\nFirst 5 rows:')
display(df.head())
print('\nData types:')
print(df.dtypes)
print('\nMissing values per column:')
print(df.isnull().sum())

Data shape: (32581, 12)

First 5 rows:


| | person_age | person_income | person_home_ownership | person_emp_length | loan_intent | loan_grade | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_default_on_file | cb_person_cred_hist_length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22 | 59000 | RENT | 123.0 | PERSONAL | D | 35000 | 16.02 | 1 | 0.59 | Y | 3 |
| 1 | 21 | 9600 | OWN | 5.0 | EDUCATION | B | 1000 | 11.14 | 0 | 0.1 | N | 2 |
| 2 | 25 | 9600 | MORTGAGE | 1.0 | MEDICAL | C | 5500 | 12.87 | 1 | 0.57 | N | 3 |
| 3 | 23 | 65500 | RENT | 4.0 | MEDICAL | C | 35000 | 15.23 | 1 | 0.53 | N | 2 |
| 4 | 24 | 54400 | RENT | 8.0 | MEDICAL | C | 35000 | 14.27 | 1 | 0.55 | Y | 4 |



Data types:
person_age                      int64
person_income                   int64
person_home_ownership          object
person_emp_length             float64
loan_intent                    object
loan_grade                     object
loan_amnt                       int64
loan_int_rate                 float64
loan_status                     int64
loan_percent_income           float64
cb_person_default_on_file      object
cb_person_cred_hist_length      int64
dtype: object

Missing values per column:
person_age                       0
person_income                    0
person_home_ownership            0
person_emp_length              895
loan_intent                      0
loan_grade                       0
loan_amnt                        0
loan_int_rate                 3116
loan_status                      0
loan_percent_income              0
cb_person_default_on_file        0
cb_person_cred_hist_length       0
dtype: int64
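
The counts above mean roughly 2.7% of `person_emp_length` (895 of 32,581) and 9.6% of `loan_int_rate` (3,116 of 32,581) are missing. The first displayed row also shows an employment length of 123 years for a 22-year-old, which is implausible. As a sketch of how the cleaning step could be planned, the helpers below summarize missingness, flag rows where employment length exceeds age, and apply median imputation with an indicator column. The helper names (`missing_summary`, `impute_median_with_flag`) and the toy frame are hypothetical, and median imputation is only one possible choice, not necessarily the one used later in the project.

```python
import numpy as np
import pandas as pd

def missing_summary(frame):
    """Count and percentage of missing values per column, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)

def impute_median_with_flag(frame, columns):
    """Fill NaNs with the column median and record which rows were filled."""
    out = frame.copy()
    for col in columns:
        out[f"{col}_missing"] = out[col].isnull().astype(int)
        out[col] = out[col].fillna(out[col].median())
    return out

# Toy frame with made-up gaps, for illustration only (not the real data)
toy = pd.DataFrame({
    "person_age": [22, 21, 25, 23],
    "person_emp_length": [123.0, 5.0, np.nan, 4.0],
    "loan_int_rate": [16.02, np.nan, np.nan, 15.23],
})

summary = missing_summary(toy)

# Employment length longer than the borrower's age cannot be right;
# comparisons against NaN evaluate to False, so missing rows are not flagged
implausible = toy["person_emp_length"] > toy["person_age"]

clean = impute_median_with_flag(toy, ["person_emp_length", "loan_int_rate"])

print(summary)
print("Implausible employment length rows:", implausible.sum())
```

On the real data the same calls would take `df` in place of `toy`. Note that an extreme value such as 123 still participates in the median here, so handling implausible values before imputing may be preferable.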
