# IS4487 Week 5 - Practice Code

This notebook is designed to help you follow along with the **Week 5 Lecture and Reading**.

The practice code demos give you working code to study, and you can draw on them for your lab and assignment work. Each section contains short explanations and annotated code that reflect the steps in the reading.

### Topics for this demo:
- Univariate analysis
- Multivariate analysis

<a href="https://colab.research.google.com/github/vandanara/UofUtah_IS4487/blob/main/Demos/demo_05_eda.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Context: Financial Risk Assessment Analytics
FinTrust Bank is a mid-sized financial institution that provides consumer loans, credit cards, and mortgages. The risk analytics team is investigating patterns in customer borrowing behavior to better understand which customers are most likely to default on their loans.

Your task is to explore the data to understand how each variable behaves in the financial risk assessment.

## Step 1: Import All Necessary Libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Step 2: Load Dataset (replace with your own file if needed)

In [None]:
url = 'https://raw.githubusercontent.com/Stan-Pugsley/is_4487_base/refs/heads/main/DataSets/fintrust_loans.csv'  # Replace with uploaded URL or path
df = pd.read_csv(url)

## Step 3: Preview and Basic Info

In [None]:
print(df.head())

In [None]:
df.info()

In [None]:
df.describe()

## Step 4: Check for Missing Values

In [None]:
print("\nMissing Values:")
print(df.isnull().sum())
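
Raw counts of missing values can be hard to compare across columns. One possible follow-up, sketched here on a small invented DataFrame rather than on the FinTrust data, is to express missing values as a percentage of rows. The `toy` frame and all of its values are assumptions made for illustration; only the column names echo the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only
toy = pd.DataFrame({
    "CreditScore": [700, np.nan, 650, 720],
    "LoanAmount": [10000, 15000, np.nan, np.nan],
    "LoanStatus": ["Paid", "Default", "Paid", "Paid"],
})

# isnull().mean() is the fraction of missing entries per column;
# multiplying by 100 turns it into a percentage
missing_pct = toy.isnull().mean().mul(100).round(1)

# List the columns with the most missing data first
print(missing_pct.sort_values(ascending=False))
```

The same two lines should work on `df` once it is loaded, since they rely only on `isnull()` and `mean()`.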

## Step 5: Univariate Visualization

In [None]:
# Histogram with a kernel density estimate (KDE) overlay; missing scores are dropped first
sns.histplot(df['CreditScore'].dropna(), kde=True)
plt.title("Distribution of Credit Score")
plt.show()
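
A histogram shows the shape of a distribution visually. A few summary numbers can back up what the plot suggests. The sketch below uses an invented series, `toy_scores`, which is an assumption and not taken from the FinTrust data. It compares the mean with the median and computes the sample skewness.

```python
import pandas as pd

# Hypothetical credit scores with one unusually high value (invented data)
toy_scores = pd.Series([580, 600, 610, 620, 630, 640, 800], name="CreditScore")

mean_score = toy_scores.mean()
median_score = toy_scores.median()
skewness = toy_scores.skew()  # sample skewness; positive means a longer right tail

print(f"Mean:     {mean_score:.1f}")
print(f"Median:   {median_score:.1f}")
print(f"Skewness: {skewness:.2f}")
```

A mean noticeably above the median, together with positive skewness, points to a right-skewed distribution, which is what a long right tail in the histogram would also indicate.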

In [None]:
# Box plots of applicant income, drawn separately for each loan status
sns.boxplot(x='LoanStatus', y='ApplicantIncome', data=df)
plt.title("Applicant Income by Loan Status")
plt.show()

## Step 6: Bivariate Visualization

In [None]:
correlation = df[['LoanAmount', 'CreditScore']].corr()
print(correlation)

In [None]:
sns.scatterplot(x='LoanAmount', y='CreditScore', hue='LoanStatus', data=df)
plt.title("Loan Amount vs. Credit Score")
plt.show()

In [None]:
sns.countplot(x='EmploymentStatus', hue='LoanStatus', data=df)
plt.title("Loan Status by Employment Status")
plt.xticks(rotation=45)
plt.show()
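
A count plot shows raw counts, so groups of different sizes can be hard to compare. As a rough numeric companion, the sketch below builds a row-normalized cross-tabulation with `pd.crosstab`. The `toy` frame and its category labels ("Employed", "Paid", and so on) are assumptions for illustration and may not match the values in the FinTrust file.

```python
import pandas as pd

# Hypothetical toy data; the category labels are invented and may differ from the real dataset
toy = pd.DataFrame({
    "EmploymentStatus": ["Employed", "Employed", "Employed", "Employed",
                         "Unemployed", "Unemployed"],
    "LoanStatus": ["Paid", "Paid", "Paid", "Default", "Default", "Paid"],
})

# normalize="index" divides each row by its row total,
# giving the share of each loan status within an employment group
rates = pd.crosstab(toy["EmploymentStatus"], toy["LoanStatus"], normalize="index")

print(rates.round(2))
```

Each row sums to 1, so the table reads as "of the customers in this employment group, what share fall under each loan status".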

## Step 7: Outlier Detection using IQR

In [None]:
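# Quartiles and interquartile range (IQR) of the loan amounts
Q1 = df['LoanAmount'].quantile(0.25)
Q3 = df['LoanAmount'].quantile(0.75)
IQR = Q3 - Q1
# Flag rows more than 1.5 * IQR below Q1 or above Q3
outliers = df[(df['LoanAmount'] < Q1 - 1.5 * IQR) | (df['LoanAmount'] > Q3 + 1.5 * IQR)]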
print(f"\nNumber of Loan Amount Outliers: {len(outliers)}")
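
The IQR rule above can be wrapped in a small helper so it is easy to reuse on other numeric columns. This is a minimal sketch: the function name `iqr_bounds`, its `k` parameter, and the `toy_amounts` series are all hypothetical and not part of the original notebook.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) cutoffs of the IQR rule for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical loan amounts (invented data) with one extreme value
toy_amounts = pd.Series([10, 12, 13, 14, 15, 16, 18, 100], name="LoanAmount")

lower, upper = iqr_bounds(toy_amounts)

# Keep only the values that fall outside the cutoffs
outlier_values = toy_amounts[(toy_amounts < lower) | (toy_amounts > upper)]

print(f"Lower cutoff: {lower}, upper cutoff: {upper}")
print(f"Outliers: {outlier_values.tolist()}")
```

Keeping the multiplier as a parameter makes it simple to try a stricter cutoff such as `k=3` when 1.5 flags too many points.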

## Step 8: Ethical Reflection Prompt

Question: What are the risks of using demographic data such as 'Gender' or 'Marital Status' to assess default risk? How can we mitigate bias in modeling?

Please write your answer here: