# **Project Name**    -



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual



# **Project Summary -**

Paisabazaar helps customers access banking and credit products by evaluating their creditworthiness. A key part of this evaluation is the credit score, which indicates how likely a person is to repay loans. Accurately classifying credit scores allows Paisabazaar to improve credit risk assessment, reduce loan defaults, and provide personalized financial advice, leading to better decision-making and product recommendations.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


To accurately classify the credit score of customers (as Good, Standard, or Poor) based on their demographic, financial, and behavioral data.


#### **Define Your Business Objective?**

Answer Here.


The objective of conducting EDA (Exploratory Data Analysis) is to understand patterns and trends in customer data that influence their credit scores (Good, Standard, Poor).

* Identify Key Demographic Factors – Such as Age, Occupation, and Income levels that are associated with good or poor credit scores.

* Analyze Financial Behavior – Including outstanding debt, number of loans, and credit utilization ratio to determine repayment capacity.

* Understand Behavioral Patterns – Such as delayed payments and minimum amount payments, which strongly affect credit risk.




# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
dataset = pd.read_csv("/content/policy bazar.csv")
dataset

### Dataset First View

In [None]:
# Dataset First Look
dataset.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
dataset.shape

### Dataset Information

In [None]:
# Dataset Info
dataset.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
dataset.duplicated().sum()  # duplicated() must be called; it returns a boolean mask of repeated rows

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
dataset.isnull().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10,6))
sns.heatmap(dataset.isnull(),cbar=False)
plt.title("Missing Values", fontdict={"fontsize": 20, "fontweight": "bold"})
plt.xlabel("Column name")  # heatmap columns correspond to dataset columns
plt.ylabel("Row index")    # heatmap rows correspond to dataset rows
plt.show()

### What did you know about your dataset?


Answer Here: The dataset contains no missing values.
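The missing-value and duplicate checks above can be combined into one small summary. This is a minimal sketch only: the helper name `data_quality_summary` and the toy frame are illustrative assumptions, not part of the original notebook, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def data_quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing-value counts and percentages (hypothetical helper)."""
    missing = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": missing,
        "missing_pct": (missing / len(df) * 100).round(2),
    })
    # Columns with the most missing values come first
    return summary.sort_values("missing_count", ascending=False)


# Toy frame standing in for the real dataset (values are invented for illustration)
toy = pd.DataFrame({
    "Age": [25, 32, np.nan, 41],
    "Occupation": ["Lawyer", "Teacher", "Teacher", None],
    "Credit_Score": ["Good", "Poor", "Standard", "Standard"],
})

summary = data_quality_summary(toy)
duplicate_rows = int(toy.duplicated().sum())  # fully repeated rows
print(summary)
print("Duplicate rows:", duplicate_rows)
```

Calling `data_quality_summary(dataset)` on the real data would be expected to show zeros in every row if the dataset indeed has no missing values.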

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
dataset.columns

In [None]:
# Dataset Describe
dataset.describe()

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in dataset.columns.to_list():
    print("Number of unique values in", col, "is", dataset[col].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
count_credit_score = dataset.loc[:, 'Credit_Score'].value_counts()  # number of customers per credit score
count_credit_score

In [None]:
unique_occupation = dataset.loc[:, 'Occupation'].unique()  # unique occupations
unique_occupation

In [None]:
count_credit_Mix = dataset['Credit_Mix'].value_counts()  # number of customers per credit mix
count_credit_Mix

In [None]:
unique_payment_of_min_account = dataset.loc[:, 'Payment_of_Min_Amount'].unique()  # unique values of Payment_of_Min_Amount
unique_payment_of_min_account

In [None]:
unique_payment_behaviour = dataset.loc[:, 'Payment_Behaviour'].value_counts()  # number of customers per payment behaviour
unique_payment_behaviour

In [None]:
age_count = dataset['Age'].value_counts().sort_index(ascending=True)  # number of customers per age, sorted by age
age_count

### What all manipulations have you done and insights you found?

Answer Here:
* There are three credit score categories (Good, Standard, Poor).
* There are 15 distinct occupations.
* There are 6 distinct payment behaviours.



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1: Age vs Credit Score

In [None]:
# Chart - 1 visualization code
sns.histplot(data=dataset, x="Age", hue="Credit_Score", multiple="stack", palette="Set2")
plt.title("Age Distribution by Credit Score")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: A stacked histogram shows the distribution of Age across the Credit Score categories (Good, Standard, Poor).

It helps compare how each credit score group is spread over different age ranges.

Stacking makes it easier to see the total count of customers per age group as well as the relative proportion of each credit score.



##### 2. What is/are the insight(s) found from the chart?

Answer Here: Most customers are roughly 20 to 45 years old.

Standard credit score dominates across almost all age groups.

Poor credit score is more common among younger customers (around 20–30 years).

Good credit score is relatively higher in customers above 30 years compared to younger ones.
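The age-group observations above are read off a stacked histogram, so one way to check them numerically is a row-normalized cross-tabulation. The sketch below is illustrative only: the bin edges, labels, and toy values are assumptions, not taken from the dataset.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are invented)
toy = pd.DataFrame({
    "Age": [22, 27, 29, 34, 38, 45, 52, 24],
    "Credit_Score": ["Poor", "Poor", "Standard", "Good",
                     "Standard", "Good", "Standard", "Standard"],
})

# Hypothetical age bands; pd.cut uses right-closed intervals by default,
# so these are (18, 30], (30, 45], (45, 60]
age_bins = [18, 30, 45, 60]
age_labels = ["18-30", "31-45", "46-60"]
toy["Age_Group"] = pd.cut(toy["Age"], bins=age_bins, labels=age_labels)

# normalize="index" makes each row (age band) sum to 1
share = pd.crosstab(toy["Age_Group"], toy["Credit_Score"], normalize="index")
print(share.round(2))
```

On the real data, replacing `toy` with `dataset` would give the share of Good, Standard, and Poor scores within each age band, which makes the comparison independent of how many customers fall in each band.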

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:
* Positive Business Impact:

Helps financial institutions target specific age groups for credit improvement programs.

Younger customers (20–30) with poor scores can be educated or offered low-risk products to build credit history.

Customers with good scores can be offered premium products/loans, increasing revenue.

* Negative Growth Risks:

The higher proportion of poor scores among younger customers indicates higher credit risk for this group.

If not managed, this can increase loan defaults, leading to negative growth.

Businesses might lose potential future customers if young people fail to improve their credit early.

#### Chart - 2: Occupation vs Credit Score

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(10,6))
sns.countplot(data=dataset, x='Occupation',hue='Credit_Score',palette='viridis')
plt.xticks(rotation=45)
plt.title('Occupation vs Credit Score')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: It clearly compares Credit Score (Good, Standard, Poor) across different occupations.

It helps identify which professions have higher or lower creditworthiness.

Bar charts are ideal for comparing categorical variables (Occupation) with frequency counts.



##### 2. What is/are the insight(s) found from the chart?

Answer Here: Standard credit score is the most common across all occupations.

Professions like Lawyer, Media Manager, and Developer have slightly higher counts of Standard scores.

Good credit scores are less frequent than Standard scores across all occupations, suggesting that relatively few customers in any profession maintain a Good score.

Poor credit scores are relatively higher in Entrepreneurs, Engineers, and Teachers compared to some other professions.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:
* Positive Business Impact:

Lenders can design credit improvement programs for professions where poor credit scores are higher.

Occupations with higher good scores can be offered premium loans or credit cards.

Helps in risk-based pricing by tailoring interest rates for specific occupational segments.

* Negative Growth Risks:

Occupations such as Entrepreneurs and Teachers have relatively more poor credit scores, which may indicate higher lending risk.

If businesses lend without considering this insight, it could increase default rates and financial losses.

#### Chart - 3: Annual Income vs credit score

In [None]:
# Chart - 3 visualization code
sns.boxplot(data=dataset, x='Credit_Score', y='Annual_Income', color='blue')
plt.title('Annual Income by Credit Score')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: A boxplot was selected because:

It effectively shows distribution, median, and outliers of Annual Income for each Credit Score category (Good, Standard, Poor).

Helps identify income ranges and detect income disparity among credit score groups.

Boxplots are best for comparing continuous variables (Annual Income) across categorical groups (Credit Score).

##### 2. What is/are the insight(s) found from the chart?

Answer Here: Customers with Good credit scores tend to have higher median annual incomes than the Standard and Poor groups.

The Poor credit score group has the lowest median income, but also many outliers with very high income.

The income range is widest in the Good score group, which is consistent with higher-income customers being more likely to have better credit scores.
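The medians and ranges described above can also be tabulated directly instead of being read off the boxplot. A minimal sketch, assuming the column names `Credit_Score` and `Annual_Income` used in the chart; the toy values are invented for illustration.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are invented)
toy = pd.DataFrame({
    "Credit_Score": ["Good", "Good", "Standard", "Standard", "Poor", "Poor"],
    "Annual_Income": [90000, 70000, 50000, 60000, 30000, 40000],
})

# Per-group count, median, and range, ordered by median income
income_by_score = (
    toy.groupby("Credit_Score")["Annual_Income"]
    .agg(["count", "median", "min", "max"])
    .sort_values("median", ascending=False)
)
print(income_by_score)
```

The same aggregation applies to any other numeric column, for example `Outstanding_Debt`, by swapping the column name.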

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:
* Positive Business Impact:

Lenders can design credit products suitable for each income-credit score segment.

Customers with higher incomes but poor scores can be targeted for financial literacy programs to improve their scores.

Helps in credit risk assessment by considering income as a factor before lending.

* Negative Growth Risks:

Presence of high-income individuals with poor credit scores indicates poor financial management despite earnings.

Lending to such individuals without proper risk evaluation may lead to defaults, resulting in negative growth.






#### Chart - 4: Outstanding debt vs Credit score

In [None]:
# Chart - 4 visualization code:Boxplot
sns.boxplot(data=dataset,x='Credit_Score',y='Outstanding_Debt',color='Orange')
plt.title('Outstanding Debt Vs Credit Score')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: A boxplot was chosen because:

It helps compare the distribution, median, and outliers of Outstanding Debt across Credit Score categories.

Shows how debt levels differ between Good, Standard, and Poor credit scores.

Boxplots are best for analyzing continuous financial variables grouped by categories.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:
Customers with Poor credit scores have the highest median outstanding debt, which indicates a strong link between high debt and poor credit scores.

Good credit score customers generally have lower debt levels, but a few outliers have very high debt despite a good score.

The Standard score group lies in between but closer to the Poor group in terms of debt distribution.
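The outliers mentioned above are the points a boxplot draws beyond its whiskers. Assuming the common 1.5 × IQR whisker rule (seaborn's default `whis=1.5`), they can be counted per credit score group as sketched below. The helper name `count_iqr_outliers` and the toy values are illustrative assumptions.

```python
import pandas as pd


def count_iqr_outliers(values: pd.Series) -> int:
    """Count points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (hypothetical helper)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(((values < lower) | (values > upper)).sum())


# Toy data standing in for the real dataset (values are invented)
toy = pd.DataFrame({
    "Credit_Score": ["Good"] * 6 + ["Poor"] * 6,
    "Outstanding_Debt": [100, 120, 110, 130, 115, 900,
                         1500, 1600, 1550, 1700, 1650, 1580],
})

outliers_by_score = (
    toy.groupby("Credit_Score")["Outstanding_Debt"].apply(count_iqr_outliers)
)
print(outliers_by_score)
```

In this toy example only the single very high debt in the Good group falls outside the whiskers, mirroring the "good score but very high debt" case described above.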

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:
* Positive Business Impact:

Helps lenders identify high-risk customers (those with high outstanding debt and poor scores) and limit lending or increase monitoring.

Financial institutions can offer debt management programs to customers with high debts to prevent defaults.

Enables better risk-based interest rate decisions based on outstanding debt.

* Negative Growth Risks:

A large group of customers with high debt and poor scores represents a potential default risk.

If lenders provide loans without considering this insight, it can increase NPAs (Non-Performing Assets), leading to financial losses.

Outliers with good scores but very high debt might pose a hidden risk if their repayment ability suddenly drops.



#### Chart - 5: Correlation Heatmap of Financial Variables

In [None]:
# Chart - 5 visualization code: Heatmap
sns.heatmap(dataset.corr(numeric_only=True),annot=False,cmap='coolwarm')
plt.title('Correlation Heatmap of Financial Variables')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: Correlation heatmap was chosen because:

It shows the strength and direction of relationships between different financial variables.

It helps identify which variables are strongly correlated, which is useful for feature selection in credit score prediction.

The color gradient makes it easy to visualize positive and negative correlations.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:

Annual Income is strongly correlated with Monthly Inhand Salary.

Outstanding Debt is positively correlated with Credit Utilization Ratio and Total EMI per month, meaning higher debt tends to go together with higher utilization and EMIs.

Num of Loans and Interest Rate also show moderate positive correlation.

Monthly Balance has a negative correlation with Outstanding Debt, which is logical (more debt → less remaining balance).
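Because the heatmap is drawn with `annot=False`, the strength of each relationship has to be judged by colour alone. One way to make the strongest pairs explicit is to rank the entries of the correlation matrix. This is a sketch only: the helper name `top_correlations` and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n variable pairs with the largest absolute correlation (hypothetical helper)."""
    corr = df.corr(numeric_only=True)
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)


# Toy data standing in for the real dataset (values are invented)
toy = pd.DataFrame({
    "Annual_Income": [30000, 45000, 60000, 75000, 90000],
    "Monthly_Inhand_Salary": [2500, 3750, 5000, 6250, 7500],
    "Outstanding_Debt": [2000, 1500, 1800, 900, 700],
})

top = top_correlations(toy, n=3)
print(top)
```

The sign of each value is preserved, so negative relationships such as the one between balance and debt remain distinguishable from positive ones.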



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:
* Positive Business Impact:

Strong correlation between Annual Income and Monthly Inhand Salary → helps predict repayment capacity.

Credit Utilization Ratio and Outstanding Debt insights → useful for building better credit scoring models.

Identifying customers with good income and low utilization → helps target low-risk borrowers.

* Negative Growth Risks:

High Outstanding Debt with high Interest Rate → may increase defaults and reduce customer loyalty.

Positive link between Delay from Due Date and Number of Delayed Payments → shows habitual late payers, increasing NPAs.

Negative correlation between Monthly Balance and Credit Utilization Ratio → high credit users with low balance are risky.

#### Chart - 6: Payment behaviour Vs credit score

In [None]:
# Chart - 6 visualization code: countplot
plt.figure(figsize=(10,6))
sns.countplot(data=dataset,x='Payment_Behaviour',hue='Credit_Score',palette= 'plasma')
plt.title('Payment Behaviour vs Credit score')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: This grouped bar chart was chosen because:

It compares Payment Behaviour categories across different Credit Scores (Good, Standard, Poor).

It visually shows the distribution of customers’ payment patterns and how they are related to creditworthiness.

It is easy to compare multiple groups (Good, Standard, Poor) within each payment behaviour category.

##### 2. What is/are the insight(s) found from the chart?

Answer Here: Customers with Low_spent_Small_value_payments have the highest count across all credit score categories, especially Standard and Poor.

Good credit score customers are less frequent in almost every payment behaviour compared to Standard and Poor.

High_spent_Large_value_payments and Low_spent_Large_value_payments categories have fewer Poor credit score customers, implying higher spending on large payments might be linked with better financial discipline.

Standard credit score customers dominate all categories, followed by Poor, then Good.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:  
* Positive Business Impact:

Banks or financial companies can identify high-risk customers (more in Low_spent_Small_value_payments) and design interventions like financial literacy programs or tailored repayment plans.

* Negative Impact:

The large proportion of customers in Low_spent_Small_value_payments with Poor credit scores shows higher risk of non-payment or low profitability.

#### Chart - 7: Delayed payments by credit Score

In [None]:
# Chart - 7 visualization code: boxplot
sns.boxplot(data=dataset,x='Credit_Score', y='Num_of_Delayed_Payment',color='Red')
plt.title('Delayed payments vs credit_score')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.
A boxplot is chosen because it effectively shows the distribution, median, and spread of delayed payments for each credit score category.

It also highlights outliers (extreme delayed payments) which are important in credit risk analysis.

It helps compare the central tendency (median) and variability of delayed payments across Good, Standard, and Poor credit scores.


##### 2. What is/are the insight(s) found from the chart?

Answer Here
Good credit score customers have the lowest median number of delayed payments and a smaller spread.

Standard credit score customers have a higher median and wider variability of delayed payments.

Poor credit score customers show the highest median delayed payments, with a larger spread.

A few outliers exist in Good credit score customers, meaning some still delay payments significantly despite having a good score.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:
* Positive Impact:

Helps in risk-based segmentation – customers with higher delayed payments can be targeted for stricter credit limits or repayment reminders.

* Negative Impact:

If a company extends more credit to Standard or Poor credit score customers without monitoring delayed payments, it can lead to increased defaults and financial losses.

#### Chart - 8: Credit Inquiries by Credit Score

In [None]:
# Chart - 8 visualization code: Boxplot
sns.boxplot(data=dataset,x="Credit_Score",y="Num_Credit_Inquiries",color='Green')
plt.title('Credit Inquiries by Credit Score')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: A boxplot compares the number of credit inquiries across the credit score categories.
It also shows the data distribution and outliers, which are important for understanding risky behaviour.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:
* Poor credit score: the median number of inquiries lies between 7 and 8.
* Good credit score: the median number of inquiries lies between 2 and 3.

From this we can generalize that a higher number of credit inquiries is associated with a poor credit score, possibly because frequent inquiries reflect financial instability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here
* Positive Impact:

Lenders can use credit inquiry count as a key risk factor while approving loans.

Customers with frequent inquiries can be flagged as high-risk and given stricter lending criteria.

* Negative Growth Insight:

If a company aggressively approves loans for customers with too many credit inquiries, loan default risk will increase, leading to bad debt growth.





#### Chart - 9: Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(dataset[['Annual_Income','Monthly_Inhand_Salary','Outstanding_Debt']])
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:
A pairplot visualizes all pairwise relationships between continuous variables and also shows the marginal distribution of each variable via histograms.

It makes it possible to spot correlations, clusters, outliers, and distribution shapes at a glance.


##### 2. What is/are the insight(s) found from the chart?

Answer Here:
Annual Income and Monthly Inhand Salary distributions appear right-skewed, with many lower values and a long tail toward higher incomes.

Outstanding Debt also shows skewness: most users carry low-to-moderate debt, with fewer having very high balances.
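The skewness described above is judged visually from the pairplot histograms. It can also be quantified with pandas' `DataFrame.skew`, where a positive value indicates a longer right tail. The sketch below uses invented toy values, not the real dataset.

```python
import pandas as pd

# Toy data with one large value per column to mimic a long right tail (values are invented)
toy = pd.DataFrame({
    "Annual_Income": [20000, 22000, 25000, 27000, 30000, 150000],
    "Outstanding_Debt": [500, 600, 550, 650, 700, 5000],
})

# Sample skewness per numeric column; values above 0 indicate right skew
skewness = toy.skew(numeric_only=True)
print(skewness.round(2))
```

Running the same call on the three pairplot columns of `dataset` would show whether the visual impression of right skew holds numerically.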

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.
* Young customers (20–30): Offer low-risk credit products and financial literacy programs to build credit history.

* High-risk occupations (Entrepreneurs, Teachers): Apply stricter risk checks and provide credit improvement support.

* High-income poor scorers: Evaluate debt-to-income ratio and offer debt management plans.

* High outstanding debt customers: Limit lending and provide repayment assistance to reduce NPAs.

* Frequent credit inquiries: Use inquiry count as a key risk factor to avoid over-leveraged borrowers.

* Good scorers with low utilization: Target with premium products to increase revenue.

* Habitual late payers: Set stricter limits, repayment reminders, and auto-payment options to reduce defaults.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***