<a href="https://colab.research.google.com/github/siddhantpagare486/PaisaBazaar_Project/blob/main/PaisaBazaar_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **Paisabazaar Credit Score Prediction and Risk Profiling**


### **Project Type**    - **Exploratory Data Analysis**
### **By - Siddhant Pagare**


## **Project Summary -**




This project aims to build a robust credit score classification model for Paisabazaar, a leading financial services platform, using a systematic Exploratory Data Analysis (EDA) and machine learning approach. The primary objective is to predict whether a customer’s credit score is Good, Standard, or Poor based on financial behavior and personal data, such as income, credit utilization, payment history, and outstanding debt. The project begins with technical documentation and data exploration in a Colab Notebook, using commands like .head(), .tail(), and .describe() to understand the dataset. A data dictionary is created to define each feature clearly.

Missing values and null entries are detected and handled using imputation strategies based on column types, ensuring clean and usable data. Outliers are identified using Interquartile Range (IQR) and capped or transformed appropriately. Correlation heatmaps, pairplots, and trend analysis are used to derive meaningful insights from the data, such as the impact of late payments or high credit usage on credit scores. The project adheres to the milestones in the problem statement by incorporating five or more types of visualizations—bar plots, histograms, pie charts, box plots, and scatter plots—to present patterns and distribution.

The code is modular, well-commented, and outputs are properly formatted for clarity. A final summary consolidates the findings from EDA, data cleaning, and model training. For presentation, a video walkthrough will be created, ensuring grammatical accuracy and fluency, clearly explaining the approach, insights, and model logic to stakeholders. The project not only fulfills the rubric requirements but also demonstrates a scalable approach to credit scoring, helping Paisabazaar enhance its risk assessment and financial recommendation systems.



# **Problem Statement**


Paisabazaar aims to improve its credit risk assessment process by accurately predicting the credit score category (Good, Standard, Poor) of individuals based on their financial and behavioral data. Credit scores are critical for financial institutions to evaluate the creditworthiness of a customer and decide loan eligibility, interest rates, and credit limits. However, manual assessment is time-consuming and inconsistent.

The objective of this project is to build a classification model that can predict the credit score category of a customer using features such as annual income, credit card utilization, outstanding debt, number of late payments, and repayment behavior. This will involve performing a thorough Exploratory Data Analysis (EDA), handling missing values and outliers, generating actionable insights, and training a machine learning model that is both accurate and interpretable.

The solution should enhance Paisabazaar’s ability to:

*   Automate creditworthiness evaluation
*   Reduce loan default risks
*   Offer personalized financial products
*   Improve customer satisfaction and financial inclusion






# **General Guidelines**

1. **Understand the Problem Clearly**

*Define the goal*: Classify customers into credit score categories (Good, Standard, Poor).

Know the business impact: Credit scoring affects loan approvals, product offerings, and default risk.

2. **Dataset Familiarization**

Explore all features and understand what each one represents.

Check data types, ranges, and missing/null values.

Create a data dictionary for easy reference.

3. **Exploratory Data Analysis (EDA)**

Use .head(), .describe(), .info() to summarize data.

Perform value counts on categorical features.

Use at least 5 types of visualizations:

*Histogram*

*Boxplot*

*Correlation Heatmap*

*Pie Chart or Count Plot*

*Scatter Plot or Pair Plot*

4. **Data Cleaning**

Handle missing values using mean, median, or mode.

Treat outliers using IQR or Z-score.

Convert categorical variables using encoding techniques.

5. **Feature Engineering**

Create new features if necessary (e.g., debt-to-income ratio).

Normalize or scale features if required.

6. **Modeling**

Try multiple classification algorithms (Logistic Regression, Random Forest, Decision Tree, XGBoost).

Split dataset into train and test sets.

Use cross-validation for better generalization.

7. **Evaluation**

Use metrics like accuracy, precision, recall, and F1-score.

Interpret the results clearly with confusion matrix.

8. **Documentation**

Keep your code modular and commented.

Format outputs for readability.

Summarize findings after EDA and modeling.

Clearly explain your steps in a Colab Notebook.

9. **Video Presentation**

Clearly articulate your approach and results.

Ensure fluency and grammatical accuracy.

Highlight visual insights, modeling choices, and impact on stakeholders.

10. **Final Submission**

Include the Colab notebook, project summary, video presentation, and all relevant files.
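
The IQR-based outlier treatment named in guideline 4 can be sketched as follows. This is a minimal illustration on toy values, not the notebook's own cleaning step: the helper name `cap_outliers_iqr`, the multiplier `k=1.5`, and the sample numbers are all assumptions made for the example.

```python
import pandas as pd

def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Toy values (hypothetical, not taken from the Paisabazaar data)
toy = pd.Series([10, 12, 11, 13, 12, 11, 100])
capped = cap_outliers_iqr(toy)
print(capped.tolist())
```

Capping keeps every row and only pulls extreme values back to the fence, which is usually gentler than dropping rows when the outliers are genuine customers.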


# ***Let's Begin !***

## ***1. Getting to know the data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
data=pd.read_csv("/content/drive/MyDrive/Paisabazaar.csv")
data

In [None]:
data.head(10) #First 10 rows

In [None]:
data.tail(7) # Last 7 rows

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data.shape

### Dataset Description

In [None]:
# Dataset Description
data.describe()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count (number of fully duplicated rows)
data.duplicated().sum()

In [None]:
#Number of unique values in the Name column
data["Name"].nunique()

#### Missing Values/Null Values and column datatype info

In [None]:
# Missing Values/Null Values Count
data.isnull().sum()

In [None]:
# Visualizing the missing values
data.isnull().sum().plot(kind='bar') # all bars have zero height because no missing values are present

In [None]:
#Column names
data.columns

In [None]:
#dataset information
data.info()

In [None]:
#get columns with float data type
data.select_dtypes(include="float64").columns


In [None]:
#length of float data type columns
len(data.select_dtypes(include="float64").columns)

In [None]:
#columns with integer data type
data.select_dtypes(include="int64").columns

In [None]:
#number of columns with integer data type
len(data.select_dtypes(include="int64").columns)

In [None]:
#object columns
data.select_dtypes(include="object").columns

In [None]:
#length of object datatype columns
len(data.select_dtypes(include="object").columns)

In [None]:
#unique values per column
for col in data.columns:
    print(f"{col}: {data[col].nunique()} unique values")


### What did you know about your dataset?

Upon initial examination, the dataset consists of 100,000 records and 28 well-structured columns. It is noteworthy that the dataset is free from duplicate entries and contains no missing values, providing a robust and reliable foundation for analysis. This eliminates the need for initial data cleaning related to null values or redundancy.

The dataset features:

1] 18 columns with float64 data type,

2] 3 columns with int64 data type, and

3] 7 columns with object data type.

This rich combination of numerical and categorical variables offers substantial potential for extracting meaningful insights and building predictive models.
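
The checks behind this summary (shape, duplicates, missing values, columns per dtype) can be gathered in one place. The sketch below runs on a small stand-in frame; `overview_demo`, its columns, and its values are hypothetical and only illustrate the calls.

```python
import pandas as pd

# Toy stand-in for the real frame (hypothetical columns and values)
overview_demo = pd.DataFrame({
    "Annual_Income": [19114.12, 34847.84, 34847.84],
    "Num_of_Loan": [4, 1, 1],
    "Occupation": ["Scientist", "Teacher", "Teacher"],
})

overview = {
    "shape": overview_demo.shape,
    # Rows that repeat an earlier row in every column
    "duplicate_rows": int(overview_demo.duplicated().sum()),
    "missing_cells": int(overview_demo.isnull().sum().sum()),
    # Number of columns per dtype
    "columns_per_dtype": overview_demo.dtypes.astype(str).value_counts().to_dict(),
}
print(overview)
```

Replacing `overview_demo` with `data` would give the corresponding figures for the full dataset.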

## ***2. Univariate Analysis***

In [None]:
# for annual income
sns.histplot(data['Annual_Income'], kde=True)
plt.title("Distribution of Annual Income")
plt.show()

In [None]:
#box plot for Annual income
fig=px.box(data,x="Annual_Income",title="Boxplot for Annual income")
fig.show()

Insights of Annual Income of Customers:-

- The distribution of annual income is **right-skewed (positively skewed)**: most customers sit at the lower end, and a few earn exceptionally high incomes.
- Median annual income is approximately ₹37,000–₹40,000.
- The middle 50% of incomes fall between roughly ₹20,000 and ₹70,000.
- Outliers are present above ₹150,000, indicating a small group of high earners.
- These findings suggest a dominant middle-income segment with a niche high-income market, both of which require different product strategies.
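
The numbers read off the box plot above (median, quartiles, outlier threshold) can also be computed directly. The sketch below is an illustration on toy incomes; `box_summary` and the sample values are assumptions, and pandas interpolates quartiles linearly, so results can differ slightly from the quartiles Plotly draws.

```python
import pandas as pd

def box_summary(series: pd.Series, k: float = 1.5) -> dict:
    """Return the numbers a box plot draws, plus the sample skewness."""
    q1, median, q3 = series.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    upper_fence = q3 + k * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_fence": upper_fence,
        "n_high_outliers": int((series > upper_fence).sum()),
        "skew": series.skew(),  # > 0 means a longer right tail
    }

# Toy incomes (hypothetical values, not taken from the dataset)
incomes = pd.Series([20_000, 25_000, 30_000, 38_000, 45_000, 70_000, 180_000])
summary = box_summary(incomes)
print(summary)
```

Calling `box_summary(data["Annual_Income"])` would give exact values for statements such as "the median is approximately ₹37,000–₹40,000".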




In [None]:
#for Outstanding Debt
sns.histplot(data["Outstanding_Debt"],kde=True,color="red")
plt.title("Distribution of Outstanding Debt")
plt.show()

In [None]:
#Box plot for Outstanding Debt for finding outliers
fig=px.box(data,x="Outstanding_Debt",title="Box Plot of Outstanding Debt",points='outliers')
fig.show()

Insights for Outstanding_Debt:-
- The distribution of `Outstanding_Debt` is **right-skewed**, meaning most users owe a relatively small amount.
- Multiple peaks in the distribution suggest the presence of **distinct customer segments**.
- The median outstanding debt is approximately ₹1300.
- 50% of users have debts between ₹500 and ₹2000.
- A significant number of outliers exist above ₹4000, with some nearing ₹5000.
- A small portion of users has very high outstanding debt (>₹4000), which could indicate **higher credit risk**.



In [None]:
#Monthly balance
sns.histplot(data['Monthly_Balance'], kde=True,color="purple")
plt.title("Distribution of Monthly Balance")
plt.show()

In [None]:
fig=px.box(data,x="Monthly_Balance",title="Box Plot of Monthly Balance",points='outliers')
fig.show()

Insights for Monthly Balance:-


*   The distribution is right-skewed (positively skewed).
*   In the box plot, the median (central vertical line in the box) lies slightly left of the box center, which is consistent with this mild right skew; the central part of the distribution is still fairly symmetric.
*   The box represents the middle 50% of the data (from Q1 to Q3).
*   This range shows the majority of customers have monthly balances between approximately ₹250 and ₹500.
*   A significant number of outliers are visible beyond ₹800, with some reaching nearly ₹1200.



In [None]:
#For number of delayed payment
sns.histplot(data,x="Num_of_Delayed_Payment",color="green")
plt.title("Distribution of Number of Delayed Payments")
plt.show()

In [None]:
fig=px.box(data,x="Num_of_Delayed_Payment",title="Box Plot of Number of Delayed Payments")
fig.show()

Insights for Delayed Payments:-


*   Customers with ≤ 8 delayed payments may be considered low risk and reliable.

*   Customers with > 19 delays fall into the high-risk zone and may need attention for credit risk assessment.
*   This feature can be valuable for segmenting customers based on payment discipline.
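
The segmentation described above can be sketched with `pd.cut`. The thresholds 8 and 19 come from the insights; the `"Medium"` label for the band in between, the variable names, and the toy counts are assumptions made for the example.

```python
import pandas as pd

# Toy delayed-payment counts (hypothetical values)
delays = pd.Series([0, 5, 8, 9, 14, 19, 20, 25], name="Num_of_Delayed_Payment")

# Right-closed bins: (-1, 8] -> Low, (8, 19] -> Medium, (19, inf) -> High
risk_band = pd.cut(
    delays,
    bins=[-1, 8, 19, float("inf")],
    labels=["Low", "Medium", "High"],
)
print(risk_band.value_counts().sort_index())
```

Because the bins are right-closed, a customer with exactly 8 delays lands in the low-risk band and one with exactly 19 stays in the middle band, matching "≤ 8" and "> 19".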



In [None]:
#Changed Credit Limit
sns.histplot(data,x="Changed_Credit_Limit",color="orange",kde=True)
plt.title("Distribution of Changed Credit Limit")
plt.show()

In [None]:
fig=px.box(data,x="Changed_Credit_Limit",title="Distribution of Changed Credit Limit")
fig.show()

Insights for Changed Credit Limit:-

*  The median changed credit limit is approximately 10, indicating that 50% of the customers had a credit limit change of ≤ 10 units.

*  The minimum value appears close to 0, showing some customers had very minimal or no change in their credit limits.

*   The maximum is around 30, suggesting that some customers experienced significant increases in their credit limits.

*   The right whisker is longer, indicating a slight positive skew — a small number of customers had much higher credit limit changes compared to the rest.

*  A small segment with high credit limit changes (≥25) could represent customers with improved creditworthiness or income.



In [None]:
#Credit Utilization Ratio
sns.histplot(data,x="Credit_Utilization_Ratio",color="blue",kde=True)
plt.title("Distribution of Credit Utilization Ratio")
plt.show()

In [None]:
fig=px.box(data,x="Credit_Utilization_Ratio",title="Distribution of Credit Utilization Ratio")
fig.show()

Insights for Credit Utilization ratio:-

*  The presence of outliers on the higher end suggests a slight positive (right) skew.
*  The minimum credit utilization ratio observed is 20%. The maximum observed ratio is 50%.
*  The median credit utilization ratio is approximately 32.31%.
*  The data suggests a population that generally manages its credit utilization in the moderate to low-moderate range, with a small segment that uses a higher proportion of their available credit.

In [None]:
#Interest Rate
sns.countplot(data,x="Interest_Rate",color="brown")
plt.title("Distribution of Interest Rate")
plt.xticks(rotation=90)
plt.show()

In [None]:
fig=px.box(data,x="Interest_Rate",title="Distribution of Interest Rate")
fig.show()

Insights for Interest Rate:-

The distribution of interest rates ranges from 1% to 34%. The majority of interest rates fall between 7% and 20%, with a median of 13%. There are no identified outliers, and the distribution appears to be relatively symmetric.

In [None]:
#for number of credit card
sns.countplot(data,x="Num_Credit_Card",color="pink")
plt.title("Distribution of Number of Credit Cards")
plt.show()

In [None]:
fig=px.box(data,x="Num_Credit_Card",title="Distribution of Number of Credit Cards")
fig.show()

Insights for Credit Card Number:-

The number of credit cards held by individuals in this dataset ranges from 0 to 11. The majority of individuals have between 4 and 7 credit cards, with a median of 5. The distribution shows a slight tendency towards individuals having more credit cards, but all observed counts fall within the calculated fences, indicating no extreme outliers.

In [None]:
# Number of bank accounts
sns.countplot(data,x="Num_Bank_Accounts",color="yellow")
plt.title("Distribution of Number of Bank Accounts")
plt.show()

In [None]:
fig=px.box(data,x="Num_Bank_Accounts",title="Distribution of Number of Bank Accounts")
fig.show()

Insights for bank accounts:-
- The median number of bank accounts is **5**.
- The data has **no outliers**, as all values fall within the lower (0) and upper (11) fences.
- The distribution appears to be **fairly symmetric**, with no visible skewness.
- The number of bank accounts ranges from **0 to 11**, showing a moderate spread.

In [None]:
# 1] Average annual income by credit score
avg_income = data.groupby('Credit_Score')['Annual_Income'].mean().reset_index()

fig = px.bar(avg_income, x='Credit_Score', y='Annual_Income',
             title='Average Annual Income by Credit Score',
             color='Credit_Score', text='Annual_Income')
fig.show()

In [None]:
#Box plot of the three per-class mean incomes in avg_income (not of individual customers)
fig=px.box(avg_income, x='Annual_Income',title="Box Plot of Annual Income",points='outliers')
fig.show()

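Insights:-


*   These values are per-class averages, not cut-offs for assigning a credit score to an individual customer.
*   Customers with a Good credit score have the highest average annual income, above 50987.
*   Customers with a Standard credit score average about 50987.
*   Customers with a Poor credit score have the lowest average, about 40584.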

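Box Plot :-

- This box plot is drawn from `avg_income`, which holds only the three per-class means, so it summarizes those three values rather than individual customers.
- Its **median** (around ₹50,000) is the middle one of the three class means, and the box spans roughly ₹43,000 to ₹57,000.
- The **symmetry** and the **absence of outliers** are properties of these three points; they say nothing about the customer-level distribution, which the earlier histogram showed to be right-skewed with high-income outliers.
- The earlier univariate analysis, rather than this plot, supports developing financial products for the middle-income segment, which represents the bulk of users.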
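
To compare incomes across credit score classes at the customer level, the statistics can be computed from individual rows instead of from the three means. The frame `income_demo` and its values below are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Toy rows (hypothetical values, not taken from the dataset)
income_demo = pd.DataFrame({
    "Credit_Score": ["Good", "Good", "Standard", "Standard", "Poor", "Poor"],
    "Annual_Income": [60_000, 70_000, 45_000, 55_000, 30_000, 50_000],
})

# Per-class statistics computed from individual rows, not from group means
per_class = (
    income_demo.groupby("Credit_Score")["Annual_Income"]
    .agg(["count", "mean", "median", "min", "max"])
)
print(per_class)
```

The same idea applies to plotting: passing the full frame with `x="Credit_Score"` and `y="Annual_Income"` to `px.box` would draw one box per class from the underlying rows.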






In [None]:
# 2] Outstanding_Debt
sns.histplot(data['Outstanding_Debt'], kde=True)
plt.title("Distribution of Outstanding Debt")
plt.show()

In [None]:
#Box plot for Outstanding debt
fig=px.box(data,x="Outstanding_Debt",title="Box Plot of Outstanding Debt",points='outliers')
fig.show()

## ***3. Bivariate Analysis***


In [None]:
#Annual income vs credit score
avg_income=data.groupby("Credit_Score")["Annual_Income"].mean().reset_index()
fig=px.bar(avg_income,x="Credit_Score",y="Annual_Income",title="Average Annual Income by Credit Score",color="Credit_Score")
fig.show()

Analysis: Average Annual Income by Credit Score

- The average annual income **increases** as the credit score category improves.
- Individuals with a **'Good' credit score** have the **highest average income**, around **₹65,200**.
- Those with a **'Standard' credit score** earn moderately, with an average income slightly above **₹50,000**.
- People categorized under **'Poor' credit score** have the **lowest average annual income**, approximately **₹40,000**.
- This trend suggests a **positive relationship between income and credit score** — higher income levels may lead to better credit behavior, resulting in a better credit score.
- It also highlights that **financial capacity** could influence one's ability to manage debt responsibly and maintain a strong credit profile.


In [None]:
#Outstanding Debt vs Credit score
avg_debt=data.groupby("Credit_Score")["Outstanding_Debt"].mean().reset_index()
fig=px.bar(avg_debt,x="Credit_Score",y="Outstanding_Debt",title="Average Outstanding Debt by Credit Score",color="Credit_Score")
fig.show()

 Insights: Average Outstanding Debt by Credit Score

1. **Poor credit scores** have the **highest average debt** (~₹2081), indicating higher financial risk.
2. **Standard scores** show **moderate debt** (~₹1300–₹1400), reflecting average credit behavior.
3. **Good credit scores** have the **lowest debt** (~₹800), showing strong financial discipline.
4. There is a **negative relationship**: higher debt is generally associated with lower credit scores.

---

 Analytical Implications

- Poor scorers may need **credit limits or monitoring** due to high risk.
- Use **targeted interventions** like counseling or debt repayment plans.



In [None]:
#Monthly balance vs Credit Score
avg_balance=data.groupby("Credit_Score")["Monthly_Balance"].mean().reset_index()
fig=px.bar(avg_balance,x="Credit_Score",y="Monthly_Balance",title="Average Monthly Balance by Credit Score",color="Credit_Score")
fig.show()

 Analysis: Average Monthly Balance by Credit Score

- Individuals with a **'Good' credit score** have the **highest average monthly balance**, approximately **₹456.7**, indicating better savings or spending control.
- Those in the **'Standard' credit score** category maintain an average monthly balance of around **₹400**, which is moderate.
- Individuals with a **'Poor' credit score** have the **lowest average monthly balance**, around **₹340**.
- This trend shows a **positive relationship between monthly balance and credit score** — people who maintain higher balances are more likely to have better credit scores.
- It highlights the importance of financial discipline and balance retention in maintaining a healthy credit profile.


In [None]:
#Credit Utilization Ratio vs Credit score
avg_utilization=data.groupby("Credit_Score")["Credit_Utilization_Ratio"].mean().reset_index()
fig=px.pie(avg_utilization,values="Credit_Utilization_Ratio",names="Credit_Score",title="Average Credit Utilization Ratio by Credit Score")
fig.show()

### 📊 Insights: Average Credit Utilization Ratio by Credit Score

The pie chart is built from the three per-class means, so each slice is that class's mean utilization as a share of the sum of the three means, not a share of customers or of total credit used.

1. **Good Credit Score:**  
   - Slice of **33.7%**, the largest of the three, so this class has the highest mean utilization ratio, by a small margin.

2. **Standard Credit Score:**  
   - Slice of **33.3%**, almost exactly one third, so its mean sits between the other two classes.

3. **Poor Credit Score:**  
   - Slice of **33.0%**, the smallest, though the gap to the other classes is marginal.

---

### 🧠 Analytical Implications

- Three near-equal slices mean the **average credit utilization ratio is almost identical** across the credit score categories.
- This suggests **credit utilization alone is unlikely to separate the credit score classes** in this dataset.
- To better distinguish credit score categories, combine this metric with others like **outstanding debt, payment history**, or **delayed payments**.
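
The arithmetic behind a pie chart of group means can be made explicit. The class means below are hypothetical round numbers chosen only to show the calculation; they are not the dataset's values.

```python
import pandas as pd

# Hypothetical class means, chosen only to illustrate the arithmetic
means = pd.Series({"Good": 33.0, "Standard": 32.0, "Poor": 31.0})

# What the pie chart draws: each mean divided by the sum of the means
pie_share = (means / means.sum() * 100).round(1)

# A more direct comparison: each mean minus the average of the three means
gap = means - means.mean()

print(pie_share)
print(gap)
```

With three classes, any set of similar means produces slices near 33%, so the gap in the original units (or a bar chart of the means) tends to be easier to interpret than the slice percentages.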




In [None]:
#Number of loans vs credit score
avg_loans=data.groupby("Credit_Score")["Num_of_Loan"].mean().reset_index()
fig=px.bar(avg_loans,x="Credit_Score",y="Num_of_Loan",title="Average Number of Loans by Credit Score",color="Credit_Score")
fig.show()

Insights: Average Number of Loans by Credit Score

- Individuals with a **'Poor' credit score** have the **highest average number of loans** (~4.8).
- People in the **'Standard' category** take an average of around **3.3 loans**.
- Those with a **'Good' credit score** have the **lowest average number of loans** (~2.2).
- A **higher number of loans** appears to be associated with a **lower credit score**.


In [None]:
#Number of delayed payments vs Credit score
avg_delay=data.groupby("Credit_Score")["Num_of_Delayed_Payment"].mean().reset_index()
fig=px.bar(avg_delay,x="Credit_Score",y="Num_of_Delayed_Payment",title="Average Number of Delayed Payments by Credit Score",color="Credit_Score")
fig.show()

In [None]:
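#Use the full data, one box per credit score class (avg_delay holds only three group means)
fig=px.box(data,x="Credit_Score",y="Num_of_Delayed_Payment",title="Distribution of Number of Delayed Payments by Credit Score")
fig.show()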

📊 Insights: Average Number of Delayed Payments by Credit Score

- 🔴 **Poor Credit Score** has the **highest delayed payments** (~16), indicating poor repayment behavior.
- 🟢 **Standard Credit Score** shows a **moderate number** of delays (~13–14), suggesting average payment discipline.
- 🔵 **Good Credit Score** has the **lowest delayed payments** (~8), reflecting strong financial responsibility.

---

 🧠 Analytical Implications

- 📉 Frequent delayed payments are strongly associated with **lower credit scores**.
- ➤ This metric can be a **critical indicator** in credit risk assessment and score modeling.
- ✅ Encouraging on-time payments can significantly improve overall credit health.


In [None]:
#Number of credit card vs Credit score
avg_credit=data.groupby("Credit_Score")["Num_Credit_Card"].mean().reset_index()
fig=px.bar(avg_credit,x="Credit_Score",y="Num_Credit_Card",title="Average Number of Credit Cards by Credit Score",color="Credit_Score")
fig.show()


📊 Insights: Average Number of Credit Cards by Credit Score

- 🔴 **Poor Credit Score** users have the **highest number of credit cards** (~6.5), indicating possible over-dependence on credit.
- 🟢 **Standard Credit Score** users have a **moderate number** of cards (~5.4), suggesting balanced credit usage.
- 🔵 **Good Credit Score** users have the **fewest credit cards** (~4.2), which may reflect more disciplined or selective credit behavior.

---

🧠 Analytical Implications

- 📉 Having **too many credit cards** may correlate with **poor credit scores**, possibly due to higher debt exposure or missed payments.
- ✅ Fewer credit cards, when used responsibly, could be a trait of **good credit management**.
- ➤ This feature can aid in identifying users at risk of **credit overextension**.


In [None]:
#Changed Credit Limit vs Number of Credit Cards
avg_credit_limit=data.groupby("Num_Credit_Card")["Changed_Credit_Limit"].mean().reset_index()
fig=px.bar(avg_credit_limit,x="Num_Credit_Card",y="Changed_Credit_Limit",title="Average Changed Credit Limit by Number of Credit Cards",color="Num_Credit_Card")
fig.show()

📊 Insights: Average Changed Credit Limit by Number of Credit Cards

- 🔢 Users with **more credit cards tend to have a higher average change in credit limit**.
- 🟣 For users with **0 to 2 credit cards**, the average change in credit limit remains relatively low (~6–7 units).
- 🟠 From **3 to 6 cards**, the limit increase becomes more noticeable (~9–10 units).
- 🟡 Users with **8+ credit cards** experience the **highest changes**, reaching **~15–16 units**.

---

🧠 Analytical Implications

- 💳 **Number of credit cards is positively correlated** with changes in credit limits, likely due to increased credit activity and usage.
- 🔍 This trend may indicate **higher creditworthiness or risk exposure** among users with many credit cards.
- ✅ Credit issuers may be **adjusting limits more frequently** based on usage patterns and repayment history.

---

📌 Note:
- Having more cards might increase flexibility but also raises the importance of responsible usage to maintain a good credit score.


In [None]:
#Interest rate vs Credit Score
avg_interest=data.groupby("Credit_Score")["Interest_Rate"].mean().reset_index()
fig=px.bar(avg_interest,x="Credit_Score",y="Interest_Rate",title="Average Interest Rate by Credit Score",color="Credit_Score")
fig.show()

📊 Insights from "Average Interest Rate by Credit Score" Bar Chart

* **Poor credit scores incur the highest average interest rates (approx. 20.19%).**
* **Good credit scores receive the lowest average interest rates (around 7-8%).**
* **Standard credit scores have moderate average interest rates (about 14%).**

🧠 Implications:

* **Interest rates directly reflect creditworthiness, with higher risk leading to higher rates.**
* **Maintaining a good credit score offers significant financial benefits through lower interest rates.**
* **Lenders effectively differentiate risk and pricing based on credit score categories.**

In [None]:
#occupation vs credit score
avg_occupation=data.groupby("Credit_Score")["Occupation"].value_counts().reset_index(name="Count")
fig=px.pie(avg_occupation,names="Occupation",values="Count",color="Occupation",title="Occupation Distribution by Credit Score")
fig.show()

📊 Insights from "Occupation Distribution by Credit Score" Pie Chart

* **The pie chart adds up the counts across all three credit score classes, so it shows the overall occupation mix rather than a per-class breakdown.**
* **The overall occupation distribution is relatively even, with no single occupation dominating.**
* **"Writer" is highlighted with a count of 6304, representing 6.3% of the total.**
* **Most occupations fall within a narrow percentage range (approximately 6.3% to 7.1%).**

🧠 **Implications:**

* **The dataset covers a diverse range of professions in nearly equal proportions, so no occupation dominates the sample.**
* **This chart alone cannot show whether credit scores are tied to a specific occupation, because the credit score classes are merged within each slice.**
* **Further analysis, such as a per-occupation breakdown of credit score classes, is needed to determine whether credit behavior differs across these nearly equally represented occupations.**
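
One way to run the further analysis mentioned above is a row-normalised cross-tabulation, which gives the credit score mix inside each occupation. The frame `occ_demo` and its rows are hypothetical; only the `pd.crosstab` pattern is the point.

```python
import pandas as pd

# Toy rows (hypothetical values, not taken from the dataset)
occ_demo = pd.DataFrame({
    "Occupation": ["Writer", "Writer", "Writer", "Writer",
                   "Lawyer", "Lawyer", "Lawyer", "Lawyer"],
    "Credit_Score": ["Good", "Standard", "Standard", "Poor",
                     "Good", "Good", "Standard", "Poor"],
})

# Row-normalised table: each row sums to 1 and gives the credit score mix
# within one occupation
mix = pd.crosstab(occ_demo["Occupation"], occ_demo["Credit_Score"], normalize="index")
print(mix.round(2))
```

If the rows of this table look alike for every occupation in the real data, that would support the claim that occupation has little bearing on the credit score class.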


## ***4. Multivariate Analysis***

In [None]:
#Converting Credit Score in numerical column
data['Credit_Score_Label'] = data['Credit_Score'].map({'Poor': 0, 'Standard': 1, 'Good': 2})

In [None]:
heatmap_cols=["Annual_Income","Outstanding_Debt","Monthly_Balance","Credit_Utilization_Ratio","Num_of_Loan","Num_of_Delayed_Payment","Num_Credit_Card","Changed_Credit_Limit","Interest_Rate","Credit_Score_Label"]
corr_matrix=data[heatmap_cols].corr()
fig=px.imshow(corr_matrix,text_auto=True)
#Rotate the x-axis labels on the Plotly figure (plt.xticks/plt.yticks only affect Matplotlib figures)
fig.update_xaxes(tickangle=-90)
fig.show()

📊 Insights from Correlation Heatmap


* **Negative Correlation with Interest Rate:** Credit_Score_Label has its strongest correlation with 'Interest_Rate' (-0.485), meaning higher credit scores go with lower interest rates.
* **Negative Correlation with Credit Cards, Outstanding Debt, Delayed Payments, and Loans:** 'Num_Credit_Card' (-0.404), 'Outstanding_Debt' (-0.387), 'Num_of_Delayed_Payment' (-0.373), and 'Num_of_Loan' (-0.358) all decrease as the credit score improves.
* **Weak Negative Correlation with Changed Credit Limit:** 'Changed_Credit_Limit' (-0.171) is weakly negative, so larger limit changes are slightly more common among lower scores.
* **Modest Positive Correlation with Annual Income and Monthly Balance:** 'Annual_Income' (0.213) and 'Monthly_Balance' (0.198) rise slightly with the credit score.
* **Negligible Correlation with Credit Utilization Ratio:** 'Credit_Utilization_Ratio' (0.046) shows almost no linear relationship with Credit_Score_Label, matching the near-identical class averages seen in the bivariate analysis.

🧠 Implications:

* **Interest Rate Tracks Credit Score Most Closely:** Of the features in the heatmap, 'Interest_Rate' has the closest linear link to the Credit Score Label.
* **Key Correlates of Credit Score:** 'Outstanding_Debt', 'Num_of_Delayed_Payment', 'Num_of_Loan', and 'Num_Credit_Card' are the features most closely tied to a lower credit score, so they deserve attention in risk assessment.
* **Credit Limit Changes Are a Weak Signal:** 'Changed_Credit_Limit' moves only slightly with the credit score and in the negative direction.
* **Income Helps Modestly, Utilization Barely at All:** Annual income and monthly balance have modest positive links to the 'Credit_Score_Label', while the credit utilization ratio has almost none, although its relationship could be non-linear.
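
Reading coefficients off a heatmap is easier when the label's column is pulled out and sorted. The sketch below does that on a toy frame; `corr_demo` and its values are hypothetical, and it uses the default Pearson correlation as the heatmap cell above does.

```python
import pandas as pd

# Toy rows (hypothetical values, not taken from the dataset)
corr_demo = pd.DataFrame({
    "Interest_Rate":      [30, 25, 20, 15, 10, 5],
    "Annual_Income":      [20_000, 40_000, 30_000, 60_000, 50_000, 70_000],
    "Credit_Score_Label": [0, 0, 1, 1, 2, 2],
})

# Correlation of every feature with the label, sorted from most negative
# to most positive
ranked = (
    corr_demo.corr()["Credit_Score_Label"]
    .drop("Credit_Score_Label")
    .sort_values()
)
print(ranked)
```

Since `Credit_Score_Label` is an ordinal encoding, `corr(method="spearman")` may be a reasonable alternative to check whether the ranking of features changes.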


# **Conclusion**

## 🔴 **Strong Negative Correlations (Higher Credit Score, Lower Value):**

* Interest_Rate (-0.485): As credit score improves, the interest rate tends to decrease significantly.

* Num_Credit_Inquiries (-0.435): More credit inquiries are associated with lower credit scores.

* Delay_from_due_date (-0.431): Longer delays from due dates are strongly linked to lower credit scores.

* Num_Credit_Card (-0.404): A higher number of credit cards is surprisingly associated with lower credit scores.

* Num_Bank_Accounts (-0.388): A higher number of bank accounts shows a negative correlation with credit score.

* Outstanding_Debt (-0.387): Higher outstanding debt is correlated with lower credit scores.

* Num_of_Delayed_Payment (-0.373): More delayed payments are associated with lower credit scores.

* Num_of_Loan (-0.358): A higher number of loans is somewhat associated with lower credit scores.

* Changed_Credit_Limit (-0.171): While weaker, the negative correlation suggests that smaller credit limit changes are associated with higher credit scores.

## 🟢  **Strong Positive Correlations (Higher Credit Score, Higher Value):**

* Credit_History_Age (0.389): Older credit history is positively correlated with higher credit scores, which is expected.

## 🟠 **Moderate Positive Correlations:**

* Annual_Income (0.213): Higher annual income has a moderate positive correlation with credit score.

* Monthly_Inhand_Salary (0.210): Similar to annual income, higher monthly in-hand salary is moderately linked to better credit scores.

* Monthly_Balance (0.198): A higher monthly balance shows a positive correlation with credit score.

* Amount_invested_monthly (0.172): Higher monthly investment amounts are somewhat positively correlated with better credit scores.

## ⚪**Weak/Negligible Correlations:**

* Credit_Utilization_Ratio (0.046): Surprisingly, the correlation with credit utilization ratio is very weak, suggesting it might not be a primary linear driver of the Credit_Score_Label in this dataset, or its relationship is non-linear.

* Total_EMI_per_month (0.017): There's almost no linear correlation between total EMI per month and credit score.