# **Project Name**    - Paisabazaar Banking Fraud Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Member 1**      - Prathmesh Choure


# **Project Summary -**

Project Summary: Enhancing Credit Assessment for Paisabazaar

Paisabazaar, a leading financial services platform, is focused on refining its credit evaluation methods to make smarter lending decisions and reduce the likelihood of loan defaults. To support this objective, analyzing customer data and categorizing credit scores based on financial behavior—such as income, credit usage, and payment history—can offer crucial insights that improve risk management and overall business strategy.

The dataset under study includes several important variables like annual income, number of bank accounts, credit utilization ratio, outstanding debt, delayed payments, and credit history. These features serve as the foundation for developing a predictive model capable of classifying users into one of three categories: “Good,” “Standard,” or “Poor” credit score. This model aims to assist Paisabazaar in accurately determining the creditworthiness of applicants and tailoring financial offerings accordingly.

**Univariate Analysis**

This initial step allowed us to explore each variable independently. For example, studying the distribution of age revealed the most common customer age groups. A pie chart visualizing credit score segments showcased how the customer base is distributed in terms of credit risk. Boxplots of variables like annual income highlighted both income variation and the presence of extreme values, which are critical when designing custom loan products for high-income or outlier profiles.

**Bivariate Analysis**

Next, we examined how two variables relate to each other. A scatter plot of annual income versus outstanding debt showed that while high-income individuals may carry more debt, this doesn’t always indicate a poor credit profile. A boxplot comparing credit utilization ratios across credit score categories illustrated that higher utilization generally correlates with lower credit ratings, reinforcing its importance as a risk metric. Additionally, a line chart analyzing the relationship between monthly in-hand salary and the number of delayed payments suggested that customers with lower salaries are more prone to payment delays—an indicator of financial pressure.

**Multivariate Analysis**

To gain deeper insights, we explored how multiple factors interact simultaneously. A correlation heatmap of numerical variables like income, debt, and credit inquiries helped us detect strong relationships that may affect credit risk. Moreover, a pairplot involving age, income, and credit scores offered a multi-dimensional perspective on customer segments, highlighting how different features cluster together for various credit risk groups.

# **GitHub Link -**

https://github.com/prattham30/Paisabazaar-Banking-Fraud


# **Problem Statement**


Paisabazaar, a prominent financial services platform, plays a key role in helping users compare and apply for credit and banking products. A fundamental aspect of this service involves evaluating the creditworthiness of applicants — a factor that significantly influences loan approval rates and risk control measures. While traditional credit scoring methods are widely used, they often fail to capture the full range of financial behaviors and personal characteristics, which can lead to inaccurate risk assessments and suboptimal lending decisions.

To overcome these limitations, Paisabazaar aims to implement a data-driven credit scoring solution that leverages detailed customer demographic and financial information. The objective is to build a robust predictive model that classifies individuals into three distinct credit score categories: “Good,” “Standard,” and “Poor.” By doing so, the company seeks to streamline its credit evaluation process, reduce the risk of defaults, and offer more customized financial solutions to its users.

The main challenge is to extract meaningful insights from the dataset, pinpoint the most influential variables affecting credit scores, and develop an accurate and scalable machine learning model. This approach will not only support more reliable lending decisions but also enhance user experience by enabling the delivery of targeted financial products and services.

Ultimately, this initiative will empower Paisabazaar to refine its credit assessment strategies, improve operational efficiency, and drive business growth through smarter, data-informed decision-making.

#### **Define Your Business Objective?**

The primary objective of this case study is to strengthen Paisabazaar’s credit assessment system by building a predictive model capable of accurately classifying customers based on their credit scores. This model will leverage a range of customer attributes, including:

1. Income levels

2. Credit utilization ratio (how much of their available credit customers use)

3. Payment behavior (on-time payments, minimum dues, delays)

4. Outstanding debts

5. Other financial indicators

**Key Goals**

1. Enhance Risk Management
Accurately evaluating an individual’s creditworthiness will help reduce the chances of loan defaults, thereby supporting safer lending decisions for Paisabazaar’s partner institutions.

2. Streamline Loan Approvals
Automatically classifying customers into categories such as Good, Standard, or Poor credit scores will make the loan approval process faster and more efficient.

3. Deliver Personalized Financial Solutions
Using predicted credit scores, Paisabazaar can offer tailored financial product recommendations—such as specific loan options or credit cards—based on customer profiles, ultimately improving customer satisfaction and trust.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below and answers the listed questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
cd /content/drive/MyDrive/Project

In [None]:
data = pd.read_csv('dataset.csv')

### Dataset First View

In [None]:
# Dataset First Look
data.head(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data.shape

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(data[data.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(data.isnull().sum())

In [None]:
# Visualizing the missing values
# Checking Null Value by plotting Heatmap
sns.heatmap(data.isnull(), cbar=True)

### What did you know about your dataset?

The dataset comes from the financial services industry (Paisabazaar) and contains 100,000 rows and 28 columns. Its purpose is to analyze customer credit behavior and predict credit scores in order to improve risk assessment. It has no missing or duplicate values.
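
The checks behind this summary (row and column counts, missing values, duplicates) can be gathered in one place. The following is a minimal sketch, not the notebook's own code: `summarize_dataset` is a hypothetical helper, and the small `toy` frame with made-up values only stands in for `data`.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Collect the basic health checks from this section into one dict."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_cells": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy frame standing in for `data` (values are made up for illustration).
toy = pd.DataFrame(
    {
        "Age": [23, 35, 35, None],
        "Credit_Score": ["Good", "Poor", "Poor", "Standard"],
    }
)
summary = summarize_dataset(toy)
print(summary)
```

Calling `summarize_dataset(data)` on the real frame would give the figures quoted above, provided the dataset is as described.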


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns

In [None]:
# Dataset Describe
data.describe(include = 'all')

### Variables Description

•	**ID**	:	Unique identifier for each record

•	**Customer_ID**	:	Unique ID assigned to each customer

•	**Month**	:	Month of the transaction/activity

•	**Name**	:	Customer’s full name

•	**Age**	:	Age of the customer

•	**SSN**	:	Social Security Number (masked/pseudonymized)

•	**Occupation**	:	Customer’s job profile (e.g., Scientist, Teacher)

•	**Annual_Income**	:	Yearly income of the customer (in USD)

•	**Monthly_Inhand_Salary**	:	Average monthly salary after tax

•	**Num_Bank_Accounts**	:	Number of active bank accounts

•	**Num_Credit_Card**	:	Number of credit cards held

•	**Interest_Rate**	:	Interest rate on credit card loans

•	**Num_of_Loan**	:	Number of loans the customer has

•	**Type_of_Loan**	:	Types of loans (can be multiple, separated by commas)

•	**Delay_from_due_date**	:	Average delay in days for payment past due date

•	**Num_of_Delayed_Payment**	:	Total number of delayed payments

•	**Changed_Credit_Limit**	:	Change in credit card limit over time

•	**Num_Credit_Inquiries**	:	Number of credit inquiries made

•	**Credit_Mix**	:	Mix of different types of credits (Good, Standard, Bad)

•	**Outstanding_Debt**	:	Amount of debt still unpaid

•	**Credit_Utilization_Ratio**	:	Ratio of used credit to available credit

•	**Credit_History_Age**	:	Age of credit history in months

•	**Payment_of_Min_Amount**	:	Whether the customer pays the minimum due amount (Yes/No/NM)

•	**Total_EMI_per_month**	:	Total monthly EMI payments

•	**Amount_invested_monthly**	:	Average monthly investment

•	**Payment_Behaviour**	:	Spending behavior like "High_spent_Small_value_payments"

•	**Monthly_Balance**	:	Balance remaining at the end of the month

•	**Credit_Score**	:	Target variable: Credit rating of the customer (Good, Standard, Poor)












### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

for col in data.columns:
  print(f"No. of unique values in {col} is {data[col].nunique()}.")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Replace 'NM' with 'No', following the variable description.
# Before: ['No' 'NM' 'Yes']  After: ['No' 'Yes']
data['Payment_of_Min_Amount'] = data['Payment_of_Min_Amount'].replace('NM', 'No')
data['Payment_of_Min_Amount'].unique()

In [None]:
# convert float into int
data['Num_Bank_Accounts'] = data['Num_Bank_Accounts'].astype('int64')
data['Age'] = data['Age'].astype('int64')
data['Num_Credit_Inquiries'] = data['Num_Credit_Inquiries'].astype('int64')

print(data.dtypes)

In [None]:
# Round to two decimal places
data['Monthly_Inhand_Salary'] = data['Monthly_Inhand_Salary'].round(2)
data['Total_EMI_per_month'] = data['Total_EMI_per_month'].round(2)
data['Amount_invested_monthly'] = data['Amount_invested_monthly'].round(2)
data['Credit_Utilization_Ratio'] = data['Credit_Utilization_Ratio'].round(2)
data['Monthly_Balance'] = data['Monthly_Balance'].round(2)
data.head(3)

In [None]:
# Remove unnecessary columns such as Name and SSN
data.drop(columns=['Name', 'SSN'], inplace=True)

In [None]:
data.columns

In [None]:
# Count missing values per column
missing_values = data.isnull().sum().sort_values()
missing_values

In [None]:
# Count duplicate rows using duplicated()
data.duplicated().sum()

### What all manipulations have you done and insights you found?

1. The Payment_of_Min_Amount column contained an ambiguous value ('NM'), which was replaced with 'No'.

2. Numeric data was cleaned up by converting floats to integers where appropriate and rounding monetary values for precision.

3. Personal columns like Name and SSN were dropped as they were not useful for analysis.

4. The dataset was well-prepared, with no missing or duplicate values, indicating a clean and ready-to-use dataset for further analysis.
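
The wrangling steps listed above could also be collected into a single function, so the cleaning can be re-run on a fresh copy of the data. This is only a sketch under stated assumptions: `clean_credit_data` is a hypothetical helper, it rounds only two of the monetary columns for brevity, and the `toy` frame uses made-up values in place of `data`.

```python
import pandas as pd


def clean_credit_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df; the input frame is left untouched."""
    out = df.copy()
    # Treat the ambiguous 'NM' label as 'No', as in the cells above.
    out["Payment_of_Min_Amount"] = out["Payment_of_Min_Amount"].replace("NM", "No")
    # Cast count-like columns to integers (assumes they hold no missing values).
    for col in ["Age", "Num_Bank_Accounts", "Num_Credit_Inquiries"]:
        out[col] = out[col].astype("int64")
    # Round monetary columns to two decimals (subset shown for brevity).
    money_cols = ["Monthly_Inhand_Salary", "Monthly_Balance"]
    out[money_cols] = out[money_cols].round(2)
    # Drop personal identifiers; errors="ignore" keeps the function re-runnable.
    return out.drop(columns=["Name", "SSN"], errors="ignore")


# Toy frame standing in for `data` (values are made up for illustration).
toy = pd.DataFrame(
    {
        "Name": ["A. Customer", "B. Customer"],
        "SSN": ["000-00-0001", "000-00-0002"],
        "Age": [23.0, 41.0],
        "Num_Bank_Accounts": [3.0, 5.0],
        "Num_Credit_Inquiries": [4.0, 2.0],
        "Payment_of_Min_Amount": ["NM", "Yes"],
        "Monthly_Inhand_Salary": [1824.8433, 3037.9867],
        "Monthly_Balance": [312.49409, 455.128],
    }
)
cleaned = clean_credit_data(toy)
print(cleaned)
```

Working on a copy, rather than mutating `data` in place, means a cell can be re-executed without failing on columns that were already dropped.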




## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Pie Chart: Credit Score of the person

In [None]:
# Chart - 1 visualization code
# Count each credit score once and reuse the result for slices, labels, and legend
score_counts = data['Credit_Score'].value_counts()
plt.pie(score_counts, labels=score_counts.index)
plt.title('Credit Score of the person')
plt.legend(score_counts.index)
plt.show()

##### 1. Why did you pick the specific chart?

*Answer* : Pie charts are useful for understanding proportions. It shows how different credit scores are distributed as a part of the whole.

##### 2. What is/are the insight(s) found from the chart?

Answer : A majority of customers fall into the "Good" credit score category, with fewer in the "Poor" category.
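
Because the pie chart carries no percentage labels, the exact split is hard to read from the figure alone. The sketch below shows one way the class shares could be tabulated to back up the statement above. The `scores` series is made-up toy data standing in for `data['Credit_Score']`, so its numbers say nothing about the real dataset.

```python
import pandas as pd

# Toy labels standing in for data['Credit_Score'] (made up for illustration).
scores = pd.Series(
    ["Standard", "Good", "Poor", "Standard", "Standard", "Poor", "Good", "Standard"],
    name="Credit_Score",
)

# normalize=True returns fractions; multiply by 100 to get percentages.
share = (scores.value_counts(normalize=True) * 100).round(1)
print(share)
```

Passing `autopct='%1.1f%%'` to `plt.pie` would print the same percentages directly on the chart.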

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : This is a positive insight because it shows the customer base is generally creditworthy.

#### Chart - 2 Histogram: Distribution of Age

In [None]:
# Chart - 2 visualization code

sns.histplot(data['Age'], bins=10, kde=True, color='b')
plt.title('Distribution of Age of Individuals')
plt.xlabel('Age')
plt.ylabel('Number of Individuals')
plt.grid(True)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Answer : A histogram is ideal for understanding the age distribution of customers.

##### 2. What is/are the insight(s) found from the chart?

Answer : The histogram shows which age bands contain the most customers. If the distribution is skewed (for example, toward younger customers), services can be targeted at those age groups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Knowing which age groups dominate the customer base allows for targeted marketing and product development.

#### Chart - 3 Bar Chart for Occupation

In [None]:
# Chart - 3 visualization code
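# Horizontal bars: counts on the x-axis, occupations on the y-axis
occupation_counts = data['Occupation'].value_counts()
sns.barplot(y=occupation_counts.index, x=occupation_counts.values, palette='Set2')
plt.xlabel('Number of Individuals')
plt.ylabel('Occupation')
plt.title('Occupation')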
plt.show()

##### 1. Why did you pick the specific chart?

Answer : To visualize the frequency of different occupations.

##### 2. What is/are the insight(s) found from the chart?

Answer : Knowing the occupation with the most loan applicants can help focus marketing efforts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Answer : Provides clarity on which occupational segments to target for financial products.

#### Chart - 4 Boxplot: Annual Income

In [None]:
# Chart - 4 visualization code
# Pass the column as x so the box is drawn along the income axis.
# A box plot has no count axis, so only the value axis is labelled.
sns.boxplot(x=data['Annual_Income'])
plt.xlabel('Annual Income')
plt.title('Annual_Income')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : A box plot visualizes the spread of the data and helps identify outliers.

##### 2. What is/are the insight(s) found from the chart?

Answer : There are a few high-income individuals, but most fall within a certain range.
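
The points a box plot draws beyond its whiskers follow the usual 1.5 × IQR rule, so the "few high-income individuals" can be counted rather than eyeballed. A minimal sketch, assuming that rule: `iqr_outliers` is a hypothetical helper and the `income` series is made-up toy data, not values from the dataset.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


# Toy incomes standing in for data['Annual_Income'] (made up for illustration).
income = pd.Series(
    [20_000, 25_000, 30_000, 35_000, 40_000, 180_000], name="Annual_Income"
)
outliers = iqr_outliers(income)
print(outliers)
```

On the real data, `len(iqr_outliers(data['Annual_Income']))` would give the number of customers flagged as outliers by this rule.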

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Outliers could represent high-net-worth individuals who might require specialized financial services.

#### Chart - 5 Line Chart: Monthly Balance over Time

In [None]:
# Chart - 5 visualization code
cust_data=data[data['Customer_ID']==3392]
plt.plot(cust_data['Month'],cust_data['Monthly_Balance'])
plt.xlabel('Month')
plt.ylabel('Monthly Balance')
plt.title('Monthly Balance over the Month')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : Line charts are great for showing trends over time.

##### 2. What is/are the insight(s) found from the chart?

Answer : The balance for this customer shows fluctuations but stays relatively stable.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Tracking monthly balances helps in understanding cash flow patterns and credit utilization behavior.

#### Chart - 6 Scatter Plot: Annual Income vs. Monthly Inhand Salary

In [None]:
# Chart - 6 visualization code
# No hue is assigned, so the palette argument is dropped (seaborn would ignore it)
sns.scatterplot(x=data['Annual_Income'], y=data['Monthly_Inhand_Salary'])
plt.xlabel('Annual Income')
plt.ylabel('Monthly Inhand Salary')
plt.title('Annual Income vs Monthly Inhand Salary')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : Scatter plots help to show relationships between two continuous variables.

##### 2. What is/are the insight(s) found from the chart?

Answer : There is a positive relationship between annual income and monthly salary, as expected.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Insights about income can help in offering relevant credit or loan products based on income levels.



#### Chart - 7 Bar Chart: Payment Behaviour

In [None]:
# Chart - 7 visualization code
sns.barplot(x=data['Payment_Behaviour'].value_counts().index,y=data['Payment_Behaviour'].value_counts(),palette='Set2')
plt.xlabel('Payment Behaviour')
plt.ylabel('No of Individual')
plt.title('Payment Behaviour')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

Answer : A bar chart is ideal for categorical data such as Payment Behaviour.

##### 2. What is/are the insight(s) found from the chart?

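Answer : The bars show how customers are spread across the payment-behaviour categories (for example, "High_spent_Small_value_payments"), which makes the most common spending and payment patterns easy to spot.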

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Marketing financial products tailored to specific payment-behaviour segments (e.g., the "High_spent_Small_value_payments" group) could be beneficial.

#### Chart - 8 Boxplot: Monthly Balance

In [None]:
# Chart - 8 visualization code
# Pass the column as x so the box is drawn along the balance axis.
# A box plot has no count axis, so only the value axis is labelled.
sns.boxplot(x=data['Monthly_Balance'])
plt.xlabel('Monthly Balance')
plt.title('Monthly Balance')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : Box plots help visualize the distribution and identify outliers.



##### 2. What is/are the insight(s) found from the chart?

Answer : There are significant outliers in monthly balance, suggesting that some individuals have much higher savings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : High balances might indicate customers who could be targeted for investment or savings products.

#### Chart - 9 Pie Chart: Payment of Minimum Amount

In [None]:
# Chart - 9 visualization code
plt.pie(data['Payment_of_Min_Amount'].value_counts(),labels=data['Payment_of_Min_Amount'].value_counts().index)
plt.title('Payment of Min Amount')
plt.legend(data['Payment_of_Min_Amount'].value_counts().index)
plt.show()

##### 1. Why did you pick the specific chart?

Answer : Pie charts are great for understanding proportions of binary or categorical data.



##### 2. What is/are the insight(s) found from the chart?

Answer : A significant portion of customers do not pay the minimum amount.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Customers not paying the minimum could be at risk of default, negatively impacting business.
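
Whether minimum-payment behaviour actually relates to default risk can be checked by cross-tabulating it against the credit score. The following is a sketch of that check only: the `toy` frame is made-up data standing in for `data`, so the printed shares are illustrative and not findings.

```python
import pandas as pd

# Toy rows standing in for `data` (values are made up for illustration).
toy = pd.DataFrame(
    {
        "Payment_of_Min_Amount": ["Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No"],
        "Credit_Score": [
            "Poor", "Poor", "Standard", "Good",
            "Standard", "Good", "Standard", "Standard",
        ],
    }
)

# normalize="index" turns each row into shares that sum to 1.
table = pd.crosstab(
    toy["Payment_of_Min_Amount"], toy["Credit_Score"], normalize="index"
)
print(table.round(2))
```

Each row of `table` then reads as "of the customers with this payment flag, what share falls into each credit score category".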

#### Chart - 10 Line Chart: Credit Utilization Ratio over the Month of customer id 3392 as an example.

In [None]:
# Chart - 10 visualization code
cust_data=data[data['Customer_ID']==3392]
plt.plot(cust_data['Month'],cust_data['Credit_Utilization_Ratio'],color='r')
plt.xlabel('Month')
plt.ylabel('Credit Utilization Ratio')
plt.title('Credit Utilization Ratio over the Month customer id 3392')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : Line charts effectively show the trend of credit utilization over time.



##### 2. What is/are the insight(s) found from the chart?

Answer : Credit utilization shows significant fluctuation.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : High credit utilization could increase the risk of default, a negative impact on business.



#### Chart - 11 Histogram: Total EMI per month

In [None]:
# Chart - 11 visualization code
sns.histplot(data['Total_EMI_per_month'],bins=25,kde=True,color='g')
plt.xlabel('EMI Per Month')
plt.ylabel('No of Individual')
plt.title('Total EMI per month')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : A histogram helps in understanding the distribution of EMI payments.



##### 2. What is/are the insight(s) found from the chart?

Answer : Most customers pay a moderate EMI, with few paying high amounts.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Moderate EMI payments suggest reasonable debt levels, which is positive for business sustainability.



#### Chart - 12 Scatter plot: Outstanding Debt vs Credit Utilization Ratio

In [None]:
# Chart - 12 visualization code
# No hue is assigned, so the palette argument is dropped (seaborn would ignore it)
sns.scatterplot(x=data['Outstanding_Debt'], y=data['Credit_Utilization_Ratio'])
plt.xlabel('Outstanding Debt')
plt.ylabel('Credit Utilization Ratio')
plt.title('Outstanding Debt vs Credit Utilization Ratio')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : Scatter plots show the relationship between two financial metrics.



##### 2. What is/are the insight(s) found from the chart?

Answer : Higher debt correlates with higher credit utilization.
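
A scatter plot with many overlapping points can make a weak relationship look stronger than it is, so it may help to put a number on this claim. A minimal sketch using the Pearson coefficient; the `toy` frame holds made-up values in place of `data`, and the printed coefficient describes only that toy data.

```python
import pandas as pd

# Toy rows standing in for `data` (values are made up for illustration).
toy = pd.DataFrame(
    {
        "Outstanding_Debt": [200.0, 800.0, 1500.0, 2600.0, 4000.0],
        "Credit_Utilization_Ratio": [28.0, 35.0, 31.0, 30.0, 33.0],
    }
)

# Series.corr defaults to the Pearson coefficient, which lies in [-1, 1].
r = toy["Outstanding_Debt"].corr(toy["Credit_Utilization_Ratio"])
print(f"Pearson r = {r:.2f}")
```

Values near 0 would suggest little linear relationship, while values near 1 or -1 would suggest a strong one.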



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Customers with higher debt may be more prone to financial difficulties, potentially leading to defaults.



#### Chart - 13 Bar Chart: Number of Bank Account

In [None]:
# Chart - 13 visualization code
sns.barplot(x=data['Num_Bank_Accounts'].value_counts().index,y=data['Num_Bank_Accounts'].value_counts(),palette='Set2')
plt.xlabel('Num Bank Accounts')
plt.ylabel('No of Individual')
plt.title('Number of Bank Accounts')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : A bar chart helps to compare the number of individuals with different numbers of bank accounts.



##### 2. What is/are the insight(s) found from the chart?

Answer : Most customers have 2-3 bank accounts.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer : Understanding banking behavior helps in offering cross-bank financial products.



#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(12,10))
sns.heatmap((data.select_dtypes(include=['float64', 'int64']).corr()),annot=True, cmap="viridis", fmt=".2f")
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

Answer : To see correlations between numerical variables.



##### 2. What is/are the insight(s) found from the chart?

Answer : Strong correlations may reveal key financial behaviors (e.g., high correlation between age and income).
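
With many numeric columns the heatmap becomes hard to scan, so the strongest pairs can also be listed directly from the correlation matrix. This is a sketch under stated assumptions: `top_correlations` is a hypothetical helper, and the `toy` frame contains made-up values rather than rows from `data`.

```python
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    # Visit each unordered pair once, skipping the diagonal.
    pairs = {
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    }
    ranked = pd.Series(pairs).sort_values(key=lambda s: s.abs(), ascending=False)
    return ranked.head(n)


# Toy frame standing in for `data` (values are made up for illustration).
toy = pd.DataFrame(
    {
        "Annual_Income": [10.0, 20.0, 30.0, 40.0],
        "Monthly_Inhand_Salary": [1.0, 2.0, 3.0, 4.0],
        "Outstanding_Debt": [4.0, 3.0, 1.0, 2.0],
        "Credit_Score": ["Good", "Poor", "Standard", "Good"],
    }
)
top = top_correlations(toy)
print(top)
```

Ranking by absolute value keeps strong negative relationships in view alongside strong positive ones.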



#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# No hue is assigned, so the palette argument is dropped
pair_cols = ['Age', 'Annual_Income', 'Monthly_Balance', 'Outstanding_Debt', 'Credit_Utilization_Ratio', 'Num_Credit_Inquiries']
sns.pairplot(data[pair_cols])
plt.show()

##### 1. Why did you pick the specific chart?

Answer : To explore relationships between key variables such as Age, Income, Balance, Debt, Utilization, and Inquiries.



##### 2. What is/are the insight(s) found from the chart?

Answer : The pair plot helps to visualize any patterns or outliers in the data.



## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Develop a Credit Scoring Prediction System
Leverage key financial indicators such as annual income, existing debts, credit utilization rates, repayment patterns, and past delays in payments to build a machine learning-based credit scoring model. Algorithms like Random Forest, XGBoost, or Logistic Regression can be employed to classify users into categories like “Good,” “Standard,” and “Poor” credit profiles.
This automated system will streamline the evaluation process, ensuring quicker decisions and minimizing risk by flagging potentially default-prone borrowers early.

2. Offer Tailored Financial Product Suggestions
By segmenting users based on their credit profiles and financial behavior (e.g., number of loans taken, credit inquiries made, and payment discipline), Paisabazaar can provide more personalized loan products.
For instance, customers with strong credit scores could be targeted with premium offerings and lower interest rates, while those with weaker profiles may receive smaller, more manageable loans or be guided toward financial advisory services.
Additionally, using insights such as common income brackets or job types among applicants can further refine marketing and product recommendations.

3. Strengthen Risk Management Framework
Implement risk-based pricing strategies where applicants with higher credit risk (e.g., high utilization or missed payments) face slightly adjusted interest rates. At the same time, offer structured payment plans or tools to help these customers stay on track.
Such measures help balance business growth with risk mitigation, protecting revenue while extending services to a broader user base.

4. Promote Credit Awareness and Support
Introduce educational initiatives for users who display poor credit habits, such as only paying minimum dues or maxing out credit limits. Providing timely credit counseling and practical advice on improving financial health can boost both repayment rates and customer trust.
Empowering users to manage their credit responsibly benefits both the customer and the platform in the long run.
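
As an illustration of the credit scoring system suggested in point 1, the sketch below trains a baseline Random Forest with scikit-learn. It is not a model fitted in this notebook. The `toy` frame is randomly generated, the labelling rule is a hypothetical one invented so the toy problem is learnable, and the feature subset and hyperparameters are assumptions rather than tuned choices.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for `data`; the numbers are random, not real customers.
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame(
    {
        "Annual_Income": rng.normal(50_000, 15_000, n),
        "Outstanding_Debt": rng.uniform(0, 5_000, n),
        "Credit_Utilization_Ratio": rng.uniform(20, 50, n),
        "Num_of_Delayed_Payment": rng.integers(0, 25, n),
    }
)
# Hypothetical labelling rule, used only to give the toy data a signal.
toy["Credit_Score"] = np.select(
    [toy["Num_of_Delayed_Payment"] < 8, toy["Num_of_Delayed_Payment"] < 17],
    ["Good", "Standard"],
    default="Poor",
)

X = toy.drop(columns="Credit_Score")
y = toy["Credit_Score"]

# Stratify so each credit score class keeps its share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(classification_report(y_test, pred))
```

On the real dataset each customer appears in several monthly rows, so a random row-level split can leak information between train and test. Splitting by `Customer_ID` (for example with scikit-learn's `GroupShuffleSplit`) would likely give a more honest estimate.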

# **Conclusion**

By harnessing the power of data analytics and implementing a predictive credit scoring system, Paisabazaar can enhance the accuracy and efficiency of its credit evaluation process. This strategic shift will not only reduce the chances of loan defaults but also enable the company to offer customized financial products tailored to individual customer needs—thereby improving customer satisfaction and retention.
Furthermore, adopting risk-based pricing and prioritizing financial education for high-risk borrowers will allow the platform to manage credit risks more effectively. Altogether, these actions will support sustainable business growth and strengthen Paisabazaar’s position as a customer-centric financial marketplace.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***