<a href="https://colab.research.google.com/github/gaurav4601/capstoneproject3/blob/master/Credit_Card_Default_Prediction_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img title="Almabetter" alt="Almabetter" src="https://pbs.twimg.com/profile_images/1649033540149866497/tg4B3SVf_400x400.jpg" width=70px>

## Credit Card Default Prediction
<img title="Credit Card" alt="Credit Card" src="https://media.istockphoto.com/id/1215256045/vector/safe-payment-logo-template-designs-vector-illustration.jpg?s=612x612&w=0&k=20&c=22EA9Y3-gToqirb3PlgCqjnoprrgXyPAvO4_CZmT2Jc=" width=120px>


#### **About the Project**
>The Credit Card Default Prediction project is aimed at predicting whether a credit card user is likely to default on their payment, using machine learning techniques. Credit card default is a common problem in the financial industry, and can lead to significant financial losses for both the credit card issuer and the user.

To develop an effective credit card default prediction model, the project analyzes a dataset of 30,000 credit card clients. The dataset includes features such as user demographics (gender, education, marital status, age), credit limit, monthly repayment status, bill statement amounts and previous payment amounts.

<br>
<hr>
<br>

#### **A Little Bit😶‍🌫️ about the Domain**

>A credit card is a type of payment card that allows cardholders to borrow money from a bank or financial institution to make purchases. When a credit card is used to make a purchase, the cardholder is essentially borrowing money from the credit card issuer to pay for the transaction.

> The credit card issuer sets a credit limit, which is the maximum amount of money that the cardholder can borrow. The cardholder can use the credit card to make purchases up to this limit. The credit card issuer charges interest on the outstanding balance if the cardholder does not pay the full balance by the due date.

> Credit cards work by using a system of payments and approvals between the merchant, the bank that issued the credit card, and the credit card network, such as Visa, Mastercard, or American Express. When a cardholder uses a credit card to make a purchase, the merchant sends a request for payment authorization to the credit card network. The credit card network then sends the request to the issuing bank to verify that the cardholder has sufficient credit available to make the purchase.
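The authorization step described above comes down to checking whether the purchase fits within the credit still available (credit limit minus outstanding balance). The sketch below is a heavily simplified, hypothetical model: the `CardAccount` class and its method names are illustrative only and are not any bank's or card network's API. Real issuers also apply fraud checks and other rules.

```python
from dataclasses import dataclass


@dataclass
class CardAccount:
    """Hypothetical, simplified card account used only for illustration."""
    credit_limit: float
    outstanding_balance: float = 0.0

    @property
    def available_credit(self) -> float:
        # Credit still available = limit minus what is already owed.
        return self.credit_limit - self.outstanding_balance

    def authorize(self, amount: float) -> bool:
        # Approve only if the purchase fits within the remaining credit;
        # an approved purchase is added to the outstanding balance.
        if amount <= self.available_credit:
            self.outstanding_balance += amount
            return True
        return False


acct = CardAccount(credit_limit=20000)   # hypothetical limit
print(acct.authorize(15000))  # True
print(acct.authorize(6000))   # False: only 5000 is left
print(acct.available_credit)  # 5000.0
```

The credit limit here plays the same role as the `LIMIT_BAL` column in the dataset loaded below.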

> Once the transaction is approved, the cardholder is responsible for repaying the outstanding balance to the credit card issuer. Any balance left unpaid after the due date accrues interest, which can grow into a significant amount over time if it is not paid off quickly.
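The growth of an unpaid balance described above is compound interest. A minimal sketch, assuming interest compounds once per month and that no payments, fees or minimum-payment rules apply (all simplifications; the function name and the 3% monthly rate are hypothetical):

```python
def balance_after(months: int, balance: float, monthly_rate: float) -> float:
    """Unpaid balance after `months` months with monthly compounding and no payments."""
    for _ in range(months):
        # Each month, interest is charged on the whole balance, including past interest.
        balance *= 1 + monthly_rate
    return balance


# Hypothetical figures: 10,000 left unpaid at an assumed 3% per month.
print(round(balance_after(12, 10_000, 0.03), 2))
```

After a year the balance is about 1.43 times the original amount, which is why a balance that is not paid off quickly can become hard to clear.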

> Credit cards are widely used for making purchases, both in person and online, and offer benefits such as reward points, cashback, and travel rewards. However, they also come with risks, including high interest rates, fees, and the potential for overspending and accumulating debt.

<img alt='illustration' src='https://media.istockphoto.com/id/672422834/vector/thief-steal-credit-card.jpg?s=612x612&w=0&k=20&c=kOyoYFgH1MDvUgHRc9j7tk3Y5nUrUw2i1SRCzaV-KF0=' width=350px>

<a href="https://github.com/gaurav4601/capstoneproject3/tree/master" ><img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="50" height="50"></a>

> GitHub Repository: <https://github.com/gaurav4601/capstoneproject3/tree/>

> Project Type : Classification - Supervised Machine Learning
>
> Contribution : Individual
>
> Name : Gaurav Dattatraya Paithane



In [1]:
# importing all required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px


# additional imports as required

import plotly.graph_objs as go
import plotly.io as pio

# set the default template for all Plotly Express charts to 'plotly_white'
pio.templates.default = 'plotly_white'


>**Data Loading from CSV**

In [6]:
# load the CSV file directly from its raw GitHub URL
# (on_bad_lines='skip' silently drops any malformed rows)

data = pd.read_csv('https://raw.githubusercontent.com/gaurav4601/capstoneproject3/master/default%20of%20credit%20card%20clients.xls%20-%20Data.csv', on_bad_lines='skip')

# work on a copy so that the raw `data` stays untouched

df = data.copy()
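The file has two header rows: generic names (`X1` … `X23`, `Y`) followed by descriptive names (`ID`, `LIMIT_BAL`, …). With the default `header=0`, the descriptive row is read as data, which is why the `df.info()` and `df.describe()` outputs in this section treat every column as non-numeric. A minimal sketch of one way to avoid this, assuming the real file has the same two-row layout: pass `header=1` to `pd.read_csv`. To keep it runnable offline, the sketch uses a hypothetical in-memory excerpt with only a few of the columns (values taken from the first rows shown by `df.head()`).

```python
import io

import pandas as pd

# Hypothetical excerpt mimicking the file's two header rows
# (generic names first, descriptive names second); only a few columns are kept.
raw_csv = """,X1,X2,X5,Y
ID,LIMIT_BAL,SEX,AGE,default payment next month
1,20000,2,24,1
2,120000,2,26,1
3,90000,2,34,0
"""

# Default read: the descriptive header row becomes data row 0,
# so no column can be parsed as numeric.
naive = pd.read_csv(io.StringIO(raw_csv))

# header=1 uses the second line as column labels and skips the first,
# so the values parse as integers.
clean = pd.read_csv(io.StringIO(raw_csv), header=1)

print(naive.shape, list(naive.columns))
print(clean.shape, list(clean.columns))
print(clean.dtypes)
```

For the real file, the same idea would be `pd.read_csv(url, header=1)`; this has not been checked against the hosted CSV here.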

### 🏛️ **First Look at the Data**

In [8]:
# Dataset First Look
df.head()

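| | Unnamed: 0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | ... | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month |
| 1 | 1 | 20000 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | ... | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | 1 |
| 2 | 2 | 120000 | 2 | 2 | 2 | 26 | -1 | 2 | 0 | 0 | ... | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | 1 |
| 3 | 3 | 90000 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | ... | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | 0 |
| 4 | 4 | 50000 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | ... | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | 0 |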


In [9]:
# last rows
df.tail()

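| | Unnamed: 0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | ... | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29996 | 29996 | 220000 | 1 | 3 | 1 | 39 | 0 | 0 | 0 | 0 | ... | 88004 | 31237 | 15980 | 8500 | 20000 | 5003 | 3047 | 5000 | 1000 | 0 |
| 29997 | 29997 | 150000 | 1 | 3 | 2 | 43 | -1 | -1 | -1 | -1 | ... | 8979 | 5190 | 0 | 1837 | 3526 | 8998 | 129 | 0 | 0 | 0 |
| 29998 | 29998 | 30000 | 1 | 2 | 2 | 37 | 4 | 3 | 2 | -1 | ... | 20878 | 20582 | 19357 | 0 | 0 | 22000 | 4200 | 2000 | 3100 | 1 |
| 29999 | 29999 | 80000 | 1 | 3 | 1 | 41 | 1 | -1 | 0 | 0 | ... | 52774 | 11855 | 48944 | 85900 | 3409 | 1178 | 1926 | 52964 | 1804 | 1 |
| 30000 | 30000 | 50000 | 1 | 2 | 1 | 46 | 0 | 0 | 0 | 0 | ... | 36535 | 32428 | 15313 | 2078 | 1800 | 1430 | 1000 | 1000 | 1000 | 1 |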


In [10]:
# shape of data
df.shape

(30001, 25)

In [11]:
# sample

df.sample(5)

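| | Unnamed: 0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | ... | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27743 | 27743 | 30000 | 2 | 2 | 2 | 23 | 2 | 2 | 2 | 2 | ... | 41931 | 38638 | 35343 | 0 | 1800 | 9851 | 0 | 1183 | 5132 | 0 |
| 11841 | 11841 | 150000 | 1 | 1 | 2 | 28 | 0 | -1 | -1 | -1 | ... | 1736 | 82188 | 67578 | 20732 | 7672 | 1736 | 82178 | 5000 | 2600 | 0 |
| 11287 | 11287 | 200000 | 2 | 3 | 1 | 42 | 0 | 0 | 0 | 0 | ... | 1800 | 819 | 886 | 1345 | 1108 | 669 | 0 | 886 | 2415 | 0 |
| 12955 | 12955 | 150000 | 2 | 1 | 2 | 29 | -1 | -1 | -1 | 0 | ... | 204 | 16048 | 0 | 390 | 649 | 145 | 16048 | 0 | 0 | 0 |
| 1216 | 1216 | 230000 | 1 | 1 | 2 | 32 | 0 | 0 | 0 | 0 | ... | 106677 | 88091 | 60929 | 7326 | 10000 | 10000 | 3000 | 2000 | 2000 | 0 |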


In [12]:
# duplicate values

df.duplicated().sum()

0
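Because the first column holds a unique client ID, `df.duplicated()` treats every row as distinct even if two clients had identical records in every other column. A minimal sketch of a stricter check that ignores the ID column, using a hypothetical toy frame in which two rows differ only by ID:

```python
import pandas as pd

# Hypothetical toy frame: the first two rows are identical apart from their ID.
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "LIMIT_BAL": [20000, 20000, 90000],
    "AGE": [24, 24, 34],
})

# Checking all columns: the unique ID makes every row look distinct.
all_cols = toy.duplicated().sum()

# Checking every column except the ID exposes the repeated record.
feature_cols = [c for c in toy.columns if c != "ID"]
without_id = toy.duplicated(subset=feature_cols).sum()

print(all_cols, without_id)  # 0 1
```

On the notebook's `df`, where the ID column is labelled `Unnamed: 0`, the equivalent would be `df.duplicated(subset=[c for c in df.columns if c != "Unnamed: 0"]).sum()`; its result is not shown here.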

In [13]:
# check for null or missing values
# (isnull() is an alias of isna(), so a single call is enough)
df.isna().sum()

Unnamed: 0    0
X1            0
X2            0
X3            0
X4            0
X5            0
X6            0
X7            0
X8            0
X9            0
X10           0
X11           0
X12           0
X13           0
X14           0
X15           0
X16           0
X17           0
X18           0
X19           0
X20           0
X21           0
X22           0
X23           0
Y             0
dtype: int64

In [14]:
# info about the data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30001 entries, 0 to 30000
Data columns (total 25 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  30001 non-null  object
 1   X1          30001 non-null  object
 2   X2          30001 non-null  object
 3   X3          30001 non-null  object
 4   X4          30001 non-null  object
 5   X5          30001 non-null  object
 6   X6          30001 non-null  object
 7   X7          30001 non-null  object
 8   X8          30001 non-null  object
 9   X9          30001 non-null  object
 10  X10         30001 non-null  object
 11  X11         30001 non-null  object
 12  X12         30001 non-null  object
 13  X13         30001 non-null  object
 14  X14         30001 non-null  object
 15  X15         30001 non-null  object
 16  X16         30001 non-null  object
 17  X17         30001 non-null  object
 18  X18         30001 non-null  object
 19  X19         30001 non-null  object
 20  X20   

In [15]:
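# summary of the dataset (every column is object dtype here, so describe()
# reports count, unique, top and freq instead of numeric statistics)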

df.describe()

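| | Unnamed: 0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | ... | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | ... | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 | 30001 |
| unique | 30001 | 82 | 3 | 8 | 5 | 57 | 12 | 12 | 12 | 12 | ... | 21549 | 21011 | 20605 | 7944 | 7900 | 7519 | 6938 | 6898 | 6940 | 3 |
| top | ID | 50000 | 2 | 2 | 2 | 29 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| freq | 1 | 3365 | 18112 | 14030 | 15964 | 1605 | 14737 | 15730 | 15764 | 16455 | ... | 3195 | 3506 | 4020 | 5249 | 5396 | 5968 | 6408 | 6703 | 7173 | 23364 |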
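`describe()` reports 3 unique values for the binary target `Y`. The third value is the header text that was read as a data row, and `isna()` cannot detect it because it is a string, not a missing value. A minimal sketch of how `pd.to_numeric(..., errors="coerce")` exposes such entries, using a hypothetical toy column shaped like `Y` as loaded:

```python
import pandas as pd

# Hypothetical toy column: the descriptive header text ended up as the first value.
y_raw = pd.Series(["default payment next month", "1", "1", "0", "0"], dtype=object)

# errors="coerce" turns anything that is not a number into NaN,
# revealing the entries that isna() could not see.
y_num = pd.to_numeric(y_raw, errors="coerce")
bad_rows = y_raw[y_num.isna()]
print(bad_rows.tolist())  # ['default payment next month']

# After dropping the stray entry, the column converts cleanly to integers.
y_clean = y_num.dropna().astype(int)
print(y_clean.tolist())  # [1, 1, 0, 0]
```

On the full frame, one possible (untested here) follow-up is to use row 0 as the column names (`df.columns = df.iloc[0]`) and then convert the remaining rows with `df.iloc[1:].apply(pd.to_numeric)`.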


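### **Notes from First Look at the Data**
<hr>

- The dataset contains account information for credit card clients, together with whether each client defaulted on the following month's payment (`default payment next month`).
- `df.shape` reports 30,001 rows and 25 columns, but row 0 holds the descriptive column names (`ID`, `LIMIT_BAL`, `SEX`, ...), so there are 30,000 client records. The first column is a client ID, not a feature.
- Because that label row was read as data, every column currently has `object` dtype and `describe()` returns categorical counts instead of numeric statistics. The columns need to be renamed and converted to numbers before modelling.
- No missing values were found, and `df.duplicated()` found no duplicate rows. Because the ID column is unique, this check alone cannot reveal repeated client records.
- The objective of this project is to build a classification model that predicts whether a customer will default on their next credit card payment.
- The features include credit limit (`LIMIT_BAL`), gender (`SEX`), education, marital status, age, monthly repayment status (`PAY_*`), bill statement amounts (`BILL_AMT1` to `BILL_AMT6`) and previous payment amounts (`PAY_AMT1` to `PAY_AMT6`), which will be used to build the classification model.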
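Before training a classifier, it helps to know how balanced the target classes are. A minimal sketch using `value_counts`, on a hypothetical toy target column (in the notebook this would be the numeric `default payment next month` column after the label row is handled):

```python
import pandas as pd

# Hypothetical toy target: 1 = defaulted, 0 = did not default.
y = pd.Series([0, 0, 1, 0, 1, 0, 0, 0, 0, 1])

counts = y.value_counts()                # number of clients per class
shares = y.value_counts(normalize=True)  # fraction of clients per class

print(counts.loc[1], round(shares.loc[1], 2))  # 3 0.3
```

The `describe()` output above already hints at the answer for the real data: the most frequent `Y` value, `0`, appears 23,364 times out of 30,000 client rows, so about 6,636 clients (roughly 22%) are labelled as defaulters, and the classes are imbalanced.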


<hr>