## EXPLORATORY DATA ANALYSIS FOR CREDIT CARD DATA

### Business Problem:

Making good decisions in the modern credit card industry requires knowledge gained through
effective data analysis and modeling. Data-driven decision-making tools and procedures make it
possible to gather the information needed to evaluate all aspects of credit card operations.
PSPD Bank has banking operations in more than 50 countries across the globe. Mr. Jim Watson,
the CEO, wants to evaluate the areas of bankruptcy, fraud, and collections, and to respond to
customer requests for help with proactive offers and service.

The following questions help to understand customer spend and repayment behaviour.

### Import necessary libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

### Import the data sets

In [14]:
customer = pd.read_csv('Customer Acqusition.csv')
spend = pd.read_csv('spend.csv')
repayment = pd.read_csv('Repayment.csv')

In [5]:
customer.head()

Unnamed: 0,No,Customer,Age,City,Product,Limit,Company,Segment
0,1,A1,76,BANGALORE,Gold,500000.0,C1,Self Employed
1,2,A2,71,CALCUTTA,Silver,100000.0,C2,Salaried_MNC
2,3,A3,34,COCHIN,Platimum,10000.0,C3,Salaried_Pvt
3,4,A4,47,BOMBAY,Platimum,10001.0,C4,Govt
4,5,A5,56,BANGALORE,Platimum,10002.0,C5,Normal Salary


In [6]:
spend.head()

Unnamed: 0,Sl No:,Customer,Month,Type,Amount
0,1,A1,12-Jan-04,JEWELLERY,485470.8
1,2,A1,3-Jan-04,PETRO,410556.13
2,3,A1,15-Jan-04,CLOTHES,23740.46
3,4,A1,25-Jan-04,FOOD,484342.47
4,5,A1,17-Jan-05,CAMERA,369694.07


In [7]:
repayment.head()

Unnamed: 0,SL No:,Customer,Month,Amount,Unnamed: 4
0,,A1,12-Jan-04,495414.75,
1,2.0,A1,3-Jan-04,245899.02,
2,3.0,A1,15-Jan-04,259490.06,
3,4.0,A1,25-Jan-04,437555.12,
4,5.0,A1,17-Jan-05,165972.88,


### Exploratory Data Analysis

In [8]:
print(customer.shape)
print(spend.shape)
print(repayment.shape)

(100, 8)
(1500, 5)
(1523, 5)


In [15]:
# Drop the column "No" from customer data

customer.drop('No', axis = 1, inplace = True)
customer.head()

Unnamed: 0,Customer,Age,City,Product,Limit,Company,Segment
0,A1,76,BANGALORE,Gold,500000.0,C1,Self Employed
1,A2,71,CALCUTTA,Silver,100000.0,C2,Salaried_MNC
2,A3,34,COCHIN,Platimum,10000.0,C3,Salaried_Pvt
3,A4,47,BOMBAY,Platimum,10001.0,C4,Govt
4,A5,56,BANGALORE,Platimum,10002.0,C5,Normal Salary


In [16]:
# Drop the column "Sl No:" from spend data

spend.drop('Sl No:', axis = 1, inplace = True)
spend['Month'] = pd.to_datetime(spend['Month'], format = '%d-%b-%y')
spend.head()

Unnamed: 0,Customer,Month,Type,Amount
0,A1,2004-01-12,JEWELLERY,485470.8
1,A1,2004-01-03,PETRO,410556.13
2,A1,2004-01-15,CLOTHES,23740.46
3,A1,2004-01-25,FOOD,484342.47
4,A1,2005-01-17,CAMERA,369694.07
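
The `format = '%d-%b-%y'` string used above tells pandas to read a day of the month, an abbreviated English month name, and a two-digit year. A minimal sketch of how that plays out on a few `Month` values copied from the `spend.head()` output; the two-digit-year pivot described in the comment is the usual `strptime` convention and is worth confirming against the installed pandas version:

```python
import pandas as pd

# Sample values copied from the spend.head() output above
raw = pd.Series(['12-Jan-04', '3-Jan-04', '17-Jan-05'])

# %d = day of month, %b = abbreviated month name, %y = two-digit year
parsed = pd.to_datetime(raw, format='%d-%b-%y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())

# Two-digit years are ambiguous: under the strptime convention,
# 00-68 map to 2000-2068 and 69-99 map to 1969-1999.
print(sorted(parsed.dt.year.unique().tolist()))
```

Passing an explicit format also makes parsing fail loudly on values that do not match, rather than guessing a layout per value.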


In [17]:
# Drop the columns "SL No:" and "Unnamed: 4" from repayment data

repayment.drop(['SL No:', 'Unnamed: 4'], axis = 1, inplace = True)
repayment['Month'] = pd.to_datetime(repayment['Month'], format = '%d-%b-%y')
repayment.head()

Unnamed: 0,Customer,Month,Amount
0,A1,2004-01-12,495414.75
1,A1,2004-01-03,245899.02
2,A1,2004-01-15,259490.06
3,A1,2004-01-25,437555.12
4,A1,2005-01-17,165972.88
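
Dropping `Unnamed: 4` is safe only if the column carries no data. One way to check this before the drop is sketched below on a small stand-in frame (`raw_repayment` is a hypothetical name, and its values are copied from the `repayment.head()` output above; the real check would run on the full frame as loaded from `Repayment.csv`):

```python
import numpy as np
import pandas as pd

# Stand-in for the raw repayment data, built from the head() output above
raw_repayment = pd.DataFrame({
    'SL No:': [np.nan, 2.0, 3.0],
    'Customer': ['A1', 'A1', 'A1'],
    'Month': ['12-Jan-04', '3-Jan-04', '15-Jan-04'],
    'Amount': [495414.75, 245899.02, 259490.06],
    'Unnamed: 4': [np.nan, np.nan, np.nan],
})

# Columns in which every value is missing
empty_cols = [c for c in raw_repayment.columns if raw_repayment[c].isna().all()]
print(empty_cols)

# Drop the empty columns together with the serial-number column
cleaned = raw_repayment.drop(columns=empty_cols + ['SL No:'])
print(cleaned.columns.tolist())
```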


In [18]:
customer.isnull().sum()

Customer    0
Age         0
City        0
Product     0
Limit       0
Company     0
Segment     0
dtype: int64

In [19]:
spend.isnull().sum()

Customer    0
Month       0
Type        0
Amount      0
dtype: int64

In [20]:
repayment.isnull().sum()

Customer    23
Month       23
Amount      23
dtype: int64
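
The three counts are identical (23), and `repayment` has 1523 rows against 1500 in `spend`, which suggests that the same 23 rows are blank in every column, although the cells above do not verify this. A sketch of how such rows could be confirmed and removed, using a small hypothetical frame (`demo`) in place of the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: two real rows from above plus two blank rows
demo = pd.DataFrame({
    'Customer': ['A1', 'A1', np.nan, np.nan],
    'Month': pd.to_datetime(['2004-01-12', '2004-01-03', None, None]),
    'Amount': [495414.75, 245899.02, np.nan, np.nan],
})

# Boolean mask of rows where every column is missing
blank_rows = demo.isna().all(axis=1)
print(blank_rows.sum())

# how='all' removes only rows that are empty in every column,
# so partially filled rows would be kept for separate inspection
demo_clean = demo.dropna(how='all').reset_index(drop=True)
print(demo_clean.shape)
```

If the count of fully blank rows on the real `repayment` frame equals 23, the missing values are all accounted for by empty trailing rows rather than by gaps in individual records.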

### 1. In the above dataset,

### (a) If age is less than 18, replace it with the mean of the age values.