# Group Project

### Banking_loan_Modelling
Marketing Campaign for Banking Products
#### Done By: Jerry Maleka

© ExploreAI 2024 (Workplace Project)

---

<a id="cont"></a>
## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#nine>6. Conclusion and Future Work</a>

<a href=#ten>7. References</a>

---
 <a id="BC"></a>
## **Background Context**
<a href=#cont>Back to Table of Contents</a>

The bank faces a challenge in converting its liability-based customers (depositors) into personal loan customers. To address this, the bank aims to develop a model that can effectively identify potential loan customers, thereby increasing the success rate of loan campaigns and reducing costs.

**Project Details**

**Problem**: Identifying potential personal loan customers from the bank's existing liability-based customer base.

**Data**: A dataset containing information on 5000 customers.

**Goal**: Develop a predictive model that can accurately classify customers as potential loan customers or not. This model will help the bank target its loan campaigns more effectively, leading to increased loan business and higher profitability.

**Potential Approach**

**1. Data Exploration**: Analyze the dataset to understand the characteristics of the customers, including demographics, financial behavior, and other relevant factors.

**2. Feature Engineering**: Create new features or transform existing features to improve the model's predictive power.

**3. Model Selection**: Choose appropriate machine learning algorithms for classification, such as logistic regression, decision trees, random forests, or gradient boosting.  

**4. Model Training**: Train the selected models on the training data.

**5. Model Evaluation**: Evaluate the performance of the models using metrics like accuracy, precision, recall, and F1-score.

**6. Model Deployment**: Deploy the best-performing model into production to assist the bank in identifying potential loan customers.
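
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the project's final model: the data are synthetic (generated with scikit-learn's `make_classification`), and the choice of logistic regression, the 70/30 split, and `random_state=42` are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bank data (assumption): 5000 rows, imbalanced classes.
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.904, 0.096], random_state=42,
)

# Stratify so both splits keep the same share of positive cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Steps 3 and 4: scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 5: evaluate on the held-out data.
y_pred = model.predict(X_test)
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

On the real data, `X` would hold the customer features and `y` the `Personal Loan` column; the other algorithms listed in step 3 can be swapped into the same pipeline.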


**By following this approach, the bank can develop a valuable tool to enhance its loan business and achieve its financial goals.**

---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Set up the Python environment with necessary libraries and tools.
* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.
---

In [15]:
# Importing all required packages
import pickle                                        # For saving and loading Python objects
# import joblib                                      # For saving and loading large NumPy arrays and Python objects efficiently
import warnings                                      # For controlling warnings generated during code execution

import numpy as np                                   # For numerical operations
import pandas as pd                                  # For data manipulation and analysis
import matplotlib.pyplot as plt                      # For data visualization
import seaborn as sns                                # For enhanced data visualization
# import sweetviz as sw
import statsmodels.api as sm                         # For statistical modeling, including linear regression
from scipy.stats import ttest_ind                    # For two-sample t-tests
from sklearn import metrics                          # For calculating evaluation metrics like mean squared error, R-squared, etc.
from sklearn.linear_model import LinearRegression, Ridge, Lasso  # For linear regression models, including Ridge and Lasso regression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error  # For regression error metrics
from sklearn.model_selection import train_test_split # For splitting data into training and testing sets
from sklearn.pipeline import make_pipeline           # For chaining preprocessing and modeling steps
from sklearn.preprocessing import StandardScaler     # For scaling and normalizing features
from sklearn.tree import DecisionTreeRegressor       # For decision tree models

warnings.filterwarnings('ignore')                    # Suppress warnings

---
<a id="two"></a>
## **Data Collection, Description & Columns**
<a href=#cont>Back to Table of Contents</a>

* **Data Description:**

The file Bank.xls contains information on 5000 customers. The data contain customer demographic information (age, income, etc.), customer relationship with the bank (mortgage, securities account, etc.), and customer reaction to the most recent personal loan campaign (Personal Loan).

Among these 5000 consumers, only 480 (or 9.6%) accepted the personal loan provided to them in the previous campaign.
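
This class balance has a practical consequence that a few lines of arithmetic make concrete. The sketch uses only the two counts stated above; the variable names are illustrative.

```python
# Counts taken from the dataset description above.
n_customers = 5000
n_accepted = 480

acceptance_rate = n_accepted / n_customers   # share of customers who took the loan
baseline_accuracy = 1 - acceptance_rate      # accuracy of always predicting "no loan"

print(f"Acceptance rate:   {acceptance_rate:.1%}")
print(f"Baseline accuracy: {baseline_accuracy:.1%}")
```

A classifier that never predicts a loan acceptance would already be right for about 90.4% of customers, so precision and recall on the acceptors are more informative than accuracy alone.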

Data: https://www.kaggle.com/itsmesunil/bank-loan-modelling/download

Attribute Information:

* ID: Customer ID
* Age: Customer's age in completed years
* Experience: Number of years of professional experience
* Income: Annual income of the customer ($000)
* ZIP Code: Home address ZIP code
* Family: Family size of the customer
* CCAvg: Average spending on credit cards per month ($000)
* Education: Education level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional
* Mortgage: Value of house mortgage, if any ($000)
* Personal Loan: Did this customer accept the personal loan offered in the last campaign?
* Securities Account: Does the customer have a securities account with the bank?
* CD Account: Does the customer have a certificate of deposit (CD) account with the bank?
* Online: Does the customer use internet banking facilities?
* CreditCard: Does the customer use a credit card issued by the bank?

---

---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Load the data into the notebook for manipulation and analysis.
* **Details:** Show the code used to load the data and display the first few rows to give a sense of what the raw data looks like.
---

In [11]:
#Loading the Banking Loan Dataset using pandas
df = pd.read_excel("Bank_Personal_Loan_Modelling.xlsx", sheet_name='Data')
df.head()

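| | ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 25 | 1 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 45 | 19 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 3 | 39 | 15 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 35 | 9 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 35 | 8 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |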


---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---

In [20]:
# Optional: sweetviz produces automated EDA reports; installing it requires network access to PyPI
!pip install sweetviz
import sweetviz as sw

Defaulting to user installation because normal site-packages is not writeable
Could not fetch URL https://pypi.org/simple/sweetviz/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/sweetviz/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)'))) - skipping
Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)'))) - skipping


ERROR: Could not find a version that satisfies the requirement sweetviz (from versions: none)
ERROR: No matching distribution found for sweetviz


In [21]:
# Count nulls per column
null_counts = df.isnull().sum()

# Percentage of nulls per column
null_percentage = null_counts / len(df) * 100

# Combine both measures into one summary DataFrame
null_df = pd.DataFrame({
    'null_count': null_counts,
    'null_column_percentage': null_percentage
})

# Keep only the columns that contain nulls
null_df = null_df[null_df['null_count'] > 0]
null_df

| | null_count | null_column_percentage |
|---|---|---|


In [13]:
# Show the dtype of each column in the DataFrame
df.dtypes

ID                      int64
Age                     int64
Experience              int64
Income                  int64
ZIP Code                int64
Family                  int64
CCAvg                 float64
Education               int64
Mortgage                int64
Personal Loan           int64
Securities Account      int64
CD Account              int64
Online                  int64
CreditCard              int64
dtype: object

In [22]:
# To view some basic statistical details.
df.describe()

| | ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 |
| mean | 2500.5 | 45.3384 | 20.1046 | 73.7742 | 93152.503 | 2.3964 | 1.937913 | 1.881 | 56.4988 | 0.096 | 0.1044 | 0.0604 | 0.5968 | 0.294 |
| std | 1443.520003 | 11.463166 | 11.467954 | 46.033729 | 2121.852197 | 1.147663 | 1.747666 | 0.839869 | 101.713802 | 0.294621 | 0.305809 | 0.23825 | 0.490589 | 0.455637 |
| min | 1.0 | 23.0 | -3.0 | 8.0 | 9307.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 1250.75 | 35.0 | 10.0 | 39.0 | 91911.0 | 1.0 | 0.7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 2500.5 | 45.0 | 20.0 | 64.0 | 93437.0 | 2.0 | 1.5 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 75% | 3750.25 | 55.0 | 30.0 | 98.0 | 94608.0 | 3.0 | 2.5 | 3.0 | 101.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| max | 5000.0 | 67.0 | 43.0 | 224.0 | 96651.0 | 4.0 | 10.0 | 3.0 | 635.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


In [23]:
# Transpose of df.describe()
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| ID | 5000.0 | 2500.5 | 1443.520003 | 1.0 | 1250.75 | 2500.5 | 3750.25 | 5000.0 |
| Age | 5000.0 | 45.3384 | 11.463166 | 23.0 | 35.0 | 45.0 | 55.0 | 67.0 |
| Experience | 5000.0 | 20.1046 | 11.467954 | -3.0 | 10.0 | 20.0 | 30.0 | 43.0 |
| Income | 5000.0 | 73.7742 | 46.033729 | 8.0 | 39.0 | 64.0 | 98.0 | 224.0 |
| ZIP Code | 5000.0 | 93152.503 | 2121.852197 | 9307.0 | 91911.0 | 93437.0 | 94608.0 | 96651.0 |
| Family | 5000.0 | 2.3964 | 1.147663 | 1.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| CCAvg | 5000.0 | 1.937913 | 1.747666 | 0.0 | 0.7 | 1.5 | 2.5 | 10.0 |
| Education | 5000.0 | 1.881 | 0.839869 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Mortgage | 5000.0 | 56.4988 | 101.713802 | 0.0 | 0.0 | 0.0 | 101.0 | 635.0 |
| Personal Loan | 5000.0 | 0.096 | 0.294621 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
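
The quartiles in this summary can also be used to flag potential outliers. The sketch below applies Tukey's 1.5 × IQR rule to `Income` and `Mortgage`, using the quartile and maximum values copied from the `df.describe()` output above. The choice of this rule and of these two columns is an assumption for illustration, not a step taken in the notebook, and `upper_fence` is a made-up helper name.

```python
# Quartiles and maxima copied from the df.describe() output above ($000).
summary = {
    "Income":   {"q1": 39.0, "q3": 98.0,  "max": 224.0},
    "Mortgage": {"q1": 0.0,  "q3": 101.0, "max": 635.0},
}

def upper_fence(q1, q3, k=1.5):
    """Upper Tukey fence: values above q3 + k * IQR are flagged as potential outliers."""
    return q3 + k * (q3 - q1)

fences = {col: upper_fence(s["q1"], s["q3"]) for col, s in summary.items()}

for col, s in summary.items():
    flagged = s["max"] > fences[col]
    print(f"{col}: upper fence = {fences[col]}, max = {s['max']}, beyond fence: {flagged}")
```

Both maxima lie beyond their fences (186.5 for `Income`, 252.5 for `Mortgage`), which suggests right-skewed distributions worth inspecting before deciding whether any rows should be removed.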


In [24]:
# To check the counts of negative values in experience column
df[df['Experience'] < 0]['Experience'].count()

52

In [25]:
# Check how often each negative value occurs
df[df['Experience'] < 0]['Experience'].value_counts()

Experience
-1    33
-2    15
-3     4
Name: count, dtype: int64

In [26]:
# Drop the ID and Experience columns
df.drop(['ID','Experience'],axis=1,inplace=True)
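
Dropping `Experience` removes the 52 negative entries along with the rest of the column. One alternative, shown here only as a hedged sketch and not as the approach taken in this notebook, is to treat negative values as missing and fill them with the median experience of customers of the same age. The small `sample` frame and the helper name `fix_negative_experience` are made up for illustration.

```python
import numpy as np
import pandas as pd

def fix_negative_experience(frame: pd.DataFrame) -> pd.DataFrame:
    """Replace negative Experience values with the median of valid values at the same Age."""
    out = frame.copy()
    # Mark impossible (negative) values as missing.
    out["Experience"] = out["Experience"].where(out["Experience"] >= 0, np.nan)
    # Median of the remaining valid values within each Age group.
    age_median = out.groupby("Age")["Experience"].transform("median")
    # Overall median of valid values, used when an Age group has no valid value.
    overall_median = out["Experience"].median()
    out["Experience"] = out["Experience"].fillna(age_median).fillna(overall_median)
    return out

# Hypothetical sample data, not rows from the real dataset.
sample = pd.DataFrame({
    "Age":        [25, 25, 25, 30, 30, 24],
    "Experience": [1, -1, 3, 5, -2, -3],
})

cleaned = fix_negative_experience(sample)
print(cleaned)
```

Whether imputing is preferable to dropping the column depends on how much information `Experience` adds beyond `Age`, which would need to be checked on the real data.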

In [27]:
#To display top 5 rows
df.head()

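| | Age | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 45 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 39 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 35 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 35 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |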


---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---
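
As one possible starting point, the sketch below compares loan acceptors with non-acceptors and computes a correlation matrix. It runs on randomly generated stand-in data that only borrows the column names `Income`, `CCAvg`, and `Personal Loan`; the distributions, the seed, and any numbers it prints are artificial and say nothing about the real customers.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Synthetic stand-in data (assumption): not the real bank dataset.
rng = np.random.default_rng(0)
n = 1000
loan = rng.binomial(1, 0.1, size=n)
demo = pd.DataFrame({
    "Income": rng.normal(60, 20, size=n) + 50 * loan,   # acceptors earn more by construction
    "CCAvg": rng.gamma(2.0, 1.0, size=n) + 1.5 * loan,  # acceptors spend more by construction
    "Personal Loan": loan,
})

# Mean of each feature for acceptors (1) and non-acceptors (0).
group_means = demo.groupby("Personal Loan")[["Income", "CCAvg"]].mean()
print(group_means)

# Welch's t-test: do the two groups differ in mean income?
accepted = demo.loc[demo["Personal Loan"] == 1, "Income"]
declined = demo.loc[demo["Personal Loan"] == 0, "Income"]
t_stat, p_value = ttest_ind(accepted, declined, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")

# Pairwise Pearson correlations between all columns.
corr = demo.corr()
print(corr.round(2))
```

On the real DataFrame, the same calls would run on `df`, and `sns.heatmap(df.corr(), annot=True)` would turn the correlation matrix into a plot.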



---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---



---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---


## Additional Sections to Consider

* ### Appendix: 
For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.

* ### Collaborators: 
  - Jerry Maleka
  
