# Predicting Credit Card Approvals with Logistic Regression

In this project, we build a machine learning model using the logistic regression algorithm to predict whether a credit card application is approved or rejected. Several factors can lead to a rejection, such as high loan balances, low income levels, or too many inquiries on an individual's credit report. We use factors like these as features to build an automatic credit card approval predictor.


![image](https://images.unsplash.com/photo-1589758438368-0ad531db3366?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2532&q=80)

## Project Outline
- First, we will start off by loading and viewing the dataset.
- We will see that the dataset has a mixture of numerical and non-numerical features, that its values span different ranges, and that it contains a number of missing entries.
- We will have to preprocess the dataset to ensure the machine learning model we choose can make good predictions.
- After our data is in good shape, we will do some exploratory data analysis to build our intuitions.
- Finally, we will build a machine learning model that can predict if an individual's application for a credit card will be accepted.


## Project Tasks
1. [Credit card applications](#1.-Credit-card-applications)
2. [Inspecting the applications](#2._Inspecting_the_applications)
3. Splitting the dataset into train and test sets
4. Handling the missing values (part i)
5. Handling the missing values (part ii)
6. Handling the missing values (part iii)
7. Preprocessing the data (part i)
8. Preprocessing the data (part ii)
9. Fitting a logistic regression model to the train set
10. Making predictions and evaluating performance
11. Grid searching and making the model perform better
12. Finding the best performing model


### 1. Credit card applications
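First, we load the dataset into ```cc_apps``` using ```pandas```. The loaded dataset includes the following features: Gender, Age, Debt, Married, BankCustomer, EducationLevel, Ethnicity, YearsEmployed, PriorDefault, Employed, CreditScore, DriversLicense, Citizen, ZipCode, Income, and finally the ApprovalStatus. The file has no header row, so with default settings ```read_csv()``` uses the first record as the column names, as the output below shows.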

```python
import pandas as pd

# Load the dataset; by default the first row of the file is used as column names
cc_apps = pd.read_csv('Dataset/cc_approvals.data')
cc_apps
```

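| | b | 30.83 | 0 | u | g | w | v | 1.25 | t | t.1 | 01 | f | g.1 | 00202 | 0.1 | + |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 58.67 | 4.460 | u | g | q | h | 3.04 | t | t | 6 | f | g | 00043 | 560 | + |
| 1 | a | 24.50 | 0.500 | u | g | q | h | 1.50 | t | f | 0 | f | g | 00280 | 824 | + |
| 2 | b | 27.83 | 1.540 | u | g | w | v | 3.75 | t | t | 5 | t | g | 00100 | 3 | + |
| 3 | b | 20.17 | 5.625 | u | g | w | v | 1.71 | t | f | 0 | f | s | 00120 | 0 | + |
| 4 | b | 32.08 | 4.000 | u | g | m | v | 2.50 | t | f | 0 | t | g | 00360 | 0 | + |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 684 | b | 21.08 | 10.085 | y | p | e | h | 1.25 | f | f | 0 | f | g | 00260 | 0 | - |
| 685 | a | 22.67 | 0.750 | u | g | c | v | 2.00 | f | t | 2 | t | g | 00200 | 394 | - |
| 686 | a | 25.25 | 13.500 | y | p | ff | ff | 2.00 | f | t | 1 | t | g | 00200 | 1 | - |
| 687 | b | 17.92 | 0.205 | u | g | aa | v | 0.04 | f | f | 0 | f | g | 00280 | 750 | - |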
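
The column labels in the output above are values from the first record, which suggests the file was read with the first row treated as a header. One possible way to keep that row as data is to pass `header=None` and supply names explicitly. The sketch below is only an illustration: it assumes the column order matches the feature list given earlier in this section, the `COLUMN_NAMES` constant and `cc_apps_named` variable are hypothetical, and two inline records (reconstructed from the output above) stand in for the real file.

```python
import io

import pandas as pd

# Assumed column order, taken from the feature list in this section (hypothetical constant)
COLUMN_NAMES = [
    "Gender", "Age", "Debt", "Married", "BankCustomer", "EducationLevel",
    "Ethnicity", "YearsEmployed", "PriorDefault", "Employed", "CreditScore",
    "DriversLicense", "Citizen", "ZipCode", "Income", "ApprovalStatus",
]

# Two records reconstructed from the output above, standing in for the data file
sample = io.StringIO(
    "b,30.83,0,u,g,w,v,1.25,t,t,01,f,g,00202,0,+\n"
    "a,58.67,4.460,u,g,q,h,3.04,t,t,6,f,g,00043,560,+\n"
)

# header=None keeps the first record as data instead of using it as column names;
# reading ZipCode as a string preserves its leading zeros
cc_apps_named = pd.read_csv(
    sample, header=None, names=COLUMN_NAMES, dtype={"ZipCode": str}
)
print(cc_apps_named.shape)
print(cc_apps_named[["Gender", "Age", "ZipCode", "ApprovalStatus"]])
```

With the real file, the `sample` object would be replaced by the path `'Dataset/cc_approvals.data'`.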


### 2. Inspecting the applications
Now we inspect the structure, numerical summary, and specific rows of the dataset. First, we extract summary statistics for the numerical columns using the ```describe()``` method of ```cc_apps```. Then, we use the ```info()``` method of ```cc_apps``` to see each column's data type and non-null count.
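
The inspection step might look roughly like the sketch below. It is a minimal illustration, not the notebook's actual cell: three records copied from the output in the previous section stand in for the full file, `cc_apps_sample` is a hypothetical name, and `tail()` is just one assumed way of looking at specific rows.

```python
import io

import pandas as pd

# Three records copied from the output shown earlier, standing in for the full dataset
sample = io.StringIO(
    "a,58.67,4.460,u,g,q,h,3.04,t,t,6,f,g,00043,560,+\n"
    "a,24.50,0.500,u,g,q,h,1.50,t,f,0,f,g,00280,824,+\n"
    "b,27.83,1.540,u,g,w,v,3.75,t,t,5,t,g,00100,3,+\n"
)
cc_apps_sample = pd.read_csv(sample, header=None)

# describe() summarizes only the numerical columns by default
summary = cc_apps_sample.describe()
print(summary)

# info() prints each column's dtype and non-null count
cc_apps_sample.info()

# tail() shows the last rows of the DataFrame
print(cc_apps_sample.tail(2))
```

Comparing the columns that appear in `describe()` with the full list in `info()` is a quick way to see which features are non-numerical.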

<a id='2._Inspecting_the_applications'></a>

### 3. Splitting the dataset into train and test sets

Taking a good look at the data, we see that features such as ```DriversLicense``` and ```ZipCode``` contribute little to predicting credit approval, so we set them aside using the ```drop()``` method. Next, we split the data into a train set and a test set.
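
A minimal sketch of this step, assuming scikit-learn's `train_test_split` is used for the split (the text does not name the function). The `toy` frame holds a handful of values copied from the output in section 1 under assumed column names, and the `test_size` and `random_state` values are illustrative choices, not settings taken from the project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Small stand-in frame; column names assume the feature list from section 1
toy = pd.DataFrame({
    "Age": [58.67, 24.50, 27.83, 20.17, 32.08, 21.08],
    "Debt": [4.460, 0.500, 1.540, 5.625, 4.000, 10.085],
    "DriversLicense": ["f", "f", "t", "f", "t", "f"],
    "ZipCode": ["00043", "00280", "00100", "00120", "00360", "00260"],
    "ApprovalStatus": ["+", "+", "+", "+", "+", "-"],
})

# Drop the features we decided to set aside
reduced = toy.drop(columns=["DriversLicense", "ZipCode"])

# Hold out roughly a third of the rows for testing (illustrative values)
train, test = train_test_split(reduced, test_size=0.33, random_state=42)
print(train.shape, test.shape)
```

Splitting before handling missing values and rescaling, as the task order in this project does, keeps information from the test set out of the preprocessing steps.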