# Loan Prediction Application Using Streamlit

## Dataset
### Source:
https://github.com/mridulrb/Predict-loan-eligibility-using-IBM-Watson-Studio/blob/master/Dataset/train_ctrUa4K.csv
### Attribute Information:

*   Loan_ID - Unique loan ID
*   Gender - Male/Female
*   Married - Applicant married (Y/N)
*   Dependents - Number of dependents
*   Education - Applicant education (Graduate/Not Graduate)
*   Self_Employed - Self-employed (Y/N)
*   ApplicantIncome - Applicant income
*   CoapplicantIncome - Coapplicant income
*   LoanAmount - Loan amount in thousands
*   Loan_Amount_Term - Term of loan in months
*   Credit_History - Whether the credit history meets guidelines (1/0)
*   Property_Area - Urban/Semi Urban/Rural
*   Loan_Status - Loan approved (Y/N)



## Data Pre-processing

We start by loading the dataset and previewing its first rows.
 

In [1]:
from google.colab import drive
drive.mount('/content/drive')
folder_path='/content/drive/MyDrive/train_ctrUa4K.csv'

Mounted at /content/drive


In [2]:
import pandas as pd
train = pd.read_csv(folder_path) 
train.head()

|   | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |


Machine learning models such as the one used here take numeric inputs and cannot process strings directly. So, we convert the categorical columns we plan to use into numbers.

In [3]:
train['Gender'] = train['Gender'].map({'Male': 0, 'Female': 1})
train['Married'] = train['Married'].map({'No': 0, 'Yes': 1})
train['Loan_Status'] = train['Loan_Status'].map({'N': 0, 'Y': 1})
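
One detail worth knowing about `Series.map` with a dictionary is that any value missing from the dictionary becomes `NaN`. The sketch below uses a small made-up frame (not the real dataset) to show this, so that an unexpected spelling such as `'male'` does not slip through unnoticed.

```python
import pandas as pd

# Toy data for illustration only; the values are hypothetical.
toy = pd.DataFrame({"Gender": ["Male", "Female", "male", None]})

# Values absent from the mapping (here 'male' and None) become NaN.
toy["Gender_code"] = toy["Gender"].map({"Male": 0, "Female": 1})

# Count how many entries could not be mapped.
unmapped = int(toy["Gender_code"].isna().sum())
print(toy)
print("unmapped rows:", unmapped)
```

Checking the number of unmapped entries right after encoding is a cheap way to catch categories that were not anticipated.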

Next, we check if there are any missing values present in the dataset.

In [4]:
train.isnull().sum()

Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64

So, the dataset has missing values in several columns, including Gender, Married, and LoanAmount. Next, we will remove every row that contains at least one missing value.

In [5]:
train = train.dropna()
train.isnull().sum()

Loan_ID              0
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64
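
`dropna()` with no arguments drops a row if any column is missing, including columns the model never uses. A possible alternative, shown here on hypothetical toy data, is to pass `subset=` so that only the columns of interest decide which rows are removed. This is a sketch of that option, not what the notebook above does.

```python
import pandas as pd

# Hypothetical rows; None marks a missing value.
toy = pd.DataFrame({
    "Gender": [0, 1, None, 0],
    "Self_Employed": ["No", None, "Yes", "No"],
    "LoanAmount": [128.0, 66.0, 120.0, None],
})

# Drop a row if ANY column is missing (what the notebook does).
dropped_any = toy.dropna()

# Drop a row only if one of the listed columns is missing.
dropped_subset = toy.dropna(subset=["Gender", "LoanAmount"])

print(len(toy), len(dropped_any), len(dropped_subset))
```

Restricting the check to the columns that are actually used can keep more rows for training, at the cost of leaving gaps in the unused columns.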

Now there are no missing values in the dataset. Next, we will separate the dependent (Loan_Status) and the independent variables.


In [6]:
# Here we keep only the five variables that seemed most relevant to us:
# Gender, Married, ApplicantIncome, LoanAmount, and Credit_History.
# They are stored in X; the target variable is stored in y.

X = train[['Gender', 'Married', 'ApplicantIncome', 'LoanAmount', 'Credit_History']]
y = train.Loan_Status
X.shape, y.shape

((480, 5), (480,))

## Model Building

Here, we will first split our dataset into a training and validation set, so that we can train the model on the training set and evaluate its performance on the validation set.

In [7]:
from sklearn.model_selection import train_test_split
x_train, x_cv, y_train, y_cv = train_test_split(X,y, test_size = 0.2, random_state = 10)

We have split the data using the `train_test_split` function from scikit-learn, keeping `test_size` at 0.2, which means 20 percent of the rows (96 of 480) are set aside for the validation set. Next, we will train the random forest model using the training set.
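
To make the split sizes concrete, here is a minimal sketch that applies the same `test_size` and `random_state` to stand-in data of the same length. The row indices and alternating labels are hypothetical placeholders for the real `X` and `y`.

```python
from sklearn.model_selection import train_test_split

# Stand-in data: 480 row indices and hypothetical alternating labels.
rows = list(range(480))
labels = [i % 2 for i in rows]

x_tr, x_val, y_tr, y_val = train_test_split(
    rows, labels, test_size=0.2, random_state=10
)
print(len(x_tr), len(x_val))

# Repeating the call with the same seed reproduces the same split.
x_tr2, x_val2, _, _ = train_test_split(
    rows, labels, test_size=0.2, random_state=10
)
same_split = x_val == x_val2
print(same_split)
```

Fixing `random_state` is what makes the reported accuracy reproducible from one run to the next.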

In [8]:
from sklearn.ensemble import RandomForestClassifier 
model = RandomForestClassifier(max_depth=4, random_state = 10) 
model.fit(x_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=4, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=10, verbose=0,
                       warm_start=False)

Here, we have set max_depth to 4 for each tree in the random forest and stored the trained model in a variable named model. Now that the model is trained, let's check its performance on both the training and validation sets.

In [9]:
from sklearn.metrics import accuracy_score
pred_cv = model.predict(x_cv)
accuracy_score(y_cv,pred_cv)

0.8020833333333334

The model is about 80% accurate on the validation set. Let's check the performance on the training set too.

In [10]:
pred_train = model.predict(x_train)
accuracy_score(y_train,pred_train)

0.8203125

Accuracy on the training set (about 82%) is close to that on the validation set (about 80%), which suggests the model is not overfitting noticeably. Finally, we will save this trained model so that it can be used in the future to make predictions on new observations.
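
Accuracy alone can be hard to judge without a reference point. A common sanity check, which the notebook does not perform, is to compare it with the accuracy of always predicting the most frequent class. The sketch below implements both measures in plain Python; the labels are made up for illustration and the helper names are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)

# Made-up labels: 1 = approved, 0 = rejected.
y_true = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]

print(accuracy(y_true, y_pred))
print(majority_baseline(y_true))
```

If the model's validation accuracy is only slightly above the baseline, most of its apparent skill comes from class imbalance rather than from the features.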

In [11]:
# saving the model
import pickle
# the with-block closes the file automatically
with open("classifier.pkl", mode="wb") as pickle_out:
    pickle.dump(model, pickle_out)

We are saving the model in `pickle` format and storing it as `classifier.pkl`. This will store the trained model and we will use this while deploying the model.
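
The save-and-load round trip can be sketched on its own. In the example below a plain dictionary stands in for the trained model and the file is written to a temporary directory; both are assumptions made so the snippet runs without the notebook state.

```python
import os
import pickle
import tempfile

# A plain dict stands in for the trained model in this sketch.
stand_in_model = {"max_depth": 4, "random_state": 10}

path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")

# Write the object; the with-block closes the file even if dump fails.
with open(path, "wb") as f:
    pickle.dump(stand_in_model, f)

# Read it back, as the deployed app would.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == stand_in_model)
```

Unpickling can execute arbitrary code, so `pickle.load` should only be used on files from a trusted source.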


## Model Deployment
The following packages are needed to deploy the model.

`pyngrok` is a Python wrapper for ngrok, which opens secure tunnels from public URLs to localhost. This will help us to host our web app.

`streamlit` turns Python data scripts into shareable web apps. It is open source and free.

`streamlit_ace` is a Streamlit component that embeds the Ace code editor. The app below does not import it.

In [12]:
#Installation
!pip install -q pyngrok
!pip install -q streamlit
!pip install -q streamlit_ace

The following cell writes the Streamlit application to `app.py`.

In [13]:
%%writefile app.py
# The %%writefile magic has to be the first line of the cell; it saves the cell as app.py.
# We load pickle to read the trained model and streamlit to build the app,
# then load the trained model into a variable named classifier.

import pickle
import streamlit as st

# loading the trained model
with open('classifier.pkl', 'rb') as pickle_in:
    classifier = pickle.load(pickle_in)

# The prediction function takes the data provided by the user (gender, marital status,
# income, loan amount, and credit history), pre-processes it so that it can be fed to
# the model, and returns whether the loan is approved or rejected.
# Note: newer Streamlit releases deprecate st.cache in favor of st.cache_data and st.cache_resource.
@st.cache()
def prediction(Gender, Married, ApplicantIncome, LoanAmount, Credit_History):
    # Pre-processing user input    
    if Gender == "Male":
        Gender = 0
    else:
        Gender = 1
    if Married == "Unmarried":
        Married = 0
    else:
        Married = 1
    if Credit_History == "Unclear Debts":
        Credit_History = 0
    else:
        Credit_History = 1  
    # the training data stores LoanAmount in thousands, so rescale the full amount entered by the user
    LoanAmount = LoanAmount / 1000
    # Making predictions; predict returns an array with one label per input row
    label = classifier.predict(
        [[Gender, Married, ApplicantIncome, LoanAmount, Credit_History]])[0]
    if label == 0:
        pred = 'Rejected'
    else:
        pred = 'Approved'
    return pred

  
# this is the main function in which we define our webpage  
def main():       
    # front end elements of the web page 
    html_temp = """ 
    <div style ="background-color:yellow;padding:13px"> 
    <h1 style ="color:black;text-align:center;">Loan Prediction Application Using Streamlit</h1> 
    </div> 
    """
    # display the front end aspect
    st.markdown(html_temp, unsafe_allow_html = True) 
    # following lines create boxes in which user can enter data required to make prediction 
    Gender = st.selectbox('Gender',("Male","Female"))
    Married = st.selectbox('Marital Status',("Unmarried","Married")) 
    ApplicantIncome = st.number_input("Applicant's monthly income")
    LoanAmount = st.number_input("Total loan amount")
    Credit_History = st.selectbox('Credit_History',("Unclear Debts","No Unclear Debts"))
    result = ""
    # when 'Predict' is clicked, make the prediction and display it
    if st.button("Predict"):
        result = prediction(Gender, Married, ApplicantIncome, LoanAmount, Credit_History)
        st.success('Your loan is {}'.format(result))
        # debug output: echo the entered loan amount to the server console
        print(LoanAmount)
     
if __name__=='__main__': 
    main()

Writing app.py
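
The input encoding inside `prediction()` can be checked without starting Streamlit. The sketch below pulls that step into a standalone helper; `encode_inputs` is a hypothetical name, and it is meant to mirror the encoding used above rather than replace it.

```python
def encode_inputs(gender, married, applicant_income, loan_amount, credit_history):
    """Turn the form values into the feature row the model was trained on.

    Hypothetical helper that mirrors the encoding used in prediction().
    """
    gender_code = 0 if gender == "Male" else 1
    married_code = 0 if married == "Unmarried" else 1
    credit_code = 0 if credit_history == "Unclear Debts" else 1
    # The dataset stores LoanAmount in thousands.
    loan_in_thousands = loan_amount / 1000
    return [gender_code, married_code, applicant_income, loan_in_thousands, credit_code]

row = encode_inputs("Female", "Married", 4583, 128000, "No Unclear Debts")
print(row)
```

Keeping the encoding in a pure function like this makes it easy to confirm that the app feeds the model the same column order and units as the training data.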


Alright, let's now host this app at a public URL using the `pyngrok` library.

In [14]:
!streamlit run app.py &>/dev/null&

Here, we first start the Streamlit script in the background. Next, we open an ngrok tunnel to port 8501, where Streamlit serves the app by default:

In [15]:
from pyngrok import ngrok
 
public_url = ngrok.connect('8501')
public_url



<NgrokTunnel: "http://5e26-35-231-48-174.ngrok.io" -> "http://localhost:8501">
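
Because the app is started in the background, the tunnel can be opened before Streamlit is ready to accept connections. If the public URL shows a connection error, one thing to check is whether anything is listening on the local port. The helper below is a hypothetical sketch using only the standard library; it demonstrates itself on a throwaway listener rather than on port 8501.

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0

# Demonstration with a throwaway listener on a port chosen by the OS.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

open_while_listening = port_is_open("127.0.0.1", demo_port)
listener.close()
open_after_close = port_is_open("127.0.0.1", demo_port)

print(open_while_listening, open_after_close)
```

In the notebook, calling `port_is_open("localhost", 8501)` after the `streamlit run` cell would indicate whether the app is up before `ngrok.connect` is called.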