# Loan predictions

This notebook reviews the project from the original learning lab, Automation of the loan eligibility process. It contains the original instructions and the provided solutions. Run the code in your own Jupyter notebook and make sure you understand what each step does.

### Problem Statement

We want to automate the loan eligibility process based on the customer details provided when an online application form is filled in. You can find the dataset [here](https://drive.google.com/file/d/1h_jl9xqqqHflI5PsuiQd_soNYxzFfjKw/view?usp=sharing). The details include the customer's gender, marital status, education, number of dependents, income, loan amount, credit history, and a few other fields.

|Variable|Description|
|:-------------|:-------------|
|Loan_ID|Unique loan ID|
|Gender|Male/Female|
|Married|Applicant married (Y/N)|
|Dependents|Number of dependents|
|Education|Applicant education (Graduate/Under Graduate)|
|Self_Employed|Self-employed (Y/N)|
|ApplicantIncome|Applicant income|
|CoapplicantIncome|Coapplicant income|
|LoanAmount|Loan amount in thousands|
|Loan_Amount_Term|Term of loan in months|
|Credit_History|Credit history meets guidelines|
|Property_Area|Urban/Semi Urban/Rural|
|Loan_Status|Loan approved (Y/N)|

## 1. Making Predictions Using Pipeline
Transform the original data preparation, feature engineering, and modeling steps into a Pipeline.

The goal is to create a Pipeline that takes the original rows from our dataset and predicts, for each row, the probability of being granted a loan. No transformations should be applied to the data before it enters the Pipeline. The pipeline should have the following steps:

- Preprocessing pipeline
    - **DataframeFunctionTransformer** - creating total_income
    - **fill NA transformer** - use ColumnTransformer to fill NaNs in different columns using different methods
    - **DataframeFunctionTransformer** - transforming back to DataFrame
    - **log transformer** - use ColumnTransformer to apply a log transformation to LoanAmount and total_income (total_income is the column created by the first DataframeFunctionTransformer)
- OneHotEncoder
- MinMaxScaler
- RandomForest

Use **GridSearch** to tune the hyperparameters of the RandomForest model.

In [6]:
import pandas as pd

In [42]:
# load the data
df = pd.read_csv("data.csv") 
df['Loan_Status'] = df['Loan_Status'].map({'Y': 1, 'N': 0})  # encode the target as 1/0

# create X,y
y = df.pop('Loan_Status')
X = df.drop('Loan_ID',axis=1)

In [2]:
# import additional libraries
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.pipeline import Pipeline,FeatureUnion

In [5]:
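# own DataFrame transformer class we will use in the pipeline
from sklearn.base import BaseEstimator, TransformerMixin

class DataframeFunctionTransformer(BaseEstimator, TransformerMixin):
    """Wrap a plain function so it can be used as a Pipeline step."""
    # BaseEstimator supplies get_params/set_params, which Pipeline and
    # GridSearchCV rely on when they clone the steps
    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None, **fit_params):
        # nothing to learn: the wrapped function is stateless
        return self

    def transform(self, input_df, **transform_params):
        return self.func(input_df)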

def create_total_income_feature(input_df):
    # work on a copy so the caller's DataFrame is not modified in place
    input_df = input_df.copy()
    input_df['total_income'] = input_df['ApplicantIncome'] + input_df['CoapplicantIncome']
    return input_df

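def to_dataframe(array):
    # column order must match the output order of the fill-NA ColumnTransformer
    columns = ['Gender', 'Dependents', 'Married', 'Self_Employed', 'LoanAmount',
               'Loan_Amount_Term', 'Credit_History', 'Education', 'ApplicantIncome',
               'CoapplicantIncome', 'Property_Area', 'total_income']

    # the incoming array typically has dtype object (strings and numbers mixed);
    # infer_objects() restores numeric dtypes so the log transform can run
    return pd.DataFrame(array, columns=columns).infer_objects()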

#### Continue with your Pipeline here
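
The steps listed in this section can be wired together roughly as follows. This is a minimal sketch, not the lab's reference solution. The imputation strategies (`most_frequent`, `median`), the step names, the helper `build_search`, and the choice to one-hot encode only the string columns are all assumptions made for illustration. The helper functions are repeated so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, OneHotEncoder

# Column groups. Which imputation strategy fits which group is an assumption.
CAT_NA = ["Gender", "Dependents", "Married", "Self_Employed"]
NUM_NA = ["LoanAmount", "Loan_Amount_Term", "Credit_History"]
REST = ["Education", "ApplicantIncome", "CoapplicantIncome",
        "Property_Area", "total_income"]
AFTER_FILL = CAT_NA + NUM_NA + REST          # column order after the fill-NA step
LOG_COLS = ["LoanAmount", "total_income"]
NOT_LOGGED = [c for c in AFTER_FILL if c not in LOG_COLS]
AFTER_LOG = LOG_COLS + NOT_LOGGED            # column order after the log step
ONE_HOT = CAT_NA + ["Education", "Property_Area"]
NUMERIC = NUM_NA + ["ApplicantIncome", "CoapplicantIncome", "total_income"]


class DataframeFunctionTransformer(BaseEstimator, TransformerMixin):
    """Stateless pipeline step that applies `func` to its input."""

    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return self.func(X)


def create_total_income_feature(input_df):
    out = input_df.copy()  # leave the caller's frame untouched
    out["total_income"] = out["ApplicantIncome"] + out["CoapplicantIncome"]
    return out


def to_dataframe(array):
    # ColumnTransformer returns an object array; restore names and float dtypes.
    frame = pd.DataFrame(array, columns=AFTER_FILL)
    return frame.astype({col: float for col in NUMERIC})


def build_search(param_grid, cv=3):
    """Assemble the pipeline and wrap it in GridSearchCV (not yet fitted)."""
    fill_na = ColumnTransformer([
        ("cat", SimpleImputer(strategy="most_frequent"), CAT_NA),
        ("num", SimpleImputer(strategy="median"), NUM_NA),
        ("rest", "passthrough", REST),
    ])
    log = ColumnTransformer([
        ("log", FunctionTransformer(np.log), LOG_COLS),
        ("rest", "passthrough", NOT_LOGGED),
    ])
    preprocessing = Pipeline([
        ("total_income", DataframeFunctionTransformer(create_total_income_feature)),
        ("fill_na", fill_na),
        ("to_df", DataframeFunctionTransformer(to_dataframe)),
        ("log", log),
    ])
    # After the log step the data is a plain array again, so select by position.
    one_hot = ColumnTransformer(
        [("ohe", OneHotEncoder(handle_unknown="ignore"),
          [AFTER_LOG.index(col) for col in ONE_HOT])],
        remainder="passthrough",
        sparse_threshold=0,  # always return a dense array
    )
    pipeline = Pipeline([
        ("preprocessing", preprocessing),
        ("one_hot", one_hot),
        ("scale", MinMaxScaler()),
        ("model", RandomForestClassifier(random_state=0)),
    ])
    return GridSearchCV(pipeline, param_grid, cv=cv)
```

With `X` and `y` from the cells above, a call such as `search = build_search({"model__n_estimators": [100, 200], "model__max_depth": [4, 8]})` followed by `search.fit(X, y)` would tune the forest, and `search.predict_proba(X)` would return one `[P(0), P(1)]` pair per row. The grid values here are placeholders, not recommendations. Listing the passthrough columns explicitly, instead of relying on `remainder="passthrough"` in the first two transformers, keeps the column order fixed even if the input columns arrive in a different order.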

## 2. Persist your GridSearch Object using pickle
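
One way to persist the fitted object is a pair of small helpers around the standard `pickle` module. The sketch below is an illustration only: `save_model`, `load_model`, and the file name `model.pkl` are hypothetical, and a plain dictionary stands in for the fitted GridSearch object so the block runs without any data.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize `model` (for example a fitted GridSearchCV object) to `path`."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load an object written by save_model. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with a stand-in object; in the notebook, pass the fitted search.
stand_in = {"best_params": {"model__n_estimators": 100}}
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"  # hypothetical file name
    save_model(stand_in, model_path)
    restored = load_model(model_path)
print(restored == stand_in)  # True
```

Pickle stores custom classes and functions by reference (module and name), not by value. `DataframeFunctionTransformer`, `create_total_income_feature`, and `to_dataframe` therefore have to be importable under the same names wherever the file is loaded, and it is safest to load it with the same scikit-learn version that wrote it.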

## 3. Deploy your model to cloud and test it with PostMan, BASH or Python
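
Whatever web framework serves the model, the endpoint has to turn the incoming JSON into a DataFrame and return the predicted probabilities. The sketch below shows only that core step, under stated assumptions: `predict_from_json`, `NUMERIC_FIELDS`, and `StubModel` are hypothetical names, and the stub stands in for the unpickled GridSearch object.

```python
import numpy as np
import pandas as pd

# Numeric fields of the request; listed explicitly so that a JSON null
# becomes NaN in a float column instead of None in an object column.
NUMERIC_FIELDS = ["ApplicantIncome", "CoapplicantIncome", "LoanAmount",
                  "Loan_Amount_Term", "Credit_History", "total_income"]


def predict_from_json(model, payload):
    """Turn one JSON record (or a list of records) into class probabilities."""
    records = payload if isinstance(payload, list) else [payload]
    frame = pd.DataFrame(records)
    for col in NUMERIC_FIELDS:
        if col in frame.columns:
            frame[col] = pd.to_numeric(frame[col])
    return model.predict_proba(frame).tolist()


class StubModel:
    """Stand-in for the unpickled model; always returns the same probabilities."""

    def predict_proba(self, frame):
        self.last_frame = frame
        return np.tile([0.25, 0.75], (len(frame), 1))


stub = StubModel()
result = predict_from_json(stub, {"ApplicantIncome": 5849, "LoanAmount": None})
print(result)  # [[0.25, 0.75]]
```

The source does not say which framework is used; port 5000 suggests Flask, but that is a guess. In a Flask route the body could be as short as `return jsonify(predict_from_json(model, request.get_json()))`.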

### For Testing

In [4]:
data = {"Gender":"Male",
        "Married":"No",
        "Dependents":"0",
        "Education":"Graduate",
        "Self_Employed":"No",
        "ApplicantIncome":5849,
        "CoapplicantIncome":0.0,
        "LoanAmount": None,
        "Loan_Amount_Term":360.0,
        "Credit_History":1.0,
        "Property_Area":"Urban",
        "total_income":5849.0}

In [54]:
import requests
URL = "http://<your AWS instance>:5000/<your endpoint>"

# send a POST request with the JSON payload and save the response object
r = requests.post(url=URL, json=data)

In [55]:
# predictions
print(r.json())

[[0.075, 0.925]]
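
The response looks like the output of `predict_proba`: one `[P(class 0), P(class 1)]` pair per submitted row. Assuming that is what the endpoint returns, and given that `Loan_Status` was encoded as Y = 1, the second number is the estimated probability of approval. A small sketch for reading it, where the helper name and the 0.5 cut-off are illustrative choices:

```python
def approval_probabilities(payload, positive_index=1):
    """Pick P(class 1) out of each [P(class 0), P(class 1)] row."""
    return [row[positive_index] for row in payload]


payload = [[0.075, 0.925]]  # the response body printed above
probs = approval_probabilities(payload)
decisions = ["Y" if p >= 0.5 else "N" for p in probs]  # 0.5 is an arbitrary cut-off
print(probs, decisions)  # [0.925] ['Y']
```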
