# Home Loan Prediction
The last assignment explored datasets related to home loan applications in San Diego County. Now we will train a machine learning model to predict whether to accept or reject a loan application.

**Your goal in this assignment is to explore different kinds of explanations of machine learning models.**


## Part 1: Building a Model

Upload the .zip file ('data.zip') included in the homework assignment. I **strongly** recommend using the following code rather than the Colab web interface to upload files, particularly if you have a slow internet connection.

In [203]:
from google.colab import files
uploaded = files.upload()

Saving data-1.zip to data-1 (2).zip


In [262]:
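import zipfile
import io
# Colab may save the upload under a different name (e.g. 'data-1 (2).zip'),
# so use whichever file was actually uploaded instead of a hard-coded key.
zip_name = next(iter(uploaded))
zf = zipfile.ZipFile(io.BytesIO(uploaded[zip_name]), "r")
zf.extractall()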

Recall that in the first assignment, you chose one of three datasets for the task of building a machine learning model to predict which loan applications your small credit union should approve. Uncomment the line corresponding to the dataset that you chose.




In [263]:
import pandas as pd # import pandas library
#df = pd.read_csv('data/home_loans_1.csv', low_memory=False) # read the csv file into a pandas dataframe object
#df = pd.read_csv('data/home_loans_2.csv', low_memory=False) # read the csv file into a pandas dataframe object
#df = pd.read_csv('data/home_loans_3.csv', low_memory=False) # read the csv file into a pandas dataframe object

First, we want to split the data into two separate dataframes. One dataframe will hold the data that we want to use to predict whether we should approve the loan application. The other dataframe will hold the actual approval decisions that were previously made by humans. The `denial_reason` column is left out of the inputs because it is only recorded once a decision has been made. Next, we transform text data into numerical data so that we can apply our machine learning algorithm.

In [265]:
input_columns = df.columns.drop(labels=['denial_reason', 'loan_approved'])
X = df[input_columns]
y = df['loan_approved']
text_columns = X.dtypes[X.dtypes == 'object'].index
X = pd.get_dummies(X, columns=text_columns)
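
To make the encoding step above concrete, here is a minimal sketch on a made-up three-row dataframe. The values and the `toy` name are invented for illustration and are not taken from the assignment's data.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "loan_amount_000s": [300, 450, 120],
    # dtype is set explicitly so the column is 'object', like text read from a CSV
    "town_name": pd.Series(["Poway", "Coronado", "Poway"], dtype="object"),
})

# Select the text (object-dtype) columns, mirroring the cell above
text_columns = toy.dtypes[toy.dtypes == "object"].index

# Replace each text column with one indicator column per distinct value
encoded = pd.get_dummies(toy, columns=text_columns)
print(encoded.columns.tolist())
```

Each distinct town becomes its own column, which is why the real `X` ends up with columns such as `town_name_Poway`.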

In [266]:
# Fraction of applications in the dataset that were approved
sum(y==1) / (sum(y==0) + sum(y==1))

0.5986283474853037
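
The fraction above is the share of applications that were approved. One way to read it, sketched below on made-up labels, is as the accuracy of a trivial model that always predicts the most common outcome; this can serve as a reference point when judging a real model's accuracy. The toy arrays are assumptions for illustration, and `DummyClassifier` is used here only as a convenient baseline, not as part of the assignment's method.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up labels: 3 approved (1) and 2 denied (0)
y_toy = np.array([1, 1, 1, 0, 0])
# The baseline ignores the features, so a placeholder column is enough
X_toy = np.zeros((len(y_toy), 1))

approval_rate = y_toy.mean()

# A baseline that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_toy, y_toy)
baseline_accuracy = baseline.score(X_toy, y_toy)

print(approval_rate, baseline_accuracy)
```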

We will use [scikit-learn](https://scikit-learn.org/stable/index.html) to build our machine learning model.

First, we will split the data into a training set and a test set.

In [267]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
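
As a rough illustration of what `test_size=0.2` does, the sketch below splits ten made-up rows. The arrays are invented for this example; the same call on the real data behaves analogously.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up rows with two features each
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)

print(X_tr.shape, X_te.shape)
```

Fixing `random_state` means that re-running the cell yields the same split, which keeps later results comparable.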

### Question 1.A: How many rows are in the training set, and how many are in the test set?  
#### (Review the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) if necessary.)
_Double click to write your answer here. Show your work in code below if applicable._

Now we train a k-nearest neighbors classifier on the training data and calculate the accuracy on the training set.

In [268]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier().fit(X_train, y_train)

In [269]:
predictions = model.predict(X_train)
accuracy = sum(predictions==y_train)/len(y_train)
'The accuracy on the training set is about {}%'.format(round(accuracy*100, 1))

'The accuracy on the training set is about 70.6%'
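
For intuition about the classifier trained above, here is a minimal sketch on invented one-dimensional data. A `KNeighborsClassifier` predicts the majority label among the closest training points; scikit-learn's default is 5 neighbors, while this toy example assumes 3 to keep the data small.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: small values are labelled 0, large values are labelled 1
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# Fit a 3-nearest-neighbors classifier (3 is an assumption for this toy example)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy)

# Each prediction is the majority label among the 3 closest training points
pred = knn.predict(np.array([[2.5], [10.5]]))

# score() returns the fraction of correct predictions, like the manual accuracy above
train_accuracy = knn.score(X_toy, y_toy)
print(pred, train_accuracy)
```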

### Question 1.B: What is the accuracy of our model on the test set?
_Double click to write your answer here. Show your work in code below if applicable._

### Question 1.C: Is our model more accurate for some towns than others?
_Double click to write your answer here. Show your work in code below if applicable._

## Part 2: Understanding Individual Predictions

### Question 2.A: Suppose this model were used to automatically approve or deny loan applications. What are 2-3 questions that someone might have about the model if it denied their loan application?

_Double click to write your answer here. Show your work in code below if applicable._

Let's look at one of the loan applications from the test set that our model would deny.

In [276]:
application = X_test[model.predict(X_test)==0].iloc[[2]]
application

Unnamed: 0,loan_amount_000s,applicant_income_000s,occupied_by_owner,town_name_Coronado,town_name_Del Mar,town_name_La Jolla,town_name_Poway,town_name_Rancho Sante Fe,town_name_Solana Beach,loan_purpose_name_Home improvement,loan_purpose_name_Home purchase,loan_purpose_name_Refinancing,co_applicant_sex_Female,"co_applicant_sex_Information not provided by applicant in mail, Internet, or telephone application",co_applicant_sex_Male,co_applicant_sex_No co-applicant,co_applicant_sex_Not applicable,co_applicant_race_American Indian or Alaska Native,co_applicant_race_Asian,co_applicant_race_Black or African American,"co_applicant_race_Information not provided by applicant in mail, Internet, or telephone application",co_applicant_race_Native Hawaiian or Other Pacific Islander,co_applicant_race_No co-applicant,co_applicant_race_Not applicable,co_applicant_race_White,applicant_sex_Female,"applicant_sex_Information not provided by applicant in mail, Internet, or telephone application",applicant_sex_Male,applicant_race_American Indian or Alaska Native,applicant_race_Asian,applicant_race_Black or African American,"applicant_race_Information not provided by applicant in mail, Internet, or telephone application",applicant_race_Multiracial,applicant_race_Native Hawaiian or Other Pacific Islander,applicant_race_White,applicant_ethnicity_Hispanic or Latino,"applicant_ethnicity_Information not provided by applicant in mail, Internet, or telephone application",applicant_ethnicity_Not Hispanic or Latino,applicant_ethnicity_Not applicable
914,1323.858165,309.712996,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0


We can look at this application's nearest neighbors from the training set.

In [275]:
neighbor_indices = model.kneighbors(application)[1][0]
neighbors = X_train.iloc[neighbor_indices].copy()
neighbors['loan_approved']=y_train.iloc[neighbor_indices]
neighbors

Unnamed: 0,loan_amount_000s,applicant_income_000s,occupied_by_owner,town_name_Coronado,town_name_Del Mar,town_name_La Jolla,town_name_Poway,town_name_Rancho Sante Fe,town_name_Solana Beach,loan_purpose_name_Home improvement,loan_purpose_name_Home purchase,loan_purpose_name_Refinancing,co_applicant_sex_Female,"co_applicant_sex_Information not provided by applicant in mail, Internet, or telephone application",co_applicant_sex_Male,co_applicant_sex_No co-applicant,co_applicant_sex_Not applicable,co_applicant_race_American Indian or Alaska Native,co_applicant_race_Asian,co_applicant_race_Black or African American,"co_applicant_race_Information not provided by applicant in mail, Internet, or telephone application",co_applicant_race_Native Hawaiian or Other Pacific Islander,co_applicant_race_No co-applicant,co_applicant_race_Not applicable,co_applicant_race_White,applicant_sex_Female,"applicant_sex_Information not provided by applicant in mail, Internet, or telephone application",applicant_sex_Male,applicant_race_American Indian or Alaska Native,applicant_race_Asian,applicant_race_Black or African American,"applicant_race_Information not provided by applicant in mail, Internet, or telephone application",applicant_race_Multiracial,applicant_race_Native Hawaiian or Other Pacific Islander,applicant_race_White,applicant_ethnicity_Hispanic or Latino,"applicant_ethnicity_Information not provided by applicant in mail, Internet, or telephone application",applicant_ethnicity_Not Hispanic or Latino,applicant_ethnicity_Not applicable,loan_approved
1699,1360.405374,563.309975,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1
1903,1445.657939,618.630469,1,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1
1039,1207.051959,576.361347,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,1,0,1
387,1276.864613,491.50401,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0
1233,1437.17574,533.56627,1,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,1
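
The neighbor lookup above relies on `kneighbors`, which returns a pair of arrays: distances and positional indices into the training data. The sketch below, on invented data with hypothetical column names, shows how those indices relate to the model's prediction through a majority vote.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Made-up training data; column names are hypothetical
X_toy = pd.DataFrame({
    "amount": [100, 110, 120, 300, 310],
    "income": [50, 55, 60, 90, 95],
})
y_toy = pd.Series([1, 1, 0, 0, 0])

knn = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy)

# A single made-up application to explain
query = pd.DataFrame({"amount": [105], "income": [52]})

# kneighbors returns (distances, indices); row 0 belongs to our single query
distances, indices = knn.kneighbors(query)
neighbor_labels = y_toy.iloc[indices[0]]

# With uniform weights, the prediction is the most common neighbor label
majority = neighbor_labels.mode()[0]
print(indices[0], majority, knn.predict(query)[0])
```

Because the distance is computed on the raw feature values, columns with large numeric ranges can dominate which rows count as "nearest", which is worth keeping in mind when reading the neighbor table.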


### Question 2.B: What would happen if this applicant's income doubled (but everything else stayed the same)? Would the model approve this new application?

_Double click to write your answer here. Show your work in code below if applicable._

### Question 2.C: Imagine that you are designing a tool that shows applicants the model's output for their application and displays some additional information explaining the model's output. Sketch three different versions of what this tool might look like. These sketches should be rough; hand-drawn sketches are preferred.

_Attach a PDF with your sketches. Please include any annotations or descriptions on the PDF itself (not in this notebook)._