# Home Loan Prediction
The last assignment explored datasets related to home loan applications in San Diego County. Now we will train a machine learning model to predict whether a loan application should be accepted or rejected.

**Your goal in this assignment is to explore different kinds of explanations of machine learning models.**


## Part 1: Building a Model

Upload the .zip file (`data.zip`) included with the homework assignment. I **strongly** recommend using the following code rather than the Colab web interface to upload files, particularly if you have a slower internet connection.

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
import zipfile
import io
zf = zipfile.ZipFile(io.BytesIO(uploaded['data.zip']),"r")
zf.extractall()

We will use the `home_loans_1.csv` dataset from the last assignment.

In [None]:
import pandas as pd # import pandas library
df = pd.read_csv('data/home_loans_1.csv', low_memory=False) # read the csv file into a pandas dataframe object

First, we split the data into two separate dataframes. One dataframe (`X`) holds the data that we will use to predict whether a loan application should be approved. The other (`y`) holds the approval decisions that humans actually made. Next, we convert the text columns into numerical indicator columns so that we can apply our machine learning algorithm.

In [None]:
input_columns = df.columns.drop(labels=['denial_reason', 'loan_approved'])
X = df[input_columns]
y = df['loan_approved']
text_columns = X.dtypes[X.dtypes == 'object'].index
X = pd.get_dummies(X, columns=text_columns)
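
To make the text-to-number step concrete, here is a minimal sketch of what `pd.get_dummies` does on a tiny made-up table. The column names and values below are hypothetical and are not taken from `home_loans_1.csv`.

```python
import pandas as pd

# Toy data for illustration only; these columns are hypothetical.
toy = pd.DataFrame({
    'income': [50, 80, 65],
    'loan_purpose': ['purchase', 'refinance', 'purchase'],
})

# Each distinct text value becomes its own indicator column,
# named <original column>_<value>. Numeric columns are left unchanged.
toy_encoded = pd.get_dummies(toy, columns=['loan_purpose'])

print(list(toy_encoded.columns))
print(toy_encoded)
```

Every row has exactly one "true" indicator per original text column, so no ordering is imposed on the categories.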

We will use [scikit-learn](https://scikit-learn.org/stable/index.html) to build our machine learning model.

First, we will split the data into a training set and a test set.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

### Question 1.A: How many rows are in the training set, and how many are in the test set?  
#### (Review the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) if necessary.)
_Double-click to write your answer here. Show your work in code below if applicable._

Now we train a k-nearest neighbors classifier on the training data (scikit-learn's default is k = 5) and calculate its accuracy on the training set.

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
model = pipeline.fit(X_train, y_train) 
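
A nearest neighbors classifier compares applications by distance, so a feature measured in large units (such as dollars) would otherwise dominate features measured in small units. That is a likely reason the pipeline standardizes the inputs first. The sketch below uses made-up numbers to show what `StandardScaler` does to each column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy features for illustration only: a large-scale column and a small-scale column.
toy_features = np.array([
    [30000.0, 1.0],
    [60000.0, 3.0],
    [90000.0, 2.0],
])

# Subtract each column's mean and divide by its standard deviation.
scaled = StandardScaler().fit_transform(toy_features)

print(scaled.mean(axis=0))  # each column is centered near 0
print(scaled.std(axis=0))   # each column has standard deviation near 1
```

After scaling, both columns contribute on a comparable scale to the distance between two rows.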

In [None]:
predictions = model.predict(X_train)
accuracy = sum(predictions==y_train)/len(y_train)
'The accuracy on the training set is about {}%'.format(round(accuracy*100, 1))
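
Accuracy is simply the fraction of predictions that match the true labels. As a small sketch on made-up data (not the loan data), the manual calculation agrees with the `score` method that scikit-learn classifiers provide.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy one-feature dataset for illustration only.
toy_X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

toy_model = KNeighborsClassifier(n_neighbors=3).fit(toy_X, toy_y)

# Fraction of correct predictions, computed by hand.
manual_accuracy = (toy_model.predict(toy_X) == toy_y).mean()

# The same quantity, computed by the classifier's built-in score method.
builtin_accuracy = toy_model.score(toy_X, toy_y)

print(manual_accuracy, builtin_accuracy)
```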


### Question 1.B: What is the accuracy of our model on the test set?
_Double-click to write your answer here. Show your work in code below if applicable._

## Part 2: Understanding Individual Predictions

### Question 2.A: Suppose this model were used to automatically approve or deny loan applications. What are 3 questions that someone might have about the model if it denied their loan application?

_Double-click to write your answer here. Show your work in code below if applicable._


1.   
2.   
3.   



Let's look at one of the loan applications from the test set that our model would deny.

In [None]:
application = X_test[model.predict(X_test)==0].iloc[[0]]
application

We can look at this application's nearest neighbors from the training set.

In [None]:
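# The classifier was fit on standardized features, so scale the application before searching for neighbors.
scaled_application = model['standardscaler'].transform(application)
neighbor_indices = model['kneighborsclassifier'].kneighbors(scaled_application)[1][0]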
neighbors = X_train.iloc[neighbor_indices].copy()
neighbors['loan_approved']=y_train.iloc[neighbor_indices]
neighbors
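
For reference, `kneighbors` returns a pair of arrays: the distances to the nearest training rows and the positions of those rows in the training data. The sketch below uses a tiny made-up dataset; the query is passed through the pipeline's scaler first, because the classifier inside the pipeline was fit on standardized features.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data for illustration only.
toy_X = np.array([[0.0, 100.0], [1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
toy_y = np.array([0, 0, 1, 1])

toy_model = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=2)
).fit(toy_X, toy_y)

# A single query row, shaped (1, n_features).
query = np.array([[2.9, 390.0]])

# Scale the query the same way the training data was scaled.
scaled_query = toy_model['standardscaler'].transform(query)

# distances and indices each have shape (n_queries, n_neighbors),
# ordered from the closest neighbor to the farthest.
distances, indices = toy_model['kneighborsclassifier'].kneighbors(scaled_query)

print(indices[0])          # positions of the nearest training rows
print(toy_y[indices[0]])   # their labels
```

The labels of the returned rows are what the classifier votes over when it makes its prediction.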

### Question 2.B: What would happen if this applicant's income doubled (but everything else stayed the same)? Would the model approve this new application?

_Double-click to write your answer here. Show your work in code below if applicable._

### Question 2.C: Imagine that you are designing a tool that shows applicants the model's output for their application, along with some additional information explaining that output. Sketch three different versions of what this tool might look like. These sketches should be rough; hand-drawn sketches are preferred.

_Attach a PDF with your sketches. Please include any annotations or descriptions on the PDF itself (not in this notebook)._