kreetigulati/mortgage_megalodons

Mortgage Megalodons

  • Team Members: Jordan Shapiro, Laura Jimenez, Kreeti Gulati

Check Your Mortgage Prediction

  • Objective: Use one million rows of Home Mortgage Disclosure Act data to build a machine learning algorithm that correctly predicts future mortgage determinations

Can historic mortgage application determinations be used to classify the outcome of future applications?

This project seeks to address that question using real-world records from the Home Mortgage Disclosure Act. After pulling data from 2018, 2019, and 2020 and cleaning it, the dataset totaled approximately 890,000 rows.

Dataset

Documentation

Summary

Each mortgage is a record of a joint application; no single-person home buyers are included. Applicant and co-applicant information captured includes: age, race, sex, income, debt ratio, ethnicity, and credit. Home information captured includes: census tract, state, county, property value, and construction method. Loan information includes: loan amount and purpose.

Using the TensorFlow and sklearn libraries to build the model, we were able to predict the classification of new mortgage applications with 82% accuracy. The model uses five Dense layers, including three hidden layers, and uses the tanh activation function. Of the optimizers, stochastic gradient descent gave the best outcome. To limit overfitting, the model uses early stopping, and learning rate decay helps training settle toward a minimum of the loss.
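The two training safeguards mentioned above can be illustrated in plain Python. This is a minimal sketch, not the project's code: the function names, the patience value, the decay settings, and the validation losses are all made up for illustration. In TensorFlow's Keras API the same ideas are available as the `EarlyStopping` callback and learning rate schedules such as `ExponentialDecay`; which settings the project used is not stated here.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` updates.

    Computes initial_lr * decay_rate ** (step / decay_steps), so the rate
    shrinks by a factor of decay_rate every decay_steps updates.
    """
    return initial_lr * decay_rate ** (step / decay_steps)


def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss   # new best loss: reset the patience counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses, for illustration only.
val_losses = [0.60, 0.48, 0.42, 0.40, 0.41, 0.40, 0.42, 0.43]
stop_epoch = early_stopping_epoch(val_losses, patience=3)

# Hypothetical schedule: start at 0.1 and halve every 2 steps.
learning_rates = [exponential_decay(0.1, 0.5, 2, step) for step in range(5)]
```

Early stopping guards against overfitting by ending training when the validation loss stalls, while the decaying learning rate takes smaller steps late in training so the optimizer is less likely to overshoot a minimum.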

Limitations: To clean the data sufficiently for the model, choices had to be made about outliers, joint applications, years, and quantity of data. Ultimately we chose to use joint applications, limit the income, and keep only single-family properties. The overall loss for both the training and testing sets is high, at 0.39. Time was the tightest constraint; with additional time the model could be tuned further to reduce the loss.
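The cleaning choices described above amount to a row filter. The sketch below shows one way such a filter could look; the field names (`has_co_applicant`, `dwelling_category`, `income_thousands`) and the income cap of 500 are hypothetical placeholders, not the actual HMDA column names or the threshold the team chose.

```python
def clean_applications(records, max_income=500):
    """Keep joint, single-family applications with income within the cap.

    All field names and the default max_income are hypothetical and are
    used only to illustrate the filtering logic.
    """
    cleaned = []
    for rec in records:
        is_joint = bool(rec.get("has_co_applicant"))
        is_single_family = rec.get("dwelling_category") == "single_family"
        income = rec.get("income_thousands")
        # Drop rows with missing, non-positive, or outlier income.
        income_ok = income is not None and 0 < income <= max_income
        if is_joint and is_single_family and income_ok:
            cleaned.append(rec)
    return cleaned


# Made-up records, for illustration only.
sample = [
    {"has_co_applicant": True, "dwelling_category": "single_family", "income_thousands": 120},
    {"has_co_applicant": False, "dwelling_category": "single_family", "income_thousands": 95},
    {"has_co_applicant": True, "dwelling_category": "multifamily", "income_thousands": 80},
    {"has_co_applicant": True, "dwelling_category": "single_family", "income_thousands": 4200},
    {"has_co_applicant": True, "dwelling_category": "single_family", "income_thousands": None},
]
kept = clean_applications(sample)
```

Each filter narrows the population the model learns from, which is why the resulting predictions should not be assumed to carry over to single applicants, other property types, or very high incomes.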

PowerPoint

Machine Learning Model

Training the Model

Website Deployment with Heroku

Creating Flask API Website

With More Time

  • Decrease loss with additional hyperparameter tuning
  • Allow anyone to input their information into the form, with fewer features
  • More data, different data
  • Greater awareness of server and cloud CPU and storage
