kimberlytanyh/Predicting_311_Call_ResolutionTime

Business Analytics Capstone Project

Project Overview:

Based on records of previous 311 complaints, can we predict how long an incident will take to resolve?

Context:

In choosing a business problem to solve, my team and I decided to look into reducing abnormally long 311 call resolution times. Specifically, we attempted to build supervised regression models that can predict 311 call resolution times.

In theory, the predictions would reflect normal resolution times; if a call's actual resolution time greatly exceeds the prediction, the issue can be flagged for prioritization since it falls outside of normal processing time.
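
A minimal sketch of that flagging idea. The column names ("actual_hours", "predicted_hours") and the 48-hour threshold are illustrative assumptions, not values from the project:

```python
import pandas as pd

def flag_for_prioritization(df: pd.DataFrame, threshold_hours: float = 48.0) -> pd.DataFrame:
    """Flag calls whose actual resolution time far exceeds the model's prediction."""
    out = df.copy()
    # How much longer the call took than the model predicted
    out["residual_hours"] = out["actual_hours"] - out["predicted_hours"]
    # Calls well past the predicted (i.e. normal) time get prioritized
    out["prioritize"] = out["residual_hours"] > threshold_hours
    return out
```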

Link to Reports:

  1. Data Preparation and Understanding
  2. Model Selection
  3. Model Findings and Results

Note: Some files created during the project are not in this repository due to file misplacement (e.g. the tree-based models I created). However, model performance results are included in the "Model Findings and Results" report.

Project Breakdown/Summary

Steps performed during data preparation and exploration:

  • Finding Data and Joining Datasets (2020 311 Cases in San Francisco, Registered Business Locations, City Facilities, and San Francisco Socio-Economic Profiles)
  • Feature Selection (removing unneeded columns to reduce dimensionality)
  • Extracting Resolution Time (sketched in the snippet after this list)
  • Conducting Basic Data Cleaning (removing duplicates and NA values)
  • Feature Engineering and Data Exploration (done by me using Python in Jupyter Notebook and Excel)
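
A minimal pandas sketch of the extraction and cleaning steps. The file name is a placeholder, and the "Opened"/"Closed" column names follow the DataSF 311 Cases schema but are assumptions here:

```python
import pandas as pd

# Load the 311 cases export with the open/close timestamps parsed as dates
cases = pd.read_csv("sf_311_cases_2020.csv", parse_dates=["Opened", "Closed"])

# Target variable: resolution time in hours
cases["resolution_hours"] = (cases["Closed"] - cases["Opened"]).dt.total_seconds() / 3600

# Basic cleaning: drop exact duplicates and rows missing the target
cases = cases.drop_duplicates().dropna(subset=["resolution_hours"])
```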

Having learned various machine learning methods and algorithms, we narrowed down the types of models we would build using the following criteria:

  • Uses supervised learning (since we are using historical data)
  • Able to perform regression (since we are predicting a quantitative, continuous dependent variable - resolution time)

Due to time constraints, we only built the following types of models: Multiple Linear Regression, Random Forest, and XGBoost.
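
A sketch of how the three model types might be fit with scikit-learn and xgboost. The placeholder data, feature count, and hyperparameters are assumptions, not the project's actual settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Placeholder data so the sketch runs standalone; in the project, X and y
# came from the prepared 311 dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))        # engineered features (placeholder)
y = rng.gamma(2.0, 24.0, size=500)   # resolution times in hours (placeholder)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "XGBoost": XGBRegressor(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
```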

After building the models, we compared and evaluated the performance of all three types. We used the performance of the Multiple Linear Regression model as the baseline, since it is the simplest of the three.

The models were evaluated with the following metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-Squared. Of all the models, the XGBoost model I built performed best. However, even our best model explained only 26% of the variation in resolution time.
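
These metrics can be computed with scikit-learn; a sketch continuing from the hypothetical models above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Score each fitted model on the held-out test set
for name, model in models.items():
    preds = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, preds))  # penalizes large errors more
    mae = mean_absolute_error(y_test, preds)           # average absolute error
    r2 = r2_score(y_test, preds)                       # share of variance explained
    print(f"{name}: RMSE={rmse:.1f}h  MAE={mae:.1f}h  R2={r2:.3f}")
```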

We also concluded that our models were not accurate enough for production use, as their residuals were too large.

Takeaway:

Personal Reflection: We should have used data spanning a larger range of years for potentially better predictive performance, in addition to researching other features that could have been better predictors.
