ML model to predict grocery reorders (AWS EMR, Spark, MongoDB)
Jupyter Notebook
DF_to_ML.ipynb
Data Processing to Random Forest Final Script.ipynb
Initial Data Processing.ipynb
README.md
SparkSQL Data Manipulation v2.ipynb
SparkSQL Data Manipulation.ipynb
SparkSQL.ipynb
randomforest.ipynb
shardonnay697.csv


MSDS 697 Final Project - Grocery Reorder Prediction

Team Members: Evan Calkins, Brian Dorsey, Sankeerti Haniyur, Chris Olley, & Connor Swanson

Project Goals

  • Work with a large dataset (at least 3 GB)
  • Preprocess data, train, and evaluate a machine learning model on AWS EMR using Apache Spark
  • Analyze the impact of EMR cluster size on performance
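One common way to quantify the effect of cluster size is to time the same job on each cluster and report speedup and parallel efficiency relative to the smallest cluster. The sketch below shows that arithmetic in plain Python. It does not reproduce the project's actual benchmark. The helper names `time_job` and `scaling_report` are hypothetical, and the sketch assumes cluster size is measured as the number of worker nodes.

```python
import time


def time_job(run_job):
    """Return the wall-clock seconds taken by one call of run_job (hypothetical helper)."""
    start = time.perf_counter()
    run_job()
    return time.perf_counter() - start


def scaling_report(timings):
    """Summarize how runtime scales with cluster size.

    timings: {num_worker_nodes: wall_clock_seconds} for the same job
    (the worker-node count is an assumed measure of cluster size).
    Returns {nodes: (speedup, efficiency)} relative to the smallest cluster, where
    speedup = T_base / T_nodes and efficiency = speedup / (nodes / base_nodes).
    An efficiency of 1.0 means perfectly linear scaling.
    """
    if not timings:
        return {}
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    report = {}
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        efficiency = speedup / (nodes / base_nodes)
        report[nodes] = (speedup, efficiency)
    return report
```

Efficiency below 1.0 as nodes increase usually points to overhead such as shuffles or data transfer that does not shrink with more workers.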

Modeling

  • Generate target labels and user-specific features describing grocery purchasing behavior
  • Host the data in a MongoDB cluster
  • Train a random forest model, which achieved an F1 score of 0.857
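The README does not spell out how reorder labels or user features were defined. The plain-Python sketch below shows one plausible approach under stated assumptions. In the project this logic would run as Spark DataFrame operations on EMR; here it is kept in plain Python to make the logic easy to follow. The sketch assumes a hypothetical input format, a list of `(user_id, order_number, product_ids)` tuples, and labels a user-product pair 1 if a product bought in earlier orders reappears in that user's latest order. The feature names and the `build_examples` and `f1_score` helpers are illustrative, not the project's own. The F1 function shows how a score like the one reported is computed from binary predictions.

```python
from collections import defaultdict


def build_examples(orders):
    """Build (features, label) pairs per (user, product).

    orders: list of (user_id, order_number, product_ids); a higher order_number
    means a later order (assumed format). Only products bought before the user's
    latest order become examples; label = 1 if the product reappears in it.
    """
    by_user = defaultdict(list)
    for user, number, products in orders:
        by_user[user].append((number, products))

    examples = {}
    for user, history in by_user.items():
        history.sort()                      # chronological by order_number
        *prior, (_, latest) = history       # split off the most recent order
        if not prior:
            continue                        # no earlier order, so nothing can be "reordered"
        n_prior = len(prior)
        counts = defaultdict(int)           # number of prior orders containing each product
        for _, products in prior:
            for p in set(products):
                counts[p] += 1
        latest_set = set(latest)
        for p, c in counts.items():
            features = {
                "user_num_orders": n_prior,          # user-specific feature
                "product_purchase_count": c,         # user-product feature
                "product_order_rate": c / n_prior,   # share of the user's orders with p
            }
            examples[(user, p)] = (features, int(p in latest_set))
    return examples


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The feature dictionaries would then be assembled into vectors for a random forest classifier, with the held-out labels and predictions passed to an F1 computation like the one above.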

Links
