ML model to predict grocery reorders (AWS EMR, Spark, MongoDB)
Data Processing to Random Forest Final Script.ipynb
Initial Data Processing.ipynb
SparkSQL Data Manipulation v2.ipynb
SparkSQL Data Manipulation.ipynb

MSDS 697 Final Project - Grocery Reorder Prediction

Team Members: Evan Calkins, Brian Dorsey, Sankeerti Haniyur, Chris Olley, & Connor Swanson

Project Goals

  • Work with a large dataset (>= 3GB)
  • Preprocess data, train, and evaluate a machine learning model on AWS EMR using Apache Spark
  • Analyze the impact of EMR cluster size on performance


  • Generated target labels and user-specific features describing grocery purchasing behavior
  • Hosted the data in a MongoDB cluster
  • Trained a random forest model that achieved an F1 score of 0.857
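
For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score is computed from binary predictions. It assumes that "reordered" is the positive class (label 1) and that the reported score is the plain binary F1; the README does not say which averaging was used. The function names and toy labels below are hypothetical and are not taken from the project notebooks.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up for illustration): 1 = product was reordered.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]

tp, fp, fn = confusion_counts(y_true, y_pred)
toy_f1 = f1_score(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn} f1={toy_f1:.3f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class matters more than overall accuracy.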

