
mc1231/STA9760_Project2_Yelp_Data_Analysis

Analyzing 10 GB of Yelp Reviews Data

We will analyze a subset of Yelp's business, review, and user data. This dataset comes from Kaggle, but we have copied it into a public S3 bucket: s3://sta9760-yelpdataset/yelp-light/*business.json
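
Spark accepts glob patterns in input paths, so the `*` in the URI above selects every object under `yelp-light/` whose name ends in `business.json`. The sketch below uses only the standard library to split the URI into bucket and key pattern and to show which keys such a pattern would select. The key names in the listing are hypothetical, and `fnmatch` is only an approximation of Spark's own glob matching (for example, its `*` can also match `/`).

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_pattern)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri}")
    # netloc is the bucket; the path (minus its leading slash) is the key pattern
    return parsed.netloc, parsed.path.lstrip("/")


def matching_keys(uri, keys):
    """Return the keys that the URI's wildcard pattern would select."""
    _, pattern = split_s3_uri(uri)
    return [k for k in keys if fnmatch(k, pattern)]


URI = "s3://sta9760-yelpdataset/yelp-light/*business.json"

# Hypothetical object listing, for illustration only.
keys = [
    "yelp-light/yelp_academic_dataset_business.json",
    "yelp-light/yelp_academic_dataset_review.json",
    "yelp-light/yelp_academic_dataset_user.json",
]

print(split_s3_uri(URI))
print(matching_keys(URI, keys))
```

In the EMR notebook itself, the same URI would typically be passed straight to `spark.read.json(...)`, which expands the wildcard for you.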

Note that the output of the code is provided to give you structure as you write your analysis. For Parts I, II & III, you must fill in the blanks (however you want) to reproduce the output provided in the file (mainly the columns and aggregations; I don't care about the exact rows). For Parts III and IV, you have more flexibility to take the analysis further however you see fit.
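
As an illustration of the kind of "columns and aggregations" meant here, the sketch below counts businesses and averages their star ratings per state. It assumes the business file is newline-delimited JSON (one business object per line) with `state` and `stars` fields, and it uses plain Python on a few made-up records; in the notebook the equivalent would be a Spark `groupBy(...).agg(...)` on the loaded DataFrame.

```python
import json
from collections import defaultdict


def avg_stars_by_state(lines):
    """Count businesses and average their star rating per state.

    `lines` is an iterable of JSON strings, one business per line
    (assumed newline-delimited JSON with "state" and "stars" fields).
    """
    totals = defaultdict(lambda: [0, 0.0])  # state -> [count, sum of stars]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        entry = totals[rec["state"]]
        entry[0] += 1
        entry[1] += rec["stars"]
    # Convert running sums into (count, average) pairs
    return {state: (count, total / count) for state, (count, total) in totals.items()}


# Made-up sample records, for illustration only.
sample = [
    '{"business_id": "a1", "state": "AZ", "stars": 4.0}',
    '{"business_id": "b2", "state": "AZ", "stars": 3.0}',
    '{"business_id": "c3", "state": "NV", "stars": 5.0}',
]

print(avg_stars_by_state(sample))  # {'AZ': (2, 3.5), 'NV': (1, 5.0)}
```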

Cluster and Notebook Configs

(Screenshots of the notebook and cluster configurations.)

About

AWS EMR-backed Spark cluster for analyzing Yelp data

Languages

  • Jupyter Notebook 100.0%