Build and compare several kinds of models that predict a movie’s revenue and profitability, and identify the best-performing predictive model.
This project was completed as a group assignment for an introductory data science class at the George Washington University. The requirement was to find a sizeable dataset and perform pre-processing, exploratory data analysis (EDA), statistical tests (chi-square, ANOVA), and predictive modeling.
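As a rough illustration of the two statistical tests named above, the following sketch runs a chi-square test of independence and a one-way ANOVA in base R. The data frame, its column names (`genre`, `revenue_musd`, `profitable`), and the profitability threshold are all invented for this example and are not taken from the project's dataset or code.

```r
# Toy data, invented for illustration only; not the project's dataset.
set.seed(42)
movies <- data.frame(
  genre = factor(rep(c("Action", "Comedy", "Drama"), each = 40)),
  revenue_musd = c(rnorm(40, mean = 120, sd = 30),
                   rnorm(40, mean = 80, sd = 30),
                   rnorm(40, mean = 60, sd = 30))
)

# Hypothetical binary label: a movie counts as "profitable" above an arbitrary cutoff.
movies$profitable <- factor(ifelse(movies$revenue_musd > 85, "yes", "no"))

# Chi-square test of independence: is profitability associated with genre?
chi_result <- chisq.test(table(movies$genre, movies$profitable))

# One-way ANOVA: does mean revenue differ across genres?
anova_fit <- aov(revenue_musd ~ genre, data = movies)
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

print(chi_result$p.value)
print(anova_p)
```

A small p-value in either test would suggest that genre is related to the outcome; on the real data, the same calls would be applied to the actual categorical and numeric columns.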
The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.
The dataset also includes files containing 26 million ratings from 270,000 users covering all 45,000 movies. Ratings are on a scale of 1 to 5 and were obtained from the official GroupLens website.
The .csv file contains the raw data used for the project. The .Rmd file contains all the code and the write-up; it can be downloaded and knitted to produce the final output. The .html file is the knitted R Markdown output, and the rendered version can be found at https://fajim1.github.io/Movies-Analysis/. The .ppt file contains my group's presentation slides for this project.
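Knitting can be done from RStudio's Knit button or from the R console. The sketch below shows the console route; the file name `analysis.Rmd` is a placeholder rather than the repository's actual file name, and it assumes the `rmarkdown` package is installed and that the .csv file sits where the .Rmd expects to find it.

```r
# Placeholder name: substitute the actual .Rmd file from the repository.
rmd_file <- "analysis.Rmd"

if (file.exists(rmd_file)) {
  # Render the R Markdown document to HTML next to the source file.
  rmarkdown::render(rmd_file, output_format = "html_document")
} else {
  message("File not found: ", rmd_file)
}
```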