BoxOfficePrediction

This project aims to predict the box office revenue and score of an upcoming movie.

Group Members:

Quan Yuan, Yuheng Liu, Wenlong Wu

Instructions:

data

This directory stores all the datasets used and generated in the project.

"boxoffice_dataset.csv": the raw dataset
"prediction.csv": the final result predicted by the model
"trainset.csv": training dataset
"testset.csv": testing dataset

data_processing

This directory stores the code used for data cleaning and integration.
The "boxoffice_dataset.csv" file in the "data" folder has already been cleaned and integrated.
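The cleaning and integration code itself is not reproduced in this README. As a rough illustration only, the sketch below inner-joins records from two scraped sources on a normalized movie title and drops rows with missing fields. The field names (`title`, `budget`, `score`), the helper names, and the join policy are all assumptions, not the project's actual logic.

```python
from typing import Dict, List, Optional

# Hypothetical record shape: column name -> raw scraped string (or None).
Record = Dict[str, Optional[str]]


def normalize_title(title: str) -> str:
    """Lower-case and collapse whitespace so 'The  Matrix' and 'the matrix' match."""
    return " ".join(title.lower().split())


def integrate_sources(primary: List[Record], secondary: List[Record], key: str = "title") -> List[Record]:
    """Inner-join two scraped sources on a normalized key and keep only complete rows."""
    # Index the secondary source once so each lookup is O(1).
    lookup = {normalize_title(r[key]): r for r in secondary if r.get(key)}
    merged: List[Record] = []
    for row in primary:
        title = row.get(key)
        if not title:
            continue
        match = lookup.get(normalize_title(title))
        if match is None:
            continue  # movie missing from the second source: drop it
        combined = {**match, **row}  # on conflicting fields, the primary source wins
        if all(value not in (None, "") for value in combined.values()):
            merged.append(combined)
    return merged
```

An inner join keeps the sketch simple; a real pipeline might instead keep unmatched movies and impute the missing columns.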

visualization

This directory stores the code used for data analysis. The scripts present the relationship between selected features and the movie box office.

WebScraping

This directory stores the code used to scrape the raw dataset from multiple websites. We also built a proxy program to get around the websites' anti-scraping (anti-spider) measures.
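The proxy program is not described in detail here. The following is a minimal sketch of one common approach, using only Python's standard `urllib.request`: rotate through a pool of proxies, one per request, and retry on network errors. The `ProxyRotator` class, the proxy addresses, and the retry policy are hypothetical.

```python
import itertools
import urllib.request
from typing import Iterable, Iterator, Optional, Tuple


class ProxyRotator:
    """Hypothetical helper: hand out a new proxy for every request."""

    def __init__(self, proxies: Iterable[str]) -> None:
        pool = list(proxies)
        if not pool:
            raise ValueError("proxy pool is empty")
        # itertools.cycle repeats the pool forever: p1, p2, ..., pN, p1, ...
        self._cycle: Iterator[str] = itertools.cycle(pool)

    def next_opener(self) -> Tuple[str, urllib.request.OpenerDirector]:
        """Return the next proxy and an opener that routes requests through it."""
        proxy = next(self._cycle)
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        opener = urllib.request.build_opener(handler)
        # Replace urllib's default User-Agent, which some sites block outright.
        opener.addheaders = [("User-Agent", "Mozilla/5.0")]
        return proxy, opener

    def fetch(self, url: str, attempts: int = 3, timeout: float = 10.0) -> bytes:
        """Try up to `attempts` proxies in turn; raise if every attempt fails."""
        last_error: Optional[Exception] = None
        for _ in range(attempts):
            _proxy, opener = self.next_opener()
            try:
                with opener.open(url, timeout=timeout) as resp:
                    return resp.read()
            except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
                last_error = exc
        raise RuntimeError(f"all {attempts} attempts failed for {url}") from last_error
```

Spreading requests across several exit addresses lowers the per-IP request rate a site sees. It does not replace polite crawl delays or respecting a site's terms.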

How to run:

feature_engineering.py: This file contains all functions used for processing each column in the raw dataset.
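The actual column processors are not listed in this README. The sketch below shows two hypothetical examples of the kind of per-column function such a file might contain: parsing a money string such as `"$1,500,000"` into a number, and turning a pipe-separated genre string into a multi-hot vector. Both input formats and both function names are assumptions.

```python
import re
from typing import List, Optional, Sequence


def parse_money(value: Optional[str]) -> Optional[float]:
    """Hypothetical: '$1,500,000' -> 1500000.0; empty or unparseable -> None."""
    if value is None:
        return None
    # Keep only digits and the decimal point ("$", ",", and spaces are dropped).
    cleaned = re.sub(r"[^0-9.]", "", value)
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:  # e.g. "1.2.3" left over after cleaning
        return None


def multi_hot(value: Optional[str], vocabulary: Sequence[str], sep: str = "|") -> List[int]:
    """Hypothetical: 'Action|Comedy' over ['Action', 'Drama', 'Comedy'] -> [1, 0, 1]."""
    tags = {t.strip().lower() for t in (value or "").split(sep) if t.strip()}
    return [1 if v.lower() in tags else 0 for v in vocabulary]
```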

feature_selection.py: This file generates a fully prepared dataset and divides it into a training set and a testing set. By commenting out some lines, the program can generate different combinations of features.
Run this file and the generated datasets will be saved under the "data" folder.
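The exact feature list and split ratio are not given in this README. The sketch below assumes a hypothetical column set, an 80/20 split, and a fixed random seed. It keeps the "comment out a line to drop a feature" workflow by listing one feature per line.

```python
import csv
import random
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]

# Hypothetical column names; drop a feature by commenting out its line.
SELECTED_FEATURES = [
    "budget",
    "runtime",
    # "genre_action",
]
TARGET = "box_office"


def select_and_split(
    rows: Sequence[Row],
    features: Sequence[str],
    target: str,
    test_ratio: float = 0.2,
    seed: int = 42,
) -> Tuple[List[Row], List[Row]]:
    """Keep only the chosen columns, shuffle reproducibly, split into (train, test)."""
    keep = list(features) + [target]
    projected = [{col: row[col] for col in keep} for row in rows]
    random.Random(seed).shuffle(projected)  # fixed seed -> same split on every run
    n_test = round(len(projected) * test_ratio)
    return projected[n_test:], projected[:n_test]


def write_csv(path: Path, rows: Sequence[Row], columns: Sequence[str]) -> None:
    """Write rows to `path` with a header, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)
```

Under these assumptions, the script would read `data/boxoffice_dataset.csv` with `csv.DictReader` and call `select_and_split`. It would then pass each half to `write_csv` with the paths `data/trainset.csv` and `data/testset.csv`.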

modeling.ipynb: This notebook builds the final model and contains all the material used for model tuning. Running it generates a .pkl file (model.pkl) under the root folder.
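The README does not say which model the notebook ends up with. To illustrate only the train-then-persist step, the sketch below uses a hypothetical stand-in, `MeanBaseline`, which predicts the training mean and is mainly useful as a floor that a tuned model should beat. It saves the model with `pickle` to `model.pkl`.

```python
import pickle
from pathlib import Path
from statistics import fmean
from typing import List, Optional, Sequence


class MeanBaseline:
    """Hypothetical stand-in model: always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self.mean_: Optional[float] = None

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "MeanBaseline":
        # X is accepted only to mirror the usual fit(X, y) interface.
        self.mean_ = fmean(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        if self.mean_ is None:
            raise RuntimeError("call fit() before predict()")
        return [self.mean_ for _ in X]


def save_model(model: object, path: Path = Path("model.pkl")) -> None:
    """Serialize the trained model with pickle (current folder by default)."""
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path: Path = Path("model.pkl")) -> object:
    """Load a pickled model; only unpickle files you created yourself."""
    with path.open("rb") as f:
        return pickle.load(f)
```

A pickled object can only be loaded where its class definition is importable. The testing script therefore needs access to the same model class (or library) used in the notebook.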

model_testing.py: This file tests the generated model on the testing set and prints the r2_score to the console. The predictions are saved to "prediction.csv" under the "data" folder.
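For reference, the R^2 score printed by the testing script is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The stdlib-only sketch below follows the same definition as scikit-learn's `r2_score` for a single target. It is meant for checking results by hand, not as the project's code.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) < 2:
        raise ValueError("R^2 is not well-defined with fewer than two samples")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        # Constant targets: scikit-learn's default returns 1.0 for a perfect fit, else 0.0.
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot
```

A score of 1.0 means perfect predictions, and 0.0 means no better than always predicting the test-set mean. A negative value means the model does worse than that constant prediction.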
