A project for Udacity Project 1, using Kaggle data on the highest-grossing movies.
The project runs in Jupyter Notebook. If you do not have the required libraries, you will need to install them via pip3 from your terminal/command line:
- numpy
- pandas
- scikit-learn (imported as sklearn)
- matplotlib
- seaborn
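Before opening the notebook, it can help to check which of the libraries listed above are already installed. The following is a minimal sketch using only the Python standard library; the `REQUIRED` tuple and the `missing_libraries` helper are illustrative names and are not part of the project itself.

```python
import importlib.util

# Import names of the libraries listed above.
# Note: scikit-learn is imported under the name "sklearn".
REQUIRED = ("numpy", "pandas", "sklearn", "matplotlib", "seaborn")


def missing_libraries(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Anything reported as missing can then be installed with pip3. Note that the package name on PyPI for `sklearn` is `scikit-learn`.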
The project was created for Project 1 of the Udacity Data Scientist Nanodegree programme, to give me an opportunity to practise working through a data science problem and communicating the results.
- BlockbusterAnalysis.ipynb: the Jupyter notebook containing all of my code and the narrative analysing the data
- blockbusters.csv: the data file from Kaggle
The project can be viewed as a standalone demonstration of a simple data science project. It can also be used as a reference if you are trying to analyse this data yourself.
The data licensing is as stated on Kaggle: The raw data was taken from a CrowdFlower dataset. Irrelevant columns such as poster URLs and date of release were dropped. The ratings from the original dataset (Rotten Tomatoes freshness and audience scores) were all pared down to only the IMDb ratings of the movies. If you need the original dataset and want to see what the original data looked like, follow the above link.
The sole author of the work is Shruti Turner.