Predictive modelling of IMDb TV episode ratings based on episode description

Text mining of IMDb episode descriptions to look for associations with episode ratings.

The scripts are written in Python 3 and depend on numpy, pandas, scikit-learn, and the IMDbPY module.

For more details on the initial analysis, see Predicting Doctor Who Episode Ratings.

Downloading Data

To download data, run the script, passing the name of the show as a command-line argument.

$ python3 "Doctor Who"
Downloading data on Doctor Who

Predicting on New Data

To predict the rating for a new episode description, use the script, passing the show name and the description as arguments. The prediction is made by a gradient boosting machine with 100 iterations and a learning rate of 0.01.

$ python3 "Doctor Who" "The Doctor adopts a baby Dalek"
[ 8.10292221]
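
As a rough illustration of the idea, the sketch below fits a gradient boosting regressor on bag-of-words features using scikit-learn. It is not the repository's actual script: the toy descriptions and ratings are invented for the example, and the use of `CountVectorizer` for feature extraction is an assumption. Only the 100 iterations and the learning rate of 0.01 come from the description above.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; real descriptions and ratings would come from
# the downloaded IMDb data for the chosen show.
descriptions = [
    "The Doctor battles the Daleks on a distant planet",
    "A mysterious signal draws the Doctor to Victorian London",
    "The Doctor and his companion are trapped on a space station",
    "Cybermen invade Earth while the Doctor is away",
    "The Doctor meets an old enemy in a haunted house",
    "A baby alien is found aboard the TARDIS",
]
ratings = [8.4, 7.9, 8.1, 7.5, 8.8, 7.2]

# Bag-of-words features (an assumed choice) feeding a gradient boosting
# machine with 100 iterations and a learning rate of 0.01.
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    GradientBoostingRegressor(n_estimators=100, learning_rate=0.01, random_state=0),
)
model.fit(descriptions, ratings)

# Predict the rating for a new, unseen description.
prediction = model.predict(["The Doctor adopts a baby Dalek"])
print(prediction)
```

With such a small learning rate and only 100 iterations, predictions stay close to the mean training rating, so the model is conservative by construction.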

Benchmarking Models

The script can be used to assess how well different algorithms predict ratings for a particular show, using h2o. The h2o leaderboard is printed to the screen.

$ python3 "Doctor Who"
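
For readers without an h2o installation, the same kind of comparison can be approximated with scikit-learn. The sketch below is a stand-in, not the h2o workflow used by the script: it cross-validates a few regressors on TF-IDF features and prints them ranked by RMSE. The toy data, the choice of candidate models, and the TF-IDF features are all assumptions made for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for downloaded descriptions and ratings.
descriptions = [
    "The Doctor battles the Daleks on a distant planet",
    "A mysterious signal draws the Doctor to Victorian London",
    "The Doctor and his companion are trapped on a space station",
    "Cybermen invade Earth while the Doctor is away",
    "The Doctor meets an old enemy in a haunted house",
    "A baby alien is found aboard the TARDIS",
    "The TARDIS lands in the middle of a war",
    "An ancient evil awakens beneath a quiet village",
    "The Doctor must solve a murder on a luxury train",
    "A time loop traps the crew of a stranded ship",
    "The Master returns with a plan to conquer Earth",
    "The companion faces a difficult choice alone",
]
ratings = np.array([8.4, 7.9, 8.1, 7.5, 8.8, 7.2, 8.0, 7.7, 8.3, 8.6, 9.0, 7.4])

# Candidate models (an assumed selection); the dummy model is a mean baseline.
candidates = {
    "mean_baseline": DummyRegressor(strategy="mean"),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gbm": GradientBoostingRegressor(n_estimators=100, learning_rate=0.01, random_state=0),
}

leaderboard = []
for name, estimator in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), estimator)
    # scikit-learn reports negated RMSE so that higher is always better.
    scores = cross_val_score(
        pipeline, descriptions, ratings, cv=3,
        scoring="neg_root_mean_squared_error",
    )
    leaderboard.append((name, -scores.mean()))

# Lowest cross-validated RMSE first.
leaderboard.sort(key=lambda row: row[1])
for name, rmse in leaderboard:
    print(f"{name:15s} RMSE = {rmse:.3f}")
```

Including a mean baseline makes it easy to see whether the description text carries any predictive signal at all for a given show.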


In addition to the main Python scripts, there is an R script make_plots.R for visualization. Examples of the output are shown below.

Doctor Who

The West Wing

Game of Thrones
