
IMDb-Scraping is for retrieving user-generated movie text reviews as well as relevant movie characteristics from imdb.com.


liaaaxu/IMDB-Scraping


IMDB Scraping

This web scraper is tailored to imdb.com. Its main goal is to collect user-generated text reviews, along with relevant movie features, for future research. The variables gathered by each script are listed below.

Note: IMDB does offer public datasets: https://www.imdb.com/interfaces/

IMDB Text Reviews Scraping.py

  • imdbID : the IMDB ID of the movie title
  • totalNumReviews : total number of reviews of the movie title
  • userID : the IMDB ID of the user who posted the review
  • spoilerWarning : equals 1 if the review is marked with "Warning: Spoilers"
  • reviewTitles : the title of the review
  • usefulNum : the number of users who found the review helpful
  • usefulTotal : the total number of users who voted on whether the review was helpful
  • reviewDates : the date when the review was posted
  • userReviews : the text content of the review
  • userRates : the rating given along with the review, a numerical value between 0 and 10
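
As a rough illustration of how one row of this output could be assembled, here is a minimal sketch. It is not the code in IMDB Text Reviews Scraping.py. The `ReviewRecord` class, the `parse_helpfulness` helper, the "X out of Y found this helpful" wording, and all sample values are assumptions made for the example; only the field names come from the list above.

```python
import re
from dataclasses import dataclass, asdict


@dataclass
class ReviewRecord:
    """Hypothetical container with one field per variable listed above."""
    imdbID: str
    totalNumReviews: int
    userID: str
    spoilerWarning: int   # 1 if marked "Warning: Spoilers", else 0
    reviewTitles: str
    usefulNum: int
    usefulTotal: int
    reviewDates: str
    userReviews: str
    userRates: float


# Assumed wording of the helpfulness line; the real page text may differ.
_HELPFUL_RE = re.compile(r"([\d,]+)\s+out of\s+([\d,]+)")


def parse_helpfulness(text):
    """Return (usefulNum, usefulTotal) from the helpfulness text, or (0, 0)."""
    match = _HELPFUL_RE.search(text)
    if match is None:
        return 0, 0
    useful, total = (int(group.replace(",", "")) for group in match.groups())
    return useful, total


# Placeholder values for illustration only, not real IMDb data.
useful_num, useful_total = parse_helpfulness("1,234 out of 2,000 found this helpful.")
example = ReviewRecord(
    imdbID="tt0000000",
    totalNumReviews=1,
    userID="ur0000000",
    spoilerWarning=0,
    reviewTitles="Placeholder title",
    usefulNum=useful_num,
    usefulTotal=useful_total,
    reviewDates="1 January 2020",
    userReviews="Placeholder review text.",
    userRates=8.0,
)
row = asdict(example)  # a plain dict, one key per column
```

A list of such dicts can be passed to `pandas.DataFrame` to obtain the tabular layout described in this section.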

The final results in dataframe format should look like the following:

[Screenshot: example of the resulting dataframe]

IMDB Film Features Scraping.py

imdbID, runtimeMin, mpaa, genre, releaseDates, ratings, numVotes, plots, directors, writers, stars, metascore, numReviews, numCritics, country, language, budget, openingWeekend, color
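
To show how these variables might be kept in a consistent column order, here is a small sketch under stated assumptions. It is not taken from IMDB Film Features Scraping.py. The helpers `parse_runtime_minutes` and `make_feature_row` are hypothetical, and the runtime formats handled ("2h 22min", "142 min") are guesses about how a runtime could appear on the page.

```python
import re

# Column names exactly as listed above, in the same order.
FILM_FEATURE_COLUMNS = [
    "imdbID", "runtimeMin", "mpaa", "genre", "releaseDates", "ratings",
    "numVotes", "plots", "directors", "writers", "stars", "metascore",
    "numReviews", "numCritics", "country", "language", "budget",
    "openingWeekend", "color",
]


def parse_runtime_minutes(text):
    """Convert a runtime string such as '2h 22min' or '142 min' to minutes.

    Returns None when neither hours nor minutes can be found.
    """
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    if hours is None and minutes is None:
        return None
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def make_feature_row(**values):
    """Build one row with every column present; missing values become None."""
    unknown = set(values) - set(FILM_FEATURE_COLUMNS)
    if unknown:
        raise KeyError(f"unexpected columns: {sorted(unknown)}")
    return {column: values.get(column) for column in FILM_FEATURE_COLUMNS}


# Placeholder values for illustration only.
feature_row = make_feature_row(
    imdbID="tt0000000",
    runtimeMin=parse_runtime_minutes("2h 22min"),
)
```

Filling absent fields with `None` keeps every row the same shape, which makes it easier to join the film features to the review table on `imdbID`.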
