Latest commit 689cba0 Dec 23, 2016


Cornell Movie Dialogs Corpus

This corpus contains a large, metadata-rich collection of fictional conversations extracted from raw movie scripts:

  • 220,579 conversational exchanges between 10,292 pairs of movie characters
  • 9,035 characters from 617 movies
  • 304,713 utterances in total
  • movie metadata included:
    • genres
    • release year
    • IMDB rating
    • number of IMDB votes
  • character metadata included:
    • gender (for 3,774 characters)
    • position on movie credits (for 3,321 characters)
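
To make the shape of an utterance record concrete, here is a minimal Python sketch of parsing one line. It assumes the layout of the original corpus release, where each line of the utterance file holds five fields (line ID, character ID, movie ID, character name, text) joined by the separator ` +++$+++ `. This README does not describe that layout, and the processed data fetched by pull_data.sh may be structured differently. The names `Utterance` and `parse_utterance` and the sample line are hypothetical and only illustrative.

```python
from typing import NamedTuple

# Assumption: field separator from the original corpus release,
# not confirmed by this README.
FIELD_SEPARATOR = " +++$+++ "


class Utterance(NamedTuple):
    """One utterance record (hypothetical field names)."""
    line_id: str
    character_id: str
    movie_id: str
    character_name: str
    text: str


def parse_utterance(raw_line: str) -> Utterance:
    """Split one raw line into its five assumed fields."""
    # maxsplit=4 keeps the utterance text intact even if it happens
    # to contain the separator itself.
    parts = raw_line.rstrip("\n").split(FIELD_SEPARATOR, 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {raw_line!r}")
    return Utterance(*parts)


# Made-up sample line, not taken from the corpus.
sample = "L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hello there.\n"
utterance = parse_utterance(sample)
print(utterance.character_name, "says:", utterance.text)
```

Raising on a malformed line rather than skipping it silently makes format mismatches visible early, which matters if the processed files turn out to use a different layout.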

Processed Data

The processed data can be downloaded by running pull_data.sh:

./pull_data.sh