Code that supplements my article: Reliving Avengers: Infinity War with spaCy and Natural Language Processing
plots
README.md
clean-data.py
cleaned-script-subject.txt
cleaned-script.txt
notebook.ipynb
raw-script.txt
script.py


Reliving Avengers: Infinity War with spaCy and Natural Language Processing

Overview

This repo contains the scripts used in my latest experiment, described in the article Reliving Avengers: Infinity War with spaCy and Natural Language Processing.

Using spaCy, an open-source Python NLP library designed to help us process and understand large volumes of text, I analyzed the script of the movie to investigate the following:

  • The overall top 10 verbs, nouns, adverbs and adjectives from the film.
  • The top verbs and nouns spoken by a particular character.
  • The top 30 named entities from the film.
  • The similarity between the lines spoken by each character pair, e.g., the similarity between Thor's and Thanos' lines.
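
The analyses listed above could be sketched roughly as follows. This is a minimal illustration, not the code in script.py: the function names, the `lines_by_character` input (a dict mapping a character name to a list of that character's lines), and the choice of the `en_core_web_md` model are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def top_by_pos(tagged_tokens, pos, n=10):
    """Return the n most common lemmas that carry the given coarse POS tag.

    tagged_tokens is an iterable of (lemma, pos_tag) pairs.
    """
    counts = Counter(lemma.lower() for lemma, tag in tagged_tokens if tag == pos)
    return counts.most_common(n)


def top_entities(entity_texts, n=30):
    """Return the n most frequently mentioned named-entity strings."""
    return Counter(entity_texts).most_common(n)


def pairwise_similarity(docs_by_character):
    """Score every unordered character pair with the docs' similarity() method."""
    return {
        (a, b): docs_by_character[a].similarity(docs_by_character[b])
        for a, b in combinations(sorted(docs_by_character), 2)
    }


def analyze(lines_by_character, model="en_core_web_md"):
    """Hypothetical driver: run all three analyses with spaCy."""
    # spaCy is imported here so the helpers above work without it installed.
    import spacy

    nlp = spacy.load(model)  # assumes the model has already been downloaded
    docs = {name: nlp(" ".join(lines)) for name, lines in lines_by_character.items()}

    report = {}
    for name, doc in docs.items():
        # Keep alphabetic, non-stop-word tokens as (lemma, coarse POS) pairs.
        tagged = [(t.lemma_, t.pos_) for t in doc if t.is_alpha and not t.is_stop]
        report[name] = {
            "verbs": top_by_pos(tagged, "VERB"),
            "nouns": top_by_pos(tagged, "NOUN"),
            "entities": top_entities(ent.text for ent in doc.ents),
        }
    return report, pairwise_similarity(docs)
```

Calling `analyze(lines_by_character)` returns a per-character report and a dict of similarity scores keyed by character pair. A model that ships with word vectors is assumed for the similarity step, since the small English model has none and produces less meaningful scores.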

Tools used

  • Python
  • spaCy

Repo content

Besides the scripts, the repo contains the full movie script (raw-script.txt), the script stripped of comments, scene descriptions, and subjects (cleaned-script.txt), and the cleaned script with the subjects retained (cleaned-script-subject.txt). The plots directory contains all the plots showing the top nouns, adverbs, adjectives, verbs and entities per character.

Thanks to Manuel Romero (https://github.com/mrm8488) for writing the Jupyter notebook.
