Text Summarization on Movie Dialogue


Sterling Pilkington

Dataset provided by: C. Danescu-Niculescu-Mizil and L. Lee, “Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs,” Jun. 2011 [Online]. Available: https://arxiv.org/abs/1106.3077

https://www.kaggle.com/datasets/Cornell-University/movie-dialog-corpus

In [None]:
# import packages
import numpy as np
import pandas as pd
import nltk
import re
import warnings
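from nltk.tokenize import sent_tokenize  # needed by show_sentences below
# gensim.summarization was removed in gensim 4.0, so this import needs gensim < 4.0
from gensim.summarization import summarize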
nltk.download('punkt') # one time execution
pd.set_option('display.max_colwidth',1000)
warnings.filterwarnings('ignore')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Read in the lines from the TSV (tab-separated values) file

In [None]:
lines = pd.read_csv('movie_lines.tsv', sep = '\t', on_bad_lines='skip')
lines.head()

Unnamed: 0,lineID,characterID,movieID,character_name,text
0,L1045,u0,m0,BIANCA,They do not!
1,L1044,u2,m0,CAMERON,They do to!
2,L985,u0,m0,BIANCA,I hope so.
3,L984,u2,m0,CAMERON,She okay?
4,L925,u0,m0,BIANCA,Let's go.
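
The `on_bad_lines='skip'` argument silently drops any row that has more tab-separated fields than the header. A minimal sketch of that behavior on made-up in-memory data (the sample rows are hypothetical, not taken from the corpus file, and the argument requires pandas 1.3 or later):

```python
import io
import pandas as pd

# Hypothetical sample: the third data row has an extra tab-separated field.
sample_tsv = (
    "lineID\tcharacter_name\ttext\n"
    "L1\tBIANCA\tThey do not!\n"
    "L2\tCAMERON\tThey do to!\n"
    "L3\tBIANCA\tI hope\tso.\n"
)

sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t", on_bad_lines="skip")

# Only the two well-formed rows survive; the malformed one is dropped without an error.
print(len(sample))
print(sample["lineID"].tolist())
```

Comparing the resulting row count with the number of lines in the file is a quick way to see how much dialogue was skipped.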


In [None]:
# pick the lines from only one movie (movie 0 --> 10 Things I Hate About You)
lines_m0 = lines.loc[lines['movieID']=='m0']
lines_m0.head()

Unnamed: 0,lineID,characterID,movieID,character_name,text
0,L1045,u0,m0,BIANCA,They do not!
1,L1044,u2,m0,CAMERON,They do to!
2,L985,u0,m0,BIANCA,I hope so.
3,L984,u2,m0,CAMERON,She okay?
4,L925,u0,m0,BIANCA,Let's go.


In [None]:
# collecting the lines in a list
lines_m0_text = lines_m0.text.tolist()

In [None]:
# function for putting the sentences in a DataFrame
def show_sentences(text):
  return pd.DataFrame({'Sentence': sent_tokenize(text)})

# function for joining the lines into one long string; the input is a list of lines (such as lines_m0_text)
def concat_lines(m):
  # str() guards against non-string entries; a space follows every line so they aren't jumbled together
  return "".join(str(line) + " " for line in m)
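
Because `concat_lines` calls `str()` on every element, a missing value in the `text` column (which pandas reads as a float `nan`) ends up in the string as the literal word `nan`. A minimal sketch of a variant that skips missing entries and uses `str.join`; the name `join_lines` is hypothetical and not part of the original notebook:

```python
import math

def join_lines(lines):
    """Join dialogue lines with single spaces, skipping missing values."""
    kept = []
    for line in lines:
        # pandas represents a missing text cell as float('nan'); skip it and None
        if line is None or (isinstance(line, float) and math.isnan(line)):
            continue
        kept.append(str(line))
    return " ".join(kept)

print(join_lines(["They do not!", float("nan"), "I hope so."]))
# prints: They do not! I hope so.
```

Unlike `concat_lines`, this variant leaves no trailing space at the end of the string.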

In [None]:
# using the function on lines_m0_text
lines_string = concat_lines(lines_m0_text)
lines_string

"They do not! They do to! I hope so. She okay? Let's go. Wow Okay -- you're gonna need to learn how to lie. No Like my fear of wearing pastels? What good stuff? I figured you'd get to the good stuff eventually. Thank God!  If I had to hear one more story about your coiffure... Me.  This endless ...blonde babble. I'm like boring myself. What crap? do you listen to this crap? No... You always been this selfish? But Then that's all you had to say. Well no... You never wanted to go out with 'me did you? I was? Tons Have fun tonight? I believe we share an art instructor You know Chastity? Looks like things worked out tonight huh? Hi. Who knows?  All I've ever heard her say is that she'd dip before dating a guy that smokes. So that's the kind of guy she likes? Pretty ones? Lesbian?  No. I found a picture of Jared Leto in one of her drawers so I'm pretty sure she's not harboring same-sex tendencies. She's not a... I'm workin' on it. But she doesn't seem to be goin' for him. I really really re

In [None]:
# show a summary of the movie script
# ratio is the proportion of the original sentences to return in the summary
# when word_count is also given, gensim ignores ratio, so this caps the summary at about 100 words
summary = summarize(lines_string, ratio = 0.1, word_count = 100)
summary

"She used to be really popular when she started high school then it was just like she got sick of it or something.\nAll I know is -- I'd give up my private line to go out with a guy like Joey.\nBianca I don't think the highlights of dating Joey Dorsey are going to include door-opening and coat-holding.\nLike I'm supposed to know what that even means.\nOh I thought you might have a date  I don't know why I'm bothering to ask but are you going to Bogey Lowenstein's party Saturday night?\nI know everyone likes her and all but ..."
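
`gensim.summarization.summarize` is extractive: it returns sentences copied from the input rather than newly written text, which is why the summary reads like a handful of dialogue lines. Since that module was removed in gensim 4.0, the sketch below is a much simpler stand-in that scores sentences by average word frequency and keeps the best ones within a word budget. It is not gensim's TextRank-based implementation; `frequency_summary` and its regex sentence splitter are assumptions made for illustration.

```python
import re
from collections import Counter

def frequency_summary(text, word_count=100):
    """Pick high-scoring sentences, in original order, within a word budget."""
    # Naive sentence split on ., ! or ? followed by whitespace (assumption: good enough for a demo)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices from highest to lowest score
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)

    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= word_count:
            chosen.append(i)
            used += n

    # Restore the original order so the summary follows the script
    return "\n".join(sentences[i] for i in sorted(chosen))

demo = "The stones are magic. We found the stones in the palace. Hello there. The palace hides the stones."
print(frequency_summary(demo, word_count=10))
```

On real dialogue, a frequency score like this tends to favor lines full of common words, so a stop-word filter would be a natural next step.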

Let's try it on another movie

In [None]:
# trying on another movie (Indiana Jones and the Temple of Doom, 1984)
lines_m99 = lines.loc[lines['movieID']=='m99']
lines_m99.head()

Unnamed: 0,lineID,characterID,movieID,character_name,text
48267,L300572,u1461,m99,CAPT. BLUMBURTT,As I said before we'd be happy to escort you to Delhi.
48268,L300571,u1462,m99,CHATTAR LAL,I'm sure that will please the Maharajah Captain.
48269,L300570,u1461,m99,CAPT. BLUMBURTT,Well Mr. Prime Minister my report will duly note that we found nothing unusual here in Pankot.
48270,L300343,u1461,m99,CAPT. BLUMBURTT,Of course. The Thuggees were an obscenity that worshipped Kali with human sacrifices. The British Army wiped them out about the time of the Mutiny of 1857.
48271,L300342,u1462,m99,CHATTAR LAL,Dr. Jones you know very well that the Thuggee cult has been dead for nearly a century.


In [None]:
# collecting the lines in a list
lines_m99_text = lines_m99.text.tolist()

In [None]:
lines_string = concat_lines(lines_m99_text)
lines_string

"As I said before we'd be happy to escort you to Delhi. I'm sure that will please the Maharajah Captain. Well Mr. Prime Minister my report will duly note that we found nothing unusual here in Pankot. Of course. The Thuggees were an obscenity that worshipped Kali with human sacrifices. The British Army wiped them out about the time of the Mutiny of 1857. Dr. Jones you know very well that the Thuggee cult has been dead for nearly a century. The British worry so about their Empire -- it makes us feel like well- cared-for children. Just a routine inspection tour. Did you discover anything in that tunnel Dr. Jones? Then she must have run out of the room and you found her. When she saw the insects she passed out cold. I carried her back to her room. She was sleeping when I re- entered the tunnel to look around. Miss Scott panicked? I've spent by life crawling around in caves and tunnels -- I shouldn't have let somebody like Willie go in there with me. Even if they were trying to scare us awa

In [None]:
summary = summarize(lines_string, ratio = 0.1, word_count = 100)
summary

"I've spent by life crawling around in caves and tunnels -- I shouldn't have let somebody like Willie go in there with me.\nAnd they've got the sacred stones that Indy was searching for.\nThey've got Short Round and I think Indy's been -- Jones isn't in his room.\nwe came here from a small village and the peasants there told us that the Pankot Palace was growing powerful again -- because of some ancient evil.\nThe village knew their rock was magic -- but they didn't know it was one of the lost Sankara Stones..."

Let's try a third movie

In [None]:
# movie 245 --> Antz 1998
lines_m245 = lines.loc[lines['movieID']=='m245']

lines_m245_text = lines_m245.text.tolist()

lines_string = concat_lines(lines_m245_text)
summary = summarize(lines_string, ratio = 0.1, word_count = 100)
summary

"It looks like there's <u>two</u> million ants in here.\nYou know you're not just workers -- you can be whatever you want to be!\nI don't know want you to end up like the guy who used to work next to me.\nSo...you never did tell me...what made you come out to the worker bar that night?\nYou know they do great prosthetic antennas nowadays -- You watch yours soldier or my worker friend will beat you up!\nWe know what makes an ant colony strong don't we?\nWe know that no ant can be an individual."
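
Each movie above repeats the same three steps: filter by `movieID`, join the lines, summarize. A sketch of a helper that wraps them; `summarize_movie` is a hypothetical name, and the summarizer is passed in as an argument so the helper does not depend on a particular gensim version.

```python
import pandas as pd

def summarize_movie(lines, movie_id, summarizer, **kwargs):
    """Filter one movie's dialogue, join it into a string, and summarize it."""
    movie_text = lines.loc[lines["movieID"] == movie_id, "text"]
    # Drop missing lines so they do not appear as the string 'nan'
    joined = " ".join(movie_text.dropna().astype(str))
    return summarizer(joined, **kwargs)

# Tiny made-up frame and a trivial stand-in summarizer, for illustration only
demo_lines = pd.DataFrame({
    "movieID": ["m0", "m0", "m1"],
    "text": ["They do not!", "I hope so.", "Of course."],
})
first_words = lambda text, word_count: " ".join(text.split()[:word_count])

print(summarize_movie(demo_lines, "m0", first_words, word_count=4))
# prints: They do not! I
```

With gensim < 4.0 installed, the equivalent call in this notebook would be `summarize_movie(lines, 'm245', summarize, word_count=100)`.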

One more movie to check that it's working OK

In [None]:
# movie 253 --> Back to the Future 1985
lines_m253 = lines.loc[lines['movieID']=='m253']

lines_m253_text = lines_m253.text.tolist()

lines_string = concat_lines(lines_m253_text)
summary = summarize(lines_string, ratio = 0.1, word_count = 100)
summary

"Uh yeah well believe me I sure feel like I know you!\nThanks for everything Pro. We're at one and a half miles so you're just a little over a mile from where you want to be  Wait until minus 3 minutes before you go -- that should give you plenty of time and it should be close enough to zero hour that they can't do anything to stop you.\nIf we could get you the time machine and the power converter in the vicinity of an atomic blast we could send you back to the future."

It struggled a bit with this movie. The summary is about driving the DeLorean (the time machine) into the power lines at the moment the lightning strikes.

Class Demo

In [None]:
# movie 117 --> Legally Blonde
lines_m117 = lines.loc[lines['movieID']=='m117']

lines_m117_text = lines_m117.text.tolist()

lines_string = concat_lines(lines_m117_text)
summary = summarize(lines_string, ratio = 0.1, word_count = 150)
summary

"I'm not having an affair with Enrique -- you know a Delta Gamma would never sleep with a man who wears a thong!\nI got out of the shower walked downstairs saw her standing over my father and called the police.\nAnd if you in fact heard the gunshot then Brooke Windham wouldn't have had time to hide the gun before you got downstairs.\nThe people at law school don't Warner doesn't -- I don't even think my parents take me seriously.\nWell I don't want to look like I know what's coming...\nNot in class -- I waited until I got to my room but yeah she can pretty much shrivel your balls -- or you know your whatevers.\nElle believe me I never expected to be doing this but I think it's the right thing to do."