Author: Omar El Malki (omar.elmalki@epfl.ch)

### Coreference resolution on ROCStories

In [1]:
import pandas as pd
import spacy
import neuralcoref
from tqdm import tqdm

tqdm.pandas()

pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.options.mode.chained_assignment = None

In [3]:
# Read ROCStories into pandas DataFrame
roc_stories_2017_path_csv = "../../data/rocstories/ROCStories_winter2017.csv"
roc_stories_2016_path_csv = "../../data/rocstories/ROCStories_spring2016.csv"
roc_stories_2017_df = pd.read_csv(roc_stories_2017_path_csv, sep=',', header=0)
roc_stories_2016_df = pd.read_csv(roc_stories_2016_path_csv, sep=',', header=0)

roc_stories_df = pd.concat([roc_stories_2016_df, roc_stories_2017_df])

In [4]:
# Load the spaCy English model and add NeuralCoref to its pipeline
# ('en' is a spaCy 2.x shortcut link; NeuralCoref targets spaCy 2.x)

nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)

<spacy.lang.en.English at 0x7ff4d94e6ef0>

In [5]:
# Example story
s1 = "David noticed he had put on a lot of weight recently."
s2 = "He examined his habits to try and figure out the reason."
s3 = "He realized he'd been eating too much fast food lately."
s4 = "He stopped going to burger places and started a vegetarian diet."
s5 = "After a few weeks, he started to feel much better."

In [6]:
def resolve_story(*args: str):
    """
    Return the coreference resolution result and the list of resolved sentences for a list of input sentences

    :param args: input sentences
    :return: tuple (coref_res, result): coref_res is the spaCy Doc of the tab-joined story
        (clusters are in coref_res._.coref_clusters), result is the list of resolved sentences
    """
    # Join the sentences with tabs so the resolved story can be split back into sentences
    story = "\t".join(args)

    coref_res = nlp(story)
    result = coref_res._.coref_resolved.split("\t")
    return coref_res, result
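
The tab separator is what lets the resolved story be split back into its sentences. Below is a minimal sketch of that round trip. It uses a hypothetical `fake_resolve` stand-in (a plain string replacement, not NeuralCoref) so that it runs without loading any model; `join_story` and `split_story` are illustrative names as well.

```python
def join_story(sentences):
    """Join sentences with tabs, as resolve_story does before calling nlp."""
    return "\t".join(sentences)


def fake_resolve(story):
    """Hypothetical stand-in for coreference resolution: replaces ' he ' with ' David '."""
    return story.replace(" he ", " David ")


def split_story(resolved_story):
    """Split the resolved story back into sentences on the tab separator."""
    return resolved_story.split("\t")


sentences = ["David noticed he had gained weight.", "After a few weeks, he felt better."]
resolved = split_story(fake_resolve(join_story(sentences)))
print(resolved)
```

The split only recovers one element per sentence as long as every tab survives resolution, which is the assumption the rest of the notebook relies on.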

In [7]:
resolve_story(s1, s2, s3, s4, s5)

(David noticed he had put on a lot of weight recently.	He examined his habits to try and figure out the reason.	He realized he'd been eating too much fast food lately.	He stopped going to burger places and started a vegetarian diet.	After a few weeks, he started to feel much better.,
 ['David noticed David had put on a lot of weight recently.',
  'David examined David habits to try and figure out the reason.',
  "David realized David'd been eating too much fast food lately.",
  'David stopped going to burger places and started a vegetarian diet.',
  'After a few weeks, David started to feel much better.'])

In [22]:
# Another example
t1 = "Joe was really excited for Christmas."
t2 = "Joe has never seen Santa Claus before."
t3 = "He decided to hide on top of the staircase to try to catch Santa."
t4 = "Joe waited as long as he could before he fell asleep."
t5 = "He woke up to many presents under the tree, and no Santa in sight!"

In [23]:
resolve_story(t1, t2, t3, t4, t5)

(Joe was really excited for Christmas.	Joe has never seen Santa Claus before.	He decided to hide on top of the staircase to try to catch Santa.	Joe waited as long as he could before he fell asleep.	He woke up to many presents under the tree, and no Santa in sight!,
 ['Joe was really excited for Christmas.',
  'Joe has never seen Santa Claus before.',
  'Joe decided to hide on top of the staircase to try to catch Santa.',
  'Joe waited as long as Joe could before Joe fell asleep.',
  'Joe woke up to many presents under the tree, and no Santa in sight!'])

In [8]:
def coref_res_to_resolved_sentence(coref_res, index):
    """
    Return the resolved sentence at position index in a story, given the output of resolve_story

    :param coref_res: tuple (Doc, list of resolved sentences) returned by resolve_story
    :param index: 1-based index of the sentence in the story (1 to 5)
    :return: resolved sentence or "" if resolution failed
    """
    # Validate the index first: index 0 would otherwise silently return the last sentence
    if not 1 <= index <= 5:
        raise IndexError(f"Index {index} out of bounds for stories of 5 sentences")
    n = len(coref_res[1])
    if n < 5:
        print(f"Resolved sentence list has only {n} elements:\n{coref_res}\n")
        # ideally, to be handled manually
        return ""
    return coref_res[1][index-1]
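
Since failed stories are meant to be handled manually, it can help to collect their ids in one pass instead of reading them off the printed messages. The sketch below assumes a plain dictionary from story id to resolved sentence list; `find_failed_stories`, `EXPECTED_SENTENCES`, and the ids are hypothetical names used only for illustration.

```python
EXPECTED_SENTENCES = 5  # each ROCStories story has five sentences


def find_failed_stories(resolved_by_id):
    """Return the ids whose resolved sentence list does not have the expected length."""
    return [
        story_id
        for story_id, resolved in resolved_by_id.items()
        if len(resolved) != EXPECTED_SENTENCES
    ]


# Hypothetical input: story id -> list of resolved sentences
resolved_by_id = {
    "story-a": ["s1", "s2", "s3", "s4", "s5"],
    "story-b": ["s1 s2", "s3", "s4", "s5"],  # a tab was lost, so only 4 elements
}
print(find_failed_stories(resolved_by_id))
```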

In [58]:
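# Work on an explicit copy of the first 5000 stories so the column assignments below are unambiguous
coref_df = roc_stories_df.head(5000).copy()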

In [59]:
# Apply resolution to all rows
print("Apply resolution to all rows:\n")
coref_df['coref_result'] = coref_df.progress_apply(lambda row: resolve_story(row.sentence1, row.sentence2, row.sentence3, row.sentence4, row.sentence5), axis=1)

# Add coreference clusters to dataframe
print("Add coreference clusters to dataframe:\n")
coref_df['coref_clusters'] = coref_df['coref_result'].progress_apply(lambda x: x[0]._.coref_clusters)

# Add Resolved sentence to Dataframe for each sentence in the dataset
print("Add Resolved sentence to Dataframe for each sentence in the dataset:\n")
for i in range(1,6):
    coref_df[f'resolved{i}'] = coref_df['coref_result'].progress_apply(lambda x: coref_res_to_resolved_sentence(x, i))
del coref_df['coref_result']

Apply resolution to all rows:



100%|██████████| 5000/5000 [02:22<00:00, 35.01it/s]


Add coreference clusters to dataframe:



100%|██████████| 5000/5000 [00:00<00:00, 133487.29it/s]


Add Resolved sentence to Dataframe for each sentence in the dataset:



100%|██████████| 5000/5000 [00:00<00:00, 513768.59it/s]


Resolved sentence list has only 4 elements:
(A minivan parked in the middle of the driveway in our building.	The driver loaded some luggage.	He kept the minivan in the driveway.	Another car came behind the minivan and waited.	After the driver beeped, the minivan drove off., ['the minivanThe driver loaded some luggage.', 'The driver kept the minivan in the driveway in our building.', 'Another car came behind the minivan and waited.', 'After The driver beeped, the minivan drove off.'])

Resolved sentence list has only 4 elements:
(The delivery man handed a box to Gen.	She was excited since her new shoes were inside of it.	Once she tried the shoes on, she couldn't make them fit.	She spent hours trying to fix it but made no improvements.	As a result, she returned the shoes the next day., ['The delivery man handed a box to The delivery man was excited since The delivery man new shoes were inside of The delivery man.', "Once The delivery man tried her new shoes on, The delivery man couldn't 

100%|██████████| 5000/5000 [00:00<00:00, 457244.52it/s]

100%|██████████| 5000/5000 [00:00<00:00, 434561.84it/s]

100%|██████████| 5000/5000 [00:00<00:00, 491055.80it/s]

100%|██████████| 5000/5000 [00:00<00:00, 473975.50it/s]


In [83]:
# Manually add resolved sentences for the few stories with failing coreferences
story_ids = ['08f13294-29ea-412b-ad0d-e05ff9662003', '1382f6a2-459f-417d-9d7e-83e6d0e02e74']

story1_resolved = ['the minivan parked in the middle of the driveway in our building.', 'The driver loaded some luggage.', 'The driver kept the minivan in the driveway in our building.', 'Another car came behind the minivan and waited.', 'After the driver beeped, the minivan drove off.']
story2_resolved = ['The delivery man handed a box to Gen.', "Gen was excited since Gen's new shoes were inside of a box.", "Once Gen tried Gen's new shoes on, Gen couldn't make Gen's new shoes fit.", "Gen spent hours trying to fix it but made no improvements.", "As a result, Gen returned Gen's new shoes the next day."]

resolved_stories = [story1_resolved, story2_resolved]

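# Overwrite resolved1..resolved5 for each manually resolved story
for story_id, resolved in zip(story_ids, resolved_stories):
    for i, sentence in enumerate(resolved, start=1):
        coref_df.loc[coref_df['storyid'] == story_id, f'resolved{i}'] = sentence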
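
Manual corrections like these are easy to get wrong, for example by pasting four sentences instead of five. The sketch below shows one way to validate the overrides before writing them to the DataFrame. `build_overrides` and the ids in the usage lines are hypothetical and not part of the notebook.

```python
def build_overrides(story_ids, resolved_stories, expected=5):
    """Map each story id to {'resolved1': ..., ..., 'resolved5': ...} after checking lengths."""
    if len(story_ids) != len(resolved_stories):
        raise ValueError("story_ids and resolved_stories must have the same length")
    overrides = {}
    for story_id, resolved in zip(story_ids, resolved_stories):
        if len(resolved) != expected:
            raise ValueError(f"Story {story_id} has {len(resolved)} sentences, expected {expected}")
        # Column names follow the resolved1..resolved5 convention used above
        overrides[story_id] = {f"resolved{i}": s for i, s in enumerate(resolved, start=1)}
    return overrides


# Hypothetical ids and sentences, for illustration only
example = build_overrides(["id-1"], [["a", "b", "c", "d", "e"]])
print(example["id-1"]["resolved3"])
```

Each inner dictionary can then be written column by column with the same `coref_df.loc[...]` assignment used in the cell above.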

In [86]:
coref_df[coref_df['storyid'] == '1382f6a2-459f-417d-9d7e-83e6d0e02e74']

Unnamed: 0,storyid,storytitle,sentence1,sentence2,sentence3,sentence4,sentence5,coref_clusters,resolved1,resolved2,resolved3,resolved4,resolved5
4456,1382f6a2-459f-417d-9d7e-83e6d0e02e74,Shoe Situation,The delivery man handed a box to Gen.,She was excited since her new shoes were inside of it.,"Once she tried the shoes on, she couldn't make them fit.",She spent hours trying to fix it but made no improvements.,"As a result, she returned the shoes the next day.","[((The, delivery, man), (Gen., \t, She), (her), (it), (she), (she), (She), (she)), ((her, new, shoes), (the, shoes), (them), (the, shoes))]",The delivery man handed a box to Gen.,Gen was excited since Gen's new shoes were inside of a box.,"Once Gen tried Gen's new shoes on, Gen couldn't make Gen's new shoes fit.",Gen spent hours trying to fix it but made no improvements.,"As a result, Gen returned Gen's new shoes the next day."


In [87]:
coref_df[coref_df['storyid'] == '08f13294-29ea-412b-ad0d-e05ff9662003']

Unnamed: 0,storyid,storytitle,sentence1,sentence2,sentence3,sentence4,sentence5,coref_clusters,resolved1,resolved2,resolved3,resolved4,resolved5
4225,08f13294-29ea-412b-ad0d-e05ff9662003,Parking in the Driveway,A minivan parked in the middle of the driveway in our building.,The driver loaded some luggage.,He kept the minivan in the driveway.,Another car came behind the minivan and waited.,"After the driver beeped, the minivan drove off.","[((A, minivan, parked, in, the, middle, of, the, driveway, in, our, building, ., \t), (the, minivan), (the, minivan), (the, minivan)), ((the, driveway, in, our, building), (the, driveway)), ((The, driver), (He), (the, driver))]",the minivan parked in the middle of the driveway in our building.,The driver loaded some luggage.,The driver kept the minivan in the driveway in our building.,Another car came behind the minivan and waited.,"After the driver beeped, the minivan drove off."
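
The clusters above suggest why these two stories come back with only four sentences: in both, one mention span contains the tab separator itself (`(Gen., \t, She)` and `(A, minivan, ..., ., \t)`), so replacing that mention also removes the tab. The sketch below is a simplified model of span replacement under that assumption, not NeuralCoref's actual implementation. `replace_spans` and the hand-made token list are hypothetical.

```python
def replace_spans(tokens, spans, replacement):
    """Replace each half-open (start, end) token span with a single replacement string."""
    out = []
    i = 0
    for start, end in sorted(spans):
        out.extend(tokens[i:start])  # keep the tokens before the mention
        out.append(replacement)      # substitute the whole mention
        i = end
    out.extend(tokens[i:])
    return "".join(out)


# Hand-made tokens (with trailing whitespace) for the first two sentences of the story
tokens = ["The delivery man ", "handed a box to ", "Gen.", "\t", "She ", "was excited."]

# Mention covers only "She ": the tab survives and the split yields two sentences
clean = replace_spans(tokens, [(4, 5)], "Gen ")
# Mention covers "Gen.", the tab, and "She ": the tab is lost with the mention
broken = replace_spans(tokens, [(2, 5)], "The delivery man ")

print(len(clean.split("\t")), len(broken.split("\t")))
```

Under this model, any story whose mention boundaries swallow a separator ends up with fewer than five elements after the split, which matches the two failures handled manually above.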
