# Homework 6: Neural Coreference Resolution

**Due April 20, 2020 at 11:59pm**

In this homework,  you will be implementing parts of a Pytorch implementation for neural coreference resolution, inspired by [Lee et al.(2017), “End-to-end Neural Coreference Resolution” (EMNLP)](https://arxiv.org/pdf/1707.07045.pdf). 

### REMEMBER TO UPLOAD THE DATASET!
Click the Files icon > Upload > Upload train.conll and dev.conll that you have downloaded from bCourses: Files/HW_6

### Setup

In [0]:
import sys, re
from collections import Counter

import torch
from torch import nn
import torch.optim as optim

import numpy as np
from scipy.stats import spearmanr

We noticed that running this on CPU is faster than running on GPU. Thus, we will default to running on CPU. However, feel free to change it to GPU if you wish.

In [0]:
device = torch.device("cpu")
print("Running on {}".format(device))

Running on cpu


### Download and process data
Note: You do **not** have to modify this section.

In [0]:
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove*.zip

--2020-04-27 03:56:31--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2020-04-27 03:56:31--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2020-04-27 03:56:31--  http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’


2020-0

In [0]:
def read_conll(filename):

  docid=None
  partID=None

  # collection
  all_sents=[]
  all_ents=[]

  # for one doc
  all_doc_sents=[]
  all_doc_ents=[]

  # for one sentence
  sent=[]
  ents=[]

  named_ents=[]
  cur_tid=0
  open_count=0

  global_eid=0
  doc_eid_to_global_eid={}

  with open(filename, encoding="utf-8") as file:
    for line in file:
      if line.startswith("#begin document"):

        all_doc_ents=[]
        all_doc_sents=[]

        open_ents={}
        open_named_ents={}

        docid=None
        matcher=re.match("#begin document \((.*)\); part (.*)$", line.rstrip())
        if matcher != None:
          docid=matcher.group(1)
          partID=matcher.group(2)

      elif line.startswith("#end document"):

        all_sents.append(all_doc_sents)
        all_ents.append(all_doc_ents)

        
      else:

        parts=re.split("\s+", line.rstrip())

        # sentence boundary
        if len(parts) < 2:
    
          all_doc_sents.append(sent)

          ents=sorted(ents, key=lambda x: (x[0], x[1]))

          all_doc_ents.append(ents)

          sent=[]
          ents=[]

          cur_tid=0

          continue

        tid=cur_tid
        token=parts[3]
        cur_tid+=1

        identifier="%s.%s" % (docid, partID)

        coref=parts[-1].split("|")

        for c in coref:
          if c.startswith("(") and c.endswith(")"):
            c=re.sub("\(", "", c)
            c=int(re.sub("\)", "", c))

            if (identifier, c) not in doc_eid_to_global_eid:
              doc_eid_to_global_eid[(identifier, c)]=len(doc_eid_to_global_eid)

            ents.append((tid, tid, doc_eid_to_global_eid[(identifier, c)], identifier))

          elif c.startswith("("):
            c=int(re.sub("\(", "", c))

            if c not in open_ents:
              open_ents[c]=[]
            open_ents[c].append(tid)
            open_count+=1

          elif c.endswith(")"):
            c=int(re.sub("\)", "", c))

            assert c in open_ents

            start_tid=open_ents[c].pop()
            open_count-=1

            if (identifier, c) not in doc_eid_to_global_eid:
              doc_eid_to_global_eid[(identifier, c)]=len(doc_eid_to_global_eid)

            ents.append((start_tid, tid, doc_eid_to_global_eid[(identifier, c)], identifier))

        sent.append(token)

  return all_sents, all_ents

def load_embeddings(filename, vocab_size):
  # 0 idx is for padding
  # 1 idx is for unknown words

  # get the embedding size from the first embedding
  with open(filename, encoding="utf-8") as file:
    word_embedding_dim=len(file.readline().split(" "))-1

  vocab={"[PAD]":0, "[UNK]":1}

  print("word_embedding_dim:", word_embedding_dim)

  embeddings=np.zeros((vocab_size, word_embedding_dim))

  with open(filename, encoding="utf-8") as file:
    for idx,line in enumerate(file):

      if idx + 2 >= vocab_size:
        break

      cols=line.rstrip().split(" ")
      val=np.array(cols[1:])
      word=cols[0]
      embeddings[idx+2]=val
      vocab[word]=idx+2

  return torch.FloatTensor(embeddings), vocab

In [0]:
embeddingFile = "glove.6B.50d.txt"
trainFile = "train.conll"
devFile = "dev.conll"

all_sents, all_ents=read_conll(trainFile)	
dev_all_sents, dev_all_ents=read_conll(devFile)

embeddings, vocab=load_embeddings(embeddingFile, 50000)

word_embedding_dim: 50


### **Part 1. Implement B3**

In this part, you’ll implement the B3 coreference metric as discussed in class without importing external libraries. 

Recall the definition: 
$B^{_{precision}^{3}} = \frac{1}{n}\sum_{i}^{n} \frac{\left |Gold_{i} \cap  System_{i} \right |}{\left | System_{i} \right |}$
$B^{_{recall}^{3}} = \frac{1}{n}\sum_{i}^{n} \frac{\left |Gold_{i} \cap  System_{i} \right |}{\left | Gold_{i} \right |}$

You should be able to pass the sanity check b3_test() after implementing it.


In [0]:
def b3(gold, system):
  """ Calculate B3 metrics given the gold and system output
    Args:
        gold  : A dictionary that contains true references. The key is a tuple, (docid, absolute_start_idx, absolute_end_idx)
                representing a target to be predicted; the value is the true reference entity id.
        system: A dictionary that contains predicted referenece. The key in gold and system should be identical; the value
                is the predicted entity generated by the model.
    Returns:
        precision, recall, F(following the formula above)

    """
  # make this less based on this specific problem--need to find all indiv
  # mentions that are unique in gold and system
  # iterate through mentions, then find the rest of the mentions that are linked to that same entity as the initial mention
  precision=0.
  recall=0.
  F = 0.
  
  #values = list(zip(gold.values(),system.values()))
  zipped = list(zip(gold.values(),system.values()))
  n = len(zipped)
  tracking_gold = {}
  tracking_system = {}
  #for k,v in gold.items():
    #k's will be the same for gold and system
  for k, v in gold.items():
    tracking_gold.setdefault(v, []).append(k)
    tracking_system.setdefault(system[k],[]).append(k)
  #print('track_gold',tracking_gold)
  #print('track_system',tracking_system)
  for tuples in zipped:
    gold_values = tracking_gold[tuples[0]] # value=1
    system_values = tracking_system[tuples[1]] # value = 5
    num_common_elems = len(set(gold_values) & set(system_values)) #overlap
    len_gold = len(gold_values)
    len_system = len(system_values)
    precision += (num_common_elems/len_system)
    recall += (num_common_elems/len_gold)
  precision = (1/n)*precision
  recall = (1/n)*recall
  F = 2 * ((precision*recall)/(precision+recall))
  return precision, recall, F

In [0]:
def b3_test():
  gold={"a1":1, "a2": 2, "a3": 1, "a4":1, "a5": 3, "a6":3, "a7":2, "a8":2, "a9":1, "a10":1}
  system={"a1":5, "a2": 6, "a3": 6, "a4":6, "a5": 7, "a6":7, "a7":5, "a8":5, "a9":5, "a10":8}

  precision, recall, F=b3(gold, system)
  print("P: %.3f, R: %.3f, F: %.3f" % (precision, recall, F))

  assert abs(precision-0.667) < 0.001
  assert abs(recall-0.547) < 0.001
  assert abs(F-0.601) < 0.001
  
  print ("B3 sanity check passed")
b3_test()

P: 0.667, R: 0.547, F: 0.601
B3 sanity check passed


### **Part 2. Neural coref**
In part 2, the skeleton code for mention-ranking model is provided to you, you will not need to change any code until Part 2.1 begins. The following section provides the Mention class which is used to store relavant information about a mention and the BasicCorefModel. You will, at the very least, need to carefully read these two classes and understand the information stored in Mention and the structure of the model to complete this homework.


In [0]:
class Mention():

  """
  An object to contain information about each mention
  """

  def __init__(self, mention_id, docid, absolute_start_idx, absolute_end_idx, sentence_start_idx, sentence_end_idx, sentence, vocab):
    self.docid=docid

    # mention id (globally unique within one file, but not across different train and test files)
    self.mention_id=mention_id
    # the token index of the mention start position, measured from the beginning of the document
    self.absolute_start_idx=absolute_start_idx
    # the token index of the mention end position, measured from the beginning of the document
    self.absolute_end_idx=absolute_end_idx
    # the token index of the mention start position, measured from the beginning of the sentence
    self.sentence_start_idx=sentence_start_idx
    # the token index of the mention end position, measured from the beginning of the sentence
    self.sentence_end_idx=sentence_end_idx
    # a list of tokens for all the words in the mention's sentence
    self.sentence=sentence
    # a list of tokens ids for all the words in the mention's sentence
    self.sentence_ids=[]
    self.sentence_length=len(sentence)

    for word in sentence:
      word=word.lower()
      self.sentence_ids.append(vocab[word] if word in vocab else vocab["[UNK]"])
   

In [0]:
def convert_data_to_training_instances(all_sents, all_ents, vocab):
  X=[]
  Y=[]
  M=[]

  global_id=0
  truth={}

  for doc_idx, doc_ent in enumerate(all_ents):

    current_token_position=0

    existing_mentions=[]

    for sent_idx, mention_list in enumerate(doc_ent):
      sent=all_sents[doc_idx][sent_idx]

      for mention_idx, mention in enumerate(mention_list):

        start_sent_idx, end_sent_idx, entity_id, identifier=mention

        mention=Mention(global_id, identifier, current_token_position+start_sent_idx, current_token_position+end_sent_idx, start_sent_idx, end_sent_idx, sent, vocab)
        M.append(mention)
        truth[global_id]=entity_id

        global_id+=1

        x=[]
        y=[]

        for aidx, antecedent in enumerate(existing_mentions):
          x.append(antecedent)
          if truth[antecedent.mention_id] == truth[mention.mention_id]:
            y.append(aidx)

        X.append(x)
        Y.append(torch.LongTensor(y).to(device))

        existing_mentions.append(mention)

      current_token_position+=len(sent)

  return X, Y, M, truth

In [0]:
class BasicCorefModel(nn.Module):

	def __init__(self, vocab, embeddings):
		super(BasicCorefModel, self).__init__()

		self.vocab=vocab

		self.embeddings = nn.Embedding.from_pretrained(embeddings)

		_, embedding_size=embeddings.shape

		self.hidden_dim=50

		self.input_size=2 * embedding_size
		# W1 = R^(2*embedding_size (E) X 50)
		self.W1 = nn.Linear(self.input_size, self.hidden_dim)
		self.tanh=nn.Tanh()
		# W2 = R^(50 X 1)
		self.W2 = nn.Linear(self.hidden_dim, 1)	

	def scorer(self, batch_x, batch_m):

		"""
		Input: a batch containing:
			-- batch_m [list of Mention objects]: mention to resolve.  batch_m[i] contains a single Mention
			-- batch_x [list of [list of Mention objects]]: candidate antecedents. batch_x[i] contains a list of candidate antecedents for mention batch_m[i]

		Each input batch is batched to contain the same number of candidate antecedents

		Output: numpy matrix [batch_size, number_of_antecedents + 1, 1] containing scores for all antecedents
			-- for j < number_of_antecedents, output[i,j] contains the score of batch_x[i][j] being the correct antecedent for batch_m[i] 
			-- for j == number_of_antecedents, output[i,j] = 0 (the score for batch_m[i] being linked to no antecedent)
		Above is looping through mentions (i) while also looping through the candidate antecedents
		"""

		this_batch_size=len(batch_x) # number of batches of mentions
		num_ants=len(batch_x[0]) # number of candidate antecedents

		# get representations for mentions
		lastWordID=[]

		for idx, mention in enumerate(batch_m): # for each mention
			lastWordID.append(mention.sentence_ids[mention.sentence_end_idx])
	 		

		# [this_batch_size, 1, embedding_size]
		mention_LW_embeddings=self.embeddings(torch.LongTensor(lastWordID).to(device)).unsqueeze(1)

		# get representations for antecedents
		antLastWords=[]
		for idx in range(len(batch_x)):
			antWords=[]
			for ant_idx, ant in enumerate(batch_x[idx]):
				antWords.append(ant.sentence_ids[ant.sentence_end_idx])

			antLastWords.append(antWords)

		# [this_batch_size, num_ants, embedding_size]
		antecedent_LW_embeddings=self.embeddings(torch.LongTensor(antLastWords).to(device))

		# We want to generate a score for each antecedent for each mention. However,
		# mention_LW_embeddings is [this_batch_size, 1, embedding_size] while,
		# antecedent_LW_embeddings is [this_batch_size, num_ants, embedding_size].
		# So let's make a bunch of copies of mention_LW_embeddings (one for each of its candidate antecedents)

		# [this_batch_size, num_ants, embedding_size]
		mention_LW_embeddings_copies=mention_LW_embeddings.expand_as(antecedent_LW_embeddings)

		# Now that they're the same size, we can concatenate them together into one big matrix

		# [this_batch_size, num_ants, (embedding_size + embedding_size)]
		all_features=torch.cat([mention_LW_embeddings_copies, antecedent_LW_embeddings], 2)
		
		# [this_batch_size, num_ants, 1]
		preds=self.W2(self.tanh(self.W1(all_features))).squeeze(-1)

		# Let's fix the score for starting a new entity to be 0; all of the other scores for candidate antecedents will end up 
		# being relative to that.

		# [this_batch_size, 1]
		zeros=torch.FloatTensor(np.zeros((this_batch_size, 1))).to(device)

		# [this_batch_size, num_ants + 1, 1]		
		preds=torch.cat((preds, zeros), 1)

		return preds

In [0]:
######### HELPER FUNCTION FOR TRAINING STARTS #########
#########  DONT'T EDIT THIS SECTION OF CODE   #########
def forward_predict(batch_x, batch_m, scoring_function):

  this_batch_size=len(batch_x)
  num_ants=len(batch_x[0])

  # if this batch has no antecedents, then it must start a new entity
  if num_ants == 0:
    return torch.LongTensor([0]*this_batch_size)
  
  # get predictions
  preds=scoring_function(batch_x, batch_m)

  # 
  arg_sorts=torch.argsort(preds, descending=True, dim=1)
  tops=arg_sorts[:,0]

  return tops


def forward_train(batch_x, batch_y, batch_m, scoring_function):

  num_batch=len(batch_x)
  num_ants=len(batch_x[0])

  # if this batch has no candidate antecedents, then each mention must start a new entity so there is only only choice we could make (hence no loss)
  if num_ants == 0:
    return None

  preds=scoring_function(batch_x, batch_m)
  preds_sum=torch.logsumexp(preds, 1)

  running_loss=None


  for i in range(num_batch):

    # optimize marginal log-likelihood of true antecedents
    if batch_y[i].nelement() == 0:
      golds_sum=0.
    else:
      golds=torch.index_select(preds[i], 0, batch_y[i])
      golds_sum=torch.logsumexp(golds, 0)

    diff=preds_sum[i]-golds_sum

    running_loss = diff if running_loss is None else running_loss + diff

  return running_loss

def get_batches(X, Y, M, batchsize):
  sizes={}
  for i in range(len(M)):
    size=len(X[i])
    if size not in sizes:
      sizes[size]=[]
    sizes[size].append((X[i], Y[i], M[i]))

  batches=[]

  for size in sizes:
    i=0
    while (i < len(sizes[size])):

      data=sizes[size][i:i+batchsize]
      batch_x=[]
      batch_y=[]
      batch_m=[]
      for x, y, m in data:
        batch_x.append(x)
        batch_y.append(y)
        batch_m.append(m)

      batches.append((batch_x, batch_y, batch_m))
      i+=batchsize

  return batches


def train(X, Y, M, train_gold, test_X, test_Y, test_M, test_gold, model):

  batches=get_batches(X, Y, M, 32)
  test_batches=get_batches(test_X, test_Y, test_M, 32)
  optimizer = optim.Adam(model.parameters(), lr=0.001)

  for epoch in range(10):

    model.train()
    # train
    bigloss=0.
    for batch_x, batch_y, batch_m in batches:
      model.zero_grad()
      loss=forward_train(batch_x, batch_y, batch_m, model.scorer)
      if loss is not None:
        loss.backward()
        optimizer.step()
        bigloss+=loss

    # evaluate
    model.eval()

    gold={}
    predicted={}

    eid=0
    tot=0

    for batch_x, batch_y, batch_m in test_batches:
      predictions=forward_predict(batch_x, batch_m, model.scorer)

      for idx, mention in enumerate(batch_m):

        gold[mention.docid, mention.absolute_start_idx, mention.absolute_end_idx]=test_gold[mention.mention_id]
        prediction=predictions[idx]
        tot+=1
      
        # prediction is to start a new entity
        if prediction >= len(batch_x[idx]):
          predicted[mention.docid, mention.absolute_start_idx, mention.absolute_end_idx]=eid
          eid+=1

        # prediction is to link to a previous mention
        else:

          best_antecedent=batch_x[idx][prediction]
          predicted_entity_id=predicted[best_antecedent.docid, best_antecedent.absolute_start_idx, best_antecedent.absolute_end_idx]
          predicted[mention.docid, mention.absolute_start_idx, mention.absolute_end_idx]=predicted_entity_id

    P, R, F=b3(gold, predicted)
    print("loss: %.3f, B3 F: %.3f, unique entities: %s, num mentions: %s" % (bigloss, F, eid, tot))

def set_seed(seed):
  """
  Sets random seeds and sets model in deterministic
  training mode. Ensures reproducible results
  """
  torch.manual_seed(seed)
  torch.backends.cudnn.deterministic = True
  torch.backends.cudnn.benchmark = False
  np.random.seed(seed)
######### HELPER FUNCTION FOR TRAINING ENDS #########
#########  DONT'T EDIT THIS SECTION OF CODE   #########

Now, everything is set up to run the BasicCorefModel. Let's run the cell below to train the model and look at the result of the model.

In [0]:
X, Y, M, train_truth=convert_data_to_training_instances(all_sents, all_ents, vocab)
dev_X, dev_Y, dev_M, dev_truth=convert_data_to_training_instances(dev_all_sents, dev_all_ents, vocab)
set_seed(159)
model=BasicCorefModel(vocab, embeddings)
model=model.to(device)
print ("Training BasicCorefModel")
train(X, Y, M, train_truth, dev_X, dev_Y, dev_M, dev_truth, model)

Training BasicCorefModel
loss: 41274.820, B3 F: 0.764, unique entities: 29578, num mentions: 29597
loss: 33196.121, B3 F: 0.764, unique entities: 29569, num mentions: 29597
loss: 29578.078, B3 F: 0.765, unique entities: 29323, num mentions: 29597
loss: 27773.998, B3 F: 0.771, unique entities: 28797, num mentions: 29597
loss: 26788.646, B3 F: 0.779, unique entities: 27948, num mentions: 29597
loss: 26151.426, B3 F: 0.782, unique entities: 27427, num mentions: 29597
loss: 25682.389, B3 F: 0.783, unique entities: 27372, num mentions: 29597
loss: 25319.139, B3 F: 0.786, unique entities: 26932, num mentions: 29597
loss: 25024.688, B3 F: 0.789, unique entities: 26854, num mentions: 29597
loss: 24776.924, B3 F: 0.793, unique entities: 26450, num mentions: 29597


### **Part 2.1 Incorporate distance**

In this part, you should incorporate the word distance information to BasicCorefModel described in the HW. The below code structure provided to you is exactly the same as BasicCorefModel, your job is to add code into both __init__() and scorer() functions as you see fit.

Hint: You might consider initialize distance embedding in __init__() function, then concatenate the original embedding and the corresponding distance embedding in scorer(). 

After implementing this, run the sanity check, test_distance(), provided to you.

In [0]:
class DistanceCorefModel(nn.Module):

	""" The code provided here starts out as just a copy of BasicCorefModel """

	'''
	class Mention():
	  # mention id (globally unique within one file, but not across different train and test files)
    self.mention_id=mention_id

    # the token index of the mention START position, measured from the beginning of the document
    self.absolute_start_idx=absolute_start_idx

    # the token index of the mention END position, measured from the beginning of the document
    self.absolute_end_idx=absolute_end_idx

    # the token index of the mention START position, measured from the beginning of the sentence
    self.sentence_start_idx=sentence_start_idx

    # the token index of the mention END position, measured from the beginning of the sentence
    self.sentence_end_idx=sentence_end_idx

    # a list of tokens for ALL the words in the mention's sentence
    self.sentence=sentence

    # a list of tokens ids for ALL the words in the mention's sentence
    self.sentence_ids=[]
    self.sentence_length=len(sentence)

    for word in sentence:
      word=word.lower()
      self.sentence_ids.append(vocab[word] if word in vocab else vocab["[UNK]"])
	'''


	def __init__(self, vocab, embeddings):
		super(DistanceCorefModel, self).__init__()

		self.vocab=vocab

		self.embeddings = nn.Embedding.from_pretrained(embeddings)
	
		#initialize distance embeddings from something
		#self.distance_embeddings = nn.Embedding.from_pretrained(embeddings)
		
		#self.distance_embeddings = nn.Embedding.from_pretrained(torch.eye(10))
		#_, distance_embedding_size = self.distance_embeddings.shape

		_, embedding_size = embeddings.shape

		self.hidden_dim = 50
		self.input_size = (2 * embedding_size) + 10 # (2E+D) X 50
		self.W1 = nn.Linear(self.input_size, self.hidden_dim)
		self.tanh=nn.Tanh()
		self.W2 = nn.Linear(self.hidden_dim, 1)	
		self.distance_embeddings = nn.Embedding(10,10)

	def scorer(self, batch_x, batch_m):

		"""
		Input: a batch containing:
			-- batch_m [list of Mention objects]: mention to resolve.  batch_m[i] contains a single Mention
			-- batch_x [list of [list of Mention objects]]: candidate antecedents. batch_x[i] contains a list of candidate antecedents for mention batch_m[i]

		Each input batch is batched to contain the same number of candidate antecedents

		Output: numpy matrix [batch_size, number_of_antecedents + 1, 1] containing scores for all antecedents
			-- for j < number_of_antecedents, output[i,j] contains the score of batch_x[i][j] being the correct antecedent for batch_m[i] 
			-- for j == number_of_antecedents, output[i,j] = 0 (the score for batch_m[i] being linked to no antecedent)

		"""
		device = torch.device("cpu")

		this_batch_size=len(batch_x) # number of mentions
		num_ants=len(batch_x[0])
		#print('this btch size',this_batch_size)
		#print('num ants',num_ants)

		# get representations for mentions
		lastWordID=[]

		for idx, mention in enumerate(batch_m): # for each mention
			lastWordID.append(mention.sentence_ids[mention.sentence_end_idx])
			# get the index of word at the token index of the mention end position (sentence)

		# [this_batch_size, 1, embedding_size] --> representation for mentions (embeddings)
		mention_LW_embeddings=self.embeddings(torch.LongTensor(lastWordID).to(device)).unsqueeze(1)
		#print('mention embeddings',mention_LW_embeddings.shape)
		# get representations for antecedents
		antLastWords=[]
		for idx in range(len(batch_x)):
			antWords=[]
			for ant_idx, ant in enumerate(batch_x[idx]):
				antWords.append(ant.sentence_ids[ant.sentence_end_idx])
			#print('antwords this batch',len(antWords))
	 		#print('num ants',len(antWords))
			antLastWords.append(antWords)
	 	
		# [this_batch_size, num_ants, embedding_size] --> representation for antecedents (embeddings) for all mentions
		antecedent_LW_embeddings=self.embeddings(torch.LongTensor(antLastWords).to(device))
		#print('ant embeddings',antecedent_LW_embeddings.shape)
		# We want to generate a score for each antecedent for each mention. However,
		# mention_LW_embeddings is [this_batch_size, 1, embedding_size] while,
		# antecedent_LW_embeddings is [this_batch_size, num_ants, embedding_size]. 
		# LOTS more antecedents than mentions cuz repeats?
		# So let's make a bunch of copies of mention_LW_embeddings (one for each of its candidate antecedents)

		# [this_batch_size, num_ants, embedding_size]
		mention_LW_embeddings_copies=mention_LW_embeddings.expand_as(antecedent_LW_embeddings)
		# Now that they're the same size, we can concatenate them together into one big matrix
		





		# mention_abs_end_embeddings=self.embeddings(torch.LongTensor(doc_lastwordID).to(device)).unsqueeze(1)
		#print('list of mentions',doc_mention_lastwordID)
		#print(len(doc_mention_lastwordID)) # 32
		# compile all the absolute end indices for each antecedent
		
		# DISTANCE EMBEDDING CREATION
		# lastWordID.append(mention.sentence_ids[mention.sentence_end_idx])
		
		# compile all the absolute end indices for each mention
		doc_mention_lastwordID = []
		for idx,mention in enumerate(batch_m):
			#doc_mention_lastwordID.append(mention.sentence_ids[mention.absolute_end_idx])
			doc_mention_lastwordID.append(mention.absolute_end_idx)
	 
		doc_ant_lastwordID = []
		for idx in range(len(batch_x)):
			antWords=[]
			#for ant_idx, ant in enumerate(batch_x[idx]):
				#antWords.append(ant.sentence_ids[ant.absolute_end_idx])
				#antWords.append(ant.absolute_end_idx)
			for antecedent in batch_x[idx]:
				#print('antecendent',antecedent)
				antWords.append(antecedent.absolute_end_idx) # adding to list of antecedents
			doc_ant_lastwordID.append(antWords)
	  #print('list of antecedents based on mention index',doc_ant_lastwordID)
		abs_diff = 0
		abs_total = []
		#for idx,mention in enumerate(batch_m):
		for word_mention_idx in range(len(doc_mention_lastwordID)): #indices, 32
			abs_diff_list = []
			for word_ant_idx in range(len(doc_ant_lastwordID[word_mention_idx])): # 1 X 10
				diff = doc_mention_lastwordID[word_mention_idx] - doc_ant_lastwordID[word_mention_idx][word_ant_idx]
				#diff = doc_mention_lastwordID[word_mention_idx]
				abs_diff = abs(diff)
				if abs_diff == 0:
						abs_diff_list.append(0)
				elif abs_diff == 1:
						abs_diff_list.append(1)
				elif abs_diff == 2:
						abs_diff_list.append(2)
				elif abs_diff == 3:
						abs_diff_list.append(3)
				elif abs_diff == 4:
						abs_diff_list.append(4)
				elif abs_diff <= 7 and abs_diff >= 5:
						abs_diff_list.append(5)
				elif abs_diff <= 15 and abs_diff >= 8:
						abs_diff_list.append(6)
				elif abs_diff <= 31 and abs_diff >= 16:
						abs_diff_list.append(7)
				elif abs_diff <= 63 and abs_diff >= 32:
						abs_diff_list.append(8)
				else:
						abs_diff_list.append(9)
			abs_total.append(abs_diff_list) # 32, 1
		abs_distance_embeddings = self.distance_embeddings(torch.LongTensor(abs_total).to(device))
		all_features=torch.cat([mention_LW_embeddings_copies, antecedent_LW_embeddings,abs_distance_embeddings], 2)
		#print('distance embedding',abs_distance_embeddings.shape) # should be [32,1,10]
		# [32, 1, 110]
		# [this_batch_size, num_ants, (embedding_size + embedding_size + distance_embedding_size)]
		# concatenate these newly adjusted embeddings for a 2-dimensional space
		#print('mention',mention_LW_embeddings_copies.shape)
		#print('ant',antecedent_LW_embeddings.shape)
		#print('abs dist',abs_distance_embeddings.shape)
		
		#print('all features',all_features.shape)
	
		# [this_batch_size, num_ants, 1]
		preds=self.W2(self.tanh(self.W1(all_features))).squeeze(-1)

		# Let's fix the score for starting a new entity to be 0; all of the other scores for candidate antecedents will end up 
		# being relative to that.

		# [this_batch_size, 1]
		zeros=torch.FloatTensor(np.zeros((this_batch_size, 1))).to(device)

		# [this_batch_size, num_ants + 1, 1]		
		preds=torch.cat((preds, zeros), 1)
		return preds

In [0]:
def test_distance(model):
  batch_x=[]
  maxLen=100
  for i in range(maxLen):
    mention=Mention(i, "testdoc", i, i+1, 0, 1, ["John", "Smith", "is", "a", "person"], model.vocab)
    batch_x.append(mention)

  mention=Mention(maxLen, "testdoc", maxLen, maxLen, 0, 0, ["He", "is", "a", "person"], model.vocab)

  preds=model.scorer([batch_x], [mention])
  preds=preds.detach().cpu().numpy()[0]
  spearman, _=spearmanr(preds, np.arange(len(preds)))
  print("Distance check: %.3f" % spearman)
  with open("distance_predictions.txt", "w", encoding="utf-8") as out:
    out.write(' '.join(["%.5f" % x for x in preds]))

In [0]:
set_seed(159)
model=DistanceCorefModel(vocab, embeddings)
model=model.to(device)
print ("Training DistanceCorefModel")
train(X, Y, M, train_truth, dev_X, dev_Y, dev_M, dev_truth, model)
test_distance(model)

Training DistanceCorefModel
loss: 38109.875, B3 F: 0.769, unique entities: 27403, num mentions: 29597
loss: 31399.088, B3 F: 0.792, unique entities: 25583, num mentions: 29597
loss: 27612.166, B3 F: 0.806, unique entities: 24815, num mentions: 29597
loss: 25193.537, B3 F: 0.810, unique entities: 24451, num mentions: 29597
loss: 23620.645, B3 F: 0.812, unique entities: 24191, num mentions: 29597
loss: 22586.162, B3 F: 0.815, unique entities: 23948, num mentions: 29597
loss: 21875.182, B3 F: 0.816, unique entities: 23776, num mentions: 29597
loss: 21347.727, B3 F: 0.817, unique entities: 23646, num mentions: 29597
loss: 20928.213, B3 F: 0.817, unique entities: 23497, num mentions: 29597
loss: 20579.402, B3 F: 0.819, unique entities: 23437, num mentions: 29597
Distance check: 0.943


### **Part 2.2 Design a fancier model**
Here comes the fun part! After completing DistanceCorefModel, you have certain degree of familiarity with the model architecture. In the section, you will be implementing a fancier model using any features you'd like. Feel free to make changes to the architecture you see fit.

Submit this notebook to gradescope and a writeup file "fancymodel.txt" describing your model and the features you use.
**Your code must implement exactly what you describe in your writeup**

In [0]:
class FancyCorefModel(nn.Module):

	""" The code provided here starts out as just a copy of BasicCorefModel """

	def __init__(self, vocab, embeddings):
		super(FancyCorefModel, self).__init__()

		self.vocab=vocab

		self.embeddings = nn.Embedding.from_pretrained(embeddings)

		_, embedding_size=embeddings.shape
		self.hidden_dim = 50
		self.input_size = (2 * embedding_size) + 10 + 10 + 4 + 3# (2E+D) X 50
		self.W1 = nn.Linear(self.input_size, self.hidden_dim)
		self.tanh=nn.Tanh()
		self.W2 = nn.Linear(self.hidden_dim, 1)	
		self.mention_width_embeddings = nn.Embedding(10,10)
		self.distance_embeddings = nn.Embedding(10,10)
		self.gender_embeddings = nn.Embedding(4,4) #she,he,they
		self.number_embeddings = nn.Embedding(3,3) #she/he versus they
	def scorer(self, batch_x, batch_m):

		"""
		Input: a batch containing:
			-- batch_m [list of Mention objects]: mention to resolve.  batch_m[i] contains a single Mention
			-- batch_x [list of [list of Mention objects]]: candidate antecedents. batch_x[i] contains a list of candidate antecedents for mention batch_m[i]

		Each input batch is batched to contain the same number of candidate antecedents

		Output: numpy matrix [batch_size, number_of_antecedents + 1, 1] containing scores for all antecedents
			-- for j < number_of_antecedents, output[i,j] contains the score of batch_x[i][j] being the correct antecedent for batch_m[i] 
			-- for j == number_of_antecedents, output[i,j] = 0 (the score for batch_m[i] being linked to no antecedent)

		"""
		
		this_batch_size=len(batch_x)
		num_ants=len(batch_x[0])

		# get representations for mentions 
		lastWordID=[]

		for idx, mention in enumerate(batch_m):
			lastWordID.append(mention.sentence_ids[mention.sentence_end_idx])

		# [this_batch_size, 1, embedding_size]
		mention_LW_embeddings=self.embeddings(torch.LongTensor(lastWordID).to(device)).unsqueeze(1)

		# get representations for antecedents
		antLastWords=[]
		for idx in range(len(batch_x)):
			antWords=[]
			for ant_idx, ant in enumerate(batch_x[idx]):
				antWords.append(ant.sentence_ids[ant.sentence_end_idx])

			antLastWords.append(antWords)

		# [this_batch_size, num_ants, embedding_size]
		antecedent_LW_embeddings=self.embeddings(torch.LongTensor(antLastWords).to(device))

		# We want to generate a score for each antecedent for each mention. However,
		# mention_LW_embeddings is [this_batch_size, 1, embedding_size] while,
		# antecedent_LW_embeddings is [this_batch_size, num_ants, embedding_size].
		# So let's make a bunch of copies of mention_LW_embeddings (one for each of its candidate antecedents)

		# [this_batch_size, num_ants, embedding_size]
		mention_LW_embeddings_copies=mention_LW_embeddings.expand_as(antecedent_LW_embeddings)

		# Now that they're the same size, we can concatenate them together into one big matrix

		# IMPLEMENT FANCY COREFERENCE MODEL HERE
		# Grab each span i and assign each an antecedent, where each possible antecedent goes from start-i
		# Maybe use a BiLSTM? 
		# Maybe incorporate attention into the model
		# Impt features: context of spans and the internal info of a span
		# We also include an attention mechanism over words in each span to model head words

		# IMPLEMENT DISTANCE COREF FEATURES
		doc_mention_lastwordID = []
		for idx,mention in enumerate(batch_m):
			doc_mention_lastwordID.append(mention.absolute_end_idx)
	 
		doc_ant_lastwordID = []
		for idx in range(len(batch_x)):
			antWords=[]
			for antecedent in batch_x[idx]:
				antWords.append(antecedent.absolute_end_idx) # adding to list of antecedents
			doc_ant_lastwordID.append(antWords)
		abs_diff = 0
		abs_total = []
		for word_mention_idx in range(len(doc_mention_lastwordID)): #indices, 32
			abs_diff_list = []
			for word_ant_idx in range(len(doc_ant_lastwordID[word_mention_idx])): # 1 X 10
				diff = doc_mention_lastwordID[word_mention_idx] - doc_ant_lastwordID[word_mention_idx][word_ant_idx]
				abs_diff = abs(diff)
				if abs_diff == 0:
						abs_diff_list.append(0)
				elif abs_diff == 1:
						abs_diff_list.append(1)
				elif abs_diff == 2:
						abs_diff_list.append(2)
				elif abs_diff == 3:
						abs_diff_list.append(3)
				elif abs_diff == 4:
						abs_diff_list.append(4)
				elif abs_diff <= 7 and abs_diff >= 5:
						abs_diff_list.append(5)
				elif abs_diff <= 15 and abs_diff >= 8:
						abs_diff_list.append(6)
				elif abs_diff <= 31 and abs_diff >= 16:
						abs_diff_list.append(7)
				elif abs_diff <= 63 and abs_diff >= 32:
						abs_diff_list.append(8)
				else:
						abs_diff_list.append(9)
			abs_total.append(abs_diff_list) # 32, 1
		abs_distance_embeddings = self.distance_embeddings(torch.LongTensor(abs_total).to(device))
		#print('dist',abs_distance_embeddings)

		# SPAN WIDTH + DISTANCE BTWN SPANS
		total_span_widths = []
		for index in range(len(batch_x)):
			span_widths = []
			for ant in range(len(batch_x[index])):
				width_mention = abs(batch_m[index].sentence_start_idx - batch_m[index].sentence_end_idx)
				width_ant = abs(batch_x[index][ant].sentence_start_idx - batch_x[index][ant].sentence_end_idx)
				width = abs(width_mention-width_ant)
				if width == 0:
					span_widths.append(0)
				elif width == 1:
					span_widths.append(1)
				elif width == 2:
					span_widths.append(2)
				elif width == 3:
					span_widths.append(3)
				elif width == 4:
					span_widths.append(4)
				elif width >= 5 and width <= 7:
					span_widths.append(5)
				elif width >= 8 and width <= 15:
					span_widths.append(6)
				elif width >= 16 and width <= 31:
					span_widths.append(7)
				elif width >= 32 and width <= 63:
					span_widths.append(8)
				else:
					span_widths.append(9)
			total_span_widths.append(span_widths)
		mention_width_embeddings = self.mention_width_embeddings(torch.LongTensor(total_span_widths).to(device))
		#print('width',mention_width_embeddings)
		
		# GENDER AGREEMENT
		total_genders = []
		for index in range(len(batch_x)):
			genders = []
			for ant in range(len(batch_x[index])):
				if ("she" or "her") in batch_x[index][ant].sentence:
					#print(batch_x[index][ant].sentence)
					genders.append(0)
				elif ("he" or "him") in batch_x[index][ant].sentence:
					genders.append(1)
				elif ("they") in batch_x[index][ant].sentence:
					genders.append(2)
				else:
					genders.append(3)
			total_genders.append(genders)
		gender_embeddings = self.gender_embeddings(torch.LongTensor(total_genders).to(device))
		
		# NUMBER AGREEMENT
		
		total_numbers = []
		for index in range(len(batch_x)):
			numbers = []
			for ant in range(len(batch_x[index])):
				if ("she" or "her" or "he" or "him") in batch_x[index][ant].sentence:
					numbers.append(0)
				elif ("they") in batch_x[index][ant].sentence:
					numbers.append(1)
				else:
					numbers.append(2)
			total_numbers.append(numbers)
		number_embeddings = self.number_embeddings(torch.LongTensor(total_numbers).to(device))
		
		# ALL FEATURES
		all_features=torch.cat([mention_LW_embeddings_copies, antecedent_LW_embeddings,abs_distance_embeddings,mention_width_embeddings,gender_embeddings,number_embeddings], 2)
	
		# [this_batch_size, num_ants, 1]
		preds=self.W2(self.tanh(self.W1(all_features))).squeeze(-1)
		zeros=torch.FloatTensor(np.zeros((this_batch_size, 1))).to(device)

		# [this_batch_size, num_ants + 1, 1]		
		preds=torch.cat((preds, zeros), 1)

		return preds

In [0]:
model=FancyCorefModel(vocab, embeddings)
model=model.to(device)

print ("Training FancyCorefModel")
train(X, Y, M, train_truth, dev_X, dev_Y, dev_M, dev_truth, model)
test_distance(model)

Training FancyCorefModel
loss: 35070.379, B3 F: 0.793, unique entities: 25592, num mentions: 29597
loss: 28207.088, B3 F: 0.812, unique entities: 24530, num mentions: 29597
loss: 24093.018, B3 F: 0.821, unique entities: 23934, num mentions: 29597
loss: 21733.496, B3 F: 0.827, unique entities: 23453, num mentions: 29597
loss: 20391.447, B3 F: 0.831, unique entities: 23107, num mentions: 29597
loss: 19530.891, B3 F: 0.835, unique entities: 22924, num mentions: 29597
loss: 18914.557, B3 F: 0.836, unique entities: 22795, num mentions: 29597
loss: 18440.221, B3 F: 0.837, unique entities: 22648, num mentions: 29597
loss: 18057.422, B3 F: 0.838, unique entities: 22623, num mentions: 29597
loss: 17737.248, B3 F: 0.838, unique entities: 22596, num mentions: 29597
Distance check: 0.932
