# Week 11: Transformers Application

Additional references:
- [Question Answering with Hugging Face](https://huggingface.co/transformers/usage.html)
- [Textual Entailment](https://nlp.stanford.edu/pubs/snli_paper.pdf)
- [SQuAD question answering](https://arxiv.org/abs/1606.05250)

In [1]:
#setup
import warnings; warnings.simplefilter('ignore')
import pandas as pd
import numpy as np

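# load the cleaned cases dataset (gzip-compressed pickle)
df = pd.read_pickle('sc_cases_cleaned.pkl', compression='gzip')
# encode each opinion author as an integer category code
df = df.assign(author_id=df['authorship'].astype('category').cat.codes)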
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 768 entries, 0 to 819
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   case_name       768 non-null    object        
 1   opinion_type    768 non-null    object        
 2   date_standard   768 non-null    datetime64[ns]
 3   authorship      768 non-null    object        
 4   x_republican    768 non-null    float64       
 5   maj_judges      768 non-null    object        
 6   dissent_judges  768 non-null    object        
 7   topic_id        768 non-null    float64       
 8   cite_count      768 non-null    float64       
 9   opinion_text    768 non-null    object        
 10  year            768 non-null    int64         
 11  log_cite_count  768 non-null    float64       
 12  author_id       768 non-null    int8          
dtypes: datetime64[ns](1), float64(4), int64(1), int8(1), object(6)
memory usage: 78.8+ KB


# Coreference Resolution

In [6]:
from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging

predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz")
predictions = predictor.predict(
    document="Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen. Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates, two years younger, with whom he shared an enthusiasm for computers."
)


Some weights of BertModel were not initialized from the model checkpoint at SpanBERT/spanbert-large-cased and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [43]:
for cluster in predictions['clusters']:
    print("==Cluster==")
    for element in cluster:
        entity = ""
        for el in range(element[0], element[1]+1):
            entity += predictions['document'][el] + " "
        print(entity)

==Cluster==
Paul Allen 
Allen 
he 
he 
==Cluster==
Seattle , Washington 
Seattle 


In [10]:
clusters = predictions['clusters']
document = predictions['document']
document[:10], clusters

(['Paul', 'Allen', 'was', 'born', 'on', 'January', '21', ',', '1953', ','],
 [[[0, 1], [24, 24], [36, 36], [47, 47]], [[11, 13], [33, 33]]])

In [11]:
# map each token index to its token
doc = dict(enumerate(document))

doc

{0: 'Paul',
 1: 'Allen',
 2: 'was',
 3: 'born',
 4: 'on',
 5: 'January',
 6: '21',
 7: ',',
 8: '1953',
 9: ',',
 10: 'in',
 11: 'Seattle',
 12: ',',
 13: 'Washington',
 14: ',',
 15: 'to',
 16: 'Kenneth',
 17: 'Sam',
 18: 'Allen',
 19: 'and',
 20: 'Edna',
 21: 'Faye',
 22: 'Allen',
 23: '.',
 24: 'Allen',
 25: 'attended',
 26: 'Lakeside',
 27: 'School',
 28: ',',
 29: 'a',
 30: 'private',
 31: 'school',
 32: 'in',
 33: 'Seattle',
 34: ',',
 35: 'where',
 36: 'he',
 37: 'befriended',
 38: 'Bill',
 39: 'Gates',
 40: ',',
 41: 'two',
 42: 'years',
 43: 'younger',
 44: ',',
 45: 'with',
 46: 'whom',
 47: 'he',
 48: 'shared',
 49: 'an',
 50: 'enthusiasm',
 51: 'for',
 52: 'computers',
 53: '.'}

In [12]:
# collect the tokens of every mention, grouped by cluster
clus_all = []
for one_cl in clusters:
    cluster = []
    for start, end in one_cl:
        # spans are inclusive on both ends
        cluster.extend(doc[num] for num in range(start, end + 1))
    clus_all.append(cluster)

print(clus_all)

[['Paul', 'Allen', 'Allen', 'he', 'he'], ['Seattle', ',', 'Washington', 'Seattle']]
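
One common use of the cluster output is to replace each later mention with the first mention of its cluster. The sketch below assumes the same layout as above (a token list plus inclusive `[start, end]` spans). `resolve_mentions` is a hypothetical helper written for this note, not part of AllenNLP, and the toy sentence and spans are written by hand.

```python
def resolve_mentions(tokens, clusters):
    """Replace every non-first mention in each cluster with the cluster's first mention.

    `clusters` holds inclusive [start, end] token spans, as in the output above.
    """
    # Map the start index of each later mention to (end index, replacement tokens).
    replacements = {}
    for cluster in clusters:
        first_start, first_end = cluster[0]
        head = tokens[first_start:first_end + 1]
        for start, end in cluster[1:]:
            replacements[start] = (end, head)

    resolved = []
    i = 0
    while i < len(tokens):
        if i in replacements:
            end, head = replacements[i]
            resolved.extend(head)
            i = end + 1  # skip the tokens of the replaced mention
        else:
            resolved.append(tokens[i])
            i += 1
    return resolved


# Toy input written by hand (not model output).
tokens = ["Allen", "attended", "Lakeside", "School", ",", "where", "he",
          "befriended", "Bill", "Gates", "."]
toy_clusters = [[[0, 0], [6, 6]]]
print(" ".join(resolve_mentions(tokens, toy_clusters)))
```

This simple version assumes mentions do not overlap; nested spans would need extra handling.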


# Question Answering

In [13]:
from transformers import pipeline

nlp = pipeline("question-answering")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py`.
"""

print(nlp(question="What is extractive question answering?", context=context))
print(nlp(question="What is a good example of a question answering dataset?", context=context))

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


{'score': 0.6222440600395203, 'start': 34, 'end': 95, 'answer': 'the task of extracting an answer from a text given a question'}
{'score': 0.5115328431129456, 'start': 147, 'end': 160, 'answer': 'SQuAD dataset'}
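
The `start` and `end` values in the output are character offsets into `context`, with `end` exclusive, so slicing the context reproduces the answer. The sketch below checks this using the two result dictionaries copied from the output above (scores rounded); it does not call the model.

```python
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py`.
"""

# Results copied from the pipeline output above (scores rounded).
results = [
    {"score": 0.6222, "start": 34, "end": 95,
     "answer": "the task of extracting an answer from a text given a question"},
    {"score": 0.5115, "start": 147, "end": 160, "answer": "SQuAD dataset"},
]

for res in results:
    span = context[res["start"]:res["end"]]  # character slice, end exclusive
    print(repr(span), span == res["answer"])
```

Because the offsets refer to the exact string passed in, any preprocessing of the context (stripping, re-wrapping) has to happen before the pipeline call, not after.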


# Textual Entailment

premise: Two women are wandering along the shore drinking iced tea.
hypothesis: Two women are sitting on a blanket near some rocks talking about politics.
label: one of entailment (premise -> hypothesis), neutral (premise ? hypothesis), or contradiction (premise -x hypothesis)


In [14]:
# using AllenNLP

from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging

predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/decomposable-attention-elmo-2020.04.09.tar.gz")
prediction = predictor.predict(
    premise="Two women are wandering along the shore drinking iced tea.",
    hypothesis="Two women are sitting on a blanket near some rocks talking about politics."
)

In [15]:
import numpy as np

# label order of the AllenNLP SNLI model, see https://demo.allennlp.org/textual-entailment/elmo-snli
id2label = {0: "Entailment", 1: "Contradiction", 2: "Neutral"}
print(prediction["label_probs"])
print(id2label[np.argmax(prediction["label_probs"])])

[0.00033908855402842164, 0.9735872745513916, 0.02607365883886814]
Contradiction


In [16]:
# using Transformers

from transformers import RobertaTokenizer, RobertaForSequenceClassification

# "mnli" refers to the MultiNLI dataset on which this RoBERTa model was fine-tuned: https://cims.nyu.edu/~sbowman/multinli/
model_name = "roberta-large-mnli"
tokenizer = RobertaTokenizer.from_pretrained(model_name)
model = RobertaForSequenceClassification.from_pretrained(model_name)

Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [17]:
premise="Two women are wandering along the shore drinking iced tea."
hypothesis="Two women are sitting on a blanket near some rocks talking about politics."

model_input = tokenizer(premise, hypothesis, return_tensors="pt")
print(tokenizer.decode(model_input["input_ids"][0]))
# note how the pair is encoded as a single sequence: <s>premise</s></s>hypothesis</s>


<s>Two women are wandering along the shore drinking iced tea.</s></s>Two women are sitting on a blanket near some rocks talking about politics.</s>
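
The decoded string shows the layout RoBERTa uses for sentence pairs. As a rough illustration of that layout only (real tokenization works on subword ids, not on strings), the hypothetical helper `format_pair` below rebuilds the same string by hand.

```python
def format_pair(premise, hypothesis, bos="<s>", eos="</s>"):
    """Mimic the decoded sentence-pair layout shown above: <s>A</s></s>B</s>."""
    return f"{bos}{premise}{eos}{eos}{hypothesis}{eos}"


premise = "Two women are wandering along the shore drinking iced tea."
hypothesis = "Two women are sitting on a blanket near some rocks talking about politics."
print(format_pair(premise, hypothesis))
```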


In [18]:
import torch
from torch.nn import Softmax

output = model(**model_input)
softmax = Softmax(dim=-1)  # normalize over the label dimension; an implicit dim is deprecated
probs = softmax(output.logits)

In [19]:
print(probs)

id2label = {0: "Contradiction", 1: "Neutral", 2: "Entailment"}  # id-to-label mapping of the MNLI model

# take the highest-scoring label for the single example in the batch
argmax = torch.argmax(output.logits[0].detach()).item()
print(id2label[argmax])

tensor([[0.8877, 0.1107, 0.0016]], grad_fn=<SoftmaxBackward0>)
Contradiction
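
The probabilities printed above come from applying softmax to the model's logits. A minimal pure-Python sketch of that step follows; the logit values are made up for illustration and are not the model's actual outputs.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "Contradiction", 1: "Neutral", 2: "Entailment"}

logits = [3.0, 1.0, -3.0]  # hypothetical logits, not taken from the model
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
print([round(p, 4) for p in probs], id2label[best])
```

Since softmax preserves the ordering of the logits, taking the argmax of the logits or of the probabilities gives the same label.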


In [20]:
# do it for a whole batch

premises = ["If you help the needy, God will reward you.", "An interplanetary spacecraft is in orbit around a gas giant's icy moon.", "A large, gray elephant walked beside a herd of zebras.", "A handmade djembe was on display at the Smithsonian."]
hypotheses = ["Giving money to the poor has good consequences.", "The spacecraft has the ability to travel between planets.", "The elephant was lost.", "Visitors could see the djembe."] 

model_inputs = tokenizer(premises, hypotheses, return_tensors="pt", padding=True, truncation=True, max_length=256)


In [21]:
output = model(**model_inputs)
softmax = Softmax(dim=-1)  # normalize over the label dimension; an implicit dim is deprecated
probs = softmax(output.logits)

for premise, hypothesis, prediction in zip(premises, hypotheses, probs):
    argmax = torch.argmax(prediction).item()
    print(premise, "--", hypothesis, "--", id2label[argmax])

If you help the needy, God will reward you. -- Giving money to the poor has good consequences. -- Entailment
An interplanetary spacecraft is in orbit around a gas giant's icy moon. -- The spacecraft has the ability to travel between planets. -- Neutral
A large, gray elephant walked beside a herd of zebras. -- The elephant was lost. -- Neutral
A handmade djembe was on display at the Smithsonian. -- Visitors could see the djembe. -- Entailment
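
With `padding=True`, the tokenizer pads every sequence in the batch to the length of the longest one and returns an attention mask that marks the real tokens. The sketch below imitates that on plain lists. `pad_batch` is a hypothetical helper, the token ids are invented, and the default pad id of 1 is an assumption chosen to match RoBERTa's usual padding id.

```python
def pad_batch(sequences, pad_id=1):
    """Right-pad id lists to a common length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[0, 11, 12, 2], [0, 21, 2]])  # invented token ids
print(batch["input_ids"])
print(batch["attention_mask"])
```

The mask is what lets the model ignore padded positions, so predictions for a short pair should not depend on how long the other pairs in the batch are.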
