<a href="https://colab.research.google.com/github/rostro36/Partisan-Responses/blob/master/QA_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os
os.chdir('/content/drive/My Drive/Partisan-Responses-master')


In [None]:
# Install libraries if needed
#! pip install transformers
! pip install allennlp allennlp-models
! pip install hnswlib
! pip install wandb
! pip install neuralcoref

Import all libraries

In [None]:
import numpy as np
import pandas as pd
import spacy
import pickle
import torch
import torch.nn.functional as F
from itertools import islice
from transformers import pipeline, GPT2LMHeadModel, GPT2Tokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from Search import Search
from Answer import Answer
from Speech import Speech
from KnowledgeGraph import KnowledgeGraph
import neuralcoref
import re
import gc
import utils

In [None]:
nltk.download('punkt')

Load data & search

In [None]:
# Last 14 Congresses
file_name = 'search_dataset_small.pkl'
speeches = pd.read_pickle(file_name)
# .loc slicing is label-based and inclusive: this keeps every row up to and including label 1000.
speeches = speeches.loc[:1000, :]
#search = Search(speeches=speeches)
speeches.head()

In [4]:
# Use a context manager so the file handle is closed after loading.
with open("search_results.pkl", "rb") as f:
    search = pickle.load(f)
search.head()

Unnamed: 0,question,answer_R,answer_D
0,Should abortion be illegal?,"Mr. President, my amendment simply remedies a ...",Will the Senator yield to answer a question so...
1,What do you believe about tax increases?,"Mr. Speaker, I note that the Speaker of the Ho...",Do you remember the other bill that he signed ...
2,Should same-sex marriage be legal?,Then I will submit all this for the RECORD and...,"Mr. Speaker, I yield myself such time as I may..."
3,Is climate change real?,"Mr. President, this amendment addresses the la...","Mr. Speaker, I thank the distinguished gentlem..."
4,Should immigrants be allowed to obtain citizen...,"Mr. Chairman, as a strong advocate of immigrat...","Mr. President, I am pleased to introduce legis..."


In [5]:
# Manual Questions (not from corpus)
questions = ["Should abortion be illegal?",
             "What do you believe about tax increases?",
             "Should same-sex marriage be legal?",
             "Is climate change real?",
             "Should immigrants be allowed to obtain citizenship?",
             "Who should be given voting rights?",
             "Should we have higher taxes for higher incomes?",
             "Should we allow death penalty?",
             "Should we have universal healthcare?",
             "Do government regulations hinder free market capitalism?",
             "What do you think about the current president?",
             "Should we reduce national debt?",
             "Should we increase spending on healthcare?",
             "Should we increase spending on education?",
             "Should every American have equal opportunities regardless of sex, age and race?",
             "Should be introduce more gun control measures?",
             "What do you think about the current president?",
             "Should Americans be free?",
             "What party do you support?",
             "What is the biggest threat to America?"
             ]

Graph construction

In [6]:
#from KnowledgeGraph import KnowledgeGraph
knowledgeGraphs=dict()
for question in questions:
  knowledgeGraphs[question]=KnowledgeGraph(question)

In [7]:
identifier='coref'
checkpoint=12
#verb_list_file = "verb_list"+identifier+str(checkpoint)+".pickle"
with open("verb_dict.pickle", "rb") as f:
    verb_dict = pickle.load(f)
verb_list = None #pickle.load(verb_list_file)
graphWriterData=[]

In [8]:
def parse_entry(question, answer, verb_dict, verb_list):
    """Build one GraphWriter input row (a dict of strings) from a question and an answer."""
    result = dict()
    result['question'] = ' '.join([token.text for token in utils.sp(question)])
    phrase_corpus, triplet_id, parsed_text, parsed = Answer(answer).create_test(verb_dict, verb_list)
    result['corpus'] = ' ; '.join(phrase_corpus)
    result['tags'] = ' '.join(['<phrase>'] * len(phrase_corpus))
    # Drop the commas from str(x) and strip its first and last character (the enclosing brackets).
    result['triplet_id'] = ' ; '.join([str(x).replace(',', '')[1:-1] for x in triplet_id])
    result['parsed_text'] = parsed_text
    result['parsed'] = ' '.join([str(x) for x in parsed])
    return result
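
The `triplet_id` line in `parse_entry` flattens each index group into a space-separated string. A minimal sketch of that formatting step follows; the index tuples are made up for illustration, and it is an assumption that `Answer.create_test` returns tuples of integers.

```python
def format_triplet_ids(triplet_id):
    """Join index groups as 'a b c ; d e f' (the same transformation parse_entry applies)."""
    # str((0, 1, 2)) is '(0, 1, 2)': drop the commas, then strip the parentheses.
    return ' ; '.join(str(x).replace(',', '')[1:-1] for x in triplet_id)


example_ids = [(0, 1, 2), (3, 4, 5)]  # hypothetical values, not taken from the dataset
formatted = format_triplet_ids(example_ids)
print(formatted)
```

Because only the first and last characters are stripped, the same helper would also work if the groups were lists rather than tuples.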

Generate answers to question with GPT-2

In [None]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained('gpt2')

In [None]:
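for question in questions:
    print("Question: {}".format(question))
    print("GPT-2 generation: ")
    generated = tokenizer.encode(question)
    context = torch.tensor([generated])
    past = None

    # Greedy decoding: always append the single most likely next token.
    # Assumes transformers >= 4.0 (`past_key_values` keyword, output object with `.logits`).
    with torch.no_grad():  # inference only, no gradients needed
        for i in range(100):
            outputs = model(context, past_key_values=past)
            past = outputs.past_key_values
            token = torch.argmax(outputs.logits[..., -1, :])

            generated.append(token.item())
            # Feed only the new token; earlier context is carried by the cached keys/values.
            context = token.view(1, 1)

    sequence = tokenizer.decode(generated)

    print(sequence)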

Question: Should abortion be illegal?
GPT-2 generation: 
Should abortion be illegal?

The Supreme Court has ruled that abortion is illegal under the Fourteenth Amendment. The Supreme Court has ruled that abortion is illegal under the Fourteenth Amendment.

The Supreme Court has ruled that abortion is illegal under the Fourteenth Amendment.

The Supreme Court has ruled that abortion is illegal under the Fourteenth Amendment.

The Supreme Court has ruled that abortion is illegal under the Fourteenth Amendment.

The Supreme Court has ruled that abortion is illegal under the Fourteenth Amendment.
Question: What do you believe about tax increases?
GPT-2 generation: 
What do you believe about tax increases?

I think that the tax increases are going to be very good for the economy. I think that the tax increases are going to be very good for the middle class. I think that the middle class is going to be very happy. I think that the middle class is going to be very happy.

I think that the mid
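
The generations above loop on the same sentence because the cell always takes the `argmax` token (greedy decoding). One common alternative, not used in this notebook, is top-k sampling: keep the k highest-scoring tokens and draw one of them at random in proportion to its softmax weight. The sketch below shows the idea on a toy list of logits using only the standard library; `sample_top_k` and the numbers are hypothetical and are not part of the pipeline.

```python
import math
import random


def sample_top_k(logits, k, rng):
    """Sample an index from the k highest-scoring logits, weighted by softmax."""
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the maximum before exponentiating for numerical stability.
    max_logit = logits[top[0]]
    weights = [math.exp(logits[i] - max_logit) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
toy_logits = [2.0, 0.5, 1.5, -1.0]  # made-up scores for a 4-token vocabulary
picks = [sample_top_k(toy_logits, k=2, rng=rng) for _ in range(20)]
print(picks)
```

With `k=1` this reduces to greedy decoding, which is what the cell above does.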

Search

In [54]:
# Raw string, so the regex escapes are not treated as (invalid) Python string escapes.
re.search(r"(?<=ARG1: )[\w\s\'\",\.\:\$\-\(\)\*\/]*(?=])", '[ARG1: the criminal penalties for marriage fraud to 5 years imprisonment and/or a $ 250.000 fine]')

<_sre.SRE_Match object; span=(7, 96), match='the criminal penalties for marriage fraud to 5 ye>
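
The lookbehind/lookahead pattern tested above pulls the text of one argument out of an OIE description string. Here is a small sketch of the same idea as a reusable helper. The name `extract_arg` and the sample description are hypothetical, written only for illustration. The helper returns `None` when an argument is missing or contains a character outside the allowed set.

```python
import re

# Characters allowed inside an argument (same set as the pattern in the notebook).
ARG_CHARS = r"[\w\s'\",.:$\-()*/]*"


def extract_arg(description, label):
    """Return the text following '<label>: ' up to the closing bracket, or None."""
    pattern = r"(?<=" + re.escape(label) + r": )" + ARG_CHARS + r"(?=\])"
    match = re.search(pattern, description)
    return match.group(0) if match else None


description = "[ARG0: The bill] [V: raises] [ARG1: the criminal penalties for marriage fraud]"
subject = extract_arg(description, "ARG0")
obj = extract_arg(description, "ARG1")
print(subject, "|", obj)
```

Checking for `None` before calling `.group(0)` avoids an `AttributeError` on descriptions the character class does not cover, such as ones containing `%`.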

In [55]:
import pandas as pd
# import allennlp_models.structured_prediction
# import allennlp_models.coref
import nltk
import re
import utils

auxillary_verbs=['can','could','may','might','must','shall','should','will','would'] #https://englishstudyonline.org/auxiliary-verbs/
distance_threshold=0.5

class Speech:
    def __init__(self, speech):
        #self.speaker = speech['lastname'] + " " + speech['firstname']
        self.party = speech['party']
        self.content = speech['speech']
        
    def change_comma(self):
        """
        Replace misplaced periods with commas: a period followed by a
        lowercase letter, a digit, or the word "I" is treated as a comma
        """
        self.content = re.sub(r"\.(?=\s[a-z0-9]|\sI[\W\s])", ",", self.content)

    def _find_triplets(self, openinfo_result):
        """
        Find zero or more triplets for each sentence in the allennlp OIE results
        Param:
        ========
        openinfo_result: list, one list of OIE verb dicts ('verb' and 'description' keys) per sentence

        Return:
        ========
        speech_triplet: list, a list of lists of triplet tuples (of a speech)
        """
        # Captures the text of one argument up to its closing bracket.
        # TODO: * in arg0
        arg_pattern = r"(?<=%s: )[\w\s\'\",\.\:\$\-\(\)\*\/]*(?=])"
        modalverbs = ["can", "could", "may", "might", "must", "shall", "should", "will", "would"]
        speech_triplet = []
        for sentence in openinfo_result:
            sent_triplet = []
            for d in sentence: # Extract from 'description' result of OIE
                verb = d['verb']
                if verb in modalverbs:
                    continue
                subj = re.search(arg_pattern % "ARG0", d['description'])
                predicate = re.search(arg_pattern % "ARG1", d['description'])
                # Skip frames that lack an argument or contain characters the pattern does not cover.
                if subj and predicate:
                    sent_triplet.append((subj.group(0), verb, predicate.group(0)))
            speech_triplet.append(sent_triplet)
        return speech_triplet
                
    def create_triplet(self):
        """
        Generate (subject, verb, object) triplets of a speech text.
        Coreference resolution and open information extraction are run
        in create_oieresult, using the extractors held in utils.

        Return:
        ========
        triplets: list, one list of triplet tuples per sentence, followed by the party string as the last item
        """
        oie_result=self.create_oieresult()
        triplets = self._find_triplets(oie_result)
        triplets.append(self.party)
        return triplets
    
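    def create_oieresult(self):
        """
        Resolve coreferences, split the speech into sentences and run
        open information extraction on each sentence.

        Return:
        ========
        oie_result: list, one list of OIE verb dicts per sentence
        """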
        coref_content = utils.coref_extractor.coref_resolved(self.content)
        sents = nltk.tokenize.sent_tokenize(coref_content)
        sents = [{"sentence":s} for s in sents] #Format for oie batch predictor
        oie_result = utils.open_info_extractor.predict_batch_json(sents)
        oie_result = [i['verbs'] for i in oie_result]
        return oie_result
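
To make the effect of `change_comma` concrete, the sketch below applies the same substitution to a made-up sentence as a standalone function. The function form and the sample text are for illustration only.

```python
import re


def change_comma(text):
    """Turn a period into a comma when it is followed by a lowercase letter, a digit, or 'I'."""
    return re.sub(r"\.(?=\s[a-z0-9]|\sI[\W\s])", ",", text)


sample = "That is a risk. every litigant faces it. I know. The end."
fixed = change_comma(sample)
print(fixed)
```

Note that the rule also rewrites a genuine sentence boundary before "I" (here "it. I know"), and a period followed by a capitalized word such as "The" is left alone.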
        

In [61]:
# KnowledgeGraph is already imported at the top of the notebook.
for row in search.itertuples(index=False):
    question = row.question
    print("Question: {}".format(question))
    print("Republican Result: {}".format(row.answer_R))
    print("Democrat Result: {}".format(row.answer_D))

    res_R = pd.Series({'speech': row.answer_R, 'party': 'R'})
    res_D = pd.Series({'speech': row.answer_D, 'party': 'D'})
    for res in [res_R, res_D]:
        print(res)
        triplets = Speech(res).create_triplet()
        knowledgeGraphs[question].add_edges(triplets)

        #parsed = parse_entry(question,full_result,verb_dict,verb_list)
        #graphWriterData.append(parsed)

Question: Should abortion be illegal?
Republican Result: Mr. President, my amendment simply remedies a major defect in this bill by ensuring that it does not cover illegal abortions. Why not limit protections of this bill to lawful abortions? I cannot imagine any rationale that could be used to rabut the import of that question. This whole debate shows how extreme this bill is on the proabortion side, I think it would have a lot more support if it was not so extreme, if it did not rush to support illegal abortions and illegal abortionists, to avoid the mere risk of abusive discovery, which is about the only argument they can make. That is a risk every litigant faces, I have been in all kinds of litigation in my lifetime as an attorney. Every case involves the potential abuse of discovery. But to use that as an excuse to not knock out illegal abortions in this bill shows how extreme this bill is. S, 636 very simply protects illegal abortion. It is that simple. Why is it so difficult to 

RuntimeError: ignored

In [60]:
torch.cuda.empty_cache()


In [None]:
!rm -f GraphWriter-master/data/preprocessed.test.tsv

In [None]:
m=pd.DataFrame(data=graphWriterData)
m.to_csv('GraphWriter-master/data/preprocessed.test.tsv', sep='\t', index=False, header=False)

In [None]:
# Iterating over a dict yields its keys (the question strings), so loop over the values.
for knowledgeGraph in knowledgeGraphs.values():
  knowledgeGraph.draw()
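
The `KnowledgeGraph` class itself lives in the repository and is not shown in this notebook. As a rough illustration of how the output of `Speech.create_triplet` (one list of triplets per sentence, with the party string as the last item) could be turned into graph edges, here is a sketch using a plain adjacency dict. `build_adjacency` and the sample triplets are hypothetical and are not the repository's implementation.

```python
from collections import defaultdict


def build_adjacency(triplets):
    """Map each subject to its (verb, object, party) edges."""
    # The last item is the party string; everything before it is a per-sentence list.
    *sentences, party = triplets
    graph = defaultdict(list)
    for sent_triplets in sentences:
        for subj, verb, obj in sent_triplets:
            graph[subj].append((verb, obj, party))
    return dict(graph)


# Hypothetical triplets in the shape create_triplet returns.
example = [[("the bill", "protects", "illegal abortion")], [], "R"]
graph = build_adjacency(example)
print(graph)
```

Keeping the party on every edge means one graph per question can hold both Republican and Democrat statements and still tell them apart.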

Text Generation via GraphWriter

In [None]:
!python GraphWriter-master/generator.py -save=GraphWriter-master/partisan-responses/2.vloss-3.949412.lr-0.05