# Full Contradiction Pipeline on DoD Dataset

Running contradiction classification on every pair of sentences in a single document.

In [1]:
import spacy
import pandas as pd
import numpy as np

In [2]:
# Read in a single organization's corpus
df = pd.read_json("data/02. Data Sets/DoD Issuances/contradictions_datasets_dod_issuances.zip", orient='records', compression='infer')
df['fulltext'] = df.text_by_page.str.join(' ')
d0 = df.iloc[0]

In [3]:
d0.url

'https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/140025/1400.25-V100.pdf'

In [4]:
### SPLIT THE DOCUMENT INTO SENTENCES
from spacy.lang.en import English
nlp_sentencizer = English()
nlp_sentencizer.add_pipe('sentencizer')

doc = nlp_sentencizer(d0.fulltext)
sents = list(doc.sents)
sents_text = [s.text for s in sents]
len(sents_text)

70

In [72]:
### CLEAN THE SENTENCES (a little bit)

# For now, just remove very short sentences (30 characters or fewer)
cutoff_characters = 30
sents_text_clean = [s for s in sents_text if len(s) > cutoff_characters]
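The length cutoff alone still lets running page headers/footers (for example "DoDI 1400.25-V100, December 1996 3 4.") and table-of-contents lines with dot leaders through, and these later show up among the top contradiction scores. A minimal sketch of a stricter filter, assuming the artifacts look like the ones in this particular issuance; the regex patterns and the `clean_sentence`, `keep_sentence` and `filter_sentences` helpers are hypothetical and would need tuning for other documents:

```python
import re

# Hypothetical pattern fitted to this document's running header/footer,
# e.g. "DoDI 1400.25-V100, December 1996 6" (issuance id, month, year, page).
HEADER_RE = re.compile(r"DoDI\s+[\d.]+-V\d+,\s+\w+\s+\d{4}\s+\d+\s*")
# Table-of-contents dot leaders: runs of five or more periods.
DOT_LEADER_RE = re.compile(r"\.{5,}")


def clean_sentence(text):
    """Remove running headers/footers and collapse repeated whitespace."""
    text = HEADER_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def keep_sentence(text, min_chars=30):
    """Drop table-of-contents lines and sentences of min_chars or fewer."""
    if DOT_LEADER_RE.search(text):
        return False
    return len(text) > min_chars


def filter_sentences(sentences, min_chars=30):
    """Clean every sentence first, then apply the keep/drop rules."""
    cleaned = (clean_sentence(s) for s in sentences)
    return [s for s in cleaned if keep_sentence(s, min_chars)]
```

Used in place of the cell above, this would be `sents_text_clean = filter_sentences(sents_text)`. Cleaning before filtering matters: a header glued onto a short fragment makes the fragment look long enough to survive the cutoff.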

In [73]:
### CREATE ALL SENTENCE PAIR COMBINATIONS
import itertools
sentence_combinations = list(itertools.combinations(sents_text_clean, 2))
# sentence_permutations = list(itertools.permutations(sents_text, 2))
print(len(sentence_combinations))
# print(len(sentence_permutations))

1081
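The number of pairs grows quadratically: n sentences give C(n, 2) = n(n-1)/2 unordered pairs, so the 1,081 pairs above correspond to 47 sentences surviving the length filter. The commented-out `permutations` call would score every pair in both orders (2,162 pairs, roughly double the runtime), which may be worth it because an NLI model is not guaranteed to give the same score for (a, b) and (b, a). A small check of these counts; `n_pairs` is a hypothetical helper:

```python
import itertools
from math import comb


def n_pairs(n_sentences, ordered=False):
    """Pairs to score: C(n, 2) unordered, or n * (n - 1) when order matters."""
    return n_sentences * (n_sentences - 1) if ordered else comb(n_sentences, 2)


# Cross-check the formulas against itertools on a toy list of 4 sentences.
toy = ["s1", "s2", "s3", "s4"]
assert n_pairs(len(toy)) == len(list(itertools.combinations(toy, 2)))               # 6
assert n_pairs(len(toy), ordered=True) == len(list(itertools.permutations(toy, 2)))  # 12

print(n_pairs(47), n_pairs(47, ordered=True))  # 1081 2162
```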


In [2]:
### LOAD CONTRADICTION MODEL

# https://github.com/facebookresearch/anli/blob/main/src/hg_api/interactive.py
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

hg_model_hub_name = "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"

# Will take a moment to download
tokenizer = AutoTokenizer.from_pretrained(hg_model_hub_name)
model = AutoModelForSequenceClassification.from_pretrained(hg_model_hub_name)


Some weights of the model checkpoint at ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [10]:
### DEFINE CONTRADICTION FUNCTION
def evaluate(premise, hypothesis, tokenizer=tokenizer, model=model, max_length=256):
    """Return [P(entailment), P(neutral), P(contradiction)] for one sentence pair."""
    inputs = tokenizer(premise, hypothesis,
                       max_length=max_length,
                       truncation=True,
                       return_token_type_ids=True,  # BART has no token_type_ids; set False if you use BART
                       return_tensors='pt')          # PyTorch tensors with a batch dimension of 1

    with torch.no_grad():  # inference only: skip gradient tracking
        outputs = model(**inputs)

    predicted_probability = torch.softmax(outputs.logits, dim=1)[0].tolist()  # batch size is one

    # Note:
    # "id2label": {
    #     "0": "entailment",
    #     "1": "neutral",
    #     "2": "contradiction"
    # },
    return predicted_probability
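`evaluate` turns the model's three raw scores (logits) into probabilities with a softmax and relies on the label order written in the comment. A NumPy sketch of the same step that looks labels up in an `id2label` mapping instead of hard-coding their order; with a loaded Hugging Face model that mapping is available as `model.config.id2label`, while `logits_to_label_probs` itself is a hypothetical helper:

```python
import numpy as np


def logits_to_label_probs(logits, id2label):
    """Softmax one row of logits and key the probabilities by label name."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())  # subtracting the max avoids overflow
    probs = exp / exp.sum()
    return {id2label[i]: float(p) for i, p in enumerate(probs)}


id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
probs = logits_to_label_probs([0.0, 0.0, np.log(2.0)], id2label)
print({k: round(v, 3) for k, v in probs.items()})
# {'entailment': 0.25, 'neutral': 0.25, 'contradiction': 0.5}
```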


In [74]:
### COMPUTE CONTRADICTION PROBABILITIES FOR ALL SENTENCE PAIRS
from tqdm import tqdm
outputs = []
for pair in tqdm(sentence_combinations):
    probs = evaluate(pair[0], pair[1])
    outputs.append(probs)


100%|██████████| 1081/1081 [04:36<00:00,  3.92it/s]
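Scoring one pair per forward pass took about four and a half minutes for 1,081 pairs (3.92 it/s). Sending several pairs through the model at once may speed this up, especially on a GPU. A minimal sketch, assuming a Hugging Face tokenizer/model like the ones loaded above; `score_pairs`, `make_hf_scorer` and the default batch size of 16 are illustrative choices, not part of the original pipeline:

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
# A batch scorer maps (premises, hypotheses) to one probability row per pair.
BatchScorer = Callable[[List[str], List[str]], List[List[float]]]


def score_pairs(pairs: Sequence[Pair], score_batch: BatchScorer,
                batch_size: int = 16) -> List[List[float]]:
    """Score sentence pairs in fixed-size batches, preserving input order."""
    results: List[List[float]] = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        premises = [premise for premise, _ in batch]
        hypotheses = [hypothesis for _, hypothesis in batch]
        results.extend(score_batch(premises, hypotheses))
    return results


def make_hf_scorer(tokenizer, model, max_length: int = 256) -> BatchScorer:
    """Wrap a Hugging Face tokenizer/model pair as a BatchScorer (assumed setup)."""
    import torch  # local import: score_pairs itself does not need torch

    def score_batch(premises, hypotheses):
        inputs = tokenizer(premises, hypotheses,
                           padding=True,        # pad to the longest pair in the batch
                           truncation=True,
                           max_length=max_length,
                           return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        return torch.softmax(logits, dim=1).tolist()

    return score_batch
```

With the objects from the cells above this would read `outputs = score_pairs(sentence_combinations, make_hf_scorer(tokenizer, model))`. Padding can shift scores very slightly compared with one-at-a-time scoring, so it is worth spot-checking a few pairs against `evaluate`.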


In [75]:
d = np.array(outputs)

In [76]:
d.shape

(1081, 3)

In [77]:
s = sentence_combinations
np.savez('dod_d0_sentence_contradiction_scores',
         sentence_pairs=s,
         scores=d)

In [78]:
data = np.load('dod_d0_sentence_contradiction_scores.npz')
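The `.npz` archive stores the sentence pairs and the score matrix as separate arrays (`data['sentence_pairs']` and `data['scores']`), so reading a result means looking the index up by hand. A sketch that joins them back into one table so each score sits next to its two sentences; the `pairs_frame` helper and its column names are hypothetical:

```python
import numpy as np
import pandas as pd

LABELS = ["entailment", "neutral", "contradiction"]  # label order noted in `evaluate`


def pairs_frame(path):
    """Load the saved sentence pairs and scores into a single DataFrame."""
    with np.load(path) as data:
        pairs = data["sentence_pairs"]  # shape (n_pairs, 2), string dtype
        scores = data["scores"]         # shape (n_pairs, 3)
    frame = pd.DataFrame(scores, columns=LABELS)
    frame.insert(0, "sentence_a", pairs[:, 0])
    frame.insert(1, "sentence_b", pairs[:, 1])
    return frame
```

For example, `pairs_frame('dod_d0_sentence_contradiction_scores.npz').nlargest(10, 'contradiction')` would list the top-scoring pairs together with their text.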

In [31]:
# entailment ; neutral ; contradiction
d

array([[9.98847127e-01, 4.20214172e-04, 7.32676825e-04],
       [1.23911398e-03, 9.97674525e-01, 1.08637731e-03],
       [6.50454138e-04, 9.99249995e-01, 9.95745286e-05],
       ...,
       [1.66345462e-01, 3.52914363e-01, 4.80740219e-01],
       [5.51789859e-03, 9.86949384e-01, 7.53262779e-03],
       [6.93250820e-03, 9.86899793e-01, 6.16762321e-03]])

In [82]:

scores = pd.DataFrame(d, columns=['entailment', 'neutral', 'contradiction'])
scores.head()

   entailment   neutral  contradiction
0    0.001239  0.997675       0.001086
1    0.000650  0.999250       0.000100
2    0.000997  0.998892       0.000111
3    0.015816  0.984017       0.000168
4    0.000598  0.999117       0.000285


In [83]:
def shw(idx):
    """Print both sentences of pair `idx`, wrapped at 100 characters."""
    from textwrap import wrap
    w = 100
    s0, s1 = sentence_combinations[idx]
    print('\n'.join(wrap(s0, width=w)), '\n\n')
    print('\n'.join(wrap(s1, width=w)))


In [84]:
# Pairs with the lowest entailment scores
scores.entailment.sort_values(ascending=True)[:10]

935    0.000082
624    0.000102
35     0.000108
840    0.000108
677    0.000110
944    0.000118
846    0.000130
40     0.000131
642    0.000132
933    0.000141
Name: entailment, dtype: float64

In [85]:
shw(935)

Managers and supervisors shall, when delegated civilian personnel management authorities, carry out
civilian personnel management policies, procedures, and programs as outlined in Reference (a), this
Instruction, and other DoD civilian personnel management issuances authorized by Reference (a) and
consistent with applicable negotiated agreements. 


b. In accordance with the policy and philosophy of the Secretary of Defense to streamline and
eliminate redundancy in Government regulations, supplementation shall be kept to a minimum.


In [86]:
shw(624)

f. Consistent with workload and mission requirements, the need to create flexible work arrangements
that allow employees to better balance their work and other (e.g., family) responsibilities shall be
incorporated into the design and implementation of civilian personnel policies, procedures, and
programs at all organizational levels. 


b. Monitor the implementation and effectiveness of this Instruction and revise it as appropriate.


In [87]:
# Pairs with the highest contradiction scores
scores.contradiction.sort_values(ascending=False)[:10]

677    0.998084
788    0.989895
323    0.982381
559    0.972565
533    0.941926
312    0.934859
579    0.928340
29     0.902869
713    0.901415
640    0.895307
Name: contradiction, dtype: float64

In [88]:
shw(677) # Looks good!

Changes that conflict with existing negotiated agreements may not be implemented until the agreement
expires or is renewed unless: (1) The parties agree otherwise; or (2) The change is required by law
or by a rule or regulation implementing law governing prohibited personnel practices. 


This Volume is effective immediately.


In [89]:
shw(788)

Procedures DoDI 1400.25-V100, December 1996 4 CONTENTS TABLE OF CONTENTS RESPONSIBILITIES ..........
....................................................................................................
........5 DEPUTY UNDER SECRETARY OF DEFENSE FOR CIVILIAN PERSONNEL POLICY (DUSD(CPP)) ..............
.........................................................................................5 HEADS OF
THE DoD COMPONENTS
...................................................................................5 SUPERVISORS AND
MANAGERS ........................................................................................5
PROCEDURES .........................................................................................
.......................................6 GENERAL ...................................................
..............................................................................6 SUPPLEMENTATION ....
...........................................................................

In [90]:
shw(323)

DoDI 1400.25-V100, December 1996 2 (3) Be issued only if necessary to comply with Executive orders,
law, or regulation, or to assist civilian personnel offices and human resource offices (CPOs/HROs),
managers, supervisors, employees, and their representatives with civilian personnel management
issues. ( 


DoDI 1400.25-V100, December 1996 6 ENCLOSURE 2 ENCLOSURE 2 PROCEDURES 1.


In [91]:
shw(559)

Existing DoD Component civilian personnel policies, procedures, and programs may continue until
superseded by law, controlling regulations, new provisions of this Instruction, or related DoD
issuance provisions. 


This Volume is effective immediately.


In [92]:
shw(533) # Looks good!

d. Civilian personnel policies, procedures, and programs as set forth in this Instruction are
binding on all the DoD Components. 


d. Waive the provisions of this Instruction or other DoD civilian personnel management issuances
authorized by Reference (a) as appropriate.


In [95]:
shw(312)
print('\n============================================\n')
shw(640) 

DoDI 1400.25-V100, December 1996 2 (3) Be issued only if necessary to comply with Executive orders,
law, or regulation, or to assist civilian personnel offices and human resource offices (CPOs/HROs),
managers, supervisors, employees, and their representatives with civilian personnel management
issues. ( 


DoDI 1400.25-V100, December 1996 3 4.


f. Consistent with workload and mission requirements, the need to create flexible work arrangements
that allow employees to better balance their work and other (e.g., family) responsibilities shall be
incorporated into the design and implementation of civilian personnel policies, procedures, and
programs at all organizational levels. 


No further delegation is authorized.


In [96]:
shw(579)
print('\n============================================\n')
shw(29)
print('\n============================================\n')
shw(713)


Existing DoD Component civilian personnel policies, procedures, and programs may continue until
superseded by law, controlling regulations, new provisions of this Instruction, or related DoD
issuance provisions. 


No further delegation is authorized.


othe he Department of Defense INSTRUCTION NUMBER 1400.25, Volume 100 December 3, 1996
Administratively reissued April 13, 2009 USD(P&R) SUBJECT: DoD Civilian Personnel Management System:
General Provisions References: (a) DoD Directive 1400.25, “DoD Civilian Personnel Management
System,” November 25, 1996 (b) Title 5, United States Code (c) Title 5, Code of Federal Regulations
1. 


DoDI 1400.25-V100, December 1996 6 ENCLOSURE 2 ENCLOSURE 2 PROCEDURES 1.


DoDI 1400.25-V100, December 1996 3 4. 


DoDI 1400.25-V100, December 1996 6 ENCLOSURE 2 ENCLOSURE 2 PROCEDURES 1.


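> **Notes:**
> * Sentences that are too short should not be considered: short lines such as "This Volume is effective immediately." and "No further delegation is authorized." still pass the 30-character cutoff and appear among the top-10 contradiction scores
> * Much more cleaning is needed
>   - Headers, footers, table of contents, etc.
> * Consider using a knowledge graph to check that the paired sentences/paragraphs are about the same topic before scoring them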
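As a lightweight stand-in for the knowledge-graph idea (a different and much cruder technique), pairs could be pre-filtered by lexical overlap so that only sentences sharing some content words reach the NLI model. A minimal sketch using Jaccard similarity over lower-cased word sets; the stop-word list, the `min_overlap` threshold of 0.1 and the helper names are assumptions to tune:

```python
import itertools
import re

# Small illustrative stop-word list (an assumption; a fuller list would help).
STOP_WORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
              "it", "of", "on", "or", "shall", "the", "this", "to", "with"}


def content_words(sentence):
    """Lower-cased alphabetic tokens, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS}


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topical_pairs(sentences, min_overlap=0.1):
    """Yield sentence pairs whose content-word overlap reaches min_overlap."""
    words = [content_words(s) for s in sentences]
    for i, j in itertools.combinations(range(len(sentences)), 2):
        if jaccard(words[i], words[j]) >= min_overlap:
            yield sentences[i], sentences[j]
```

Swapping `itertools.combinations(sents_text_clean, 2)` for `list(topical_pairs(sents_text_clean))` would also shrink the number of forward passes. Word overlap misses paraphrases, so an embedding-based similarity may be a better filter if this one proves too strict.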