# Natural Language Inference - MultiNLI, QuestionNLI, Sentiment Analysis, Semantic Similarity

### Written by: Rodrigo Escandon

# Executive Summary

This notebook demonstrates natural language models that determine whether one sentence entails or contradicts another, whether a sentence answers a question, whether a sentiment is positive or negative, and whether a phrase appears within a written paragraph.

## Model Performance

The examples use transformer models from Huggingface, the VADER sentiment analyzer from NLTK and the rule-based Matcher from SpaCy.

In [1]:
from transformers import pipeline
import spacy
from spacy.matcher import Matcher
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [2]:
#Loading the English language model (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

## Multi-NLI - Huggingface

In [2]:
#Providing the type of task and the model
#Bringing in a Multi-Genre NLI model
classifier_m = pipeline("text-classification", model="roberta-large-mnli")

[{'label': 'ENTAILMENT', 'score': 0.9883741140365601}]

In [10]:
#Evaluating whether the second sentence follows from the first (ENTAILMENT), conflicts with it (CONTRADICTION) or neither (NEUTRAL)
print(classifier_m("Today is a hot day. Today's temperature is hot."))
print(classifier_m("I am very excited for tomorrow's activities. I don't have any plans for tomorrow."))

[{'label': 'ENTAILMENT', 'score': 0.9913302659988403}]
[{'label': 'CONTRADICTION', 'score': 0.9994875192642212}]
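
The calls above pass both sentences as one string. MNLI models are trained on a premise and hypothesis pair, and the text-classification pipeline also accepts a dictionary with `text` and `text_pair` keys. The following is a minimal sketch of that input form. `as_pair` is a hypothetical helper that is not part of the original notebook, and the pipeline call is left commented out because it needs the downloaded model.

```python
def as_pair(premise, hypothesis):
    """Build the dictionary form that a text-classification pipeline accepts for sentence pairs."""
    return {"text": premise, "text_pair": hypothesis}


# Same sentences as the first example above, split into premise and hypothesis
pair = as_pair("Today is a hot day.", "Today's temperature is hot.")
print(pair)

# Assumes classifier_m has been created as in the cell above:
# print(classifier_m(pair))
```

Scores for the paired form may differ from the single-string scores shown above, since the tokenizer inserts separator tokens between the two sentences.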


## Question-NLI - Huggingface

In [11]:
#Providing the type of task and the model
#Bringing in a Question NLI model
classifier_q = pipeline("text-classification", model="cross-encoder/qnli-electra-base")


In [14]:
#Evaluating whether the question is answered by the second sentence
#The label is LABEL_0 in both cases; the score is close to 1 if the question is answered and close to 0 if it is not
print(classifier_q("Did it rain last night? Yes, last night was raining."))
print(classifier_q("What time is it? I am tired."))

[{'label': 'LABEL_0', 'score': 0.9925422668457031}]
[{'label': 'LABEL_0', 'score': 0.000511180202011019}]
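
Because the label is the same in both results, the decision has to come from the score. A minimal sketch of turning the score into a yes/no answer follows. The helper name `is_answered` and the 0.5 cutoff are assumptions for illustration, not something the model card or the notebook prescribes. The inputs are the two results printed above.

```python
def is_answered(result, threshold=0.5):
    """Return True when the QNLI score suggests the sentence answers the question.

    result is the list returned by the pipeline, e.g. [{'label': ..., 'score': ...}].
    threshold is an assumed cutoff, to be tuned on real data.
    """
    return result[0]["score"] >= threshold


# Results copied from the output above
answered = [{'label': 'LABEL_0', 'score': 0.9925422668457031}]
unanswered = [{'label': 'LABEL_0', 'score': 0.000511180202011019}]

print(is_answered(answered))    # True
print(is_answered(unanswered))  # False
```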


## Sentiment Analysis - Huggingface and NLTK

In [15]:
#Providing the type of task only, so the pipeline falls back to its default model
#Bringing in a Sentiment Analysis model from Huggingface
classifier_s = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [17]:
#Evaluating whether the sentence is positive or negative (this SST-2 model has no neutral label)
print(classifier_s("I had such a great day!"))
print(classifier_s("I don't feel well."))

[{'label': 'POSITIVE', 'score': 0.9998375177383423}]
[{'label': 'NEGATIVE', 'score': 0.9997040629386902}]


In [26]:
#Bringing in the Sentiment Analysis model (VADER) from NLTK
#The lexicon might have to be downloaded first (nltk.download('vader_lexicon'))
sent_nltk=SentimentIntensityAnalyzer()

In [29]:
#Calculating scores. Compound is the overall score.
#Anything above 0 is assumed to be positive
print(sent_nltk.polarity_scores("I had such a great day!"))
print(sent_nltk.polarity_scores("I don't feel well."))

{'neg': 0.0, 'neu': 0.406, 'pos': 0.594, 'compound': 0.6588}
{'neg': 0.476, 'neu': 0.524, 'pos': 0.0, 'compound': -0.2057}
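
The compound score can be mapped to a label with a small helper. The sketch below uses cutoffs of +0.05 and -0.05 with a neutral band in between, which is the convention commonly cited for VADER. The helper name `label_compound` and the `threshold` parameter are assumptions for illustration, and the cutoff can be adjusted to suit the data.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    threshold is an assumed cutoff; scores strictly between -threshold
    and +threshold are treated as neutral.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from the output above
print(label_compound(0.6588))   # positive
print(label_compound(-0.2057))  # negative
print(label_compound(0.0))      # neutral
```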


## Semantic Similarity - SpaCy

In [10]:
#Initialize the matcher with the shared vocabulary
matcher=Matcher(nlp.vocab)

In [11]:
#Building patterns for different spellings of "mulberry bush":
#one word, separated by punctuation, separated by a space, and joined by an underscore
pattern1=[{"LOWER":"mulberrybush"}]
pattern2=[{"LOWER": "mulberry"}, {"IS_PUNCT": True,'OP':'*'}, {"LOWER": "bush"}]
pattern3=[{"LOWER": "mulberry"}, {"IS_UPPER": True,'OP':'*'}, {"LOWER": "bush"}]
pattern4=[{"ORTH": "mulberry_bush"}]

In [12]:
#Add those different patterns as match rules to the matcher
matcher.add('mulberry bush',[pattern1,pattern2,pattern3,pattern4])

In [13]:
#Creating the text that will be searched
doc=nlp('''Here we go round the mulberrybush,
The mulberry_bush,
The mulberry-bush.
Here we go round the MULBERRY BUSH
On a cold and frosty morning.''')

In [14]:
found_matches=matcher(doc)

In [15]:
print(found_matches)

[(15321705705911178809, 5, 6), (15321705705911178809, 9, 10), (15321705705911178809, 13, 16), (15321705705911178809, 23, 25)]


In [16]:
#Displaying each match together with the token that follows it (end is exclusive, so end+1 adds one more token)
for match_id,start,end in found_matches:
    string_id=nlp.vocab.strings[match_id]
    span=doc[start:end+1]
    print(match_id,string_id,start,end,span)

15321705705911178809 mulberry bush 5 6 mulberrybush,
15321705705911178809 mulberry bush 9 10 mulberry_bush,
15321705705911178809 mulberry bush 13 16 mulberry-bush.
15321705705911178809 mulberry bush 23 25 MULBERRY BUSH
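
As a quick cross-check that does not need SpaCy, the same four spellings can be found with a regular expression. This swaps token-based matching for character-based matching, so it is a different technique from the Matcher above and is shown only as a sanity check. The pattern and variable names are assumptions for illustration.

```python
import re

# Same text as the doc created above
text = '''Here we go round the mulberrybush,
The mulberry_bush,
The mulberry-bush.
Here we go round the MULBERRY BUSH
On a cold and frosty morning.'''

# "mulberry", then any run of non-word characters or underscores, then "bush"
pattern = re.compile(r"mulberry[\W_]*bush", flags=re.IGNORECASE)

# Collect the matched text for every occurrence, in order
found = [m.group(0) for m in pattern.finditer(text)]
print(found)
```

The regular expression returns only the matched characters, while the Matcher returns token positions that can be used to pull surrounding tokens from the doc.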

