# ATiCS Sentence Inference Demo

This notebook shows how to load a pretrained sentence encoder and evaluate it on custom Natural Language Inference (NLI) examples.  
Given a premise and a hypothesis, the model predicts one of three labels: **entailment**, **neutral**, or **contradiction**.

Before running the notebook:
- Activate the `atics` environment
- Download the model checkpoint and `vocab.pkl` from the link in the `README.md`
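- Make sure the correct GloVe embeddings are available in the `glove/` directory (the loading cell below reads `glove.6B.300d.txt`; adjust `glove_path` if you use a different file such as `glove.840B.300d.txt`)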

See the [README.md](./README.md) for setup instructions.

In [1]:
# Imports
import torch
import torch.nn.functional as F
from utils.dataset import tokenizer, build_vocab, load_glove_embeddings
from models import get_model
import numpy as np

# Label mapping
ID2LABEL = {0: 'entailment', 1: 'neutral', 2: 'contradiction'}

In [4]:
import pickle

with open("checkpoints/vocab.pkl", "rb") as f:
    vocab = pickle.load(f)

glove_path = "./glove/glove.6B.300d.txt"
embedding_matrix = load_glove_embeddings(glove_path, vocab)

Found 28250 vectors out of 36704 words.


In [7]:
# Model setup
class Args:
    hidden_dim = 512
    num_classes = 3
    max_len = 50
    embedding_dim = 300

args = Args()
model = get_model("bilstm_max", embedding_matrix, args)
model.load_state_dict(torch.load("checkpoints/bilstm_max_best.pt", map_location=torch.device('cpu')))
model.eval()


BiLSTMMaxPoolClassifier(
  (embedding): Embedding(36704, 300)
  (bilstm): LSTM(300, 512, batch_first=True, bidirectional=True)
  (mlp): Sequential(
    (0): Linear(in_features=4096, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=3, bias=True)
  )
)
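
The `in_features=4096` of the first linear layer is consistent with an InferSent-style feature combination. The notebook does not show the model's forward pass, so this is an assumption: each sentence would be encoded to a 2 × 512 = 1024-dimensional vector (bidirectional LSTM, max-pooled over time), and the classifier would receive `[u, v, |u - v|, u * v]`. A minimal sketch in plain Python, where `combine_features` is a hypothetical helper and the vectors are made up:

```python
def combine_features(u, v):
    """Concatenate u, v, |u - v| and u * v (element-wise).

    Assumption: this mirrors what the classifier does internally;
    the notebook itself does not show the model's forward pass.
    """
    if len(u) != len(v):
        raise ValueError("u and v must have the same dimension")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    product = [a * b for a, b in zip(u, v)]
    return list(u) + list(v) + abs_diff + product


hidden_dim = 512
sentence_dim = 2 * hidden_dim  # forward + backward LSTM states
u = [0.0] * sentence_dim       # placeholder premise vector
v = [1.0] * sentence_dim       # placeholder hypothesis vector
features = combine_features(u, v)
print(len(features))  # 4096
```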

## Inference Function

In [8]:
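def encode_sentence(sentence, vocab, max_len=50):
    # Lowercase, tokenize, and truncate to at most max_len tokens
    tokens = tokenizer.tokenize(sentence.lower())[:max_len]
    # Map tokens to vocabulary ids, falling back to <unk> for unseen words
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    length = len(ids)
    # Right-pad with <pad> up to max_len
    ids += [vocab["<pad>"]] * (max_len - length)
    # Return a (1, max_len) id tensor and a (1,) tensor holding the unpadded length
    return torch.tensor(ids).unsqueeze(0), torch.tensor([length])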

def predict(premise, hypothesis):
    x1, len1 = encode_sentence(premise, vocab)
    x2, len2 = encode_sentence(hypothesis, vocab)

    with torch.no_grad():
        logits = model(x1, len1, x2, len2)
        probs = F.softmax(logits, dim=1).squeeze()
        pred = torch.argmax(probs).item()

    print(f"Premise   : {premise}")
    print(f"Hypothesis: {hypothesis}")
    print(f"Prediction: {ID2LABEL[pred]}")
    print(f"Confidence: {probs[pred]:.4f}")
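
To see what the encoding step produces without loading the checkpoint, here is a minimal sketch of the same truncate, look up, and pad logic in plain Python. The toy vocabulary and the whitespace split are assumptions that stand in for `vocab.pkl` and the notebook's `tokenizer`, and `encode_ids` is a hypothetical name.

```python
def encode_ids(sentence, vocab, max_len=50):
    """Return (padded_ids, true_length) for one sentence.

    Assumption: whitespace splitting stands in for the notebook's tokenizer.
    """
    tokens = sentence.lower().split()[:max_len]
    # Unknown words fall back to the <unk> id
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    length = len(ids)
    # Right-pad with the <pad> id so every sequence has max_len entries
    ids = ids + [vocab["<pad>"]] * (max_len - length)
    return ids, length


# Toy vocabulary for illustration only
toy_vocab = {"<pad>": 0, "<unk>": 1, "a": 2, "man": 3, "is": 4, "walking": 5}

ids, length = encode_ids("A man is walking a dog", toy_vocab, max_len=8)
print(ids)     # [2, 3, 4, 5, 2, 1, 0, 0]
print(length)  # 6
```

The word `dog` is not in the toy vocabulary, so it maps to the `<unk>` id, and the two trailing zeros are padding.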

## Example Predictions

### Correct Predictions
1. The model likely picked up on the strong semantic opposition between *having fun* and *fighting*. It may also have learned that pairs with similar subjects but conflicting actions tend to be contradictions, which would explain the very high confidence in the contradiction label.

2. The model likely recognizes that *playing a bass* is a specific instance of *making music*. SNLI contains many paraphrastic examples with *person*/*man* substitutions and verb generalizations, so this pattern is familiar. There is strong lexical and semantic overlap, and the hypothesis introduces nothing that contradicts the premise or goes beyond it.


In [38]:
predict("The sisters are having fun", "The two girls are fighting") # Ground truth: Contradiction
predict("A woman is playing a bass.", "A person is making music.")  # Ground truth: Entailment

Premise   : The sisters are having fun
Hypothesis: The two girls are fighting
Prediction: contradiction
Confidence: 0.9999
Premise   : A woman is playing a bass.
Hypothesis: A person is making music.
Prediction: entailment
Confidence: 0.9468
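
The `Confidence` value printed above is the softmax probability of the predicted class. The following sketch reproduces that computation in plain Python; the logits are made up for illustration and are not outputs of the model.

```python
import math

ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.5, 3.0, -1.0]  # hypothetical logits for one premise/hypothesis pair
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability

print(f"Prediction: {ID2LABEL[pred]}")
print(f"Confidence: {probs[pred]:.4f}")
```

A high confidence only says that the model's distribution is peaked; as the wrong predictions in this notebook show, it does not guarantee that the label is correct.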


### Wrong Predictions

1. The model appears overly sensitive to negation ("nobody") and treats *sitting in the sun* versus *sitting in the shade* as a contradiction. However, the presence of negation in this context doesn’t necessarily imply a contradiction. Both statements can be true at the same time, so the relationship should be neutral.

2. The model is likely misled by the negation ("no cat") and interprets it as contradicting the mention of a dog. It may incorrectly treat cat and dog as opposites or mutually exclusive. However, the two sentences are about different subjects, and there's no real semantic conflict. This suggests the model lacks real-world knowledge and struggles with distinguishing unrelated facts from contradictions.


In [43]:
predict("Two men are sitting in the sun.", "Nobody is sitting in the shade.")  # Ground truth: Neutral
predict("A man is walking a dog.", "No cat is outside.")  # Ground truth: Neutral

Premise   : Two men are sitting in the sun.
Hypothesis: Nobody is sitting in the shade.
Prediction: contradiction
Confidence: 0.9747
Premise   : A man is walking a dog.
Hypothesis: No cat is outside.
Prediction: contradiction
Confidence: 0.9998
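
One way to check the suspected negation bias on more than two examples is to compare how often the model predicts contradiction for hypotheses with and without a negation word. The sketch below is a rough probe under stated assumptions: the word list is ad hoc, `stub_predict` is a hypothetical stand-in, and the notebook's `predict` would need a variant that returns the label instead of printing it.

```python
NEGATION_WORDS = {"no", "nobody", "not", "never", "none", "nothing"}


def has_negation(sentence):
    """Crude check for a negation word (ad hoc list, not exhaustive)."""
    tokens = sentence.lower().replace(".", " ").split()
    return any(tok in NEGATION_WORDS or tok.endswith("n't") for tok in tokens)


def contradiction_rate_by_negation(examples, predict_label):
    """Share of 'contradiction' predictions, split by negation in the hypothesis.

    examples: list of (premise, hypothesis) pairs
    predict_label: callable (premise, hypothesis) -> label string
    """
    counts = {True: [0, 0], False: [0, 0]}  # [contradictions, total]
    for premise, hypothesis in examples:
        key = has_negation(hypothesis)
        counts[key][1] += 1
        if predict_label(premise, hypothesis) == "contradiction":
            counts[key][0] += 1
    return {k: (hits / total if total else None) for k, (hits, total) in counts.items()}


def stub_predict(premise, hypothesis):
    # Hypothetical stand-in that mimics the suspected bias; replace it with a
    # wrapper around the real model that returns ID2LABEL[pred].
    return "contradiction" if has_negation(hypothesis) else "neutral"


examples = [
    ("Two men are sitting in the sun.", "Nobody is sitting in the shade."),
    ("A man is walking a dog.", "No cat is outside."),
    ("A woman is playing a bass.", "A person is making music."),
]
rates = contradiction_rate_by_negation(examples, stub_predict)
print(rates)  # {True: 1.0, False: 0.0} with the stub predictor
```

A large gap between the two rates on pairs whose ground truth is neutral would support the negation explanation.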


## Try Your Own

In [47]:
# Enter your own sentences here
premise = "My girlfriend is hungry."
hypothesis = "The miss wants to eat."
predict(premise, hypothesis)

Premise   : My girlfriend is hungry.
Hypothesis: The miss wants to eat.
Prediction: neutral
Confidence: 0.8370
