# Assignment 5: Model Inference
## Using the Trained Text Classification Model for Predictions

This notebook loads the trained relevance classifier (a logistic regression over `all-MiniLM-L6-v2` sentence embeddings) and uses it to predict whether new arXiv paper abstracts are relevant.

## 1. Setup and Load Model

In [1]:
import json
import numpy as np
from pathlib import Path
import pickle
from sentence_transformers import SentenceTransformer
import warnings
warnings.filterwarnings('ignore')

print("✓ All imports successful!")

✓ All imports successful!


In [2]:
# Load the embedding model
print("Loading SentenceTransformer model...")
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("✓ Embedding model loaded")

# Load the trained classifier
model_dir = Path('./models')
classifier_path = model_dir / 'classifier.pkl'

with open(classifier_path, 'rb') as f:
    classifier = pickle.load(f)

print(f"✓ Classifier loaded from {classifier_path}")

# Load configuration
config_path = model_dir / 'config.json'
with open(config_path, 'r') as f:
    config = json.load(f)

print("\nModel Configuration:")
for key, value in config.items():
    print(f"  {key}: {value}")

Loading SentenceTransformer model...
✓ Embedding model loaded
✓ Classifier loaded from models/classifier.pkl

Model Configuration:
  embedding_model: all-MiniLM-L6-v2
  embedding_dimension: 384
  classifier_type: LogisticRegression
  training_samples: 210
  validation_samples: 45
  test_samples: 45
  class_weight: balanced


## 2. Define Inference Function

In [3]:
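def predict_relevance(abstract, title=None, verbose=True):
    """Classify one abstract; `title` is only displayed, it does not affect the prediction."""
    embedding = embedding_model.encode([abstract])
    prediction = classifier.predict(embedding)[0]
    probability = classifier.predict_proba(embedding)[0]
    # predict_proba columns follow classifier.classes_, so look up the predicted label's column
    confidence = probability[list(classifier.classes_).index(prediction)]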
    
    result = {
        'prediction': int(prediction),
        'relevance_label': 'Relevant' if prediction == 1 else 'Not Relevant',
        'confidence': float(confidence),
        'probability_not_relevant': float(probability[0]),
        'probability_relevant': float(probability[1])
    }
    
    if verbose:
        print("\n" + "="*70)
        if title:
            print(f"Title: {title}")
            print("-"*70)
        print(f"Abstract (first 300 chars): {abstract[:300]}...")
        print("="*70)
        print(f"Prediction: {result['relevance_label']}")
        print(f"Confidence: {result['confidence']:.4f}")
        print(f"  - P(Not Relevant): {result['probability_not_relevant']:.4f}")
        print(f"  - P(Relevant): {result['probability_relevant']:.4f}")
        print("="*70)
    
    return result

print("✓ Inference function defined")

✓ Inference function defined
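
scikit-learn orders the columns of `predict_proba` to match `classifier.classes_`, so pairing the two explicitly keeps the reported probabilities correct regardless of how the labels were encoded. The sketch below is illustrative only: `label_probabilities` is a hypothetical helper, and the numbers are copied from Example 1 in Section 4.

```python
def label_probabilities(classes, probabilities):
    """Map each class label to its probability, following the column order in `classes`."""
    if len(classes) != len(probabilities):
        raise ValueError("classes and probabilities must have the same length")
    return {int(label): float(p) for label, p in zip(classes, probabilities)}


# Values copied from Example 1; with the real objects this would be
# label_probabilities(classifier.classes_, classifier.predict_proba(embedding)[0])
probs = label_probabilities([0, 1], [0.3444, 0.6556])
predicted = max(probs, key=probs.get)
print(predicted, probs[predicted])
```

Here the highest-probability label is `1` (Relevant) with probability 0.6556, matching the confidence reported for Example 1.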


## 3. Load Sample Data from Dataset

In [4]:
# Load the original dataset
data_path = Path('./data/data.json')
with open(data_path, 'r') as f:
    data = json.load(f)

print(f"✓ Loaded {len(data)} samples from dataset")

# Find examples
relevant_papers = [item for item in data if item['relevance'] == 1]
not_relevant_papers = [item for item in data if item['relevance'] == 0]

print("\nDataset composition:")
print(f"  Relevant papers: {len(relevant_papers)}")
print(f"  Not relevant papers: {len(not_relevant_papers)}")

✓ Loaded 300 samples from dataset

Dataset composition:
  Relevant papers: 53
  Not relevant papers: 247
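
The dataset is imbalanced, which affects how accuracy should be read. The sketch below works through two quantities from the counts above: the accuracy of always predicting "Not Relevant", and the weights that scikit-learn's `class_weight='balanced'` heuristic, `n_samples / (n_classes * count)`, would assign. Both function names are hypothetical, and the weights use the full-dataset counts purely for illustration; the trained classifier derived its weights from the 210 training samples, whose class counts are not shown in this notebook.

```python
def majority_baseline_accuracy(class_counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


def balanced_class_weights(class_counts):
    """Weights per scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count) for label, count in class_counts.items()}


counts = {0: 247, 1: 53}  # 0 = Not Relevant, 1 = Relevant
print(f"Majority baseline accuracy: {majority_baseline_accuracy(counts):.2%}")
for label, weight in balanced_class_weights(counts).items():
    print(f"Weight for class {label}: {weight:.3f}")
```

Always answering "Not Relevant" would score about 82% on this dataset, so accuracy figures here are best read against that baseline.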


## 4. Inference Examples

### Example 1: Relevant Paper (from dataset)

In [5]:
# Example 1: A paper relevant to NLP/AI research
example1 = relevant_papers[0]

result1 = predict_relevance(
    abstract=example1['abstract'],
    title=example1['title'],
    verbose=True
)


Title: Code Researcher: Deep Research Agent for Large Systems Code and Commit History
----------------------------------------------------------------------
Abstract (first 300 chars): Large Language Model (LLM)-based coding agents have shown promising results on coding benchmarks, but their effectiveness on systems code remains underexplored. Due to the size and complexities of systems code, making changes to a systems codebase is a daunting task, even for humans. It requires res...
Prediction: Relevant
Confidence: 0.6556
  - P(Not Relevant): 0.3444
  - P(Relevant): 0.6556


### Example 2: Not Relevant Paper (from dataset)

In [6]:
# Example 2: A paper NOT relevant to research interests
example2 = not_relevant_papers[0]

result2 = predict_relevance(
    abstract=example2['abstract'],
    title=example2['title'],
    verbose=True
)


Title: Chronoamperometry with Room-Temperature Ionic Liquids: Sub-Second Inference Techniques
----------------------------------------------------------------------
Abstract (first 300 chars): Chronoamperometry (CA) is a fundamental electrochemical technique used for quantifying redox-active species. However, in room-temperature ionic liquids (RTILs), the high viscosity and slow mass transport often lead to extended measurement durations. This paper presents a novel mathematical regressio...
Prediction: Not Relevant
Confidence: 0.7688
  - P(Not Relevant): 0.7688
  - P(Relevant): 0.2312


### Example 3: Another Relevant Paper (from dataset)

In [7]:
# Example 3: Another relevant paper
example3 = relevant_papers[1] if len(relevant_papers) > 1 else relevant_papers[0]

result3 = predict_relevance(
    abstract=example3['abstract'],
    title=example3['title'],
    verbose=True
)


Title: FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
----------------------------------------------------------------------
Abstract (first 300 chars): High-quality, large-scale audio captioning is crucial for advancing audio understanding, yet current automated methods often generate captions that lack fine-grained detail and contextual accuracy, primarily due to their reliance on limited unimodal or superficial multimodal information. Drawing ins...
Prediction: Relevant
Confidence: 0.6629
  - P(Not Relevant): 0.3371
  - P(Relevant): 0.6629


### Example 4: Custom Abstract - NLP Research

In [8]:
# Example 4: Custom abstract about NLP and transformers
custom_abstract_nlp = "This paper proposes a novel approach to improving language model efficiency by introducing a new attention mechanism that reduces computational complexity. We evaluate our method on standard benchmarks including GLUE and SQuAD, demonstrating consistent improvements over baselines. The proposed technique can be easily integrated into existing transformer architectures."

result4 = predict_relevance(
    abstract=custom_abstract_nlp,
    title="Efficient Attention Mechanisms for Large Language Models",
    verbose=True
)


Title: Efficient Attention Mechanisms for Large Language Models
----------------------------------------------------------------------
Abstract (first 300 chars): This paper proposes a novel approach to improving language model efficiency by introducing a new attention mechanism that reduces computational complexity. We evaluate our method on standard benchmarks including GLUE and SQuAD, demonstrating consistent improvements over baselines. The proposed techn...
Prediction: Relevant
Confidence: 0.6291
  - P(Not Relevant): 0.3709
  - P(Relevant): 0.6291


### Example 5: Custom Abstract - Different Domain

In [9]:
# Example 5: Custom abstract about chemistry (likely not relevant to ML research)
custom_abstract_chem = "We investigate the catalytic properties of transition metal complexes in organic synthesis. Using X-ray crystallography and spectroscopic analysis, we characterize novel intermediates formed during the reaction mechanism. Our findings suggest new pathways for selective C-C bond formation in pharmaceutical applications."

result5 = predict_relevance(
    abstract=custom_abstract_chem,
    title="Catalytic Properties of Transition Metal Complexes",
    verbose=True
)


Title: Catalytic Properties of Transition Metal Complexes
----------------------------------------------------------------------
Abstract (first 300 chars): We investigate the catalytic properties of transition metal complexes in organic synthesis. Using X-ray crystallography and spectroscopic analysis, we characterize novel intermediates formed during the reaction mechanism. Our findings suggest new pathways for selective C-C bond formation in pharmace...
Prediction: Not Relevant
Confidence: 0.8112
  - P(Not Relevant): 0.8112
  - P(Relevant): 0.1888


## 5. Batch Inference

In [10]:
# Run inference on the first 10 papers, one abstract per call
import pandas as pd

print("\nBatch Inference on First 10 Papers from Dataset:")
print("="*70)

batch_results = []
for i, paper in enumerate(data[:10]):
    result = predict_relevance(
        abstract=paper['abstract'],
        title=paper['title'],
        verbose=False
    )
    batch_results.append({
        'index': i,
        'title': paper['title'][:50] + '...',
        'prediction': result['relevance_label'],
        'confidence': f"{result['confidence']:.4f}",
        'actual': 'Relevant' if paper['relevance'] == 1 else 'Not Relevant',
        'correct': (result['prediction'] == paper['relevance'])
    })

# Print batch results
batch_df = pd.DataFrame(batch_results)
print(batch_df.to_string(index=False))

# Calculate accuracy on this batch
batch_accuracy = batch_df['correct'].sum() / len(batch_df)
print(f"\nBatch Accuracy: {batch_accuracy:.2%}")


Batch Inference on First 10 Papers from Dataset:
 index                                                 title   prediction confidence       actual  correct
     0 Chronoamperometry with Room-Temperature Ionic Liqu... Not Relevant     0.7688 Not Relevant     True
     1 Code Researcher: Deep Research Agent for Large Sys...     Relevant     0.6556     Relevant     True
     2 FusionAudio-1.2M: Towards Fine-grained Audio Capti...     Relevant     0.6629     Relevant     True
     3 Prithvi-EO-2.0: A Versatile Multi-Temporal Foundat... Not Relevant     0.7529 Not Relevant     True
     4 Exploiting Dialect Identification in Automatic Dia...     Relevant     0.5785 Not Relevant    False
     5 SPICED: Syntactical Bug and Trojan Pattern Identif... Not Relevant     0.6219 Not Relevant     True
     6 On the Effectiveness of LLMs for Manual Test Verif...     Relevant     0.7039 Not Relevant    False
     7            Efficient Curvature-aware Graph Network... Not Relevant     0.9025 Not Relev
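
The loop in this section encodes one abstract per call. `SentenceTransformer.encode` accepts a list of texts and `predict_proba` accepts a 2D array, so the same scores can be obtained in a single pass. The sketch below is one possible way to do that, not the notebook's method: `predict_relevance_batch` is a hypothetical function that takes the embedder and classifier as arguments and treats the highest-probability class as the prediction.

```python
def predict_relevance_batch(abstracts, embedder, clf):
    """Score many abstracts with one encode call and one predict_proba call.

    `embedder` needs an `encode(list_of_str)` method and `clf` needs
    `predict_proba` and `classes_`, as SentenceTransformer and scikit-learn
    classifiers provide.
    """
    embeddings = embedder.encode(list(abstracts))
    probabilities = clf.predict_proba(embeddings)
    classes = [int(c) for c in clf.classes_]
    results = []
    for row in probabilities:
        row = [float(p) for p in row]
        # Index of the largest probability in this row
        best = max(range(len(row)), key=row.__getitem__)
        results.append({
            "prediction": classes[best],
            "confidence": row[best],
            "probabilities": dict(zip(classes, row)),
        })
    return results


# Usage with the objects loaded earlier in this notebook:
# results = predict_relevance_batch(
#     [paper["abstract"] for paper in data[:10]], embedding_model, classifier
# )
```

Passing the models in as parameters also makes the function easy to exercise with small stand-in objects, without loading the real embedding model.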

## 6. Results Summary

In [11]:
print("\n" + "="*70)
print("INFERENCE RESULTS SUMMARY")
print("="*70)

print("\nExample 1 - Relevant Paper:")
print(f"  Prediction: {result1['relevance_label']} (confidence: {result1['confidence']:.4f})")

print("\nExample 2 - Not Relevant Paper:")
print(f"  Prediction: {result2['relevance_label']} (confidence: {result2['confidence']:.4f})")

print("\nExample 3 - Another Relevant Paper:")
print(f"  Prediction: {result3['relevance_label']} (confidence: {result3['confidence']:.4f})")

print("\nExample 4 - Custom NLP Abstract:")
print(f"  Prediction: {result4['relevance_label']} (confidence: {result4['confidence']:.4f})")

print("\nExample 5 - Custom Chemistry Abstract:")
print(f"  Prediction: {result5['relevance_label']} (confidence: {result5['confidence']:.4f})")

print("\nBatch Results (First 10 Papers):")
print(f"  Batch Accuracy: {batch_accuracy:.2%}")

print("\n" + "="*70)
print("Model successfully demonstrates inference capabilities!")
print("="*70)


INFERENCE RESULTS SUMMARY

Example 1 - Relevant Paper:
  Prediction: Relevant (confidence: 0.6556)

Example 2 - Not Relevant Paper:
  Prediction: Not Relevant (confidence: 0.7688)

Example 3 - Another Relevant Paper:
  Prediction: Relevant (confidence: 0.6629)

Example 4 - Custom NLP Abstract:
  Prediction: Relevant (confidence: 0.6291)

Example 5 - Custom Chemistry Abstract:
  Prediction: Not Relevant (confidence: 0.8112)

Batch Results (First 10 Papers):
  Batch Accuracy: 80.00%

Model successfully demonstrates inference capabilities!
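
Because only 53 of the 300 papers are relevant, accuracy alone can hide how the model treats the minority class. As a rough illustration, the sketch below computes precision and recall for the "Relevant" label from the seven rows that are fully visible in the batch table in Section 5 (indices 0 to 6). `precision_recall` is a hypothetical helper, and seven rows are far too few to characterize the model.

```python
def precision_recall(predicted, actual, positive=1):
    """Return (precision, recall) for the `positive` label; None where undefined."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Rows 0 to 6 of the batch table (1 = Relevant, 0 = Not Relevant)
predicted = [0, 1, 1, 0, 1, 0, 1]
actual = [0, 1, 1, 0, 0, 0, 0]
precision, recall = precision_recall(predicted, actual)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

On these rows both mistakes are false positives; a sample this small cannot say whether that pattern holds across the held-out test set.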
