# Sentence Transformers (Sentence-BERT)
## Objective

Generate dense, semantically meaningful sentence and document embeddings using Sentence-BERT (SBERT), enabling:

- Semantic similarity

- Clustering and retrieval

- High-quality downstream ML features

This notebook treats embeddings as general-purpose semantic signals.

## Why Sentence Transformers Matter

Earlier methods:

- BoW / TF-IDF → sparse, lexical

- Word2Vec / GloVe → static, word-level

- Doc2Vec → weak contextual modeling

Sentence Transformers:

- Encode full-sentence context

- Handle polysemy and word order

- Work well out of the box

## Imports and Setup

```python
import numpy as np
import pandas as pd

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
```


Models are downloaded automatically on first use.

# Load a Pre-Trained SBERT Model
## Recommended Baseline

```python
model = SentenceTransformer("all-MiniLM-L6-v2")
```

## Why this model?

- Strong semantic performance

- Lightweight (6 transformer layers, ~22M parameters) with compact 384-dimensional embeddings

- Fast inference

- Good production default

# Example Sentences

```python
sentences = [
    "This model works very well",
    "The system shows excellent performance",
    "Terrible results and poor accuracy",
    "The predictions are bad and unreliable"
]
```

# Generate Sentence Embeddings

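```python
# normalize_embeddings=True scales every row to unit L2 norm
embeddings = model.encode(
    sentences,
    convert_to_numpy=True,
    normalize_embeddings=True
)

embeddings.shape  # (n_sentences, embedding_dim)
```

```text
(4, 384)
```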

# Semantic Similarity

```python
cosine_similarity(embeddings)
```

```text
array([[0.9999999 , 0.5054958 , 0.2634276 , 0.13862145],
       [0.5054958 , 1.        , 0.40614137, 0.2123204 ],
       [0.2634276 , 0.40614137, 0.9999996 , 0.47412014],
       [0.13862145, 0.2123204 , 0.47412014, 0.99999994]], dtype=float32)
```
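To read a similarity matrix programmatically, a small helper can list each unordered pair once, ranked by score. This is a minimal numpy sketch: `ranked_pairs` is a hypothetical helper, and the matrix is copied from the output above rather than recomputed with the model.

```python
import numpy as np

def ranked_pairs(sim: np.ndarray) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every unordered pair i < j, highest score first."""
    i, j = np.triu_indices(sim.shape[0], k=1)  # upper triangle, diagonal excluded
    pair_scores = sim[i, j]
    order = np.argsort(-pair_scores)
    return [(int(i[k]), int(j[k]), float(pair_scores[k])) for k in order]

# Similarity matrix copied from the cosine_similarity output above
sim = np.array([
    [0.9999999, 0.5054958, 0.2634276, 0.13862145],
    [0.5054958, 1.0, 0.40614137, 0.2123204],
    [0.2634276, 0.40614137, 0.9999996, 0.47412014],
    [0.13862145, 0.2123204, 0.47412014, 0.99999994],
])

for a, b, s in ranked_pairs(sim)[:3]:
    print(a, b, round(s, 3))
# prints:
# 0 1 0.505
# 2 3 0.474
# 1 2 0.406
```

Excluding the diagonal matters because every sentence trivially scores about 1.0 against itself.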

## Interpret Similarity Matrix

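- Higher score → closer meaning: the two positive sentences (0 and 1) score 0.51, the two negative sentences (2 and 3) score 0.47, and the most distant pair (0 and 3) scores 0.14

- Robust to synonym choice: sentences 0 and 1 share no content words, yet they form the closest pair

- Context-aware, but topic matters as well as polarity: "The system shows excellent performance" and "Terrible results and poor accuracy" still score 0.41

- Diagonal values such as 0.9999999 are float32 rounding of 1.0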

## Sentence Retrieval Example

```python
query = "poor model performance"

# Encode the query exactly like the corpus so the scores are comparable
query_embedding = model.encode(
    query,
    convert_to_numpy=True,
    normalize_embeddings=True
)

scores = cosine_similarity(
    query_embedding.reshape(1, -1),
    embeddings
)[0]

results = pd.DataFrame({
    "sentence": sentences,
    "similarity": scores
}).sort_values("similarity", ascending=False)

results
```


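|   | sentence | similarity |
|---|---|---|
| 2 | Terrible results and poor accuracy | 0.508541 |
| 1 | The system shows excellent performance | 0.438835 |
| 0 | This model works very well | 0.402197 |
| 3 | The predictions are bad and unreliable | 0.329885 |

The top hit is the expected one, but "The system shows excellent performance" outranks the negative "The predictions are bad and unreliable". The query shares the word "performance" with the former, which suggests the ranking reflects topic and wording at least as much as polarity.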
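Because both the query and the corpus are unit-normalized above, a plain dot product gives the same ranking as `cosine_similarity`, and `np.argpartition` avoids a full sort when only the top few hits matter. The sketch below assumes normalized inputs; `top_k` is a hypothetical helper, and the 2-D toy vectors stand in for real embeddings. For real workloads, `sentence_transformers.util.semantic_search` offers a batched version of the same idea.

```python
import numpy as np

def top_k(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (corpus index, score) pairs for the k highest-scoring rows.

    Assumes query_emb (shape (dim,)) and corpus_embs (shape (n, dim)) are already
    L2-normalized, so the dot product equals cosine similarity.
    """
    sims = corpus_embs @ query_emb              # one score per corpus row
    k = min(k, sims.shape[0])
    if k <= 0:
        return []
    idx = np.argpartition(-sims, k - 1)[:k]     # top-k indices, unordered, O(n)
    idx = idx[np.argsort(-sims[idx])]           # sort only the k winners
    return [(int(i), float(sims[i])) for i in idx]

# Toy unit vectors standing in for model.encode(..., normalize_embeddings=True)
toy_corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
toy_query = np.array([0.8, 0.6])
print(top_k(toy_query, toy_corpus, k=2))  # index 2 first (score ~0.96), then index 0 (0.8)
```

With the notebook's data, `top_k(query_embedding, embeddings, k=2)` should return the same two sentences as the first two rows of `results`.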


# Document-Level Embeddings

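SBERT can encode:

- Sentences
- Paragraphs
- Short documents

Inputs longer than the model's `max_seq_length` (256 word pieces for `all-MiniLM-L6-v2`) are truncated, so text beyond that limit does not influence the embedding.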

```python
documents = [
    "Clean text improves machine learning models.",
    "Tree-based models often struggle with sparse NLP features.",
    "Transformers capture semantic meaning effectively."
]

doc_embeddings = model.encode(
    documents,
    convert_to_numpy=True,
    normalize_embeddings=True
)
```
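Since the model truncates anything past its `max_seq_length`, one common workaround for longer documents is to split them into overlapping chunks, encode each chunk, and average the chunk vectors. This is a heuristic, not something SBERT does internally. The sketch below rests on several assumptions: `chunk_words` and `embed_long_document` are hypothetical helpers, chunking is by whitespace words rather than model tokens, and the 150-word default is a conservative guess that should be checked against `model.max_seq_length`.

```python
import numpy as np
from typing import Callable, Sequence

def chunk_words(text: str, max_words: int = 150, overlap: int = 30) -> list[str]:
    """Split text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("need 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return [""]
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this window already reaches the end
            break
    return chunks

def embed_long_document(
    text: str,
    encode: Callable[[Sequence[str]], np.ndarray],
    max_words: int = 150,
    overlap: int = 30,
) -> np.ndarray:
    """Mean-pool the chunk embeddings and re-normalize the result to unit length."""
    chunk_embs = np.asarray(encode(chunk_words(text, max_words, overlap)))  # (n_chunks, dim)
    doc = chunk_embs.mean(axis=0)
    norm = np.linalg.norm(doc)
    return doc / norm if norm > 0 else doc

# With the notebook's model (not executed here):
# doc_vec = embed_long_document(long_text, lambda c: model.encode(c, normalize_embeddings=True))
```

Passing the encoder in as a callable keeps the helper independent of any particular model and easy to test with a stand-in.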


## Clustering Example

```python
from sklearn.cluster import KMeans

# n_init is set explicitly because scikit-learn changed its default in version 1.4
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
clusters = kmeans.fit_predict(doc_embeddings)

pd.DataFrame({
    "document": documents,
    "cluster": clusters
})
```


|   | document | cluster |
|---|---|---|
| 0 | Clean text improves machine learning models. | 0 |
| 1 | Tree-based models often struggle with sparse N... | 0 |
| 2 | Transformers capture semantic meaning effectiv... | 1 |
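With three documents, `n_clusters=2` is a hand-picked toy setting. On a real corpus, one common way to choose k is to compare silhouette scores computed with cosine distance. A minimal sketch follows; `pick_k` is a hypothetical helper, and the data is synthetic (three tight groups of unit vectors) rather than real SBERT output.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k(embs: np.ndarray, k_values, seed: int = 42):
    """Fit KMeans for each k and return (best_k, {k: cosine silhouette score})."""
    results_by_k = {}
    for k in k_values:  # silhouette needs 2 <= k <= n_samples - 1
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embs)
        results_by_k[k] = float(silhouette_score(embs, labels, metric="cosine"))
    best = max(results_by_k, key=results_by_k.get)
    return best, results_by_k

# Synthetic stand-in for normalized embeddings: 3 groups of 20 points in 16 dims
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16))
X_demo = np.vstack([c + 0.05 * rng.normal(size=(20, 16)) for c in centers])
X_demo /= np.linalg.norm(X_demo, axis=1, keepdims=True)

best_k, sil_scores = pick_k(X_demo, range(2, 6))
print(best_k, {k: round(s, 3) for k, s in sil_scores.items()})
```

Silhouette is only one heuristic; on real embeddings the curve is often flat, so it is worth inspecting a few cluster members by hand.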


# Using SBERT Embeddings for Classification
## Feature Matrix

```python
X = embeddings
y = np.array([1, 1, 0, 0])  # 1 = positive sentence, 0 = negative sentence
```

## Train a Classifier

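```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X, y)

clf.score(X, y)  # accuracy on the training data itself
```

```text
1.0
```

This is training accuracy on the same four sentences the model was fit on, so it only shows that the embeddings separate these examples; it is not an estimate of performance on new text.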
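A held-out estimate needs more labeled data than the four sentences here: stratified folds require at least as many examples per class as there are folds. The sketch below shows the shape of a cross-validated evaluation; `cv_accuracy` is a hypothetical helper, and the synthetic Gaussian features merely stand in for SBERT embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def cv_accuracy(features: np.ndarray, labels: np.ndarray, n_splits: int = 5, seed: int = 42) -> float:
    """Mean held-out accuracy of logistic regression over stratified folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    clf_cv = LogisticRegression(max_iter=1000)
    return float(cross_val_score(clf_cv, features, labels, cv=cv, scoring="accuracy").mean())

# Synthetic stand-in for embeddings: two shifted Gaussian classes, 40 examples each
rng = np.random.default_rng(0)
X_syn = np.vstack([rng.normal(0.5, 1.0, size=(40, 32)), rng.normal(-0.5, 1.0, size=(40, 32))])
y_syn = np.array([1] * 40 + [0] * 40)
print(cv_accuracy(X_syn, y_syn))
```

Given a larger labeled set encoded the same way, calling `cv_accuracy` on those embeddings gives a held-out estimate in place of training accuracy.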

# Why Normalize Embeddings?

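- Cosine similarity itself does not require unit vectors, but once vectors have unit length it reduces to a plain dot product, which is cheaper and is what many vector indexes compute
- Keeps feature scale consistent for downstream models, since every embedding has the same norm
- Improves clustering behavior: k-means uses Euclidean distance, and for unit vectors the squared Euclidean distance equals 2 − 2 × cosine similarity, so clusters follow semantic similarity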
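For unit vectors a and b, ||a − b||² = ||a||² + ||b||² − 2·a·b = 2 − 2·cos(a, b), so Euclidean distance and cosine similarity produce the same rankings. The check below confirms both identities numerically on random vectors, which stand in for real embeddings rather than coming from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 384))                          # stand-in for unnormalized embeddings
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)  # unit rows, as normalize_embeddings=True gives

norms = np.linalg.norm(raw, axis=1)
cosine = (raw @ raw.T) / np.outer(norms, norms)          # cosine computed on the raw vectors
dot = unit @ unit.T                                      # plain dot product on unit vectors
sq_dist = ((unit[:, None, :] - unit[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean distances

print(np.allclose(dot, cosine))              # True: dot product == cosine similarity
print(np.allclose(sq_dist, 2 - 2 * cosine))  # True: ||a - b||^2 == 2 - 2 cos(a, b)
```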
# Performance and Cost Considerations
| Aspect            | SBERT           |
| ----------------- | --------------- |
| Quality           | Very high       |
| Inference speed   | Medium          |
| Memory            | Moderate        |
| Training required | None (baseline) |
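Encoding is the dominant cost, so in pipelines that see the same texts repeatedly, caching vectors by input string avoids running the model twice on identical input. The class below is a hypothetical sketch (`EmbeddingCache` is not part of sentence-transformers) that keeps everything in an in-memory dict; a production version would likely bound its size or persist vectors to disk. At 384 float32 dimensions, each cached vector takes 1,536 bytes.

```python
import numpy as np
from typing import Callable, Sequence

class EmbeddingCache:
    """Memoize embeddings by exact text so repeated inputs are encoded only once."""

    def __init__(self, encode: Callable[[Sequence[str]], np.ndarray]):
        self._encode = encode
        self._store: dict[str, np.ndarray] = {}

    def get(self, texts: Sequence[str]) -> np.ndarray:
        if len(texts) == 0:
            raise ValueError("texts must be non-empty")
        # dict.fromkeys de-duplicates while keeping first-seen order
        missing = [t for t in dict.fromkeys(texts) if t not in self._store]
        if missing:
            vectors = np.asarray(self._encode(missing))  # one batched call for all new texts
            for text, vec in zip(missing, vectors):
                self._store[text] = vec
        return np.stack([self._store[t] for t in texts])

# With the notebook's model (not executed here):
# cache = EmbeddingCache(lambda batch: model.encode(batch, normalize_embeddings=True))
# vectors = cache.get(sentences)
```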


# When Sentence Transformers Are the Best Choice

- Semantic similarity
- Search / retrieval
- Clustering
- Low-label regimes, where pre-trained embeddings need only a small labeled set on top
- Production NLP that needs a strong default without training a model
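In low-label settings, a handful of labeled examples per class can go a long way when the embeddings do most of the work. One simple option is nearest-centroid classification: average the embeddings of each class and assign new texts to the centroid with the highest cosine score. This is a minimal sketch with hypothetical helper names, toy 2-D vectors in place of real embeddings, and the assumption that inputs are unit-normalized; scikit-learn's `NearestCentroid` offers a comparable estimator.

```python
import numpy as np

def fit_centroids(embs: np.ndarray, labels: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Average the embeddings of each class, then re-normalize each centroid."""
    classes = np.unique(labels)
    cents = np.stack([embs[labels == c].mean(axis=0) for c in classes])
    cents /= np.linalg.norm(cents, axis=1, keepdims=True)
    return classes, cents

def predict_centroid(embs: np.ndarray, classes: np.ndarray, cents: np.ndarray) -> np.ndarray:
    """Assign each (unit) embedding to the class whose centroid scores highest."""
    return classes[np.argmax(embs @ cents.T, axis=1)]

# Toy unit vectors: class 1 points roughly along +x, class 0 roughly along +y
toy_embs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-0.6, 0.8]])
toy_labels = np.array([1, 1, 0, 0])
classes, cents = fit_centroids(toy_embs, toy_labels)
print(predict_centroid(np.array([[0.6, 0.8], [-0.8, 0.6]]), classes, cents))  # [1 0]
```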

# When They May Be Overkill or a Poor Fit

- Simple keyword tasks, where TF-IDF or exact matching is cheaper and often sufficient
- Extreme latency constraints
- Highly domain-specific jargon (without fine-tuning)

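# Common Mistakes

- Treating SBERT as a classifier: it produces features, and a classifier still has to be trained on top
- Forgetting to normalize before dot-product search or Euclidean k-means
- Using large models when a small one such as `all-MiniLM-L6-v2` is good enough
- Fine-tuning without enough labeled data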

# Key Takeaways

- Sentence Transformers are a strong modern default for general-purpose text embeddings
- They produce high-quality semantic embeddings
- Minimal preprocessing required
- Excellent balance of performance and usability