# Sentence Transformers - Basic Example 🚀

A simple tutorial to get started with Sentence Transformers in JupyterLab.

## What we'll do:
1. Install and import libraries
2. Load a pre-trained model
3. Encode sentences into embeddings
4. Calculate similarity between sentences
5. See the results!


## 1. Setup and Imports


In [1]:
# Install sentence-transformers if not already installed
# !pip install sentence-transformers

from sentence_transformers import SentenceTransformer, util
import numpy as np

print("✅ Libraries imported successfully!")


✅ Libraries imported successfully!


## 2. Load Pre-trained Model


In [2]:
# Load a small, fast model (downloaded on first use, then cached locally)
model = SentenceTransformer('all-MiniLM-L6-v2')
print(f"✅ Model loaded! Embedding dimension: {model.get_sentence_embedding_dimension()}")


✅ Model loaded! Embedding dimension: 384


## 3. Basic Example - Encode Sentences


In [3]:
# Example sentences
sentences = [
    "I love programming",
    "I enjoy coding", 
    "The weather is nice today",
    "It's a beautiful sunny day",
    "Python is great for machine learning"
]

# Convert sentences to embeddings (vectors)
embeddings = model.encode(sentences)

print(f"📊 Encoded {len(sentences)} sentences")
print(f"🔢 Each sentence becomes a vector of {len(embeddings[0])} numbers")
print(f"\n📝 Example - First sentence: '{sentences[0]}'")
print(f"🎯 First 5 numbers of its embedding: {embeddings[0][:5]}")


📊 Encoded 5 sentences
🔢 Each sentence becomes a vector of 384 numbers

📝 Example - First sentence: 'I love programming'
🎯 First 5 numbers of its embedding: [-0.03617874 -0.01277374  0.00300631 -0.01690345  0.00948433]


## 4. Calculate Sentence Similarities
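
Cosine similarity measures the angle between two vectors: the dot product divided by the product of their lengths. The sketch below is a plain-Python illustration of that formula on toy 3-dimensional vectors, not the library's implementation; the function name `cosine_similarity` and the example vectors are made up for this illustration. `util.cos_sim` computes the same quantity for every pair of embeddings at once.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical); real embeddings in this notebook have 384 dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = cosine_similarity([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])

print(f"same direction: {same_direction:.3f}")
print(f"orthogonal:     {orthogonal:.3f}")
print(f"opposite:       {opposite:.3f}")
```

Vectors pointing the same way score close to 1, unrelated (orthogonal) vectors score 0, and opposite vectors score close to -1. Vector length does not matter, only direction.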


In [4]:
# Calculate cosine similarity between every pair of sentences.
# Scores range from -1.0 to 1.0; higher means closer in meaning.
similarity_matrix = util.cos_sim(embeddings, embeddings)

print("🔍 Similarity Results (close to 1.0 = very similar, near 0.0 = unrelated):\n")

# Show similarities between sentence pairs (each pair once, skipping self-comparisons)
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        similarity = similarity_matrix[i][j].item()
        print(f"📊 Similarity: {similarity:.3f}")
        print(f"   📝 '{sentences[i]}'")
        print(f"   📝 '{sentences[j]}'\n")


🔍 Similarity Results (close to 1.0 = very similar, near 0.0 = unrelated):

📊 Similarity: 0.817
   📝 'I love programming'
   📝 'I enjoy coding'

📊 Similarity: 0.109
   📝 'I love programming'
   📝 'The weather is nice today'

📊 Similarity: 0.206
   📝 'I love programming'
   📝 'It's a beautiful sunny day'

📊 Similarity: 0.417
   📝 'I love programming'
   📝 'Python is great for machine learning'

📊 Similarity: 0.110
   📝 'I enjoy coding'
   📝 'The weather is nice today'

📊 Similarity: 0.168
   📝 'I enjoy coding'
   📝 'It's a beautiful sunny day'

📊 Similarity: 0.295
   📝 'I enjoy coding'
   📝 'Python is great for machine learning'

📊 Similarity: 0.670
   📝 'The weather is nice today'
   📝 'It's a beautiful sunny day'

📊 Similarity: 0.072
   📝 'The weather is nice today'
   📝 'Python is great for machine learning'

📊 Similarity: 0.079
   📝 'It's a beautiful sunny day'
   📝 'Python is great for machine learning'
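
With ten pairs printed in index order, the pattern is easier to see once the pairs are sorted by score. The sketch below does that with the scores copied from the output above into a dictionary named `pair_scores` (a name chosen for this illustration). In the notebook itself you could fill the dictionary from `similarity_matrix[i][j].item()` rather than typing the numbers in.

```python
sentences = [
    "I love programming",
    "I enjoy coding",
    "The weather is nice today",
    "It's a beautiful sunny day",
    "Python is great for machine learning",
]

# Scores copied from the output above, keyed by (i, j) indices into `sentences`.
pair_scores = {
    (0, 1): 0.817,
    (0, 2): 0.109,
    (0, 3): 0.206,
    (0, 4): 0.417,
    (1, 2): 0.110,
    (1, 3): 0.168,
    (1, 4): 0.295,
    (2, 3): 0.670,
    (2, 4): 0.072,
    (3, 4): 0.079,
}

# Sort pairs from most to least similar.
ranked = sorted(pair_scores.items(), key=lambda item: item[1], reverse=True)

# Show the three closest pairs.
for (i, j), score in ranked[:3]:
    print(f"{score:.3f}  '{sentences[i]}'  <->  '{sentences[j]}'")
```

The two paraphrase pairs (programming/coding and the two weather sentences) come out on top, followed by the programming and Python pair, which are related in topic but not paraphrases.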



## 5. Try Your Own Example!


In [5]:
# Try comparing your own sentences! Change these:
sentence1 = "I like artificial intelligence"
sentence2 = "Machine learning is fascinating"

# Encode both sentences
emb1 = model.encode([sentence1])
emb2 = model.encode([sentence2])

# Calculate similarity
similarity = util.cos_sim(emb1, emb2)[0][0].item()

print("🎯 Comparing your sentences:")
print(f"📝 Sentence 1: '{sentence1}'")
print(f"📝 Sentence 2: '{sentence2}'")
print(f"📊 Similarity: {similarity:.3f}")

# Interpretation (rough, illustrative thresholds; tune them for your own data and model)
if similarity > 0.5:
    print("✅ These sentences are quite similar!")
elif similarity > 0.3:
    print("🤔 These sentences have some similarity")
else:
    print("❌ These sentences are quite different")


🎯 Comparing your sentences:
📝 Sentence 1: 'I like artificial intelligence'
📝 Sentence 2: 'Machine learning is fascinating'
📊 Similarity: 0.546
✅ These sentences are quite similar!


## 🎉 Congratulations!

You've successfully:
- ✅ Loaded a sentence transformer model
- ✅ Converted sentences into numerical embeddings  
- ✅ Calculated semantic similarity between sentences
- ✅ Seen that sentences with similar meaning score high even when they share few words!

**Next steps you could try:**
- Change the sentences in the examples above
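- Try different languages (use a multilingual model such as `paraphrase-multilingual-MiniLM-L12-v2`, since `all-MiniLM-L6-v2` is trained mainly on English text)
- Use different models like `all-mpnet-base-v2` for higher-quality embeddings at the cost of speed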
- Build a simple search system
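
As a starting point for the search idea, here is a minimal, dependency-free sketch: score every document vector against a query vector and return the best matches. The `search` function, the document labels, and the 3-dimensional vectors are hypothetical stand-ins. In the notebook you would pass the output of `model.encode(...)` instead, and the library's `util.semantic_search` helper is likely a better fit for larger collections.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, corpus_vectors, corpus_texts, top_k=2):
    """Return the top_k (score, text) pairs, best match first."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for vector, text in zip(corpus_vectors, corpus_texts)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: 3-dimensional stand-ins for real 384-dimensional embeddings.
corpus_texts = ["doc about coding", "doc about weather", "doc about cooking"]
corpus_vectors = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
]
query_vector = [0.8, 0.2, 0.1]

results = search(query_vector, corpus_vectors, corpus_texts, top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Scoring every document on each query is fine for a few thousand sentences. Beyond that, encoding the corpus once and using a batched or approximate nearest-neighbor search is the usual next step.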
