# 🔥 Sentence Similarity Checker

[notebook link](https://colab.research.google.com/drive/1tYrjzeQtzxR0BjTmVBSfmh5L6pIN4NWK#scrollTo=gpSR8U-m3Zg5)

Type two sentences, and this tool tells you how close their meanings are, like a vibe check for your text.

It turns your sentences into embeddings (lists of numbers that capture meaning), then uses cosine similarity to score how alike they are, from 0% (nah, different vibes) to 100% (same energy).

You’ll get a color-coded score with:

* 🟢 Green = They’re basically twins (score above 75%)
* 🟡 Yellow = Kinda similar, meh (score from 40% to 75%)
* 🔴 Red = Nope, totally different (score below 40%)
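
If you're curious what the score actually measures, here is a minimal sketch of cosine similarity in plain Python. The `cosine_similarity` helper and the tiny two-number vectors are made up for illustration; the notebook itself relies on `util.cos_sim` from `sentence-transformers`, and real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (hypothetical helper for illustration)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for this example
toy_a = [1.0, 0.0]
toy_b = [1.0, 1.0]

score = cosine_similarity(toy_a, toy_b) * 100  # convert to a percentage
print(f"{score:.2f}%")  # prints 70.71%
```

Vectors pointing the same way score near 100%, unrelated ones land near 0%, and opposite directions go negative.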

Try it out below — don’t stress, just two sentences at a time!


In [8]:
# Step 1: Install the model library
!pip install -U sentence-transformers




In [9]:
from sentence_transformers import SentenceTransformer, util

# Step 2: Load a pre-trained model for sentence embeddings
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')




In [14]:
# Step 3: Clear Explanation + Input 2 sentences ONLY

print("""
👋 Welcome to Sentence Similarity Checker!

You will enter **exactly 2 sentences**.

The program will compute how similar their meanings are using a pretrained BERT model.

Similarity score is between 0% (completely different) and 100% (identical).

Based on the score, you'll see a colored icon:

✅ Green  = Very similar (score > 75%)
⚠️ Yellow = Somewhat similar (40% - 75%)
❌ Red    = Not similar (score < 40%)

Let's get started!
""")

# Get exactly 2 sentences from user
sentence1 = input("Enter Sentence 1: ")
sentence2 = input("Enter Sentence 2: ")



Enter Sentence 1: love
Enter Sentence 2: heat


In [17]:
# Step 4: Embed and Calculate Cosine Similarity

emb1 = model.encode([sentence1])
emb2 = model.encode([sentence2])

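# Cosine similarity lies in [-1, 1]; clamp negatives to 0 so the score stays within 0-100%
cos_val = max(0.0, util.cos_sim(emb1, emb2)[0][0].item()) * 100  # extract scalar and convert to %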


In [18]:
# Step 5: Show colored output based on thresholds

# ANSI escape codes for colors (works in Colab output)
GREEN = '\033[92m'
YELLOW = '\033[93m'
RED = '\033[91m'
RESET = '\033[0m'

# Decide icon + color by score
if cos_val > 75:
    icon = '✅'
    color = GREEN
elif cos_val >= 40:
    icon = '⚠️'
    color = YELLOW
else:
    icon = '❌'
    color = RED

# Print result with colors and icon
print(f"\nSimilarity Score: {color}{cos_val:.2f}% {icon}{RESET}")
print(f"Sentence 1: {sentence1}")
print(f"Sentence 2: {sentence2}")



Similarity Score: [93m42.48% ⚠️[0m
Sentence 1: love
Sentence 2: heat
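
To make the cutoffs easier to reuse and check, the threshold logic above could be wrapped in a small function. This is only a sketch: `classify_score` is a hypothetical name, not something defined in the notebook, and it mirrors the same `> 75` and `>= 40` comparisons used in Step 5.

```python
def classify_score(score_percent):
    """Map a similarity percentage to a color label (hypothetical helper)."""
    if score_percent > 75:
        return "green"
    elif score_percent >= 40:
        return "yellow"
    return "red"

# Check the boundary values: exactly 75 and exactly 40 both count as yellow
for s in (75.01, 75, 42.48, 40, 39.99):
    print(s, classify_score(s))
```

Note that a score of exactly 75% is yellow, not green, because the green check is a strict "greater than".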


## 🧠 Use Case Ideas

* 🔍 **Plagiarism Checker**: Compare student answers.
* 💬 **Duplicate Detection**: Check if two forum posts are too similar.
* 🤖 **AI-powered FAQ matcher**: Match user queries to stored questions.
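
As a rough idea of how the FAQ matcher could work, the sketch below picks the stored question whose vector is closest to the query. Everything here is an assumption for illustration: the `best_match` and `cosine_similarity` helpers, the questions, and the three-number vectors are invented stand-ins for what `model.encode` would return.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query_vec, stored):
    """Return (question, score) for the stored entry most similar to query_vec."""
    scored = ((q, cosine_similarity(query_vec, v)) for q, v in stored.items())
    return max(scored, key=lambda pair: pair[1])

# Made-up vectors standing in for real sentence embeddings
faq_vectors = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "What are your opening hours?": [0.0, 0.2, 0.9],
    "How can I cancel my order?": [0.1, 0.9, 0.1],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "I forgot my password"

question, score = best_match(query_vec, faq_vectors)
print(f"Best match: {question} ({score * 100:.2f}%)")
```

In a real setup you would encode the stored questions once, keep the vectors around, and only encode each incoming query. The same comparison with a high cutoff would also cover the duplicate-detection idea.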
