# Finding Similar Questions in a MySQL Database Using Sentence Transformers


## Introduction:
This notebook demonstrates how to use a Sentence Transformer model to find similar questions in a MySQL database.
The model encodes each question into an embedding (a fixed-length numeric vector), and cosine similarity then measures how close the embeddings of each pair of questions are.
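
To make the cosine similarity step concrete, here is a minimal sketch in plain Python. The helper name `cosine_similarity` and the three-dimensional toy vectors are assumptions for illustration only; the notebook itself works with real model embeddings and the library's own utility function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (illustrative values, not model output)
q1 = [1.0, 2.0, 3.0]
q2 = [2.0, 4.0, 6.0]    # points in the same direction as q1
q3 = [3.0, 0.0, -1.0]   # orthogonal to q1 (dot product is 0)

print(cosine_similarity(q1, q2))  # very close to 1: same direction
print(cosine_similarity(q1, q3))  # 0: no directional overlap
```

The score depends only on the direction of the vectors, not their length, which is why `q2` (a scaled copy of `q1`) scores as essentially identical.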

## Purpose:
A similarity check for questions helps with:

1. Detecting similar or duplicate questions in question banks or assessments

2. Ensuring fairness

3. Maintaining question diversity

4. Enhancing the quality of exams

## For Example:
Let's consider a real-world example in the context of question generation and exam assessment:

Imagine you are an educator or an assessment platform that creates exams with multiple-choice questions. Over time, as you create new exams or update existing ones, there is a chance of inadvertently generating similar questions or using questions from previous exams with slight modifications.

Here's how the similarity check can be valuable in this scenario:

1. Duplicate Question Detection:
Identify pairs of questions that have similar content or are nearly identical.

2. Quality Assurance:
Review questions that might be too similar to each other, which helps you maintain the quality and diversity of questions in an exam. This helps ensure that the assessment evaluates different aspects of students' knowledge and understanding.

3. Question Pool Maintenance:
As you accumulate a large question pool over time, the similarity check helps you organize and manage the questions efficiently. You can group similar questions together, making it easier to select questions for future exams.

This approach helps educators and assessment platforms improve their question banks and assessments over time.
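
The idea of grouping similar questions together can be sketched as follows. This is a minimal illustration, assuming the similar pairs are available as `(id_a, id_b)` tuples; the function name `group_similar` and the example IDs are hypothetical, and the union-find approach is one possible choice rather than something the notebook prescribes.

```python
def group_similar(pairs):
    """Merge (id_a, id_b) pairs into groups of transitively linked IDs."""
    parent = {}

    def find(x):
        # Follow parent links to the group's representative
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we go
            x = parent[x]
        return x

    # Link the two groups of every similar pair
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their representative
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical question numbers flagged as similar
example_pairs = [(1, 4), (4, 7), (2, 9)]
print(group_similar(example_pairs))  # [[1, 4, 7], [2, 9]]
```

Note that grouping is transitive: questions 1 and 7 land in the same group because both are similar to question 4, even if their direct similarity falls below the threshold.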

## Setup
First, we'll import the necessary libraries and load the Sentence Transformer model.


In [1]:
import mysql.connector
from sentence_transformers import SentenceTransformer, util



In [2]:
# Load pre-trained sentence transformer model
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

## Data Connection
To begin, let's connect to the MySQL database and fetch the questions.

In [3]:
questions = []  # stays empty if the connection or query fails

try:
    # Connect to the MySQL database
    connection = mysql.connector.connect(
        host="localhost",
        user="root",
        password="your_password",  # placeholder: supply your own credentials
        database="test"
    )

    if connection.is_connected():
        print("Connected to the MySQL database!")

    # Fetch all questions and their question numbers from the database
    cursor = connection.cursor()
    cursor.execute("SELECT `Question No`, `Question` FROM `mcq_questions`")
    questions = cursor.fetchall()
    cursor.close()

except mysql.connector.Error as error:
    print("Error while connecting to the MySQL database:", error)


Connected to the MySQL database!


## Similarity Check
Now, let's compare every pair of questions and store the question numbers and similarity score of each pair whose score exceeds the threshold.

In [11]:
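# Perform similarity check and store similar question pairs with similarity scores
similarity_threshold = 0.8
similar_question_pairs = []
question_ids = [qid for qid, _ in questions]
question_texts = [q for _, q in questions]

if question_texts:
    # Encode every question once, up front, rather than inside the pair loop
    embeddings = model.encode(question_texts, convert_to_tensor=True)
    # Entry [i][j] is the cosine similarity between questions i and j
    similarity_matrix = util.pytorch_cos_sim(embeddings, embeddings)

    # Visit each unordered pair once (j > i), which also skips self-comparisons
    for i in range(len(question_ids)):
        for j in range(i + 1, len(question_ids)):
            similarity_score = similarity_matrix[i][j].item()
            if similarity_score > similarity_threshold:
                similar_question_pairs.append((question_ids[i], question_ids[j], similarity_score))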

## Results
Let's print out the similar question pairs (question numbers) found.

In [12]:
if similar_question_pairs:
    print("Similar question pairs with similarity percentage:")
    for qid1, qid2, similarity_score in similar_question_pairs:
        similarity_percentage = similarity_score * 100
        print(f"Question {qid1} and Question {qid2}: {similarity_percentage:.2f}% similar")
else:
    print("No similar questions found.")


Similar question pairs with similarity percentage:
Question 1 and Question 1: 100.00% similar
Question 2 and Question 2: 100.00% similar
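
In the output above, each reported pair has the same question number on both sides. One possible explanation, which the notebook does not confirm, is that the table contains more than one row with the same `Question No`. The sketch below, under that assumption, separates such pairs from pairs of different questions and ranks each list by score so the closest matches are reviewed first. The function name `split_and_rank` and the example values are hypothetical.

```python
def split_and_rank(pairs):
    """Split (qid1, qid2, score) tuples by whether the IDs match, then sort by score."""
    same_number = [p for p in pairs if p[0] == p[1]]
    different_number = [p for p in pairs if p[0] != p[1]]

    def by_score(pair):
        return pair[2]

    return (
        sorted(same_number, key=by_score, reverse=True),
        sorted(different_number, key=by_score, reverse=True),
    )

# Illustrative values only, not results from the notebook's database
example_pairs = [(1, 1, 1.0), (3, 8, 0.84), (2, 5, 0.93)]
repeated_rows, cross_question = split_and_rank(example_pairs)

print(repeated_rows)   # [(1, 1, 1.0)]
print(cross_question)  # [(2, 5, 0.93), (3, 8, 0.84)]
```

Pairs in the first list point at repeated rows in the table, while the second list holds the cross-question matches that the similarity check is meant to surface.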


## Conclusion
In this notebook, we successfully connected to the MySQL database, fetched questions, and used the Sentence Transformer model to find similar question pairs. The Sentence Transformer model's embeddings and cosine similarity allowed us to identify questions with similar semantic meanings.

## References
Sentence Transformers Documentation: https://www.sbert.net/

MySQL Connector/Python Documentation: https://dev.mysql.com/doc/connector-python/en/