# Overview

### Quora Question Pairs

It is a large corpus of questions, used to detect similar or duplicate questions by comparing their semantic meaning.

### Qdrant

Qdrant is an open-source vector database and vector search engine written in Rust. It provides a fast and scalable vector similarity search service.

### Abstract

This notebook implements a search engine using the Quora Duplicate Questions dataset and the Qdrant Python client (`qdrant-client`). It aims to find questions similar to a user's input query.

### Methodology

Here is an overview of the implementation:

- Load the Quora dataset and apply preprocessing steps.
- Vectorize the questions and store them in a vector space, so that a query entered by a user can be vectorized and compared in the same space. Qdrant's built-in functionality covers all of these steps.
- Run several example queries to demonstrate the search engine.
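
To make the idea of comparing a query and the stored questions in the same vector space concrete, here is a minimal sketch using only the standard library. It is a toy illustration, not what Qdrant does internally: the `embed` function below is a hypothetical stand-in that counts words, whereas the notebook relies on a real embedding model, and `cosine` and `search` are illustrative names.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, documents, limit=2):
    """Return the `limit` documents most similar to `query`."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:limit]

docs = [
    "Which fish would survive in salt water?",
    "How can I increase the speed of my internet connection?",
    "What is the story of Kohinoor diamond?",
]
top = search("How do I increase internet speed?", docs, limit=1)
print(top)  # ['How can I increase the speed of my internet connection?']
```

A word-count vector only matches on shared words, while a learned embedding can also match questions that are phrased differently but mean the same thing.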

### Summary

In summary, the notebook demonstrates how easily and efficiently a complete search engine can be built using the Qdrant vector database and client.

### Explore More!

- This notebook has been covered in an article on Medium: [Build a search engine in 5 minutes using Qdrant](https://medium.com/@raoarmaghanshakir040/build-a-search-engine-in-5-minutes-using-qdrant-f43df4fbe8d1)
- [E-Commerce Products Search Engine Using Qdrant](https://www.kaggle.com/code/sacrum/e-commerce-products-search-engine-using-qdrant)
- [Qdrant](https://qdrant.tech)
- [Qdrant Documentation](https://qdrant.tech/documentation/)
- [Qdrant Python Client Documentation](https://python-client.qdrant.tech)
- [Quora Question Pairs](https://www.kaggle.com/competitions/quora-question-pairs)


# Dataset

### Loading
1. Install the `datasets` library
2. Load the `quora` dataset
3. Extract the questions from each pair
4. Concatenate all the questions into a single list

In [None]:
!pip install datasets

In [3]:
from datasets import load_dataset

dataset = load_dataset("quora", split="train")

In [4]:
# Each record holds a pair of questions; collect all of them in one flat list
questions = []
for q in dataset['questions']:
    questions.extend(q['text'])

len(questions), questions[:10]

(808580,
 ['What is the step by step guide to invest in share market in india?',
  'What is the step by step guide to invest in share market?',
  'What is the story of Kohinoor (Koh-i-Noor) Diamond?',
  'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?',
  'How can I increase the speed of my internet connection while using a VPN?',
  'How can Internet speed be increased by hacking through DNS?',
  'Why am I mentally very lonely? How can I solve it?',
  'Find the remainder when [math]23^{24}[/math] is divided by 24,23?',
  'Which one dissolve in water quikly sugar, salt, methane and carbon di oxide?',
  'Which fish would survive in salt water?'])

### Preprocess

In [5]:
# Remove all duplicates
questions = list(set(questions))
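
Note that `list(set(...))` removes duplicates but does not keep the original order, and the resulting order of strings can differ between interpreter runs. If a reproducible order matters, one alternative (an assumption for illustration, not what this notebook uses) is `dict.fromkeys`, which keeps the first occurrence of each item in place. The sample list and the name `unique_questions` below are hypothetical.

```python
# Small sample list with repeated entries (illustrative data only)
sample_questions = [
    "Which fish would survive in salt water?",
    "Why am I mentally very lonely? How can I solve it?",
    "Which fish would survive in salt water?",
    "What is the story of Kohinoor (Koh-i-Noor) Diamond?",
    "Why am I mentally very lonely? How can I solve it?",
]

# dict keys are unique and keep insertion order, so the first occurrence wins
unique_questions = list(dict.fromkeys(sample_questions))

print(len(unique_questions))  # 3
print(unique_questions[0])    # Which fish would survive in salt water?
```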

In [6]:
# Keep only questions with at least min_len and fewer than max_len words

min_len = 10
max_len = 50

def filter_function(question):
    n_words = len(question.split())
    return min_len <= n_words < max_len

questions = list(filter(filter_function, questions))
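
The boundaries of the length filter are easy to misread, so here is a small sketch of how it behaves at the edges. It restates the filter with the same `min_len` and `max_len` values; the `samples` list is made up for illustration. A question with exactly 10 words is kept, while one with exactly 50 words is dropped, because the upper bound is exclusive.

```python
min_len = 10
max_len = 50

def filter_function(question):
    # Keep questions with at least min_len and fewer than max_len words
    return min_len <= len(question.split()) < max_len

samples = [
    "Which fish would survive in salt water?",  # 7 words: too short
    " ".join(["word"] * 10),                    # 10 words: kept (lower bound is inclusive)
    " ".join(["word"] * 49),                    # 49 words: kept
    " ".join(["word"] * 50),                    # 50 words: dropped (upper bound is exclusive)
]
kept = [filter_function(q) for q in samples]
print(kept)  # [False, True, True, False]
```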

In [7]:
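import random

# Draw a random sample of the dataset, since the complete data
# is very large and would take a long time to process.
# random.sample draws without replacement, so the deduplicated list
# stays free of duplicates (random.choices could repeat questions).
N = 30_000

questions = random.sample(questions, k=N)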
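
Because the questions were deduplicated earlier, the choice of sampling function matters: `random.choices` draws with replacement and can return the same item more than once, while `random.sample` draws without replacement. The sketch below compares the two on a small made-up population; the names `population`, `with_replacement`, and `without_replacement` are illustrative.

```python
import random

random.seed(0)  # fixed seed so the comparison is repeatable
population = [f"question {i}" for i in range(100)]

# With replacement: the same item may be drawn several times
with_replacement = random.choices(population, k=80)

# Without replacement: every drawn item is distinct
without_replacement = random.sample(population, k=80)

print(len(set(with_replacement)))     # at most 80, typically fewer
print(len(set(without_replacement)))  # 80
```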

In [8]:
len(questions), questions[:10]

(30000,
 ['What is it like to be black (African migrant or African American) in Australia?',
  'What are these Canada people? Why Canada is not a state of America? Do they look like us?',
  'Why do we use long transmission line for longer than 240 km?',
  "I'm 11 and I want my nose pierced, I'm ok with waiting till I'm 12 which is in January, my dad said he will think about when I'm 12, can I u think?",
  'What is If there was one thing you would like to change about Quora what would it be? That one thing where Quora need to improve?',
  'My car steering not working as my wheel got stuck?',
  'What is the best shipping option for an online business in Nigeria sending products to the USA?',
  'Who is the best center to ever play in the NBA?',
  'How are red blood cells structured and how do they function?',
  'What are some things new employees should know going into their first day at Commerce Bank?'])

# Qdrant

In [None]:
!pip install "qdrant-client[fastembed]"

In [9]:
# Name of Qdrant Collection for saving vectors
QD_COLLECTION_NAME = "collection_name"

In [10]:
from qdrant_client import QdrantClient

client = QdrantClient(":memory:")

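# Embed the questions with the client's FastEmbed integration and store them
# in the collection, which is created automatically if it does not exist
client.add(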
    collection_name=QD_COLLECTION_NAME,
    documents=questions,
)

print("Completed")

Completed


In [11]:
def pretty_print(query):
    results = client.query(
        collection_name=QD_COLLECTION_NAME,
        query_text=query,
        limit=5
    )
    print("Query:", query)
    for i, result in enumerate(results):
        print()
        print(f"{i+1}) {result.document}")

In [12]:
pretty_print("what is the best earyly morning meal?")

Query: what is the best earyly morning meal?

1) What is your favorite food for a chilly winter day?

2) Can you give me some recipes for a healthy and easy packed lunch?

3) What is the best meal you ever had in your life?

4) What's the first thing you put in your mouth in the morning?

5) What is served for breakfast on a typical US army base?


In [13]:
pretty_print("How should one introduce themselves?")

Query: How should one introduce themselves?

1) What is the first step anyone will take before stating his/her own business?

2) How can a fresh graduate face his 1st interview for a bank job if the question is "say something about yourself / introduce yourself"?

3) What is the best way to respond to an email introduction?

4) How do I give a welcoming speech to students of the freshman year?

5) How do I speak up in front of many people?


In [14]:
pretty_print("Why is the Earth a sphere?")

Query: Why is the Earth a sphere?

1) Why do extraterrestrial bodies always appear as a spherical shape? Why not square or cylindrical?

2) If the earth is a sphere, how is it that wherever we stand, we never fall off?

3) All things are supposed to fall to the earth because of gravity, then why do clouds float?

4) Why do some people think that the Earth is flat?

5) What is the reason for existence of Earth's magnetic field?


