Semantic Search with MongoDB and FastAPI

For more details refer to the blog post: Semantic Search with MongoDB and FastAPI: Comprehensive Guide.

Description

This project demonstrates how you can enhance standard CRUD operations in your application with a Semantic Search mechanism.

MongoDB hosted on Atlas is used as the primary database, and its Vector Search feature is used to perform Semantic Search.

Open-source Sentence Transformers from Hugging Face are used to create Embedding Vectors, which are stored directly in MongoDB documents and used in Semantic Search.
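To make the role of the Embedding Vectors concrete, here is a minimal sketch of cosine similarity, the metric configured for the search index in the Setup section. The toy 3-dimensional vectors and their values are made up for illustration; the real embeddings in this project have 384 dimensions, and Atlas computes the similarity on the server side rather than in application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (made-up values, not produced by a real model).
query = [0.9, 0.1, 0.0]
space_movie = [0.8, 0.2, 0.1]
romance_movie = [0.0, 0.2, 0.9]

# The document whose vector points in a similar direction scores higher.
print(cosine_similarity(query, space_movie))
print(cosine_similarity(query, romance_movie))
```

A score close to 1 means the two texts are semantically close; a score near 0 means they are unrelated.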

The application is implemented in Python using the FastAPI framework. It relies heavily on PyMongo for communication with MongoDB and on Pydantic for data modeling and validation. All data processing needed for Semantic Search (data vectorization and storage management) is encapsulated and hidden from the API user, which keeps the standard CRUD operations easy to use.

This project works with data from the TMDB 5000 Movie Dataset from Kaggle.

Prerequisites

A Python environment management tool such as conda or venv
MongoDB Atlas account
MongoDB Atlas cluster v6.0.11, v7.0.2, or later

Setup

The steps to get the project up and running are:

Clone the repository to your local machine
MongoDB Atlas Cluster setup
1. Create an account on MongoDB Atlas (if you don't already have one) and log in
2. Create a new project and deploy a free cluster
3. Add a database user and save the credentials (username and password)
4. Whitelist your current IP address
5. Get the connection string, which should look like this: mongodb+srv://<username>:<password>@<host>/?retryWrites=true&w=majority (the part after the host is optional)
MongoDB Atlas Vector Search setup
1. Find the deployed cluster in the Database section and create a database called 'semantic_search' with a 'movies' collection in it
2. Create a vector search index named 'moviesVectorSearch' and link it to the created collection. For the index definition, use the JSON Editor with the following JSON:
```
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 384,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}
```
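Once the index above exists, a query against it could look roughly like the sketch below. It only builds the aggregation pipeline as plain Python data, so it runs without a database. The function name, the default `limit` and `num_candidates` values, and the projected `title` field are assumptions made for illustration; this README does not show how the project itself builds its queries.

```python
from typing import Any

EMBEDDING_DIMENSIONS = 384  # matches "dimensions" in the index definition


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "moviesVectorSearch",
    path: str = "embedding",
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Build a $vectorSearch aggregation pipeline (hypothetical helper)."""
    if len(query_vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(f"expected {EMBEDDING_DIMENSIONS} dimensions")
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # "title" is an assumed field name; adjust to the stored documents.
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.0] * EMBEDDING_DIMENSIONS, limit=3)
# With PyMongo, the pipeline would then be passed to the collection:
# results = list(collection.aggregate(pipeline))
```

The query vector must come from the same model that produced the stored embeddings, otherwise the similarity scores are not meaningful.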

Create a .env file in the project root and fill it in with your user credentials and the host from the MongoDB connection string. Set the database name, movies collection name, and search index name to the values you used in MongoDB Atlas:

# MongoDB Atlas Credentials
MONGODB_ATLAS_USERNAME=<username>
MONGODB_ATLAS_PASSWORD=<password>
MONGODB_ATLAS_HOST=<host>

# MongoDB Atlas Database
MONGODB_ATLAS_DB_NAME=semantic_search
MONGODB_ATLAS_MOVIES_COLLECTION_NAME=movies

# MongoDB Atlas Vector Search
MONGODB_ATLAS_MOVIES_VECTOR_SEARCH_INDEX_NAME=moviesVectorSearch
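As an illustration of how these values fit back together, the sketch below assembles a connection string from the three credential variables. The helper name and the sample values are hypothetical, and the project's own config.py may do this differently. Percent-escaping the username and password with `quote_plus` matters when they contain characters such as `@` or `/`.

```python
from collections.abc import Mapping
from urllib.parse import quote_plus


def build_mongodb_uri(env: Mapping[str, str]) -> str:
    """Assemble a mongodb+srv URI from .env-style values (hypothetical helper)."""
    username = quote_plus(env["MONGODB_ATLAS_USERNAME"])
    password = quote_plus(env["MONGODB_ATLAS_PASSWORD"])
    host = env["MONGODB_ATLAS_HOST"]
    return f"mongodb+srv://{username}:{password}@{host}/?retryWrites=true&w=majority"


# Made-up sample values; in the application these would come from the .env file.
example_env = {
    "MONGODB_ATLAS_USERNAME": "demo_user",
    "MONGODB_ATLAS_PASSWORD": "p@ss/word",
    "MONGODB_ATLAS_HOST": "cluster0.example.mongodb.net",
}

print(build_mongodb_uri(example_env))
# mongodb+srv://demo_user:p%40ss%2Fword@cluster0.example.mongodb.net/?retryWrites=true&w=majority
```

Passing the environment as a mapping (for example `os.environ`) keeps the helper easy to test without touching real credentials.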

Create a Python virtual environment with Python 3.11 (older versions such as 3.10 and 3.9 should also work)
```
conda create --name your_environment_name python=3.11
```

Activate the environment and install the packages

conda activate your_environment_name

pip install -r requirements.txt

Usage

To run the application, navigate to the project root folder, activate your environment, and run the main.py script.

python main.py

Notes:

At application startup, the database is populated from the TMDB 5000 Movie Dataset in the data_source folder, which may take some time (up to 1 minute on my machine using a CPU). Once the database is initialized, this step is skipped on subsequent startups and reloads.
During the first run of the application, the all-MiniLM-L6-v2 model is downloaded from the Hugging Face Hub, which also takes some time. On subsequent runs, the model is loaded from the cache.
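The "populate once, then skip" behavior described in the first note can be sketched as follows. This is an assumption about the general pattern, not the project's actual startup code: the function and the in-memory stand-in class are hypothetical, and only the `count_documents` and `insert_many` method names mirror real PyMongo collection methods.

```python
from typing import Any, Callable, Iterable


def populate_if_empty(
    collection: Any,
    load_documents: Callable[[], Iterable[dict]],
) -> int:
    """Insert documents only when the collection is empty; return how many."""
    if collection.count_documents({}) > 0:
        return 0  # already initialized, skip the expensive load
    documents = list(load_documents())
    if documents:
        collection.insert_many(documents)
    return len(documents)


class InMemoryCollection:
    """Tiny stand-in exposing the two PyMongo-style methods used above."""

    def __init__(self) -> None:
        self.docs: list[dict] = []

    def count_documents(self, filter: dict) -> int:
        return len(self.docs)  # the filter is ignored in this toy version

    def insert_many(self, documents: list[dict]) -> None:
        self.docs.extend(documents)


def load_sample() -> list[dict]:
    # Made-up rows standing in for the dataset in data_source.
    return [{"title": "Movie A"}, {"title": "Movie B"}]


fake = InMemoryCollection()
first_run = populate_if_empty(fake, load_sample)   # inserts both documents
second_run = populate_if_empty(fake, load_sample)  # skipped, collection not empty
print(first_run, second_run)
```

In a real deployment the loader would also compute the embedding for each movie before insertion, which is where most of the startup time goes.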
