Computer Vision Hackathon NLP Part

Persian to English Vector Database Search

This Jupyter notebook demonstrates how to create and use a vector database for searching English text content using Persian queries. The system uses LanceDB for vector storage and includes a translation pipeline to convert Persian queries to English before searching.

Features

Vector database creation with LanceDB
Persian to English query translation
Full-text search capabilities
Sample dataset of 30 English sentences across various topics

Requirements

The following packages are required:

pip install lancedb
pip install tantivy
pip install transformers

Components

1. Database Creation

Uses LanceDB to store vector embeddings and text data
Creates a table called 'news' with vector embeddings and text content
Implements full-text search indexing on the text field
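
The steps above can be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual code: the `build_records` helper, the `uri` default, the `sentences` parameter, and the one-element placeholder vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[List[float], str]]


def build_records(sentences: List[str]) -> List[Record]:
    """Pair each sentence with a placeholder one-element vector (1.0, 2.0, ...).

    Assumption: the real notebook may use a different vector shape.
    """
    return [
        {"vector": [float(i)], "text": text}
        for i, text in enumerate(sentences, start=1)
    ]


def create_database(sentences: List[str], uri: str = "./lancedb_demo"):
    """Create the 'news' table and build a full-text index on 'text'."""
    import lancedb  # imported lazily so build_records() works without it

    db = lancedb.connect(uri)
    # mode="overwrite" replaces any table left over from an earlier run.
    tbl = db.create_table("news", data=build_records(sentences), mode="overwrite")
    # Index the text column so it can be queried with full-text search.
    tbl.create_fts_index("text", replace=True)
    return tbl
```

Keeping record construction separate from the LanceDB calls makes it easy to swap the placeholder vectors for real embeddings later.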

2. Translation System

Utilizes the Hugging Face transformers library
Uses the "barghavani/Farsi-to-English" translation model
Translates Persian queries to English before searching
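
A translation step along these lines might look like the sketch below. It assumes the model loads under the transformers `"translation"` pipeline task and returns the usual `translation_text` field; neither is confirmed by this README, and the function names are hypothetical.

```python
from typing import Callable, Dict, List

# A translator takes text and returns pipeline-style output:
# a list with one dict that holds a "translation_text" key.
Translator = Callable[[str], List[Dict[str, str]]]


def load_translator(model_name: str = "barghavani/Farsi-to-English") -> Translator:
    """Load the translation model (assumption: it supports the 'translation' task)."""
    from transformers import pipeline  # imported lazily; downloads the model on first use

    return pipeline("translation", model=model_name)


def translate_query(query: str, translator: Translator) -> str:
    """Return the English text for a Persian query, stripped of stray whitespace."""
    output = translator(query)
    return output[0]["translation_text"].strip()
```

Passing the translator in as an argument means the model is loaded once and reused across searches, and a stub can stand in for it during testing.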

3. Sample Dataset

The database includes 30 sample sentences covering various topics:

Technology and Computing
Business and Work
Education and Learning
Health and Wellness
Environment
Entertainment and Culture

Usage

Run the installation cells to set up required packages
Execute the database creation function:

tbl = create_database()

Search using Persian queries:

results = search('مهارت', tbl)  # 'مهارت' means "skill"; the query is translated to English before searching
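
For orientation, a search function of this kind could be sketched as below. This is an assumption about how the pieces fit together, not the notebook's implementation: the extra `translate` and `limit` parameters are hypothetical, and the notebook's own `search` takes only the query and the table.

```python
from typing import Any, Callable


def search(query: str, tbl: Any, translate: Callable[[str], str], limit: int = 5):
    """Translate a Persian query to English, then run a full-text search.

    `translate` is any callable that maps Persian text to English text.
    """
    english_query = translate(query)
    # query_type="fts" requests a full-text search instead of a vector search.
    return tbl.search(english_query, query_type="fts").limit(limit).to_pandas()
```

Because the sample vectors are placeholders, the sketch relies on the full-text index alone and does not run a vector similarity query.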

Output Format

The search results are returned as a pandas DataFrame containing:

vector: The embedding vector
text: The matched English text
_score: The relevance score of the match

Notes

The vector values in the sample data are simplified (1-30) for demonstration purposes
In a production environment, these should be replaced with actual embeddings
The full-text search index is created automatically on the 'text' field

Example

# Create database
tbl = create_database()

# Search in Persian
result = search('فرهنگ', tbl)
# Returns matches related to "culture" in the English dataset
