# Introduction to the Weaviate API

Weaviate is one of many vector databases. I use it here because it is the one demonstrated in DeepLearning.AI's course on RAG. The workflow is similar across vector databases, so switching to another one should (supposedly) be easy if you prefer.

My first encounter with vector databases was with Chroma, which is also a popular choice. I will cover Chroma in another notebook.



## Typical Vector Database Workflow

1. `Set up the vector database`: This involves creating a connection to the database and defining the schema for your data.

2. `Load data into the vector database`: This step involves ingesting your data into the database, which may include text, images, or other types of data.

3. `Create sparse vectors for keyword search`: This step represents each item as a sparse vector of term weights (e.g., TF-IDF or bag-of-words), in which most entries are zero. Matching on these weights is what enables keyword-based search.

4. `Create dense vectors for semantic search`: This step involves generating embeddings for your data, which allows for semantic search capabilities.

5. `Create an HNSW index to power ANN`: This step involves creating an index that enables approximate nearest neighbor (ANN) search, which is essential for efficient retrieval of similar items in large datasets.
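
To make step 3 more concrete, here is a minimal sketch of sparse vectors and keyword search in plain Python. It is a toy illustration, not Weaviate's API: the corpus, `tokenize`, `tfidf`, and `keyword_search` are all made up for this example, and the weighting is a basic TF-IDF variant (Weaviate's own keyword search is based on BM25).

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
docs = [
    "weaviate is a vector database",
    "chroma is a vector database",
    "hnsw powers approximate nearest neighbor search",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

# Document frequency: how many documents contain each term.
doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))

def tfidf(text):
    """Return a sparse vector as {term: weight}; absent terms are implicitly 0."""
    counts = Counter(tokenize(text))
    return {
        term: tf * math.log(len(docs) / doc_freq[term])
        for term, tf in counts.items()
        if term in doc_freq  # ignore terms the corpus has never seen
    }

sparse_vectors = [tfidf(doc) for doc in docs]

def keyword_search(query):
    """Return the index of the document with the highest sparse dot product."""
    q = tfidf(query)
    scores = [
        sum(weight * vec.get(term, 0.0) for term, weight in q.items())
        for vec in sparse_vectors
    ]
    return max(range(len(docs)), key=scores.__getitem__)

best = keyword_search("weaviate database")
print(docs[best])
```

Each vector only stores the terms that actually occur in the document, which is what makes it "sparse". A dense embedding, by contrast, has a value in every dimension and usually comes from a trained model.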

### Terminology Crash Course

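`KNN` stands for K-Nearest Neighbors, an algorithm that finds the `k` points in a dataset closest to a given query point. In vector search, this means comparing the query against every stored vector, which gives exact results but becomes slow as the dataset grows. The same idea is also commonly used in classification and regression tasks.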

<img src = "./resource/knn.jpg" width="400"  alt="KNN Algorithm" align="center">
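
As a rough sketch of what exact KNN search does, the snippet below compares a query against every point and keeps the `k` closest. The points and the helper names (`euclidean`, `knn`) are made up for illustration, and Euclidean distance is just one possible metric.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, points, k):
    """Return the indices of the k points closest to query, nearest first."""
    ranked = sorted(range(len(points)), key=lambda i: euclidean(query, points[i]))
    return ranked[:k]

# Toy 2-D dataset, invented for illustration.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]

nearest = knn((1.0, 1.0), points, k=2)
print(nearest)  # [1, 3]
```

Every query touches every point, so the cost grows linearly with the size of the dataset. That is the cost ANN methods try to avoid.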

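`ANN` stands for Approximate Nearest Neighbor, a family of techniques for quickly finding items in a dataset that are similar to a given query item. Instead of comparing the query against every item, ANN methods use an index to examine only a small part of the data. The results are close but not guaranteed to be the exact nearest neighbors, and in exchange, search in high-dimensional spaces becomes much faster.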

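`HNSW` stands for Hierarchical Navigable Small World, a graph-based index used for approximate nearest neighbor search. Each item is a node linked to some of its nearby items, and the graph is organized in layers: sparse upper layers allow long jumps across the data, while the dense bottom layer refines the result. Searching by walking this graph allows efficient retrieval of similar items in high-dimensional spaces.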
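
The core idea behind graph-based ANN can be sketched as a greedy walk: start at an entry node and keep moving to whichever neighbor is closest to the query, stopping when no neighbor improves. The snippet below is a heavily simplified, single-layer illustration with a hand-built graph. It is not the real HNSW algorithm, which adds multiple layers and tracks a list of candidates, and the names `points`, `neighbors`, and `greedy_search` are made up for this example.

```python
import math

# Toy dataset: five points on a line, invented for illustration.
points = {
    0: (0.0, 0.0),
    1: (1.0, 0.0),
    2: (2.0, 0.0),
    3: (3.0, 0.0),
    4: (4.0, 0.0),
}

# Hand-built graph: mostly short links, plus one long-range link (0 <-> 3)
# that plays the role of a "shortcut" across the data.
neighbors = {
    0: [1, 3],
    1: [0, 2],
    2: [1, 3],
    3: [0, 2, 4],
    4: [3],
}

def greedy_search(query, entry):
    """Walk the graph toward the query; return (closest node found, path taken)."""
    current = entry
    path = [current]
    while True:
        # Pick the neighbor of the current node that is closest to the query.
        best = min(neighbors[current], key=lambda n: math.dist(query, points[n]))
        # Stop when no neighbor is closer than where we already are.
        if math.dist(query, points[best]) >= math.dist(query, points[current]):
            return current, path
        current = best
        path.append(current)

found, path = greedy_search((3.9, 0.0), entry=0)
print(found, path)  # 4 [0, 3, 4]
```

The walk reaches node 4 after visiting only three nodes, thanks to the long-range link. On a less friendly graph the walk can stop at a node that is merely close to the query, which is why the result is "approximate".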
