# Using Vector Databases

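Vector databases play a central role in RAG (Retrieval Augmented Generation) systems. They store, manage, and index high-dimensional vector data efficiently, and because they integrate cleanly with the other components of the pipeline, they make fast and accurate information retrieval possible. The key points below explain their role and importance in RAG:

Efficient knowledge retrieval: Vector databases provide the foundation for efficient knowledge retrieval. They store and manage the high-dimensional data that RAG models depend on, since these models rely heavily on retrieval to generate accurate responses.

Faster information retrieval: Vector databases excel in speed and efficiency, which makes them indispensable in systems such as RAG, where fast retrieval is key to generating accurate responses.

Improved accuracy: By precisely linking related data points, vector databases let the model generate accurate, context-rich answers tailored to the user's question, which improves the overall accuracy of RAG responses.

Customized learning: Vector databases offer flexibility in dimensionality, so they can handle evolving requirements and growing volumes of vector data while supporting tailored learning experiences.

Integration with other components: Through seamless integration with other components, such as RAG models and generative AI applications, vector databases help deliver accurate, personalized responses at high speed.

In summary, vector databases are an essential component of a RAG system. They provide efficient storage and retrieval of high-dimensional data, improve model performance through speed and accuracy, and support tailored learning experiences.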

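Traditional keyword-based search has limits when it comes to finding an accurate answer to a user's question. As a result, there is a growing need for technology that understands the question more deeply and extracts accurate answers from large volumes of data.

Two technologies are drawing attention as a way to address this: LLMs and vector embeddings.
- An LLM captures patterns in text data in depth and performs very well at generating or extracting answers relevant to a user's question.
- Vector embedding represents the meaning of a word, a sentence, or even a whole document as a vector, which makes it possible to compute the similarity between documents quickly.

Combining these two technologies leads to RAG (Retrieval Augmented Generation).
The user's question is converted into a vector, and that vector is used to find the most relevant documents or information in the database. The LLM then generates the most appropriate answer to the question based on those documents.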

## Introduction to Vector Databases

Vector databases play a crucial role in RAG (Retrieval Augmented Generation) systems by providing efficient storage, management, and indexing of high-dimensional vector data.

They form the foundation for seamless integration with other components, enabling quick and precise information retrieval. One key point explains their role and importance in RAG:

Efficient Knowledge Retrieval: Vector databases act as the backbone for efficient knowledge retrieval. They store and manage high-dimensional data, which is essential for RAG models that rely heavily on retrieval to generate accurate responses.

Traditional keyword-based search methods have limitations in finding accurate answers to users' questions.
A new method is therefore needed to better understand what users are asking and to find accurate answers in large amounts of data.

To address this challenge, LLMs and vector embeddings are the key technologies.
- An LLM performs well at generating or extracting relevant answers to users' questions by capturing patterns in textual data.
- Vector embedding is a technique for representing the meaning of a word, a sentence, or even a whole document in vector form, which allows us to quickly compute similarities between documents.

Retrieval Augmented Generation (RAG) combines these two technologies.
The user's question is converted into a vector, that vector is used to find the most relevant documents or information in the database, and the LLM then generates the most appropriate answer to the question based on those documents.
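The retrieval step described above can be sketched in a few lines. This is a minimal illustration, assuming hand-made 3-dimensional vectors in place of real embeddings; the helper names (`cosine_similarity`, `retrieve`) and the document ids are hypothetical and are not part of any library used in this tutorial.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy "embeddings" (hypothetical values; a real system would get these
# from an embedding model).
doc_vecs = {
    "ai_funding": [0.9, 0.1, 0.0],
    "ev_launch": [0.1, 0.9, 0.1],
    "chip_news": [0.7, 0.2, 0.3],
}
query_vec = [1.0, 0.0, 0.1]

# Step 1: retrieve the most relevant documents for the query vector.
context_ids = retrieve(query_vec, doc_vecs, top_k=2)

# Step 2: the retrieved documents would be placed into the LLM prompt.
prompt = f"Answer the question using these documents: {context_ids}"
print(prompt)
```

In a real pipeline the vector database replaces the `retrieve` function, and the prompt is sent to the LLM together with the text of the retrieved documents.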

## Understanding Vector Databases

1. Definition and Purpose
Vector databases are a type of database that stores and manages unstructured data, such as text, images, or audio, as vector embeddings (high-dimensional vectors) to make it easier to search and query.
They are designed to handle complex data types and perform high-speed computations, making them well suited to similarity search and machine learning tasks.

2. How Vector Databases Differ from Traditional Databases
- Data Representation: Traditional databases store data in tables, rows, and columns, while vector databases store data as vectors, which are mathematical representations of data points.

- Querying Approach: Traditional databases often rely on exact-match queries based on keys or specific attribute values, while vector databases use similarity-based queries, where the goal is to find the vectors most similar to a given query vector.

- Optimization Techniques: Vector databases employ specialized algorithms for Approximate Nearest Neighbor (ANN) search, which speed up the search by trading a small amount of accuracy for much less work. These algorithms may involve techniques such as hashing, quantization, or graph-based search.

- Use Cases: Vector databases are often used in applications involving similarity matching, recommendation systems, image recognition, natural language processing, and other tasks that require vector-based operations.
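To make the hashing idea behind ANN search more concrete, here is a small sketch of locality-sensitive hashing with random hyperplanes. It is only an illustration of one technique from the list above, not the algorithm used by any particular database in this tutorial; all function names and the toy vectors are hypothetical.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes (as normal vectors) with a fixed seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def build_buckets(vectors, hyperplanes):
    """Group vector ids by signature so similar vectors tend to share a bucket."""
    buckets = {}
    for vec_id, vec in vectors.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(vec_id)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Look only inside the query's bucket instead of scanning every vector."""
    return buckets.get(lsh_signature(query, hyperplanes), [])


vectors = {
    "a": [1.0, 0.2, 0.0, 0.1],
    "b": [-0.9, 0.1, 0.3, -0.5],
    "c": [0.0, -1.0, 0.4, 0.2],
}
planes = make_hyperplanes(num_planes=4, dim=4)
buckets = build_buckets(vectors, planes)

# A query pointing in the same direction as "a" hashes to the same bucket.
query = [2.0, 0.4, 0.0, 0.2]
print(candidates(query, buckets, planes))
```

The search is approximate because a true nearest neighbor can land in a different bucket; real systems reduce that risk with multiple hash tables or with graph-based and quantization-based indexes.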

In [None]:
!pip install llama-index

## Getting Started with Sample Techcrunch Articles

In this tutorial, we use a small sample of [TechCrunch](https://techcrunch.com/) articles to illustrate how to use a RAG system. The archive downloaded below loads as 21 documents. The goal of this tutorial is to learn how to use the RAG system to retrieve the most relevant articles for a given question.

#### Download articles
> TechCrunch Article

In [1]:
!wget -q https://github.com/kairess/toy-datasets/raw/master/techcrunch-articles.zip
!unzip -q techcrunch-articles.zip -d articles

In [2]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="articles")
docs = reader.load_data()

print(f"Count of Techcrunch articles: {len(docs)}")

Count of Techcrunch articles: 21


### 1. Simple Vector Store

In [5]:
from llama_index.core import VectorStoreIndex


# 1. Load VectorStoreIndex directly from Documents
index = VectorStoreIndex.from_documents(docs, show_progress=True)

Parsing nodes:   0%|          | 0/21 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/51 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
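The progress bars above show 21 documents producing 51 embeddings, which suggests that each document is split into smaller chunks (LlamaIndex calls them nodes) before it is embedded. The sketch below shows the general idea with a simple word-based splitter with overlap. It is an assumption-laden illustration: `chunk_words` and its parameters are hypothetical, and LlamaIndex's own splitter works differently in its details.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# A stand-in "article" of 20 words: w0 w1 ... w19
article = " ".join(f"w{i}" for i in range(20))
chunks = chunk_words(article, chunk_size=8, overlap=2)

for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Each chunk would then be embedded separately, which is why the number of embeddings is larger than the number of source documents. The overlap keeps sentences that straddle a chunk boundary retrievable from either side.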


#### Persist techcrunch articles in Simple VectorStore

In [6]:
index.set_index_id("techcrunch_articles")
index.storage_context.persist("./storage/simple")

#### Load articles from Simple VectorStore

In [8]:
from llama_index.core import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage/simple")

# load index
simple_vc_index = load_index_from_storage(storage_context, index_id="techcrunch_articles")

INFO:llama_index.core.indices.loading:Loading indices with ids: ['techcrunch_articles']
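Conceptually, persisting and reloading an in-memory store is a write-then-read round trip against the same directory. The sketch below illustrates that idea with plain JSON. It is not the file layout LlamaIndex uses; the `persist` and `load` helpers, the `vectors.json` file name, and the node ids are all hypothetical.

```python
import json
import os
import tempfile


def persist(vectors, persist_dir, filename="vectors.json"):
    """Write the id -> vector mapping to a JSON file inside persist_dir."""
    os.makedirs(persist_dir, exist_ok=True)
    path = os.path.join(persist_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vectors, f)
    return path


def load(persist_dir, filename="vectors.json"):
    """Read the mapping back; fails if persist_dir differs from the one written."""
    with open(os.path.join(persist_dir, filename), encoding="utf-8") as f:
        return json.load(f)


vectors = {"node-1": [0.1, 0.2], "node-2": [0.3, 0.4]}

with tempfile.TemporaryDirectory() as tmp:
    persist_dir = os.path.join(tmp, "storage", "simple")
    persist(vectors, persist_dir)
    restored = load(persist_dir)

print(restored == vectors)
```

The practical takeaway is that the directory passed when saving and the directory passed when loading must match exactly, otherwise the load step cannot find the stored data.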


### 2. Chroma: Simplifying Vector Database Operations

Chroma is an open-source embedding database that focuses on simplifying the development of large language model (LLM) applications, which makes it a natural fit for RAG (Retrieval Augmented Generation) systems. It gives developers a scalable and efficient way to store, search, and retrieve high-dimensional vectors.

It can be deployed in the cloud or on-premises and supports multiple data types and formats, making it suitable for a wide range of applications.

When comparing Chroma with other vector databases used in RAG systems, it is important to consider their specific strengths and trade-offs. Chroma's flexibility and scalability make it an option for audio-based search engines, music recommendations, and other audio-related use cases.

Pinecone, another vector database, is known for its simple, intuitive interface and its support for high-dimensional vectors, making it suitable for various use cases, including similarity search, recommendation systems, personalization, and semantic search.

In terms of scalability, both Chroma and Pinecone support large volumes of high-dimensional data and efficient search.

However, Pinecone is a fully managed service, which means it cannot be run locally, while Chroma and other options such as Milvus, Weaviate, Faiss, Elasticsearch, and Qdrant can be run locally.

When choosing a vector database for your specific needs, consider factors such as scalability, performance, flexibility, ease of use, reliability, and deployment options.
Each vector database has its own strengths and trade-offs, so it is essential to evaluate your objectives and choose the one that best meets your requirements.

In [None]:
# install chromadb

!pip install chromadb

#### Persist techcrunch articles in Chroma VectorStore

In [10]:
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# Save to local disk
db = chromadb.PersistentClient(path="./storage/chroma")
chroma_collection = db.get_or_create_collection("techcrunch_articles")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

chroma_index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, show_progress=True)

INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


Parsing nodes:   0%|          | 0/21 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/51 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


#### Load articles from Chroma VectorStore

In [11]:
db = chromadb.PersistentClient(path="./storage/chroma")
chroma_collection = db.get_or_create_collection("techcrunch_articles")
chroma_vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

chroma_index = VectorStoreIndex.from_vector_store(vector_store=chroma_vector_store)

### 3. Faiss: Efficient Similarity Search and Clustering

In [15]:
!pip install llama-index-vector-stores-faiss faiss-cpu



In [14]:
import faiss

# embedding dimension of OpenAI's text-embedding-ada-002 model (used by OpenAIEmbedding)
d = 1536
faiss_index = faiss.IndexFlatL2(d)

INFO:faiss.loader:Loading faiss.
INFO:faiss.loader:Successfully loaded faiss.
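The `IndexFlatL2` index created above performs exact, exhaustive search: every stored vector is compared with the query using squared Euclidean (L2) distance. The pure-Python sketch below mimics that idea on toy data. `FlatL2Sketch` is a hypothetical stand-in written for illustration; it is not the Faiss API and has none of its optimizations.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class FlatL2Sketch:
    """Hypothetical stand-in illustrating the idea of a flat (brute-force) L2 index."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vecs):
        """Store vectors; every vector must match the index dimension."""
        for v in vecs:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self.vectors.append(list(v))

    def search(self, query, k):
        """Return the k closest vectors as (squared_distance, position) pairs."""
        scored = sorted(
            (squared_l2(query, v), i) for i, v in enumerate(self.vectors)
        )
        return scored[:k]


index = FlatL2Sketch(dim=2)
index.add([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
results = index.search([0.0, 0.0], k=2)
print(results)
```

The dimension check is the reason `d` must equal the embedding model's output size: vectors of any other length cannot be stored in or compared against the index.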


In [18]:
from llama_index.core import (
    VectorStoreIndex,
    StorageContext,
)
from llama_index.vector_stores.faiss import FaissVectorStore

faiss_vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=faiss_vector_store)

faiss_index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, show_progress=True)

# Save Into Faiss Vector Store
faiss_index.storage_context.persist(persist_dir="./storage/faiss")


Parsing nodes:   0%|          | 0/21 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/51 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [21]:
# load faiss index from disk
vector_store = FaissVectorStore.from_persist_dir("./storage/faiss")
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./storage/faiss"
)
faiss_index = load_index_from_storage(storage_context=storage_context)

INFO:root:Loading llama_index.vector_stores.faiss.base from ./storage/faiss/default__vector_store.json.
INFO:llama_index.core.indices.loading:Loading all indices.


### 4. Qdrant: A Comprehensive Vector Database

In [23]:
!pip install llama-index-vector-stores-qdrant qdrant_client



In [None]:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

# To connect to a running Qdrant server instead of local on-disk storage:
# client = QdrantClient(
#     "localhost",
#     port=6333,
# )
client = QdrantClient(path="./storage/qdrant")

qdrant_vector_store = QdrantVectorStore(
    client=client, 
    collection_name="techcrunch_articles", 
    enable_hybrid=False, #  whether to enable hybrid search using dense and sparse vectors
)

storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)
qdrant_index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
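The `enable_hybrid` flag in the cell above refers to combining dense and sparse vector search. One common way to merge two ranked result lists is reciprocal rank fusion (RRF), sketched below on toy rankings. This only illustrates the general idea of fusing results; it is an assumption for illustration and not necessarily the fusion method that Qdrant or LlamaIndex applies. The function name, the constant `k=60`, and the document ids are hypothetical choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; ids ranked high in many lists score best."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy result lists: one from dense (embedding) search, one from sparse (keyword) search.
dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "d", "a"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that appear near the top of both lists, such as `b` and `a` here, end up ahead of documents that only one search method found.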

In [None]:
# load Qdrant index from disk

client = QdrantClient(path="./storage/qdrant")

qdrant_vector_store = QdrantVectorStore(client=client, collection_name="techcrunch_articles")
qdrant_index = VectorStoreIndex.from_vector_store(vector_store=qdrant_vector_store)

## Conclusion

1. Recap of Key Points
2. Future of Vector Databases in Data-Driven Applications

### 1. Recap of Key Points

In this tutorial, we explored the basics of vector databases and their role in data-driven applications. We discussed the differences between vector databases and traditional databases, focusing on data representation, querying approach, optimization techniques, and use cases. We also showed how to build, persist, and load an index with LlamaIndex's simple vector store and with three popular options: Chroma, Faiss, and Qdrant.

### 2. Future of Vector Databases in Data-Driven Applications
Vector databases are becoming increasingly important in data-driven applications, particularly in tasks involving similarity search and machine learning.
As the demand for efficient and scalable data management grows, vector databases are expected to play a more significant role in supporting large language models, image recognition, and other AI applications.

In the future, we can expect advances in vector database technology, such as improved performance, scalability, and adaptability to different data types. We may also see more user-friendly APIs and toolkits that make it easier to integrate vector databases into applications.

Overall, vector databases are a promising technology for handling and analyzing complex data. As the field continues to evolve, we can anticipate new applications and use cases that further expand their potential in data-driven applications.