## Qdrant

- Vector Database
- Open Source
- An alternative to the Pinecone vector database
- Also available as a managed cloud service

www.qdrant.tech

### Setup
- Set up a free 1 GB cluster in the Qdrant cloud service
- The vector database is persistent: stored data survives between sessions
- The database is reachable at a URL
- Data is accessible through simple APIs
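
In practice, the setup above comes down to two values: the cluster URL and an API key. The following is a minimal sketch, assuming the environment variable names `QDRANT_HOST` and `QDRANT_API_KEY` that this notebook uses further down; the helper names `connection_kwargs` and `make_client` are hypothetical and not part of any library.

```python
import os


def connection_kwargs(env=os.environ):
    """Build keyword arguments for QdrantClient from environment variables."""
    kwargs = {"url": env["QDRANT_HOST"]}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        # A cloud cluster requires an API key; a local instance usually does not.
        kwargs["api_key"] = api_key
    return kwargs


def make_client(env=os.environ):
    """Create a client (hypothetical helper; needs the qdrant-client package)."""
    # Imported lazily so connection_kwargs works without the package installed.
    from qdrant_client import QdrantClient

    return QdrantClient(**connection_kwargs(env))
```

Keeping the key in the environment rather than in a notebook cell avoids committing it by accident.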

In [None]:
# https://87cf3a23-f1db-424a-9d5f-da212d4074aa.us-east4-0.gcp.cloud.qdrant.io

In [None]:
# api_key = ''

In [None]:
# Cluster > Collection > Vector Store > Point or Vector

In [None]:
# A vector is a numerical representation of your text
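
To make this concrete, here is a toy sketch in plain Python. Three hand-written 3-dimensional vectors stand in for real embeddings, and cosine similarity measures how closely two vectors point in the same direction. The vectors and labels are made up for illustration; real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for the embeddings of three short texts.
vectors = {
    "electric car": [0.9, 0.1, 0.0],
    "EV maker": [0.8, 0.2, 0.1],
    "banana bread": [0.0, 0.1, 0.9],
}

# Rank every text by its similarity to the "electric car" vector.
query = vectors["electric car"]
ranked = sorted(
    vectors,
    key=lambda name: cosine_similarity(query, vectors[name]),
    reverse=True,
)
print(ranked)
```

A vector database does the same kind of ranking, but over many stored vectors and with indexes that avoid comparing against every one.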

In [1]:
!pip install qdrant-client openai tiktoken langchain langchain-openai langchain-community


In [2]:
import os
import qdrant_client

from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

In [3]:
# Create a Qdrant client
# Set these if they are not already defined in the environment:
# os.environ['QDRANT_HOST'] = ''
# os.environ['QDRANT_API_KEY'] = ''
os.environ['QDRANT_HOST']


'http://qdrant:6333'

In [4]:
client = qdrant_client.QdrantClient(
    url = os.getenv('QDRANT_HOST'),
    # api_key = os.getenv('QDRANT_API_KEY')  # only needed for a secured (cloud) cluster
)

In [5]:
os.getenv('QDRANT_HOST')

'http://qdrant:6333'

In [6]:
from qdrant_client.http import models

In [7]:
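# Create a collection (a named set of vectors inside the database).
# It needs a name, the vector size (dimension), and a distance metric (cosine here).

os.environ["QDRANT_COLLECTION_NAME"] = 'collection1'

vector_config = models.VectorParams(
    size = 1536, # must equal the embedding dimension (1536 for OpenAI's text-embedding-ada-002)
    distance = models.Distance.COSINE
)

# Note: recreate_collection deletes any existing collection with this name first.
client.recreate_collection(
    collection_name = os.getenv("QDRANT_COLLECTION_NAME"),
    vectors_config = vector_config
)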

True
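
Because `recreate_collection` drops an existing collection of the same name, re-running that cell wipes any vectors already stored. A hedged alternative is to create the collection only when it is missing. The sketch below assumes a qdrant-client version that provides `collection_exists`; the helper name `ensure_collection` is hypothetical.

```python
def ensure_collection(client, name, vectors_config):
    """Create the collection only if it does not exist yet.

    Returns True when a new collection was created, False otherwise.
    """
    if client.collection_exists(name):
        return False
    client.create_collection(collection_name=name, vectors_config=vectors_config)
    return True


# Usage with the objects defined in the cells above (assumed to be in scope):
# ensure_collection(client, os.getenv("QDRANT_COLLECTION_NAME"), vector_config)
```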

In [8]:
client.get_collections()

CollectionsResponse(collections=[CollectionDescription(name='collection1')])

In [9]:
# Create a vector store to store the documents

In [10]:
# Confirm that the key is set without printing the secret itself
"OPENAI_API_KEY" in os.environ

True

In [11]:
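# The Qdrant vector store takes a client, a collection name, and an embeddings model

# Set the key here if it is not already defined in the environment:
# os.environ["OPENAI_API_KEY"] = ""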

embeddings = OpenAIEmbeddings()

vector_store = Qdrant(
    client = client,
    collection_name = os.getenv("QDRANT_COLLECTION_NAME"),
    embeddings = embeddings
)

In [12]:
# Add a document to vector store

In [13]:
# Long documents > split the document into chunks (e.g. about 1000 characters each) before adding them

In [14]:
with open('tsla_news.txt') as f:
    raw_text = f.read()
    print(raw_text)

'As a company, Tesla (NASDAQ: TSLA) has been at the center of investors’ attention for many years. On the one hand, it represents the world’s most recognizable EV maker, piloted by one of the world’s most recognized billionaires.', 'On the other hand, many have assessed that the firm is overvalued and priced as a big tech company and not a car maker – with Musk himself claiming the former on at least one occasion.', 'No matter the stock market sentiment toward the EV maker, the numbers paint a decisive yet curious picture. Throughout 2020, Tesla delivered approximately 499 million vehicles and made $31.5 billion in revenue. In the first three quarters of 2023, the company delivered more than 1.3 million vehicles and already accrued more than $72 billion in revenue.', 'Tesla’s earnings per share and profits have similarly vastly increased between 2020 and 2023 – $0.24 to $3.14 and $782 million to $11 billion, respectively. Despite the indisputable growth, the price of the EV maker’s sha

In [15]:
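from langchain.text_splitter import CharacterTextSplitter

def get_chunks(text):
    # CharacterTextSplitter splits only on `separator`, then merges the
    # resulting pieces into chunks of up to `chunk_size` characters.
    text_splitter = CharacterTextSplitter(
        separator = "\n",
        chunk_size = 200,     # maximum characters per chunk
        chunk_overlap = 40,   # characters shared between neighbouring chunks
        length_function = len
    )

    chunks = text_splitter.split_text(text)

    return chunks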


In [16]:
texts = get_chunks(raw_text)
print(texts)

["'As a company, Tesla (NASDAQ: TSLA) has been at the center of investors’ attention for many years. On the one hand, it represents the world’s most recognizable EV maker, piloted by one of the world’s most recognized billionaires.', 'On the other hand, many have assessed that the firm is overvalued and priced as a big tech company and not a car maker – with Musk himself claiming the former on at least one occasion.', 'No matter the stock market sentiment toward the EV maker, the numbers paint a decisive yet curious picture. Throughout 2020, Tesla delivered approximately 499 million vehicles and made $31.5 billion in revenue. In the first three quarters of 2023, the company delivered more than 1.3 million vehicles and already accrued more than $72 billion in revenue.', 'Tesla’s earnings per share and profits have similarly vastly increased between 2020 and 2023 – $0.24 to $3.14 and $782 million to $11 billion, respectively. Despite the indisputable growth, the price of the EV maker’s s

In [17]:
len(texts)

1
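
The result is a single chunk, most likely because `CharacterTextSplitter` splits only on the given separator and this text appears to contain no `\n`; a piece longer than `chunk_size` is then kept whole. To show what size-based chunking with overlap looks like, here is a simplified plain-Python sketch. It is not LangChain's implementation, and the function name `chunk_by_characters` is hypothetical. Within LangChain, `RecursiveCharacterTextSplitter` is one option that falls back to smaller separators when a piece is too long.

```python
def chunk_by_characters(text, chunk_size=200, chunk_overlap=40):
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Usage with the text loaded above (assumed to be in scope):
# texts = chunk_by_characters(raw_text)
```

Cutting at fixed offsets can split words and sentences, which is why separator-aware splitters are usually preferred for real documents.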

In [18]:
vector_store.add_texts(texts)

APIConnectionError: Connection error.
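
`APIConnectionError` is the error the OpenAI client raises when it cannot reach the API, so the failure most likely happened while embedding the text, before anything was written to Qdrant. If the cause is a transient network problem, retrying with a growing delay may help; if the environment has no outbound network access, it will not. The sketch below is a generic retry helper under that assumption. The name `with_retries` is hypothetical, and it catches every exception purely for illustration.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait and retry, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage with the vector store defined above (assumed to be in scope):
# ids = with_retries(lambda: vector_store.add_texts(texts))
```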

In [None]:
# query the data

In [None]:
# Retriever: build a question-answering chain on top of the vector store

from langchain_openai import OpenAI
from langchain.chains import RetrievalQA


In [None]:
qa = RetrievalQA.from_chain_type(
    llm = OpenAI(),
    chain_type = "stuff",
    retriever = vector_store.as_retriever()
)

In [None]:
query = "What were Tesla's earnings per share?"
response = qa.invoke({"query": query})["result"]
print(response)

In [None]:
query = "Give me the top 5 key points of the news"
response = qa.invoke({"query": query})["result"]
print(response)
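
Before involving an LLM, it can help to inspect which chunks the retriever would hand to it. The sketch below assumes the LangChain vector store method `similarity_search(query, k=...)`, which returns documents exposing `page_content`; the helper name `top_chunks` is hypothetical.

```python
def top_chunks(store, query, k=3):
    """Return the text of the k stored chunks most similar to the query."""
    docs = store.similarity_search(query, k=k)
    return [doc.page_content for doc in docs]


# Usage with the vector store defined above (assumed to be in scope):
# for chunk in top_chunks(vector_store, "Tesla earnings per share"):
#     print(chunk)
```

If the returned chunks do not contain the answer, the problem lies in chunking or embedding rather than in the LLM prompt.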