https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/qdrant.html

In [1]:
import os
import qdrant_client
import sys
sys.path.insert(0, '../')
import local_secrets as secrets
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Qdrant, Pinecone
from langchain.document_loaders import TextLoader
from llama_index.readers.qdrant import QdrantReader
from llama_index import GPTListIndex
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain
import ipywidgets as widgets
from IPython.display import clear_output
import pinecone
import langchain

In [2]:
os.environ['OPENAI_API_KEY'] = secrets.techstyle_openai_key
os.environ['PINECONE_API_KEY'] = secrets.techstyle_pinecone_api_key
print(langchain.__version__)

0.0.212


In [6]:
loader = TextLoader('../state_of_the_union.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
qdrant = Qdrant.from_documents(docs, embeddings, url='http://localhost:6333', collection_name="state_of_the_union",)

In [22]:
reader = QdrantReader(host="localhost")
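# Placeholder query vector: 2 * 768 = 1536 values, matching the 1536-dimensional OpenAI embeddings stored in the collection
query_vector = [0.3, 0.7] * 768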
documents = reader.load_data(collection_name="github", query_vector=query_vector, limit=5)
index = GPTListIndex.from_documents(documents)
print(len(documents[0].get_embedding()))
[doc.doc_id for doc in documents]

1536


['062f83e9a6a7963d21c71b6e9a79dd698aa7a9a9',
 '2f13b031f5179ebb3a591406e5e6a6ef5e2769de',
 'f5f98621416fb23d459bd8cdc6f41ae8f8ea8a53',
 'fff1e849717211a1bcc1d0154ecc6543bd646e5f',
 '103df1e6df518549cd29d0129bcf6217e8e9b0eb']

In [33]:
# https://qdrant.tech/articles/langchain-integration/
os.environ['OPENAI_API_KEY'] = secrets.techstyle_openai_key
llm = ChatOpenAI()
# pinecone
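pinecone.init(api_key=secrets.techstyle_pinecone_api_key, environment='us-east-1-aws')  # init() configures the module and returns None
# embedding_function must be a callable that embeds a query string, not the OpenAIEmbeddings class itself
retriever = Pinecone(index=pinecone.Index('ssk'), embedding_function=OpenAIEmbeddings().embed_query, text_key='text').as_retriever()
chain = load_qa_chain(llm=llm, chain_type='stuff')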

# Alternative questions tried earlier; only the uncommented assignment is used.
# question = 'What are the classes in the llama_index package?'
# question = 'How is the llama_index Node class used?'
# question = 'Show an example of how to use the llama_index Node class'
# question = 'How can several llama_indexes be composed?'
question = 'Explain llama_index nodes'
# documents = retriever.get_relevant_documents(question)
# chain.run(input_documents=documents, question=question)
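
The `chain_type='stuff'` chain above places every retrieved document into a single prompt. The following is a minimal sketch of that idea in plain Python, not LangChain's implementation: `Doc`, `stuff_prompt`, and the prompt wording are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def stuff_prompt(docs, question, separator="\n\n"):
    """Join all documents into one context block and build a single prompt."""
    context = separator.join(d.page_content for d in docs)
    return f"Context:\n{context}\n\nQuestion: {question}"


docs = [
    Doc("Nodes are chunks of a Document."),
    Doc("Indices are built over Nodes."),
]
prompt = stuff_prompt(docs, "Explain llama_index nodes")
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined documents fit in the model's context window.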

In [8]:
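# Minimal chat UI: a text box, a button, and an output area.
output = widgets.Output()
question = widgets.Text()  # note: rebinds `question` from a str to a Text widget
button = widgets.Button(description='Chat')

def on_button_clicked(b):
    # Retrieve documents for the typed question and print the chain's answer.
    with output:
        clear_output()
        documents = retriever.get_relevant_documents(question.value)
        print(chain.run(input_documents=documents, question=question.value))

button.on_click(on_button_clicked)
widgets.VBox([question, button, output])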

VBox(children=(Text(value=''), Button(description='Chat', style=ButtonStyle()), Output()))

In [25]:
print(chain.document_prompt)
print(len(documents))

PromptTemplate(input_variables=['page_content'], output_parser=None, partial_variables={}, template='{page_content}', template_format='f-string', validate_template=True)

In [3]:
#index = ('ssk')
#print(index.describe_index_stats())
vectorstore = Qdrant(client=qdrant_client.QdrantClient(url='http://localhost:6333'), collection_name='github', embeddings=OpenAIEmbeddings(), content_payload_key='text',metadata_payload_key="Payload",)
documents = vectorstore.similarity_search('How can several llama_indexes be composed?')
[doc for doc in documents]

[Document(page_content='# Composability\n\n\nLlamaIndex offers **composability** of your indices, meaning that you can build indices on top of other indices. This allows you to more effectively index your entire document tree in order to feed custom knowledge to GPT.\n\nComposability allows you to to define lower-level indices for each document, and higher-order indices over a collection of documents. To see how this works, imagine defining 1) a tree index for the text within each document, and 2) a list index over each tree index (one document) within your collection.\n\n### Defining Subindices\nTo see how this works, imagine you have 3 documents: `doc1`, `doc2`, and `doc3`.\n\n```python\ndoc1 = SimpleDirectoryReader(\'data1\').load_data()\ndoc2 = SimpleDirectoryReader(\'data2\').load_data()\ndoc3 = SimpleDirectoryReader(\'data3\').load_data()\n```\n\n![](/_static/composability/diagram_b0.png)\n\nNow let\'s define a tree index for each document. In Python, we have:\n\n```python\nindex
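
`similarity_search` embeds the query and returns the stored texts whose vectors are closest to it. The sketch below shows the ranking step in plain Python, assuming cosine similarity as the metric (the metric actually configured for the `github` collection is not shown in this notebook). The file names and two-dimensional vectors are made-up toy data, and `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=4):
    """Return the names of the k stored vectors most similar to the query."""
    scored = sorted(vectors.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in scored[:k]]


# Toy 2-dimensional vectors; real OpenAI embeddings here have 1536 dimensions.
vectors = {
    "composability.md": [1.0, 0.0],
    "index_guide.md": [0.7, 0.7],
    "unrelated.md": [0.0, 1.0],
}
ranked = top_k([1.0, 0.1], vectors, k=2)
print(ranked)
```

A vector database performs the same ranking with an approximate nearest-neighbor index rather than a full sort, which is what makes it practical for large collections.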

In [50]:
index = pinecone.Index('ssk')
print(index.describe_index_stats())
vectorstore = Pinecone.from_existing_index(index_name='ssk', embedding=OpenAIEmbeddings(), namespace='github-llama-index')
documents = vectorstore.similarity_search('How can several llama_indexes be composed?')
[doc.metadata for doc in documents]

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'github-llama-index': {'vector_count': 335}},
 'total_vector_count': 335}


[{'doc_id': 'cc1c9dba6db2572cc3b8db8427880ca8a7240366',
  'document_id': 'cc1c9dba6db2572cc3b8db8427880ca8a7240366',
  'file_name': 'composability.md',
  'file_path': 'docs/how_to/index_structs/composability.md',
  'node_info': '{"_node_type": "1"}',
  'ref_doc_id': 'cc1c9dba6db2572cc3b8db8427880ca8a7240366',
  'relationships': '{"1": "cc1c9dba6db2572cc3b8db8427880ca8a7240366"}'},
 {'doc_id': 'e62fe521e0df705f5651ca17f5656ca3b13cd533',
  'document_id': 'e62fe521e0df705f5651ca17f5656ca3b13cd533',
  'file_name': 'index_guide.md',
  'file_path': 'examples/multimodal/data/llama/index_guide.md',
  'node_info': '{"_node_type": "1"}',
  'ref_doc_id': 'e62fe521e0df705f5651ca17f5656ca3b13cd533',
  'relationships': '{"1": "e62fe521e0df705f5651ca17f5656ca3b13cd533"}'},
 {'doc_id': 'c24042be22bf069bc8c283c253fe4c2fcdca3a9a',
  'document_id': 'c24042be22bf069bc8c283c253fe4c2fcdca3a9a',
  'file_name': 'index_guide.md',
  'file_path': 'docs/guides/primer/index_guide.md',
  'node_info': '{"_node_type"

In [49]:
documents[0]

Document(page_content='# Composability\n\n\nLlamaIndex offers **composability** of your indices, meaning that you can build indices on top of other indices. This allows you to more effectively index your entire document tree in order to feed custom knowledge to GPT.\n\nComposability allows you to to define lower-level indices for each document, and higher-order indices over a collection of documents. To see how this works, imagine defining 1) a tree index for the text within each document, and 2) a list index over each tree index (one document) within your collection.\n\n### Defining Subindices\nTo see how this works, imagine you have 3 documents: `doc1`, `doc2`, and `doc3`.\n\n```python\ndoc1 = SimpleDirectoryReader(\'data1\').load_data()\ndoc2 = SimpleDirectoryReader(\'data2\').load_data()\ndoc3 = SimpleDirectoryReader(\'data3\').load_data()\n```\n\n![](/_static/composability/diagram_b0.png)\n\nNow let\'s define a tree index for each document. In Python, we have:\n\n```python\nindex1

In [34]:
llm = ChatOpenAI(model='gpt-3.5-turbo-16k')
retriever = Pinecone.from_existing_index(index_name='ssk', embedding=OpenAIEmbeddings(), namespace='github-llama-index').as_retriever()
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="map_reduce", retriever=retriever, return_source_documents=True)
result = qa({"query": 'How can several llama_indexes be composed? Show an example.'})
result['result']

'Several `llama_index` instances can be composed using the `IndexComposer` class. Here\'s an example:\n\n```python\nfrom llama_index import GPTVectorStoreIndex, IndexComposer\n\n# Create the first index\ndocuments1 = SimpleDirectoryReader(\'data1\').load_data()\nindex1 = GPTVectorStoreIndex.from_documents(documents1)\n\n# Create the second index\ndocuments2 = SimpleDirectoryReader(\'data2\').load_data()\nindex2 = GPTVectorStoreIndex.from_documents(documents2)\n\n# Compose the two indexes\ncomposer = IndexComposer()\ncomposer.add_index(index1)\ncomposer.add_index(index2)\n\n# Use the composed index for querying\ncomposed_index = composer.compose_index()\n\nquery_engine = composed_index.as_query_engine()\nresponse = query_engine.query("What did the author do growing up?")\nprint(response)\n```\n\nIn this example, we create two `GPTVectorStoreIndex` instances (`index1` and `index2`) from different sets of documents (`data1` and `data2`). We then use the `IndexComposer` class to compose th
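
With `chain_type="map_reduce"`, the question is first asked against each retrieved document separately, and the partial answers are then combined in a final call. Here is a minimal sketch of that control flow, assuming a stand-in model: `fake_llm`, `counting_llm`, and `map_reduce_answer` are hypothetical and no API is called.

```python
def map_reduce_answer(chunks, question, llm):
    """Answer a question per chunk (map), then combine the answers (reduce)."""
    # Map step: one model call per chunk.
    partial = [llm(f"{chunk}\n\nQuestion: {question}") for chunk in chunks]
    # Reduce step: one final call over the partial answers.
    summaries = "\n".join(partial)
    return llm(f"{summaries}\n\nQuestion: {question}")


def fake_llm(prompt):
    """Placeholder model: reports the prompt length instead of calling an API."""
    return f"<answer from {len(prompt)} chars>"


calls = []


def counting_llm(prompt):
    """Wrap fake_llm to record every prompt it receives."""
    calls.append(prompt)
    return fake_llm(prompt)


answer = map_reduce_answer(
    ["chunk one", "chunk two", "chunk three"],
    "How are indices composed?",
    counting_llm,
)
print(len(calls))  # one call per chunk plus one reduce call
```

The trade-off against the `stuff` approach is more model calls in exchange for never needing all documents in one prompt. Since the final answer is built from per-document summaries, it can drift from the source text, so it is worth checking against `result['source_documents']`.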

In [44]:
retriever = Qdrant(client=qdrant_client.QdrantClient(url='http://localhost:6333'), collection_name='github', embeddings=OpenAIEmbeddings(), content_payload_key='text').as_retriever()
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="map_reduce", retriever=retriever, return_source_documents=True)
result = qa({"query": 'How can several llama_indexes be composed? Show an example.'})
result['result']

'To compose several `llama_index` instances, you can use the `ComposableGraph` class provided by the LlamaIndex library. Here\'s an example:\n\n```python\nfrom llama_index import GPTVectorStoreIndex, GPTListIndex\nfrom llama_index.indices.composability import ComposableGraph\n\n# Create the individual indexes\nindex1 = GPTVectorStoreIndex.from_documents(documents1)\nindex2 = GPTListIndex.from_documents(documents2)\n\n# Compose the indexes into a single graph\ngraph = ComposableGraph.from_indices(GPTListIndex, [index1, index2], index_summaries=["summary1", "summary2"])\n\n# Use the composed index for querying\nquery_engine = graph.as_query_engine()\nresponse = query_engine.query("What is the answer to my question?")\nprint(response)\n```\n\nIn this example, `index1` and `index2` are two separate `llama_index` instances representing different sets of documents. The `ComposableGraph.from_indices()` method is used to compose these indexes into a single graph. \n\nYou need to provide the ty

In [47]:
result['source_documents'][0]

Document(page_content='# How Each Index Works\n\nThis guide describes how each index works with diagrams. We also visually highlight our "Response Synthesis" modes.\n\nSome terminology:\n- **Node**: Corresponds to a chunk of text from a Document. LlamaIndex takes in Document objects and internally parses/chunks them into Node objects.\n- **Response Synthesis**: Our module which synthesizes a response given the retrieved Node. You can see how to \n    [specify different response modes](setting-response-mode) here. \n    See below for an illustration of how each response mode works.\n\n## List Index\n\nThe list index simply stores Nodes as a sequential chain.\n\n![](/_static/indices/list.png)\n\n### Querying\n\nDuring query time, if no other query parameters are specified, LlamaIndex simply loads all Nodes in the list into\nour Reponse Synthesis module.\n\n![](/_static/indices/list_query.png)\n\nThe list index does offer numerous ways of querying a list index, from an embedding-based que