[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/llama-index/Using_LlamaIndex_with_Pinecone.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/llama-index/Using_LlamaIndex_with_Pinecone.ipynb)

# Using LlamaIndex with Pinecone

Accompanying notebook to this documentation {insert link}

# Set up environment


In [1]:
!pip install -qU \
    llama-index==0.9.34 \
    "pinecone-client[grpc]"==3.0.0 \
    arxiv==2.1.0


In [2]:
import os
from getpass import getpass

pinecone_api_key = os.getenv("PINECONE_API_KEY") or getpass("Enter your Pinecone API Key: ")
openai_api_key = os.getenv("OPENAI_API_KEY") or getpass("Enter your OpenAI API Key: ")


Enter your Pinecone API Key: ··········
Enter your OpenAI API Key: ··········


In [3]:
# Notebook runs on Python version:
!python -V

Python 3.10.12


# Load the data


In [7]:
import arxiv
from pathlib import Path
from llama_index import download_loader

# Download paper to local file system (LFS)
# `id_list` contains 1 item that matches our PDF's arXiv ID
paper = next(arxiv.Client().results(arxiv.Search(id_list=["1603.09320"])))
paper.download_pdf(filename="hnsw.pdf")

# Download and instantiate `PDFReader` from LlamaHub
PDFReader = download_loader("PDFReader")
loader = PDFReader()

# Load HNSW PDF from LFS
documents = loader.load_data(file=Path('./hnsw.pdf'))

# Preview one of our documents
documents[0]

Document(id_='8ace8b98-2057-4113-ac54-dcb474cbf89b', embedding=None, metadata={'page_label': '1', 'file_name': 'hnsw.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='786d6f516cefedb35510cd9443b8bbef8256f606a479688b4ade59c548117475', text="IEEE TRANSACTIONS ON  JOURNAL NAME,  MANUS CRIPT ID  1 \n Efficient and robust approximate nearest \nneighbor search using Hierarchical Navigable \nSmall World graphs  \nYu. A. Malkov,  D. A. Yashunin  \nAbstract  — We present a new approach for the approximate K -nearest neighbor search based on navigable small world \ngraphs with controllable hierarchy (Hierarchical NSW , HNSW ). The proposed solution is fully graph -based, without any need for \nadditional search structures, which are typically used at the coarse search stage of the most proximity graph techniques. \nHierarchical NSW incrementally builds  a multi -layer structure consisting from hierarchical set of proximity graphs (layers) for \nnested

In [8]:
# Clean up our Documents' content
import re

def clean_up_text(content: str) -> str:
    """
    Remove unwanted characters and patterns in text input.

    :param content: Text input.

    :return: Cleaned version of original text input.
    """

    # Fix hyphenated words broken by newline
    content = re.sub(r'(\w+)-\n(\w+)', r'\1\2', content)

    # Remove specific unwanted patterns and characters
    unwanted_patterns = [
        "\\n", "  —", "——————————", "—————————", "—————",
        r'\\u[\dA-Fa-f]{4}', r'\uf075', r'\uf0b7'
    ]
    for pattern in unwanted_patterns:
        content = re.sub(pattern, "", content)

    # Fix improperly spaced hyphenated words and normalize whitespace
    content = re.sub(r'(\w)\s*-\s*(\w)', r'\1-\2', content)
    content = re.sub(r'\s+', ' ', content)

    return content

# Apply the clean-up function to every Document (each Document's text is updated in place)
cleaned_docs = []
for d in documents:
    cleaned_text = clean_up_text(d.text)
    d.text = cleaned_text
    cleaned_docs.append(d)

In [9]:
cleaned_docs[0].get_content()

"IEEE TRANSACTIONS ON JOURNAL NAME, MANUS CRIPT ID 1 Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs Yu. A. Malkov, D. A. Yashunin Abstract We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW , HNSW ). The proposed solution is fully graph-based, without any need for additional search structures, which are typically used at the coarse search stage of the most proximity graph techniques. Hierarchical NSW incrementally builds a multi-layer structure consisting from hierarchical set of proximity graphs (layers) for nested subsets of the stored elements. The maximum layer in which an element is present is selected randomly with an exponentially decaying probability distribution. This allows producing graphs similar to the previously studied Navigable Sma ll World (NSW) struc tures while additionally having the links separated by the
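Most of the cleanup comes from three of the regexes in `clean_up_text`. The sketch below re-applies only those three to a made-up sample string (not taken from the PDF) so you can see what each one does. It omits the unwanted-pattern removal step.

```python
import re

def rejoin_and_normalize(text: str) -> str:
    # Rejoin words that the PDF extractor split with a hyphen plus a newline
    text = re.sub(r'(\w+)-\n(\w+)', r'\1\2', text)
    # Remove spaces around hyphens, e.g. "K -nearest" -> "K-nearest"
    text = re.sub(r'(\w)\s*-\s*(\w)', r'\1-\2', text)
    # Collapse every run of whitespace (including newlines) into a single space
    text = re.sub(r'\s+', ' ', text)
    return text

sample = "approximate K -nearest neigh-\nbor search \n using  graphs"
print(rejoin_and_normalize(sample))
```

The hyphen-plus-newline rule has to run before whitespace is collapsed, because after that step the newline it relies on is gone.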

# Transform the data

## Metadata

In [10]:
# The default metadata (page label and file name) says little about the content
cleaned_docs[0].metadata

{'page_label': '1', 'file_name': 'hnsw.pdf'}

In [11]:
# Helpful metadata additions
metadata_additions = {
    "authors": ["Yu. A. Malkov", "D. A. Yashunin"],
    "title": "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs",
}

# Iterate through `cleaned_docs` and add our new key:value pairs (updates each dict in place)
for cd in cleaned_docs:
    cd.metadata.update(metadata_additions)

# Let's confirm everything worked:
cleaned_docs[0].metadata

{'page_label': '1',
 'file_name': 'hnsw.pdf',
 'authors': ['Yu. A. Malkov', 'D. A. Yashunin'],
 'title': 'Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs'}

## Ingestion pipeline

In [12]:
from llama_index.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings import OpenAIEmbedding
from llama_index.ingestion import IngestionPipeline

# This will be the model we use both for Node parsing and for vectorization
embed_model = OpenAIEmbedding(api_key=openai_api_key)

# Define the initial pipeline
pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,
            breakpoint_percentile_threshold=95,
            embed_model=embed_model,
            ),
        embed_model,
        ],
    )
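`SemanticSplitterNodeParser` chooses chunk boundaries from embedding similarity rather than from a fixed token count. Roughly, it compares the embeddings of consecutive sentence groups and starts a new chunk wherever the cosine distance is above the `breakpoint_percentile_threshold` percentile of all those distances. The sketch below is a simplified, dependency-free illustration of that idea, not LlamaIndex's code. It skips the `buffer_size` grouping, and toy 2-D vectors stand in for OpenAI embeddings.

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def percentile(values, q):
    # Linear interpolation between closest ranks (numpy.percentile's default method)
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    low, high = math.floor(pos), math.ceil(pos)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)

def semantic_split(sentences, embeddings, threshold_pct=95):
    if len(sentences) < 2:
        return [" ".join(sentences)]
    # Distance between each pair of consecutive sentence embeddings
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for i, dist in enumerate(distances):
        if dist > cutoff:
            # Large jump in meaning: close the current chunk before sentence i + 1
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks

# Toy "embeddings": the first two sentences point one way, the last two another
sentences = [
    "HNSW builds layered graphs.",
    "Upper layers hold long links.",
    "Pinecone is a vector database.",
    "It stores embeddings.",
]
embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
chunks = semantic_split(sentences, embeddings)
print(chunks)
```

A higher percentile means fewer, larger chunks. Only the biggest jumps in meaning become boundaries.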


# Upsert the data

In [14]:
from pinecone.grpc import PineconeGRPC
from pinecone import ServerlessSpec

from llama_index.vector_stores import PineconeVectorStore

# Initialize connection to Pinecone
pc = PineconeGRPC(api_key=pinecone_api_key)
index_name = "llama-integration-example"

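# Create your index only if it doesn't already exist
# (1536 matches the output dimension of OpenAI's default text-embedding-ada-002 model)
if index_name not in pc.list_indexes().names():
    pc.create_index(
        index_name,
        dimension=1536,
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),
    )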

# Initialize your index
pinecone_index = pc.Index(index_name)

# Initialize VectorStore
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)


In [12]:
# Our pipeline with the addition of our PineconeVectorStore
pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,
            breakpoint_percentile_threshold=95,
            embed_model=embed_model,
        ),
        embed_model,
    ],
    vector_store=vector_store,  # Our new addition
)

# Now we run our pipeline!
pipeline.run(documents=cleaned_docs)


Upserted vectors:   0%|          | 0/46 [00:00<?, ?it/s]

[TextNode(id_='1363c481-8016-4b69-bc55-7bdb99667a09', embedding=[-0.0019414897542446852, 0.026176759973168373, 0.0069093164056539536, -0.015489788725972176, 0.0006429210188798606, 0.004423647653311491, -0.008243432268500328, -0.01650090701878071, -0.03148513659834862, -0.05153900757431984, -0.0050169783644378185, 0.0018150998512282968, -0.00043973163701593876, 0.01759628765285015, 0.004725579172372818, 0.009661808609962463, 0.022300802171230316, -0.0036547754425555468, 0.01020247582346201, -0.004258638713508844, -0.031681742519140244, 0.0038127629086375237, -0.008770057000219822, -0.03302990272641182, 0.0014060880057513714, 0.03395676240324974, 0.015574048273265362, -0.01136105041950941, -0.005684036295861006, -0.015307225286960602, 0.020433038473129272, -0.02086838148534298, -0.00025321872089989483, 0.019787047058343887, -0.014345257543027401, 0.003244008170440793, 0.018452929332852364, -0.022230584174394608, 0.02852199412882328, 0.0034090173430740833, -0.0019678210373967886, -0.01874
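The `Upserted vectors: 0/46` progress bar shows the vector store writing the 46 embedded Nodes to Pinecone. Conceptually, each Node becomes a record with an ID, its embedding values, and its metadata, and the records are sent to the index in batches. The sketch below is a hypothetical illustration of that bookkeeping in plain Python, with dummy 4-value vectors. The helper names and the batch size of 20 are arbitrary choices for the example, not LlamaIndex's internals, and nothing is sent to Pinecone.

```python
def make_records(node_ids, vectors, metadatas):
    # One Pinecone-style record per Node: an ID, the embedding values, and metadata
    return [
        {"id": node_id, "values": vector, "metadata": metadata}
        for node_id, vector, metadata in zip(node_ids, vectors, metadatas)
    ]

def batched(records, batch_size=100):
    # Yield successive slices so that no single upsert request grows too large
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# 46 dummy Nodes, matching the count in the progress bar above
node_ids = [f"node-{i}" for i in range(46)]
vectors = [[0.0] * 4 for _ in node_ids]  # stand-ins for 1536-dimensional embeddings
metadatas = [{"file_name": "hnsw.pdf"} for _ in node_ids]

records = make_records(node_ids, vectors, metadatas)
batch_sizes = [len(batch) for batch in batched(records, batch_size=20)]
print(batch_sizes)
```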

In [15]:
pinecone_index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 46}},
 'total_vector_count': 46}

# Query the data

In [17]:
from llama_index import VectorStoreIndex
from llama_index.retrievers import VectorIndexRetriever

# LlamaIndex's default LLM and embedding model read the OpenAI key from the
# OPENAI_API_KEY environment variable, so set it now if it wasn't set before
if not os.getenv('OPENAI_API_KEY'):
    os.environ['OPENAI_API_KEY'] = openai_api_key

# Instantiate VectorStoreIndex object from our vector_store object
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

# Configure a retriever that returns the 5 most similar Nodes
retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=5)

# Query the vector DB
answer = retriever.retrieve('How does logarithmic complexity affect graph construction?')

# Inspect the results
print([i.get_content() for i in answer])

['AUTHOR ET AL.: TITL E 7 be auto-configured by using sample data. The construction process can be easily and efficiently parallelized with only few synchronization points (as demonstrated in Fig. 9) and no measurable effect on index quality. Construction speed/index q uality tradeoff is co ntrolled via the efConstruction parameter. The tradeoff between the search time and the index construction time is presented in Fig. 10 for a 10M SIFT dataset and shows that a reasonable quality index can be constructed for efConstruct ion=100 on a 4X 2.4 GHz 10-core Xeon E5-4650 v2 CPU server in just 3 minutes. Further increase of the efConstruction leads to little extra performance but in exchange of significantly longer construction time. 4.2 Complexity analysis 4.2.1 Search complex ity The complexity scaling of a single search can be strictly analyzed under the assumption that we build exact D elaunay graphs instead of the approximate ones. Suppose we have found the closest element on some layer

In [18]:
# See how many Nodes we retrieved (should be 5, because we set similarity_top_k=5)

len(answer)

5
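`similarity_top_k=5` asks for the 5 stored vectors closest to the query embedding. Pinecone answers this server-side with an approximate index, and `create_index` uses the cosine metric when none is given, as above. The brute-force sketch below shows the exact ranking that an approximate search tries to reproduce. It uses toy 3-D vectors and made-up IDs in place of real embeddings.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query, vectors, k=5):
    # Score every stored vector, then keep the k highest (exact, brute-force search)
    scored = ((cosine_similarity(query, vec), node_id) for node_id, vec in vectors.items())
    return heapq.nlargest(k, scored)

# Hypothetical stored vectors keyed by Node ID
store = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.05, 0.0], store, k=2)
print([node_id for _, node_id in results])
```

Scoring every vector costs time linear in the number of stored vectors. Graph indexes such as HNSW, the subject of the paper indexed here, avoid that cost at scale.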

# Build a RAG app with the data

In [19]:
from llama_index.query_engine import RetrieverQueryEngine

# Pass in your retriever from above, which is configured to return the top 5 results
query_engine = RetrieverQueryEngine(retriever=retriever)

# Now you query:
llm_query = query_engine.query('How does logarithmic complexity affect graph construction?')

llm_query.response


'Logarithmic complexity in graph construction allows for efficient and scalable routing in the graph. It achieves this by organizing the graph into different layers based on their length scale. The search procedure starts from the top layer, which contains the longest links, and greedily traverses through the elements until a local minimum is reached. Then, the search switches to the lower layer with shorter links and repeats the process. By separating the links into layers, the maximum number of connections per element in all layers can be kept constant, resulting in a logarithmic complexity scaling for routing in the graph. This logarithmic complexity is achieved by selecting an integer level for each element, which determines the maximum layer it belongs to. The construction algorithm incrementally builds a proximity graph for each layer, and the search procedure is an iterative greedy search starting from the top layer and finishing at the zero layer. Overall, logarithmic complexit
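`RetrieverQueryEngine` chains the two steps: it runs the retriever, then sends the retrieved Node text together with the question to the LLM. The sketch below is a simplified illustration of that "put the context in the prompt" step. `QA_TEMPLATE` and `build_prompt` are hypothetical and are not LlamaIndex's exact default prompt. The sketch also ignores token limits, which LlamaIndex handles by packing or refining the context across LLM calls.

```python
# Hypothetical prompt template; LlamaIndex's built-in wording may differ
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only the context above, answer the question.\n"
    "Question: {question}\n"
    "Answer: "
)

def build_prompt(question, chunks):
    # Separate retrieved chunks with blank lines so the LLM sees distinct passages
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return QA_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    "How does logarithmic complexity affect graph construction?",
    ["Chunk one text. ", "Chunk two text."],
)
print(prompt)
```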

In [61]:
# To get the context (Nodes) used for the answer, use the `.source_nodes` attribute.
# Let's inspect their content:

llm_response_source_nodes = [i.get_content() for i in llm_query.source_nodes]
llm_response_source_nodes

['AUTHOR ET AL.: TITL E 7 be auto-configured by using sample data. The construction process can be easily and efficiently parallelized with only few synchronization points (as demonstrated in Fig. 9) and no measurable effect on index quality. Construction speed/index q uality tradeoff is co ntrolled via the efConstruction parameter. The tradeoff between the search time and the index construction time is presented in Fig. 10 for a 10M SIFT dataset and shows that a reasonable quality index can be constructed for efConstruct ion=100 on a 4X 2.4 GHz 10-core Xeon E5-4650 v2 CPU server in just 3 minutes. Further increase of the efConstruction leads to little extra performance but in exchange of significantly longer construction time. 4.2 Complexity analysis 4.2.1 Search complex ity The complexity scaling of a single search can be strictly analyzed under the assumption that we build exact D elaunay graphs instead of the approximate ones. Suppose we have found the closest element on some layer

# Evaluate the data

In [57]:
from llama_index.evaluation import RelevancyEvaluator

# Jupyter already runs an asyncio event loop; nest_asyncio lets the
# evaluator's async calls run inside it without a "loop already running" error
import nest_asyncio
nest_asyncio.apply()

# Define evaluator (uses LlamaIndex's default LLM as the judge)
evaluator = RelevancyEvaluator()

# Issue query
query = "How does logarithmic complexity affect graph construction?"
llm_response = query_engine.query(query)

# Grab the context used to answer the query
llm_response_source_nodes = [i.get_content() for i in llm_response.source_nodes]

# Pass in the same question and the response we got above
eval_result = evaluator.evaluate_response(query=query, response=llm_response)

print(
    f"\nGiven the {len(llm_response_source_nodes)} chunks of content (below), "
    f"is your LLM's response relevant? {eval_result.passing}\n"
    f"\n----Contexts-----\n"
    f"\n{llm_response_source_nodes}"
)



Given the 5 chunks of content (below), is your LLM's response relevant? True
         
 ----Contexts----- 
         
['AUTHOR ET AL.: TITL E 7 be auto-configured by using sample data. The construction process can be easily and efficiently parallelized with only few synchronization points (as demonstrated in Fig. 9) and no measurable effect on index quality. Construction speed/index q uality tradeoff is co ntrolled via the efConstruction parameter. The tradeoff between the search time and the index construction time is presented in Fig. 10 for a 10M SIFT dataset and shows that a reasonable quality index can be constructed for efConstruct ion=100 on a 4X 2.4 GHz 10-core Xeon E5-4650 v2 CPU server in just 3 minutes. Further increase of the efConstruction leads to little extra performance but in exchange of significantly longer construction time. 4.2 Complexity analysis 4.2.1 Search complex ity The complexity scaling of a single search can be strictly analyzed under the assumption that we

In [58]:
# For good measure, let's print out the context so that we can see it more clearly:

for node_text in llm_response_source_nodes:
    print(f'\n{node_text}')


AUTHOR ET AL.: TITL E 7 be auto-configured by using sample data. The construction process can be easily and efficiently parallelized with only few synchronization points (as demonstrated in Fig. 9) and no measurable effect on index quality. Construction speed/index q uality tradeoff is co ntrolled via the efConstruction parameter. The tradeoff between the search time and the index construction time is presented in Fig. 10 for a 10M SIFT dataset and shows that a reasonable quality index can be constructed for efConstruct ion=100 on a 4X 2.4 GHz 10-core Xeon E5-4650 v2 CPU server in just 3 minutes. Further increase of the efConstruction leads to little extra performance but in exchange of significantly longer construction time. 4.2 Complexity analysis 4.2.1 Search complex ity The complexity scaling of a single search can be strictly analyzed under the assumption that we build exact D elaunay graphs instead of the approximate ones. Suppose we have found the closest element on some layer 
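`eval_result.passing` is a single True/False verdict for one question. To judge the pipeline more broadly, you could run several questions and report the share that pass. The sketch below assumes the judge LLM replies with a plain YES or NO, and it uses hypothetical stand-in replies. `parse_verdict` and `pass_rate` are illustrative helpers, not how `RelevancyEvaluator` parses its output internally.

```python
def parse_verdict(judge_reply: str) -> bool:
    # Assumption: the judge LLM was asked to begin its answer with YES or NO
    return judge_reply.strip().upper().startswith("YES")

def pass_rate(judge_replies):
    # Fraction of questions whose response was judged relevant
    verdicts = [parse_verdict(reply) for reply in judge_replies]
    return sum(verdicts) / len(verdicts)

# Hypothetical judge replies for three different questions
replies = ["YES", "No, the context is about index size.", "yes"]
print(pass_rate(replies))
```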