# LlamaIndex + Pinecone 

In this tutorial, we show how to use LlamaIndex with Pinecone to answer complex queries over multiple data sources. We cover two kinds of query:
1. Compare-and-contrast queries over Wikipedia articles
2. Temporal queries (TBD)

In [1]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
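
The two calls above each attach a handler to the root logger, which is why every log line in the outputs below appears twice: once with the `INFO:<logger>:` prefix from `basicConfig`, and once bare from the extra `StreamHandler`. A minimal sketch of that effect, using a throwaway named logger and an in-memory stream (the helper name `emit_with_handlers` is hypothetical and not part of the notebook):

```python
import io
import logging


def emit_with_handlers(n_handlers: int) -> list:
    """Log one message through n_handlers stream handlers and return the lines written."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo_{n_handlers}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.handlers.clear()
    for _ in range(n_handlers):
        logger.addHandler(logging.StreamHandler(stream))
    logger.info("hello")
    return stream.getvalue().splitlines()


one = emit_with_handlers(1)  # a single handler writes the message once
two = emit_with_handlers(2)  # two handlers write the same message twice
```

If the duplicated output is unwanted, dropping the `addHandler` line from the cell above should be enough, since `basicConfig` already installs one handler.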

#### Load Dataset

Fetch and load Wikipedia pages

In [4]:
from llama_index import SimpleDirectoryReader

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.



In [5]:
wiki_titles = ["Toronto", "Seattle", "San Francisco", "Chicago", "Boston", "Washington, D.C.", "Cambridge, Massachusetts", "Houston"]

In [6]:
from pathlib import Path
import requests

data_path = Path('data_wiki')

for title in wiki_titles:
    response = requests.get(
        'https://en.wikipedia.org/w/api.php',
        params={
            'action': 'query',
            'format': 'json',
            'titles': title,
            'prop': 'extracts',
            # 'exintro': True,
            'explaintext': True,
        }
    ).json()
    page = next(iter(response['query']['pages'].values()))
    wiki_text = page['extract']

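    # create the output directory if needed (no error if it already exists)
    data_path.mkdir(parents=True, exist_ok=True)

    # write as UTF-8 so non-ASCII characters in the articles are preserved
    with open(data_path / f"{title}.txt", 'w', encoding='utf-8') as fp:
        fp.write(wiki_text)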
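
The loop above assumes every title resolves to a page that has an `extract` field. For a title that does not exist, the MediaWiki API returns a page entry marked `missing`, and `page['extract']` raises a `KeyError`. A hedged sketch of a small guard (the helper name `extract_page_text` is hypothetical) that could wrap the parsing step:

```python
def extract_page_text(api_response: dict) -> str:
    """Return the plain-text extract of the first page in a MediaWiki query response.

    Raises ValueError when the response has no pages or the page has no extract.
    """
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("response contains no pages")
    page = next(iter(pages.values()))
    if "missing" in page or "extract" not in page:
        raise ValueError(f"no extract for page {page.get('title')!r}")
    return page["extract"]


# Example with a hand-written response dict (no network call is made here).
sample = {"query": {"pages": {"123": {"title": "Toronto", "extract": "Toronto is a city."}}}}
text = extract_page_text(sample)
```

Inside the download loop, `wiki_text = extract_page_text(response)` would replace the two lines that index into `response` directly.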


In [7]:
# Load all wiki documents
city_docs = {}
all_docs = []
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(input_files=[data_path / f"{wiki_title}.txt"]).load_data()
    all_docs.extend(city_docs[wiki_title])


#### Creating a Pinecone Index

In [8]:
import pinecone

In [9]:
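import os

# read the API key from the environment instead of hard-coding it in the notebook
# (the variable name PINECONE_API_KEY is an assumption; use whatever name you export)
api_key = os.environ["PINECONE_API_KEY"]
pinecone.init(api_key=api_key, environment="eu-west1-gcp")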

In [None]:
# create index if it does not already exist
# dimension 1536 matches text-embedding-ada-002
if "quickstart-index" not in pinecone.list_indexes():
    pinecone.create_index("quickstart-index", dimension=1536, metric="euclidean", pod_type="p1")

In [11]:
pinecone_index = pinecone.Index("quickstart-index")
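
The index above is created with `metric="euclidean"`. For unit-length vectors, squared Euclidean distance equals `2 - 2 * cosine_similarity`, so both metrics rank neighbors in the same order. OpenAI describes its embeddings as normalized to length 1; treat that as an assumption here for text-embedding-ada-002. The sketch below checks the identity on small hand-made vectors, not real embeddings:

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity; for unit vectors this is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors standing in for embeddings (illustrative values only).
query = normalize([1.0, 2.0, 3.0])
candidates = {
    "a": normalize([1.0, 2.0, 3.1]),
    "b": normalize([3.0, 2.0, 1.0]),
    "c": normalize([-1.0, 0.0, 1.0]),
}

# Highest cosine similarity first vs. smallest Euclidean distance first.
by_cosine = sorted(candidates, key=lambda n: cosine(query, candidates[n]), reverse=True)
by_euclid = sorted(candidates, key=lambda n: euclidean(query, candidates[n]))
```

Both lists come out in the same order, so the choice of metric mainly affects how the reported similarity scores read, not which nodes are retrieved, as long as the vectors really are unit length.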

#### Build Indices

In [12]:
from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.vector_stores import PineconeVectorStore
from IPython.display import Markdown, display

In [13]:
# Build index for each city document
city_indices = {}
index_summaries = {}
for wiki_title in wiki_titles:
    print(f"Building index for {wiki_title}")
    # create storage context
    vector_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace=wiki_title)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    
    # build index
    city_indices[wiki_title] = GPTVectorStoreIndex.from_documents(city_docs[wiki_title], storage_context=storage_context)

    # set summary text for city
    index_summaries[wiki_title] = f"Wikipedia articles about {wiki_title}"

Building index for Toronto
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 20744 tokens
> [build_index_from_nodes] Total embedding token usage: 20744 tokens
Building index for Seattle
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 16942 tokens
> [build_index_from_nodes] Total embedding token usage: 16942 tokens
Building index for San Francisco
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_inde

#### Build Graph Query Engine for Compare & Contrast Query

In [14]:
from llama_index.indices.composability import ComposableGraph
from llama_index.indices.keyword_table.simple_base import GPTSimpleKeywordTableIndex

In [15]:
graph = ComposableGraph.from_indices(
    GPTSimpleKeywordTableIndex,
    list(city_indices.values()),
    list(index_summaries.values()),
    max_keywords_per_chunk=50
)

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens
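
The root keyword table index routes a query to the city indices whose summaries share keywords with it. The following is a toy sketch of that routing idea only, not LlamaIndex's implementation; `keywords`, `route_query`, the regex tokenizer, and the stopword list are all hypothetical:

```python
import re

# Words that appear in every summary or carry no routing signal (illustrative list).
STOPWORDS = {"wikipedia", "articles", "about", "the", "and", "in", "of"}


def keywords(text: str) -> set:
    """Lowercase alphabetic tokens of the text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def route_query(query: str, summaries: dict) -> list:
    """Return the titles whose summary shares at least one keyword with the query."""
    query_kw = keywords(query)
    return [title for title, summary in summaries.items() if query_kw & keywords(summary)]


summaries = {
    title: f"Wikipedia articles about {title}"
    for title in ["Toronto", "Seattle", "San Francisco", "Houston"]
}
matched = route_query(
    "Compare and contrast the demographics in Seattle, Houston, and Toronto.",
    summaries,
)
```

Here `matched` contains Toronto, Seattle, and Houston but not San Francisco, which mirrors the `Extracted keywords` line in the query log later in the notebook.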


In [16]:
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine

decompose_transform = DecomposeQueryTransform(verbose=True)

custom_query_engines = {}
for wiki_title in wiki_titles:
    index = city_indices[wiki_title]
    query_engine = index.as_query_engine()
    query_engine = TransformQueryEngine(
        query_engine,
        query_transform=decompose_transform,
        transform_extra_info={'index_summary': index_summaries[wiki_title]},
    )
    custom_query_engines[index.index_id] = query_engine

custom_query_engines[graph.root_id] = graph.root_index.as_query_engine(
    retriever_mode='simple',
    response_mode='tree_summarize',
)


In [17]:
# with query decomposition in subindices
query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)

#### Run Compare & Contrast Query

In [18]:
response = query_engine.query("Compare and contrast the demographics in Seattle, Houston, and Toronto.")

INFO:llama_index.indices.keyword_table.retrievers:> Starting query: Compare and contrast the demographics in Seattle, Houston, and Toronto.
> Starting query: Compare and contrast the demographics in Seattle, Houston, and Toronto.
INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['contrast', 'demographics', 'seattle', 'toronto', 'compare', 'houston']
query keywords: ['contrast', 'demographics', 'seattle', 'toronto', 'compare', 'houston']
INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['seattle', 'toronto', 'houston']
> Extracted keywords: ['seattle', 'toronto', 'houston']
> Current query: Compare and contrast the demographics in Seattle, Houston, and Toronto.
> New query:  What is the population of Seattle?
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embed

In [19]:
from llama_index.response.notebook_utils import display_response

display_response(response, show_source=False)

**`Final Response:`** Seattle, Houston, and Toronto are all large cities with diverse populations. Seattle has the smallest population of the three cities, with 753,675 people as of 2021. Houston has the second largest population, with 2,304,580 people according to the 2020 U.S. census. Toronto has the largest population of the three cities, with 6,202,225 people in 2021. All three cities have a mix of different ethnicities and cultures, with Seattle having the most diverse population. Houston and Toronto have larger immigrant populations than Seattle. All three cities have a mix of different economic classes, with Toronto having the highest median household income.

### Comparison to a Single Vector Index over all Documents

In [20]:
# also set up a global vector index
vector_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace='all wikis')
storage_context = StorageContext.from_defaults(vector_store=vector_store) 
global_index = GPTVectorStoreIndex.from_documents(all_docs, storage_context=storage_context)

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 162197 tokens
> [build_index_from_nodes] Total embedding token usage: 162197 tokens


In [21]:
global_query_engine = global_index.as_query_engine(similarity_top_k=4)

In [23]:
response = global_query_engine.query("Compare and contrast the demographics in Seattle, Houston, and Toronto.")

INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 13 tokens
> [retrieve] Total embedding token usage: 13 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 4526 tokens
> [get_response] Total LLM token usage: 4526 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens


In [24]:
display_response(response, show_source=False)

**`Final Response:`** Seattle, Houston, and Toronto are all major cities with diverse populations. Seattle has a population of approximately 753,000 people, while Houston has a population of 2.3 million and Toronto has a population of 2.9 million. 

Seattle has a notably large lesbian, gay, bisexual, and transgender community, with 12.9% of city residents identifying as LGBT. Houston and Toronto have smaller LGBT populations, with 8.8% and 8.2% of their respective populations identifying as LGBT. 

Seattle has a higher percentage of college and university graduates than the national average, with 53.8% of the population over the age of 25 holding a bachelor's degree or higher. Houston and Toronto have lower percentages of college and university graduates, with 37.2% and 43.2% of their respective populations over the age of 25 holding a bachelor's degree or higher. 

Seattle has a higher percentage of foreign-born residents than Houston and Toronto, with 15.2% of the population being foreign-born. Houston and Toronto have lower percentages of foreign-born residents, with 13.7% and 44.7% of their respective populations being foreign-born. Seattle also has a higher percentage of white residents than Houston

In [28]:
display_response(response, show_source=True, source_length=1000)

**`Final Response:`** Seattle, Houston, and Toronto are all major cities with diverse populations. Seattle has a population of approximately 753,000 people, while Houston has a population of 2.3 million and Toronto has a population of 2.9 million. 

Seattle has a notably large lesbian, gay, bisexual, and transgender community, with 12.9% of city residents identifying as LGBT. Houston and Toronto have smaller LGBT populations, with 8.8% and 8.2% of their respective populations identifying as LGBT. 

Seattle has a higher percentage of college and university graduates than the national average, with 53.8% of the population over the age of 25 holding a bachelor's degree or higher. Houston and Toronto have lower percentages of college and university graduates, with 37.2% and 43.2% of their respective populations over the age of 25 holding a bachelor's degree or higher. 

Seattle has a higher percentage of foreign-born residents than Houston and Toronto, with 15.2% of the population being foreign-born. Houston and Toronto have lower percentages of foreign-born residents, with 13.7% and 44.7% of their respective populations being foreign-born. Seattle also has a higher percentage of white residents than Houston

---

**`Source Node 1/4`**

**Document ID:** 4365f83b-9f38-48bc-834c-a9b5caf84498<br>**Similarity:** 0.816627681<br>**Text:** Pacific Northwest (which has the lowest rate of church attendance in the United States and consistently reports the highest percentage of atheism), church attendance, religious belief, and political influence of religious leaders are much lower than in other parts of America. Seattle's political culture is very liberal and progressive for the United States, with over 80% of the population voting for the Democratic Party. All precincts in Seattle voted for Democratic Party candidate Barack Obama in the 2012 presidential election. In partisan elections for the Washington State Legislature and United States Congress, nearly all elections are won by Democrats. Although local elections are nonpartisan, most of the city's elected officials are known to be Democrats.In 1926, Seattle became the first major American city to elect a female mayor, Bertha Knight Landes. It has also elected an openly gay mayor, Ed Murray, and a third-party socialist councillor, Kshama Sawant. For the first time ...<br>

---

**`Source Node 2/4`**

**Document ID:** ecb39676-0c44-4855-8bf4-975bf026e39f<br>**Similarity:** 0.815992534<br>**Text:** in the United States.According to the ACS 1-year estimates, in 2018, the median income of a city household was $93,481, and the median income for a family was $130,656. 11.0% of the population and 6.6% of families were below the poverty line. Of people living in poverty, 11.4% were under the age of 18 and 10.9% were 65 or older.It is estimated that King County has 8,000 homeless people on any given night, and many of those live in Seattle. In September 2005, King County adopted a "Ten-Year Plan to End Homelessness", one of the near-term results of which is a shift of funding from homeless shelter beds to permanent housing.In recent years, the city has experienced steady population growth, and has been faced with the issue of accommodating more residents. In 2006, after growing by 4,000 citizens per year for the previous 16 years, regional planners expected the population of Seattle to grow by 200,000 people by 2040. However, former mayor Greg Nickels supported plans that would incre...<br>

---

**`Source Node 3/4`**

**Document ID:** 0b5a9857-9b03-455f-bbf7-2f87f84f23a0<br>**Similarity:** 0.81528604<br>**Text:** climate because it is cooler and wetter than a "true" Mediterranean climate, but shares the characteristic dry summer (which has a strong influence on the region's vegetation).Temperature extremes are moderated by the adjacent Puget Sound, greater Pacific Ocean, and Lake Washington. Thus extreme heat waves are rare in the Seattle area, as are very cold temperatures (below about 15 °F (−9 °C)). The Seattle area is the cloudiest region of the United States, due in part to frequent storms and lows moving in from the adjacent Pacific Ocean. With many more "rain days" than other major American cities, Seattle has a well-earned reputation for frequent rain. In an average year, at least 0.01 inches (0.25 mm) of precipitation falls on 150 days, more than nearly all U.S. cities east of the Rocky Mountains. However, because it often has merely a light drizzle falling from the sky for many days, Seattle actually receives significantly less rainfall (or other precipitation) overall than many ot...<br>

---

**`Source Node 4/4`**

**Document ID:** 85ded3af-3d00-4057-84cd-b186ff001426<br>**Similarity:** 0.811879575<br>**Text:** people were killed in an illegal gambling club in the Seattle Chinatown-International District. Beginning with Microsoft's 1979 move from Albuquerque, New Mexico, to nearby Bellevue, Washington, Seattle and its suburbs became home to a number of technology companies including Amazon, F5 Networks, RealNetworks, Nintendo of America, and T-Mobile. This success brought an influx of new residents with a population increase within city limits of almost 50,000 between 1990 and 2000, and saw Seattle's real estate become some of the most expensive in the country. In 1993, the movie Sleepless in Seattle brought the city further national attention, as did the television sitcom Frasier. The dot-com boom caused a great frenzy among the technology companies in Seattle but the bubble ended in early 2001.Seattle in this period attracted widespread attention as home to these many companies, but also by hosting the 1990 Goodwill Games and the APEC leaders conference in 1993, as well as through the wo...<br>
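
The four source nodes shown above all appear to come from the Seattle article, so the Houston and Toronto figures in the global-index answer are not backed by any retrieved text. One rough way to check coverage is to count how many retrieved chunks mention each city. This is a sketch under assumptions: the helper `count_city_mentions` is hypothetical, and plain substring matching is only a crude proxy for the document a chunk came from.

```python
from collections import Counter


def count_city_mentions(texts, titles):
    """Count, per title, how many texts mention it (case-insensitive substring match)."""
    counts = Counter({title: 0 for title in titles})
    for text in texts:
        lowered = text.lower()
        for title in titles:
            if title.lower() in lowered:
                counts[title] += 1
    return counts


# Shortened excerpts in the style of the source nodes above (illustrative only).
sample_texts = [
    "Seattle's political culture is very liberal and progressive",
    "many of those live in Seattle",
    "extreme heat waves are rare in the Seattle area",
    "Seattle and its suburbs became home to a number of technology companies",
]
coverage = count_city_mentions(sample_texts, ["Seattle", "Houston", "Toronto"])
```

In the notebook, the texts would come from the response's source nodes; the exact attribute names depend on the installed LlamaIndex version. A count of zero for a city named in the query suggests raising `similarity_top_k` or using the per-city graph built earlier.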