# Using LlamaIndex with Pinecone

Test complex queries over both text-davinci-003 and ChatGPT

In [None]:
!pip install llama-index

In [25]:
# My OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = ""

In [1]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [16]:
from llama_index import (
    GPTPineconeIndex, 
    GPTSimpleKeywordTableIndex, 
    GPTListIndex, 
    SimpleDirectoryReader,
    LLMPredictor,
    ServiceContext
)
from langchain.llms.openai import OpenAIChat, OpenAI
import requests

#### Load Datasets

Load the Wikipedia page for each city.

In [3]:
wiki_titles = ["Toronto", "Seattle", "San Francisco", "Chicago", "Boston", "Washington, D.C.", "Cambridge, Massachusetts", "Houston"]
pinecone_titles = ["toronto", "seattle", "san-francisco", "chicago", "boston", "dc", "cambridge", "houston"]

In [10]:
from pathlib import Path

import requests
for title in wiki_titles:
    response = requests.get(
        'https://en.wikipedia.org/w/api.php',
        params={
            'action': 'query',
            'format': 'json',
            'titles': title,
            'prop': 'extracts',
            # 'exintro': True,
            'explaintext': True,
        }
    ).json()
    page = next(iter(response['query']['pages'].values()))
    wiki_text = page['extract']

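    # Create the output directory if it does not exist yet
    data_path = Path('data')
    data_path.mkdir(exist_ok=True)

    # Write as UTF-8 so non-ASCII characters in the article are preserved
    with open(data_path / f"{title}.txt", 'w', encoding='utf-8') as fp:
        fp.write(wiki_text)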
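
The download loop above assumes every title resolves to a page with an `extract` field. A minimal sketch of a defensive parser follows, assuming the usual MediaWiki `action=query` response shape in which an unknown title comes back with a `missing` key. The helper name `extract_page_text` and the two sample responses are hypothetical and hand-written for illustration; they are not real Wikipedia data.

```python
def extract_page_text(api_response: dict) -> str:
    """Return the plain-text extract from a MediaWiki `action=query` response.

    Raises ValueError if the page is missing or has no extract.
    """
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("response contains no pages")
    # The notebook requests one title at a time, so take the first page.
    page = next(iter(pages.values()))
    if "missing" in page:
        raise ValueError(f"page not found: {page.get('title')!r}")
    text = page.get("extract")
    if not text:
        raise ValueError(f"no extract for page: {page.get('title')!r}")
    return text


# Hand-written stand-ins for API responses (hypothetical, not real data).
ok_response = {
    "query": {"pages": {"123": {"title": "Boston", "extract": "Boston is a city."}}}
}
missing_response = {"query": {"pages": {"-1": {"title": "Bostn", "missing": ""}}}}

print(extract_page_text(ok_response))
try:
    extract_page_text(missing_response)
except ValueError as err:
    print(f"skipped: {err}")
```

In the loop, `wiki_text = extract_page_text(response)` could replace the two lines that index into `response['query']['pages']`, so a misspelled title fails with a readable message instead of a `KeyError`.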


In [4]:
# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(input_files=[f"data/{wiki_title}.txt"]).load_data()


### Initialize Pinecone Indexes

In [5]:
import pinecone

In [6]:
api_key = ""
pinecone.init(api_key=api_key, environment="us-west1-gcp")

In [7]:
index = pinecone.Index("quickstart")

### Building the document indices
Build a Pinecone-backed vector index for each city's Wikipedia page.

In [17]:
# LLM Predictor (gpt-3.5-turbo)
llm_predictor_chatgpt = LLMPredictor(llm=OpenAIChat(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt)

In [None]:
# Build city document index
city_indices = {}
for pinecone_title, wiki_title in zip(pinecone_titles, wiki_titles):
    metadata_filters = {"wiki_title": wiki_title}
    city_indices[wiki_title] = GPTPineconeIndex.from_documents(
        city_docs[wiki_title], pinecone_index=index, metadata_filters=metadata_filters
    )
    # use the Pinecone title as the index ID so query configs can reference it
    city_indices[wiki_title].index_struct.doc_id = pinecone_title
    city_indices[wiki_title].save_to_disk(f'index_{wiki_title}.json')

### Loading the indices
Load the city indices saved in the previous step instead of rebuilding them.

In [18]:
# If indices already saved, try loading
city_indices = {}
for wiki_title in wiki_titles:
    city_indices[wiki_title] = GPTPineconeIndex.load_from_disk(
      f'index_{wiki_title}.json', pinecone_index=index
    )

### Query Index

In [20]:
response = city_indices["Boston"].query(
    "Tell me about the arts and culture of Boston",
    service_context=service_context
)

INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 4381 tokens
> [query] Total LLM token usage: 4381 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 9 tokens
> [query] Total embedding token usage: 9 tokens


In [22]:
print(str(response))
print(response.get_formatted_sources())

Boston has a rich arts and culture scene, with numerous art galleries and museums such as the Institute of Contemporary Art, Boston Children's Museum, Museum of Science, and the New England Aquarium. The city is also home to several historic churches, including the oldest church in Boston, First Church in Boston, and King's Chapel, the city's first Anglican church. Boston has a strong religious presence, with the Roman Catholic Archdiocese of Boston and the Episcopal Diocese of Massachusetts both based in the city. The city also has a love for sports, with teams in the four major North American men's professional sports leagues plus Major League Soccer, and has won 39 championships in these leagues. Boston Common, the oldest public park in the United States, and the adjacent Boston Public Garden are part of the Emerald Necklace, a string of parks designed by Frederick Law Olmsted to encircle the city. The city's park system is well-reputed nationally, with Boston tied with Sacramento a

### Build Graph: Keyword Table Index on top of vector indices! 

We compose a keyword table index on top of all the vector indices.
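
To make the routing idea concrete, here is a conceptual sketch in plain Python. It is not LlamaIndex's implementation: the stopword list, the helper names (`extract_keywords`, `build_keyword_table`, `route`), and the three-city subset are all assumptions chosen for illustration. It only shows how keywords taken from each sub-index summary can map a query to the sub-indices worth querying.

```python
import re

# Hypothetical subset of the per-city summaries used in this notebook.
summaries = {
    "Boston": "Wikipedia articles about Boston",
    "Seattle": "Wikipedia articles about Seattle",
    "San Francisco": "Wikipedia articles about San Francisco",
}

# Tiny illustrative stopword list (an assumption, not the NLTK list).
STOPWORDS = {"tell", "me", "more", "about", "the", "of", "wikipedia", "articles"}


def extract_keywords(text):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords."""
    return {tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOPWORDS}


def build_keyword_table(summaries):
    """Map each keyword found in a summary to the titles whose summary contains it."""
    table = {}
    for title, summary in summaries.items():
        for keyword in extract_keywords(summary):
            table.setdefault(keyword, set()).add(title)
    return table


def route(query, table):
    """Return the sorted titles whose keywords overlap with the query's keywords."""
    matches = set()
    for keyword in extract_keywords(query):
        matches |= table.get(keyword, set())
    return sorted(matches)


keyword_table = build_keyword_table(summaries)
routed = route("Tell me more about Boston", keyword_table)
print(routed)
```

In this sketch the query is reduced to the single keyword `boston`, so only the Boston entry is selected; the composed graph then passes the query on to the matching vector index.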

In [24]:
from llama_index.composability.graph import ComposableGraph

In [25]:
# set summaries for each city
index_summaries = {}
for wiki_title in wiki_titles:
    # set summary text for city
    index_summaries[wiki_title] = f"Wikipedia articles about {wiki_title}"

In [26]:
graph = ComposableGraph.from_indices(
    GPTSimpleKeywordTableIndex,
    [index for _, index in city_indices.items()], 
    [summary for _, summary in index_summaries.items()],
    max_keywords_per_chunk=50
)

INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens


[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/jerryliu/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [27]:
# [optional] save to disk
graph.save_to_disk("index_multi_doc_graph.json")

In [28]:
# [optional] load from disk
graph = ComposableGraph.load_from_disk("index_multi_doc_graph.json")

In [29]:
# set query config
# NOTE: we need to specify a query config for every pinecone index 
query_configs = []
for pinecone_title, wiki_title in zip(pinecone_titles, wiki_titles):
    query_config = {
        "index_struct_id": pinecone_title,
        "index_struct_type": "pinecone",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 1,
            "pinecone_index": index,
        }
    }
    query_configs.append(query_config)

query_configs.append({
    "index_struct_type": "keyword_table",
    "query_mode": "simple",
    "query_kwargs": {
        "response_mode": "tree_summarize"
    }
})
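
Because every Pinecone index needs its own entry, a quick sanity check before querying can catch a missing or misnamed config. The sketch below is a hypothetical helper, not part of LlamaIndex; it assumes only that each per-index config carries an `index_struct_id` key, as in the cell above. The IDs and configs in the example are a made-up subset.

```python
def missing_config_ids(index_ids, query_configs):
    """Return the index IDs that have no config with a matching `index_struct_id`."""
    configured = {cfg.get("index_struct_id") for cfg in query_configs}
    return [index_id for index_id in index_ids if index_id not in configured]


# Hypothetical subset of the notebook's Pinecone titles and configs.
example_ids = ["toronto", "seattle", "boston"]
example_configs = [
    {"index_struct_id": "toronto", "index_struct_type": "pinecone"},
    {"index_struct_id": "boston", "index_struct_type": "pinecone"},
    # The keyword-table config has no ID; it is matched by type instead.
    {"index_struct_type": "keyword_table", "query_mode": "simple"},
]

print(missing_config_ids(example_ids, example_configs))
```

Against the notebook's own variables, the equivalent call would be `missing_config_ids(pinecone_titles, query_configs)`, which should return an empty list when every index is covered.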

### Compare Queries (text-davinci-003 vs. ChatGPT)

**Simple Query**

In [31]:
query_str = "Tell me more about Boston"
response_chatgpt = graph.query(query_str, query_configs=query_configs, service_context=service_context)

INFO:gpt_index.indices.query.keyword_table.query:> Starting query: Tell me more about Boston
> Starting query: Tell me more about Boston
INFO:gpt_index.indices.query.keyword_table.query:query keywords: ['tell', 'boston']
query keywords: ['tell', 'boston']
INFO:gpt_index.indices.query.keyword_table.query:> Extracted keywords: ['boston']
> Extracted keywords: ['boston']
INFO:gpt_index.indices.common.tree.base:> Building index from nodes: 0 chunks
> Building index from nodes: 0 chunks


In [32]:
print(response_chatgpt)
print(response_chatgpt.get_formatted_sources())

Boston is a historic city located in the northeastern United States and is the capital of Massachusetts. It has a population of over 700,000 people and played a significant role in the American Revolution, with events such as the Boston Tea Party and the Battle of Bunker Hill taking place there. Boston is also known for its universities, including Harvard and MIT, and its thriving economy, which is driven by industries such as finance, healthcare, and technology. The city underwent urban renewal projects in the mid-20th century, which aimed to revitalize the city but also resulted in the displacement of many low-income residents. However, the city experienced an economic recovery in the 1970s and 1980s, which led to the development of new businesses and cultural institutions. Today, Boston is a vibrant and diverse city with a rich history and a bright future.
> Source (Doc id: None): The existing answer provides a comprehensive overview of Boston, including its history, populatio...

>