In [1]:
import logging, sys
# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Temporarily disable all logging; comment out this line to re-enable it
logging.disable(sys.maxsize)
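
A small sketch of what the switches above do, using an in-memory handler instead of stdout. `logging.disable(sys.maxsize)` suppresses every record, and `logging.disable(logging.NOTSET)` turns logging back on. It also shows why enabling both commented-out lines at once would print each record twice: the root logger would then have two handlers. The logger name `demo` and the helper `emit_and_capture` are made up for this illustration.

```python
import io
import logging
import sys


def emit_and_capture(message, handler_count=1):
    """Log `message` once on a throwaway logger and return everything written."""
    buffer = io.StringIO()
    logger = logging.getLogger("demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo isolated from the root logger
    handlers = [logging.StreamHandler(buffer) for _ in range(handler_count)]
    for handler in handlers:
        logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        for handler in handlers:
            logger.removeHandler(handler)
    return buffer.getvalue()


logging.disable(logging.NOTSET)                      # logging enabled
once = emit_and_capture("hello")                     # one handler, one line
twice = emit_and_capture("hello", handler_count=2)   # two handlers, two lines

logging.disable(sys.maxsize)                         # same call as the cell above
silent = emit_and_capture("hello")                   # nothing is written
```

The sketch ends with logging disabled, matching the state the cell above leaves the notebook in.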

In [2]:
# fetch "New York City" page from Wikipedia
from pathlib import Path

import requests
response = requests.get(
    'https://en.wikipedia.org/w/api.php',
    params={
        'action': 'query',
        'format': 'json',
        'titles': 'New York City',
        'prop': 'extracts',
        # 'exintro': True,
        'explaintext': True,
    }
).json()
page = next(iter(response['query']['pages'].values()))
nyc_text = page['extract']

data_path = Path('data')
data_path.mkdir(exist_ok=True)

# the extract contains non-ASCII characters, so set the encoding explicitly
with open(data_path / 'nyc_text.txt', 'w', encoding='utf-8') as fp:
    fp.write(nyc_text)
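
The cell above takes the first entry of `response['query']['pages']` and reads its `extract` field, which raises a bare `KeyError` if the title does not exist. Below is a hedged sketch of a more defensive version. The helper name `extract_page_text` is made up, and the shape of the "missing page" response (a `missing` key on the page entry) is an assumption about the MediaWiki default JSON format. The sample dictionaries are hand-written stand-ins, not real Wikipedia data.

```python
def extract_page_text(api_response):
    """Return the plain-text extract of the first page in a query response."""
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("response contains no pages")
    page = next(iter(pages.values()))
    if "missing" in page or "extract" not in page:
        title = page.get("title", "<unknown>")
        raise ValueError(f"no extract available for page {title!r}")
    return page["extract"]


# Hand-written stand-ins for API responses (assumed shapes).
ok_response = {"query": {"pages": {"1": {"title": "Example", "extract": "Some text."}}}}
missing_response = {"query": {"pages": {"-1": {"title": "No Such Page", "missing": ""}}}}

text = extract_page_text(ok_response)

try:
    extract_page_text(missing_response)
    missing_detected = False
except ValueError:
    missing_detected = True
```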

In [3]:
# My OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = ""
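
An alternative to pasting the key into the notebook is to export it in the shell before starting Jupyter and fail early if it is absent. A minimal sketch, assuming the key lives in the `OPENAI_API_KEY` environment variable; `require_api_key` is a hypothetical helper, and the dictionaries stand in for `os.environ` so the example runs without a real key.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or empty."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key


# Fake environments for demonstration only.
key = require_api_key({"OPENAI_API_KEY": "sk-example"})

try:
    require_api_key({"OPENAI_API_KEY": ""})
    empty_rejected = False
except RuntimeError:
    empty_rejected = True
```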

In [8]:
from gpt_index import GPTTreeIndex, SimpleDirectoryReader, LLMPredictor
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI

In [10]:
# gpt-3 (davinci); "gpt-3" is not a valid model name, so use the davinci completion model
llm_predictor_gpt3 = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

# gpt-4
llm_predictor_gpt4 = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4"))

In [6]:
from gpt_index.prompts.prompts import SimpleInputPrompt

prompt = SimpleInputPrompt("{query_str}")
# sanity-check the GPT-4 predictor defined above
llm_predictor_gpt4.predict(prompt, query_str="hello world")

('Hello! How can I help you today?', 'hello world')

In [7]:
documents = SimpleDirectoryReader('data').load_data()

In [None]:
index = GPTTreeIndex(documents, llm_predictor=llm_predictor_gpt4)

In [17]:
index.save_to_disk('index_gpt4.json')

In [4]:
# try loading
new_index = GPTTreeIndex.load_from_disk('index_gpt4.json')

In [5]:
response_gpt4 = new_index.query(
    "What battles took place in New York City in the American Revolution?",
    llm_predictor=llm_predictor_gpt4,
    verbose=True
)

> Starting query: What battles took place in New York City in the American Revolution?
>[Level 0] Current response: ANSWER: 2

This summary was selected because it mentions the American Revolution and the Battle of Long Island, which took place in New York City. The other summaries do not discuss the American Revolution or any battles that occurred in the city during that time.
>[Level 0] Selected node: [2]/[2]
>[Level 0] Node [2] Summary text: New York City has a rich history, from its beginnings as a Dutch trading post to its growth into ...
>[Level 1] Current response: ANSWER: 1

This summary was selected because it mentions the Battle of Long Island, which was the largest battle of the American Revolutionary War and took place in the modern-day borough of Brooklyn within New York City.
>[Level 1] Selected node: [1]/[1]
>[Level 1] Node [1] Summary text: South Carolina. Most cases were that of domestic slavery, as a New York household then commonly e...
>[Level 1] Got no

In [10]:
str(response_gpt4)

'The Battle of Long Island took place in New York City in the American Revolution.'
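
The verbose trace above follows a simple pattern: at each level the model sees numbered child summaries, replies with `ANSWER: <n>`, and the query descends into that child until it reaches a leaf. The toy below mimics that loop with a keyword-overlap chooser standing in for the LLM. `Node`, `choose_child`, `parse_choice`, and `query_tree` are illustrative names, not gpt_index API, and the library's actual prompts and parsing may differ. The node texts are hand-written toy summaries.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)


def words(text):
    """Lowercased set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def choose_child(query, children):
    """Stand-in for the LLM: pick the child sharing the most words with the query."""
    overlaps = [len(words(query) & words(child.text)) for child in children]
    best = overlaps.index(max(overlaps))
    return f"ANSWER: {best + 1}"  # 1-based numbering, as in the trace


def parse_choice(reply, n_children):
    """Extract the chosen number from a reply and convert it to a 0-based index."""
    match = re.search(r"ANSWER:\s*(\d+)", reply)
    if not match:
        raise ValueError(f"no choice found in reply: {reply!r}")
    number = int(match.group(1))
    if not 1 <= number <= n_children:
        raise ValueError(f"choice {number} is out of range")
    return number - 1


def query_tree(root, query):
    """Descend from the root to a leaf, recording the 1-based choice at each level."""
    node, path = root, []
    while node.children:
        index = parse_choice(choose_child(query, node.children), len(node.children))
        path.append(index + 1)
        node = node.children[index]
    return node.text, path


root = Node("root", [
    Node("geography and climate", [
        Node("The city sits on a natural harbor."),
        Node("Summers are warm and humid."),
    ]),
    Node("history and the american revolution", [
        Node("Dutch traders founded a trading post."),
        Node("The Battle of Long Island was fought in the American Revolution."),
    ]),
])

leaf_text, path = query_tree(root, "What battles took place in the American Revolution?")
```

In this toy tree the query descends through choice 2 at both levels and ends at the leaf that mentions the Battle of Long Island.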

In [None]:
response_gpt3 = new_index.query(
    "What battles took place in New York City in the American Revolution?",
    llm_predictor=llm_predictor_gpt3,
    verbose=True
)

In [None]:
str(response_gpt3)

In [6]:
response_gpt4.source_nodes[0]

SourceNode(source_text="South Carolina. Most cases were that of domestic slavery, as a New York household then commonly enslaved few or several people. Others were hired out to work at labor. Slavery became integrally tied to New York's economy through the labor of slaves throughout the port, and the banking and shipping industries trading with the American South. During construction in Foley Square in the 1990s, the African Burying Ground was discovered; the cemetery included 10,000 to 20,000 of graves of colonial-era Africans, some enslaved and some free.The 1735 trial and acquittal in Manhattan of John Peter Zenger, who had been accused of seditious libel after criticizing colonial governor William Cosby, helped to establish the freedom of the press in North America. In 1754, Columbia University was founded under charter by King George II as King's College in Lower Manhattan.\n\n\n=== American Revolution ===\n\nThe Stamp Act Congress met in New York in October 1765, as the Sons of L

In [40]:
# The selected leaf node has no explicit list of NYC airports, but GPT still gives a correct answer
# Set the logging level to DEBUG for more detailed output

response = new_index.query(
    "What are the airports in New York City?",
    llm_predictor=llm_predictor_gpt4
)

INFO:gpt_index.indices.query.tree.leaf_query:> Starting query: What are the airports in New York City?
INFO:gpt_index.indices.query.tree.leaf_query:>[Level 0] Selected node: [9]/[9]
INFO:gpt_index.indices.query.tree.leaf_query:>[Level 1] Selected node: [9]/[9]
INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 5796 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens


In [41]:
response.source_nodes[0]

SourceNode(source_text="were the busiest and fourth busiest U.S. gateways for international air passengers, respectively, in 2012; as of 2011, JFK was the busiest airport for international passengers in North America.Plans have advanced to expand passenger volume at a fourth airport, Stewart International Airport near Newburgh, New York, by the Port Authority of New York and New Jersey. Plans were announced in July 2015 to entirely rebuild LaGuardia Airport in a multibillion-dollar project to replace its aging facilities. Other commercial airports in or serving the New York metropolitan area include Long Island MacArthur Airport, Trenton–Mercer Airport and Westchester County Airport. The primary general aviation airport serving the area is Teterboro Airport.\n\n\n=== Ferries ===\n\nThe Staten Island Ferry is the world's busiest ferry route, carrying more than 23 million passengers from July 2015 through June 2016 on the 5.2-mile (8.4 km) route between Staten Island and Lower Manhattan 

In [42]:
print(str(response))

The context information does not provide a specific list of airports in New York City. However, it mentions JFK (John F. Kennedy International Airport) and LaGuardia Airport as two of the busiest U.S. gateways for international air passengers. It also mentions plans to expand passenger volume at Stewart International Airport near Newburgh, New York. Other airports mentioned in the context serving the New York metropolitan area include Long Island MacArthur Airport, Trenton–Mercer Airport, Westchester County Airport, and Teterboro Airport (for general aviation).


In [None]:
# Try using embedding query
new_index.query("What are the airports in New York City?", mode="embedding")
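
As I understand it, the embedding mode picks the child node by similarity between the query and each node's text rather than by asking the LLM to choose a number. The toy below illustrates that idea only: it uses bag-of-words count vectors and cosine similarity in place of real embeddings. `embed`, `cosine`, and `most_similar` are hypothetical helpers, not gpt_index API, and the candidate strings are hand-written.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, candidates):
    """Return the index of the candidate most similar to the query."""
    query_vec = embed(query)
    scores = [cosine(query_vec, embed(candidate)) for candidate in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


candidates = [
    "Ferries and bridges connect the boroughs.",
    "JFK and LaGuardia are airports serving New York City.",
    "The city has many parks.",
]

best = most_similar("What are the airports in New York City?", candidates)
```

Real embeddings capture meaning beyond exact word overlap, so this toy only shows the selection mechanics.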