In [None]:
# Set your OpenAI API key (replace the placeholder string below)
import os
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"

## Using GPT Tree Index

#### [Demo] Default leaf traversal 

In [1]:
from gpt_index import GPTTreeIndex, SimpleDirectoryReader
from IPython.display import Markdown, display

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/jerryliu/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [2]:
documents = SimpleDirectoryReader('data').load_data()
index = GPTTreeIndex(documents)

> Building index from nodes: 5 chunks
0/56
> 0/56, summary: 
In February 2021, the author worked on writing and programming before college. He used an IBM 1401 to write programs in an early version of Fortran, but he couldn't figure out what to do with it. When microcomputers became available, he wrote simple games, a program to predict how high his model rockets would fly, and a word processor. In college, he studied philosophy but found it boring and switched to AI. He reverse-engineered SHRDLU for his undergraduate thesis and wrote a book about Lisp hacking. He realized that AI, as practiced at the time, was a hoax and decided to focus on Lisp. He visited Rich Draves at CMU and realized he could make art that would last. He started taking art classes at Harvard and applied to RISD and the Accademia di Belli Arti in Florence. RISD accepted him and he left for Providence, but he got a letter from the Accademia inviting him to take the entrance exam in Florence. He had enough money sav

In [3]:
index.save_to_disk('index.json')

In [2]:
# reload the saved index from disk
new_index = GPTTreeIndex.load_from_disk('index.json')

In [None]:
# try verbose=True for more detailed outputs
response = new_index.query("What did the author do growing up?")

In [6]:
display(Markdown(f"<b>{response}</b>"))

<b>The author grew up writing short stories and programming on an IBM 1401 computer.</b>

In [None]:
# verbose=True prints more detailed outputs during the query
response = new_index.query("What did the author do after his time at Y Combinator?", verbose=True)

In [10]:
display(Markdown(f"<b>{response}</b>"))

<b>ANSWER: 18. This summary explains that the author left Y Combinator and describes the difficulty of the transition, which is relevant to the question of what the author did after his time at Y Combinator.</b>

#### [Demo] Leaf traversal with child_branch_factor=2

In [None]:
# try using branching factor 2
response = new_index.query("What did the author do growing up?", child_branch_factor=2)

In [13]:
display(Markdown(f"<b>{response}</b>"))

<b>The author grew up programming on an IBM 1401 computer, experimenting with model rockets, using a word processor to write books, studying philosophy in high school, attending the Accademia di Belle Arti in Florence, Italy, learning Italian and painting, working at Interleaf, and attending RISD to learn about color and painting. They also wrote a book on Lisp and started a company to put art galleries online, and eventually realized that online stores were similar to the sites they had been generating for galleries and started writing software to build online stores.</b>
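
The two demos above differ only in how many children are followed at each level of the tree. The sketch below is a toy illustration of that idea, not the library's implementation: `Node`, `overlap_score`, and `traverse` are hypothetical names, and a simple word-overlap score stands in for the LLM call that would normally choose which child summaries are relevant.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy tree node: leaves hold raw text, parents hold a summary."""
    text: str
    children: List["Node"] = field(default_factory=list)


def overlap_score(query: str, text: str) -> int:
    """Stand-in for the LLM's choice: count lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def traverse(node: Node, query: str, child_branch_factor: int = 1) -> List[str]:
    """Return the leaf texts reached by following the top-k children at each level."""
    if not node.children:
        return [node.text]
    ranked = sorted(
        node.children,
        key=lambda child: overlap_score(query, child.text),
        reverse=True,
    )
    leaves: List[str] = []
    for child in ranked[:child_branch_factor]:
        leaves.extend(traverse(child, query, child_branch_factor))
    return leaves


# Hypothetical two-level tree, for illustration only.
root = Node("root", [
    Node("programming and writing before college", [
        Node("wrote programs on an IBM 1401 computer"),
        Node("wrote short stories"),
    ]),
    Node("art school and painting", [
        Node("studied painting in Florence"),
        Node("attended RISD"),
    ]),
])

query = "programming on a computer"
one = traverse(root, query, child_branch_factor=1)
two = traverse(root, query, child_branch_factor=2)
print(len(one), "leaf reached with child_branch_factor=1")
print(len(two), "leaves reached with child_branch_factor=2")
```

With a larger branching factor more leaves feed into the final answer, which is consistent with the longer, broader response in the second demo, at the cost of more LLM calls.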

#### [Demo] Build Tree Index with query_str, directly retrieve answer from root node

In [3]:
from gpt_index import Prompt

In [None]:
documents = SimpleDirectoryReader('data').load_data()

query_str = "What did the author do growing up?"
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{text}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
DEFAULT_TEXT_QA_PROMPT = Prompt(
    input_variables=["query_str", "text"],
    template=DEFAULT_TEXT_QA_PROMPT_TMPL
)
index_with_query = GPTTreeIndex(documents, summary_template=DEFAULT_TEXT_QA_PROMPT, query_str=query_str)

In [6]:
index_with_query.save_to_disk("index_with_query.json")

In [7]:
index_with_query = GPTTreeIndex.load_from_disk("index_with_query.json")

In [None]:
# directly retrieve response from root nodes instead of traversing tree
response = index_with_query.query(query_str, mode="retrieve")

In [10]:
display(Markdown(f"<b>{response}</b>"))

<b>
The author was homeschooled and then attended a prestigious art school. The author grew up writing essays and thinking about other things he could work on.</b>
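
To see what a prompt built from the template above might look like, the sketch below fills the same template string with plain `str.format`. This assumes the `{text}` and `{query_str}` placeholders are substituted in a format-like way; the chunk text is a made-up example, and `QA_TMPL` is just a local copy of the template for illustration.

```python
# Local copy of the template text used when building the index above.
QA_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{text}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

# Hypothetical chunk text, for illustration only.
chunk = "Before college the author wrote short stories and programs."
query_str = "What did the author do growing up?"

# Substitute both placeholders to get the final prompt string.
prompt = QA_TMPL.format(text=chunk, query_str=query_str)
print(prompt)
```

Because the question is baked into every summary prompt at build time, the root nodes already hold candidate answers, which is why `mode="retrieve"` can skip the traversal.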

## Using GPT Keyword Table Index

In [6]:
from gpt_index import GPTKeywordTableIndex, SimpleDirectoryReader
from IPython.display import Markdown, display

In [5]:
# build keyword index
documents = SimpleDirectoryReader('data').load_data()
index = GPTKeywordTableIndex(documents)

> Processing chunk 0 of 6, id 6633467239764706071: 		

What I Worked On

February 2021

Before col...
> Keywords: {'artificial', 'word', 'philosophy', 'trs', 'winograd', 'intelligence', 'programming', '80', 'lisp', 'work', 'systems', 'di', 'systems work', 'accademia', 'carnegie', 'terry winograd', 'natural language', 'carnegie institute', 'paintings', 'italian', 'structures', 'cmu', 'accademia di belli arti', 'rockets', 'word processor', 'ibm', 'model', 'natural', 'language', 'belli', 'fortran', 'cezanne', 'institute', 'trs-80', 'data structures', 'reverse', 'arti', 'artificial intelligence', 'florence', 'data', 'model rockets', 'shrdlu', 'ai', 'engineered', '1401', 'processor', 'ibm 1401', 'art', 'reverse-engineered', 'heinlein', 'terry', 'microcomputers'}
> Processing chunk 1 of 6, id 4100179696019853008: of excluding them, because there were so many s...
> Keywords: {'world', 'online', 'online stores', 'lisp', 'cart', 'accademia', 'idelle weber', 'weber', 'user', 'yorkville', 'inter

In [6]:
# save index
index.save_to_disk('index_table.json')

In [7]:
# reload index
index = GPTKeywordTableIndex.load_from_disk('index_table.json')

In [8]:
response = index.query("What did the author do after his time at Y Combinator?")

> Starting query: What did the author do after his time at Y Combinator?
Extracted keywords: ['combinator', 'y combinator']
> Querying with idx: 4100179696019853008: of excluding them, because there were so many s...
> Querying with idx: 2240695989319726955: an alarming prospect, because neither of us kne...
> Querying with idx: 7674903471241618344: it was like living in another country, and sinc...
> Querying with idx: 3531546443662073264: browser, and then host the resulting applicatio...


In [9]:
display(Markdown(f"<b>{response}</b>"))

<b>

After his time at Y Combinator, the author returned to painting. He moved to New York City and rented a rent-controlled apartment in Yorkville. He wrote a book on Lisp and did freelance Lisp hacking work to support himself. He also became the de facto studio assistant for Idelle Weber, an early photorealist painter. He experimented with a new kind of still life, blowing up the images on canvas and using that as the underpainting for a second still life, painted from the same objects. He wrote lots of essays about all kinds of different topics. He co-founded Y Combinator and wrote a book called Bel. He also wrote essays about topics he had stacked up. He wrote an essay for himself to answer how he chose what to work on in the past. He wrote a more detailed version for others to read. He then cooked up something he called the Summer Founders Program, and posted an announcement on his site, inviting undergrads to apply. He invited about 20 of the 225 groups to interview in person, and from those he picked 8 to fund. After his options vested, he left Yahoo in the summer of 1999 and started painting again. He looked for an apartment to buy and eventually had the idea to build a</b>
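
The log output above shows keywords being extracted from the query and then used to pick chunks. A minimal sketch of that lookup idea follows. It is a toy, not the library's code: `build_table` and `lookup` are hypothetical helpers, the chunk ids and keyword sets are invented, and ranking chunks by the number of matching keywords is an assumption made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_table(chunks: Dict[int, Set[str]]) -> Dict[str, Set[int]]:
    """Invert chunk -> keywords into keyword -> set of chunk ids."""
    table: Dict[str, Set[int]] = defaultdict(set)
    for chunk_id, keywords in chunks.items():
        for keyword in keywords:
            table[keyword].add(chunk_id)
    return table


def lookup(table: Dict[str, Set[int]], query_keywords: Iterable[str]) -> List[int]:
    """Return chunk ids that match any query keyword, most matches first."""
    counts: Dict[int, int] = defaultdict(int)
    for keyword in query_keywords:
        for chunk_id in table.get(keyword, ()):
            counts[chunk_id] += 1
    # Sort by descending match count, then by chunk id for a stable order.
    return sorted(counts, key=lambda cid: (-counts[cid], cid))


# Hypothetical chunks and keywords, for illustration only.
chunks = {
    0: {"ibm 1401", "fortran", "lisp"},
    1: {"online stores", "lisp", "y combinator"},
    2: {"y combinator", "combinator", "painting"},
}
table = build_table(chunks)
matches = lookup(table, ["combinator", "y combinator"])
print(matches)
```

Chunks that share no keyword with the query are never sent to the LLM, so the quality of the answer depends heavily on which keywords were extracted for each chunk at build time.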

## Using GPT List Index

In [1]:
from gpt_index import GPTListIndex, SimpleDirectoryReader
from IPython.display import Markdown, display


In [11]:
# build linked list index
documents = SimpleDirectoryReader('data').load_data()
index = GPTListIndex(documents)
# save index
index.save_to_disk('index_list.json')

> Adding chunk: 		

What I Worked On

February 2021

Before col...
> Adding chunk: Florence that the Italian students would otherw...
> Adding chunk: Robert wrote a shopping cart, and I wrote a new...
> Adding chunk: this idea that I couldn't think about anything ...
> Adding chunk: luck that the first batch was so good. You had ...
> Adding chunk: I wrote a bunch about topics I'd had stacked up...


In [2]:
# load index from disk
index = GPTListIndex.load_from_disk('index_list.json')

In [None]:
response = index.query("What did the author do after his time at Y Combinator?", verbose=True)

In [5]:
display(Markdown(f"<b>{response}</b>"))

<b>

The author attended the Accademia di Belle Arti di Firenze in Italy after his time at Y Combinator. He learned some useful things at Interleaf, such as the importance of product people over sales people, and that the low end eats the high end. He then moved back to Providence to continue at RISD, but dropped out after a year. He then moved to New York City and started to write a book on Lisp. He also worked as a studio assistant for Idelle Weber and tried to start a company to put art galleries online. He eventually realized that he could build online stores and started to write software to do so. He and Robert worked out of Robert's apartment in Cambridge, and wrote a shopping cart and a new site generator for stores in Lisp. They then had the idea to run the software on the server, and let users control it by clicking on links, creating a web app.

The author then started a new company called Viaweb, and got $10,000 in seed funding from Idelle's husband Julian. They opened for business in January 1996, and had about 70 stores at the end of 1996 and about 500 at the end of 1997. In the summer of 1998, Yahoo bought Viaweb and</b>