# Composable Indices Demo

In [1]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
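
The setup cell above attaches two handlers to the root logger: one created by `basicConfig` and one added explicitly. That is why each log record later in this notebook appears twice, once with the `INFO:root:` prefix and once without. A minimal sketch of the effect, writing to an in-memory buffer instead of `sys.stdout` (the buffer and `force=True` are only there to keep the example isolated):

```python
import io
import logging

buf = io.StringIO()

# basicConfig installs one handler using the default "LEVEL:name:message" format.
# force=True (Python 3.8+) clears any handlers that were already on the root logger.
logging.basicConfig(stream=buf, level=logging.INFO, force=True)

# A second, bare StreamHandler has no formatter, so it emits the message only.
logging.getLogger().addHandler(logging.StreamHandler(stream=buf))

logging.info("hello")
lines = buf.getvalue().splitlines()
print(lines)
```

Dropping the `addHandler` line would leave a single copy of each record.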

In [2]:
from llama_index import (
    GPTSimpleVectorIndex,
    GPTEmptyIndex,
    GPTTreeIndex,
    GPTListIndex,
    SimpleDirectoryReader
)

### Load Datasets

Load PG's essay

In [3]:
# load PG's essay
essay_documents = SimpleDirectoryReader('../paul_graham_essay/data/').load_data()

### Building the document indices
- Build a vector index for PG's essay
- Also build an empty index (to store prior knowledge)

In [None]:
# build essay index
essay_index = GPTSimpleVectorIndex(essay_documents, chunk_size_limit=512)
empty_index = GPTEmptyIndex()
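
The `chunk_size_limit=512` argument caps the size of each text chunk stored in the vector index. As a rough illustration of the idea only, the sketch below splits text on whitespace into fixed-size groups. `chunk_words` is a hypothetical helper, and the library's own splitter is assumed to count tokenizer tokens rather than whitespace-separated words.

```python
def chunk_words(text, limit):
    """Split text into chunks of at most `limit` whitespace-separated words."""
    tokens = text.split()
    return [" ".join(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]


# Toy input with a tiny limit so the chunk boundaries are easy to see.
chunks = chunk_words("a b c d e f g", 3)
print(chunks)
```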

In [5]:
essay_index.save_to_disk('index_pg.json')

### Loading the indices
Load the saved vector index for PG's essay from disk, and re-create the empty index.

In [None]:
# try loading
essay_index = GPTSimpleVectorIndex.load_from_disk('index_pg.json')
empty_index = GPTEmptyIndex()

### Set summaries for the indices

Add a text summary to each index, so that we can compose other indices on top of them.

In [7]:
essay_index.set_text(
    "This document describes Paul Graham's life, from early adulthood to the present day."
)

empty_index.set_text("This can be used for general knowledge purposes.")

### Query Indices
Query each index on its own and compare the responses.

In [None]:
response = essay_index.query(
    "Tell me about what Sam Altman did during his time in YC",
    similarity_top_k=3,
    response_mode="tree_summarize"
)

In [9]:
print(str(response))


During his time in YC, Sam Altman reorganized the company to ensure its longevity, implemented the batch model of funding startups twice a year, and organized weekly dinners at the YC headquarters in Cambridge. At these dinners, experts on startups would give talks. He also spent time helping the startups through Demo Day.
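
The `similarity_top_k=3` argument asks the vector index for the three chunks most similar to the query. The sketch below shows the general idea with cosine similarity over hand-written two-dimensional vectors. The vectors, chunk labels, and helper names are made up for illustration; a real index would use embeddings from a model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=3):
    """Return the text of the k chunks whose vectors are closest to query_vec."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy (text, vector) pairs standing in for embedded chunks.
chunks = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.0, 1.0]),
    ("chunk C", [0.6, 0.8]),
    ("chunk D", [0.8, 0.6]),
]
selected = top_k([1.0, 0.0], chunks, k=3)
print(selected)
```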


In [None]:
response = empty_index.query(
    "Tell me about what Sam Altman did during his time in YC",
)

In [11]:
print(str(response))



Sam Altman was the President of Y Combinator from 2014 to 2020. During his time at YC, he oversaw the growth of the accelerator from a small startup incubator to a global network of over 2,000 companies. He also helped launch YC's Fellowship program, which provides early-stage funding to promising startups. He also helped launch YC's Growth program, which provides late-stage funding to companies that have already achieved product-market fit. Altman also helped launch YC's Research program, which provides grants to researchers working on projects related to technology and entrepreneurship. He also helped launch YC's Startup School, which provides free online courses to help entrepreneurs launch their own startups. Altman also helped launch YC's Demo Day, which provides a platform for startups to pitch their ideas to investors. Finally, Altman helped launch YC's Women in Tech program, which provides mentorship and resources to female founders.


### Define Graph (List Index as Parent Index)

With a list index as the parent, a query is answered using both the knowledge corpus and the LLM's prior knowledge, and the results are synthesized into one response.
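
Conceptually, a list parent visits every child and merges what comes back. The toy classes below are hypothetical stand-ins, not library classes: each child returns a canned string, and the parent joins the strings, whereas the real graph is assumed to use the LLM to synthesize the partial answers.

```python
class ToyIndex:
    """Hypothetical stand-in for an index: a summary plus a canned answer."""

    def __init__(self, summary, answer):
        self.summary = summary
        self.answer = answer

    def query(self, question):
        return self.answer


class ToyListParent:
    """Queries every child in order and joins the partial answers."""

    def __init__(self, children):
        self.children = children

    def query(self, question):
        partial = [child.query(question) for child in self.children]
        return " ".join(p for p in partial if p)


corpus = ToyIndex("Essay about Paul Graham's life.", "[essay] grounded answer.")
prior = ToyIndex("General knowledge.", "[prior] general answer.")

combined = ToyListParent([corpus, prior]).query("What did Sam Altman do at YC?")
print(combined)
```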

In [12]:
from llama_index.composability import ComposableGraph

In [13]:
# set query config
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "response_mode": "tree_summarize"
        }
    },
]
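
Each entry in `query_configs` is keyed by `index_struct_type`, so the settings above apply only to indices of type `"simple_dict"` (here, the vector index). The sketch below illustrates that lookup in plain Python. `kwargs_for` is a hypothetical helper and not how the library resolves configs internally, and `"list"` is used only as an example of a type with no entry.

```python
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "response_mode": "tree_summarize",
        },
    },
]


def kwargs_for(struct_type, configs):
    """Return the query kwargs for the first config matching struct_type."""
    for cfg in configs:
        if cfg["index_struct_type"] == struct_type:
            return dict(cfg.get("query_kwargs", {}))
    # No matching entry: fall back to no extra arguments.
    return {}


print(kwargs_for("simple_dict", query_configs))
print(kwargs_for("list", query_configs))
```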

In [None]:
list_index = GPTListIndex([essay_index, empty_index])

In [15]:
graph = ComposableGraph.build_from_index(list_index)

In [16]:
# [optional] save to disk
graph.save_to_disk("index_graph.json")

In [17]:
# [optional] load from disk
graph = ComposableGraph.load_from_disk("index_graph.json")

In [None]:
# set the logging level to DEBUG for more detailed output
# ask a question about Sam Altman
response = graph.query(
    "Tell me about what Sam Altman did during his time in YC", 
    query_configs=query_configs
)

In [19]:
print(str(response))



During his time in YC, Sam Altman reorganized the company to ensure its longevity, implemented the batch model of funding startups twice a year, and organized weekly dinners at the YC headquarters in Cambridge. At these dinners, experts on startups would give talks. He also spent time helping the startups through Demo Day and taking over more and more of the running of YC. He also oversaw the growth of the accelerator from a small startup incubator to a global network of over 2,000 companies. He also helped launch YC's Fellowship program, which provides early-stage funding to promising startups, YC's Growth program, which provides late-stage funding to companies that have already achieved product-market fit, YC's Research program, which provides grants to researchers working on projects related to technology and entrepreneurship, YC's Startup School, which provides free online courses to help entrepreneurs launch and grow their businesses, and YC's Women in Tech program, which provid

In [20]:
# Get source of response
print(response.get_formatted_sources())

> Source (Doc id: 081e45f9-9dfd-4425-82c6-6eb81cc75bff): This document describes Paul Graham's life, from early adulthood to the present day....

> Source (Doc id: 9268910d-4749-46a8-847d-1279ea88737b): one of them I realized I was ready to hand YC over to someone else.

I asked Jessica if she wante...

> Source (Doc id: 9268910d-4749-46a8-847d-1279ea88737b): one works harder than the boss." He meant it both descriptively and prescriptively, and it was th...

> Source (Doc id: 9268910d-4749-46a8-847d-1279ea88737b): had done for us that seemed to us like magic was to get us set up as a company. We were fine writ...

> Source (Doc id: 4bfea35b-864a-41d9-a69c-a63fae2f2a8d): This can be used for general knowledge purposes....


### Define Graph (Tree Index as Parent Index)

This allows us to "route" a query to either a knowledge-augmented index, or to the LLM itself.
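
Routing depends on the summaries set earlier with `set_text`: the parent picks the child whose summary best fits the question. The sketch below imitates that choice with simple word overlap and a fallback to the general-purpose child. The `route` helper and the overlap scoring are assumptions for illustration; the real tree index is assumed to ask the LLM to make the selection.

```python
import re


def words(text):
    """Lowercased alphabetic words in text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def route(question, summaries, fallback):
    """Pick the child whose summary shares the most words with the question."""
    q = words(question)
    best_name, best_score = fallback, 0
    for name, summary in summaries.items():
        score = len(q & words(summary))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


summaries = {
    "essay_index": "This document describes Paul Graham's life, "
                   "from early adulthood to the present day.",
    "empty_index": "This can be used for general knowledge purposes.",
}

choice_pg = route("Tell me about what Paul Graham did growing up?",
                  summaries, fallback="empty_index")
choice_obama = route("Tell me about Barack Obama",
                     summaries, fallback="empty_index")
print(choice_pg, choice_obama)
```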

In [21]:
from llama_index.composability import ComposableGraph

In [22]:
# set query config
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "response_mode": "tree_summarize"
        }
    },
]

In [23]:
tree_index = GPTTreeIndex([essay_index, empty_index])

INFO:root:> Building index from nodes: 0 chunks
> Building index from nodes: 0 chunks
INFO:root:> [build_index_from_documents] Total LLM token usage: 245 tokens
> [build_index_from_documents] Total LLM token usage: 245 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 0 tokens
> [build_index_from_documents] Total embedding token usage: 0 tokens


In [24]:
graph2 = ComposableGraph.build_from_index(tree_index)

In [25]:
# [optional] save to disk
graph2.save_to_disk("index_graph2.json")

In [26]:
# [optional] load from disk
graph2 = ComposableGraph.load_from_disk("index_graph2.json")

In [None]:
# set the logging level to DEBUG for more detailed output
# ask a question about Paul Graham
response = graph2.query(
    "Tell me about what Paul Graham did growing up?", 
    query_configs=query_configs
)

In [28]:
str(response)

'Paul Graham grew up in England and was interested in writing and coding from a young age. He spent much of his time writing essays and working on coding projects, such as an interpreter written in itself and a project called Bel. He also worked on an application builder, network infrastructure, and two services (images and phone calls).'

In [29]:
print(response.get_formatted_sources())

> Source (Doc id: 081e45f9-9dfd-4425-82c6-6eb81cc75bff): This document describes Paul Graham's life, from early adulthood to the present day....

> Source (Doc id: 9268910d-4749-46a8-847d-1279ea88737b): from writing essays during most of this time, or I'd never have finished. In late 2015 I spent 3 ...

> Source (Doc id: 9268910d-4749-46a8-847d-1279ea88737b): Aspra.

I started working on the application builder, Dan worked on network infrastructure, and t...

> Source (Doc id: 9268910d-4749-46a8-847d-1279ea88737b): one party. So for every guest, two thirds of the other guests would be people they didn't know bu...


In [None]:
response = graph2.query(
    "Tell me about Barack Obama", 
    query_configs=query_configs
)

In [31]:
str(response)

'Barack Obama is an American politician who served as the 44th President of the United States from 2009 to 2017. He is the first African American to have held the office. Obama previously served as a U.S. Senator from Illinois from 2005 to 2008 and in the Illinois State Senate from 1997 to 2004. He was born in Honolulu, Hawaii, and is a graduate of Columbia University and Harvard Law School. Obama was a community organizer in Chicago before earning his law degree. He worked as a civil rights attorney and taught constitutional law at the University of Chicago Law School from 1992 to 2004. He was elected president in 2008 and re-elected in 2012. During his two terms, Obama signed the Affordable Care Act, ended the military\'s "Don\'t Ask, Don\'t Tell" policy, and ordered U.S. military involvement in the 2011 Libyan civil war. He also oversaw the economic stimulus package of 2009 and the American Recovery and Reinvestment Act of 2009.'

In [43]:
response.get_formatted_sources()

'> Source (Doc id: 8a2fa560-e31e-4066-a3de-6d1259c6c886): This can be used for general knowledge purposes....'