In [None]:
%load_ext autoreload
%autoreload 2

## Query performance with different configurations

There are many ways to query a collection of documents.
Different query configurations produce results of different quality,
and the time taken varies as well.
The things we need to consider are:

1. How long each "document" is, which is determined by text splitting.
2. When querying, how many documents do we consider as context for answering the query?

The test collection I will use is my blog's posts.
I will vary the following parameters:

1. Chunk size when loading documents.
2. K, the number of documents to pass into the response synthesis module.
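
For intuition about what K controls: a vector index embeds the query, ranks the stored chunks by similarity to it, and hands the K best matches to response synthesis. The block below is a minimal sketch of that ranking step with toy vectors in pure Python. The helper names `cosine` and `top_k` are made up for illustration, and this is not LlamaIndex's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=1):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        enumerate(chunk_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in ranked[:k]]
```

A larger K gives the language model more context to work with, at the cost of more text to process per query.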

I'll qualitatively judge each response
and measure the time it took to generate.
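
The timings reported in this notebook are rough wall-clock observations. One way to make them repeatable is a small helper like the sketch below, which wraps any callable and returns its result with the elapsed seconds. The name `timed` is hypothetical and not part of LlamaIndex or LangChain.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Assuming the `vector_index` built later in this notebook, a query could then be timed with `result, seconds = timed(vector_index.query, "How do you think about career development?", similarity_top_k=3)`.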

In [None]:
from llama_index import Document, GPTSimpleVectorIndex
from llama_index.docstore import DocumentStore
from langchain.chat_models import ChatOpenAI
from llamabot.bot import openai  # imported only to run the environment-variable setup code

chat = ChatOpenAI(model_name="gpt-4", temperature=0.0)
chat

In [None]:
from pyprojroot import here
import glob
from pathlib import Path

# Blog post source files.
blog_contents = glob.glob(str(here()) + "/data/blog/**/*.lr")

# Non-text assets stored alongside the posts.
pngs = glob.glob(str(here()) + "/data/blog/**/*.png")
jpgs = glob.glob(str(here()) + "/data/blog/**/*.jpg")
jpegs = glob.glob(str(here()) + "/data/blog/**/*.jpeg")
pdfs = glob.glob(str(here()) + "/data/blog/**/*.pdf")
ais = glob.glob(str(here()) + "/data/blog/**/*.ai")

delete = pngs + jpgs + jpegs + pdfs + ais

# Warning: this permanently deletes the asset files from disk.
for file in delete:
    Path(file).unlink()

In [None]:
def read_blogpost(filename):
    """Return the full text of a blog post file as a single string."""
    with open(filename, 'r') as f:
        text = f.read()
    return text

## Split text into chunks with TokenTextSplitter
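
To make the `chunk_size` and `chunk_overlap` parameters used in the next cell concrete, here is a minimal sketch of fixed-size chunking with overlap. It splits on whitespace rather than on model tokens, so it only approximates what `TokenTextSplitter` does. `chunk_words` is a hypothetical helper, not part of LangChain.

```python
def chunk_words(text, chunk_size=500, chunk_overlap=50):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share chunk_overlap words, so content that
    straddles a boundary appears in both neighbouring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks produce more documents in the index, each carrying less context; the overlap reduces the chance that a sentence is cut in half at a chunk boundary.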

In [None]:
from langchain.text_splitter import MarkdownTextSplitter, TokenTextSplitter
from llama_index import LLMPredictor, ServiceContext

llm_predictor = LLMPredictor(llm=chat)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)


# Read each blog post, split it into chunks with the TokenTextSplitter, and
# cast each chunk back into the LlamaIndex Document format.

blog_posts = []
splitter = TokenTextSplitter(chunk_size=500, chunk_overlap=50)

for f in blog_contents:
    blog_post = read_blogpost(f)
    chunks = splitter.split_text(blog_post)
    blog_posts.extend([Document(chunk) for chunk in chunks])
len(blog_posts)


## Do with GPT-4

In [None]:
vector_index = GPTSimpleVectorIndex.from_documents(blog_posts, service_context=service_context)

Building the vector index takes about 23 seconds with GPT-4.

### Take top 1 node

In [None]:
result_1 = vector_index.query("How do you think about career development?", response_mode="default")
result_1

In [None]:
from IPython.display import display, Markdown

display(Markdown(result_1.response))


Very fast, ~15 seconds to answer, including API call latency.

### Take top 3 nodes

In [None]:
result_3 = vector_index.query("How do you think about career development?", response_mode="default", similarity_top_k=3)
result_3


In [None]:
display(Markdown(result_3.response))


Took about 90 seconds.

### Take top 5 nodes

In [None]:
result_5 = vector_index.query("How do you think about career development?", response_mode="default", similarity_top_k=5)
result_5

In [None]:
display(Markdown(result_5.response))

Took about 3 minutes.

## Do with GPT-3 (default in LlamaIndex)

In [None]:
vector_index = GPTSimpleVectorIndex.from_documents(blog_posts)


In [None]:
result_1 = vector_index.query("How do you think about career development?", response_mode="default")
display(Markdown(result_1.response))


In [None]:
result_3 = vector_index.query("How do you think about career development?", response_mode="default", similarity_top_k=3)
display(Markdown(result_3.response))


In [None]:
result_5 = vector_index.query("How do you think about career development?", response_mode="default", similarity_top_k=5)
display(Markdown(result_5.response))


My thoughts so far:

1. This speed is not suited to a chat bot.
2. It can, however, be used for an email bot (which I've been building secretly!).
3. If we don't use GPT-4, the response synthesis quality is much worse.
4. If we use GPT-4, the response synthesis time is much slower.

## Async Querying
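
Awaiting a single coroutine does not make that one query any faster; async pays off when several queries are in flight at the same time. The sketch below shows the pattern with `asyncio.gather`. `fake_aquery` is a hypothetical stand-in that sleeps instead of calling an API, so the block runs without credentials; it is not a LlamaIndex function.

```python
import asyncio


async def fake_aquery(question, delay=0.01):
    """Hypothetical stand-in for an async query call; sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return f"answer to: {question}"


async def ask_all(questions):
    """Run all queries concurrently; results come back in input order."""
    return await asyncio.gather(*(fake_aquery(q) for q in questions))
```

In a notebook, `await ask_all(questions)` works directly in a cell; in a plain script, use `asyncio.run(ask_all(questions))`.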

In [None]:
result_1_async = vector_index.aquery("How do you think about career development?", response_mode="default")
await result_1_async

In [None]:
from llama_index import GPTTreeIndex
tree_index = GPTTreeIndex.from_documents(blog_posts)


The tree index took about 4 minutes to build and used a lot of tokens.

In [None]:
result = tree_index.query("How do you think about career development?", response_mode="default")
result