# Purpose

Build one or more simple RAG pipelines to see how well we can troubleshoot charger issues through a chat interface. As a starting point, the corpus is a few DCFC charger manuals we found online.

# Imports

In [1]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np
from rich import print
import os

# Constants

In [2]:
from dotenv import load_dotenv

load_dotenv(override=True)

True

# Check Available Models

In [3]:
from evlens.models.openai_tools import find_models

In [4]:
find_models('gpt-3')

['gpt-3.5-turbo-0301',
 'gpt-3.5-turbo-1106',
 'gpt-3.5-turbo-16k-0613',
 'gpt-3.5-turbo-16k',
 'gpt-3.5-turbo',
 'gpt-3.5-turbo-0613',
 'gpt-3.5-turbo-instruct',
 'gpt-3.5-turbo-instruct-0914']

# llama-index Tutorials

We will use `llama-index` for a quick RAG implementation as a proof of concept (POC) and see what we can do.

## [Quickstart](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html)

In [10]:
%%time
import logging
import sys

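logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# NOTE: basicConfig already logs to stdout, so this extra handler prints every
# record a second time (and each re-run of the cell adds one more copy)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))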

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("../data/pg/").load_data()
index = VectorStoreIndex.from_documents(documents)

DEBUG:llama_index.readers.file.base:> [SimpleDirectoryReader] Total files added: 1
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: What I Worked On

February 2021

Before college...
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: I couldn't have put this into words when I was ...
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: So I looked around to see what I could salvage ...
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: I didn't want to drop out of grad school, but h...
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: We actually had one of those little stoves, fed...

In [9]:
%%time
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

CPU times: user 80.4 ms, sys: 5.44 ms, total: 85.8 ms
Wall time: 3.66 s


In [11]:
# Persist the in-memory vector store (embeddings included) to disk for later use
# Writes to ./storage, relative to the notebook's working directory
index.storage_context.persist(persist_dir='./storage')

DEBUG:fsspec.local:open file: /Users/davemcrench/Documents/Projects/evlens/notebooks/storage/docstore.json
DEBUG:fsspec.local:open file: /Users/davemcrench/Documents/Projects/evlens/notebooks/storage/index_store.json
DEBUG:fsspec.local:open file: /Users/davemcrench/Documents/Projects/evlens/notebooks/storage/graph_store.json
DEBUG:fsspec.local:open file: /Users/davemcrench/Documents/Projects/evlens/notebooks/storage/default__vector_store.json
DEBUG:fsspec.local:open file: /Users/davemcrench/Documents/Projects/evlens/notebooks/storage/image__vector_store.json

In [12]:
# Do it again, but this time generate and store data/embeddings if they don't exist and load them up if they do
import os.path
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# check if storage already exists
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("../data/pg/").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

DEBUG:llama_index.storage.kvstore.simple_kvstore:Loading llama_index.storage.kvstore.simple_kvstore from ./storage/docstore.json.
DEBUG:fsspec.local:open file: /Users/davemcrench/Documents/Projects/evlens/notebooks/storage/docstore.json
DEBUG:llama_index.storage.kvstore.simple_kvstore:Loading llama_index.storage.kvstore.simple_kvstore from ./storage/index_store.json.
DEBUG:fsspec.local:open file: /Users/davemcrench/Documents/Projects/evlens/notebooks/storage/index_store.json
DEBUG:llama_index.graph_stores.simple:Loading llama_index.graph_stores.simple from ./storage/graph_store.json.

In [16]:
response.response

'The author, growing up, worked on writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer. Later, they got a microcomputer and started programming more extensively, writing simple games and a word processor. They initially planned to study philosophy in college but ended up switching to AI.'
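
The persist-or-load cell above follows a generic build-once, reuse-later pattern. Here is a minimal sketch of that pattern using a plain JSON file instead of a llama-index `StorageContext`. The names `load_or_build` and `fake_build`, and the file layout, are made up for illustration.

```python
import json
import os
import tempfile


def load_or_build(path, build):
    """Return (data, cache_hit). Load `path` if it exists, else build and save it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f), True
    data = build()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        json.dump(data, f)
    return data, False


calls = []


def fake_build():
    """Hypothetical stand-in for the expensive embedding step."""
    calls.append(1)
    return {"embeddings": [[0.1, 0.2]]}


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "storage", "index.json")
    first, hit1 = load_or_build(cache, fake_build)    # builds and saves
    second, hit2 = load_or_build(cache, fake_build)   # loads from disk

print(hit1, hit2, len(calls))
```

The check only looks at whether the path exists, so (like the cell above) it will not notice if the source documents change after the first build. Deleting the persisted directory forces a rebuild.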

## Quickstart on our own data

In [17]:
%%time
# import logging
# import sys

# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

import os.path
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# check if storage already exists
PERSIST_DIR = "../data/vector_dbs/llamaindex_techdocs_quickstart"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("../data/external/technical_docs/").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What methods can be used to activate EV charging equipment?")
print(response)

DEBUG:llama_index.readers.file.base:> [SimpleDirectoryReader] Total files added: 1
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: EVITP (EVCS) Test Review
Study online at https:...

CPU times: user 290 ms, sys: 35.3 ms, total: 325 ms
Wall time: 2.19 s


In [18]:
response.response

'RFID tags and swipe cards can be used to activate EV charging equipment.'

In [31]:
def get_source_files(response):
    """Return the unique source file names behind a query response."""
    return sorted({n.node.metadata['file_name'] for n in response.source_nodes})

In [32]:
get_source_files(response)

['evitp_training_flashcards.pdf']

Nice! It figured out how to answer that simple question from the quiz data. I'm encouraged.

In [33]:
response = query_engine.query("How do you repair a frayed charging cable in a Veefile tritium charger?")
print(response.response)
print("\n\n")
get_source_files(response)

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create.<locals>.parser at 0x2c6d044a0>, 'json_data': {'input': ['How do you repair a frayed charging cable in a Veefile tritium charger?'], 'model': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>, 'encoding_format': 'base64'}}

['evitp_training_flashcards.pdf']

Shoot, it seems to have effectively hallucinated that response instead of saying that the tech docs it had been given didn't include this info. The only source it retrieved was the flashcards PDF.
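
One possible guard, sketched here under assumptions: check the similarity scores of the retrieved chunks and fall back to a fixed "I don't know" message when none of them clears a threshold. It assumes each entry in `response.source_nodes` has a `.score` attribute (llama-index's `NodeWithScore` does). The `SimpleNamespace` objects and the `0.8` cutoff are hypothetical stand-ins, and a sensible cutoff would need tuning against real queries.

```python
from types import SimpleNamespace

ABSTAIN = "I do not know the answer to that."


def best_score(source_nodes):
    """Return the highest similarity score among retrieved nodes, or None."""
    scores = [n.score for n in source_nodes if n.score is not None]
    return max(scores) if scores else None


def answer_or_abstain(answer, source_nodes, min_score=0.8):
    """Keep the LLM answer only if some retrieved chunk scored >= min_score."""
    top = best_score(source_nodes)
    if top is None or top < min_score:
        return ABSTAIN
    return answer


# Hypothetical stand-ins for response.source_nodes
weak = [SimpleNamespace(score=0.71), SimpleNamespace(score=0.69)]
strong = [SimpleNamespace(score=0.88), SimpleNamespace(score=0.74)]

print(answer_or_abstain("Replace the cable.", weak))
print(answer_or_abstain("Use an RFID tag.", strong))
```

This only filters on retrieval quality. A high score does not prove the chunk actually contains the answer, so it complements rather than replaces a prompt-level instruction.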

# Customizing the RAG Application

> The ServiceContext is a bundle of services and configurations used across a LlamaIndex pipeline.

## Reduce chunk size

In [35]:
# Parse into smaller chunks (chunk_size is counted in tokens)
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(chunk_size=1000)
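
To make the effect of `chunk_size` concrete, here is a rough sketch of fixed-size chunking with overlap. It splits on whitespace rather than tokens and ignores sentence boundaries, so it is a simplification for illustration and not how llama-index's node parser works. `chunk_words` is a made-up name.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of up to chunk_size words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


text = "a b c d e f g h i j"
print(chunk_words(text, chunk_size=4, overlap=1))
print(chunk_words(text, chunk_size=5))
```

Smaller chunks give the retriever more, narrower pieces to match against, at the cost of each piece carrying less surrounding context.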

In [41]:
avoid_guessing_str = "Respond with 'I do not know the answer to that.' if the answer is not in the provided context. Do not guess or speculate outside of the provided context."

In [38]:
%%time

documents = SimpleDirectoryReader("../data/external/technical_docs/").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query(f"What methods can be used to activate EV charging equipment? {avoid_guessing_str}")
print(response.response)

DEBUG:llama_index.readers.file.base:> [SimpleDirectoryReader] Total files added: 1
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: EVITP (EVCS) Test Review
Study online at https:...
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: EVITP (EVCS) Test Review
Study online at https:...
DEBUG:llama_index.node_parser.node_utils:> Adding chunk: EVITP (EVCS) Test Review
Study online at https:...

CPU times: user 191 ms, sys: 38.5 ms, total: 229 ms
Wall time: 2.15 s


In [40]:
response = query_engine.query(f"How do you repair a frayed charging cable in a Veefile tritium charger? {avoid_guessing_str}")
print(response.response)
print("\n\n")
get_source_files(response)

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create.<locals>.parser at 0x2c688d9e0>, 'json_data': {'input': ["How do you repair a frayed charging cable in a Veefile tritium charger? Respond with 'I do not know the answer to that.' if the answer is not in the provided context. Do not guess or speculate outside of the provided context."], 'model': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>, 'encoding_format': 'base64'}}

['evitp_training_flashcards.pdf']
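
The request log above shows the full string, instruction included, being sent to the embeddings endpoint, so the guard text also influences which chunks get retrieved. A rough sketch of keeping the two apart: embed only the question, and add the instruction when assembling the final prompt. `build_prompt` and the prompt layout are hypothetical, not llama-index's actual template.

```python
GUARD = (
    "Respond with 'I do not know the answer to that.' if the answer is not in "
    "the provided context. Do not guess or speculate outside of the provided context."
)


def build_prompt(question, context_chunks, guard=GUARD):
    """Assemble the final LLM prompt: instruction, retrieved context, question."""
    context = "\n\n".join(context_chunks)
    return (
        f"{guard}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


question = "How do you repair a frayed charging cable?"
retrieval_text = question  # only this text would be embedded for similarity search
prompt = build_prompt(question, ["chunk one", "chunk two"])
print(prompt)
```

In llama-index the natural place for this is likely a custom QA prompt template on the query engine, which is not shown here.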

In [43]:
logging.getLogger().setLevel(logging.INFO)

## Switch vector store

> *StorageContext* defines the storage backend for where the documents, embeddings, and indexes are stored.

In [44]:
%%time

import chromadb
from llama_index.vector_stores import ChromaVectorStore
from llama_index import StorageContext

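chroma_client = chromadb.PersistentClient()
# get_or_create_collection avoids the error create_collection raises when
# "quickstart" already exists in the persisted store (e.g. on a re-run)
chroma_collection = chroma_client.get_or_create_collection("quickstart")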
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
CPU times: user 550 ms, sys: 135 ms, total: 684 ms
Wall time: 1.73 s


In [48]:
# Whoops, the earlier loads weren't searching subdirectories! recursive=True fixes that
documents = SimpleDirectoryReader("../data/external/technical_docs/", recursive=True).load_data()

In [49]:
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
query_engine = index.as_query_engine()
response = query_engine.query(f"How do you repair a frayed charging cable in a Veefile tritium charger? {avoid_guessing_str}")
print(response.response)
print("\n\n")
get_source_files(response)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 20

['Users-Manual-3059907.pdf']

In [51]:
{d.metadata['file_name'] for d in documents}

{'EFAPOWER EV-QC45 Standalone - V 05.01 (100-135).pdf',
 'External-Photos-3059906.pdf',
 'Internal-Photos-3059905.pdf',
 'README.md',
 'Users-Manual-3059907.pdf',
 'evitp_training_flashcards.pdf',
 'instruction_manual.pdf'}

## Increase Context (by increasing chunk counts retrieved)

> `as_query_engine` builds a default retriever and query engine on top of the index. You can configure the retriever and query engine by passing in keyword arguments. Here, we configure the retriever to return the top 5 most similar documents (instead of the default of 2).

In [52]:
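# Rebuilt without storage_context, so this index uses the default in-memory
# vector store rather than the Chroma collection from the previous section
index = VectorStoreIndex.from_documents(documents)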
query_engine = index.as_query_engine(similarity_top_k=5)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [53]:
response = query_engine.query(f"How do you repair a frayed charging cable in a Veefile tritium charger? {avoid_guessing_str}")
print(response.response)
print("\n\n")
get_source_files(response)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


['Users-Manual-3059907.pdf']
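
What `similarity_top_k` controls can be sketched with toy vectors: score every chunk against the query by cosine similarity and keep the `k` best. This is a pure-Python illustration with made-up two-dimensional "embeddings", not the vector store's real implementation. `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Made-up embeddings, ordered here from most to least similar to the query
chunk_vecs = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.5, 0.5],
    "d": [0.0, 1.0],
    "e": [-1.0, 0.0],
}
query_vec = [1.0, 0.0]

print(top_k(query_vec, chunk_vecs, k=2))
print(top_k(query_vec, chunk_vecs, k=5))
```

Raising `k` adds lower-scoring chunks to the context. That can surface a relevant passage the top two missed, but it also feeds the LLM more loosely related text.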

## Swap Out LLMs

In [54]:
from llama_index import ServiceContext
from llama_index.llms import PaLM

service_context = ServiceContext.from_defaults(llm=PaLM())

ValueError: PaLM is not installed. Please install it with `pip install google-generativeai`.

## Changing [Response Mode](https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/response_modes.html)

This does *nothing* to the retrieval part of the process. It only changes what the LLM does with the retrieved chunks before producing the final answer (e.g. instead of stuffing everything you retrieved into a single prompt and discarding anything over the token limit, it can refine or summarize the chunks through some strategy so that they fit the context window).
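
The difference between a few such strategies can be sketched with a fake LLM that just counts how often it is called. The three functions loosely mirror the ideas of stuffing, iterative refinement, and tree-style summarization. They are simplified illustrations with made-up names and prompts, not llama-index's implementations, and the stuffing version does no token-limit truncation.

```python
def make_fake_llm():
    """Return a stand-in LLM that records prompts and returns numbered answers."""
    prompts = []

    def llm(prompt):
        prompts.append(prompt)
        return f"answer#{len(prompts)}"

    return llm, prompts


def stuff(chunks, question, llm):
    """One call: put every chunk into a single prompt."""
    context = "\n".join(chunks)
    return llm(f"Context:\n{context}\nQuestion: {question}")


def refine(chunks, question, llm):
    """One call per chunk: carry the running answer forward and refine it."""
    answer = ""
    for chunk in chunks:
        answer = llm(
            f"Current answer: {answer}\nNew context: {chunk}\nQuestion: {question}"
        )
    return answer


def tree_summarize(chunks, question, llm, fan_in=2):
    """Answer over groups of fan_in items, then over those answers, until one is left."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    if not level:
        return ""
    while True:
        level = [
            llm("Context:\n" + "\n".join(level[i:i + fan_in]) + f"\nQuestion: {question}")
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


chunks = ["chunk A", "chunk B", "chunk C", "chunk D"]
question = "What methods can be used to activate EV charging equipment?"

for strategy in (stuff, refine, tree_summarize):
    llm, prompts = make_fake_llm()
    strategy(chunks, question, llm)
    print(strategy.__name__, "->", len(prompts), "LLM call(s)")
```

In all three cases the same four chunks go in. Only the number and shape of the LLM calls changes, which is the trade-off between cost, latency, and how much of each chunk survives into the final answer.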