# Using Redis and OpenAI to chat with PDF documents

This notebook demonstrates how to use Redis and OpenAI to chat with PDF documents. The PDF included is
an informational brochure about the Chevy Colorado pickup truck.

In this notebook, we will use LlamaIndex to chunk, vectorize, and store the PDF document in Redis as vectors
alongside the associated text. The query interface provided by LlamaIndex will then be used to search for
relevant information given queries from the user.
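
To make the chunking step concrete, here is a minimal sketch in plain Python. It assumes fixed-size character windows with a small overlap; the `chunk_text` helper and its parameters are hypothetical and are not part of LlamaIndex, which has its own text splitters and defaults.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Stand-in text; the real input would be the text extracted from the PDF.
sample = "The Chevy Colorado is a midsize pickup truck. " * 10
chunks = chunk_text(sample, chunk_size=100, overlap=10)
print(len(chunks), "chunks; first chunk starts with:", repr(chunks[0][:45]))
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighboring chunk, which tends to help retrieval.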

In [None]:
# Install the requirements
%pip install redis PyPDF2 python-dotenv transformers

In [None]:
%pip install git+https://github.com/redisventures/llama_index@redis_index_delete_logic_change

In [1]:
# Import
import os
import sys
import textwrap

import logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores import RedisVectorStore
from llama_index.storage.storage_context import StorageContext




### LlamaIndex

[LlamaIndex](https://github.com/jerryjliu/llama_index) (GPT Index) is a project that provides a central interface to connect your LLMs with external data sources. It provides a simple interface to vectorize and store embeddings in Redis, create search indices using Redis, and perform vector search to find context for generative models like GPT.
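
As a rough illustration of what vector search means here, the sketch below ranks stored vectors by cosine similarity to a query vector. It is only a conceptual model: the tiny 3-dimensional vectors and the `top_k` helper are made up for this example, and in the notebook the search runs inside Redis against its own index rather than in Python.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy "embeddings"; real ones come from the embedding model and are much longer.
stored = [
    ("towing capacity", [0.9, 0.1, 0.0]),
    ("engine options", [0.1, 0.9, 0.0]),
    ("paint colors", [0.0, 0.1, 0.9]),
]
results = top_k([0.8, 0.2, 0.0], stored, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```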

Here we use LlamaIndex to load the document (the Chevy Colorado brochure).

In [3]:
# make the document directory
#!mkdir -p docs
#!wget https://www.chevrolet.com/content/dam/chevrolet/na/us/english/index/shopping-tools/download-catalog/03-pdf/2022-chevrolet-colorado-ebrochure.pdf -P docs


# load documents
documents = SimpleDirectoryReader('./docs').load_data()
print('Document ID:', documents[0].doc_id, 'Document Hash:', documents[0].doc_hash)

<llama_index.readers.file.docs_parser.PDFParser object at 0x127933670>
Document ID: dea6edaa-d84f-4a13-bb9c-f2568b951b49 Document Hash: 958a61679fec883f58d6d490edebe15d4bd473e121e03057295d5dda81584204


### Initialize Redis as a Vector Database

Now that the documents are loaded, we can initialize the ``RedisVectorStore``. This will allow us to store our vectors in Redis and create an index.

The ``GPTVectorStoreIndex`` will then create the embeddings from the text chunks by calling out to OpenAI's API. The embeddings will be stored in Redis and an index will be created.

NOTE: If you didn't set the ``OPENAI_API_KEY`` environment variable, you will get an error here.
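
One way to catch a missing key before any API call is made is to check the environment up front. The sketch below is an assumption about how you might do that; `missing_env_vars` is a hypothetical helper, and the variable names are the ones this notebook reads (`OPENAI_API_KEY`, `REDIS_ADDRESS`, `REDIS_PORT`).

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

required = ["OPENAI_API_KEY", "REDIS_ADDRESS", "REDIS_PORT"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

If you keep these values in a `.env` file, loading it with `python-dotenv` (installed above) before running this check should make them visible to `os.environ`.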

In [None]:
redis_address = f'redis://{os.getenv("REDIS_ADDRESS")}:{os.getenv("REDIS_PORT")}'

vector_store = RedisVectorStore(
    index_name="chevy_docs",
    index_prefix="llama",
    redis_url=redis_address,
    overwrite=True
)
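
Note that if `REDIS_ADDRESS` or `REDIS_PORT` is unset, the f-string above produces `redis://None:None`. A small sketch of a more forgiving alternative follows; `build_redis_url` is a hypothetical helper, and falling back to `localhost` and port `6379` (the Redis default) is an assumption about your setup.

```python
import os

def build_redis_url(env=None, default_host="localhost", default_port="6379"):
    """Build a redis:// URL, falling back to defaults for unset or empty variables."""
    env = os.environ if env is None else env
    host = env.get("REDIS_ADDRESS") or default_host
    port = env.get("REDIS_PORT") or default_port
    return f"redis://{host}:{port}"

print(build_redis_url({}))  # redis://localhost:6379
print(build_redis_url({"REDIS_ADDRESS": "redis.example.com", "REDIS_PORT": "6380"}))
```

The result could be passed as `redis_url` in place of `redis_address`.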

In [None]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)

## Start Querying Information from the Document

Now that our document is stored in the index, we can ask questions against it. The index uses the data it stores as the knowledge base for ChatGPT.
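
Conceptually, answering a question means retrieving the most relevant chunks and placing them in the prompt ahead of the question. The sketch below shows that assembly step only. The `build_prompt` helper and its template wording are hypothetical and are not the template LlamaIndex uses internally; the sample chunks paraphrase answers shown later in this notebook.

```python
def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Illustrative chunks standing in for what the vector search would return.
retrieved = [
    "Maximum towing capacity is 7,700 lbs. with the available Duramax 2.8L Turbo-Diesel engine.",
    "Engine choices: 2.5L 4-cylinder, 3.6L V6, and Duramax 2.8L Turbo-Diesel.",
]
prompt = build_prompt("What is the maximum towing capacity?", retrieved)
print(prompt)
```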

In [23]:
query_engine = index.as_query_engine()
response = query_engine.query("What types of variants are available for the Chevrolet Colorado?")
print(textwrap.fill(str(response), 100))

 The Chevrolet Colorado is available in four models: WT, LT, Z71, and ZR2. It is available in both
Extended Cab and Crew Cab configurations, and offers three engine choices: 2.5L 4-cylinder, 3.6L V6,
and Duramax 2.8L Turbo-Diesel. It also offers a variety of features, including Apple CarPlay and
Android Auto compatibility, ZR2 Bison Edition, ZR2 Dusk Special Edition, and ZR2 Midnight Special
Edition.


In [24]:
response = query_engine.query("What is the maximum towing capacity of the chevy colorado?")
print(textwrap.fill(str(response), 100))

 The maximum towing capacity of the Chevy Colorado is 7,700 lbs. with the available Duramax 2.8L
Turbo-Diesel engine.


In [25]:
response = query_engine.query("What are the main differences between the three engine types available for the Chevy Colorado?")
print(textwrap.fill(str(response), 100))

 The three engine types available for the Chevy Colorado are the 2.5L 4-cylinder, 3.6L V6, and
Duramax 2.8L Turbo-Diesel. The 2.5L 4-cylinder engine is standard on the WT and LT models, while the
3.6L V6 is standard on the Z71 and ZR2 models. The Duramax 2.8L Turbo-Diesel engine is available on
the LT, Z71, and ZR2 models.   The main differences between the three engine types are their power
output, fuel efficiency, and towing capacity. The 2.5L 4-cylinder engine is the least powerful of
the three, with an estimated EPA-estimated MPG city/highway of 20/30. The 3.6L V6 engine is more
powerful than the 2.5L 4-cylinder, with an estimated EPA-estimated MPG city/highway of 17/24. The
Duramax 2.8L Turbo-Diesel engine is the most powerful of the three, with an estimated EPA-estimated
MPG city/highway of 20/30 and a maximum towing capacity of up to 7,700 lbs
