<a href="https://colab.research.google.com/github/iviamontes/chat-with-a-pdf/blob/master/ChatGTP_a_PDF_%7C%7C_Tech_Day.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import os
os.environ["OPENAI_API_KEY"] = "..."
os.environ["PINECONE_API_KEY"] = "..."
os.environ["PINECONE_API_ENV"] = "..."
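
The three keys above are left as `"..."` placeholders, so it can help to check them before running the later cells. The helper below is a minimal sketch of such a check; `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
import os

# Hypothetical helper: the names match the environment variables set above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_API_ENV")

def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names that are unset, empty, or still the "..." placeholder."""
    return [name for name in required if env.get(name, "").strip() in ("", "...")]

# An empty list means every key has been filled in.
print(missing_keys(os.environ))
```

Passing the environment as an argument keeps the check easy to try with a plain dictionary.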

In [None]:
# Install packages
!pip install openai langchain pypdf
!pip install pinecone-client
!pip install tiktoken

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

BASE_DIR = "/content/drive/MyDrive/Colab Notebooks"
os.chdir(BASE_DIR)

**STEP 1 - Import libraries and load the Associate Handbook PDF**

In [6]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the handbook PDF into a list of Document objects
loader = PyPDFLoader("./Associate_Handbook_March_2023.pdf")
pages = loader.load_and_split()

**STEP 2 - Split the document into chunks of at most 1000 characters**

In [13]:
# Chunk the pages into smaller documents (at most 1000 characters each, no overlap)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(pages)

print(f'{len(texts)} chunks created')
print(texts[1])

151 chunks created
page_content='3OUR MISSION\nTo provide quality home furnishings, at excellent values,  \nin an exciting and fun environment.OUR VISION\nTo be the ultimate furniture and mattress store.\nOUR PURPOSE\nTo enrich people’s lives \nand make the world a better place.\nDear Fellow Associate:\nDuring the year 1971, with a handful of dedicated people, we began \nwhat is known today as one of the nation’s finest furniture retailers. \nWhile the handful of people has grown, our need for dedicated \nAssociates has not changed. CITY will only be as successful as \nthe people who sell and extend service to our customers. Each \nAssociate is important to the success of our company.\nThis handbook is designed to acquaint you with our organization, \nprinciples, policies and benefits. We are sure you will find this \ninformation helpful to you as a new Associate.\nWe, personally, welcome you to our family of Associates and wish \nyou every success in your new position.\nSincerely,\nKe
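
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch of fixed-size chunking in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter prefers to break on separators such as paragraph and line breaks, while this hypothetical `chunk_text` function cuts at fixed character offsets only.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=0):
    """Split text into pieces of at most chunk_size characters.

    Consecutive pieces share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "a" * 2500  # stand-in for the extracted PDF text
chunks = chunk_text(sample, chunk_size=1000, chunk_overlap=0)
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

With `chunk_overlap=0`, as in the cell above, joining the chunks reproduces the input exactly. A non-zero overlap repeats text at chunk boundaries, which can help when an answer straddles two chunks.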

**STEP 3 - Create Semantic Index**

In [9]:
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
import pinecone

# Get the embedding model
embeddings = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))

# Initialize Pinecone
pinecone.init(api_key=os.getenv("PINECONE_API_KEY"), environment=os.getenv("PINECONE_API_ENV"))
index_name="handbook"
namespace = "standards"

# Embed each chunk and store the vectors in Pinecone (vector DB)
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name, namespace=namespace)
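
Conceptually, a semantic index stores one vector per chunk and answers a query by returning the chunks whose vectors are closest to the query vector. The toy sketch below illustrates that idea in plain Python. It assumes cosine similarity as the metric (in Pinecone the metric is chosen when the index is created), and the three-dimensional vectors, chunk labels, and function names are invented for illustration; real OpenAI embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Made-up (text, vector) pairs standing in for embedded handbook chunks.
toy_index = [
    ("mission statement", [0.9, 0.1, 0.0]),
    ("vacation policy", [0.1, 0.9, 0.1]),
    ("dress code", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # made-up embedding of a question about the mission
print(top_k(query_vec, toy_index, k=2))  # ['mission statement', 'vacation policy']
```

A vector database performs the same kind of lookup, but with approximate nearest-neighbor search so that it scales beyond a linear scan.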

**STEP 4 - Ask/Retrieve**

In [10]:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI(temperature=0, openai_api_key=os.getenv('OPENAI_API_KEY'))
chain = load_qa_chain(llm, chain_type="stuff")
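
The `"stuff"` chain type places all retrieved documents into a single prompt and sends it to the model in one call. The sketch below shows the idea in plain Python; the prompt wording and the `build_stuff_prompt` name are assumptions made for illustration, not LangChain's actual template.

```python
def build_stuff_prompt(documents, question):
    """Concatenate every retrieved document into one prompt for the model."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Made-up snippets standing in for the chunks returned by the similarity search.
retrieved = [
    "OUR MISSION: To provide quality home furnishings, at excellent values.",
    "OUR VISION: To be the ultimate furniture and mattress store.",
]
print(build_stuff_prompt(retrieved, "What is the company's mission?"))
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit in the model's context window, which is one reason the document is split into small chunks first.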


In [14]:
query = "What are CITY Furniture core values?"
docs = docsearch.similarity_search(query)

chain.run(input_documents=docs, question=query)

" CITY Furniture's core values are to provide quality home furnishings, at excellent values, in an exciting and fun environment, to be the ultimate furniture and mattress store, and to enrich people’s lives and make the world a better place."