## IPython notebook for Q&A using LangChain, OpenAI (API key required), and an existing Pinecone index (API key required)
This notebook does not ingest any documents or texts; it assumes they have already been ingested and that their embeddings are already stored in a Pinecone index.

In [None]:
#!pip3 install langchain pinecone-client	# install if needed

* OpenAI API key (read from an environment variable set via .bashrc, Windows environment variables, a .env file, etc.) and embeddings

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
import os

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
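
Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier lookup is shown below; `require_env` is a hypothetical helper name, not part of LangChain or OpenAI, and it assumes the variable has already been exported (a .env file would first need to be loaded by something like python-dotenv).

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a readable error.

    Hypothetical helper for this notebook; not part of any library.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Export it in your shell profile or load it from a .env file first."
        )
    return value
```

It could then replace the direct lookup, for example `OPENAI_API_KEY = require_env("OPENAI_API_KEY")`.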

* Use the OpenAI LLM gpt-3.5-turbo. Setting the temperature to 0 makes the output as deterministic as possible, which reduces (but does not eliminate) the chance that the model makes things up

In [None]:
from langchain.chat_models import ChatOpenAI
llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", openai_api_key=OPENAI_API_KEY)
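
To give a feel for what the temperature setting does, here is a rough sketch of temperature-scaled softmax over made-up token scores. It illustrates the general technique only; how OpenAI applies temperature internally is an assumption here, and `softmax_with_temperature` and `toy_logits` are hypothetical names.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all probability on the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
```

As the temperature drops, more of the probability mass moves to the top-scoring candidate, which is why low values give more repeatable answers.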

* Set up the Pinecone environment

In [None]:
from langchain.vectorstores import Pinecone
import pinecone

# initialize Pinecone (os and the embeddings object come from the cells above)
PINECONE_API_KEY = os.environ['PINECONE_API_KEY']
PINECONE_API_ENV = os.environ['PINECONE_API_ENV']
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_API_ENV
)


* Load the pre-created Pinecone index

In [None]:
# the index that has already been stored in pinecone.io as long-term memory
index_name = "langchaints"  # example Pinecone index; replace with yours
if index_name in pinecone.list_indexes():
    docsearch = Pinecone.from_existing_index(index_name, embeddings)
else:
    raise ValueError(
        'Cannot find the specified Pinecone index. Create one in pinecone.io or using '
        'pinecone.create_index(name=index_name, dimension=1536, metric="cosine", shards=1)'
    )


* Set up QA chain

In [None]:
from langchain.chains.question_answering import load_qa_chain
# 4 types of QA chains: stuff, map_reduce, refine, map_rerank
chain = load_qa_chain(llm, chain_type="stuff")
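
As a rough illustration of the "stuff" idea, the sketch below places all retrieved texts into a single prompt ahead of the question. This is not LangChain's actual implementation or prompt template; `build_stuff_prompt` and its wording are hypothetical.

```python
def build_stuff_prompt(doc_texts, question):
    """Concatenate every retrieved text into one context block, then append the question."""
    context = "\n\n".join(doc_texts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Toy inputs, purely for illustration.
print(build_stuff_prompt(["First chunk of text.", "Second chunk of text."], "What is discussed?"))
```

Because everything goes into one call, this approach is simple but bounded by the model's context window; the other chain types process the texts in several calls instead.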


* Provide a prompt, search the index for similar texts (chunks of the split documents), and generate a reply based only on the texts found

In [None]:
query = "What is the summary of the documents on RS?"
# similarity search narrows the context down to the most similar texts, which can speed up the response
docs = docsearch.similarity_search(query, include_metadata=True)
# the answer is returned by the following line
chain.run(input_documents=docs, question=query)
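
To make the retrieval step concrete, here is a minimal sketch of a cosine-similarity top-k search over toy vectors, matching the "cosine" metric mentioned for the index. It only illustrates the idea: Pinecone performs this search server-side with its own indexing, and the three-dimensional vectors, texts, and function names below are made up (real OpenAI embeddings for this index have 1536 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=4):
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "index": (text, embedding) pairs with made-up values.
toy_store = [
    ("text about topic A", [0.9, 0.1, 0.0]),
    ("text about topic B", [0.0, 0.2, 0.9]),
    ("text close to topic A", [0.8, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]
for text, score in top_k(toy_query, toy_store, k=2):
    print(f"{score:.3f}  {text}")
```

Only the best-scoring texts are handed to the chain, so the LLM sees a small, relevant context rather than the whole corpus.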
