# Documentation chat bot
This notebook explores using an LLM to build a chatbot for documentation.
* The chatbot should be able to answer basic questions and provide the documents that were used to generate its responses.

LangChain has many good resources on this application:
* https://python.langchain.com/v0.2/docs/tutorials/rag/
* https://python.langchain.com/v0.1/docs/use_cases/question_answering/citations/
* https://www.youtube.com/watch?v=Vw52xyyFsB8&list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x&index=4

In [7]:
%load_ext dotenv
%dotenv ../.env

The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv


In [8]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

### Making the model

In [9]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")



### Making the document loader

In [16]:
from langchain_community.document_loaders import TextLoader

from os import listdir
from os.path import isfile, join

mypath = r"./text"
docs = []

for f in listdir(mypath):
    file_path = join(mypath, f)
    if isfile(file_path):
        loader = TextLoader(file_path, encoding='utf8')
        docs.extend(loader.load())
print(len(docs))


11


In [17]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
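
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width chunking with overlap. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks). The function name `sliding_chunks` and the tiny sizes are hypothetical, chosen only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)
```

Each chunk repeats the last two characters of the previous one, which is the same reason the notebook uses an overlap of 200: a sentence cut at a chunk boundary still appears whole in at least one neighbour.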

In [18]:
import os
stored = os.path.isdir("faiss_index")  # reuse the saved index if it already exists


In [26]:
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
if not stored:
    # Embed every chunk once and cache the index on disk.
    vectorstore = FAISS.from_documents(splits, embeddings)
    vectorstore.save_local("faiss_index")
    stored = True
else:
    # Loading unpickles data, so only enable this flag for an index you created yourself.
    vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever()
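
Conceptually, the retriever embeds the question and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea with hand-made 2-D vectors and a brute-force search. The vectors, chunk labels, and the `top_k` helper are made up for illustration; real embeddings have far more dimensions, and FAISS performs the search much more efficiently.

```python
import math


def top_k(query, vectors, k):
    """Return the k labels whose vectors have the smallest Euclidean distance to query."""
    ranked = sorted(vectors, key=lambda label: math.dist(query, vectors[label]))
    return ranked[:k]


# Hypothetical 2-D "embeddings" for three chunks.
toy_index = {
    "chunk about GDP": (0.9, 0.1),
    "chunk about trade": (0.7, 0.4),
    "chunk about geography": (0.1, 0.9),
}
query_vector = (1.0, 0.0)
nearest = top_k(query_vector, toy_index, k=2)
print(nearest)
```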

In [27]:
prompt = hub.pull("rlm/rag-prompt")  # RAG prompt pulled from the LangChain prompt hub


In [28]:
print(prompt)

input_variables=['context', 'question'] metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))]


In [29]:
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, PromptTemplate

# Local equivalent of the rlm/rag-prompt template printed above.
my_prompt = ChatPromptTemplate(
    input_variables=['context', 'question'],
    messages=[
        HumanMessagePromptTemplate(
            prompt=PromptTemplate(
                input_variables=['context', 'question'],
                template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"
            )
        )
    ]
)
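
The template has two placeholders, `{context}` and `{question}`, that are filled in before the text is sent to the model. As a rough sketch of that substitution, the snippet below uses plain `str.format` on the tail of the template with made-up inputs. It illustrates the idea only and is not LangChain's own formatting code.

```python
# Tail of the RAG template shown above; the instruction sentences are omitted for brevity.
template_tail = "Question: {question} \nContext: {context} \nAnswer:"

filled = template_tail.format(
    question="what is China's economy classified as?",
    context="(retrieved chunks go here)",  # placeholder text, not real retrieved content
)
print(filled)
```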

In [30]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)



## Local and citing sources

In [31]:
from langchain_core.runnables import RunnableParallel

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
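
The data flow of `rag_chain_with_source` can be sketched in plain Python: the first step builds a dict holding the retrieved documents and the original question, and `.assign(answer=...)` adds an `answer` key while keeping the other two. The `Doc`, `fake_retriever`, and `fake_llm` stand-ins below are hypothetical and only mimic the shapes involved, not LangChain's runnables.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    # Hypothetical: always returns the same two chunks.
    return [Doc("first chunk", {"source": "a.txt"}), Doc("second chunk", {"source": "b.txt"})]


def fake_llm(prompt_text):
    # Hypothetical: reports the prompt length instead of calling a model.
    return f"(answer based on {len(prompt_text)} prompt characters)"


def answer_with_sources(question):
    # Step 1: gather context and question side by side (the RunnableParallel step).
    state = {"context": fake_retriever(question), "question": question}
    # Step 2: compute the answer from the formatted context; the raw documents stay in state.
    prompt_text = f"Question: {question} \nContext: {format_docs(state['context'])} \nAnswer:"
    state["answer"] = fake_llm(prompt_text)
    return state


result = answer_with_sources("toy question")
print(sorted(result))
```

Because the raw documents remain under `context`, their `metadata` is still available after the answer is generated, which is what makes citing sources possible.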




In [32]:
ans = rag_chain_with_source.invoke("what is China's economy classified as?")
ans  # display the full result: retrieved context, question, and answer

{'context': [Document(page_content="China has an upper middle income,[28] developing, mixed, socialist market economy incorporating industrial policies and strategic five-year plans.[29] It is the world's second largest economy by nominal GDP, behind the United States, and the world's largest economy since 2016 when measured by purchasing power parity (PPP).[note 2] China accounted for 19% of the global economy in 2022 in PPP terms,[30] and around 18% in nominal terms in 2022.[30][31] The economy consists of public sector enterprises, state-owned enterprises (SOEs) and mixed-ownership enterprises, as well as a large domestic private sector and openness to foreign businesses in their system. According to the annual data of major economic indicators released by the National Bureau of Statistics since 1952, China's GDP grew by an average of 6.17% per year in the 26 years from 1953 to 1978. China implemented economic reform in 1978, and from 1979 to 2023, the country's GDP growth rate grew

In [43]:
print("question = ", ans["question"])
print("answer = ", ans["answer"])
print("Documents used:")
for d in ans["context"]:
    source = d.metadata["source"]
    text = d.page_content
    # Abbreviate long chunks to their first 40 and last 30 characters.
    if len(text) > 40:
        text = text[:40] + "..." + text[-30:]
    print("\tsource: " + source + "\t" + text)


question =  what is China's economy classified as?
answer =  China's economy is classified as an upper middle income, developing, mixed, socialist market economy. It is the world's second largest economy by nominal GDP and the largest economy by purchasing power parity. China has a large public sector, state-owned enterprises, mixed-ownership enterprises, a domestic private sector, and openness to foreign businesses.
Documents used:
	source: ./text\econ_china.txt	China has an upper middle income,[28] de...n average of 8.93% per year in
	source: ./text\econ_china.txt	China has bilateral free trade agreement...accounted for 5.2% of GDP.[68]
	source: ./text\econ_china.txt	China is the world's largest manufacturi...cently, inbound FDI has fallen
	source: ./text\econ_china.txt	With 791 million workers, the Chinese la...g over RMB 120,000 a year.[61]
