# Chat

Recall the overall workflow for retrieval-augmented generation (RAG):

![overview.jpeg](overview.jpeg)

We discussed `Document Loading` and `Splitting` as well as `Storage` and `Retrieval`.

We then showed how `Retrieval` can be used for output generation in Q+A using the `RetrievalQA` chain.

In [None]:
%pip install openai



In [None]:
import os
import openai
import sys
sys.path.append('../..')

import panel as pn  # GUI
pn.extension()

# from dotenv import load_dotenv, find_dotenv
# _ = load_dotenv(find_dotenv()) # read local .env file

from google.colab import userdata
# Read the API key from Colab's secret store.
openai.api_key = userdata.get('OPENAI_API_KEY')

The code below selects the OpenAI model version that was used when the course was filmed, for as long as that version remains available (it was scheduled for deprecation in September 2023).
LLM responses always vary somewhat, but they can differ significantly when a different model version is used.

In [None]:
import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

gpt-3.5-turbo


 If you wish to experiment on the `LangSmith platform` (previously known as LangChain Plus):

 * Go to [LangSmith](https://www.langchain.com/langsmith) and sign up
 * Create an API key from your account's settings
 * Use this API key in the code below

In [None]:
#import os
#os.environ["LANGCHAIN_TRACING_V2"] = "true"
#os.environ["LANGCHAIN_ENDPOINT"] = "https://api.langchain.plus"
#os.environ["LANGCHAIN_API_KEY"] = "..."

In [None]:
%pip install langchain
%pip install langchain-community
%pip install chromadb



In [None]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings(api_key=userdata.get('OPENAI_API_KEY'))  # the key must be passed as a keyword argument
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [None]:
%pip install tiktoken



In [None]:
%pip install pypdf



In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings(api_key=userdata.get('OPENAI_API_KEY'))

from langchain.document_loaders import PyPDFLoader
from google.colab import drive

drive.mount('/content/drive')
%cd /content/drive/MyDrive/

loader = PyPDFLoader("2406.11903v1.pdf")
pages = loader.load()

from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
# docs = []
# for loader in loaders:
#     docs.extend(loader.load())
splits = text_splitter.split_documents(pages)


persist_directory = 'docs/chroma/'
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory=persist_directory
)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive
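
To make the `chunk_size` and `chunk_overlap` settings above concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences). The `chunk_text` helper is hypothetical and only illustrates how consecutive chunks share characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: split text into fixed-size windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.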


In [None]:
question = "What kinds of work can LLMs do in the financial domain?"
docs = vectordb.similarity_search(question, k=3)
len(docs)

3
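
Conceptually, `similarity_search(question, k=3)` embeds the question and returns the `k` stored chunks whose vectors are closest to it. The sketch below shows that idea with hand-written toy vectors. The vectors, the `top_k` helper, and the choice of cosine similarity are assumptions for illustration, not Chroma's actual index or distance configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int) -> list[str]:
    """Hypothetical helper: names of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
toy_store = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
query_vector = [1.0, 0.2, 0.0]
nearest = top_k(query_vector, toy_store, k=2)
print(nearest)
```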

In [None]:
docs[0]

Document(metadata={'page': 15, 'source': '2406.11903v1.pdf'}, page_content='16\nprocessing vast amounts of financial information, identify-\ning patterns and trends that help inform better decision-\nmaking. Secondly, LLMs can be used for predictive mod-\neling , allowing them to forecast market conditions and\nasset performance, which may lead to robust investment\nrecommendations. Additionally, LLMs could offer person-\nalized advisory services . They can analyze a person’s or\norganization’s financial situation, goals, and risk tolerance\nto provide customized advice. Another benefit could be\nreal-time monitoring and alerts , where LLMs can mon-\nitor financial market trends and news, providing timely\nupdates and alerts to help users adjust their strategies as\nneeded. Moreover, LLMs may improve accessibility and\nengagement . By integrating these models into user-friendly\ninterfaces like chatbots, financial planning and advisory\nbecome more accessible and engaging, where indivi

In [None]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-4o", temperature=0, api_key=userdata.get('OPENAI_API_KEY'))
llm.predict("Hello world!")

'Hello! How can I assist you today?'

In [None]:
# Build prompt
from langchain.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum. Keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,)

# Run chain
from langchain.chains import RetrievalQA
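question = "What is the topic of this paper?"
qa_chain = RetrievalQA.from_chain_type(llm,
                    retriever=vectordb.as_retriever(),
                    return_source_documents=True,
                    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})


result = qa_chain({"query": question})
result["result"]

'The topic of this paper is the application of large language models (LLMs) to text classification and information extraction in the financial domain, including processing financial documents, classifying environmental, social, and governance (ESG) information, and handling the challenges of diverse document structures. The paper also discusses data and modeling issues, benchmarking, and ethical concerns. \n\nThanks for asking!'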
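
With the `stuff` approach (the default chain type, as far as we know), the retrieved chunks are concatenated and substituted into `{context}`, and the user's question into `{question}`. Here is a minimal sketch of that substitution using plain `str.format`. The shortened template and the two sample chunks are made up for illustration, and the blank-line separator between chunks is an assumption.

```python
# A shortened version of the prompt template, with the same two placeholders.
template = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

# Made-up stand-ins for the page_content of the retrieved documents.
retrieved_chunks = [
    "LLMs can be used for predictive modeling.",
    "LLMs could offer personalized advisory services.",
]

# "Stuffing": join every retrieved chunk into one context string.
context = "\n\n".join(retrieved_chunks)
prompt = template.format(context=context, question="What can LLMs do in finance?")
print(prompt)
```

Because every chunk goes into a single prompt, this approach is simple but limited by the model's context window.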

### Memory

In [None]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
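
`ConversationBufferMemory` keeps the full list of past messages and hands it to the chain under the `memory_key` name (here `chat_history`). The class below is a hypothetical plain-Python stand-in that mimics that behavior. It is not LangChain's implementation, and its method names only loosely echo the library's.

```python
class BufferMemorySketch:
    """Hypothetical stand-in: stores every turn and returns them under one key."""

    def __init__(self, memory_key: str = "chat_history"):
        self.memory_key = memory_key
        self.messages: list[tuple[str, str]] = []  # (role, text) pairs

    def save_context(self, user_input: str, ai_output: str) -> None:
        # Append both sides of the exchange; nothing is ever dropped.
        self.messages.append(("human", user_input))
        self.messages.append(("ai", ai_output))

    def load_memory_variables(self) -> dict:
        # Return a copy so callers cannot mutate the stored history.
        return {self.memory_key: list(self.messages)}


memory_sketch = BufferMemorySketch()
memory_sketch.save_context("What is the paper about?", "LLMs in finance.")
memory_sketch.save_context("Which tasks?", "Classification and extraction.")
history = memory_sketch.load_memory_variables()["chat_history"]
print(history)
```

Since the buffer grows with every turn, long conversations eventually need a memory type that trims or summarizes the history.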

### ConversationalRetrievalChain

In [None]:
from langchain.chains import ConversationalRetrievalChain
retriever=vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)
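
`ConversationalRetrievalChain` adds a step in front of retrieval: it combines the chat history and the new question into a standalone question, then retrieves with that. In LangChain this rewriting is done by the LLM. The sketch below only shows the data flow, with `fake_condense` as a hypothetical stub standing in for the LLM call.

```python
def format_history(chat_history: list[tuple[str, str]]) -> str:
    """Render (question, answer) pairs as the text an LLM would be shown."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def fake_condense(chat_history: list[tuple[str, str]], follow_up: str) -> str:
    """Hypothetical stub for the LLM call that rewrites a follow-up question."""
    if not chat_history:
        return follow_up  # nothing to resolve against; use the question as is
    last_question, _ = chat_history[-1]
    return f"{follow_up} (in the context of: {last_question})"


chat_history = [("What is the paper about?", "LLM applications in finance.")]
standalone = fake_condense(chat_history, "Which of those involve cryptocurrency?")
print(format_history(chat_history))
print(standalone)
```

The standalone question, not the raw follow-up, is what gets sent to the retriever, which is why a follow-up such as "tell me more about that" can still find relevant chunks.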

In [None]:
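question = "What is the topic of this paper? Please answer as a bulleted list."
result = qa({"question": question})

In [None]:
print(result['answer'])

The topics of this paper include the following:

1. **Financial text classification**:
   - Classifying financial documents or texts, such as news articles or company filings, by subject or topic.
   - Using the FinBERT model to extract and classify key audit matters (KAM).
   - ESG (environmental, social, and governance) information classification.

2. **Financial event relation extraction**:
   - Multi-type Chinese financial event relation extraction.
   - Using the CFERE framework for event identification and syntactic-semantic dependency parsing.

3. **Applications of text classification in finance**:
   - Industry/company classification.
   - Document/topic classification.

4. **Handling diverse document structures**:
   - The challenges of processing PDF files that contain images, charts, and tables.
   - Converting PDF files into machine-readable plain text.

5. **Other applications**:
   - Translation of Japanese financial terminology.
   - Automated fine-tuning of text summarization models for the cryptocurrency domain.
   - Multi-task learning strategies applied to the classification, detection, and summarization of financial events.
   - Improving the accuracy of financial information extraction and reducing errors.
   - Extracting information from annual reports to enhance stock investment strategies.

6. **Datasets, code, and benchmarks**:
   - Providing datasets, code, and benchmarks.

7. **Challenges and opportunities**:
   - Data issues.
   - Modeling issues.
   - Benchmarking.
   - Ethical issues.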


In [None]:
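question = "Describe in detail the content under other applications that concerns automation in the cryptocurrency domain."
result = qa({"question": question})

In [None]:
print(result['answer'])

In the cryptocurrency domain, applications of automation focus mainly on the following areas:

1. **Automated fine-tuning of text summarization models**: In the cryptocurrency domain, researchers have worked on automating the fine-tuning of text summarization models so that high-quality summaries can be generated without human annotation. For example, some studies have explored how to fine-tune text summarization models through automated techniques, without relying on manual labeling, in order to process and summarize cryptocurrency-related text more effectively [80].

2. **Multi-task learning strategies**: Multi-task learning strategies are used to classify, detect, and summarize financial events. These strategies can handle several related tasks at once, improving the model's overall performance and accuracy [81].

3. **Ensuring accuracy and reducing errors**: In financial information extraction, ensuring the accuracy of the data and reducing errors is critical. Researchers are developing a range of techniques to improve extraction accuracy and so provide more reliable data for investment decisions [82].

4. **Extracting information from annual reports**: Some studies focus on extracting valuable information from company annual reports to enhance stock investment strategies. These techniques can help investors better understand a company's financial condition and outlook and so make better-informed investment decisions [83].

These applications of automation not only improve the efficiency of data processing but also greatly reduce the need for manual intervention, making analysis and decision-making in the cryptocurrency domain more accurate and efficient.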


# Create a chatbot that works on your documents

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA,  ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader

The chatbot code has been updated a bit since filming. The GUI appearance also varies depending on the platform it is running on.

In [None]:
def load_db(file, chain_type, k):
    # load documents
    loader = PyPDFLoader(file)
    documents = loader.load()
    # split documents
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    docs = text_splitter.split_documents(documents)
    # define embedding
    embeddings = OpenAIEmbeddings(api_key=userdata.get('OPENAI_API_KEY'))
    # create vector database from data
    db = DocArrayInMemorySearch.from_documents(docs, embeddings)
    # define retriever
    retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": k})
    # create a chatbot chain. Memory is managed externally.
    qa = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name=llm_name, temperature=0, api_key=userdata.get('OPENAI_API_KEY')),
        chain_type=chain_type,
        retriever=retriever,
        return_source_documents=True,
        return_generated_question=True,
    )
    return qa


In [None]:
import panel as pn
import param

class cbfs(param.Parameterized):
    chat_history = param.List([])
    answer = param.String("")
    db_query  = param.String("")
    db_response = param.List([])

    def __init__(self,  **params):
        super(cbfs, self).__init__( **params)
        self.panels = []
        self.loaded_file = "2406.11903v1.pdf"
        self.qa = load_db(self.loaded_file,"stuff", 4)

    def call_load_db(self, count):
        if count == 0 or file_input.value is None:  # init or no file specified :
            return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")
        else:
            file_input.save("temp.pdf")  # local copy
            self.loaded_file = file_input.filename
            button_load.button_style="outline"
            self.qa = load_db("temp.pdf", "stuff", 4)
            button_load.button_style="solid"
        self.clr_history()
        return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")

    def convchain(self, query):
        if not query:
            return pn.WidgetBox(pn.Row('User:', pn.pane.Markdown("", width=600)), scroll=True)
        result = self.qa({"question": query, "chat_history": self.chat_history})
        self.chat_history.extend([(query, result["answer"])])
        self.db_query = result["generated_question"]
        self.db_response = result["source_documents"]
        self.answer = result['answer']
        self.panels.extend([
            pn.Row('User:', pn.pane.Markdown(query, width=600)),
            pn.Row('ChatBot:', pn.pane.Markdown(self.answer, width=600, styles={'background-color': '#F6F6F6'}))
        ])
        inp.value = ''  #clears loading indicator when cleared
        return pn.WidgetBox(*self.panels,scroll=True)

    @param.depends('db_query')
    def get_lquest(self):
        if not self.db_query:
            return pn.Column(
                pn.Row(pn.pane.Markdown(f"Last question to DB:", styles={'background-color': '#F6F6F6'})),
                pn.Row(pn.pane.Str("no DB accesses so far"))
            )
        return pn.Column(
            pn.Row(pn.pane.Markdown(f"DB query:", styles={'background-color': '#F6F6F6'})),
            pn.pane.Str(self.db_query )
        )

    @param.depends('db_response', )
    def get_sources(self):
        if not self.db_response:
            return
        rlist=[pn.Row(pn.pane.Markdown(f"Result of DB lookup:", styles={'background-color': '#F6F6F6'}))]
        for doc in self.db_response:
            rlist.append(pn.Row(pn.pane.Str(doc)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    @param.depends('convchain', 'clr_history')
    def get_chats(self):
        if not self.chat_history:
            return pn.WidgetBox(pn.Row(pn.pane.Str("No History Yet")), width=600, scroll=True)
        rlist=[pn.Row(pn.pane.Markdown(f"Current Chat History variable", styles={'background-color': '#F6F6F6'}))]
        for exchange in self.chat_history:
            rlist.append(pn.Row(pn.pane.Str(exchange)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    def clr_history(self,count=0):
        self.chat_history = []
        return


### Create a chatbot

In [None]:
cb = cbfs()

file_input = pn.widgets.FileInput(accept='.pdf')
button_load = pn.widgets.Button(name="Load DB", button_type='primary')
button_clearhistory = pn.widgets.Button(name="Clear History", button_type='warning')
button_clearhistory.on_click(cb.clr_history)
inp = pn.widgets.TextInput( placeholder='Enter text here…')

bound_button_load = pn.bind(cb.call_load_db, button_load.param.clicks)
conversation = pn.bind(cb.convchain, inp)

jpg_pane = pn.pane.Image( './img/convchain.jpg')

tab1 = pn.Column(
    pn.Row(inp),
    pn.layout.Divider(),
    pn.panel(conversation,  loading_indicator=True, height=300),
    pn.layout.Divider(),
)
tab2= pn.Column(
    pn.panel(cb.get_lquest),
    pn.layout.Divider(),
    pn.panel(cb.get_sources ),
)
tab3= pn.Column(
    pn.panel(cb.get_chats),
    pn.layout.Divider(),
)
tab4=pn.Column(
    pn.Row( file_input, button_load, bound_button_load),
    pn.Row( button_clearhistory, pn.pane.Markdown("Clears chat history. Can use to start a new topic" )),
    pn.layout.Divider(),
    pn.Row(jpg_pane.clone(width=400))
)
dashboard = pn.Column(
    pn.Row(pn.pane.Markdown('# ChatWithYourData_Bot')),
    pn.Tabs(('Conversation', tab1), ('Database', tab2), ('Chat History', tab3),('Configure', tab4))
)
dashboard



In [None]:
%pip install docarray

Collecting docarray
  Downloading docarray-0.40.0-py3-none-any.whl (270 kB)
Collecting types-requests>=2.28.11.6 (from docarray)
  Downloading types_requests-2.32.0.20240712-py3-none-any.whl (15 kB)
Installing collected packages: types-requests, docarray
Successfully installed docarray-0.40.0 types-requests-2.32.0.20240712


Feel free to copy this code and modify it to add your own features. You can try alternative memory and retriever configurations by changing the `load_db` function and the `convchain` method. [Panel](https://panel.holoviz.org/) and [Param](https://param.holoviz.org/) have many useful features and widgets you can use to extend the GUI.


## Acknowledgments

Panel-based chatbot inspired by Sophia Yang's tutorials on [GitHub](https://github.com/sophiamyang/tutorials-LangChain).