### Retrieval-Augmented Generation (RAG)

<img src="images/RAG.jpg" width="900">

### Environment Setup

1. git clone https://github.com/sevenbai/20240720_GDG_Tainan_BwAI
2. Rename .env.sample to .env
3. Request an OpenAI API key at https://platform.openai.com/api-keys
4. Put the API key in the .env file

In [None]:
from dotenv import load_dotenv
load_dotenv()
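
A quick sanity check after `load_dotenv()` can save a confusing authentication error later. The sketch below assumes the key is stored under the name `OPENAI_API_KEY` (the variable the OpenAI client libraries read by default; the exact contents of `.env.sample` are not shown here). The helper `missing_keys` is a hypothetical name used only for illustration.

```python
import os


def missing_keys(required=("OPENAI_API_KEY",), env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Assumption: the .env file defines OPENAI_API_KEY.
missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If the message appears, recheck steps 2 to 4 above before running the later cells.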


### Load the Document

In [None]:
from langchain_community.document_loaders import WebBaseLoader

# Chinese Wikipedia article on Jensen Huang (黃仁勳)
url = 'https://zh.wikipedia.org/zh-tw/%E9%BB%83%E4%BB%81%E5%8B%B3'
loader = WebBaseLoader(url)
document = loader.load()

### Split into Chunks

In [None]:
import re
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Collapse every run of whitespace (newlines, tabs, spaces) into a single space
for doc in document:
    doc.page_content = re.sub(r'\s+', ' ', doc.page_content)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=30)
document_chunks = text_splitter.split_documents(document)
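
To make the `chunk_size=300, chunk_overlap=30` settings concrete, here is a minimal sketch of a fixed-width sliding window over characters. It is a simplification, not the library's algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph breaks and spaces, so its chunks will generally differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=300, chunk_overlap=30):
    """Cut `text` into windows of `chunk_size` characters, where each
    window repeats the last `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.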

In [None]:
print(len(document_chunks))
document_chunks

### Generate Embeddings and Build the RetrievalQA Chain

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(document_chunks, embeddings)
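
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings lie closest to the query embedding. The sketch below illustrates that idea with cosine similarity and a brute-force scan. The two-dimensional vectors are made up for illustration, and Chroma's actual distance metric and index structure may differ. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors `a` and `b` (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the texts of the `k` (text, vector) pairs most similar to `query_vec`."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy index: made-up vectors standing in for real embeddings.
toy_index = [
    ("chunk about birthplace", [1.0, 0.0]),
    ("chunk about the company", [0.0, 1.0]),
    ("chunk mentioning both", [0.7, 0.7]),
]
print(top_k([1.0, 0.1], toy_index, k=2))
```

Real embeddings have hundreds or thousands of dimensions, and production vector stores typically use approximate nearest-neighbor indexes rather than a full scan.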

In [None]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=512)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
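
The `"stuff"` chain type places all retrieved chunks into a single prompt alongside the question. A minimal sketch of that assembly step follows. The prompt wording is an assumption made for illustration, not LangChain's actual template, and `build_stuff_prompt` is a hypothetical name.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    # Illustrative wording only; the library ships its own template.
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_stuff_prompt(
    "Where was he born?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Because every chunk goes into one request, this approach only works while the combined context fits in the model's context window, which is one reason the chunks above are kept small.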

### Test

In [None]:
from langchain_core.messages import HumanMessage

print(f'Chat with {url}')
orig_ans_msgs = []
while True:
    query_statement = input('Enter your question: ')
    if query_statement == '':
        break
    print('Q: ' + query_statement)
    # Answer from the plain LLM (no retrieval); chat history is kept across turns
    orig_ans_msgs.append(HumanMessage(query_statement))
    orig_ans_msg = llm.invoke(orig_ans_msgs)
    print('Original answer from LLM: ' + orig_ans_msg.content, flush=True)
    orig_ans_msgs.append(orig_ans_msg)
    # Answer from the RAG chain
    ans = qa.invoke({"query": query_statement})
    # Show the chunks retrieved for this question
    print(*docsearch.as_retriever().invoke(query_statement), sep='\n', flush=True)
    print('A: ' + ans['result'], flush=True)
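# Sample questions:
# Where was Jensen Huang born?
# Who is his wife?
# What company did he found?
# Where does NVIDIA rank globally by market capitalization?
# Summarize Jensen Huang's career.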

### Developments in RAG

##### Adaptive RAG

<img src="images/adaptive_RAG.png" width="1000">

##### Multimodal RAG

<img src="images/multimodal_RAG.png" width="1000">

##### Knowledge Graph RAG

<img src="images/knowledge_graph.jpg" width="600">

---

#### 白勝文 Seven [sevenbai@gmail.com](mailto:sevenbai@gmail.com)
<img src="images/fb_qrcode.png" width="200">　　　　　　　　　
<img src="images/linkedin_qrcode.png" width="200">