<a href="https://colab.research.google.com/github/ryanhao1115/Active_Learning/blob/main/Private_Chinese_Documents_Using_RAG_with_LangChain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![zedun_logo](/content/drive/MyDrive/AInew/zedun_logo.jpg)

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
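import os, sys
nb_path = '/content/notebooks'
# Create the link to the Drive package folder only if it does not exist yet
if not os.path.islink(nb_path):
    os.symlink("/content/drive/My Drive/pythonpackages", nb_path)
    print("Symlink created successfully!")
# Add the folder to the import path on every run, not only when the link is first created
if nb_path not in sys.path:
    sys.path.insert(0, nb_path)



Symlink created successfully!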


# **Summarize Private Documents Using RAG, LangChain, and LLMs**


## __Table of Contents__

<ol>
    <li><a href="#Background">Background</a>
        <ol>
            <li><a href="#What-is-RAG?">What is RAG?</a></li>
            <li><a href="#RAG-architecture">RAG architecture</a></li>
        </ol>
    </li>
    <li>
        <a href="#Objectives">Objectives</a>
    </li>
    <li>
        <a href="#Setup">Setup</a>
        <ol>
            <li><a href="#Installing-required-libraries">Installing required libraries</a></li>
            <li><a href="#Importing-required-libraries">Importing required libraries</a></li>
        </ol>
    </li>
    <li>
        <a href="#Preprocessing">Preprocessing</a>
        <ol>
            <li><a href="#Load-the-document">Load the document</a></li>
            <li><a href="#Splitting-the-document-into-chunks">Splitting the document into chunks</a></li>
            <li><a href="#Embedding-and-storing">Embedding and storing</a></li>
        </ol>
    </li>
    <li>
        <a href="#LLM-model-construction">LLM model construction</a>
    </li>
    <li>
        <a href="#Integrating-LangChain">Integrating LangChain</a>
    </li>
    <li>
        <a href="#Dive-deeper">Dive deeper</a>
        <ol>
            <li><a href="#Using-prompt-template">Using prompt template</a></li>
            <li><a href="#Make-the-conversation-have-memory">Make the conversation have memory</a></li>
            <li><a href="#Wrap-up-and-make-it-an-agent">Wrap up and make it an agent</a></li>
        </ol>
    </li>
</ol>

<a href="#Exercises">Exercises</a>
<ol>
    <li><a href="#Exercise-1:-Work-on-your-own-document">Exercise 1: Work on your own document</a></li>
    <li><a href="#Exercise-2:-Return-the-source-from-the-document">Exercise 2: Return the source from the document</a></li>
    <li><a href="#Exercise-3:-Use-another-LLM-model">Exercise 3: Use another LLM model</a></li>
</ol>



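## **Background**

### **What is RAG?**
LLMs can reason about a wide range of topics, but their knowledge is limited to the public data available up to their training cutoff. When an enterprise applies an LLM to a real-world scenario, the model must answer accurately based on the enterprise's private data, so its knowledge has to be augmented with the specific information it needs. The technique of retrieving the appropriate information and inserting it into the model prompt is called retrieval-augmented generation (RAG).

LangChain is one of the most representative open-source frameworks for RAG. It provides several components designed to help build question-answering applications and, more broadly, RAG applications.

### RAG architecture
A typical RAG application has two main components:

* **Indexing**: a pipeline that ingests data and builds an index over it. This usually happens offline.

* **Retrieval and generation**: the actual RAG chain, which takes the user query at run time, retrieves the relevant data from the index, and then passes that data to the model.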




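The most common full sequence from raw data to a generated answer looks like this.
- **Indexing**
1. Load: First, you must load your data. This is done with [DocumentLoaders](https://python.langchain.com/docs/modules/data_connection/document_loaders/).

2. Split: [Text splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) break large documents into smaller chunks. This is useful both for indexing the data and for passing it to a model, because large chunks are harder to search and do not fit in a model's finite context window.

3. Store: You need somewhere to store and index your chunks so that they can be searched later. This is usually done with a [vector database](https://python.langchain.com/docs/modules/data_connection/vectorstores/) and an [embedding](https://python.langchain.com/docs/modules/data_connection/text_embedding/) model.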

<img src="RAG1.png" width="50%" alt="indexing"/> <br>



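- **Retrieval and generation**
1. Retrieve: Given a user input, a retriever fetches the relevant chunks from storage.
2. Generate: A chat model or LLM (ChatModel / LLM) produces an answer from a prompt that contains the question and the retrieved data.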

<img src="RAG2.png" width="50%" alt="retrieval"/> <br>
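
The two stages above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the word-count "embedding", the helper names (`embed`, `build_index`, `retrieve`, `build_prompt`), and the sample chunks are all hypothetical stand-ins, not LangChain APIs, and no model is actually called.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Indexing: store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Retrieval: return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(context_chunks, question):
    """Generation input: the prompt that a chat model or LLM would receive."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical chunks and question, for illustration only
index = build_index(["apples are red", "the sky is blue", "grass is green"])
question = "what color is the sky"
top = retrieve(index, question, k=1)
prompt = build_prompt(top, question)
print(prompt)
```

In the notebook itself, a Hugging Face embedding model replaces `embed`, FAISS replaces the list-based index, and the prompt is passed to an LLM.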




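## Objectives


 - Verify a RAG implementation built with LangChain.
 - Experiment with different combinations of text splitters, embedding models, vector databases, and LLMs to find the best path for deploying an application over private Chinese data locally.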


----


## Setup


Install the required libraries:


*   [`LangChain`](https://www.langchain.com/) for its chains and prompt functions
*   [`Hugging Face`](https://huggingface.co/models?other=embeddings) and [`Hugging Face Hub`](https://huggingface.co/models?other=embeddings) for their embedding models, which process the text data
*   [`SentenceTransformers`](https://www.sbert.net/) for transforming sentences into high-dimensional vectors
*   [`Chroma DB`](https://www.trychroma.com/) for efficient storage and retrieval of high-dimensional text vector data
*   [`wget`](https://pypi.org/project/wget/) for downloading files from remote systems
* [`chromadb`](https://www.trychroma.com/) is an open-source vector database used to store embeddings.
* [`faiss-cpu`](https://pypi.org/project/faiss-cpu/) supports use of the FAISS vector database.


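### Installing required libraries


**Note:** Unless a version is specified, the latest version of each library is used.



 `%%capture` suppresses the output of the installation.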


In [None]:
%%capture
!pip install langchain
!pip install langchain_community
!pip install faiss-gpu-cu12
!pip install --upgrade transformers
!pip install accelerate
!pip install bitsandbytes
!pip install sentencepiece

### Importing required libraries



In [5]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.schema import Document
import pandas as pd
import os
from glob import glob
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig

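## Preprocessing
### Load the document

The pipeline here starts by loading text documents. If there are other, non-text materials, corresponding steps such as OCR should be added before this one.


<img src="RAG3.png" width="50%" alt="Load"/>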


In [6]:
import pandas as pd
import os

def load_excel_to_paragraphs(file_path):
    """
    Load every worksheet of an Excel file and convert each row into a text paragraph.

    Format: file name/sheet name/column 1 name：value/column 2 name：value/...
    """
    # File name of the Excel file, including the extension
    file_name = os.path.basename(file_path)

    # ExcelFile loads the whole workbook (supports multiple worksheets)
    xls = pd.ExcelFile(file_path)
    paragraphs = []

    # Iterate over the worksheets
    for sheet_name in xls.sheet_names:
        # Read the current worksheet into a DataFrame
        df = pd.read_excel(xls, sheet_name=sheet_name)
        # Skip empty worksheets
        if df.empty:
            continue

        # Iterate over the rows of the DataFrame
        for index, row in df.iterrows():
            # Collect the "column name：value" parts of the current row
            row_parts = []
            for col in df.columns:
                value = row[col]
                # Format only the cells that are not empty
                if pd.notna(value):
                    row_parts.append(f"{col}：{value}")
            # Build the full paragraph only if the row has content
            if row_parts:
                paragraph = f"{file_name}/{sheet_name}/" + "/".join(row_parts)
                paragraphs.append(paragraph)
    return paragraphs

# Example usage: replace file_path with the path to your Excel file
file_path = "/content/drive/My Drive/AInew/data/一周监管动态汇总（2017年3月第7期0306-0312）.xls"
paragraphs = load_excel_to_paragraphs(file_path)

# Print all generated paragraphs
for para in paragraphs:
    print(para)


一周监管动态汇总（2017年3月第7期0306-0312）.xls/监管政策汇总/Unnamed: 1：发布时间/Unnamed: 2：文件名称/Unnamed: 3：主要内容/Unnamed: 4：简评/Unnamed: 5：链接/Unnamed: 6：过渡期
一周监管动态汇总（2017年3月第7期0306-0312）.xls/监管政策汇总/Unnamed: 1：本期无
一周监管动态汇总（2017年3月第7期0306-0312）.xls/监管层讲话及问答汇总/Unnamed: 1：发布时间/Unnamed: 2：标题/Unnamed: 3：主要内容/Unnamed: 4：链接
一周监管动态汇总（2017年3月第7期0306-0312）.xls/监管层讲话及问答汇总/监管层讲话及问答：上交所/Unnamed: 1：2017-03-10 00:00:00/Unnamed: 2：《关于三只PPP项目资产证券化产品获得上交所确认的答记者问》/Unnamed: 3：问：据了解，今日三只PPP项目资产证券化产品获得上交所挂牌转让无异议函，三只产品作为国家发展改革委和证监会联合发文后的首批PPP项目资产证券化产品，能否介绍一下具体情况？
答：中信证券-首创股份污水处理PPP项目收费收益权、华夏幸福固安工业园区新型城镇化PPP项目供热收费收益权、中信建投-网新建投庆春路隧道PPP项目三只资产支持证券符合挂牌转让条件，本所出具无异议函，标志着国家发展改革委和中国证监会推进的传统基础设施领域PPP项目资产证券化产品正式落地。中信证券-首创股份污水处理PPP项目收费收益权资产支持专项计划，由首创股份旗下4家全资水务子公司作为原始权益人，以该等水务子公司持有的污水处理PPP项目项下的污水处理收费收益权为基础资产，产品总规模5.3亿元，其中优先级规模5亿元，获AAA评级。华夏幸福固安工业园区新型城镇化PPP项目供热收费收益权资产支持专项计划，以华夏幸福全资子公司九通公用事业在固安工业园区内未来6年向供热用户收取供热收费的收益权为基础资产，产品总规模7.06亿元，其中优先级6.7亿元，获AAA评级。中信建投-网新建投庆春路隧道PPP项目资产支持专项计划，以浙江浙大网新集团有限公司下属子公司拥有的杭州市庆春路隧道专营权合同债权作为基础资产，产品总规模11.58亿元，其中优先级11亿元，获AAA

In [7]:
len(paragraphs)

18
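
The paragraph format produced by the loader can be checked on a single row without opening an Excel file. The sketch below is a simplified, hypothetical stand-in for the inner loop of `load_excel_to_paragraphs`: the function name `row_to_paragraph`, the sample file, sheet, and column names are illustrative, and `None` stands in for an empty cell.

```python
def row_to_paragraph(file_name, sheet_name, row):
    """Join the non-empty cells as 'column：value', prefixed by file and sheet names."""
    parts = [f"{col}：{value}" for col, value in row.items() if value is not None]
    if not parts:
        return None  # a row with no usable cells produces no paragraph
    return f"{file_name}/{sheet_name}/" + "/".join(parts)


# Hypothetical row; None stands in for an empty Excel cell
example = row_to_paragraph(
    "report.xls",
    "Cases",
    {"Date": "2017-03-06", "Type": None, "Summary": "Late disclosure"},
)
print(example)
```

Prefixing every paragraph with the file and sheet names keeps the origin of each row visible after the text is split and embedded.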

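### Splitting the document into chunks


In this step, the text is split into smaller chunks. This is the `split` step of indexing.
<img src="RAG4.png" width="50%" alt="split"/>


LangChain is used to split the document and create chunks: it breaks a long document into smaller parts, called chunks, that are easier to process.

When splitting, the goal is to keep each passage as intact as possible: the splitter accumulates text up to a set number of characters and cuts at a separator. That number is called the chunk size. In this project the chunk size is set to 1000. LangChain's built-in `RecursiveCharacterTextSplitter` is used, with common Chinese punctuation marks specified as separators.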
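
The idea of splitting on prioritized separators can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: it ignores chunk overlap, the function name `split_recursive` is hypothetical, and the sample sentence and the chunk size of 8 are chosen only to keep the example small.

```python
def split_recursive(text, separators, chunk_size):
    """Split on the first separator found in the text, then recurse with the
    remaining separators into any piece that is still longer than chunk_size."""
    if len(text) <= chunk_size:
        return [text]
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        pieces = text.split(sep)
        # Re-attach the separator so that no characters are lost
        pieces = [p + sep for p in pieces[:-1]] + [pieces[-1]]
        result, current = [], ""
        for piece in pieces:
            if not piece:
                continue
            if len(current) + len(piece) <= chunk_size:
                current += piece  # the piece still fits in the current chunk
                continue
            if current:
                result.append(current)
            if len(piece) > chunk_size:
                # The piece alone is too long: try the lower-priority separators
                result.extend(split_recursive(piece, separators[i + 1:], chunk_size))
                current = ""
            else:
                current = piece
        if current:
            result.append(current)
        return result
    # No separator occurs in the text: fall back to fixed-width slices
    return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]


sample = "第一句。第二句很长很长。第三句！结束"
demo_chunks = split_recursive(sample, ["。", "！"], chunk_size=8)
print(demo_chunks)
```

Because the full stop has higher priority than the exclamation mark, the text is cut at sentence ends first, and the last piece stays whole because it already fits within the chunk size.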


In [8]:

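# Create the text splitter
# Note: the order of `separators` matters; higher-priority separators come first
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,         # at most 1000 characters per chunk
    chunk_overlap=20,       # 20 characters of overlap between consecutive chunks
    separators=["\n\n", "。", "！", "？", "\n"]  # separators in priority order
)

chunks = []
# Split each paragraph separately and append the results to the chunks list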
for paragraph in paragraphs:
    result = text_splitter.split_text(paragraph)
    chunks.extend(result)

print("文本切分结果：")
for i, chunk in enumerate(chunks, start=1):
    print(f"【块 {i}】 {chunk}")


文本切分结果：
【块 1】 一周监管动态汇总（2017年3月第7期0306-0312）.xls/监管政策汇总/Unnamed: 1：发布时间/Unnamed: 2：文件名称/Unnamed: 3：主要内容/Unnamed: 4：简评/Unnamed: 5：链接/Unnamed: 6：过渡期
【块 2】 一周监管动态汇总（2017年3月第7期0306-0312）.xls/监管政策汇总/Unnamed: 1：本期无
【块 3】 一周监管动态汇总（2017年3月第7期0306-0312）.xls/监管层讲话及问答汇总/Unnamed: 1：发布时间/Unnamed: 2：标题/Unnamed: 3：主要内容/Unnamed: 4：链接
【块 4】 一周监管动态汇总（2017年3月第7期0306-0312）.xls/监管层讲话及问答汇总/监管层讲话及问答：上交所/Unnamed: 1：2017-03-10 00:00:00/Unnamed: 2：《关于三只PPP项目资产证券化产品获得上交所确认的答记者问》/Unnamed: 3：问：据了解，今日三只PPP项目资产证券化产品获得上交所挂牌转让无异议函，三只产品作为国家发展改革委和证监会联合发文后的首批PPP项目资产证券化产品，能否介绍一下具体情况？
答：中信证券-首创股份污水处理PPP项目收费收益权、华夏幸福固安工业园区新型城镇化PPP项目供热收费收益权、中信建投-网新建投庆春路隧道PPP项目三只资产支持证券符合挂牌转让条件，本所出具无异议函，标志着国家发展改革委和中国证监会推进的传统基础设施领域PPP项目资产证券化产品正式落地。中信证券-首创股份污水处理PPP项目收费收益权资产支持专项计划，由首创股份旗下4家全资水务子公司作为原始权益人，以该等水务子公司持有的污水处理PPP项目项下的污水处理收费收益权为基础资产，产品总规模5.3亿元，其中优先级规模5亿元，获AAA评级。华夏幸福固安工业园区新型城镇化PPP项目供热收费收益权资产支持专项计划，以华夏幸福全资子公司九通公用事业在固安工业园区内未来6年向供热用户收取供热收费的收益权为基础资产，产品总规模7.06亿元，其中优先级6.7亿元，获AAA评级。中信建投-网新建投庆春路隧道PPP项目资产支持专项计划，以浙江浙大网新集团有限公司下属子公司拥有的杭州市庆春路隧道专营权合同债权作为

### Embedding and storing
This step is the `embed` and `store` processes in `Indexing`. <br>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/u_oJz3v2cSR_lr0YvU6PaA.png" width="50%" alt="split"/>


The `HuggingFaceEmbeddings` interface provided by LangChain is used to load a pretrained model suited to Chinese text (for example, "GanymedeNil/text2vec-large-chinese" or "BAAI/bge-base-zh").


In [10]:
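# Chinese pretrained embedding model (adjust the model name as needed)
model_name = "GanymedeNil/text2vec-large-chinese"

# Cache directory on Google Drive
cache_dir = "/content/drive/MyDrive/AInew/HuggingFace_cache"

# Create the embedding object. cache_folder is passed to the constructor so that the
# model is looked up in, and downloaded to, the Drive directory. Setting the attribute
# after construction has no effect, because the model has already been loaded by then.
embeddings = HuggingFaceEmbeddings(model_name=model_name, cache_folder=cache_dir)

# Optional check: embed the chunks from the previous step and inspect the vector size
# embedding_vectors = embeddings.embed_documents(chunks)
#
# print("Vector for each chunk:")
# for i, vec in enumerate(embedding_vectors, start=1):
#     print(f"[Chunk {i}] vector dimension: {len(vec)}")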


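Here LangChain's built-in FAISS integration serves as the vector database: it stores the chunks together with their vectors, and the following cells show how to run a similarity search for a query.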
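
Conceptually, a similarity search ranks the stored vectors by their distance to the query vector and returns the closest `k`. The sketch below assumes Euclidean (L2) distance and uses hypothetical 3-dimensional vectors; the helper names `l2_distance` and `top_k` are illustrative and are not part of FAISS or LangChain, which use optimized index structures instead of a Python loop.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, stored, k=3):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical 3-dimensional vectors; real embedding vectors are much longer
stored = [
    ("chunk A", [1.0, 0.0, 0.0]),
    ("chunk B", [0.0, 1.0, 0.0]),
    ("chunk C", [0.9, 0.1, 0.0]),
]
nearest = top_k([1.0, 0.0, 0.0], stored, k=2)
for text, distance in nearest:
    print(text, round(distance, 3))
```

A smaller distance means a closer match, so the chunk whose vector equals the query vector comes first.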


In [12]:


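# Rebuild docs so that every item is a Document object.
# file_name is local to load_excel_to_paragraphs, so it is derived from file_path here.
file_name = os.path.basename(file_path)
docs = [
    Document(page_content=chunk, metadata={"source": file_name})
    for chunk in chunks
]


# Build the vector database with FAISS; it calls `embeddings` internally to vectorize the chunks
vector_store = FAISS.from_documents(docs, embeddings)

# Directory where the vector database is saved (on Google Drive here)
persist_directory = "/content/drive/MyDrive/FAISS_vector_store"

# Save the vector database to that directory
vector_store.save_local(persist_directory)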



In [20]:
# Load the vector database with the same embeddings object
#loaded_vector_store = FAISS.load_local(persist_directory, embeddings,allow_dangerous_deserialization=True) # allow deserialization

# Example query: which cases are of the information-disclosure type?
query = "案件类型属于信息披露有问题有哪些？"
results = vector_store.similarity_search(query, k=3)

print("相似性搜索结果：")
for i, doc in enumerate(results, start=1):
    print(f"【结果 {i}】 {doc.page_content}")
    print(doc.metadata)

相似性搜索结果：
【结果 1】 一周监管动态汇总（2017年3月第7期0306-0312）.xls/其他/时间：2017-03-07 00:00:00/标题：违规私募有六个“标签” 投资者需额外关注谨慎投资 
/主要内容：广东证监局相关人士指出，通过调查发现，私募机构常见的违法违规行为包括：
1.向不特定对象公开宣传和推介私募基金产品。通过公司网站、微信等方式向不特定对象宣传推介私募基金产品；公司网站未设定特定对象的确定程序，非特定对象均可点击阅读产品；部分私募机构与第三方机构平台合作，推介产品未备案或不存在，信息虚假。
2.私募机构未按规定办理基金产品备案手续。如设立以投资为目的有限合伙企业，由私募机构担任普通合伙人，向投资者非公开募集资金(有限合伙企业份额)。实际上属于有限合伙型私募基金产品，但私募机构经常不将这种产品办理备案。
3.私募机构募集时未对投资者进行有效风险测评，并向不合格投资者募集资金。也存在部分私募机构通过一些互联网平台开展私募投资基金或私募产品收益权的拆分转让业务。
4.私募机构投资运作行为违规，挪用或侵占基金财产，未将募集资金进行专户存管，违反合同约定列支费用、进行利益输送等。
5.部分私募机构管理的私募基金存在保本保收益情形。如约定的分期支付利息和收益，以及期末大股东约定回购的条款，涉嫌变相或暗示承诺保底保收益。
6.私募机构二级市场证券投资行为违法违规。/链接：http://www.sac.net.cn/hyfw/hydt/201703/t20170307_130640.html
{'source': '一周监管动态汇总（2017年3月第7期0306-0312）.xls'}
【结果 2】 一周监管动态汇总（2017年3月第7期0306-0312）.xls/案例汇总/业务种类：投行业务/发布时间：2017-03-06 00:00:00/案例类型：持续督导及提供财务顾问服务未尽职/案件概况：湘财证券为上海盟云移软网络科技股份有限公司重大资产重组提供独立财务顾问服务，同时担任其主办券商，未能督促盟云公司按照《非上市公众公司重大资产重组管理办法》的规定履行相关程序，在明知本次重大资产重组存在重大程序瑕疵的情况下，仍为重大资产重组出具独立财务顾问报告，认为“重组资产权属清晰，资产过户或转移不存在法律障碍”。北京市大成（深圳）律师事务所提供法律服务过

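## LLM model construction


First, choose the model that you want to use. The cell below loads `THUDM/chatglm-6b` locally through Hugging Face `transformers`, wraps it for LangChain, and builds a conversational retrieval chain over the saved FAISS index. The watsonx.ai cells after it use the `FLAN_UL2` model as an example; refer to [Foundation Models](https://ibm.github.io/watsonx-ai-python-sdk/foundation_models.html) for other watsonx.ai model options.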


In [None]:
# 1. Load the ChatGLM-6B model locally
model_name = "THUDM/chatglm-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).half().cuda()

# 2. Build the Hugging Face pipeline
hf_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,  # cap on generated tokens; max_length would also count the retrieved context in the prompt
    device=0
)

# 3. Use LangChain's HuggingFacePipeline as the LLM
llm_local = HuggingFacePipeline(pipeline=hf_pipeline)

# 4. Create the PromptTemplate
prompt_template = """Use the information from the document to answer the question at the end.
If you don't know the answer, just say that you don't know, definitely do not try to make up an answer.

{context}

Question: {question}
"""
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

# 5. Load the vector database saved earlier on Google Drive.
#    load_local needs the embeddings object that built the index (not the LLM),
#    and the path must match the directory that was passed to save_local.
vector_store_path = "/content/drive/MyDrive/FAISS_vector_store"
vector_store = FAISS.load_local(
    vector_store_path, embeddings, allow_dangerous_deserialization=True
)

# 6. Create the memory (conversation history)
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# 7. Combine the LLM, the retriever, and the memory in a ConversationalRetrievalChain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm_local,
    retriever=vector_store.as_retriever(),  # use the vector database as the retriever
    memory=memory,
    combine_docs_chain_kwargs={"prompt": PROMPT}  # attach the prompt
)

# 8. Run a Q&A query. The chain retrieves the context itself, so only the
#    question is passed; the memory expects a single input key.
query = "这家企业的核心竞争力是什么？"
response = qa_chain.invoke({"question": query})

print("Local ChatGLM-6B answer:")
print(response["answer"])


Define parameters for the model.

The decoding method is set to `greedy` to get a deterministic output.

For other commonly used parameters, you can refer to [Foundation model parameters: decoding and stopping criteria](https://www.ibm.com/docs/en/watsonx-as-a-service?utm_source=skills_network&utm_content=in_lab_content_link&utm_id=Lab-RAG_v1_1711546843&topic=lab-model-parameters-prompting).
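
Why greedy decoding is deterministic, and where temperature enters when sampling is used instead, can be shown with a toy next-token step. The sketch below is illustrative only: the three logits are hypothetical scores, and the helper names (`softmax`, `greedy_pick`, `sample_pick`) are not part of the watsonx.ai SDK.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature, rng):
    """Sampling: draw a token index from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
greedy_choices = {greedy_pick(logits) for _ in range(100)}
sampled_choices = {sample_pick(logits, 0.5, random.Random(seed)) for seed in range(100)}
print("greedy:", greedy_choices)
print("sampled:", sampled_choices)
```

In this sketch, the greedy pick never consults the temperature, so repeated calls always select the same token, while the sampled pick depends on both the temperature and the random seed.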


In [None]:
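# Imports assumed from the ibm-watson-machine-learning SDK (not installed in the setup cell above)
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 130, # this controls the minimum number of tokens in the generated output
    GenParams.MAX_NEW_TOKENS: 256,  # this controls the maximum number of tokens in the generated output
    GenParams.TEMPERATURE: 0.5 # this controls the randomness or creativity of the model's responses
}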

Define `credentials` and `project_id`,  which are necessary parameters to successfully run LLMs from watsonx.ai.

Keep `credentials` and `project_id` as they are so that you do not need to create your own keys; this lets you run the model inside this lab environment. If you want to run the model locally, refer to this [tutorial](https://medium.com/the-power-of-ai/ibm-watsonx-ai-the-interface-and-api-e8e1c7227358) for creating your own keys.


In [None]:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com"
}

project_id = "skills-network"

Wrap the parameters to the model.


In [None]:
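# Import assumed from the ibm-watson-machine-learning SDK
from ibm_watson_machine_learning.foundation_models import Model

# Assumption: model_id is not set in an earlier cell; this ID corresponds to FLAN_UL2
model_id = "google/flan-ul2"

model = Model(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id
)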

Build a model called `flan_ul2_llm` from watsonx.ai.


In [None]:
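from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM  # assumed SDK import
flan_ul2_llm = WatsonxLLM(model=model)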

This completes the `LLM` part of the `Retrieval` task. <br>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/UZXQ44Tgv4EQ2-mTcu5e-A.png" width="50%" alt="split"/>


## Integrating LangChain


LangChain has a number of components that are designed to help retrieve information from the document and build question-answering applications, which helps you complete the `retrieve` part of the `Retrieval` task. <br>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/M4WpkkMMbfK0Wkz0W60Jiw.png" width="50%" alt="split"/>


In the following steps, you create a simple Q&A application over the document source using LangChain's `RetrievalQA`.

Then, you ask the query "what is mobile policy?"


In [None]:
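from langchain.chains import RetrievalQA

docsearch = vector_store  # alias: the cells below refer to the FAISS store built above as `docsearch`

qa = RetrievalQA.from_chain_type(llm=flan_ul2_llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever(),
                                 return_source_documents=False)
query = "what is mobile policy?"
qa.invoke(query)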

From the response, it seems fine. The model's response is the relevant information about the mobile policy from the document.


Now, try to ask a more high-level question.


In [None]:
qa = RetrievalQA.from_chain_type(llm=flan_ul2_llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever(),
                                 return_source_documents=False)
query = "Can you summarize the document for me?"
qa.invoke(query)

At this time, the model seems to not have the ability to summarize the document. This is because of the limitation of the `FLAN_UL2` model.


So, try to use another model, `LLAMA_3_70B_INSTRUCT`. You should do the model construction again.


In [None]:
model_id = 'meta-llama/llama-3-70b-instruct'

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MAX_NEW_TOKENS: 256,  # this controls the maximum number of tokens in the generated output
    GenParams.TEMPERATURE: 0.5 # this controls the randomness or creativity of the model's responses
}

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com"
}

project_id = "skills-network"

model = Model(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id
)

llama_3_llm = WatsonxLLM(model=model)

Try the same query again on this model.


In [None]:
qa = RetrievalQA.from_chain_type(llm=llama_3_llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever(),
                                 return_source_documents=False)
query = "Can you summarize the document for me?"
qa.invoke(query)

Now, you've created a simple Q&A application for your own document. Congratulations!


## Dive deeper


This section dives deeper into how you can improve this application. You might want to ask "How to add the prompt in retrieval using LangChain?" <br>

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/bvw3pPRCYRUsv-Z2m33hmQ.png" width="50%" alt="split"/>


You use prompts to guide the responses from an LLM the way that you want. For instance, if the LLM is uncertain about an answer, you instruct it to simply state, "I do not know," instead of attempting to generate a speculative response.

Let's see an example.


In [None]:
qa = RetrievalQA.from_chain_type(llm=flan_ul2_llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever(),
                                 return_source_documents=False)
query = "Can I eat in company vehicles?"
qa.invoke(query)

As you can see, the query is asking something that does not exist in the document. The LLM responds with information that actually is not true. You don't want this to happen, so you must add a prompt to the LLM.


### Using prompt template


In the following code, you create a prompt template using `PromptTemplate`.

`context` and `question` are keywords in the RetrievalQA, so LangChain can automatically recognize them as document content and query.
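
What the "stuff" chain type does with these two placeholders can be sketched in plain Python: the retrieved chunks are joined into one context string and substituted into the template together with the question. The helper name `stuff_prompt`, the shortened template, and the two sample chunks are hypothetical; LangChain performs the equivalent substitution internally.

```python
template = """Use the information from the document to answer the question at the end.

{context}

Question: {question}
"""


def stuff_prompt(template, documents, question):
    """Join the retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(documents)
    return template.format(context=context, question=question)


# Hypothetical retrieved chunks, for illustration only
retrieved = [
    "Smoking is not permitted in company vehicles.",
    "Mobile phones must be locked when unattended.",
]
filled = stuff_prompt(template, retrieved, "Can I smoke in company vehicles?")
print(filled)
```

Because every retrieved chunk is placed in a single prompt, the combined length of the chunks has to fit within the model's context window.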


In [None]:
prompt_template = """Use the information from the document to answer the question at the end. If you don't know the answer, just say that you don't know, definitely do not try to make up an answer.

{context}

Question: {question}
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

chain_type_kwargs = {"prompt": PROMPT}

You can ask the same question that does not have an answer in the document again.


In [None]:
qa = RetrievalQA.from_chain_type(llm=llama_3_llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever(),
                                 chain_type_kwargs=chain_type_kwargs,
                                 return_source_documents=False)

query = "Can I eat in company vehicles?"
qa.invoke(query)

From the answer, you can see that the model responds with "don't know".


### Make the conversation have memory


Do you want your conversations with an LLM to be more like a dialogue with a friend who remembers what you talked about last time? An LLM that retains the memory of your previous exchanges builds a more coherent and contextually rich conversation.


Take a look at a situation in which an LLM does not have memory.

You start a new query, "What I cannot do in it?". You do not specify what "it" is. In this case, "it" means "company vehicles" if you refer to the last query.


In [None]:
query = "What I cannot do in it?"
qa.invoke(query)

From the response, you see that the model does not have the memory because it does not provide the correct answer, which is something related to "smoking is not permitted in company vehicles."


To give the LLM memory, you introduce the `ConversationBufferMemory` class from LangChain.
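
The idea behind a conversation buffer can be sketched without LangChain: every turn is stored, and the stored turns are replayed in front of each new question so that the model can resolve references such as "it". The class `BufferMemory` and the function `build_model_input` below are hypothetical stand-ins, not the LangChain implementation, and the recorded answer is made up for the example.

```python
class BufferMemory:
    """Toy conversation buffer: keeps every turn and replays it as text."""

    def __init__(self):
        self.turns = []

    def save(self, question, answer):
        """Record one completed question/answer exchange."""
        self.turns.append((question, answer))

    def as_text(self):
        """Render the stored turns as a plain-text transcript."""
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


def build_model_input(memory, question):
    """Prepend the stored history, if any, to the new question."""
    history = memory.as_text()
    if not history:
        return question
    return f"Chat history:\n{history}\n\nFollow-up question: {question}"


memory_demo = BufferMemory()
first = build_model_input(memory_demo, "Can I eat in company vehicles?")
memory_demo.save("Can I eat in company vehicles?", "I don't know.")
second = build_model_input(memory_demo, "What I cannot do in it?")
print(second)
```

With the first exchange replayed in the second input, the phrase "company vehicles" is available to the model when it interprets "it".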


In [None]:
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

Create a `ConversationalRetrievalChain` to retrieve information and talk with the LLM.


In [None]:
qa = ConversationalRetrievalChain.from_llm(llm=llama_3_llm,
                                           chain_type="stuff",
                                           retriever=docsearch.as_retriever(),
                                           memory = memory,
                                           get_chat_history=lambda h : h,
                                           return_source_documents=False)

Create a `history` list to store the chat history.


In [None]:
history = []

In [None]:
query = "What is mobile policy?"
result = qa.invoke({"question": query})  # the attached memory supplies chat_history
print(result["answer"])

Append the previous query and answer to the history.


In [None]:
history.append((query, result["answer"]))

In [None]:
query = "List points in it?"
result = qa.invoke({"question": query})  # the attached memory supplies chat_history
print(result["answer"])

Append the previous query and answer to the chat history again.


In [None]:
history.append((query, result["answer"]))

In [None]:
query = "What is the aim of it?"
result = qa.invoke({"question": query})  # the attached memory supplies chat_history
print(result["answer"])

### Wrap up and make it an agent


The following code defines a function to make an agent, which can retrieve information from the document and has the conversation memory.


In [None]:
def qa():
    # Build a fresh memory and chain for each session
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    # Named `chain` so that it does not shadow the enclosing function `qa`
    chain = ConversationalRetrievalChain.from_llm(llm=llama_3_llm,
                                                  chain_type="stuff",
                                                  retriever=docsearch.as_retriever(),
                                                  memory=memory,
                                                  get_chat_history=lambda h: h,
                                                  return_source_documents=False)
    while True:
        query = input("Question: ")

        if query.lower() in ["quit", "exit", "bye"]:
            print("Answer: Goodbye!")
            break

        # The memory records each turn, so only the new question is passed
        result = chain.invoke({"question": query})

        print("Answer: ", result["answer"])

Run the function.

Feel free to ask your chatbot questions. For example:

_What is the smoking policy? Can you list all points of it? Can you summarize it?_

To **stop** the agent, type 'quit', 'exit', or 'bye'. Until you do, you cannot run other cells.


In [None]:
qa()

Congratulations! You have finished the project. Following are three exercises to help you to extend your knowledge.


# Exercises


### Exercise 1: Work on your own document


You are welcome to use your own document to practice. Another document has also been prepared that you can use for practice. Can you load this document and make the LLM read it for you? <br>
Here is the URL to the document: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/XVnuuEg94sAE4S_xAsGxBA.txt


In [None]:
# Add your code here

<details>
    <summary>Click here for solution</summary>
<br>
    
```python
import wget  # assumes the `wget` package is installed (pip install wget)

filename = 'stateOfUnion.txt'
url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/XVnuuEg94sAE4S_xAsGxBA.txt'

wget.download(url, out=filename)
print('file downloaded')
```

</details>


### Exercise 2: Return the source from the document


Sometimes, you not only want the LLM to summarize for you, but you also want the model to return the exact content source from the document to you for reference. Can you adjust the code to make it happen?


In [None]:
# Add your code

<details>
    <summary>Click here for a hint</summary>
All you must do is change the return_source_documents to True when you create the chain. And when you print, print the ['source_documents'][0]
<br><br>

    
```python
qa = RetrievalQA.from_chain_type(llm=llama_3_llm, chain_type="stuff", retriever=docsearch.as_retriever(), return_source_documents=True)
query = "Can I smoke in company vehicles?"
results = qa.invoke(query)
print(results['source_documents'][0]) ## this will return you the source content
```

</details>


### Exercise 3: Use another LLM model


IBM watsonx.ai also offers many other LLMs that you can use; for example, `GRANITE_13B_CHAT_V2`. Can you change the model and see how the response differs?


In [None]:
# Add your code here

<details>
    <summary>Click here for a hint</summary>

To use the Granite model in your notebook, go to the cell where `model_id` is set to `LLAMA_3_70B_INSTRUCT` and replace the current `model_id` with the following code. Expect different results and performance when using other models.

```python
model_id = ModelTypes.GRANITE_13B_CHAT_V2
```
<br>

After updating, run the remaining cells in the notebook to ensure the Granite model is used for subsequent operations.

</details>


## Authors


[Kang Wang](https://author.skills.network/instructors/kang_wang) <br>
Kang Wang is a Data Scientist Intern in IBM. He is also a PhD Candidate in the University of Waterloo.

[Faranak Heidari](https://www.linkedin.com/in/faranakhdr/) <br>
Faranak Heidari is a Data Scientist Intern in IBM with a strong background in applied machine learning. Experienced in managing complex data to establish business insights and foster data-driven decision-making in complex settings such as healthcare. She is also a PhD candidate at the University of Toronto.


### Other Contributors


[Sina Nazeri](https://author.skills.network/instructors/sina_nazeri) <br>
I am grateful to have had the opportunity to work as a Research Associate, Ph.D., and IBM Data Scientist. Through my work, I have gained experience in unraveling complex data structures to extract insights and provide valuable guidance.

[Wojciech Fulmyk](https://author.skills.network/instructors/wojciech_fulmyk) <br>
As a data scientist at the Ecosystems Skills Network at IBM and a Ph.D. candidate in Economics at the University of Calgary, I bring a wealth of experience in unraveling complex problems through the lens of data. What sets me apart is my ability to seamlessly merge technical expertise with effective communication, translating intricate data findings into actionable insights for stakeholders at all levels. From modeling to storytelling, I bring a holistic approach to data science. Leveraging machine learning algorithms, I construct predictive models tailored to both real-world challenges as well as old, well-understood problems. My knack for data-driven storytelling ensures that the insights uncovered resonate with both technical and non-technical audiences. Open to collaboration, I'm eager to take on new challenges and contribute to transformative data-driven endeavors. Whether you seek to extract insights, enhance predictive models, or explore untapped potential within your datasets, I'm here to help. Feel free to connect to me via my LinkedIn profile. Let's learn from each other!


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2024-03-22|0.1|Kang Wang|Create the Project|


© Copyright IBM Corporation. All rights reserved.
