# Hands-On Practice: Building RAG with LangChain

RAG in practice.
The following modules need to be installed first:

```
# Environment setup: install the required dependencies
!pip install langchain sentence_transformers chromadb

!pip install -U langchain-community
!pip install pypdf
!pip install -U langchain-huggingface
!pip install -qU langchain-ollama

# Install Ollama separately, then pull the llama3 model
!ollama pull llama3

```

In [1]:

# Load the source document
from langchain_community.document_loaders import TextLoader

loader = TextLoader("./data/What I Worked on.txt")
documents = loader.load()

In [2]:
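# Split the document
from langchain.text_splitter import CharacterTextSplitter

# Create the splitter. CharacterTextSplitter splits only on its separator
# ("\n\n" by default), so a paragraph longer than chunk_size stays in one
# chunk and triggers the "longer than the specified 500" warnings below.
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=10)
# Split the documents into chunks
documents = text_splitter.split_documents(documents)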

Created a chunk of size 508, which is longer than the specified 500
Created a chunk of size 777, which is longer than the specified 500
Created a chunk of size 557, which is longer than the specified 500
Created a chunk of size 587, which is longer than the specified 500
Created a chunk of size 622, which is longer than the specified 500
Created a chunk of size 775, which is longer than the specified 500
Created a chunk of size 604, which is longer than the specified 500
Created a chunk of size 618, which is longer than the specified 500
Created a chunk of size 520, which is longer than the specified 500
Created a chunk of size 602, which is longer than the specified 500
Created a chunk of size 1004, which is longer than the specified 500
Created a chunk of size 1203, which is longer than the specified 500
Created a chunk of size 844, which is longer than the specified 500
Created a chunk of size 910, which is longer than the specified 500
Created a chunk of size 674, which is longer t
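
The warnings above are most likely a consequence of splitting on a single separator: a piece that contains no separator cannot be cut further, so it is emitted whole even when it exceeds `chunk_size`. The following is a simplified sketch of that behavior, not LangChain's actual implementation. `split_on_separator` is a hypothetical helper, the sample text is made up, and `chunk_overlap` is ignored for brevity.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces; never cut inside a piece."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits into the current chunk
        else:
            if current:
                chunks.append(current)
            current = piece  # may itself be longer than chunk_size
    if current:
        chunks.append(current)
    return chunks


# Made-up text: the middle paragraph (30 chars) is longer than chunk_size (20)
text = "short paragraph\n\n" + "x" * 30 + "\n\nanother short one"
chunks = split_on_separator(text, chunk_size=20)
oversized = [len(c) for c in chunks if len(c) > 20]
print(oversized)  # → [30]
```

If strict chunk sizes matter, LangChain's `RecursiveCharacterTextSplitter` is one option to try, since it falls back to finer separators when a piece is too long.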

In [3]:
# Embedding (vectorization)

# from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# embedding model: m3e-base
model_name = "moka-ai/m3e-base"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
embedding = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
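
Setting `normalize_embeddings` to `True` asks the model to return unit-length vectors. A minimal sketch of why that is convenient: for unit vectors, the plain dot product equals the cosine similarity. The vectors below are made-up two-dimensional stand-ins, not real m3e-base embeddings, and the helper names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# Both values agree: dot product of unit vectors vs. cosine of the originals
print(round(dot(a_unit, b_unit), 6), round(cosine(a, b), 6))  # → 0.96 0.96
```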


In [4]:
# Store the data in the vector database

# Setting persist_directory stores the embeddings on disk.
persist_directory = 'db'
db = Chroma.from_documents(documents, embedding, persist_directory=persist_directory)

In [5]:
# Retrieval
retriever = db.as_retriever()
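
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The sketch below illustrates that idea with a toy top-k search. It is an assumption-laden illustration, not Chroma's implementation: `top_k` is a hypothetical helper, the vectors are invented, and the distance metric actually used depends on how the Chroma collection is configured. For unit-length vectors, ranking by dot product, cosine similarity, or Euclidean distance gives the same order.

```python
def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors with the highest dot product."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy unit-length "embeddings", made up for illustration
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query_vec = [0.8, 0.6]

print(top_k(query_vec, doc_vecs, k=2))  # → [2, 0]
```

With the real retriever, the number of returned chunks can be adjusted through `search_kwargs`, for example `db.as_retriever(search_kwargs={"k": 2})`.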

In [6]:
# Augmentation: build the prompt from the retrieved context

from langchain_core.prompts import ChatPromptTemplate

template = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
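
To see what the template produces at run time, the placeholders can be filled by hand. The sketch below uses plain `str.format` on a shortened copy of the template, as a stand-in for what `ChatPromptTemplate` does when the chain runs. The question and context values are invented placeholders, and the real class additionally wraps the result in a chat message.

```python
# Shortened copy of the template, for illustration only
template = (
    "Question: {question} \n"
    "Context: {context} \n"
    "Answer:\n"
)

filled = template.format(
    question="Who is the author?",
    context="(retrieved chunks would go here)",
)
print(filled)
```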

In [10]:
# Generate the answer
# pip install -U langchain-ollama
from langchain_ollama import ChatOllama
# from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(model='llama3')

rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
)

# Other example queries (only the last assignment takes effect):
# query = "What did the author do growing up?"
# query = "Who is the author?"
query = "Which art schools did the author apply?"
response = rag_chain.invoke(query)
print(response)

The author applied to RISD (Rhode Island School of Design) in the fall of 1992. There is no mention of Accademia or any other art school being considered as a separate application, but rather it's mentioned that the Accademia was a "joke" previously, suggesting that it may have been an initial consideration that didn't pan out.
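
The data flow through `rag_chain` can be sketched with ordinary functions: the dictionary step sends the same query to the retriever and passes it through unchanged, the prompt step fills the template, the model produces a message, and the parser extracts its text. Everything below is a stand-in written for illustration. `fake_retriever`, `fake_llm`, and the other helpers are hypothetical and call neither Chroma nor Ollama.

```python
def fake_retriever(question):
    """Stand-in for the vector store retriever: returns canned chunks."""
    return ["chunk A about the topic", "chunk B about the topic"]


def format_docs(chunks):
    """Join retrieved chunks into one context string."""
    return "\n\n".join(chunks)


def build_prompt(inputs):
    """Stand-in for the prompt template step."""
    return f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"


def fake_llm(prompt_text):
    """Stand-in for the chat model: returns a message-like dict."""
    return {"content": f"(answer based on {len(prompt_text)} prompt characters)"}


def parse_output(message):
    """Stand-in for the output parser: extract the text of the message."""
    return message["content"]


def rag_pipeline(question):
    # Same question goes to the retriever and, unchanged, into the prompt
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    return parse_output(fake_llm(build_prompt(inputs)))


print(rag_pipeline("Who is the author?"))
```

The sketch joins the chunk texts before building the prompt. In the chain above, the retriever's output is inserted into the prompt as returned, so adding a similar formatting step that joins each document's `page_content` may give the model a cleaner context.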
