## AI Agent Intelligent Applications: Custom Development from 0 to 1
******
- This is the companion code for the online course *AI Agent Intelligent Applications: Custom Development from 0 to 1* (《AI Agent智能应用从0到1定制开发》) and is best used together with the course.
- Note: because the course was developed over a long period, the code spans three major LangChain releases, so some of it differs from the video demonstrations.
- Course page: https://coding.imooc.com/class/822.html

### Reading the API key from environment variables
- Note: keep your OpenAI key in a file such as `.env` rather than hard-coding it in plain text. This is a basic security measure.
******

In [4]:

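import os
from dotenv import load_dotenv

# Load environment variables from asset/openai.env into os.environ
load_dotenv("asset/openai.env")

# load_dotenv has already exported the values, so just read them back;
# fail early if the key is missing instead of passing None around
api_key = os.getenv("OPENAI_API_KEY")
api_base = os.getenv("OPENAI_API_BASE")  # optional: None means the default OpenAI endpoint
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; check asset/openai.env")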
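`load_dotenv` reads `KEY=VALUE` lines from the file and exports them into `os.environ`, which is why later cells can simply call `os.getenv`. As a rough, stdlib-only sketch of that idea, the hypothetical `load_env_file` below is a simplified stand-in, not python-dotenv's actual parser (it ignores `export` prefixes, escapes, and variable expansion). It does mirror python-dotenv's default of not overwriting variables that are already set. The hypothetical `require_env` shows the fail-early check.

```python
import os
from pathlib import Path


def load_env_file(path: str, override: bool = False) -> dict:
    """Hypothetical, simplified stand-in for load_dotenv.

    Parses KEY=VALUE lines, skips blank lines and '#' comments, strips one pair
    of matching quotes, and returns the parsed pairs. Like python-dotenv's
    default (override=False), variables already in os.environ are left untouched.
    """
    parsed = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed


def require_env(name: str) -> str:
    """Return the variable's value, or fail early with a readable message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

In this notebook the equivalent calls would be `load_env_file("asset/openai.env")` followed by `require_env("OPENAI_API_KEY")`; in practice `load_dotenv` handles many more edge cases and remains the better choice.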

### Lost in the middle: the long-context accuracy problem
*****

- Use the Hugging Face Inference API to access embedding models hosted on Hugging Face.

In [5]:
import getpass

inference_api_key = getpass.getpass("Enter your HF Inference API Key:\n\n")
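The key entered here is sent to Hugging Face as a bearer token. Assuming the pipeline endpoint that `HuggingFaceInferenceAPIEmbeddings` targeted by default in the LangChain versions used here (`https://api-inference.huggingface.co/pipeline/feature-extraction/<model>`; Hugging Face has since moved serverless inference to newer endpoints, so treat the URL as an assumption), the request looks roughly like the stdlib sketch below. `build_embedding_request` is a hypothetical helper: it only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed base URL: the default used by HuggingFaceInferenceAPIEmbeddings in older LangChain releases
HF_INFERENCE_BASE = "https://api-inference.huggingface.co"


def build_embedding_request(token: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a feature-extraction request."""
    token = token.strip()
    if not token:
        # Guard: an empty token would produce a bare "Bearer " Authorization header
        raise ValueError("empty Hugging Face token")
    body = json.dumps({"inputs": texts, "options": {"wait_for_model": True}}).encode("utf-8")
    return urllib.request.Request(
        f"{HF_INFERENCE_BASE}/pipeline/feature-extraction/{model}",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would take `urllib.request.urlopen(req)`; the sketch stops short of that so no real token is needed. The `Authorization: Bearer <token>` header is the part the API checks, and it is what the error message in the next cell complains about.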

In [6]:
from langchain.chains import LLMChain, StuffDocumentsChain
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_transformers import LongContextReorder
# Use an open-source embedding model hosted on Hugging Face;
# all-MiniLM-L6-v2 is a small sentence-transformers embedding model, not an LLM
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key, model_name="sentence-transformers/all-MiniLM-L6-v2"
)

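text = [
    "篮球是一项伟大的运动。",  # "Basketball is a great sport."
    "带我飞往月球是我最喜欢的歌曲之一。",  # "Fly Me to the Moon is one of my favourite songs."
    "凯尔特人队是我最喜欢的球队。",  # "The Celtics are my favourite team."
    "这是一篇关于波士顿凯尔特人的文件。",  # "This is a document about the Boston Celtics."
    "我非常喜欢去看电影。",  # "I really love going to the movies."
    "波士顿凯尔特人队以20分的优势赢得了比赛。",  # "The Boston Celtics won the game by 20 points."
    "这只是一段随机的文字。",  # "This is just a random piece of text."
    "《艾尔登之环》是过去15年最好的游戏之一。",  # "Elden Ring is one of the best games of the last 15 years."
    "L.科内特是凯尔特人队最好的球员之一。",  # "L. Kornet is one of the best Celtics players."
    "拉里.伯德是一位标志性的NBA球员。"  # "Larry Bird is an iconic NBA player."
]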

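# Build an in-memory Chroma vector store and expose it as a retriever returning the top 10 matches
retriever = Chroma.from_texts(text, embeddings).as_retriever(
    search_kwargs={"k": 10}
)
query = "关于我的喜好都知道什么?"  # "What do you know about my preferences?"

# Retrieve text chunks ranked by relevance, most relevant first
docs = retriever.invoke(query)
docs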


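ValueError: Expected each embedding in the embeddings to be a list, got [{'error': "Authorization header is invalid, use 'Bearer API_TOKEN'"}]

This error means the Hugging Face Inference API rejected the token entered at the `getpass` prompt and returned an error object instead of embedding vectors, which Chroma then refused. With a valid token, `docs` holds all 10 texts ranked by relevance, which is what the next cell works on.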
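Conceptually, the retriever's `invoke(query)` call embeds the query and returns the `k` stored texts whose vectors are closest to it, most similar first; that descending order is what `LongContextReorder` relies on in the next cell. The brute-force sketch below illustrates the ranking with cosine similarity on tiny hand-made vectors. This is an illustrative assumption, not Chroma's implementation: Chroma uses an approximate HNSW index with squared L2 distance by default, which ranks unit-length vectors the same way cosine similarity does. `cosine` and `top_k` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force retrieval: score every stored vector, return the k best, most similar first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With `k` equal to the number of stored texts, as in the cell above (`k=10` for 10 texts), retrieval returns everything and only the order carries information.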

In [7]:
# Re-order the retrieved documents following the "Lost in the Middle" paper:
# chunks less relevant to the question go in the middle,
# chunks more relevant to the question go at the beginning and end.
# LongContextReorder expects its input sorted most-relevant-first, as the retriever returns it.

reordering = LongContextReorder()
reo_docs = reordering.transform_documents(docs)

# The most relevant chunks now sit at both ends (the first two and last two, four in total)
reo_docs

[Document(page_content='我非常喜欢去看电影。'),
 Document(page_content='带我飞往月球是我最喜欢的歌曲之一。'),
 Document(page_content='拉里.伯德是一位标志性的NBA球员。'),
 Document(page_content='这是一篇关于波士顿凯尔特人的文件。'),
 Document(page_content='《艾尔登之环》是过去15年最好的游戏之一。'),
 Document(page_content='L.科内特是凯尔特人队最好的球员之一。'),
 Document(page_content='波士顿凯尔特人队以20分的优势赢得了比赛。'),
 Document(page_content='这只是一段随机的文字。'),
 Document(page_content='篮球是一项伟大的运动。'),
 Document(page_content='凯尔特人队是我最喜欢的球队。')]
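The reordering itself is short. The sketch below is intended to mirror the ordering `LongContextReorder` applies, as we understand the `langchain_community` source; `litm_reorder` is a hypothetical name and works on plain values instead of `Document` objects. The input is assumed to be sorted most-relevant-first. Walking from the least relevant item to the most relevant, items are placed alternately at the front and at the back, so the least relevant items end up in the middle.

```python
from collections import deque
from typing import Sequence, TypeVar

T = TypeVar("T")


def litm_reorder(items: Sequence[T]) -> list[T]:
    """Hypothetical re-implementation of the "lost in the middle" reordering.

    `items` must be sorted most relevant first. The least relevant items are
    pushed toward the middle and the most relevant ones toward the two ends.
    """
    result = deque()
    # Iterate from least relevant to most relevant
    for i, item in enumerate(reversed(items)):
        if i % 2 == 0:
            result.appendleft(item)  # even steps go to the front
        else:
            result.append(item)  # odd steps go to the back
    return list(result)
```

With ten ranked inputs `0..9`, the result is `[1, 3, 5, 7, 9, 8, 6, 4, 2, 0]`: the most relevant item ends up last, the second most relevant first, and the least relevant (`9`) sits in the middle.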

In [12]:
# Check how well this approach works in practice
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate
import os

api_base = os.getenv("OPENAI_API_BASE")
api_key = os.getenv("OPENAI_API_KEY")  # same variable name as in the .env file loaded earlier

# Set up the LLM (a completion model, hence OpenAI rather than ChatOpenAI)
llm = OpenAI(
    model="gpt-3.5-turbo-instruct",
    temperature=0,
    openai_api_key=api_key,
    openai_api_base=api_base,
)

# How each document is rendered before being stuffed into the context
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)

stuff_prompt_override = """Given this text extracts:
----------------------------------------
{context}
----------------------------------------
Please answer the following questions:
{query}
"""

prompt = PromptTemplate(
    template=stuff_prompt_override,
    input_variables=["context", "query"]
)

llm_chain = LLMChain(
    llm=llm,
    prompt=prompt
)

# Stuff all documents, in the given order, into the {context} variable of the prompt
work_chain = StuffDocumentsChain(
    llm_chain=llm_chain,
    document_prompt=document_prompt,
    document_variable_name="context"
)

# Invoke the chain with the reordered documents
work_chain.run(
    input_documents=reo_docs,
    query="我最喜欢做什么事情？"  # "What do I like doing most?"
)


'我最喜欢做的事情是看电影。'

(English: "What I like doing most is watching movies.")
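`StuffDocumentsChain` "stuffs" every document into a single prompt: each document is rendered with `document_prompt`, the results are joined with a separator (`"\n\n"` by default), and the joined text fills the `{context}` variable of the main prompt. The stdlib sketch below mimics that assembly with plain `str.format`; `stuff_documents` is a hypothetical name, and `STUFF_TEMPLATE` is a copy of `stuff_prompt_override` above. Because the joined order is preserved, passing `reo_docs` rather than `docs` is what places the most relevant chunks at the start and end of the context.

```python
STUFF_TEMPLATE = """Given this text extracts:
----------------------------------------
{context}
----------------------------------------
Please answer the following questions:
{query}
"""


def stuff_documents(page_contents, query, document_template="{page_content}", separator="\n\n"):
    """Hypothetical sketch: render each document, join them in order, and fill the main prompt."""
    context = separator.join(document_template.format(page_content=pc) for pc in page_contents)
    return STUFF_TEMPLATE.format(context=context, query=query)
```

The LLM therefore sees one long string; the "lost in the middle" effect concerns where inside that string the relevant sentences land.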