## AI Agent Intelligent Application Custom Development from 0 to 1
******
- This is the companion code for the online course "AI Agent Intelligent Application Custom Development from 0 to 1" (《AI Agent智能应用从0到1定制开发》). It is best used alongside the course videos.
- Note: during the course's development, langchain went through 3 major versions, so some of the code differs from the video demos.
- Course address: https://coding.imooc.com/class/822.html

### Read the key from the environment variable
- Note: store your OpenAI key in a file such as `.env` rather than exposing it in plain text in the code. This is a basic security measure.
******

In [1]:
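import os
from dotenv import load_dotenv

# Load environment variables from the openai.env file
load_dotenv("asset/openai.env")

# Read the API key and base URL from the environment
api_key = os.getenv("OPENAI_API_KEY")
api_base = os.getenv("OPENAI_API_BASE")

# Fail early with a clear message if either value is missing
# (assigning None to os.environ would raise an obscure TypeError)
if not api_key or not api_base:
    raise RuntimeError("OPENAI_API_KEY and OPENAI_API_BASE must be set in asset/openai.env")
os.environ["OPENAI_API_KEY"] = api_key
os.environ["OPENAI_API_BASE"] = api_base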

### Example selectors
- Select examples by length
- Select examples by input similarity (maximal marginal relevance, MMR)
- Select examples by input similarity (maximum cosine similarity)
*****

- Select examples by length

In [2]:
# Measure the input together with the formatted examples, then include only as many examples as fit the length limit
from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Suppose we already have this set of prompt examples:
examples = [
    {"input":"happy","output":"sad"},
    {"input":"tall","output":"short"},
    {"input":"sunny","output":"gloomy"},
    {"input":"windy","output":"calm"},
    {"input":"高兴","output":"悲伤"}
]

# Build the prompt template used to format each example
# (the labels 原词 / 反义 mean "word" / "antonym")
example_prompt = PromptTemplate(
    input_variables=["input","output"],
    template="原词：{input}\n反义：{output}"
)

# Create the length-based example selector
example_selector = LengthBasedExampleSelector(
    # The pool of examples to select from
    examples=examples,
    # The template used to format each example
    example_prompt=example_prompt,
    # Maximum length of the formatted prompt
    max_length=25,
    # Length is measured by the built-in get_text_length; supply your own
    # function if the default word splitting does not suit your text
    #get_text_length:Callable[[str],int] = lambda x:len(re.split("\n| ",x))
)

# Use a few-shot prompt template so examples are selected dynamically
# (the prefix means "Give the antonym of each input word")
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="给出每个输入词的反义词",
    suffix="原词：{adjective}\n反义：",
    input_variables=["adjective"]
)



In [3]:
# A short input leaves room within max_length, so every example is included
print(dynamic_prompt.format(adjective="big"))

给出每个输入词的反义词

原词：happy
反义：sad

原词：tall
反义：short

原词：sunny
反义：gloomy

原词：windy
反义：calm

原词：高兴
反义：悲伤

原词：big
反义：


In [4]:
# A long input uses up most of the length budget, so fewer examples are included
long_string = "big and huge adn massive and large and gigantic and tall and much much much much much much bigger then everyone"
print(dynamic_prompt.format(adjective=long_string))

给出每个输入词的反义词

原词：happy
反义：sad

原词：tall
反义：short

原词：big and huge adn massive and large and gigantic and tall and much much much much much much bigger then everyone
反义：
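
The trimming seen in the two outputs above can be reproduced with a small, dependency-free sketch. It assumes a greedy strategy: examples are considered in list order, each one costs the word count of its formatted text, and selection stops once the budget left over after the input is used up. `select_by_length` is a hypothetical helper written for illustration, not a LangChain API, and the real selector may differ in detail.

```python
import re

def get_text_length(text: str) -> int:
    # Count the pieces separated by a newline or a space
    return len(re.split("\n| ", text))

def select_by_length(examples, user_input, max_length=25):
    """Greedy sketch: keep examples in order while the length budget allows."""
    remaining = max_length - get_text_length(user_input)
    selected = []
    for example in examples:
        if remaining <= 0:
            break
        formatted = f"原词：{example['input']}\n反义：{example['output']}"
        cost = get_text_length(formatted)
        if cost > remaining:
            break
        selected.append(example)
        remaining -= cost
    return selected

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
    {"input": "高兴", "output": "悲伤"},
]
long_string = "big and huge adn massive and large and gigantic and tall and much much much much much much bigger then everyone"

print(len(select_by_length(examples, "big")))                      # 5
print([e["input"] for e in select_by_length(examples, long_string)])  # ['happy', 'tall']
```

Each formatted example counts as 2 units here, so the 1-word input leaves room for all five examples, while the 21-word input leaves a budget of 4 and only the first two fit.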


- Select examples by input similarity (maximal marginal relevance)
- MMR is a method widely used in information retrieval that balances relevance against diversity. It first picks the sample most similar to the input (the one with the highest cosine similarity). Then, as it iteratively adds samples, it penalizes candidates that are too close to those already selected. The result is a set of samples that are highly relevant to the input yet sufficiently different from one another.
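
The selection loop described above can be illustrated with a minimal sketch on toy 2-D vectors. The scoring rule `lambda_mult * relevance - (1 - lambda_mult) * redundancy` is a common formulation of MMR; the function name `mmr_select`, the parameter `lambda_mult`, and the vectors are assumptions made for illustration, and real embeddings would come from an embedding model rather than being typed by hand.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of k candidates, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already chosen (0 if nothing chosen yet)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

query = [1.0, 0.0]
candidates = [
    [1.0, 0.10],   # very close to the query
    [1.0, 0.12],   # near-duplicate of the first candidate
    [0.8, -0.60],  # less similar to the query, but different from the first
]

print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct candidate. With `lambda_mult=1.0` the redundancy penalty disappears and the result is a plain top-k by similarity.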

In [5]:
# Use MMR to retrieve examples that match the input while staying diverse

import os
from langchain_community.vectorstores import FAISS
from langchain_core.example_selectors import MaxMarginalRelevanceExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_openai import OpenAIEmbeddings

# Note: this cell reads the base URL from OPENAI_PROXY (not OPENAI_API_BASE
# as in the first cell), so the env file must define that variable too
api_base = os.getenv("OPENAI_PROXY")
api_key = os.getenv("OPENAI_API_KEY")

# Suppose we already have this set of prompt examples:
examples = [
    {"input":"happy","output":"sad"},
    {"input":"tall","output":"short"},
    {"input":"sunny","output":"gloomy"},
    {"input":"windy","output":"calm"},
    {"input":"高兴","output":"悲伤"}
]

# Build the prompt template used to format each example
example_prompt = PromptTemplate(
    input_variables=["input","output"],
    template="原词：{input}\n反义：{output}"
)

- Using the FAISS vector store requires installing the `faiss-cpu` package:

In [6]:
! pip install faiss-cpu

Collecting faiss-cpu
  Obtaining dependency information for faiss-cpu from https://files.pythonhosted.org/packages/3a/0a/d18ff177cab09587918b6e67ce75b7e0a2b90ea0b4fdc7c3535cca39c5e8/faiss_cpu-1.8.0.post1-cp312-cp312-win_amd64.whl.metadata
  Using cached faiss_cpu-1.8.0.post1-cp312-cp312-win_amd64.whl.metadata (3.8 kB)
Using cached faiss_cpu-1.8.0.post1-cp312-cp312-win_amd64.whl (14.6 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.8.0.post1





In [7]:
# Create the MMR example selector
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # The pool of examples to select from
    examples,
    # Use OpenAI embeddings for the similarity search
    OpenAIEmbeddings(openai_api_base=api_base,openai_api_key=api_key),
    # The vector store class used to index the examples
    FAISS,
    # Number of examples to return
    k=2,
)

# Use a few-shot prompt template
mmr_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="给出每个输入词的反义词",
    suffix="原词：{adjective}\n反义：",
    input_variables=["adjective"]
)

In [8]:
# The input (难过, "sad") describes an emotion, so the selector should first pick the
# example pair that also describes an emotion; MMR then adds a less similar pair for diversity
print(mmr_prompt.format(adjective="难过"))

给出每个输入词的反义词

原词：高兴
反义：悲伤

原词：tall
反义：short

原词：难过
反义：


- Select examples by input similarity (maximum cosine similarity)
- Cosine similarity is a common similarity measure. It scores two vectors (here, vectors representing texts, sentences, or words) by the cosine of the angle between them. The closer the value is to 1, the more similar the two vectors are. Unlike MMR, this approach is concerned only with measuring similarity accurately, not with diversity among the selected examples.
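
As a concrete illustration of the measure, the following sketch computes cosine similarity as the dot product divided by the product of the vector norms, then picks the single closest example. The helper names `cosine_similarity` and `most_similar`, and the hand-written 2-D vectors standing in for embeddings, are assumptions made for this example only.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention used here for zero vectors
    return dot / (norm_a * norm_b)

def most_similar(query, vectors, k=1):
    """Return the indices of the k vectors with the highest cosine similarity."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]

# Same direction scores close to 1, orthogonal scores 0, opposite scores close to -1
print(round(cosine_similarity([1, 2], [2, 4]), 6))
print(cosine_similarity([1, 0], [0, 1]))
print(round(cosine_similarity([1, 2], [-1, -2]), 6))

# Toy stand-ins for example embeddings; the query points almost the same way as index 0
toy_vectors = [[0.9, 0.1], [0.1, 0.9], [-0.5, 0.5]]
print(most_similar([1.0, 0.2], toy_vectors, k=1))
```

Because only the angle matters, scaling a vector (as in `[1, 2]` versus `[2, 4]`) does not change its score.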

In [9]:
# Use maximum cosine similarity to retrieve the examples closest to the input

from langchain_community.vectorstores import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_openai import OpenAIEmbeddings

import os
# The base URL is read from OPENAI_PROXY, so the env file must define it
api_base = os.getenv("OPENAI_PROXY")
api_key = os.getenv("OPENAI_API_KEY")


# Build the prompt template used to format each example
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="原词: {input}\n反义: {output}",
)

# Examples for a toy antonym-generation task
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

- Using the Chroma vector store requires installing the `chromadb` package:

In [10]:
! pip install chromadb==0.4.15

Collecting chromadb==0.4.15
  Obtaining dependency information for chromadb==0.4.15 from https://files.pythonhosted.org/packages/f1/2a/549be867b5ab45112aacd9d113af788768a015e9e6d0ade831b45c1df877/chromadb-0.4.15-py3-none-any.whl.metadata
  Using cached chromadb-0.4.15-py3-none-any.whl.metadata (7.2 kB)
Collecting chroma-hnswlib==0.7.3 (from chromadb==0.4.15)
  Downloading chroma-hnswlib-0.7.3.tar.gz (31 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting fastapi>=0.95.2 (from chromadb==0.4.15)
  Obtaining dependency information for fastapi>=0.95.2 from https://files.pythonhosted.org/packages/06/ab/a1f7eed031aeb1c406a6e9d45ca04bff401c8a25a30dd0e4fd2caae767c3/fastapi-0.115.0-py3-none-any.whl.metadata



In [11]:
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The pool of examples to select from
    examples,
    # Use OpenAI embeddings for the similarity search
    OpenAIEmbeddings(openai_api_key=api_key,openai_api_base=api_base),
    # Use the Chroma vector store to index the embedded examples
    Chroma,
    # Number of examples to return
    k=1,
)

# Use a few-shot prompt template
similar_prompt = FewShotPromptTemplate(
    # Pass in the selector, the example template, the prefix and suffix, and the input variables
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="给出每个输入词的反义词",
    suffix="原词: {adjective}\n反义:",
    input_variables=["adjective"],
)

In [12]:
# The input is a word describing a feeling, so the selector should pick the closest example, happy/sad
print(similar_prompt.format(adjective="worried"))

给出每个输入词的反义词

原词: happy
反义: sad

原词: worried
反义:
