## AI Agent智能应用从0到1定制开发 (AI Agent Intelligent Applications: Custom Development from 0 to 1)
******
- This is the companion code for the online course 《AI Agent智能应用从0到1定制开发》 (AI Agent Intelligent Applications: Custom Development from 0 to 1). It is best used alongside the course videos.
- Because of the course's long development cycle, the code spans three major LangChain versions, so some of it differs from the video demonstrations.
- Course address: https://coding.imooc.com/class/822.html

### Read the key from environment variables
- Note: Store your OpenAI key in a file such as `.env` rather than hard-coding it in plain text in the source. This is a basic security measure.
******

In [1]:

import os
from dotenv import load_dotenv
# Load environment variables from the asset/openai.env file
load_dotenv("asset/openai.env")

# Read the OpenAI key and base URL from the environment
api_key = os.getenv("OPENAI_API_KEY")
api_base = os.getenv("OPENAI_API_BASE")

# os.environ only accepts strings, so skip any value that is missing
if api_key:
    os.environ["OPENAI_API_KEY"] = api_key
if api_base:
    os.environ["OPENAI_API_BASE"] = api_base
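A missing key tends to surface later as a confusing authentication error, so it can help to fail fast right after loading the environment. The following is a minimal sketch under stated assumptions: `require_env` and `mask` are hypothetical helpers that are not part of the course code, and `DEMO_API_KEY` is a placeholder variable used only so the example runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise a clear error if it is unset.

    Hypothetical helper for illustration; not part of the course code.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file first")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so a key can be printed safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Placeholder value for the demo only; real keys belong in the .env file.
os.environ.setdefault("DEMO_API_KEY", "sk-demo-not-a-real-key")
print(mask(require_env("DEMO_API_KEY")))
```

In the notebook, the same check could be applied to `OPENAI_API_KEY` right after `load_dotenv(...)`.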

### Document splitting
#### Principle
1. Split the document into small, meaningful pieces (usually sentences).
2. Combine the small pieces into a larger chunk until it reaches a set size.
3. Once that size is reached, close the chunk and start a new one that overlaps the end of the previous chunk.
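The three steps can be sketched in plain Python. This is a simplified illustration of the idea, not the LangChain implementation: `split_sentences` and `merge_with_overlap` are hypothetical names, lengths are counted in characters, and overlap is only carried over in whole pieces.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Step 1: break the text into sentence-sized pieces, keeping punctuation."""
    return re.findall(r"[^.!?。！？]+[.!?。！？]?\s*", text)


def merge_with_overlap(pieces: list[str], chunk_size: int, chunk_overlap: int) -> list[str]:
    """Steps 2 and 3: pack pieces into chunks and carry an overlap into the next chunk."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        # Close the current chunk when adding this piece would exceed chunk_size.
        if current and current_len + len(piece) > chunk_size:
            chunks.append("".join(current))
            # Drop pieces from the front until the kept tail fits the overlap
            # budget and still leaves room for the incoming piece.
            while current and (
                current_len > chunk_overlap
                or current_len + len(piece) > chunk_size
            ):
                current_len -= len(current[0])
                current.pop(0)
        current.append(piece)
        current_len += len(piece)
    if current:
        chunks.append("".join(current))
    return chunks


pieces = split_sentences("One. Two. Three. Four.")
for chunk in merge_with_overlap(pieces, chunk_size=12, chunk_overlap=6):
    print(repr(chunk))
```

Note that a piece longer than `chunk_overlap` cannot be carried over, so neighbouring chunks do not always overlap.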
#### Examples
- First document split
- Splitting by character
- Splitting code documents
- Splitting by token
********

- First document split

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the document to be split
with open("asset/test.txt") as f:
    zuizhonghuanxiang = f.read()

# Initialize the splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,  # maximum chunk size, measured by length_function
    chunk_overlap=20,  # overlap between adjacent chunks, measured by length_function
    length_function=len,  # length function; a token-counting function can be passed instead
    add_start_index=True,  # record each chunk's start offset in its metadata
)

text = text_splitter.create_documents([zuizhonghuanxiang])
# Print both chunks: a notebook cell only displays its last bare expression
print(text[0])
print(text[1])

- Splitting by character

In [None]:
from langchain_text_splitters import CharacterTextSplitter

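# Load the document to be split
with open("asset/test.txt") as f:
    zuizhonghuanxiang = f.read()

# Initialize the splitter
text_splitter = CharacterTextSplitter(
    separator="。",  # separator to split on; the default is "\n\n"
    chunk_size=50,  # maximum chunk size, measured by length_function
    chunk_overlap=20,  # overlap between adjacent chunks, measured by length_function
    length_function=len,  # length function; a token-counting function can be passed instead
    add_start_index=True,  # record each chunk's start offset in its metadata
    is_separator_regex=False,  # treat the separator as a literal string, not a regex
)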
text = text_splitter.create_documents([zuizhonghuanxiang])
print(text[0])
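The difference between a literal separator and a regular-expression separator is easy to miss. Here is a small standard-library sketch of the idea behind the `is_separator_regex` flag; `split_on_separator` is a hypothetical helper and does not reproduce LangChain's internals.

```python
import re


def split_on_separator(text: str, separator: str, is_regex: bool = False) -> list[str]:
    """Split text on a separator, treating it as a literal unless is_regex is True."""
    pattern = separator if is_regex else re.escape(separator)
    # Drop empty strings left behind by leading, trailing, or repeated separators.
    return [part for part in re.split(pattern, text) if part]


# Splitting Chinese text on the full-width period, as in the cell above.
print(split_on_separator("第一句。第二句。第三句。", "。"))

# As a literal, "." matches only a period; as a regex it would match every character.
print(split_on_separator("a.b.c", "."))
print(split_on_separator("a1b22c", r"\d+", is_regex=True))
```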

- Splitting code documents

In [None]:

from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

# Programming languages supported by the splitter
# [e.value for e in Language]

# Code document to be split
PYTHON_CODE = """
def hello_world():
    print("Hello, World!")
#调用函数
hello_world()
"""
py_spliter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=50,
    chunk_overlap=10,
)
python_docs = py_spliter.create_documents([PYTHON_CODE])
python_docs
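Language-aware splitting works by trying coarse, syntax-related separators first and only falling back to finer ones when a piece is still too long. The sketch below illustrates that recursion in plain Python. It rests on assumptions: `PYTHON_SEPARATORS` is an illustrative list rather than one copied from the library, `recursive_split` is a hypothetical function, and the step that merges small pieces back together is omitted.

```python
# Illustrative separator order for Python source, from coarse to fine.
# This list is an assumption for the sketch, not copied from the library.
PYTHON_SEPARATORS = ["\nclass ", "\ndef ", "\n\n", "\n", " "]


def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split on the coarsest separator, recursing into pieces that are still too long."""
    if len(text) <= chunk_size:
        # Small enough already; whitespace-only pieces are dropped.
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, *rest = separators
    parts = text.split(head)
    # Re-attach the separator to every piece after the first, so that
    # "def" and "class" keywords stay with the bodies they introduce.
    pieces = [parts[0]] + [head + part for part in parts[1:]]
    result: list[str] = []
    for piece in pieces:
        result.extend(recursive_split(piece, rest, chunk_size))
    return result


code = "def a():\n    return 1\n\ndef b():\n    return 2\n"
for piece in recursive_split(code, PYTHON_SEPARATORS, chunk_size=25):
    print(repr(piece))
```

Because the function boundary is tried before the newline, each function stays in one piece as long as it fits within `chunk_size`.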



- Splitting by token

In [None]:
from langchain_text_splitters import CharacterTextSplitter

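# Load the document to be split
with open("asset/test.txt") as f:
    zuizhonghuanxiang = f.read()

# Initialize the splitter; sizes are counted in tiktoken tokens, not characters
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=4000,  # maximum chunk size, in tokens
    chunk_overlap=30,  # overlap between adjacent chunks, in tokens
)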

text = text_splitter.create_documents([zuizhonghuanxiang])
print(text[0])
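Token-based splitting changes only how length is measured: the budget is counted in tokens instead of characters. The sketch below shows the idea with a stand-in tokenizer. It is not tiktoken and not a real BPE tokenizer; `toy_tokenize`, `token_length`, and `pack_by_tokens` are hypothetical names used for illustration, and overlap is left out to keep the example short.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def token_length(text: str) -> int:
    """Length function that counts tokens rather than characters."""
    return len(toy_tokenize(text))


def pack_by_tokens(pieces: list[str], max_tokens: int) -> list[str]:
    """Greedily join pieces while the running token count stays within max_tokens."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for piece in pieces:
        n = token_length(piece)
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        # In this sketch, a single piece larger than max_tokens is kept whole.
        current.append(piece)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(token_length("Hello, world!"))
print(pack_by_tokens(["a b c", "d e", "f g h i"], max_tokens=5))
```

A function like `token_length` is the kind of callable that could be passed as `length_function` in the earlier cells instead of `len`.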

### Document metadata extraction, interrogation, and translation
****

In [None]:
! pip show doctran

In [None]:
! pip install  doctran

In [10]:
from dotenv import load_dotenv
import os
# Load environment variables from the asset/openai.env file
load_dotenv("asset/openai.env")
# The variable is named OPENAI_API_KEY, matching the first cell
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE")
OPENAI_API_MODEL = "gpt-4"

In [2]:
# Load the document
with open("asset/letter.txt") as f:
    content = f.read()

In [3]:
print(content)

[Generated with ChatGPT]

Confidential Document - For Internal Use Only

Date: July 1, 2023

Subject: Updates and Discussions on Various Topics

Dear Team,

I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.

Security and Privacy Measures
As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.

HR Updates and Emp

In [4]:
import json

from langchain_community.document_transformers import DoctranPropertyExtractor
from langchain_core.documents import Document

In [5]:
documents = [Document(page_content=content)]
# Define the metadata schema to extract
properties = [
    {
        "name": "category",
        "description": "What type of email this is.",
        "type": "string",
        "enum": ["update", "action_item", "customer_feedback", "announcement", "other"],
        "required": True,
    },
    {
        "name": "mentions",
        "description": "A list of all people mentioned in this email.",
        "type": "array",
        "items": {
            "name": "full_name",
            "description": "The full name of the person mentioned.",
            "type": "string",
        },
        "required": True,
    },
    {
        "name": "eli5",
        "description": "Explain this email to me like I'm 5 years old.",
        "type": "string",
        "required": True,
    },
]
# Initialize the property extractor
property_extractor = DoctranPropertyExtractor(
    openai_api_model=OPENAI_API_MODEL,
    properties=properties,
)

In [6]:
# Extract the properties from the document
extracted_document = property_extractor.transform_documents(
    documents, properties=properties
)

In [7]:
print(json.dumps(extracted_document[0].metadata, indent=2))

{
  "extracted_properties": {
    "category": "update",
    "eli5": "This email provides updates and discusses various topics that require attention.",
    "mentions": [
      "John Doe",
      "Jane Smith",
      "Michael Johnson",
      "Sarah Thompson",
      "David Rodriguez"
    ]
  }
}
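Since the extracted values come from a language model, it can be worth checking them against the schema before relying on them. The following is a minimal sketch, assuming the output has the shape shown above: `validate_properties` is a hypothetical helper, and the schema is an abbreviated copy of the `properties` list that covers only the `string` and `array` types used here.

```python
def validate_properties(extracted: dict, schema: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the data matches the schema."""
    type_map = {"string": str, "array": list}
    problems: list[str] = []
    for prop in schema:
        name = prop["name"]
        if name not in extracted:
            if prop.get("required"):
                problems.append(f"missing required property: {name}")
            continue
        value = extracted[name]
        expected = type_map.get(prop["type"])
        if expected and not isinstance(value, expected):
            problems.append(f"{name}: expected {prop['type']}")
            continue
        if "enum" in prop and value not in prop["enum"]:
            problems.append(f"{name}: {value!r} is not one of {prop['enum']}")
    return problems


# Abbreviated copy of the schema and output from the cells above.
schema = [
    {"name": "category", "type": "string", "required": True,
     "enum": ["update", "action_item", "customer_feedback", "announcement", "other"]},
    {"name": "mentions", "type": "array", "required": True},
    {"name": "eli5", "type": "string", "required": True},
]
extracted = {
    "category": "update",
    "eli5": "This email provides updates and discusses various topics that require attention.",
    "mentions": ["John Doe", "Jane Smith"],
}
print(validate_properties(extracted, schema))
```

In the notebook, the dictionary to check would be `extracted_document[0].metadata["extracted_properties"]`.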


- Interrogating documents
- This transformation converts plain narrative text into question-and-answer (QA) pairs. Because user queries are usually phrased as questions, indexing the QA form increases the likelihood of retrieving the relevant document when it is supplied to a large model as reference context.
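To make the retrieval benefit concrete, here is a minimal sketch of how QA pairs might be flattened into short texts and matched against a query. It assumes the transformer stores its result under a `questions_and_answers` metadata key as a list of question/answer dictionaries. `qa_pairs_to_texts` and `best_match` are hypothetical helpers, and the word-overlap scoring is a toy stand-in for a real vector store.

```python
import re
from typing import Optional


def qa_pairs_to_texts(metadata: dict) -> list[str]:
    """Turn each question/answer pair into one short text suitable for indexing."""
    pairs = metadata.get("questions_and_answers", [])
    return [f"Q: {pair['question']}\nA: {pair['answer']}" for pair in pairs]


def words(text: str) -> set[str]:
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def best_match(query: str, texts: list[str]) -> Optional[str]:
    """Pick the text sharing the most words with the query (toy keyword retriever)."""
    scored = [(len(words(query) & words(text)), text) for text in texts]
    score, text = max(scored, default=(0, None))
    return text if score > 0 else None


# Sample metadata in the assumed shape.
metadata = {
    "questions_and_answers": [
        {"question": "Who has been commended for enhancing network security?",
         "answer": "John Doe"},
        {"question": "Which department is John Doe from?",
         "answer": "IT department"},
    ]
}
texts = qa_pairs_to_texts(metadata)
print(best_match("Who improved network security?", texts))
```

Each indexed text begins with a question, so a question-shaped query shares more vocabulary with it than with the original narrative paragraph.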

In [11]:
import json

from langchain_community.document_transformers import DoctranQATransformer
from langchain_core.documents import Document

In [12]:
# Wrap the loaded text in a Document
documents = [Document(page_content=content)]
# Initialize the Q&A transformer
qa_transformer = DoctranQATransformer(
    openai_api_model=OPENAI_API_MODEL,
)
transformed_document = qa_transformer.transform_documents(documents)

In [13]:
# Reuse the result from the previous cell rather than calling the API a second time
print(json.dumps(transformed_document[0].metadata, indent=2))

{
  "questions_and_answers": [
    {
      "question": "What is the main subject of the document?",
      "answer": "Updates and Discussions on Various Topics"
    },
    {
      "question": "Who has been commended for enhancing network security?",
      "answer": "John Doe"
    },
    {
      "question": "Which department is John Doe from?",
      "answer": "IT department"
    },
    {
      "question": "Where should potential security risks be reported?",
      "answer": "To the dedicated team at security@example.com"
    },
    {
      "question": "What position does the newcomer, Jane Smith, hold?",
      "answer": "Customer service"
    },
    {
      "question": "Who is the HR representative to contact for assistance or questions regarding employee benefits?",
      "answer": "Michael Johnson"
    },
    {
      "question": "What has Sarah Thompson achieved in the past month on the company's social media platforms?",
      "answer": "She has successfully increased the company's f

- Translating documents

In [14]:
from langchain_community.document_transformers import DoctranTextTranslator
from langchain_core.documents import Document

In [15]:
# Wrap the loaded text in a Document
documents = [Document(page_content=content)]
# Initialize the text translator
qa_translator = DoctranTextTranslator(
    openai_api_model=OPENAI_API_MODEL,
    language="chinese",
    )

In [16]:
translated_document = qa_translator.transform_documents(documents)
print(translated_document[0].page_content)

保密文件 - 仅供内部使用

日期：2023年7月1日

主题：关于各个主题的更新与讨论

亲爱的团队：

希望这封邮件找到你时一切都好。在本文件中，我想为你提供一些重要的更新，并讨论需要我们注意的不同主题。请将本文所含的信息视为高度保密。

安全与隐私措施
作为我们持续确保客户数据安全和隐私的承诺的一部分，我们在所有系统中实施了强大的措施。我们要表扬来自 IT 部门的 John Doe（电子邮件：john.doe@example.com）对提升我们网络安全的勤勉工作。向前看，我们恳请大家严格遵守我们的数据保护政策和指南。另外，如果你发现任何可能的安全风险或事件，请立即报告给我们的专门团队，电邮地址为 security@example.com。

人力资源更新和员工福利
近期，我们欢迎了几位在各自部门做出重大贡献的新团队成员。我想表彰 Jane Smith（社会保障号：049-45-5928）在客户服务中的卓越表现。Jane 始终得到我们客户的好评。此外，请记住我们的员工福利计划的开放登记期即将到来。如果你有任何问题或需要协助，请联系我们的人事代表，Michael Johnson（电话：418-492-3850，电子邮件：michael.johnson@example.com）。

营销计划和运动
我们的营销团队一直积极地制定新的策略，以提高品牌知名度并推动客户参与。我们要感谢 Sarah Thompson（电话：415-555-1234）在管理我们社交媒体平台上的卓越努力。Sarah 在过去的一个月里成功地将我们的粉丝基础增加了 20%。此外，请在你的日历上标记即将于7月15日举行的产品发布活动。我们鼓励所有团队成员参加并支持我们公司的这个重要里程碑。

研究和开发项目
在我们追求创新的过程中，研究和开发部门一直在各种项目上不懈努力。我要承认 David Rodriguez（电子邮件：david.rodriguez@example.com）在项目领导角色上的出色表现。David 对我们开发尖端技术的贡献是至关重要的。此外，我们想提醒大家在我们每月一次的研发头脑风暴会议上分享他们对可能的新项目的想法和建议，会议定于7月10日。

请对这份文件的信息保密，并确保不会向未经授权的个人分享。如果你对所讨论的主题有任何疑问或疑虑，随时直接与我联系。

感谢你的关注，让我们继