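Headings: use # to mark the heading level, for example:

# Level-1 heading
## Level-2 heading
### Level-3 heading
Bold: wrap text in ** or __, for example:

**This is bold text** or __This is bold text__
Italic: wrap text in * or _, for example:

*This is italic text* or _This is italic text_
Strikethrough: wrap text in ~~, for example:

~~This is strikethrough text~~
Links: use []() to create a link, for example:

[Baidu](http://www.baidu.com)
Images: use ![]() to insert an image, for example:

![image description](image URL)
Unordered lists: use *, - or + followed by a space, for example:

* Item 1
- Item 2
+ Item 3
Ordered lists: use a number and a period followed by a space, for example:

1. Item 1
2. Item 2
3. Item 3
Blockquotes: use > to create a quote, for example:

> This is a quotation
Code: wrap text in single backticks ` ` for inline code, for example `this is a piece of code`. Use ``` ``` to insert a code block.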

In [22]:
import os
# os.environ['HTTP_PROXY'] = "127.0.0.1:10809"
# os.environ['HTTPS_PROXY']="127.0.0.1:10809"
os.environ['HTTP_PROXY'] = "127.0.0.1:33210"
os.environ['HTTPS_PROXY']="127.0.0.1:33210"
os.environ["OPENAI_API_KEY"] = ""

In [23]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name='gpt-3.5-turbo-16k', temperature=0)

## Read keys from the key file
* Input: file_path, the path to the file
* Returns: a list containing the keys

In [24]:
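def read_keys_from_file(file_path):
    key_pool = []
    with open(file_path, 'r') as file:
        for line in file:
            key = line.strip()  # strip the newline and surrounding whitespace
            if key:  # skip blank lines so the pool never holds an empty key
                key_pool.append(key)  # add the key to the pool
    return key_pool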

In [25]:
# Read the keys
file_path = "key.txt"  # replace this with the path to your own txt file
key_pool = read_keys_from_file(file_path)
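
The keys are later rotated across requests. One simple way to do that is a round-robin over the pool, sketched below. `pick_key` is a hypothetical helper that does not appear in the notebook, and the keys are placeholders.

```python
def pick_key(key_pool, index):
    """Return the key for request number `index`, cycling through the whole pool."""
    if not key_pool:
        raise ValueError("key_pool is empty")
    # Taking the modulus by len(key_pool) reaches every key, including the last one.
    return key_pool[index % len(key_pool)]


demo_pool = ["key-a", "key-b", "key-c"]  # placeholder values, not real keys
assignments = [pick_key(demo_pool, i) for i in range(5)]
print(assignments)  # ['key-a', 'key-b', 'key-c', 'key-a', 'key-b']
```

Using `len(key_pool)` as the modulus also works for a pool with a single key.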

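## Function for reading a PDF document
* Parameters: file_path, the file path; n, the number of leading pages to extract
* Returns: a list with the extracted text of each page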

In [26]:
import pdfplumber

def read_pdf(file_path, n):
    inner = []
    with pdfplumber.open(file_path) as pdf:
        # Extract only the first n pages
        for page in pdf.pages[:n]:
            # Some pdfplumber versions return None for a page without text; store "" instead
            inner.append(page.extract_text() or "")
    return inner

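## Splitting by page count
* Parameter: inner -> a list holding the text of each PDF page
* Parameter: group_size -> the size of each group
* Returns: a list of groups

> Splitting logic:
> Take inner = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] with group_size = 4 as an example. The step is group_size - 1 = 3, so adjacent groups share one page.
> Iteration 1: i = 0, the sublist is inner[0:0+4] = [1, 2, 3, 4]
> Iteration 2: i = 3, the sublist is inner[3:3+4] = [4, 5, 6, 7]
> Iteration 3: i = 6, the sublist is inner[6:6+4] = [7, 8, 9, 10]
> Iteration 4: i = 9, the sublist is inner[9:9+4] = [10] (this last group only repeats the final page of the previous group)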

In [27]:
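def group_inner(inner, group_size):
    """
    Split inner into groups of group_size pages; adjacent groups overlap by one page.

    :param inner: list holding the text of each PDF page
    :param group_size: number of pages per group (must be at least 2)
    :return: list of groups, each a list of page texts
    """
    step = group_size - 1  # advance by one less than the group size to create the overlap
    grouped_inner = [inner[i:i + group_size] for i in range(0, len(inner), step)]
    return grouped_inner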
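
To make the splitting logic concrete, here is a minimal standalone sketch of the same overlapping-window idea, run on the ten-element example. `overlapping_groups` mirrors `group_inner`. `drop_redundant_tail` is a hypothetical extra helper, not part of the notebook, that removes a final one-element group, because that element already ends the previous group.

```python
def overlapping_groups(items, group_size):
    """Split items into groups of group_size; neighbouring groups share one element."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so that the step is positive")
    step = group_size - 1
    return [items[i:i + group_size] for i in range(0, len(items), step)]


def drop_redundant_tail(groups):
    """Drop a trailing one-element group; its element already ends the previous group."""
    if len(groups) >= 2 and len(groups[-1]) == 1:
        return groups[:-1]
    return groups


pages = list(range(1, 11))
groups = overlapping_groups(pages, 4)
print(groups)                       # [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10], [10]]
print(drop_redundant_tail(groups))  # [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]]
```

Dropping the redundant tail would save one pair of model requests whenever the page count leaves exactly one page over.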


In [28]:
content = read_pdf(r'KT820说明书V2.0 201909.pdf', 15)

In [29]:
inner = group_inner(content, 5)

In [30]:
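from langchain.prompts import ChatPromptTemplate
prompts = """
    ----
    {text}
    ----
    Summarize the content above into 2-4 table-of-contents entries. Do not include sub-entries; each entry should be a concise summary. Do not give me anything else.
    """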

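prompts2 = """
    abstract:
    ----
    {abstract}
    ----
    text:
    ----
    {text}
    ----

    Based on the abstracts above, find in the text the content at the position where each abstract's section begins. If a table-of-contents part is involved and contains a long run of dots (...), return only three of them rather than the whole run.
    Return format:
    [{respond_struction}]

"""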

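prompts3 = """
    text1:
    ----
    {text1}
    ----
    text2:
    ----
    {text2}
    ----

        You are given two text strings in dictionary format, named text1 and text2. Each dictionary has a content field, and you need to judge whether the contents of these two fields are similar.
        If the contents are similar, merge the content fields of the two texts and generate a new abstract from the merged content. The returned data should be a dictionary in the following format:
        Return format:
        [{respond_struction}]
        If the contents are not similar, return the original two dictionary-format strings text1 and text2 unchanged.

"""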
promptsTemplate = ChatPromptTemplate.from_template(prompts)
promptsTemplate2 = ChatPromptTemplate.from_template(prompts2)
promptsTemplate3 = ChatPromptTemplate.from_template(prompts3)

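## Specifying the return format
> The return value is formatted as JSON. In the code below it has a single field, result: an array of dictionaries, each with an abstract key and a content key.
### JSON output with langchain
1. Define each schema first: Schema = ResponseSchema(name="", description="")
2. Collect the schemas in a list: response_schemas = [result_schema]
3. Build the structured output parser from the list: StructuredOutputParser(response_schemas = response_schemas)
4. Finally, use the parser to turn the model's reply into JSON-style content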


In [31]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser
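result_schema = ResponseSchema(name='result', description="This part is an array that records the returned content. Each item is a dictionary with two keys. One key is named abstract; its value is the content of the abstract, i.e. it records the abstract. The other key is named content; it holds all of the content related to that abstract. Return n contents for n abstracts.")
# abstract_schema = ResponseSchema(name="abstract", description="The content of the abstract, i.e. it records the abstract")
# content_schema = ResponseSchema(name="content", description="All of the content related to the abstract above; return n contents for n abstracts")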

response_schemas = [result_schema]
out_parse = StructuredOutputParser(response_schemas = response_schemas)
format_instructions = out_parse.get_format_instructions()
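
The format instructions ask the model to answer with a JSON snippet inside a Markdown code fence, and the parser later pulls that JSON back out. The sketch below is a simplified stand-in for that parsing step, using only the standard library. It is not langchain's implementation: `parse_structured_reply` is a hypothetical name, and the sample reply is made up for illustration.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example stays easy to embed


def parse_structured_reply(reply, expected_keys):
    """Extract the first fenced json block from `reply` and check that its keys are present."""
    pattern = FENCE + r"json\s*(.*?)" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    if match is None:
        raise ValueError("no fenced json block found in the reply")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


# A made-up model reply with the same shape as the `result` schema above.
payload = '{"result": [{"abstract": "Specifications", "content": "8-inch TFT display"}]}'
sample_reply = f"Here is the result:\n{FENCE}json\n{payload}\n{FENCE}"

parsed = parse_structured_reply(sample_reply, ["result"])
print(parsed["result"][0]["abstract"])  # Specifications
```

Because a reply without a well-formed block raises an exception, the calling code should be prepared to catch the error and retry that chunk.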

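## Build two chains for locating content
* The first chain summarizes the text content; it uses the GPT-3.5 model
* The second chain locates text content based on the summaries; it also uses the GPT-3.5 model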

In [47]:
from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=promptsTemplate)
# llm2 = ChatOpenAI(model_name='gpt-4-0613', temperature=0.2)
chain2 = LLMChain(llm=llm, prompt=promptsTemplate2)
chain3 = LLMChain(llm=llm, prompt=promptsTemplate3)

['功 能 描 述 规 格 指 标\n液晶显示 8英寸，TFT真彩显示\n位置，程序，刀补，\n显示\n报警，诊断，参数， 显示内容丰富，直观\n设置，U盘,图形\n程序导入导出 有\n第\nU盘功能 参数导入导出 有 一\n篇\n系统U盘升级 有\n编\n输入口 54路开关量，光电隔离输入 程\n篇\n输出口 48路开关量输出（OC输出）\n变频器模拟量控制或 S1～S4 档位控制；主轴\n主轴功能\n模拟量输出倍率可调0～150%；\nM，S，T机能\n刀位号：T01～T08，刀补号：01～24；电动刀\n刀具功能 架，排刀刀架或专用刀架；运行中修整刀补值；\n程序控制动态刀补补偿。\n辅助T功能 有，特定T代码执行特定子程序\n辅助M功能 有，特定M代码执行特定子程序\n快捷MDI方式 在位置界面下直接输入要执行的程序段\nMDI方式\n传统MDI方式 进入MDI输入界面，按字段输入\n补偿机能 补偿功能 刀具补偿、反向间隙补偿、丝杠螺距误差补偿\nG93 外圆车削循环\nG94 端面车削循环（平面，锥面）\n螺纹循环（直、锥螺纹，公、英制，单头、多\nG92\n头螺纹、任意螺纹切入角）\nG86，G87 螺纹复合循环\nG70,G71,G72,G73 复合循环\nG74 端面钻孔循环\nG75 切槽或割断循环\nG76 多重螺纹切削循环\nG33 刚性攻丝循环\nG32 单刀螺纹功能\n其他螺纹功能\nG34 变螺距螺纹功能\n倒角功能 G01 L/R 直线或圆弧倒角\n9', '功 能 描 述 规 格 指 标\n信号跳转机能 G31 进给运行中遇信号跳转\n程序段自动速度过渡功能，过渡曲线自动动态\n段平滑过渡 G61，G64\n调整\n无限、有限循环 程序或部分程序段进行无限次循环加工或有\nM92\n功能 限次循环加工\n第\n一 程序条件 根据外部条件信号，跳转到程序的不同指令流\n篇 M91\n跳转机能 程执行。\n编\n程 扩展输出口电平输出方式或脉冲输出方式控\n扩展输出口控制 M20,M21,M22\n篇 制\n外部条件\nM01 等待外部有效信号输入，超时报警\n等待机能\n输出自动重复 适用于自动上下料的功能，检测上料状态，重\nM35\n控制功能 复连续上料\n旋转轴控制\nM26,M27,M28 进行旋转速度和方向设定\n（Y轴）\n

In [74]:
# import random
# import re
# import json
#
# abstract=[]
# result = []
# unprocess_list = []
#
# for text in inner:
#     try:
#         os.environ["OPENAI_API_KEY"] = key_pool[random.randint(0,len(key_pool)-1)]
#         print(os.environ.get("OPENAI_API_KEY"))
#         response=chain.run(text = text)
#         print(response)
#         os.environ["OPENAI_API_KEY"] = key_pool[random.randint(0,len(key_pool)-1)]
#         res = chain2.run(text=text,abstract=response,respond_struction=format_instructions)
#         print(res)
#         unprocess_list.append(out_parse.parse(res))
#     except Exception as e:
#         print("An error occurred:", e)

import os
import random
import concurrent.futures
import time
unprocess_list = [None] * len(inner)
print(len(unprocess_list))

# Process one chunk of text
def process_text(index, text):
    time.sleep(random.uniform(10, 30))  # random delay so the threads do not all fire at once
    try:
        # Rotate through the whole pool; the earlier modulus len(key_pool) - 1 never used
        # the last key and divided by zero for a single-key pool
        key_index = index % len(key_pool)
        # Note: depending on the langchain version, ChatOpenAI may read the key when llm is
        # created, in which case changing the environment variable here has no effect
        os.environ["OPENAI_API_KEY"] = key_pool[key_index]
        print(key_pool[key_index][:3] + "...")  # show only the prefix, never the full key
        response = chain.run(text=text)
        print(response)
        time.sleep(5)
        time.sleep(random.uniform(1 * index, 3 * index))  # extra random delay that grows with the index
        res = chain2.run(text=text, abstract=response, respond_struction=format_instructions)
        print(res)
        return index, out_parse.parse(res)
    except Exception as e:
        print("An error occurred:", e)
        return index, None

# Create a ThreadPoolExecutor thread pool
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Submit the tasks to the pool and keep the Future objects used to fetch the results
    futures = [executor.submit(process_text, index, text) for index, text in enumerate(inner)]
    for future in concurrent.futures.as_completed(futures):
        index, result = future.result()
        if result is not None:
            # Store each result at its original index so the order is preserved
            unprocess_list[index] = result

# Collect the indices of unprocess_list that are still None
none_index = [i for i, item in enumerate(unprocess_list) if item is None]
print(none_index)

4
sk-...
sk-...


Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per day. Limit: 200 / day. Please try again in 7m12s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per day. Limit: 200 / day. Please try again in 7m12s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate

sk-...


Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. 

sk-...


Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. 

An error occurred: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.


Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..


An error occurred: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.


Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per day. Limit: 200 / day. Please try again in 7m12s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 16.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate li

An error occurred: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.


Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 16.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit.

An error occurred: Rate limit reached for default-gpt-3.5-turbo-16k in organization org-DNaLFn24WKfrrYhbUpFtdQIU on requests per day. Limit: 200 / day. Please try again in 7m12s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.
[0, 1, 2, 3]
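
The log above reports a limit of 3 requests per minute, and every chunk ended up in `none_index`. One possible follow-up is to retry the failed slots one at a time with a fixed pause between calls. The sketch below shows that idea under stated assumptions: `retry_failed` and `flaky_worker` are hypothetical names, the worker stands in for the two chain calls, and the 20-second default is simply 60 s divided by 3.

```python
import time


def retry_failed(results, worker, min_interval=20.0, max_rounds=3, sleep=time.sleep):
    """Sequentially re-run worker(index) for every slot of `results` that is still None.

    Returns the indices that are still None after max_rounds passes.
    """
    for _ in range(max_rounds):
        pending = [i for i, item in enumerate(results) if item is None]
        if not pending:
            break
        for i in pending:
            try:
                results[i] = worker(i)
            except Exception as exc:  # the slot stays None and is retried next round
                print(f"chunk {i} failed again: {exc}")
            sleep(min_interval)  # pause between calls to stay under the per-minute limit
    return [i for i, item in enumerate(results) if item is None]


# Demo: a fake worker that fails on its first call per index, and a no-op sleep
# so that the example finishes instantly.
attempts = {}


def flaky_worker(i):
    attempts[i] = attempts.get(i, 0) + 1
    if attempts[i] == 1:
        raise RuntimeError("simulated rate limit")
    return f"result-{i}"


slots = [None, "done", None]
still_missing = retry_failed(slots, flaky_worker, sleep=lambda seconds: None)
print(slots, still_missing)  # ['result-0', 'done', 'result-2'] []
```

Running the retries sequentially gives up parallelism, but at 3 requests per minute the pause, not the model latency, dominates the run time anyway.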


Third chain: used for deduplication, removing the content that overlaps between adjacent chunks

In [62]:
# for i in range(len(unprocess_list)-1):
#     data1 = unprocess_list[i]['result']
#     data2 = unprocess_list[i+1]['result']
#     need_process_data = []
#     need_process_data.append(data1[len(data1)-1])
#     need_process_data.append(data2[0])
#     os.environ["OPENAI_API_KEY"] = key_pool[random.randint(0,len(key_pool)-1)]
#     respone = chain3.run(text1=need_process_data[0],text2=need_process_data[1],respond_struction=format_instructions)
#     print(respone)


[0, 1, 2, 3]
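
The commented-out loop above compares the last entry of each chunk's result with the first entry of the next chunk's result, since the one-page overlap means those two entries may describe the same section. The sketch below covers only that pairing step and skips chunks whose request failed. `boundary_pairs` is a hypothetical helper and the demo data is made up; the model call through `chain3` is deliberately left out.

```python
def boundary_pairs(chunk_results):
    """Yield (i, last entry of chunk i, first entry of chunk i + 1) for neighbouring chunks."""
    for i in range(len(chunk_results) - 1):
        left = (chunk_results[i] or {}).get("result") or []
        right = (chunk_results[i + 1] or {}).get("result") or []
        if left and right:  # skip pairs where either chunk failed or returned nothing
            yield i, left[-1], right[0]


demo_results = [
    {"result": [{"abstract": "A", "content": "a"}, {"abstract": "B", "content": "b1"}]},
    None,  # a chunk whose request failed
    {"result": [{"abstract": "B", "content": "b2"}]},
    {"result": [{"abstract": "C", "content": "c"}]},
]

pairs = list(boundary_pairs(demo_results))
print(pairs)
```

Each yielded pair would then be passed to the third chain as `text1` and `text2`.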
