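Suppose we need to build a data analysis assistant whose main job is to take the flowers we sell each day, tally cost and revenue, and finally save both figures to a database. The walkthrough below exercises the same flow on a COVID-19 dataset (data/covid_worldwide.csv), with a function that records the total number of recovered cases.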

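Assistant	An entity that answers using a specified model. If you think of the assistant as a person, it is one specific person with a particular set of skills.

Thread	The conversational context shared with the assistant. Much like a chat with an online store's customer service, the whole conversation can be treated as one Thread.

Run	One invocation of the assistant on a Thread: the whole process of producing a response, including the status changes along the way. A run can contain not only model replies but also function calls, code interpreter calls, file retrieval, and so on.

Run Step	The details of each step within a Run. They show the assistant's full execution, which mainly helps with troubleshooting and with tuning the assistant.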
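
One way to keep the four objects apart is to model how they reference each other. The sketch below is purely illustrative: the `...Sketch` class names and their fields are assumptions made for this example and are not the SDK's own types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssistantSketch:
    """A configured persona: a model plus instructions and tools."""
    id: str
    model: str


@dataclass
class ThreadSketch:
    """A conversation; it holds messages and knows nothing about assistants."""
    id: str
    messages: List[str] = field(default_factory=list)


@dataclass
class RunStepSketch:
    """One step taken during a run, e.g. a tool call or a message creation."""
    kind: str


@dataclass
class RunSketch:
    """One execution of an assistant against a thread."""
    id: str
    assistant_id: str
    thread_id: str
    status: str = "queued"
    steps: List[RunStepSketch] = field(default_factory=list)


demo_assistant = AssistantSketch(id="asst_demo", model="gpt-3.5-turbo-1106")
demo_thread = ThreadSketch(id="thread_demo", messages=["hello"])
# The run is the only object that references both the assistant and the thread.
demo_run = RunSketch(
    id="run_demo", assistant_id=demo_assistant.id, thread_id=demo_thread.id
)
demo_run.steps.append(RunStepSketch(kind="message_creation"))
print(demo_run)
```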


In [60]:
import openai
from openai import OpenAI
import os
openai.api_key = os.getenv("OPENAI_API_KEY")

from pathlib import Path


# Preparing the data file


In [61]:
client = OpenAI()

# Upload the file; the with-block closes the local file handle afterwards
with open("data/covid_worldwide.csv", "rb") as f:
    file = client.files.create(
        file=f,
        purpose="assistants"
    )


In [62]:
file

FileObject(id='file-3tcLztUj39qmbRWW1Dbijn88', bytes=15552, created_at=1700473444, filename='covid_worldwide.csv', object='file', purpose='assistants', status='processed', status_details=None)

In [63]:
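# Define the function the assistant can call to save its result
def get_total_revovered(totalRecovered):
    '''Save the total number of recovered cases (this demo only prints it).'''
    print(totalRecovered)
    return "success"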

function = {
        "type": "function",
        "function": {
            "name": "get_total_revovered",
            "description": "保存恢复人数",  # "Save the number of recovered cases"
            "parameters": {
                "type": "object",
                "properties": {
                    "totalRecovered": {
                        "type": "number",
                        "description": "总恢复人数",  # "Total number of recovered cases"
                    }
                },
                "required": ["totalRecovered"],
            },
        }
    }
available_functions = { "get_total_revovered": get_total_revovered}  
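
Because the schema is written by hand, it can drift from the Python function it describes. The following is a minimal sketch of a consistency check, assuming a plain function without `*args` or `**kwargs`; `save_total`, `schema`, and `schema_matches` are hypothetical names introduced for this example, not part of the notebook or the SDK.

```python
import inspect


def save_total(totalRecovered):
    """Hypothetical stand-in for the notebook's callback."""
    return "success"


schema = {
    "type": "function",
    "function": {
        "name": "save_total",
        "parameters": {
            "type": "object",
            "properties": {"totalRecovered": {"type": "number"}},
            "required": ["totalRecovered"],
        },
    },
}


def schema_matches(func, tool):
    """Return True when the schema's name and parameters line up with func."""
    spec = tool["function"]
    params = inspect.signature(func).parameters
    declared = set(spec["parameters"]["properties"])
    required = set(spec["parameters"].get("required", []))
    # Parameters without a default must be listed as required in the schema.
    needs_value = {
        name for name, p in params.items() if p.default is inspect.Parameter.empty
    }
    return (
        spec["name"] == func.__name__
        and declared == set(params)
        and needs_value <= required
    )


print(schema_matches(save_total, schema))
```

Running a check like this before creating the assistant catches a misspelled parameter name early, rather than at the moment the model tries to call the function.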


# Creating the assistant


In [65]:
assistant = client.beta.assistants.create(
    name="Data analyst",
    # description="Tally cost and revenue by units sold per product, then compute total profit",
    instructions="You are a professional data analyst with years of experience. Answer the question as truthfully as possible based on the information provided.",
    model="gpt-3.5-turbo-1106",
    tools=[{"type": "code_interpreter"}, {"type": "retrieval"}, function],
    file_ids = [file.id]
)

In [66]:
assistant

Assistant(id='asst_9LR3yFBZUCyQr3yGZ5wigthE', created_at=1700473594, description=None, file_ids=['file-3tcLztUj39qmbRWW1Dbijn88'], instructions='You are a professional data analyst with years of experience. Answer the question as truthfully as possible based on the information provided.', metadata={}, model='gpt-3.5-turbo-1106', name='Data analyst', object='assistant', tools=[ToolCodeInterpreter(type='code_interpreter'), ToolRetrieval(type='retrieval'), ToolFunction(function=FunctionDefinition(name='get_total_revovered', parameters={'type': 'object', 'properties': {'totalRecovered': {'type': 'number', 'description': '总恢复人数'}}, 'required': ['totalRecovered']}, description='保存恢复人数'), type='function')])

# Creating the Thread

In [67]:
thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            # "What share of cases has recovered in each country?"
            "content": "各个国家恢复人数占比是多少？"
        }
    ]
)

thread

Thread(id='thread_5AATsGVU0T7rpmb2lONsuKHm', created_at=1700473668, metadata={}, object='thread')

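Notice that the Thread is not associated with an Assistant. The call does create the thread on OpenAI's side, as the server-issued id and created_at in the response show. Threads are simply independent of assistants, and the two are only brought together when a Run is created.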

# Creating a Run

In [68]:
run = client.beta.threads.runs.create(
    thread_id = thread.id,
    assistant_id = assistant.id
)
run

Run(id='run_3hoGRwWWwVf8IE7mIgjxWxQX', assistant_id='asst_9LR3yFBZUCyQr3yGZ5wigthE', cancelled_at=None, completed_at=None, created_at=1700473673, expires_at=1700474273, failed_at=None, file_ids=['file-3tcLztUj39qmbRWW1Dbijn88'], instructions='You are a professional data analyst with years of experience. Answer the question as truthfully as possible based on the information provided.', last_error=None, metadata={}, model='gpt-3.5-turbo-1106', object='thread.run', required_action=None, started_at=None, status='queued', thread_id='thread_5AATsGVU0T7rpmb2lONsuKHm', tools=[ToolAssistantToolsCode(type='code_interpreter'), ToolAssistantToolsRetrieval(type='retrieval'), ToolAssistantToolsFunction(function=FunctionDefinition(name='get_total_revovered', parameters={'type': 'object', 'properties': {'totalRecovered': {'type': 'number', 'description': '总恢复人数'}}, 'required': ['totalRecovered']}, description='保存恢复人数'), type='function')])

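Creating the Run only places it in the queue; OpenAI then executes it asynchronously. To see how it is progressing, call the retrieve method, which returns the Run's current state. If you print run at this point, you will probably see something like the following.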

In [69]:
# Use retrieve to get the run's current status
run = client.beta.threads.runs.retrieve(
    thread_id = thread.id,
    run_id = run.id
)

run

Run(id='run_3hoGRwWWwVf8IE7mIgjxWxQX', assistant_id='asst_9LR3yFBZUCyQr3yGZ5wigthE', cancelled_at=None, completed_at=None, created_at=1700473673, expires_at=1700474273, failed_at=None, file_ids=['file-3tcLztUj39qmbRWW1Dbijn88'], instructions='You are a professional data analyst with years of experience. Answer the question as truthfully as possible based on the information provided.', last_error=None, metadata={}, model='gpt-3.5-turbo-1106', object='thread.run', required_action=None, started_at=1700473673, status='in_progress', thread_id='thread_5AATsGVU0T7rpmb2lONsuKHm', tools=[ToolAssistantToolsCode(type='code_interpreter'), ToolAssistantToolsRetrieval(type='retrieval'), ToolAssistantToolsFunction(function=FunctionDefinition(name='get_total_revovered', parameters={'type': 'object', 'properties': {'totalRecovered': {'type': 'number', 'description': '总恢复人数'}}, 'required': ['totalRecovered']}, description='保存恢复人数'), type='function')])
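
A single retrieve call only returns a snapshot, so in practice the call is usually repeated until the status stops changing. Below is a minimal polling sketch; `wait_for_run`, `STOP_STATUSES`, and the `interval`/`timeout` defaults are assumptions for illustration, and the demonstration uses a fake `retrieve` callable so that it runs without network access.

```python
import time
from types import SimpleNamespace

# Statuses after which the run will not change without our intervention.
STOP_STATUSES = {"completed", "requires_action", "expired", "cancelled", "failed"}


def wait_for_run(retrieve, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call retrieve() until the run leaves queued/in_progress or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        run = retrieve()
        if run.status in STOP_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run.status!r} after {timeout}s")
        sleep(interval)


# Offline demonstration with a fake retrieve() that walks through three states.
_states = iter(["queued", "in_progress", "completed"])
final = wait_for_run(
    lambda: SimpleNamespace(status=next(_states)), sleep=lambda seconds: None
)
print(final.status)
```

With the real client, the callable would wrap the call shown above, for example `wait_for_run(lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id))`.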

In [70]:
list(run)

[('id', 'run_3hoGRwWWwVf8IE7mIgjxWxQX'),
 ('assistant_id', 'asst_9LR3yFBZUCyQr3yGZ5wigthE'),
 ('cancelled_at', None),
 ('completed_at', None),
 ('created_at', 1700473673),
 ('expires_at', 1700474273),
 ('failed_at', None),
 ('file_ids', ['file-3tcLztUj39qmbRWW1Dbijn88']),
 ('instructions',
  'You are a professional data analyst with years of experience. Answer the question as truthfully as possible based on the information provided.'),
 ('last_error', None),
 ('metadata', {}),
 ('model', 'gpt-3.5-turbo-1106'),
 ('object', 'thread.run'),
 ('required_action', None),
 ('started_at', 1700473673),
 ('status', 'in_progress'),
 ('thread_id', 'thread_5AATsGVU0T7rpmb2lONsuKHm'),
 ('tools',
  [ToolAssistantToolsCode(type='code_interpreter'),
   ToolAssistantToolsRetrieval(type='retrieval'),
   ToolAssistantToolsFunction(function=FunctionDefinition(name='get_total_revovered', parameters={'type': 'object', 'properties': {'totalRecovered': {'type': 'number', 'description': '总恢复人数'}}, 'required': ['

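- queued	A Run enters this status when it is first created, or after you complete a required_action by submitting the tool outputs. Normally it moves to in_progress almost immediately.
- in_progress	The run is executing. You can inspect its Run Steps to see what it is doing.
- completed	The run has finished. You can now read the messages the Assistant added to the Thread, and you can also continue asking the Assistant questions.
- requires_action	The Assistant needs a function call. The run waits in this status until you call the named function with the given arguments and submit the outputs; only then can it continue.
- expired	The function call outputs were not submitted before expires_at, so the run expired. A run that takes too long to execute and passes expires_at also becomes expired.
- cancelling	After you call client.beta.threads.runs.cancel(run_id=run.id, thread_id=thread.id), the run moves to cancelling; once the cancellation succeeds it becomes cancelled.
- cancelled	The Run was cancelled successfully.
- failed	The run failed. Inspect the last_error object on the Run to see why.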

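**Pay particular attention to the requires_action status: it means your own code must execute one or more functions locally and return the results to the Assistant. Only then can the run continue.**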
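
The statuses can be condensed into a small dispatch function that tells the calling code what to do next. This is a sketch only: `next_action` and the action labels it returns are hypothetical names chosen for this example.

```python
def next_action(status):
    """Map a run status to what the calling code should do next."""
    if status in ("queued", "in_progress", "cancelling"):
        return "poll"            # nothing to do yet; retrieve again later
    if status == "requires_action":
        return "submit_tools"    # run local functions, then submit their outputs
    if status == "completed":
        return "read_messages"   # the assistant's reply is in the thread
    if status in ("failed", "expired", "cancelled"):
        return "stop"            # inspect last_error or start a new run
    raise ValueError(f"unknown run status: {status!r}")


for s in ("queued", "requires_action", "completed", "failed"):
    print(s, "->", next_action(s))
```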


In [71]:
# Handle the function calls requested by the run
import json

if run.status == 'requires_action':
    tool_outputs = []
    # Call every requested function and collect the results
    for call in run.required_action.submit_tool_outputs.tool_calls:
        if call.type != "function":
            continue
        print(call)
        arguments = json.loads(call.function.arguments)
        print(arguments)
        # Look up the real Python function by name; using `func` avoids
        # overwriting the `function` schema dict defined earlier
        func = available_functions[call.function.name]
        output = {
            "tool_call_id": call.id,
            "output": func(**arguments),
        }
        tool_outputs.append(output)

    # Send the results back so the run can continue
    run = client.beta.threads.runs.submit_tool_outputs(
        thread_id = thread.id,
        run_id = run.id,
        tool_outputs = tool_outputs
    )
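
The dispatch logic can be tried offline by feeding it objects shaped like the entries of `run.required_action.submit_tool_outputs.tool_calls`. The sketch below is a defensive variant under stated assumptions: `record_total`, `registry`, `build_tool_outputs`, and the fake calls are all hypothetical, and reporting errors back as output strings is one possible design choice, not something the API prescribes.

```python
import json
from types import SimpleNamespace


def record_total(totalRecovered):
    """Hypothetical local function the assistant is allowed to call."""
    return f"saved {totalRecovered}"


registry = {"record_total": record_total}


def build_tool_outputs(tool_calls, registry):
    """Run each requested function and collect outputs keyed by tool_call_id."""
    outputs = []
    for call in tool_calls:
        if call.type != "function":
            continue
        func = registry.get(call.function.name)
        if func is None:
            # Report the problem to the model instead of raising KeyError.
            result = f"error: unknown function {call.function.name}"
        else:
            try:
                result = func(**json.loads(call.function.arguments))
            except (TypeError, ValueError) as exc:
                # Malformed JSON or arguments that do not fit the signature.
                result = f"error: {exc}"
        # Convert to str so every output has a uniform type.
        outputs.append({"tool_call_id": call.id, "output": str(result)})
    return outputs


# Fake tool calls standing in for what a real run would request.
fake_calls = [
    SimpleNamespace(
        id="call_1",
        type="function",
        function=SimpleNamespace(
            name="record_total", arguments='{"totalRecovered": 42}'
        ),
    ),
    SimpleNamespace(
        id="call_2",
        type="function",
        function=SimpleNamespace(name="missing", arguments="{}"),
    ),
]
print(build_tool_outputs(fake_calls, registry))
```

Returning an error string for an unknown function gives every tool call an output, so the run is not left waiting on a call that raised locally.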
        

In [72]:
list(run)

[('id', 'run_3hoGRwWWwVf8IE7mIgjxWxQX'),
 ('assistant_id', 'asst_9LR3yFBZUCyQr3yGZ5wigthE'),
 ('cancelled_at', None),
 ('completed_at', None),
 ('created_at', 1700473673),
 ('expires_at', 1700474273),
 ('failed_at', None),
 ('file_ids', ['file-3tcLztUj39qmbRWW1Dbijn88']),
 ('instructions',
  'You are a professional data analyst with years of experience. Answer the question as truthfully as possible based on the information provided.'),
 ('last_error', None),
 ('metadata', {}),
 ('model', 'gpt-3.5-turbo-1106'),
 ('object', 'thread.run'),
 ('required_action', None),
 ('started_at', 1700473673),
 ('status', 'in_progress'),
 ('thread_id', 'thread_5AATsGVU0T7rpmb2lONsuKHm'),
 ('tools',
  [ToolAssistantToolsCode(type='code_interpreter'),
   ToolAssistantToolsRetrieval(type='retrieval'),
   ToolAssistantToolsFunction(function=FunctionDefinition(name='get_total_revovered', parameters={'type': 'object', 'properties': {'totalRecovered': {'type': 'number', 'description': '总恢复人数'}}, 'required': ['

In [74]:
# Fetch the Assistant's messages
run = client.beta.threads.runs.retrieve(
    thread_id = thread.id,
    run_id = run.id
)
if run.status == 'completed':
    messages = client.beta.threads.messages.list(
        thread_id = thread.id
    )
    print(messages)
elif run.status == 'failed':
    # The failure reason is stored on last_error; Run has no `error` attribute
    print(run.last_error)

    

SyncCursorPage[ThreadMessage](data=[ThreadMessage(id='msg_OPOUTtYJM9UHIRimx2n9BwSN', assistant_id='asst_9LR3yFBZUCyQr3yGZ5wigthE', content=[MessageContentText(text=Text(annotations=[], value='Here are the recovery rates as a percentage for the selected countries:\n\n- USA: 97.24%\n- India: 98.81%\n- France: 99.34%\n- Germany: 98.99%\n- Brazil: 97.54%\n\nThese numbers represent the proportion of recovered cases out of the total reported cases for each country. If you need the recovery rates for more countries, please let me know!'), type='text')], created_at=1700473692, file_ids=[], metadata={}, object='thread.message', role='assistant', run_id='run_3hoGRwWWwVf8IE7mIgjxWxQX', thread_id='thread_5AATsGVU0T7rpmb2lONsuKHm'), ThreadMessage(id='msg_zw85BC2T6NLPrK3B6KsQbRrK', assistant_id='asst_9LR3yFBZUCyQr3yGZ5wigthE', content=[MessageContentText(text=Text(annotations=[], value="The data contains information about the total cases, deaths, recoveries, active cases, total tests, and population

In [80]:
messages.data

[ThreadMessage(id='msg_OPOUTtYJM9UHIRimx2n9BwSN', assistant_id='asst_9LR3yFBZUCyQr3yGZ5wigthE', content=[MessageContentText(text=Text(annotations=[], value='Here are the recovery rates as a percentage for the selected countries:\n\n- USA: 97.24%\n- India: 98.81%\n- France: 99.34%\n- Germany: 98.99%\n- Brazil: 97.54%\n\nThese numbers represent the proportion of recovered cases out of the total reported cases for each country. If you need the recovery rates for more countries, please let me know!'), type='text')], created_at=1700473692, file_ids=[], metadata={}, object='thread.message', role='assistant', run_id='run_3hoGRwWWwVf8IE7mIgjxWxQX', thread_id='thread_5AATsGVU0T7rpmb2lONsuKHm'),
 ThreadMessage(id='msg_zw85BC2T6NLPrK3B6KsQbRrK', assistant_id='asst_9LR3yFBZUCyQr3yGZ5wigthE', content=[MessageContentText(text=Text(annotations=[], value="The data contains information about the total cases, deaths, recoveries, active cases, total tests, and population for different countries. To calcu
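
The raw message objects are hard to read, so it helps to pull out just the role and text. The sketch below assumes messages shaped like the output above (a `role`, plus a `content` list whose text items expose `.text.value`) and a listing that is newest first, as in this notebook; `message_texts` and the fake messages are hypothetical names for illustration.

```python
from types import SimpleNamespace


def message_texts(messages):
    """Return (role, text) pairs, oldest first, from a list of thread messages."""
    pairs = []
    # The listing above shows the newest message first, so reverse it.
    for msg in reversed(list(messages)):
        parts = [c.text.value for c in msg.content if c.type == "text"]
        pairs.append((msg.role, "\n".join(parts)))
    return pairs


def _text(value):
    """Build a fake text content item with the same attribute layout."""
    return SimpleNamespace(type="text", text=SimpleNamespace(value=value))


fake_messages = [
    SimpleNamespace(role="assistant", content=[_text("Here are the recovery rates")]),
    SimpleNamespace(role="user", content=[_text("What share recovered?")]),
]
for role, text in message_texts(fake_messages):
    print(f"{role}: {text}")
```

With the real response, the call would be `message_texts(messages.data)`.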

In [82]:
list(run)

[('id', 'run_3hoGRwWWwVf8IE7mIgjxWxQX'),
 ('assistant_id', 'asst_9LR3yFBZUCyQr3yGZ5wigthE'),
 ('cancelled_at', None),
 ('completed_at', 1700473693),
 ('created_at', 1700473673),
 ('expires_at', None),
 ('failed_at', None),
 ('file_ids', ['file-3tcLztUj39qmbRWW1Dbijn88']),
 ('instructions',
  'You are a professional data analyst with years of experience. Answer the question as truthfully as possible based on the information provided.'),
 ('last_error', None),
 ('metadata', {}),
 ('model', 'gpt-3.5-turbo-1106'),
 ('object', 'thread.run'),
 ('required_action', None),
 ('started_at', 1700473673),
 ('status', 'completed'),
 ('thread_id', 'thread_5AATsGVU0T7rpmb2lONsuKHm'),
 ('tools',
  [ToolAssistantToolsCode(type='code_interpreter'),
   ToolAssistantToolsRetrieval(type='retrieval'),
   ToolAssistantToolsFunction(function=FunctionDefinition(name='get_total_revovered', parameters={'type': 'object', 'properties': {'totalRecovered': {'type': 'number', 'description': '总恢复人数'}}, 'required': ['to

# Sending a new message

In [None]:
# Add a new message
message = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="另外还有2支向日葵，补充下这份账单"  # "There are also 2 sunflowers; add them to the bill"
)
# Create a run
run = client.beta.threads.runs.create(
  thread_id=thread.id,
  assistant_id=assistant.id
)
# Retrieve the run to check its current status
run = client.beta.threads.runs.retrieve(
  thread_id=thread.id,
  run_id=run.id
)
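
A retrieve issued immediately after creating the run will usually still report queued or in_progress, so a follow-up question is more convenient as one helper that posts the message, starts the run, and polls. This is a sketch under assumptions: `ask`, `PENDING`, and the `max_polls` default are hypothetical, and the fake client classes exist only so the example runs offline. The three client methods it calls are the same ones used in the cell above.

```python
import time
from types import SimpleNamespace

PENDING = ("queued", "in_progress", "cancelling")


def ask(client, thread_id, assistant_id, content,
        interval=1.0, max_polls=60, sleep=time.sleep):
    """Post a user message, start a run, and poll until it stops progressing."""
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=content
    )
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id
    )
    for _ in range(max_polls):
        if run.status not in PENDING:
            break
        sleep(interval)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
    return run  # may still be pending if max_polls was exhausted


# Offline stand-ins (hypothetical) so the helper can be tried without a network.
class FakeMessages:
    def __init__(self):
        self.sent = []

    def create(self, thread_id, role, content):
        self.sent.append((role, content))
        return SimpleNamespace(id="msg_demo")


class FakeRuns:
    def __init__(self):
        self._later = iter(["in_progress", "completed"])

    def create(self, thread_id, assistant_id):
        return SimpleNamespace(id="run_demo", status="queued")

    def retrieve(self, thread_id, run_id):
        return SimpleNamespace(id=run_id, status=next(self._later))


fake_client = SimpleNamespace(
    beta=SimpleNamespace(
        threads=SimpleNamespace(messages=FakeMessages(), runs=FakeRuns())
    )
)
result = ask(fake_client, "thread_demo", "asst_demo", "One more question",
             sleep=lambda seconds: None)
print(result.status)
```

If the returned run is in requires_action, the tool outputs still have to be submitted before the answer becomes available.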
