# <center>Calling and Fine-Tuning OpenAI's Online Large Models

## <center>Ch.17.3 Improving User-Intent Recognition Accuracy with Embeddings

In [111]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tqdm
import time
import sys
import json
import threading
import random

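# Load the OpenAI API key
import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")


# tqdm.auto picks a progress bar that renders correctly in both Jupyter notebooks and the console
from tqdm.auto import tqdm  

from pprint import pprint

# Token counting for embedding inputs
import tiktoken

# Fetch embedding vectors (openai.embeddings_utils, like the openai.Embedding and
# openai.ChatCompletion calls used below, requires the legacy SDK: openai<1.0)
from openai.embeddings_utils import get_embedding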

### I. Complex Intent Recognition for Banking Services

#### 1. Project Background

- Simulated banking business scenarios

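1. **Savings Account Opening and Management**
   - This covers creating and maintaining savings accounts. Customers can open a new savings account at the bank, which usually requires proof of identity, proof of address, and possibly an initial deposit. The bank also updates account information, such as contact details or account type. Customers can check their balance, transaction history, and other account activity. These services may also include setting up and supporting online and mobile banking so that customers can manage their accounts remotely.

2. **Loan Services**
   - This includes applications and consultations for various loans, such as mortgages, car loans, and personal loans. The bank provides detailed product information, including loan amount, interest rate, term, and repayment method, and reviews applications based on the customer's credit score and financial situation. For some loan types, such as mortgages or car loans, the bank may require the corresponding asset as collateral. The bank also offers loan calculators and professional advisors to help customers plan their finances.

3. **Credit Card Services**
   - This covers applying for, activating, and reporting the loss of credit cards, as well as credit-limit management and statement inquiries. Customers can choose among card types such as rewards cards, points cards, or business cards. The bank offers online services to activate new cards and report lost or stolen cards, and issues replacements promptly. Customers can also adjust their credit limit and review monthly statements and spending records. Credit card services further include promotions and reward programs, such as travel rewards and cash back.

4. **Investment and Wealth Management Advisory**
   - This provides advice on stocks, bonds, funds, and other investment products. Banks typically offer personalized financial planning that helps customers build an investment strategy matching their risk tolerance, investment goals, and time horizon. They also offer retirement planning to help customers plan their pension accounts and savings. Investment advisors help customers understand market trends, asset allocation, and potential opportunities.

5. **International Business and Remittances**
   - This covers services related to international financial transactions, including currency exchange, international remittances, and foreign-currency account management. Customers can convert currencies and send money abroad through the bank, which provides real-time exchange rates and remittance guidance. For customers who transact internationally often, the bank offers foreign-currency accounts that hold and manage multiple currencies. The bank also provides enterprise-level trade finance and remittance services that support companies expanding globally.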

- Sample dialogue text

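1. **Savings Account Opening and Management**
   - Customer: "Hello, I'd like to open a savings account. Could you explain the account-opening process and the documents I need? I'd also like to know your interest rate."

2. **Loan Services**
   - Customer: "I'd like to ask about a home loan. I've just found a house I like, and I want to know the requirements for a loan and the approximate annual interest rate. Also, what is the longest loan term available?"

3. **Credit Card Services**
   - Customer: "Hi, yesterday I received the new credit card you mailed me, but I'm not sure how to activate it. Also, could you check my credit limit and see whether it can be raised?"

4. **Investment and Wealth Management Advisory**
   - Customer: "Hello, I've been thinking about investing lately, but I don't know the market well. Can you give me some basic advice on stocks or funds? I mainly want to prepare for retirement."

5. **International Business and Remittances**
   - Customer: "I need to send money to my family abroad. What is your international remittance fee? I'd also like to know how long the transfer takes and what exchange rate applies."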

These dialogues can serve as templates for each business type in the dataset.

#### 2. Dataset Preparation

In [396]:
import pandas as pd

# Load the training set
train_df = pd.read_csv('./data/train_dataset.csv')

# Load the test set
test_df = pd.read_csv('./data/test_dataset.csv')
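Before modeling, it helps to check how the five labels are distributed. A minimal sketch, assuming the `type` column holds the integers 1 to 5 in the order fixed by `type_dict` in section II; `LABEL_NAMES` and `label_distribution` are hypothetical helpers, and the tiny frame copies the first rows of `train_df`.

```python
import pandas as pd

# Label ids of the `type` column, following the mapping used by `type_dict` in section II
LABEL_NAMES = {
    1: "savings_account_management",
    2: "loan_services",
    3: "credit_card_services",
    4: "investment_advisory",
    5: "international_transactions",
}

def label_distribution(df: pd.DataFrame) -> pd.Series:
    """Count samples per business type, indexed by a readable name."""
    counts = df["type"].value_counts().sort_index()
    counts.index = [LABEL_NAMES[i] for i in counts.index]
    return counts

# Tiny illustrative frame copied from the first rows of train_df
sample = pd.DataFrame({
    "Conversation": ["我的储蓄账户是否可以与支付软件直接绑定？",
                     "请问如何设置储蓄账户的自动转账功能？",
                     "请问贵行有提供关于投资组合管理的咨询服务吗？",
                     "请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。",
                     "我想知道，申请房贷需要提供哪些资料？"],
    "type": [1, 1, 4, 3, 2],
})
print(label_distribution(sample))
```

On the real data, `label_distribution(train_df)` and `label_distribution(test_df)` show whether any class is under-represented.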

In [397]:
pd.set_option('display.max_rows', None)

In [398]:
train_df

Unnamed: 0,Conversation,type
0,我的储蓄账户是否可以与支付软件直接绑定？,1
1,请问如何设置储蓄账户的自动转账功能？,1
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3
4,我想知道，申请房贷需要提供哪些资料？,2
5,请问贵行储蓄账户有什么保险保障吗？,1
6,你好，我想了解一下开设储蓄账户的具体要求和流程。,1
7,我想了解贵行信用卡的超额使用费是怎么计算的。,3
8,我需要了解贵行的国际汇款货币兑换政策。,5
9,我是外籍人士，想在中国申请贷款，有什么特别要求吗？,2


In [82]:
test_df

Unnamed: 0,Conversation,type
0,我想了解一下贷款担保的具体要求。,2
1,我需要汇款到澳大利亚，贵行有什么特别的要求吗？,5
2,请问贵行国际汇款有哪些安全保障措施？,5
3,请问贵行储蓄账户是否有年费或者管理费？,1
4,我的储蓄账户需要更新个人信息，我该怎么操作？,1
...,...,...
68,我有一笔汇款需要紧急发送到加拿大，你们可以加急处理吗？,5
69,我想开一个小孩的储蓄账户，需要父母的信息吗？,1
70,我想知道，办理贷款是否需要担保人？,2
71,我是留学生，想了解贵行的留学生汇款服务。,5


#### 3. Creating Embeddings

- Embedding method recap

In [84]:
text = '我的储蓄账户是否可以与支付软件直接绑定？'

In [85]:
res = openai.Embedding.create(
  # Second-generation embedding model
  model="text-embedding-ada-002",
  input=text,
  encoding_format="float"
)
res

<OpenAIObject list at 0x19e430ef950> JSON: {
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.008031217,
        -0.01483722,
        0.014379447,
        -0.03142475,
        -0.050678127,
        ...

In [90]:
res.data[0]["embedding"]

[-0.008031217,
 -0.01483722,
 0.014379447,
 -0.03142475,
 -0.050678127,
 ...]

In [91]:
len(res.data[0]["embedding"])

1536

In [83]:
from openai.embeddings_utils import cosine_similarity, get_embedding
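Cosine similarity is the usual way to compare two embeddings. The sketch below writes out the formula with NumPy under the hypothetical name `cosine_sim` (so it does not shadow the imported function); as far as we know, `openai.embeddings_utils.cosine_similarity` computes the same quantity. OpenAI's documentation states that its embeddings are normalized to length 1, in which case the cosine reduces to a plain dot product. The 3-dimensional vectors are made-up stand-ins for real 1536-dimensional embeddings.

```python
import numpy as np

def cosine_sim(a, b) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors: two "savings" sentences point in a similar direction, a "loan" sentence does not
v_savings = [0.9, 0.1, 0.0]
v_savings2 = [0.8, 0.2, 0.1]
v_loan = [0.0, 0.2, 0.9]
print(cosine_sim(v_savings, v_savings2))  # close to 1
print(cosine_sim(v_savings, v_loan))      # much smaller
```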

In [92]:
get_embedding?

Signature: get_embedding(text: str, engine='text-similarity-davinci-001', **kwargs) -> List[float]
Docstring: <no docstring>
File:      c:\users\admin\anaconda3\envs\openenv\lib\site-packages\openai\embeddings_utils.py
Type:      function

The default `engine` is the older `text-similarity-davinci-001` model, so the next cells pass `text-embedding-ada-002` explicitly.

In [95]:
embedding_model = "text-embedding-ada-002"

In [96]:
get_embedding(text, engine=embedding_model)

[-0.008031217381358147,
 -0.014837220311164856,
 0.01437944732606411,
 -0.03142474964261055,
 -0.05067812651395798,
 ...]

- Embedding the datasets

In [97]:
train_df["embedding"] = train_df.Conversation.apply(lambda x: get_embedding(x, engine=embedding_model))
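`apply` sends one request per sentence. The embeddings endpoint also accepts a list of strings as `input`, so batching reduces the number of requests. A sketch under that assumption: `embed_in_batches` is a hypothetical helper that takes any function mapping a list of texts to a list of vectors; the commented-out wiring to `openai.Embedding.create` (openai<1.0) is illustrative, and the fake embedder exists only so the example runs offline.

```python
from typing import Callable, List, Sequence

def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[List[float]]:
    """Embed `texts` in chunks of `batch_size`, preserving input order."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_vectors = embed_batch(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_batch must return one vector per input text")
        vectors.extend(batch_vectors)
    return vectors

# Hypothetical real wiring (openai<1.0), not executed here:
#   res = openai.Embedding.create(model="text-embedding-ada-002", input=batch)
#   return [d["embedding"] for d in sorted(res["data"], key=lambda d: d["index"])]
def fake_embed_batch(batch):
    # Offline stand-in: a 2-dimensional "embedding" built from the text length
    return [[float(len(t)), 1.0] for t in batch]

vecs = embed_in_batches(["你好", "我想开户", "汇款"], fake_embed_batch, batch_size=2)
print(vecs)  # [[2.0, 1.0], [4.0, 1.0], [2.0, 1.0]]
```

With a real embedder, the result can be attached with `train_df["embedding"] = pd.Series(embed_in_batches(train_df.Conversation.tolist(), my_embed_batch), index=train_df.index)`, where `my_embed_batch` is your own wrapper.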

In [99]:
train_df.head()

Unnamed: 0,Conversation,type,embedding
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,..."
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,..."
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,..."
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,..."
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ..."


In [101]:
train_df["embedding"][0]

[-0.008010320365428925,
 -0.014849383383989334,
 0.014324337244033813,
 -0.03142199665307999,
 -0.05067368969321251,
 ...]

These values differ slightly from the vector returned earlier for the same sentence (for example -0.0080103 vs -0.0080312 in the first dimension), so repeated calls to the embedding endpoint do not return bit-identical vectors, although the differences are small.

In [102]:
len(train_df["embedding"][0])

1536

In [103]:
test_df["embedding"] = test_df.Conversation.apply(lambda x: get_embedding(x, engine=embedding_model))

In [132]:
test_df.head()

Unnamed: 0,Conversation,type,embedding
0,我想了解一下贷款担保的具体要求。,2,"[-0.0121180210262537, -0.01918686553835869, 0...."
1,我需要汇款到澳大利亚，贵行有什么特别的要求吗？,5,"[-0.020629247650504112, -0.005797294899821281,..."
2,请问贵行国际汇款有哪些安全保障措施？,5,"[-0.001715182326734066, -0.017442874610424042,..."
3,请问贵行储蓄账户是否有年费或者管理费？,1,"[0.004440059419721365, -0.016759661957621574, ..."
4,我的储蓄账户需要更新个人信息，我该怎么操作？,1,"[-0.01662181131541729, -0.008774831891059875, ..."


In [131]:
test_df["embedding"][0]

[-0.0121180210262537,
 -0.01918686553835869,
 0.020237093791365623,
 -0.02940639667212963,
 -0.020802602171897888,
 ...]

### II. Intent Recognition with Function Calling

- Define an external function for each business scenario

In [2]:
def handle_savings_account_management():
    res = "用户需要执行储蓄账户开设与管理相关业务"
    return res

In [3]:
handle_savings_account_management_description = "这是一个专门用于执行储蓄账户开设与管理相关业务的函数，\
储蓄账户开设与管理业务涉及到储蓄账户的创建和维护。客户可以在银行开设新的储蓄账户，这通常需要提供个人身份证明、地址证明以及可能的初始存款。银行还提供更新账户信息的服务，如更改联系信息、更改账户类型等。此外，客户还可以查询自己账户的余额、交易记录和其他账户活动。这类服务还可能包括网上银行和移动银行服务的设置和支持，以方便客户远程管理其账户。"

In [4]:
handle_savings_account_management_function = {
    "name": "handle_savings_account_management",
    "description": handle_savings_account_management_description,
    "parameters": {}
}

In [5]:
def handle_loan_services():
    res = "用户需要执行贷款服务相关业务"
    return res

In [6]:
handle_loan_services_description = "这是一个专门用于执行贷款服务相关业务的函数，\
贷款服务包括各种类型的贷款申请和咨询服务，如住房贷款、汽车贷款、个人贷款等。银行提供详细的贷款产品信息，包括贷款金额、利率、还款期限和还款方式等。银行还会根据客户的信用评分和财务状况审核贷款申请。对于不同类型的贷款，如住房贷款或汽车贷款，银行可能需要相应的资产作为抵押。此外，银行还提供贷款计算器和专业顾问来帮助客户计划其财务。"

In [7]:
handle_loan_services_function = {
    "name": "handle_loan_services",
    "description": handle_loan_services_description,
    "parameters": {}
}

In [8]:
def handle_credit_card_services():
    res = "用户需要执行信用卡服务相关业务"
    return res

In [9]:
handle_credit_card_services_description = "这是一个专门用于执行信用卡服务相关业务的函数，\
信用卡服务涉及信用卡的申请、激活、挂失、信用额度管理和账单查询等服务。客户可以根据自己的需要选择不同类型的信用卡，如奖励卡、积分卡或商务卡等。银行提供在线服务来激活新卡、报告丢失或被盗的卡，并及时发行新卡。客户还可以调整信用额度，查询每月的账单和消费记录。此外，信用卡服务还包括各种优惠和奖励计划，如旅行奖励、现金返还等。"

In [10]:
handle_credit_card_services_function = {
    "name": "handle_credit_card_services",
    "description": handle_credit_card_services_description,
    "parameters": {}
}

In [11]:
def handle_investment_advisory():
    res = "用户需要执行投资与理财咨询业务"
    return res

In [12]:
handle_investment_advisory_description = "这是一个专门用于执行投资与理财咨询服务的函数，\
投资与理财咨询服务指的是提供关于股票、债券、基金和其他投资产品的咨询服务。银行通常会提供个性化理财规划，帮助客户根据自己的风险承受能力、投资目标和时间框架制定投资策略。此外，银行还提供退休规划服务，帮助客户规划其退休金账户和储蓄。投资顾问可帮助客户了解市场动态、资产配置以及潜在的投资机会。"

In [13]:
handle_investment_advisory_function = {
    "name": "handle_investment_advisory",
    "description": handle_investment_advisory_description,
    "parameters": {}
}

In [14]:
def handle_international_transactions():
    res = "用户需要执行国际业务与汇款相关业务"
    return res

In [15]:
handle_international_transactions_description = "这是一个专门用于执行国际业务与汇款服务的函数，\
国际业务与汇款服务涵盖了与国际金融交易相关的服务，包括外汇兑换、国际汇款和外币账户管理。客户可以通过银行进行跨国货币转换和汇款，银行提供即时的汇率信息和汇款指导。对于需要频繁进行国际交易的客户，银行提供外币账户服务，允许存储和管理多种货币。此外，银行还提供企业级的国际贸易融资和汇款服务，支持企业在全球范围内的业务扩展。"

In [16]:
handle_international_transactions_function = {
    "name": "handle_international_transactions",
    "description": handle_international_transactions_description,
    "parameters": {}
}
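The five handler/description/spec cells above follow one pattern, so they could also be generated from a single table, which keeps `functions` and the label mapping `type_dict` (defined further down) in sync. A sketch with a hypothetical `build_function_specs` helper and short placeholder descriptions taken from the handlers' return strings. It writes `parameters` as an explicit empty JSON-schema object, which is how OpenAI's function-calling documentation describes a function without arguments; judging by the responses in this notebook, the plain `{}` is accepted as well.

```python
# Hypothetical table: function name -> (label, short placeholder description).
# The notebook itself uses the long Chinese descriptions defined above.
BUSINESS_FUNCTIONS = {
    "handle_savings_account_management": (1, "用户需要执行储蓄账户开设与管理相关业务"),
    "handle_loan_services": (2, "用户需要执行贷款服务相关业务"),
    "handle_credit_card_services": (3, "用户需要执行信用卡服务相关业务"),
    "handle_investment_advisory": (4, "用户需要执行投资与理财咨询业务"),
    "handle_international_transactions": (5, "用户需要执行国际业务与汇款相关业务"),
}

def build_function_specs(table):
    """Return (functions, type_dict) in the shape used by the ChatCompletion call."""
    functions = []
    type_dict = {}
    for name, (label, description) in table.items():
        functions.append({
            "name": name,
            "description": description,
            # Explicit empty JSON-schema object for a function that takes no arguments
            "parameters": {"type": "object", "properties": {}},
        })
        type_dict[name] = label
    return functions, type_dict

# Demo names avoid overwriting the notebook's own `functions` and `type_dict`
functions_demo, type_dict_demo = build_function_specs(BUSINESS_FUNCTIONS)
print([f["name"] for f in functions_demo])
```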

- Collect the function specifications

In [17]:
functions = [handle_savings_account_management_function, 
             handle_loan_services_function, 
             handle_credit_card_services_function, 
             handle_investment_advisory_function, 
             handle_international_transactions_function]

In [144]:
functions

[{'name': 'handle_savings_account_management',
  'description': '这是一个专门用于执行储蓄账户开设与管理相关业务的函数，储蓄账户开设与管理业务涉及到储蓄账户的创建和维护。...',
  'parameters': {}},
 {'name': 'handle_loan_services',
  'description': '这是一个专门用于执行贷款服务相关业务的函数，...',
  'parameters': {}},
 {'name': 'handle_credit_card_services',
  'description': '这是一个专门用于执行信用卡服务相关业务的函数，...',
  'parameters': {}},
 {'name': 'handle_investment_advisory',
  'description': '这是一个专门用于执行投资与理财咨询服务的函数，...',
  'parameters': {}},
 ...]

In [18]:
available_functions = {
    "handle_savings_account_management": handle_savings_account_management,
    "handle_loan_services": handle_loan_services,
    "handle_credit_card_services": handle_credit_card_services,
    "handle_investment_advisory": handle_investment_advisory,
    "handle_international_transactions": handle_international_transactions
}

- Map function names to dataset labels

In [19]:
type_dict = {
    "handle_savings_account_management": 1,
    "handle_loan_services": 2,
    "handle_credit_card_services": 3,
    "handle_investment_advisory": 4,
    "handle_international_transactions": 5
}

- Function calling test

In [307]:
text = '我的储蓄账户是否可以与支付软件直接绑定？'

In [309]:
messages = [{"role": "user", "content": text}]
messages

[{'role': 'user', 'content': '我的储蓄账户是否可以与支付软件直接绑定？'}]

In [149]:
response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call='auto',  
    )

In [150]:
response

<OpenAIObject chat.completion id=chatcmpl-8fPnafxexOGiWW65fYePR3sZQaBGf at 0x19e447b09a0> JSON: {
  "id": "chatcmpl-8fPnafxexOGiWW65fYePR3sZQaBGf",
  "object": "chat.completion",
  "created": 1704881310,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "是的，您可以将您的储蓄账户与支付软件直接绑定。通过绑定，您可以使用支付软件进行账户余额查询、转账和消费等操作。…"
...

Without a system prompt, the model replied in plain text ("Yes, you can bind your savings account directly to payment software. Once bound, you can use it to check your balance, transfer money, and make payments…") instead of calling a function. The next cell adds a system message requiring every reply to go through a function call.

In [152]:
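# System prompt (Chinese): "You are an intelligent bank customer-service app; every user message is a
# customer's request. Every reply must be made through a function call..."
# Adjacent string literals avoid the stray indentation a backslash continuation leaves inside the prompt.
messages = [
    {"role": "system", "content": "你是一个智能银行客户接待应用，输入的每个user message都是某位银行客户的需求。"
                                  "你的每一次回答都必须调用function call来完成。请仔细甄别用户需求，并合理调用外部函数来进行回答。"},
    {"role": "user", "content": text}]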

In [154]:
response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call='auto',  
    )

In [155]:
response

<OpenAIObject chat.completion id=chatcmpl-8fPsbomStEt20CHeQ5xseJ7VB0G08 at 0x19e444d59f0> JSON: {
  "id": "chatcmpl-8fPsbomStEt20CHeQ5xseJ7VB0G08",
  "object": "chat.completion",
  "created": 1704881621,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "handle_savings_account_management",
          "arguments": "{}"
        }
      },
      "logprobs": null,
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 1196,
    "completion_tokens": 10,
    "total_tokens": 1206
  },
  "system_fingerprint": null
}

- Convert the function-calling result into a label

In [164]:
response["choices"][0]["message"]

<OpenAIObject at 0x19e447872c0> JSON: {
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "handle_savings_account_management",
    "arguments": "{}"
  }
}

In [165]:
response_message = response["choices"][0]["message"]
response_message

<OpenAIObject at 0x19e447872c0> JSON: {
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "handle_savings_account_management",
    "arguments": "{}"
  }
}

In [167]:
response_message.get("function_call")

<OpenAIObject at 0x19e4478e4a0> JSON: {
  "name": "handle_savings_account_management",
  "arguments": "{}"
}

In [169]:
response_message["function_call"]["name"]

'handle_savings_account_management'

In [170]:
text

'我的储蓄账户是否可以与支付软件直接绑定？'

In [171]:
train_df

Unnamed: 0,Conversation,type,embedding
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,..."
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,..."
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,..."
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,..."
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ..."
...,...,...,...
287,请问贵行信用卡支持国外消费吗？有没有额外的手续费？,3,"[0.0027897085528820753, 0.005940127186477184, ..."
288,我想了解贵行的退休理财计划，可以提供一些信息吗？,4,"[-0.002361275488510728, -0.015198089182376862,..."
289,请问我可以将信用卡的账单日改为每月的1号吗？,3,"[-0.02974422089755535, -0.001779056154191494, ..."
290,我想用我的外币储蓄账户进行汇款，可以吗？,5,"[-0.017369315028190613, -0.029290981590747833,..."


In [174]:
type_dict[response_message["function_call"]["name"]]

1

In [176]:
train_df['type'][0]

1

In [177]:
type_dict[response_message["function_call"]["name"]] == train_df['type'][0]

True
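`function_call_predict` below wraps these steps. When the model answers in plain text there is no `function_call`, and if it ever returned a name missing from `type_dict`, the plain `type_dict[...]` lookup would raise a `KeyError`. A defensive sketch (the helper name `message_to_label` is ours) that maps both cases to the label 0; it works on plain dicts, and the 0.x `OpenAIObject` is a `dict` subclass, so `.get` behaves the same way on real responses.

```python
def message_to_label(message: dict, type_dict: dict) -> int:
    """Map an assistant message to a label; 0 means "no usable function call"."""
    call = message.get("function_call")
    if not call:
        return 0  # the model answered in plain text instead of calling a function
    return type_dict.get(call.get("name"), 0)  # unknown function names also map to 0

type_dict_example = {"handle_savings_account_management": 1, "handle_loan_services": 2}
msg = {"role": "assistant", "content": None,
       "function_call": {"name": "handle_savings_account_management", "arguments": "{}"}}
print(message_to_label(msg, type_dict_example))  # 1
```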

- Function-calling accuracy

In [20]:
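def function_call_predict(text, model='gpt-3.5-turbo-0613'):
    # Build the messages. The Chinese system prompt says the model is a bank customer-service
    # app and that every reply must be a function call. Adjacent string literals avoid the
    # stray indentation a backslash continuation would leave inside the prompt.
    messages = [
        {"role": "system", "content": "你是一个智能银行客户接待应用，输入的每个user message都是某位银行客户的需求。"
                                      "你的每一次回答都必须调用function call来完成。请仔细甄别用户需求，并合理调用外部函数来进行回答。"},
        {"role": "user", "content": text}]
    
    # Request a completion; the model may call one of the functions in `functions`
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        functions=functions,
        function_call='auto',  
    )
    response_message = response["choices"][0]["message"]
    
    # Extract the label; 0 means the model did not call any function
    res = 0
    if response_message.get("function_call"):
        function_name = response_message["function_call"]["name"]
        res = type_dict[function_name]
    
    return res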

In [179]:
function_call_predict(text)

1

In [180]:
train_df.head()

Unnamed: 0,Conversation,type,embedding
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,..."
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,..."
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,..."
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,..."
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ..."


In [181]:
train_df["function_call_prediction_3.5"] = train_df.Conversation.apply(lambda x: function_call_predict(x, model='gpt-3.5-turbo-0613'))

In [182]:
train_df.head()

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,...",1
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,...",1
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,...",4
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,...",3
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ...",2


Zero-shot classification accuracy on the training set:

In [185]:
(train_df["function_call_prediction_3.5"] != train_df["type"]).sum()

27

In [190]:
train_df.shape

(292, 4)

In [192]:
1 - 27/292

0.9075342465753424
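Rather than hard-coding the error count, the accuracy can be computed directly as the share of matching rows. A small sketch with a hypothetical `accuracy` helper; a prediction of 0 (no function call) or NaN (a row whose request failed) never equals a label, so both count as errors. The synthetic labels below just reproduce the 27-errors-out-of-292 case.

```python
import pandas as pd

def accuracy(y_true, y_pred) -> float:
    """Share of rows where the prediction equals the label; 0 and NaN predictions count as wrong."""
    y_true = pd.Series(y_true).reset_index(drop=True)
    y_pred = pd.Series(y_pred).reset_index(drop=True)
    return float((y_true == y_pred).mean())

# Synthetic example: 27 wrong (label 0) out of 292 rows
labels = [1] * 292
preds = [0] * 27 + [1] * 265
print(accuracy(labels, preds))
```

On the notebook data this is `accuracy(train_df["type"], train_df["function_call_prediction_3.5"])`.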

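Misclassified examples. In every row shown the prediction is 0, meaning the model did not call any function rather than picking the wrong one: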

In [194]:
train_df[train_df["function_call_prediction_3.5"] != train_df["type"]]

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5
10,我账户的利息是怎么计算的？是按月计算还是按季度？,1,"[-0.011851722374558449, -0.004501063842326403,...",0
18,请问贵行的理财产品有哪些安全保障措施？,4,"[0.006038441322743893, -0.01589595153927803, 0...",0
31,我想了解一下在贵行开户后，是否可以获得贵宾理财服务？,1,"[0.0012723366962745786, -0.014203467406332493,...",0
47,能不能在网上直接开户？还是必须去银行亲自办理？,1,"[0.0003207390254829079, -0.016580743715167046,...",0
52,我想查询我的账户余额，可以电话查询吗？,1,"[-0.012707204557955265, -0.014195669442415237,...",0
56,我想知道，你们银行的汇款是否可以实时跟踪？,5,"[-0.02527034841477871, -0.02282915823161602, 0...",0
64,请问贵行国际汇款需要的时间是否会因为汇款金额的大小而不同？,5,"[0.007469383534044027, -0.015061776153743267, ...",0
85,我能否在一个账户里设置多个储蓄目标？,1,"[-0.032669585198163986, -0.03899965435266495, ...",0
95,请问贵行信用卡的逾期还款会影响个人信用记录吗？,3,"[-0.029968010261654854, -0.005994266364723444,...",0
104,我想关闭我的信用卡账户，需要注意什么？,3,"[-0.02176697738468647, -0.006076559890061617, ...",0


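Alternatively, the predictions can be run row by row, which makes progress visible and keeps the loop going after an API error: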

In [187]:
for index, row in enumerate(test_df.itertuples()):
    try:
        # Classify this row with function calling
        test_df.at[index, "function_call_prediction_3.5"] = function_call_predict(row.Conversation, model='gpt-3.5-turbo-0613')
    except Exception as e:
        # Print the error and wait one minute (e.g. for a rate limit to reset);
        # the failed row is not retried and keeps a NaN prediction
        print(f"Error on row {index}: {e}")
        time.sleep(60)  # wait one minute
        continue  # move on to the next row

    # Print progress every 10 rows
    if index % 10 == 0:
        print(f"Processed {index}/{len(test_df)} rows")

Processed 0/73 rows
Processed 10/73 rows
Processed 20/73 rows
Processed 30/73 rows
Processed 40/73 rows
Processed 50/73 rows
Processed 60/73 rows
Processed 70/73 rows
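The loop above waits once and then skips a failing row. A sketch of a retry helper with exponential backoff, assuming the errors are transient (such as rate limits); `call_with_retry` is a hypothetical name, and `flaky` only simulates two failures before succeeding.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retry(fn: Callable[[], T], max_attempts: int = 3,
                    base_delay: float = 60.0) -> T:
    """Call fn(); after a failure wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            if attempt == max_attempts - 1:
                raise  # give up and re-raise after the last attempt
            delay = base_delay * (2 ** attempt)
            print(f"attempt {attempt + 1} failed ({e}); retrying in {delay:.0f}s")
            time.sleep(delay)
    raise RuntimeError("max_attempts must be >= 1")

# Simulated API call that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return "ok"

print(call_with_retry(flaky, base_delay=0))  # ok
```

Inside the loop it would be used as `call_with_retry(lambda: function_call_predict(row.Conversation))`.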


In [188]:
test_df.head()

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5
0,我想了解一下贷款担保的具体要求。,2,"[-0.0121180210262537, -0.01918686553835869, 0....",2.0
1,我需要汇款到澳大利亚，贵行有什么特别的要求吗？,5,"[-0.020629247650504112, -0.005797294899821281,...",5.0
2,请问贵行国际汇款有哪些安全保障措施？,5,"[-0.001715182326734066, -0.017442874610424042,...",5.0
3,请问贵行储蓄账户是否有年费或者管理费？,1,"[0.004440059419721365, -0.016759661957621574, ...",1.0
4,我的储蓄账户需要更新个人信息，我该怎么操作？,1,"[-0.01662181131541729, -0.008774831891059875, ...",1.0


In [195]:
(test_df["function_call_prediction_3.5"] != test_df["type"]).sum()

10

In [196]:
test_df.shape

(73, 4)

In [197]:
1 - 10/73

0.863013698630137

Misclassified examples. All ten are predictions of 0, i.e. the model made no function call:

In [198]:
test_df[test_df["function_call_prediction_3.5"] != test_df["type"]]

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5
5,信用卡的年度报告是怎样的？包含哪些信息？,3,"[-0.027157334610819817, -0.009819122962653637,...",0.0
14,请问贵行的储蓄账户有没有最低存款限制？,1,"[0.004416212439537048, -0.027348997071385384, ...",0.0
18,你们银行的储蓄账户能否支持多币种？,1,"[-0.00860319472849369, -0.0217947605997324, 0....",0.0
22,你们的国际汇款服务是否有语言支持？,5,"[-0.00940465833991766, -0.015292448922991753, ...",0.0
33,请问贵行的储蓄账户是否有年龄限制？,1,"[0.011664615012705326, -0.029595570638775826, ...",0.0
43,请问用美元汇款和用人民币汇款有什么不同？,5,"[0.014912860468029976, -0.010989814065396786, ...",0.0
49,我想更改我的投资组合，应该怎么操作？,4,"[-0.018760407343506813, -0.02338292822241783, ...",0.0
56,请问我可以通过贵行投资国际股指吗？,4,"[-0.00836880598217249, -0.024504825472831726, ...",0.0
65,信用卡的积分有效期是多久？会过期吗？,3,"[-0.00968279130756855, -0.007884838618338108, ...",0.0
70,我想知道，办理贷款是否需要担保人？,2,"[-0.0010422804625704885, -0.002528570592403412...",0.0


- Save the intermediate results

In [199]:
train_df.to_csv('./data/train_dataset_embedding&func.csv', index=False)
test_df.to_csv('./data/test_dataset_embedding&funct.csv', index=False)
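`to_csv` writes each embedding list as its string representation, so after `pd.read_csv` the `embedding` column contains strings such as `"[-0.008, ...]"`. A sketch of turning them back into lists with `ast.literal_eval`; `load_with_embeddings` is a hypothetical helper, demonstrated on an in-memory round trip.

```python
import ast
import io

import pandas as pd

def load_with_embeddings(path_or_buffer) -> pd.DataFrame:
    """Read a CSV written by DataFrame.to_csv and parse the 'embedding' strings back into lists."""
    df = pd.read_csv(path_or_buffer)
    df["embedding"] = df["embedding"].apply(ast.literal_eval)
    return df

# Round trip on a tiny in-memory example
demo = pd.DataFrame({"Conversation": ["汇款"], "type": [5], "embedding": [[0.1, -0.2]]})
buf = io.StringIO()
demo.to_csv(buf, index=False)
buf.seek(0)
restored = load_with_embeddings(buf)
print(type(restored.loc[0, "embedding"]), restored.loc[0, "embedding"])
```

For the saved files: `train_df = load_with_embeddings('./data/train_dataset_embedding&func.csv')`.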

### III. Intent Recognition with Machine-Learning Models

- Dataset preparation

In [204]:
train_df["embedding"][0]

[-0.008010320365428925,
 -0.014849383383989334,
 0.014324337244033813,
 -0.03142199665307999,
 -0.05067368969321251,
 ...]

In [206]:
embedding_train = pd.DataFrame(train_df['embedding'].tolist())
embedding_train

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1526,1527,1528,1529,1530,1531,1532,1533,1534,1535
0,-0.008010,-0.014849,0.014324,-0.031422,-0.050674,0.009289,-0.023115,-0.009713,0.000416,-0.019211,...,0.015576,-0.029026,0.019359,-0.023492,0.002948,0.004325,-0.008832,-0.011847,-0.010514,-0.006519
1,-0.007990,-0.025213,0.017263,-0.011555,-0.034949,0.022858,-0.007428,-0.004703,-0.014948,-0.011866,...,0.019022,-0.003813,0.039658,-0.014974,-0.001740,0.001890,-0.014776,0.009643,-0.010854,-0.014868
2,-0.005702,-0.015975,-0.003181,-0.028193,-0.026734,0.034814,-0.006142,0.012428,-0.017131,-0.008933,...,0.014057,0.000658,0.004930,-0.031162,-0.013367,-0.003324,0.009124,-0.020402,-0.007672,0.000804
3,-0.003249,0.003358,0.025366,-0.033131,-0.031578,0.023649,-0.022518,-0.012676,-0.027831,-0.021074,...,0.000541,-0.022627,0.013575,-0.017791,-0.025257,0.000835,-0.010973,-0.011852,-0.008521,-0.030896
4,-0.003918,0.012989,0.030074,-0.033911,-0.029063,0.004602,-0.006793,-0.003834,-0.014350,-0.029919,...,0.012503,-0.003114,0.006922,-0.020287,-0.011382,-0.001623,-0.006381,-0.002687,0.006488,-0.013870
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
287,0.002790,0.005940,0.033066,-0.037487,-0.020412,0.024912,-0.009597,-0.002245,-0.021497,-0.015646,...,0.014283,-0.010940,0.018254,-0.021166,-0.019856,0.000857,0.013780,-0.016083,-0.005037,-0.025217
288,-0.002361,-0.015198,0.010311,-0.024046,-0.023052,0.026274,-0.014983,-0.012298,-0.038237,-0.004122,...,0.014003,0.000093,0.016312,-0.021844,-0.027630,-0.004451,-0.009015,-0.014580,0.001970,-0.009834
289,-0.029744,-0.001779,0.006650,-0.033422,-0.021549,0.001303,-0.017457,-0.001461,-0.024734,-0.017964,...,0.016125,-0.019310,0.014739,-0.025053,-0.005934,-0.016325,-0.008762,0.025506,0.002895,-0.045203
290,-0.017369,-0.029291,0.017264,-0.028607,-0.034923,0.007902,-0.031081,-0.029370,-0.009389,-0.026646,...,0.028423,-0.011053,0.027791,-0.017593,-0.012981,0.005559,-0.007902,-0.006217,0.012751,-0.028001


In [207]:
embedding_test = pd.DataFrame(test_df['embedding'].tolist())
embedding_test

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1526,1527,1528,1529,1530,1531,1532,1533,1534,1535
0,-0.012118,-0.019187,0.020237,-0.029406,-0.020803,0.008678,-0.000685,0.006076,-0.026929,-0.020277,...,0.004413,0.010300,0.012859,-0.025138,-0.006163,-0.013162,-0.010374,-0.005652,0.004309,-0.001781
1,-0.020629,-0.005797,0.011595,-0.034614,-0.022018,0.030256,-0.023938,-0.016694,-0.014039,-0.037963,...,0.041122,0.005168,0.014951,-0.018927,0.001380,-0.007183,0.000864,-0.021038,0.012228,0.004514
2,-0.001715,-0.017443,0.020512,-0.021090,-0.016151,0.029717,-0.023567,0.004832,-0.030956,-0.013674,...,0.018129,-0.011938,0.036339,-0.023499,-0.018964,0.002196,0.001141,-0.029556,-0.007376,0.008607
3,0.004440,-0.016760,0.017984,-0.035968,-0.020104,0.018063,-0.017655,0.004450,-0.031755,-0.014482,...,0.004799,-0.011434,0.016088,-0.015311,-0.005618,0.017326,0.011441,-0.009347,0.013560,-0.020275
4,-0.016622,-0.008775,0.019710,-0.044404,-0.053497,0.024986,-0.020519,-0.006674,-0.000168,-0.027252,...,0.004381,-0.020545,0.020665,-0.010047,-0.016635,-0.003153,-0.021526,0.007072,0.004891,-0.014673
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,-0.028588,-0.007617,0.004339,-0.025324,-0.022006,0.015517,-0.023994,-0.003078,-0.014133,-0.005337,...,0.024773,-0.008719,0.016511,-0.005488,-0.000805,-0.001467,0.011265,0.001758,0.009653,-0.031625
69,0.016335,-0.003918,-0.003095,-0.046547,-0.040805,0.016947,-0.003222,-0.018395,-0.016056,-0.024523,...,0.023499,-0.007317,0.014793,-0.024868,-0.011391,-0.004330,-0.015551,0.006210,0.014434,-0.032139
70,-0.001042,-0.002529,0.022422,-0.038812,-0.017930,0.012720,-0.003562,0.018805,-0.010382,-0.027058,...,0.008802,0.001969,0.021169,-0.024551,-0.004323,0.008182,0.000659,-0.015070,0.013660,-0.020059
71,-0.013510,-0.006860,0.010651,-0.022655,-0.034333,0.004457,-0.003511,-0.004055,-0.017031,-0.030008,...,0.011158,-0.014612,0.027494,-0.028007,-0.007117,0.003367,-0.010618,-0.023641,0.013307,-0.019667


In [211]:
train_df['type']

0      1
1      1
2      4
3      3
4      2
      ..
287    3
288    4
289    3
290    5
291    5
Name: type, Length: 292, dtype: int64

In [213]:
X_train = embedding_train
X_test = embedding_test

In [220]:
y_train = train_df['type'].values.ravel()
y_test = test_df['type'].values.ravel()
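`pd.DataFrame(train_df['embedding'].tolist())` turns the lists into an (n_samples, 1536) table. An equivalent NumPy route, sketched below with a hypothetical `embeddings_to_matrix` helper, also checks that every vector has the 1536 dimensions produced by `text-embedding-ada-002`; scikit-learn and LightGBM accept either form.

```python
import numpy as np

def embeddings_to_matrix(vectors, expected_dim: int = 1536) -> np.ndarray:
    """Stack equal-length embedding lists into an (n_samples, expected_dim) float array."""
    # A ragged input makes np.array raise ValueError when a float dtype is requested
    X = np.array([list(v) for v in vectors], dtype=np.float64)
    if X.ndim != 2 or X.shape[1] != expected_dim:
        raise ValueError(f"expected shape (n, {expected_dim}), got {X.shape}")
    return X

# Random vectors stand in for real embeddings
rng = np.random.default_rng(0)
fake_embeddings = [rng.normal(size=1536).tolist() for _ in range(4)]
print(embeddings_to_matrix(fake_embeddings).shape)  # (4, 1536)
```

On the notebook data: `X_train = embeddings_to_matrix(train_df["embedding"])`.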

- Random forest

We start with a random forest classifier:

In [310]:
from sklearn.ensemble import RandomForestClassifier

In [241]:
# Instantiate the model
clf_RF = RandomForestClassifier(n_estimators=100, random_state=22)

In [242]:
# Train on the training set
clf_RF.fit(X_train, y_train)

In [273]:
# Predict on the training and test sets
train_preds = clf_RF.predict(X_train)
test_preds = clf_RF.predict(X_test)

In [251]:
# Compute accuracy
from sklearn.metrics import accuracy_score
train_accuracy = accuracy_score(y_train, train_preds)
test_accuracy = accuracy_score(y_test, test_preds)
print(f"Train-Accuracy: {train_accuracy}")
print(f"Test-Accuracy: {test_accuracy}")

Train-Accuracy: 1.0
Test-Accuracy: 0.9452054794520548


Append the predictions to the original DataFrames:

In [252]:
train_df['RF_pre'] = train_preds
test_df['RF_pre'] = test_preds

In [254]:
train_df.head()

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,...",1,1
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,...",1,1
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,...",4,4
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,...",3,3
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ...",2,2


Inspect the misclassified samples:

In [257]:
test_df[test_df['RF_pre'] != test_df['type']]

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre
10,请问贵行有提供海外投资的服务或建议吗？,4,"[-0.0006550041725859046, -0.031585942953825, 0...",4.0,5
15,我想了解贵行信用卡的外汇交易费用。,3,"[-0.019391309469938278, -0.0006310329190455377...",3.0,5
31,请问贵行信用卡支持Apple Pay或者其他手机支付方式吗？,3,"[-0.00976413395255804, 0.0008489831816405058, ...",3.0,5
48,请问我可以设定我的信用卡账单日和还款日吗？,3,"[-0.030712025240063667, -0.011503838934004307,...",3.0,1


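- 10: an investment and wealth-management request misclassified as international business and remittance;
- 15: a credit-card request misclassified as international business and remittance;
- 31: a credit-card request misclassified as international business and remittance;
- 48: a credit-card request misclassified as savings-account opening and management.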

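To see which intents the random forest confuses most, the four errors listed above can be tallied as (true, predicted) pairs. A minimal sketch: the error triples are copied from the table above, and the short label names paraphrase the five business types of this chapter.

```python
from collections import Counter

# Intent types 1-5 as defined for this dataset
labels = {1: "savings account", 2: "loans", 3: "credit card",
          4: "investment advisory", 5: "international/remittance"}

# (row index, true type, RF prediction) for the misclassified test rows listed above
rf_errors = [(10, 4, 5), (15, 3, 5), (31, 3, 5), (48, 3, 1)]

# Count each (true, predicted) confusion pair
confusions = Counter((true, pred) for _, true, pred in rf_errors)
for (true, pred), n in confusions.most_common():
    print(f"{labels[true]} -> {labels[pred]}: {n}")
```

On the full DataFrame the same tally is available as a confusion matrix via `pd.crosstab(test_df['type'], test_df['RF_pre'])`.
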
- LightGBM prediction

In [None]:
!pip install lightgbm

In [269]:
# Instantiate the model
import lightgbm as lgb
clf_LGBM = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.05, random_state=42)

In [270]:
# Train the model
clf_LGBM.fit(X_train, y_train)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.011499 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 150391
[LightGBM] [Info] Number of data points in the train set: 292, number of used features: 1536
[LightGBM] [Info] Start training from score -1.210846
[LightGBM] [Info] Start training from score -1.633703
[LightGBM] [Info] Start training from score -2.013192
[LightGBM] [Info] Start training from score -1.939084
[LightGBM] [Info] Start training from score -1.472061

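The `Start training from score` values in the LightGBM log appear to be the log of each class's share of the training set, i.e. the booster starts from the class priors before the first tree. A quick check, where the per-class counts are an assumption inferred by exponentiating the logged scores (they sum to 292) rather than read from `train_df['type'].value_counts()`:

```python
import math

n_train = 292  # "Number of data points in the train set" from the log above

# Assumed class counts for types 1..5, inferred from the logged scores, not read from the data
class_counts = {1: 87, 2: 57, 3: 39, 4: 42, 5: 67}

# "Start training from score" values copied from the LightGBM log, in class order
logged_scores = {1: -1.210846, 2: -1.633703, 3: -2.013192, 4: -1.939084, 5: -1.472061}

# Initial score = log(prior probability of the class)
init_scores = {label: math.log(count / n_train) for label, count in class_counts.items()}

for label in class_counts:
    print(f"type {label}: log-prior {init_scores[label]:.6f}, logged {logged_scores[label]:.6f}")
```

If `train_df['type'].value_counts().sort_index()` returns these counts, the class imbalance (type 1 is more than twice as frequent as type 3) is worth keeping in mind when reading per-class errors.
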

In [274]:
# Predict on the training and test sets
train_preds = clf_LGBM.predict(X_train)
test_preds = clf_LGBM.predict(X_test)

In [275]:
from sklearn.metrics import accuracy_score
train_accuracy = accuracy_score(y_train, train_preds)
test_accuracy = accuracy_score(y_test, test_preds)
print(f"Train-Accuracy: {train_accuracy}")
print(f"Test-Accuracy: {test_accuracy}")

Train-Accuracy: 1.0
Test-Accuracy: 0.9863013698630136


Inspect the misclassified samples:

In [252]:
train_df['LGBM_pre'] = train_preds
test_df['LGBM_pre'] = test_preds

In [278]:
test_df[test_df['LGBM_pre'] != test_df['type']]

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre
9,我想查询我上周的国际汇款是否已经到账。,5,"[-0.028131483122706413, -0.01799607276916504, ...",5.0,5,1


- 9: an international business and remittance request misclassified as savings-account opening and management.

- XGBoost prediction

In [None]:
!pip install xgboost

In [298]:
# Instantiate the model
import xgboost as xgb
clf_XGB = xgb.XGBClassifier(n_estimators=100, learning_rate=0.05, random_state=42)

In [299]:
# Train the model; XGBClassifier expects class labels 0..K-1, so types 1-5 are shifted down by 1
clf_XGB.fit(X_train, y_train-1)

In [300]:
# Predict, then shift back to the original 1-5 labels
train_preds = clf_XGB.predict(X_train) + 1
test_preds = clf_XGB.predict(X_test) + 1
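
Shifting by 1 works here only because the types are exactly 1 to 5. A more general way to map arbitrary labels to the contiguous 0..K-1 codes XGBoost expects, and back, is sketched below; `make_label_codec` is an illustrative helper, and scikit-learn's `LabelEncoder` (`fit_transform` / `inverse_transform`) does the same job.

```python
def make_label_codec(labels):
    """Return (label -> code mapping, code -> label list) for codes 0..K-1 in sorted label order."""
    classes = sorted(set(labels))
    to_code = {label: code for code, label in enumerate(classes)}
    return to_code, classes  # classes[code] recovers the original label

y = [1, 1, 4, 3, 2, 5]                      # intent types as stored in train_df['type']
to_code, classes = make_label_codec(y)
encoded = [to_code[label] for label in y]   # [0, 0, 3, 2, 1, 4], the form XGBClassifier.fit expects
decoded = [classes[code] for code in encoded]
print(encoded, decoded)
```
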

In [301]:
from sklearn.metrics import accuracy_score
train_accuracy = accuracy_score(y_train, train_preds)
test_accuracy = accuracy_score(y_test, test_preds)
print(f"Train-Accuracy: {train_accuracy}")
print(f"Test-Accuracy: {test_accuracy}")

Train-Accuracy: 1.0
Test-Accuracy: 1.0


In [302]:
train_df['XGB_pre'] = train_preds
test_df['XGB_pre'] = test_preds

In [303]:
train_df.head()

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,...",1,1,1,1
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,...",1,1,1,1
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,...",4,4,4,4
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,...",3,3,3,3
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ...",2,2,2,2


|Intent-recognition method|Zero-shot classification (full-dataset accuracy)|Supervised classification (test-set accuracy)|
|:--:|:--:|:--:|
|function_call_3.5|**0.89863**|/|
|ML_RF|/|**0.945205**|
|ML_LGBM|/|**0.986301**|
|ML_XGB|/|**1.0**|

### IV. Advanced strategies for Function-calling intent recognition

#### 1. Intent recognition with the stronger GPT-4 model via Function calling

In [23]:
# Load the training set
train_df = pd.read_csv('./data/train_dataset_final.csv')

# Load the test set
test_df = pd.read_csv('./data/test_dataset_final.csv')

In [30]:
train_df = train_df[['Conversation', 'type', 'embedding', 'function_call_prediction_3.5', 'RF_pre', 'LGBM_pre', 'XGB_pre']].copy()

In [31]:
test_df = test_df[['Conversation', 'type', 'embedding', 'function_call_prediction_3.5', 'RF_pre', 'LGBM_pre', 'XGB_pre']].copy()

In [32]:
train_df

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,...",1,1,1,1
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,...",1,1,1,1
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,...",4,4,4,4
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,...",3,3,3,3
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ...",2,2,2,2
...,...,...,...,...,...,...,...
287,请问贵行信用卡支持国外消费吗？有没有额外的手续费？,3,"[0.0027897085528820753, 0.005940127186477184, ...",3,3,3,3
288,我想了解贵行的退休理财计划，可以提供一些信息吗？,4,"[-0.002361275488510728, -0.015198089182376862,...",4,4,4,4
289,请问我可以将信用卡的账单日改为每月的1号吗？,3,"[-0.02974422089755535, -0.001779056154191494, ...",3,3,3,3
290,我想用我的外币储蓄账户进行汇款，可以吗？,5,"[-0.017369315028190613, -0.029290981590747833,...",5,5,5,5


In [33]:
import time
def process_dataset(dataset, 
                    model_name='gpt-4-0613', 
                    text_col_name='Conversation', 
                    prediction_col_name='function_call_prediction_4'):
    """
    Apply function_call_predict to every row of the dataset for intent recognition.

    :param dataset: DataFrame containing the data to process (indexed 0..n-1, as after read_csv)
    :param model_name: str, name of the model to use
    :param text_col_name: str, name of the text column passed to function calling
    :param prediction_col_name: str, name of the column that receives the predicted intent
    :return: the modified DataFrame
    """
    for index in range(len(dataset)):
        success = False
        while not success:
            try:
                # Try calling function_call_predict
                result = function_call_predict(dataset.at[index, text_col_name], model=model_name)
                dataset.at[index, prediction_col_name] = result
                success = True  # exit the retry loop on success
            except Exception as e:
                # Print the error and wait one minute
                print(f"Error on row {index}: {e}")
                time.sleep(60)  # retry after one minute (there is no retry limit)

        # Print progress every 10 rows
        if index % 10 == 0:
            print(f"Processed {index}/{len(dataset)} rows")

    return dataset
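
`process_dataset` retries a failed row indefinitely with a fixed 60-second pause. A bounded retry with exponential backoff and jitter is a common alternative. The sketch below is a generic helper (the name `call_with_retry` and its parameters are hypothetical, not part of the openai library), with `sleep` injectable so it can be tested without waiting:

```python
import random
import time

def call_with_retry(fn, *args, max_retries=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying failures with capped exponential backoff.

    After max_retries failed retries the last exception is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            if attempt == max_retries:
                raise
            # base_delay * 1, 2, 4, ... capped at max_delay, scaled by random jitter in [0.5, 1.0]
            delay = min(max_delay, base_delay * 2 ** attempt) * random.uniform(0.5, 1.0)
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {delay:.2f}s")
            sleep(delay)

# Example: a call that fails twice, then succeeds
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("server_error")
    return "ok"

print(call_with_retry(flaky, base_delay=0.01))
```

Inside the loop, the call could become `result = call_with_retry(function_call_predict, dataset.at[index, text_col_name], model=model_name)`; once the retries are exhausted the exception propagates instead of stalling the notebook.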

In [34]:
train_df = process_dataset(dataset=train_df)

Processed 0/292 rows
Processed 10/292 rows
Processed 20/292 rows
Processed 30/292 rows
Processed 40/292 rows
Processed 50/292 rows
Processed 60/292 rows
Processed 70/292 rows
Processed 80/292 rows
Processed 90/292 rows
Error on row 95: The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 4c322acf8018c71acf31413a1d29d2c9 in your email.) {
  "error": {
    "message": "The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 4c322acf8018c71acf31413a1d29d2c9 in your email.)",
    "type": "server_error",
    "param": null,
    "code": null
  }
}
 500 {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact u

In [35]:
train_df.head()

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,...",1,1,1,1,1.0
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,...",1,1,1,1,1.0
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,...",4,4,4,4,4.0
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,...",3,3,3,3,3.0
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ...",2,2,2,2,2.0


In [36]:
test_df = process_dataset(dataset=test_df)

Processed 0/73 rows
Processed 10/73 rows
Processed 20/73 rows
Processed 30/73 rows
Processed 40/73 rows
Processed 50/73 rows
Processed 60/73 rows
Processed 70/73 rows


In [37]:
test_df.head()

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4
0,我想了解一下贷款担保的具体要求。,2,"[-0.0121180210262537, -0.01918686553835869, 0....",2.0,2,2,2,2.0
1,我需要汇款到澳大利亚，贵行有什么特别的要求吗？,5,"[-0.020629247650504112, -0.005797294899821281,...",5.0,5,5,5,5.0
2,请问贵行国际汇款有哪些安全保障措施？,5,"[-0.001715182326734066, -0.017442874610424042,...",5.0,5,5,5,5.0
3,请问贵行储蓄账户是否有年费或者管理费？,1,"[0.004440059419721365, -0.016759661957621574, ...",1.0,1,1,1,1.0
4,我的储蓄账户需要更新个人信息，我该怎么操作？,1,"[-0.01662181131541729, -0.008774831891059875, ...",1.0,1,1,1,1.0


Zero-shot classification accuracy on the training set:

In [38]:
(train_df["function_call_prediction_4"] != train_df["type"]).sum()

5

In [39]:
train_df.shape

(292, 8)

In [40]:
1 - 5/292

0.9828767123287672

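GPT-3.5 misclassified 27 training samples, whereas GPT-4 misclassifies only 5: a clear improvement in zero-shot intent-recognition accuracy.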

Inspect the misclassified samples (a prediction of 0 means the model did not call any function):

In [41]:
train_df[train_df["function_call_prediction_4"] != train_df["type"]]

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4
24,我对贵行提供的伊斯兰金融服务感兴趣，可以提供一些信息吗？,4,"[-0.017161114141345024, -0.019475867971777916,...",4,4,4,4,0.0
182,请问使用信用卡在国外消费，汇率是怎么计算的？,3,"[0.010909482836723328, -0.007314762100577354, ...",5,3,3,3,5.0
183,我想了解一下贵行的跨境汇款储蓄账户。,1,"[-0.007215231657028198, -0.013142028823494911,...",5,1,1,1,5.0
184,你们银行的利率是怎样的？比其他银行高吗？,1,"[-0.0010289129568263888, -0.008435490541160107...",0,1,1,1,0.0
264,我丢失了我的银行卡，需要怎样办理挂失和重新申请？,1,"[-0.00944333802908659, 0.00042509808554314077,...",3,1,1,1,3.0


Zero-shot prediction results on the test set:

In [42]:
(test_df["function_call_prediction_4"] != test_df["type"]).sum()

1

In [43]:
test_df.shape

(73, 8)

In [44]:
1 - 1/73

0.9863013698630136

Likewise, GPT-3.5 misclassified 10 test samples, whereas GPT-4 misclassifies only 1, again a clear improvement in zero-shot accuracy.

Inspect the misclassified sample:

In [45]:
test_df[test_df["function_call_prediction_4"] != test_df["type"]]

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4
50,我想了解贵行的遗产规划和信托服务。,4,"[-0.013152499683201313, -0.018206914886832237,...",4.0,4,4,4,0.0


In [110]:
# Intent-recognition accuracy of GPT-4 function calling over all samples (training + test)
1- (5+1) / (292+73)

0.9835616438356164
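
The same arithmetic yields both zero-shot rows of the comparison table by pooling training (292) and test (73) errors; for GPT-3.5 it uses the 27 training and 10 test errors quoted above:

$$
\mathrm{Acc}_{\text{GPT-4}} = 1 - \frac{5 + 1}{292 + 73} = \frac{359}{365} \approx 0.98356,
\qquad
\mathrm{Acc}_{\text{GPT-3.5}} = 1 - \frac{27 + 10}{292 + 73} = \frac{328}{365} \approx 0.89863
$$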

|Intent-recognition method|Zero-shot classification (full-dataset accuracy)|Supervised classification (test-set accuracy)|
|:--:|:--:|:--:|
|function_call_3.5|**0.89863**|/|
|ML_RF|/|**0.945205**|
|ML_LGBM|/|**0.986301**|
|ML_XGB|/|**1.0**|
|function_call_4|**0.98356**|/|

#### 2. Few-shot prompting with the misclassified training samples and their labels

In [50]:
train_temp_3 = train_df[train_df["function_call_prediction_3.5"] != train_df["type"]][['Conversation', 'type', 'function_call_prediction_3.5']]

In [51]:
train_temp_3

Unnamed: 0,Conversation,type,function_call_prediction_3.5
10,我账户的利息是怎么计算的？是按月计算还是按季度？,1,0
18,请问贵行的理财产品有哪些安全保障措施？,4,0
31,我想了解一下在贵行开户后，是否可以获得贵宾理财服务？,1,0
47,能不能在网上直接开户？还是必须去银行亲自办理？,1,0
52,我想查询我的账户余额，可以电话查询吗？,1,0
56,我想知道，你们银行的汇款是否可以实时跟踪？,5,0
64,请问贵行国际汇款需要的时间是否会因为汇款金额的大小而不同？,5,0
85,我能否在一个账户里设置多个储蓄目标？,1,0
95,请问贵行信用卡的逾期还款会影响个人信用记录吗？,3,0
104,我想关闭我的信用卡账户，需要注意什么？,3,0


- Build few-shot examples from the misclassified samples

In [52]:
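# System prompt, kept in Chinese to match the customer messages. Roughly: "You are an intelligent bank
# customer-reception application; each user message is a bank customer's request. Every reply must be
# made via a function call. Identify the request carefully and call the appropriate external function."
system_message = [{"role": "system", "content": "你是一个智能银行客户接待应用，输入的每个user message都是某位银行客户的需求。\
你的每一次回答都必须调用function call来完成。请仔细甄别用户需求，并合理调用外部函数来进行回答。"}]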

In [53]:
few_shot_messages = []

In [57]:
type_dict

{'handle_savings_account_management': 1,
 'handle_loan_services': 2,
 'handle_credit_card_services': 3,
 'handle_investment_advisory': 4,
 'handle_international_transactions': 5}

In [61]:
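def get_key_by_value(d, value):
    # Reverse lookup: return the first key whose value equals `value`, or None if there is none
    # (the parameter is named `d` to avoid shadowing the built-in `dict`)
    for key, val in d.items():
        if val == value:
            return key
    return None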

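`get_key_by_value` scans the whole dictionary on every call. Because every label in `type_dict` is unique, the mapping can instead be inverted once and indexed directly. A minimal sketch (`label_to_function` is an illustrative name):

```python
# type_dict as printed above: function name -> intent label
type_dict = {'handle_savings_account_management': 1,
             'handle_loan_services': 2,
             'handle_credit_card_services': 3,
             'handle_investment_advisory': 4,
             'handle_international_transactions': 5}

# Invert once (valid only because every label is unique): intent label -> function name
label_to_function = {label: name for name, label in type_dict.items()}

# .get returns None for a missing label, matching get_key_by_value's behaviour
print(label_to_function.get(3))
print(label_to_function.get(9))
```
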
In [67]:
key = get_key_by_value(type_dict, 3)
print(key)  

handle_credit_card_services


In [165]:
# Format of a function-calling response_message
response_message

<OpenAIObject at 0x19e447872c0> JSON: {
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "handle_savings_account_management",
    "arguments": "{}"
  }
}

In [65]:
for index, row in train_temp_3.iterrows():
    text = row['Conversation']
    intention_category = row['type']
    function_name = get_key_by_value(type_dict, intention_category)
    
    assistant_message = {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": function_name,
            "arguments": "{}"
        }
    }
    
    few_shot_messages.append({"role": "user", "content": text})
    few_shot_messages.append(assistant_message)

In [66]:
few_shot_messages

[{'role': 'user', 'content': '我账户的利息是怎么计算的？是按月计算还是按季度？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '请问贵行的理财产品有哪些安全保障措施？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_investment_advisory', 'arguments': '{}'}},
 {'role': 'user', 'content': '我想了解一下在贵行开户后，是否可以获得贵宾理财服务？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '能不能在网上直接开户？还是必须去银行亲自办理？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '我想查询我的账户余额，可以电话查询吗？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '我想知道，你们银行的汇款是否可以实时跟踪？'},
 {'role': 'assistan

In [69]:
messages = system_message + few_shot_messages
messages

[{'role': 'system',
  'content': '你是一个智能银行客户接待应用，输入的每个user message都是某位银行客户的需求。你的每一次回答都必须调用function call来完成。请仔细甄别用户需求，并合理调用外部函数来进行回答。'},
 {'role': 'user', 'content': '我账户的利息是怎么计算的？是按月计算还是按季度？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '请问贵行的理财产品有哪些安全保障措施？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_investment_advisory', 'arguments': '{}'}},
 {'role': 'user', 'content': '我想了解一下在贵行开户后，是否可以获得贵宾理财服务？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '能不能在网上直接开户？还是必须去银行亲自办理？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '我想查询我的账户余额，可以电话查询吗？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name':

In [68]:
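def function_call_predict(messages, model='gpt-3.5-turbo-0613'):
    # Relies on `functions` (the function schemas) and `type_dict` defined in earlier cells

    # Request a completion; with function_call='auto' the model decides whether to call a function
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        functions=functions,
        function_call='auto',
    )
    response_message = response["choices"][0]["message"]

    # Extract the predicted class; 0 means the model did not call any function
    res = 0
    if response_message.get("function_call"):
        function_name = response_message["function_call"]["name"]
        res = type_dict[function_name]

    return res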

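`function_call_predict` raises a `KeyError` if the model ever returns a function name that is not in `type_dict`. A hedged variant of the parsing step that maps both "no function call" and "unknown function" to 0 is sketched below; `parse_intent` is a hypothetical helper, and the plain dicts stand in for the `response_message` object shown earlier, which the code above already reads with `.get`:

```python
# type_dict as printed earlier: function name -> intent label
type_dict = {'handle_savings_account_management': 1,
             'handle_loan_services': 2,
             'handle_credit_card_services': 3,
             'handle_investment_advisory': 4,
             'handle_international_transactions': 5}

def parse_intent(response_message, type_dict, default=0):
    """Map a chat response message to an intent label; `default` if no known function was called."""
    function_call = response_message.get("function_call")
    if not function_call:
        return default  # the model answered in plain text instead of calling a function
    return type_dict.get(function_call.get("name"), default)  # unknown names also fall back

# Plain dicts shaped like the response_message shown above
ok_msg = {"role": "assistant", "content": None,
          "function_call": {"name": "handle_loan_services", "arguments": "{}"}}
text_msg = {"role": "assistant", "content": "Hello"}
unknown_msg = {"role": "assistant", "content": None,
               "function_call": {"name": "handle_unknown", "arguments": "{}"}}
print(parse_intent(ok_msg, type_dict), parse_intent(text_msg, type_dict), parse_intent(unknown_msg, type_dict))
```
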
In [75]:
def process_dataset(dataset, 
                    messages=None,
                    model_name='gpt-4-0613', 
                    text_col_name='Conversation', 
                    prediction_col_name='function_call_prediction_4'):
    """
    Apply function_call_predict to every row of the dataset for intent recognition.

    :param dataset: DataFrame containing the data to process (indexed 0..n-1, as after read_csv)
    :param messages: list of dict, system and few-shot messages prepended to every request;
                     None (the default) means no system message or examples
    :param model_name: str, name of the model to use
    :param text_col_name: str, name of the text column passed to function calling
    :param prediction_col_name: str, name of the column that receives the predicted intent
    :return: the modified DataFrame
    """
    base_messages = [] if messages is None else list(messages)

    for index in range(len(dataset)):
        text = dataset.at[index, text_col_name]
        # Build a fresh request per row so user messages never accumulate, even after a failed attempt
        input_messages = base_messages + [{"role": "user", "content": text}]
        success = False
        while not success:
            try:
                result = function_call_predict(input_messages, model=model_name)
                dataset.at[index, prediction_col_name] = result
                success = True  # exit the retry loop on success
            except Exception as e:
                # Print the error and wait one minute before retrying
                print(f"Error on row {index}: {e}")
                time.sleep(60)

        # Print progress every 10 rows
        if index % 10 == 0:
            print(f"Processed {index}/{len(dataset)} rows")

    return dataset

In [78]:
test_df = process_dataset(dataset=test_df, 
                          messages=messages,
                          model_name='gpt-3.5-turbo-0613', 
                          prediction_col_name='function_call_prediction_3.5_new')

Processed 0/73 rows
Processed 10/73 rows
Processed 20/73 rows
Processed 30/73 rows
Processed 40/73 rows
Processed 50/73 rows
Processed 60/73 rows
Processed 70/73 rows


In [79]:
test_df

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4,function_call_prediction_3.5_new
0,我想了解一下贷款担保的具体要求。,2,"[-0.0121180210262537, -0.01918686553835869, 0....",2.0,2,2,2,2.0,2.0
1,我需要汇款到澳大利亚，贵行有什么特别的要求吗？,5,"[-0.020629247650504112, -0.005797294899821281,...",5.0,5,5,5,5.0,5.0
2,请问贵行国际汇款有哪些安全保障措施？,5,"[-0.001715182326734066, -0.017442874610424042,...",5.0,5,5,5,5.0,5.0
3,请问贵行储蓄账户是否有年费或者管理费？,1,"[0.004440059419721365, -0.016759661957621574, ...",1.0,1,1,1,1.0,1.0
4,我的储蓄账户需要更新个人信息，我该怎么操作？,1,"[-0.01662181131541729, -0.008774831891059875, ...",1.0,1,1,1,1.0,1.0
...,...,...,...,...,...,...,...,...,...
68,我有一笔汇款需要紧急发送到加拿大，你们可以加急处理吗？,5,"[-0.02858843468129635, -0.007617312949150801, ...",5.0,5,5,5,5.0,5.0
69,我想开一个小孩的储蓄账户，需要父母的信息吗？,1,"[0.01633516512811184, -0.003917648456990719, -...",1.0,1,1,1,1.0,1.0
70,我想知道，办理贷款是否需要担保人？,2,"[-0.0010422804625704885, -0.002528570592403412...",0.0,2,2,2,2.0,2.0
71,我是留学生，想了解贵行的留学生汇款服务。,5,"[-0.013510254211723804, -0.0068598841316998005...",5.0,5,5,5,5.0,5.0


Few-shot prediction results on the test set:

In [81]:
(test_df["function_call_prediction_3.5_new"] != test_df["type"]).sum()

1

In [82]:
test_df.shape

(73, 9)

In [83]:
1 - 1/73

0.9863013698630136

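Compared with the 10 test samples misclassified by zero-shot GPT-3.5, adding the misclassified training samples as few-shot examples brings the errors down to 1, a clear gain in accuracy.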

Inspect the misclassified sample:

In [84]:
test_df[test_df["function_call_prediction_3.5_new"] != test_df["type"]]

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4,function_call_prediction_3.5_new
63,我刚从国外回来，想开一个外币储蓄账户。,1,"[-0.008502471260726452, -0.01901247166097164, ...",1.0,1,1,1,1.0,5.0


|Intent-recognition method|Zero-shot classification (full-dataset accuracy)|Supervised classification (test-set accuracy)|
|:--:|:--:|:--:|
|function_call_3.5|**0.89863**|/|
|ML_RF|/|**0.945205**|
|ML_LGBM|/|**0.986301**|
|ML_XGB|/|**1.0**|
|function_call_4|**0.98356**|/|
|Few-shot-GPT3.5|/|**0.986301**|

- Supervised intent classification with GPT-4 under few-shot prompting

In [85]:
train_temp_4 = train_df[train_df["function_call_prediction_4"] != train_df["type"]][['Conversation', 'type', 'function_call_prediction_4']]

In [101]:
train_temp_4

Unnamed: 0,Conversation,type,function_call_prediction_4
24,我对贵行提供的伊斯兰金融服务感兴趣，可以提供一些信息吗？,4,0.0
182,请问使用信用卡在国外消费，汇率是怎么计算的？,3,5.0
183,我想了解一下贵行的跨境汇款储蓄账户。,1,5.0
184,你们银行的利率是怎样的？比其他银行高吗？,1,0.0
264,我丢失了我的银行卡，需要怎样办理挂失和重新申请？,1,3.0


In [87]:
# Same (Chinese) system prompt as in the GPT-3.5 few-shot experiment above
system_message = [{"role": "system", "content": "你是一个智能银行客户接待应用，输入的每个user message都是某位银行客户的需求。\
你的每一次回答都必须调用function call来完成。请仔细甄别用户需求，并合理调用外部函数来进行回答。"}]

In [88]:
few_shot_messages = []

In [89]:
type_dict

{'handle_savings_account_management': 1,
 'handle_loan_services': 2,
 'handle_credit_card_services': 3,
 'handle_investment_advisory': 4,
 'handle_international_transactions': 5}

In [90]:
for index, row in train_temp_4.iterrows():
    text = row['Conversation']
    intention_category = row['type']
    function_name = get_key_by_value(type_dict, intention_category)
    
    assistant_message = {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": function_name,
            "arguments": "{}"
        }
    }
    
    few_shot_messages.append({"role": "user", "content": text})
    few_shot_messages.append(assistant_message)

In [91]:
few_shot_messages

[{'role': 'user', 'content': '我对贵行提供的伊斯兰金融服务感兴趣，可以提供一些信息吗？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_investment_advisory', 'arguments': '{}'}},
 {'role': 'user', 'content': '请问使用信用卡在国外消费，汇率是怎么计算的？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_credit_card_services', 'arguments': '{}'}},
 {'role': 'user', 'content': '我想了解一下贵行的跨境汇款储蓄账户。'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '你们银行的利率是怎样的？比其他银行高吗？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '我丢失了我的银行卡，需要怎样办理挂失和重新申请？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}}]

In [92]:
messages = system_message + few_shot_messages
messages

[{'role': 'system',
  'content': '你是一个智能银行客户接待应用，输入的每个user message都是某位银行客户的需求。你的每一次回答都必须调用function call来完成。请仔细甄别用户需求，并合理调用外部函数来进行回答。'},
 {'role': 'user', 'content': '我对贵行提供的伊斯兰金融服务感兴趣，可以提供一些信息吗？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_investment_advisory', 'arguments': '{}'}},
 {'role': 'user', 'content': '请问使用信用卡在国外消费，汇率是怎么计算的？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_credit_card_services', 'arguments': '{}'}},
 {'role': 'user', 'content': '我想了解一下贵行的跨境汇款储蓄账户。'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '你们银行的利率是怎样的？比其他银行高吗？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle_savings_account_management',
   'arguments': '{}'}},
 {'role': 'user', 'content': '我丢失了我的银行卡，需要怎样办理挂失和重新申请？'},
 {'role': 'assistant',
  'content': None,
  'function_call': {'name': 'handle

In [102]:
test_df = process_dataset(dataset=test_df, 
                          messages=messages,
                          model_name='gpt-4-0613', 
                          prediction_col_name='function_call_prediction_4_new')

Error on row 0: The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 247bf770b9835868a3ccac470df46603 in your email.) {
  "error": {
    "message": "The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 247bf770b9835868a3ccac470df46603 in your email.)",
    "type": "server_error",
    "param": null,
    "code": null
  }
}
 500 {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 247bf770b9835868a3ccac470df46603 in your email.)', 'type': 'server_error', 'param': None, 'code': None}} {'Date':

In [103]:
test_df

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4,function_call_prediction_3.5_new,function_call_prediction_4_new
0,我想了解一下贷款担保的具体要求。,2,"[-0.0121180210262537, -0.01918686553835869, 0....",2.0,2,2,2,2.0,2.0,2.0
1,我需要汇款到澳大利亚，贵行有什么特别的要求吗？,5,"[-0.020629247650504112, -0.005797294899821281,...",5.0,5,5,5,5.0,5.0,5.0
2,请问贵行国际汇款有哪些安全保障措施？,5,"[-0.001715182326734066, -0.017442874610424042,...",5.0,5,5,5,5.0,5.0,5.0
3,请问贵行储蓄账户是否有年费或者管理费？,1,"[0.004440059419721365, -0.016759661957621574, ...",1.0,1,1,1,1.0,1.0,1.0
4,我的储蓄账户需要更新个人信息，我该怎么操作？,1,"[-0.01662181131541729, -0.008774831891059875, ...",1.0,1,1,1,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...
68,我有一笔汇款需要紧急发送到加拿大，你们可以加急处理吗？,5,"[-0.02858843468129635, -0.007617312949150801, ...",5.0,5,5,5,5.0,5.0,5.0
69,我想开一个小孩的储蓄账户，需要父母的信息吗？,1,"[0.01633516512811184, -0.003917648456990719, -...",1.0,1,1,1,1.0,1.0,1.0
70,我想知道，办理贷款是否需要担保人？,2,"[-0.0010422804625704885, -0.002528570592403412...",0.0,2,2,2,2.0,2.0,2.0
71,我是留学生，想了解贵行的留学生汇款服务。,5,"[-0.013510254211723804, -0.0068598841316998005...",5.0,5,5,5,5.0,5.0,5.0


Few-shot prediction results on the test set:

In [104]:
(test_df["function_call_prediction_4_new"] != test_df["type"]).sum()

1

In [105]:
test_df.shape

(73, 10)

In [106]:
1 - 1/73

0.9863013698630136

Inspect the misclassified sample:

In [107]:
test_df[test_df["function_call_prediction_4_new"] != test_df["type"]]

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4,function_call_prediction_3.5_new,function_call_prediction_4_new
63,我刚从国外回来，想开一个外币储蓄账户。,1,"[-0.008502471260726452, -0.01901247166097164, ...",1.0,1,1,1,1.0,5.0,5.0


|Intent-recognition method|Zero-shot classification (full-dataset accuracy)|Supervised classification (test-set accuracy)|
|:--:|:--:|:--:|
|function_call_3.5|**0.89863**|/|
|ML_RF|/|**0.945205**|
|ML_LGBM|/|**0.986301**|
|ML_XGB|/|**1.0**|
|function_call_4|**0.98356**|/|
|Few-shot-GPT3.5|/|**0.986301**|
|Few-shot-GPT4|/|**0.986301**|

### V. Supervised intent recognition based on text search

In [323]:
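import ast
from sklearn.metrics.pairwise import cosine_similarity

def to_vector(e):
    # Embeddings read back from CSV are strings such as "[-0.008, ...]"; parse them into lists
    return ast.literal_eval(e) if isinstance(e, str) else e

# train_embeddings and test_embeddings hold the embedding vectors of the training and test texts
train_embeddings = np.stack(train_df["embedding"].apply(to_vector).values)
test_embeddings = np.stack(test_df["embedding"].apply(to_vector).values)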

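`cosine_similarity` compares two vectors by the angle between them: u·v / (|u| |v|). OpenAI's embeddings documentation states that its embeddings are normalized to length 1, in which case cosine similarity reduces to a plain dot product (worth confirming for the embedding model actually used). A pure-Python illustration:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the vector lengths
    return dot(u, v) / (norm(u) * norm(v))

def normalize(u):
    # Scale u to unit length
    n = norm(u)
    return [a / n for a in u]

u, v = [3.0, 4.0], [4.0, 3.0]
print(cosine(u, v))                     # 24 / 25 = 0.96
print(dot(normalize(u), normalize(v)))  # same value up to rounding, since both vectors now have length 1
```
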
In [345]:
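# Cosine similarity between every pair of training texts
train_cos_sim_matrix = cosine_similarity(train_embeddings, train_embeddings)

# For each training text, find the 3 most similar other training texts.
# Subtracting the identity lowers each text's self-similarity (about 1) so it does not match itself;
# argpartition returns the 3 indices unordered, so sim_1 is not guaranteed to be the single closest text.
train_most_similar_indices = np.argpartition(-(train_cos_sim_matrix - np.eye(train_cos_sim_matrix.shape[0])), 3, axis=1)[:, :3]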

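`np.argpartition` only guarantees that the first 3 columns hold the 3 largest similarities, not that they are sorted, so `sim_1` is not necessarily the nearest neighbour. The sketch below keeps the cheap partition step and then sorts just the k candidates. `top_k_neighbors` is an illustrative name, and setting the diagonal to `-inf` replaces the subtract-the-identity trick for excluding each text's match with itself.

```python
import numpy as np

def top_k_neighbors(sim, k=3, exclude_self=False):
    """Indices of the k highest-similarity columns in each row, sorted from most to least similar."""
    sim = np.array(sim, dtype=float)  # work on a copy so the caller's matrix is untouched
    if exclude_self:
        np.fill_diagonal(sim, -np.inf)  # assumes a square matrix: a text must not match itself
    # Partition step: the first k columns hold the k largest similarities, in arbitrary order
    candidates = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    # Sort only those k candidates by similarity, highest first
    candidate_sims = np.take_along_axis(sim, candidates, axis=1)
    order = np.argsort(-candidate_sims, axis=1)
    return np.take_along_axis(candidates, order, axis=1)

# Toy 4x4 similarity matrix standing in for train_cos_sim_matrix
sim = np.array([[1.0, 0.2, 0.9, 0.5],
                [0.2, 1.0, 0.3, 0.8],
                [0.9, 0.3, 1.0, 0.1],
                [0.5, 0.8, 0.1, 1.0]])
print(top_k_neighbors(sim, k=2, exclude_self=True))
```

For the notebook's data this would be `top_k_neighbors(train_cos_sim_matrix, k=3, exclude_self=True)` and `top_k_neighbors(test_cos_sim_matrix, k=3)`.
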
In [346]:
train_most_similar_indices

array([[187, 115, 134],
       [ 25, 204,  34],
       [121, 161,  54],
       [287, 216, 214],
       [ 46,  53, 147],
       [206, 181,  18],
       [200, 275,  61],
       [ 30,  71,  14],
       [263,  96,  27],
       [114, 191,  55],
       [221, 140, 156],
       [ 70, 121, 125],
       [105, 169,  65],
       [  1, 204,  34],
       [291, 193, 183],
       [ 29, 246,  88],
       [ 24, 275,  70],
       [227, 202,  54],
       [103,   5, 165],
       [119, 166, 231],
       [183,  96, 228],
       [191, 253,   4],
       [ 81,  74,  40],
       [ 40,  61, 275],
       [ 62,  84, 288],
       [223,   1, 204],
       [ 84, 265,  16],
       [248,   8, 263],
       [126,  25, 146],
       [233, 246,  15],
       [244, 248,   7],
       [227, 288,  50],
       [244, 175,  50],
       [126, 243, 104],
       [204, 241, 284],
       [232, 194, 188],
       [115, 278, 212],
       [285,  87,  68],
       [ 64,  75, 139],
       [ 68, 200,  37],
       [ 23, 251,  22],
       [100,  58

In [348]:
train_most_similar_indices_df = pd.DataFrame(train_most_similar_indices, columns=['sim_1', 'sim_2', 'sim_3'])
train_most_similar_indices_df.head()

Unnamed: 0,sim_1,sim_2,sim_3
0,187,115,134
1,25,204,34
2,121,161,54
3,287,216,214
4,46,53,147


In [349]:
train_df_final = pd.concat([train_df, train_most_similar_indices_df], axis=1)
train_df_final.head()

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4,sim_1,sim_2,sim_3
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,...",1,1,1,1,1.0,187,115,134
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,...",1,1,1,1,1.0,25,204,34
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,...",4,4,4,4,4.0,121,161,54
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,...",3,3,3,3,3.0,287,216,214
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ...",2,2,2,2,2.0,46,53,147


Verify the similarity-search results:

In [342]:
train_df_final.Conversation[0]

'我的储蓄账户是否可以与支付软件直接绑定？'

In [353]:
train_df_final.type[0]

1

In [343]:
train_df_final.Conversation[187]

'我想知道我的储蓄账户是否可以绑定多张银行卡？'

In [352]:
train_df_final.type[187]

1

In [350]:
train_df_final.Conversation[115]

'请问我可以用手机银行管理我的储蓄账户吗？'

In [354]:
train_df_final.type[115]

1

In [351]:
train_df_final.Conversation[134]

'请问储蓄账户能绑定多张银行卡吗？'

In [355]:
train_df_final.type[134]

1

In [357]:
train_df_final['sim_1_target'] = train_df_final.sim_1.apply(lambda x: train_df_final.type[x])
train_df_final['sim_2_target'] = train_df_final.sim_2.apply(lambda x: train_df_final.type[x])
train_df_final['sim_3_target'] = train_df_final.sim_3.apply(lambda x: train_df_final.type[x])

In [358]:
train_df_final.head()

Unnamed: 0,Conversation,type,embedding,function_call_prediction_3.5,RF_pre,LGBM_pre,XGB_pre,function_call_prediction_4,sim_1,sim_2,sim_3,sim_1_target,sim_2_target,sim_3_target
0,我的储蓄账户是否可以与支付软件直接绑定？,1,"[-0.008010320365428925, -0.014849383383989334,...",1,1,1,1,1.0,187,115,134,1,1,1
1,请问如何设置储蓄账户的自动转账功能？,1,"[-0.007989817298948765, -0.025212900713086128,...",1,1,1,1,1.0,25,204,34,1,1,1
2,请问贵行有提供关于投资组合管理的咨询服务吗？,4,"[-0.0057016052305698395, -0.01597500406205654,...",4,4,4,4,4.0,121,161,54,4,4,4
3,请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。,3,"[-0.0032490252051502466, 0.003358007175847888,...",3,3,3,3,3.0,287,216,214,3,3,3
4,我想知道，申请房贷需要提供哪些资料？,2,"[-0.003918064758181572, 0.012988920323550701, ...",2,2,2,2,2.0,46,53,147,2,2,2


Accuracy of labelling each training text with the type of its `sim_1` neighbour:

In [359]:
(train_df_final["sim_1_target"] != train_df_final["type"]).sum()

41

In [313]:
train_df.shape

(292, 8)

In [360]:
1 - 41/292

0.8595890410958904
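
Using only `sim_1_target` ignores the other two neighbours. A natural next step, not part of the original notebook, is a 3-nearest-neighbour majority vote over `sim_1_target`, `sim_2_target` and `sim_3_target`. The tie-break in this sketch (nearest neighbour wins) only makes sense once the neighbour columns are sorted by similarity:

```python
from collections import Counter

def vote(neighbor_labels):
    """Majority label among neighbours listed nearest-first; a tie goes to the nearest tied label."""
    counts = Counter(neighbor_labels)
    best = max(counts.values())
    for label in neighbor_labels:  # scanning nearest-first resolves ties
        if counts[label] == best:
            return label

print(vote([4, 5, 5]))  # two of three neighbours say 5
print(vote([1, 3, 5]))  # three-way tie: the nearest neighbour's label, 1, wins
```

Applied to the DataFrame, e.g. `train_df_final[['sim_1_target', 'sim_2_target', 'sim_3_target']].apply(lambda r: vote(list(r)), axis=1)`; whether it beats the 0.8596 single-neighbour accuracy has not been measured here.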

Match the test set against the training set:

In [362]:
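# Cosine similarity between each test text and each training text
test_cos_sim_matrix = cosine_similarity(test_embeddings, train_embeddings)

# For each test text, find the 3 most similar training texts (argpartition returns them unordered)
test_most_similar_indices = np.argpartition(-test_cos_sim_matrix, 3, axis=1)[:, :3]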

In [363]:
test_most_similar_indices

array([[253, 147, 122],
       [118, 259, 135],
       [ 18, 119, 166],
       [194,   5, 181],
       [ 58,  28, 124],
       [ 98, 232, 194],
       [ 34, 284, 203],
       [ 99, 199,  89],
       [  4,  46,  21],
       [179,  48,  65],
       [ 54,  99,   2],
       [166, 263,  96],
       [  3,  93, 216],
       [248,  55, 107],
       [  5,  39, 181],
       [ 50,  30,  14],
       [124, 108,  86],
       [160, 227,  84],
       [158, 141, 134],
       [157, 130,  96],
       [171, 282,  77],
       [136, 137, 119],
       [158, 231,  19],
       [ 57, 282, 218],
       [ 82,  19, 231],
       [226, 213, 122],
       [204,  13,   1],
       [ 24, 234,  63],
       [270, 112, 247],
       [  4, 253, 147],
       [129,  71, 171],
       [ 99, 287,  57],
       [284,  81,  34],
       [  5, 144, 237],
       [253,  21, 191],
       [112, 282, 111],
       [ 39, 245, 221],
       [  6, 244, 245],
       [147,  71,   4],
       [ 80, 253, 154],
       [157,  64, 130],
       [ 96, 183

In [364]:
test_most_similar_indices_df = pd.DataFrame(test_most_similar_indices, columns=['sim_1', 'sim_2', 'sim_3'])
test_most_similar_indices_df.head()

Unnamed: 0,sim_1,sim_2,sim_3
0,253,147,122
1,118,259,135
2,18,119,166
3,194,5,181
4,58,28,124


In [365]:
test_df_final = pd.concat([test_df, test_most_similar_indices_df], axis=1)
test_df_final.head()

| |Conversation|type|embedding|function_call_prediction_3.5|RF_pre|LGBM_pre|XGB_pre|function_call_prediction_4|sim_1|sim_2|sim_3|
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|0|我想了解一下贷款担保的具体要求。|2|[-0.0121180210262537, -0.01918686553835869, 0....|2.0|2|2|2|2.0|253|147|122|
|1|我需要汇款到澳大利亚，贵行有什么特别的要求吗？|5|[-0.020629247650504112, -0.005797294899821281,...|5.0|5|5|5|5.0|118|259|135|
|2|请问贵行国际汇款有哪些安全保障措施？|5|[-0.001715182326734066, -0.017442874610424042,...|5.0|5|5|5|5.0|18|119|166|
|3|请问贵行储蓄账户是否有年费或者管理费？|1|[0.004440059419721365, -0.016759661957621574, ...|1.0|1|1|1|1.0|194|5|181|
|4|我的储蓄账户需要更新个人信息，我该怎么操作？|1|[-0.01662181131541729, -0.008774831891059875, ...|1.0|1|1|1|1.0|58|28|124|


Spot-check the similarity-search results:

In [366]:
test_df_final.Conversation[0]

'我想了解一下贷款担保的具体要求。'

In [367]:
test_df_final.type[0]

2

In [368]:
train_df_final.Conversation[253]

'我想了解一下企业贷款的申请条件。'

In [369]:
train_df_final.type[253]

2

In [370]:
train_df_final.Conversation[147]

'我想了解一下，办理房屋抵押贷款的流程是怎样的？'

In [371]:
train_df_final.type[147]

2

In [372]:
train_df_final.Conversation[122]

'我想了解一下关于海外留学贷款的细节。'

In [373]:
train_df_final.type[122]

2
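The cell-by-cell spot check above (print the test sentence, then each neighbour's text and label) can be wrapped in a small helper. This is a hypothetical convenience function, not part of the original notebook, and it assumes the `Conversation` / `type` / `sim_1`..`sim_3` column layout used here.

```python
import pandas as pd

def show_neighbours(test_frame: pd.DataFrame, train_frame: pd.DataFrame, row: int,
                    sim_cols=("sim_1", "sim_2", "sim_3")) -> pd.DataFrame:
    """Tabulate one test sentence next to its retrieved training neighbours."""
    records = [{"source": "test", "index": row,
                "Conversation": test_frame.at[row, "Conversation"],
                "type": test_frame.at[row, "type"]}]
    for col in sim_cols:
        idx = int(test_frame.at[row, col])  # index label of the neighbour in train_frame
        records.append({"source": col, "index": idx,
                        "Conversation": train_frame.at[idx, "Conversation"],
                        "type": train_frame.at[idx, "type"]})
    return pd.DataFrame(records)

# Toy frames standing in for train_df_final / test_df_final
train = pd.DataFrame({"Conversation": ["loan terms?", "open an account", "card limit"],
                      "type": [2, 1, 3]})
test = pd.DataFrame({"Conversation": ["loan guarantee?"], "type": [2],
                     "sim_1": [0], "sim_2": [2], "sim_3": [1]})
print(show_neighbours(test, train, 0))
```

With the notebook's frames, `show_neighbours(test_df_final, train_df_final, 0)` would reproduce the checks above in a single table.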

In [375]:
# Look up the label of each retrieved training neighbour (sim_* hold training-set index labels)
test_df_final['sim_1_target'] = test_df_final.sim_1.apply(lambda x: train_df_final.type[x])
test_df_final['sim_2_target'] = test_df_final.sim_2.apply(lambda x: train_df_final.type[x])
test_df_final['sim_3_target'] = test_df_final.sim_3.apply(lambda x: train_df_final.type[x])
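The three `apply` calls look up each neighbour's label one row at a time. If `train_df_final` keeps its default `RangeIndex` (an assumption here, so that index labels and row positions coincide), the same lookup can be written as a single NumPy fancy-indexing step: indexing the label array with the `(n, 3)` matrix of neighbour indices returns a matrix of the same shape holding the neighbours' labels. The values below are toy data.

```python
import numpy as np
import pandas as pd

train_labels = np.array([2, 5, 1, 3])  # toy stand-in for train_df_final["type"]
neighbours = pd.DataFrame({"sim_1": [0, 3], "sim_2": [2, 1], "sim_3": [1, 0]})  # toy indices

# labels[index_matrix] has the same shape as index_matrix, one label per neighbour
neighbour_labels = train_labels[neighbours[["sim_1", "sim_2", "sim_3"]].to_numpy()]
targets = pd.DataFrame(neighbour_labels,
                       columns=["sim_1_target", "sim_2_target", "sim_3_target"])
print(targets)
```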

In [376]:
test_df_final.head()

| |Conversation|type|embedding|function_call_prediction_3.5|RF_pre|LGBM_pre|XGB_pre|function_call_prediction_4|sim_1|sim_2|sim_3|sim_1_target|sim_2_target|sim_3_target|
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|0|我想了解一下贷款担保的具体要求。|2|[-0.0121180210262537, -0.01918686553835869, 0....|2.0|2|2|2|2.0|253|147|122|2|2|2|
|1|我需要汇款到澳大利亚，贵行有什么特别的要求吗？|5|[-0.020629247650504112, -0.005797294899821281,...|5.0|5|5|5|5.0|118|259|135|5|5|5|
|2|请问贵行国际汇款有哪些安全保障措施？|5|[-0.001715182326734066, -0.017442874610424042,...|5.0|5|5|5|5.0|18|119|166|4|5|5|
|3|请问贵行储蓄账户是否有年费或者管理费？|1|[0.004440059419721365, -0.016759661957621574, ...|1.0|1|1|1|1.0|194|5|181|1|1|1|
|4|我的储蓄账户需要更新个人信息，我该怎么操作？|1|[-0.01662181131541729, -0.008774831891059875, ...|1.0|1|1|1|1.0|58|28|124|1|1|3|


Check the test-set accuracy:

In [377]:
(test_df_final["sim_1_target"] != test_df_final["type"]).sum()

15

In [378]:
test_df.shape

(73, 8)

In [379]:
1 - 15/73

0.7945205479452055

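|Intent recognition method|Zero-shot classification (full-dataset accuracy)|Supervised classification (test-set accuracy)|
|:--:|:--:|:--:|
|function_call_3.5|**0.89863**|/|
|Machine learning (RF)|/|**0.945205**|
|Machine learning (LGBM)|/|**0.986301**|
|Machine learning (XGB)|/|**1.0**|
|function_call_4|**0.98356**|/|
|Few-shot-GPT3.5|/|**0.986301**|
|Few-shot-GPT4|/|**0.986301**|
|Text-matching retrieval|/|**0.794520**|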
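Each supervised entry in the table is the fraction of test rows on which one prediction column agrees with `type`. A minimal sketch of computing all of them in one pass is shown below on a toy frame; the column names mirror the notebook's, the values are invented for illustration. The comparison is numeric, so float columns such as `function_call_prediction_4` (e.g. `2.0`) still match integer labels.

```python
import pandas as pd

def column_accuracies(df: pd.DataFrame, pred_cols, label_col: str = "type") -> pd.Series:
    """Fraction of rows where each prediction column equals the label column."""
    return pd.Series({c: float((df[c] == df[label_col]).mean()) for c in pred_cols})

toy = pd.DataFrame({
    "type":         [1, 2, 3, 5],
    "RF_pre":       [1, 2, 3, 4],
    "XGB_pre":      [1, 2, 3, 5],
    "sim_1_target": [1, 1, 3, 4],
})
print(column_accuracies(toy, ["RF_pre", "XGB_pre", "sim_1_target"]))
```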

### VI. Improving Intent Recognition Accuracy with Model Ensembling

In [380]:
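# Columns that vote: the two function-call models, the three tree models,
# and the labels of the three most similar training texts
cols = ['function_call_prediction_3.5', 'function_call_prediction_4', 'RF_pre', 'LGBM_pre', 'XGB_pre', 
        'sim_1_target', 'sim_2_target', 'sim_3_target']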

In [381]:
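# Row-wise majority vote with apply: mode() returns all tied values in ascending order,
# so iloc[0] breaks ties in favor of the smallest label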
mode_series = train_df_final[cols].apply(lambda x: x.mode().iloc[0], axis=1)

In [384]:
mode_series[:10]

0    1.0
1    1.0
2    4.0
3    3.0
4    2.0
5    1.0
6    1.0
7    3.0
8    5.0
9    2.0
dtype: float64

In [385]:
# Store each row's majority-vote prediction in a new column
train_df_final['Mode_preds'] = mode_series

In [386]:
train_df_final.head()

| |Conversation|type|embedding|function_call_prediction_3.5|RF_pre|LGBM_pre|XGB_pre|function_call_prediction_4|sim_1|sim_2|sim_3|sim_1_target|sim_2_target|sim_3_target|Mode_preds|
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|0|我的储蓄账户是否可以与支付软件直接绑定？|1|[-0.008010320365428925, -0.014849383383989334,...|1|1|1|1|1.0|187|115|134|1|1|1|1.0|
|1|请问如何设置储蓄账户的自动转账功能？|1|[-0.007989817298948765, -0.025212900713086128,...|1|1|1|1|1.0|25|204|34|1|1|1|1.0|
|2|请问贵行有提供关于投资组合管理的咨询服务吗？|4|[-0.0057016052305698395, -0.01597500406205654,...|4|4|4|4|4.0|121|161|54|4|4|4|4.0|
|3|请问贵行的信用卡可以申请临时提升额度吗？比如旅行时。|3|[-0.0032490252051502466, 0.003358007175847888,...|3|3|3|3|3.0|287|216|214|3|3|3|3.0|
|4|我想知道，申请房贷需要提供哪些资料？|2|[-0.003918064758181572, 0.012988920323550701, ...|2|2|2|2|2.0|46|53|147|2|2|2|2.0|


In [388]:
(train_df_final["Mode_preds"] != train_df_final["type"]).sum()

2

In [389]:
train_df_final[train_df_final["Mode_preds"] != train_df_final["type"]]

| |Conversation|type|embedding|function_call_prediction_3.5|RF_pre|LGBM_pre|XGB_pre|function_call_prediction_4|sim_1|sim_2|sim_3|sim_1_target|sim_2_target|sim_3_target|Mode_preds|
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|184|你们银行的利率是怎样的？比其他银行高吗？|1|[-0.0010289129568263888, -0.008435490541160107...|0|1|1|1|2.0|209|174|94|2|2|2|2.0|
|199|请问贵行的外汇存款账户有哪些优势？|5|[-0.005315988324582577, -0.008732923306524754,...|1|5|5|5|5.0|237|50|5|1|1|1|1.0|
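Row 199 shows how `mode().iloc[0]` resolves ties. Its eight votes split 4 to 4 between labels 1 and 5, `Series.mode` returns both tied values in ascending order, and `iloc[0]` therefore picks 1. The block below is a minimal sketch of an alternative tie-break, assuming one wanted a designated model to cast the deciding vote; the choice of `XGB_pre` is purely illustrative and is not something the notebook does.

```python
import pandas as pd

def vote_with_priority(row: pd.Series, cols, priority_col: str):
    """Majority vote over `cols`; on a tie, prefer `priority_col` if its label is among the tied ones."""
    counts = row[cols].value_counts()
    tied = counts[counts == counts.max()].index
    if len(tied) > 1 and row[priority_col] in tied:
        return row[priority_col]
    # Otherwise fall back to the smallest tied label, as mode().iloc[0] does
    return min(tied)

cols = ["function_call_prediction_3.5", "function_call_prediction_4", "RF_pre", "LGBM_pre",
        "XGB_pre", "sim_1_target", "sim_2_target", "sim_3_target"]
# The eight votes of training row 199 from the output above (true type: 5)
row = pd.Series({"function_call_prediction_3.5": 1, "function_call_prediction_4": 5.0,
                 "RF_pre": 5, "LGBM_pre": 5, "XGB_pre": 5,
                 "sim_1_target": 1, "sim_2_target": 1, "sim_3_target": 1})
print(row[cols].mode().iloc[0], vote_with_priority(row, cols, "XGB_pre"))  # 1.0 5.0
```

On a full DataFrame this would be applied row-wise, e.g. `train_df_final.apply(vote_with_priority, axis=1, args=(cols, "XGB_pre"))`. Giving one model the casting vote makes the ensemble lean on that model in close cases, so which model to trust (if any) is better chosen on validation data than on the test set.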


In [390]:
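# Row-wise majority vote with apply: mode() returns all tied values in ascending order,
# so iloc[0] breaks ties in favor of the smallest label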
mode_series = test_df_final[cols].apply(lambda x: x.mode().iloc[0], axis=1)

In [391]:
mode_series[:10]

0    2.0
1    5.0
2    5.0
3    1.0
4    1.0
5    3.0
6    1.0
7    1.0
8    2.0
9    5.0
dtype: float64

In [392]:
# Store each row's majority-vote prediction in a new column
test_df_final['Mode_preds'] = mode_series

In [393]:
test_df_final.head()

| |Conversation|type|embedding|function_call_prediction_3.5|RF_pre|LGBM_pre|XGB_pre|function_call_prediction_4|sim_1|sim_2|sim_3|sim_1_target|sim_2_target|sim_3_target|Mode_preds|
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|0|我想了解一下贷款担保的具体要求。|2|[-0.0121180210262537, -0.01918686553835869, 0....|2.0|2|2|2|2.0|253|147|122|2|2|2|2.0|
|1|我需要汇款到澳大利亚，贵行有什么特别的要求吗？|5|[-0.020629247650504112, -0.005797294899821281,...|5.0|5|5|5|5.0|118|259|135|5|5|5|5.0|
|2|请问贵行国际汇款有哪些安全保障措施？|5|[-0.001715182326734066, -0.017442874610424042,...|5.0|5|5|5|5.0|18|119|166|4|5|5|5.0|
|3|请问贵行储蓄账户是否有年费或者管理费？|1|[0.004440059419721365, -0.016759661957621574, ...|1.0|1|1|1|1.0|194|5|181|1|1|1|1.0|
|4|我的储蓄账户需要更新个人信息，我该怎么操作？|1|[-0.01662181131541729, -0.008774831891059875, ...|1.0|1|1|1|1.0|58|28|124|1|1|3|1.0|


In [394]:
(test_df_final["Mode_preds"] != test_df_final["type"]).sum()

0

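With the majority vote, all 73 test samples are now classified correctly, so test-set intent recognition accuracy reaches 100% (on the training set, the two rows listed above remain misclassified).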

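|Intent recognition method|Zero-shot classification (full-dataset accuracy)|Supervised classification (test-set accuracy)|
|:--:|:--:|:--:|
|function_call_3.5|**0.89863**|/|
|Machine learning (RF)|/|**0.945205**|
|Machine learning (LGBM)|/|**0.986301**|
|Machine learning (XGB)|/|**1.0**|
|function_call_4|**0.98356**|/|
|Few-shot-GPT3.5|/|**0.986301**|
|Few-shot-GPT4|/|**0.986301**|
|Text-matching retrieval|/|**0.794520**|
|Majority-vote ensemble|/|**1.0**|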

Finally, save the datasets:

In [395]:
train_df_final.to_csv('./data/train_dataset_final.csv', index=False)
test_df_final.to_csv('./data/test_dataset_final.csv', index=False)
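`to_csv` writes each embedding list as its text representation, so after `pd.read_csv` the `embedding` column holds strings rather than lists. Below is a minimal sketch of restoring them. It assumes the column was saved from Python lists of floats, as the outputs above suggest; the `str()` of such a list is valid JSON, so `json.loads` suffices (`ast.literal_eval` would also work). The toy frame and in-memory buffer stand in for the real files.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({"Conversation": ["a", "b"], "type": [1, 2],
                   "embedding": [[0.1, -0.25], [0.5, 0.125]]})
buf = io.StringIO()
df.to_csv(buf, index=False)  # lists are written as their str() form
buf.seek(0)

restored = pd.read_csv(buf)
print(type(restored.loc[0, "embedding"]))  # <class 'str'>
restored["embedding"] = restored["embedding"].apply(json.loads)
print(restored.loc[0, "embedding"])  # [0.1, -0.25]
```

After parsing, `np.array(restored["embedding"].tolist())` would give the 2-D matrix that similarity functions expect.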

<center><img src="https://ml2022.oss-cn-hangzhou.aliyuncs.com/img/202401111951031.png" alt="Overview of LLM-based user intent recognition methods" style="zoom:33%;" />