# Using the LLM API

This chapter introduces how to apply for API access to four large language models (ChatGPT, Wenxin Yiyan, iFlytek Spark, and Zhipu GLM) and how to call each native API from Python. Choose whichever API you are able to apply for and read the corresponding section.

* ChatGPT: recommended for readers whose network can reach OpenAI's services (a proxy may be required);

* Wenxin Yiyan: there is currently no free-token offer for new users, so it is recommended for paying users and users who already hold Wenxin tokens;

* iFlytek Spark: new users receive free tokens, so it is recommended for free users;

* Zhipu GLM: new users receive free tokens, so it is recommended for free users.

If you need to use LLM in LangChain, you can refer to the calling method in [LLM access to LangChain](https://github.com/datawhalechina/llm-universe/blob/main/notebook/C4%20%E6%9E%84%E5%BB%BA%20RAG%20%E5%BA%94%E7%94%A8/1.LLM%20%E6%8E%A5%E5%85%A5%20LangChain.ipynb).

## 1. Using ChatGPT

ChatGPT, released in November 2022, is a representative product of the currently popular Large Language Model (LLM). At the end of 2022, it was ChatGPT's amazing performance that triggered the LLM craze. To date, GPT-4 released by OpenAI is still the representative of the upper limit of LLM performance, and ChatGPT is still the LLM product with the largest number of users, the greatest popularity, and the greatest development potential. In fact, in the eyes of outsiders, ChatGPT is a synonym for LLM.

In addition to its free web product, OpenAI provides several ChatGPT APIs that let developers call ChatGPT through the Python SDK or plain HTTP requests and embed LLM capabilities in their own services. The main models are ChatGPT-3.5 and GPT-4, and each is available with several context lengths. For example, ChatGPT-3.5 has the original 4K-context model and the 16K-context model `gpt-3.5-turbo-16k-0613`.

### 1.1 API Application Guidelines

#### Get and configure the OpenAI API key

The OpenAI API call service is paid. Every developer needs to obtain and configure the OpenAI API key before accessing ChatGPT in the application they build. In this section, we will briefly describe how to obtain and configure the OpenAI API key.

Before obtaining an OpenAI API key, we need to register an account on the [OpenAI official website](https://openai.com/). Here we assume you already have an OpenAI account and have logged in on the official website. After logging in, you will see the following page:

<p align="center">
<img src="../../figures/C2-2-openai-choose.png" width="1000" alt="OpenAI official website login and select API">
</p>

We select `API` and then click `API keys` in the left sidebar, as shown below:

<p align="center">
<img src="../../figures/C2-2-openai-get-key.png" width="1000" alt="OpenAI Get API key">
</p>

Click the `Create new secret key` button to create an OpenAI API key. Copy the key into a `.env` file in the form `OPENAI_API_KEY="sk-..."`, and save the `.env` file in the project root directory.

The following is the code to read the `.env` file:

In [1]:
import os
from dotenv import load_dotenv, find_dotenv

# Read local/project environment variables.

# find_dotenv() finds and locates the path of the .env file
# load_dotenv() reads the .env file and loads the environment variables in it into the current running environment
# If you set a global environment variable, this line of code will have no effect.
_ = load_dotenv(find_dotenv())

# If you need to access through the proxy port, you also need to do the following configuration
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'
os.environ["HTTP_PROXY"] = 'http://127.0.0.1:7890'
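
To make the "no effect if a global variable is already set" behaviour concrete, here is a minimal standard-library sketch of the idea behind `load_dotenv`. It is not python-dotenv's implementation: `load_env_text` is a hypothetical helper, and it assumes the file contains only simple `KEY=VALUE` lines.

```python
import os


def load_env_text(text, environ=None):
    """Parse simple KEY=VALUE lines and add them to `environ` without overriding."""
    environ = os.environ if environ is None else environ
    loaded = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Existing variables win, so a globally set key is left untouched.
        if key and key not in environ:
            environ[key] = value
            loaded[key] = value
    return loaded


# A plain dict stands in for os.environ so the example has no side effects.
env = {"OPENAI_API_KEY": "sk-from-shell"}
added = load_env_text('OPENAI_API_KEY="sk-from-file"\nEXAMPLE_FLAG=1\n', env)
print(added)                  # {'EXAMPLE_FLAG': '1'}
print(env["OPENAI_API_KEY"])  # sk-from-shell
```

In real projects, keep using `load_dotenv(find_dotenv())`; the sketch only illustrates why a key that is already exported in the shell takes precedence over the `.env` file.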

### 1.2 Calling OpenAI API

To call ChatGPT, you need to use the [ChatCompletion API](https://platform.openai.com/docs/api-reference/chat), which provides calls to the ChatGPT series of models, including ChatGPT-3.5, GPT-4, etc.

The ChatCompletion API call method is as follows:

In [2]:
from openai import OpenAI

# Note: we assume the OpenAI API key has been configured as described above. If not, access will fail.
client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

completion = client.chat.completions.create(
    # Model to call: ChatGPT-3.5
    model="gpt-3.5-turbo",
    # messages is the list of conversation messages
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

Calling this API will return a ChatCompletion object, which includes properties such as answer text, creation time, id, etc. What we generally need is the answer text, that is, the content information in the answer object.

In [3]:
completion

ChatCompletion(id='chatcmpl-9FAKG4M6HXML257axa12PUuCXbJJz', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello! How can I assist you today?', role='assistant', function_call=None, tool_calls=None))], created=1713401640, model='gpt-3.5-turbo-0125', object='chat.completion', system_fingerprint='fp_c2295e73ad', usage=CompletionUsage(completion_tokens=9, prompt_tokens=19, total_tokens=28))

In [4]:
print(completion.choices[0].message.content)

Hello! How can I assist you today?


Here we introduce several parameters that are often used when calling the API:

* `model`: the model to call. Common values include "gpt-3.5-turbo" (ChatGPT-3.5), "gpt-3.5-turbo-16k-0613" (the 16K version of ChatGPT-3.5), and "gpt-4" (ChatGPT-4). Note that different models are priced differently.

* `messages`: our prompt. ChatCompletion's `messages` takes a list containing prompts for several roles. The available roles are generally `system` (the system prompt mentioned above), `user` (the prompt entered by the user), and `assistant` (usually the model's earlier replies, supplied to the model as reference).

* `temperature`: the temperature coefficient mentioned above, which controls the randomness of the output.

* `max_tokens`: the maximum number of tokens the model may output. OpenAI counts the tokens of the prompt and the completion together, and the total cannot exceed the model's limit (for example, 4096 for the default model). If the input prompt is long, make sure the prompt tokens plus `max_tokens` still fit within that limit; otherwise the request fails with an error saying the length limit was exceeded.
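
Because the prompt and the completion share one token limit, it can help to compute how much room is left before sending a request. The following is a minimal sketch of that arithmetic only; `max_completion_tokens` is a hypothetical helper, the 4096 default is the example limit quoted above, and the prompt token count is assumed to be known already (for instance from a tokenizer or from the `usage` field of an earlier response).

```python
def max_completion_tokens(prompt_tokens, context_limit=4096, requested=None):
    """Return a max_tokens value that keeps prompt + completion within the limit."""
    remaining = context_limit - prompt_tokens
    if remaining <= 0:
        raise ValueError("the prompt alone already fills or exceeds the context limit")
    # Without an explicit request, allow the completion to use all remaining room.
    return remaining if requested is None else min(requested, remaining)


print(max_completion_tokens(3000))                  # 1096
print(max_completion_tokens(3000, requested=500))   # 500
print(max_completion_tokens(3000, requested=2000))  # 1096
```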

OpenAI provides ample customization space, allowing us to improve the model's answering effect by customizing the prompt. The following is a simple function that encapsulates the OpenAI interface, allowing us to directly pass in the prompt and obtain the model's output:

In [5]:
from openai import OpenAI

client = OpenAI(
# This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)


def gen_gpt_messages(prompt):
    '''
    Build the messages request parameter for the GPT model

    Request parameters:
        prompt: the user prompt
    '''
    messages = [{"role": "user", "content": prompt}]
    return messages


def get_completion(prompt, model="gpt-3.5-turbo", temperature = 0):
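    '''
    Get the result of a GPT model call

    Request parameters:
        prompt: the prompt
        model: the model to call; defaults to gpt-3.5-turbo, and other models such as gpt-4 can be chosen as needed
        temperature: the temperature coefficient of the model output, which controls the randomness of the output; the range is 0~2. The lower the temperature, the more consistent the output.
    '''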
    response = client.chat.completions.create(
        model=model,
        messages=gen_gpt_messages(prompt),
        temperature=temperature,
    )
    if len(response.choices) > 0:
        return response.choices[0].message.content
    return "generate answer error"

In [6]:
get_completion("你好")

'你好！有什么可以帮助你的吗？'

In the above function, we encapsulate the details of messages and only use user prompt to implement the call. In simple scenarios, this function is sufficient to meet the usage requirements.
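
When a scenario needs a system prompt or earlier turns of the conversation, the `messages` list can be built explicitly using the three roles described earlier. The sketch below is one possible extension, not part of the original tutorial code; `build_messages` and its `history` argument (a list of user/assistant text pairs) are hypothetical names.

```python
def build_messages(prompt, system=None, history=None):
    """Build a chat messages list: optional system prompt, prior turns, new user prompt."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    # history is a list of (user_text, assistant_text) pairs from earlier turns.
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new user prompt always comes last.
    messages.append({"role": "user", "content": prompt})
    return messages


msgs = build_messages(
    "What did I just say?",
    system="You are a helpful assistant.",
    history=[("Hello!", "Hello! How can I assist you today?")],
)
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```

The resulting list could be passed as `messages=` to `client.chat.completions.create` in place of `gen_gpt_messages(prompt)`.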

## 2. Using Wenxin Yiyan

Wenxin Yiyan, a Chinese large model launched by Baidu on March 27, 2023, is a representative domestic (Chinese) large language model. Limited by the quality of available Chinese corpora and by bottlenecks in domestic computing resources and technology, Wenxin Yiyan still trails ChatGPT in overall performance, but it performs comparatively well in Chinese-language contexts. Its target scenarios include multimodal generation, literary creation, and other commercial uses, and its goal is to surpass ChatGPT in Chinese. Baidu still has a long way to go to truly beat ChatGPT; however, in China, where generative AI is strictly regulated, Wenxin Yiyan was among the first generative AI applications approved for public release, which gives it a commercial advantage over ChatGPT, which cannot be used publicly there.

Baidu also provides an API for Wenxin Yiyan. Alongside the model, it launched Wenxin Qianfan, an enterprise-level large language model service platform that covers Baidu's entire LLM development toolchain. For small and medium-sized or traditional enterprises that lack the capacity to build large models themselves, Wenxin Qianfan is a viable option. This tutorial only covers calling the Wenxin Yiyan API through the Wenxin Qianfan platform and does not discuss the other enterprise services.

### 2.1 Qianfan SDK

#### 2.1.1 API Application Guidelines

#### Get the key

Baidu Smart Cloud Qianfan Large Model Platform provides [Qianfan SDK](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/wlmhm7vuo) in multiple languages. Developers can use SDK to quickly develop functions and improve development efficiency.

Before using Qianfan SDK, you need to obtain the Wenxin Yiyan call key first. You need to configure your own key in the code to call the model. Below we take [Python SDK](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/7lq3ft3pb) as an example to introduce the process of calling the Wenxin model through Qianfan SDK.

First, you need to have a Baidu account that has been authenticated by real name. Each account can create several applications, and each application will correspond to an API_Key and Secret_Key.

![](../../figures/C2-2-baidu_qianfan_1.png)

Enter the [Wenxin Qianfan Service Platform](https://console.bce.baidu.com/qianfan/overview), click the above `Application Access` button, and create an application that calls the Wenxin model.

![](../../figures/C2-2-baidu_qianfan_2.png)

Then click the `Go to Create` button to enter the application creation interface:

![](../../figures/C2-2-baidu_qianfan_3.png)

Simply enter basic information, select the default configuration, and create the application.

![](../../figures/C2-2-baidu_qianfan_4.png)

After the creation is completed, we can see the `API Key` and `Secret Key` of the created application in the console.

**Note that Qianfan currently offers only three free services: [Prompt template](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Alisj3ard), [Yi-34B-Chat](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/vlpteyv3c), and the [Fuyu-8B public cloud online call experience service](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Qlq4l7uw6). To try other model services, you must first activate the paid service for the corresponding model in [Billing Management](https://console.bce.baidu.com/qianfan/chargemanage/list).**

We fill the `API Key` and `Secret Key` obtained here into the `QIANFAN_AK` and `QIANFAN_SK` parameters of the `.env` file. If you are using parameter verification for security authentication, you need to check the `Access Key` and `Secret Key` on the [Baidu Smart Cloud Console-User Account-Security Authentication](https://console.bce.baidu.com/iam/#/iam/accesslist) page, and fill the obtained parameters into the `QIANFAN_ACCESS_KEY` and `QIANFAN_SECRET_KEY` of the `.env` file accordingly.

![](../../figures/C2-2-baidu_qianfan_5.png)

Then execute the following code to load the key into the environment variable.

In [7]:
from dotenv import load_dotenv, find_dotenv

# Read local/project environment variables.

# find_dotenv() finds and locates the path of the .env file
# load_dotenv() reads the .env file and loads the environment variables in it into the current running environment
# If you set a global environment variable, this line of code will have no effect.
_ = load_dotenv(find_dotenv())

#### 2.1.2 Calling the Wenxin Qianfan API

Baidu Wenxin likewise accepts prompts for the `user` and `assistant` roles in the `messages` field. Unlike OpenAI's prompt format, however, the model's persona (the system prompt) is passed through a separate `system` parameter rather than inside `messages`.
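
If you already have an OpenAI-style `messages` list that contains a `system` entry, it has to be split before it can be sent to Wenxin. A minimal sketch of that conversion follows; `split_system_prompt` is a hypothetical helper, and joining several system entries with a newline is an assumption made only for illustration.

```python
def split_system_prompt(openai_messages):
    """Separate system content from the user/assistant turns."""
    system_parts = []
    chat_messages = []
    for message in openai_messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            chat_messages.append({"role": message["role"], "content": message["content"]})
    # None means "no system prompt was supplied".
    system = "\n".join(system_parts) if system_parts else None
    return system, chat_messages


system, messages = split_system_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(system)    # You are a helpful assistant.
print(messages)  # [{'role': 'user', 'content': 'Hello!'}]
```

The two return values correspond to the `system` and `messages` arguments of the Wenxin call.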

Below we use the SDK to encapsulate a `get_completion` function for subsequent use.

**Reminder: if the account has no free or purchased quota, running the code below to call Wenxin `ERNIE-Bot` will fail with the error `error code: 17, err msg: Open api daily request limit reached`.**

Click [Model Service](https://console.bce.baidu.com/qianfan/ais/console/onlineService) to view the full list of models supported by Qianfan.

In [8]:
import qianfan

def gen_wenxin_messages(prompt):
    '''
    Build the messages request parameter for the Wenxin model

    Request parameters:
        prompt: the user prompt
    '''
    messages = [{"role": "user", "content": prompt}]
    return messages


def get_completion(prompt, model="ERNIE-Bot", temperature=0.01):
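    '''
    Get the result of a Wenxin model call

    Request parameters:
        prompt: the prompt
        model: the model to call; defaults to ERNIE-Bot, and other models such as Yi-34B-Chat can be chosen as needed
        temperature: the temperature coefficient of the model output, which controls the randomness of the output; the range is 0~1.0 and it cannot be set to 0. The lower the temperature, the more consistent the output.
    '''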

    chat_comp = qianfan.ChatCompletion()
    message = gen_wenxin_messages(prompt)

    resp = chat_comp.do(messages=message, 
                        model=model,
                        temperature = temperature,
                        system="你是一名个人助理-小鲸鱼")

    return resp["result"]

If you are a free user, when using the above function, you can specify a free model (such as `Yi-34B-Chat`) in the input parameter and then run:

In [10]:
get_completion("你好，介绍一下你自己", model="Yi-34B-Chat")

[INFO] [04-18 08:54:40] openapi_requestor.py:316 [t:8610091584]: requesting llm api endpoint: /chat/yi_34b_chat


'你好！我叫 Yi，我是零一万物开发的一个智能助手，由零一万物的研究团队通过大量的文本数据进行训练，学习了语言的各种模式和关联，从而能够生成文本、回答问题、进行对话。我的目标是帮助用户获取信息、解答疑问以及提供各种语言相关的帮助。我是一个人工智能，没有感受和意识，但我可以模拟人类的交流方式，以便于与用户互动。如果你有任何问题或需要帮助，请随时告诉我！'

If you have a quota for the Wenxin series model `ERNIE-Bot`, you can directly run the following function:

In [11]:
get_completion("你好，介绍一下你自己")

[INFO] [04-18 08:57:01] openapi_requestor.py:316 [t:8610091584]: requesting llm api endpoint: /chat/completions


'你好！我是小鲸鱼，你的个人助理。我致力于为你提供准确、及时的信息和帮助，解答你的问题，并尽力满足你的需求。无论你需要什么帮助，我都会尽力提供帮助和支持。'

Baidu Qianfan provides a variety of model interfaces for calling. Among them, the conversation chat interface of the `ERNIE-Bot` model we used above is also known as the Baidu Wenxin model. Here is a brief introduction to the common parameters of the Wenxin model interface:

* `messages`: the prompt for the call. Wenxin's `messages` configuration differs somewhat from ChatGPT's: it does not support the `max_tokens` parameter, and the maximum number of tokens is controlled by the model. The total length of the `content` in `messages`, the `functions`, and the `system` field cannot exceed 20480 characters or 5120 tokens; otherwise the model automatically forgets the earliest text. Wenxin's `messages` must satisfy the following requirements: ① a single member means a single-round conversation, and multiple members mean a multi-round conversation; ② the last message is the current turn, and earlier messages are the conversation history; ③ the number of members must be odd, and the roles must alternate between `user` and `assistant`, in that order. Note: these character and token limits are those of the ERNIE-Bot model. Limits vary from model to model, so check the parameter description of the relevant model on the Wenxin Qianfan official website.

* `stream`: whether to use streaming transmission.

* `temperature`: the temperature coefficient. The default is 0.8. Wenxin requires the value to be in the range (0, 1.0], so it cannot be set to 0.
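
The ordering rules for Wenxin's `messages` (an odd number of members, with roles alternating `user`, `assistant`, `user`, ...) are easy to get wrong in multi-round conversations. The sketch below checks only those two rules before a request is sent; `validate_wenxin_messages` is a hypothetical helper, and it does not check the character or token limits.

```python
def validate_wenxin_messages(messages):
    """Raise ValueError if messages breaks the ordering rules described above."""
    if len(messages) % 2 == 0:
        raise ValueError("messages must contain an odd number of members")
    for index, message in enumerate(messages):
        # Even positions are user turns, odd positions are assistant turns.
        expected = "user" if index % 2 == 0 else "assistant"
        if message.get("role") != expected:
            raise ValueError(
                f"member {index} should have role {expected!r}, got {message.get('role')!r}"
            )
    return True


conversation = [
    {"role": "user", "content": "你好"},
    {"role": "assistant", "content": "你好！有什么可以帮助你的吗？"},
    {"role": "user", "content": "介绍一下你自己"},
]
print(validate_wenxin_messages(conversation))  # True
```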

### 2.2 ERNIE SDK

#### 2.2.1 API Application Guidelines

Here we use `ERNIE Bot` from the `ERNIE SDK` to call Wenxin Yiyan. ERNIE Bot gives developers a convenient, easy-to-use interface to the Wenxin large models, covering basic capabilities such as text creation, general dialogue, semantic vectors, and AI image generation. Unlike `Qianfan SDK`, `ERNIE SDK` does not support a range of large language models; it only supports Baidu's own Wenxin large models. ERNIE Bot currently supports the following models:

```
ernie-3.5 Wenxin large model (ernie-3.5)
ernie-lite Wenxin large model (ernie-lite)
ernie-4.0 Wenxin large model (ernie-4.0)
ernie-longtext Wenxin large model (ernie-longtext)
ernie-speed Wenxin large model (ernie-speed)
ernie-speed-128k Wenxin large model (ernie-speed-128k)
ernie-tiny-8k Wenxin large model (ernie-tiny-8k)
ernie-char-8k Wenxin large model (ernie-char-8k)
ernie-text-embedding Wenxin Baizhong Semantic Model
ernie-vilg-v2 Wenxin Yige Model
```

Before using ERNIE SDK, you need to obtain an access token from the AI Studio backend for authentication and configure it in your code in order to call the models. Below we take [Ernie Bot](https://ernie-bot-agent.readthedocs.io/zh-cn/latest/sdk/) as an example to introduce the process of calling the Wenxin model through ERNIE Bot.

First, you need to register and log in to the [AI Studio Galaxy Community](https://aistudio.baidu.com/index) (new users will be given a free quota of 1 million tokens for 3 months).

![](../../figures/C2-2-ernie_bot_1.png)

Click `Access Token` to get the account's access token, then copy it and save it to the `.env` file in the form `EB_ACCESS_TOKEN="..."`.

![](../../figures/C2-2-ernie_bot_2.png)

Then execute the following code to load the key into the environment variable.

In [1]:
from dotenv import load_dotenv, find_dotenv

# Read local/project environment variables.

# find_dotenv() finds and locates the path of the .env file
# load_dotenv() reads the .env file and loads the environment variables in it into the current running environment
# If you set a global environment variable, this line of code will have no effect.
_ = load_dotenv(find_dotenv())

#### 2.2.2 Calling Ernie Bot API

In [2]:
import erniebot
import os

erniebot.api_type = "aistudio"
erniebot.access_token = os.environ.get("EB_ACCESS_TOKEN")

def gen_wenxin_messages(prompt):
    '''
    Build the messages request parameter for the Wenxin model

    Request parameters:
        prompt: the user prompt
    '''
    messages = [{"role": "user", "content": prompt}]
    return messages


def get_completion(prompt, model="ernie-3.5", temperature=0.01):
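    '''
    Get the result of a Wenxin model call

    Request parameters:
        prompt: the prompt
        model: the model to call
        temperature: the temperature coefficient of the model output, which controls the randomness of the output; the range is 0~1.0 and it cannot be set to 0. The lower the temperature, the more consistent the output.
    '''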

    chat_comp = erniebot.ChatCompletion()
    message = gen_wenxin_messages(prompt)

    resp = chat_comp.create(messages=message, 
                        model=model,
                        temperature = temperature,
                        system="你是一名个人助理")

    return resp["result"]

In [3]:
get_completion("你好，介绍一下你自己")

'你好！我是一名个人助理，我的主要任务是协助和支持你的日常工作和活动。我可以帮助你管理时间、安排日程、提供信息、解答问题，以及完成其他你需要的任务。如果你有任何需求或问题，请随时告诉我，我会尽力帮助你。'

The calling method and parameters of the `Ernie Bot API` are basically the same as those of the `Wenxin Qianfan API`; the main difference is that the request is sent with `.create()` rather than `.do()`.

Therefore, we will not introduce the common parameters of the interface in detail here. You can refer to [Parameter Introduction](https://ernie-bot-agent.readthedocs.io/zh-cn/latest/sdk/api_reference/chat_completion/) for detailed information on other parameters.

## 3. Using iFlytek Spark

iFlytek Spark Cognitive Large Model, a Chinese large model launched by iFlytek in May 2023, is another representative domestic large model. Likewise limited by the Chinese-language context and computing resources, Spark still differs from ChatGPT in user experience, but as a domestic Chinese large model on par with Wenxin it is worth trying. Compared with Baidu, which has significant resource and technical advantages, iFlytek needs to make full use of its relative strengths to break through and become a leader among domestic large models. At least for now, Spark has not fallen behind.

### 3.1 API Application Guidelines

We can use the [exclusive link provided by Datawhale](https://xinghuo.xfyun.cn/sparkapi?ch=dwKeloHY), which grants a larger free quota. Click `free trial`:

![](../../figures/C2-2-spark_1.png)

![](../../figures/C2-2-spark_2.png)

If you are a user who has not received a free trial package, you can receive a trial of 100,000 tokens. After completing personal identity authentication, you can also receive a trial of 2,000,000 tokens for free. After receiving it, click to enter the console and create an application. After the creation is completed, you can see the `APPID`, `APISecret` and `APIKey` we obtained:

![](../../figures/C2-2-spark_3.png)

Spark provides two calling methods: SDK calls, which are easy to use and recommended for beginners, and WebSocket calls, which suit enterprise use but are harder for beginners and novice developers. Both methods are described in detail below.

### 3.2 Calling via SDK (recommended)

First execute the following code to load the key into the environment variable.

In [15]:
import os

from dotenv import load_dotenv, find_dotenv

# Read local/project environment variables.

# find_dotenv() finds and locates the path of the .env file
# load_dotenv() reads the .env file and loads the environment variables in it into the current running environment
# If you set a global environment variable, this line of code will have no effect.
_ = load_dotenv(find_dotenv())
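
The Spark code in this section reads `SPARK_APPID`, `SPARK_API_KEY`, and `SPARK_API_SECRET` with `os.environ[...]`, which raises a bare `KeyError` if one of them is missing. As an optional convenience, a small check like the following can report all missing variables at once; `missing_env_vars` and `REQUIRED_SPARK_VARS` are hypothetical names introduced for this sketch.

```python
import os

# Variable names taken from the Spark code in this section.
REQUIRED_SPARK_VARS = ("SPARK_APPID", "SPARK_API_KEY", "SPARK_API_SECRET")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_SPARK_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Spark credentials are set.")
```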

Then we use the SDK to encapsulate a `get_completion` function for subsequent use.

In [28]:
from sparkai.llm.llm import ChatSparkLLM, ChunkPrintHandler
from sparkai.core.messages import ChatMessage

def gen_spark_params(model):
    '''
    Build the request parameters for the Spark model
    '''

    spark_url_tpl = "wss://spark-api.xf-yun.com/{}/chat"
    model_params_dict = {
        # v1.5 version
        "v1.5": {
            "domain": "general", # selects the large model version
            "spark_url": spark_url_tpl.format("v1.1") # service address of the cloud environment
        },
        # v2.0 version
        "v2.0": {
            "domain": "generalv2", # selects the large model version
            "spark_url": spark_url_tpl.format("v2.1") # service address of the cloud environment
        },
        # v3.0 version
        "v3.0": {
            "domain": "generalv3", # selects the large model version
            "spark_url": spark_url_tpl.format("v3.1") # service address of the cloud environment
        },
        # v3.5 version
        "v3.5": {
            "domain": "generalv3.5", # selects the large model version
            "spark_url": spark_url_tpl.format("v3.5") # service address of the cloud environment
        }
    }
    return model_params_dict[model]

def gen_spark_messages(prompt):
    '''
    Build the messages request parameter for the Spark model

    Request parameters:
        prompt: the user prompt
    '''

    messages = [ChatMessage(role="user", content=prompt)]
    return messages


def get_completion(prompt, model="v3.5", temperature = 0.1):
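    '''
    Get the result of a Spark model call

    Request parameters:
        prompt: the prompt
        model: the model to call; defaults to v3.5, and other models such as v3.0 can be chosen as needed
        temperature: the temperature coefficient of the model output, which controls the randomness of the output; the range is 0~1.0 and it cannot be set to 0. The lower the temperature, the more consistent the output.
    '''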

    spark_llm = ChatSparkLLM(
        spark_api_url=gen_spark_params(model)["spark_url"],
        spark_app_id=os.environ["SPARK_APPID"],
        spark_api_key=os.environ["SPARK_API_KEY"],
        spark_api_secret=os.environ["SPARK_API_SECRET"],
        spark_llm_domain=gen_spark_params(model)["domain"],
        temperature=temperature,
        streaming=False,
    )
    messages = gen_spark_messages(prompt)
    handler = ChunkPrintHandler()
# When streaming is set to False, callbacks do not work
    resp = spark_llm.generate([messages], callbacks=[handler])
    return resp

In [29]:
# The normal response content is printed directly here. In production, abnormal responses also need to be handled.
get_completion("你好").generations[0][0].text

'你好！有什么我能帮忙的吗？'
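
Indexing `generations[0][0].text` directly assumes the response is well formed. A more defensive accessor might look like the sketch below. `first_generation_text` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins that mimic only the attributes used in the cell above (they are not real SDK objects), and network or authentication errors raised by the SDK would still need their own `try`/`except`.

```python
from types import SimpleNamespace


def first_generation_text(result, default=None):
    """Return result.generations[0][0].text, or default if any level is missing."""
    generations = getattr(result, "generations", None)
    if not generations or not generations[0]:
        return default
    return getattr(generations[0][0], "text", default)


# Stand-in objects for illustration only.
ok = SimpleNamespace(generations=[[SimpleNamespace(text="你好！")]])
empty = SimpleNamespace(generations=[])
print(first_generation_text(ok))                    # 你好！
print(first_generation_text(empty, default="N/A"))  # N/A
```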

### 3.3 Calling via WebSocket

The way to connect via WebSocket is relatively complex to configure. iFlytek provides a [call example](https://www.xfyun.cn/doc/spark/Web.html#_3-%E8%B0%83%E7%94%A8%E7%A4%BA%E4%BE%8B). Click the corresponding language call example to download. Here we take the [Python call example](https://xfyun-doc.xfyun.cn/lc-sp-sparkAPI-1709535448185.zip) as an example. After downloading, we can get a `sparkAPI.py` file, which contains the server encapsulation and client call implementation.

It should be noted that directly running the `sparkAPI.py` file of the official example will result in an error. The following modifications need to be made:

(1) Comment out the following line: `import openpyxl` (this package is not used in the code. If it is not installed, a ModuleNotFoundError will be prompted);

(2) Modify the `on_close` function (this function receives 3 input parameters). The modified function is as follows:

In [18]:
# Handle the WebSocket close event
def on_close(ws, close_status_code, close_msg):
    print("### closed ###")

Then we run the modified official sample code. Note: before running it, assign the credentials obtained in the previous section to the `appid`, `api_secret`, and `api_key` arguments of the `main` function.

Execute `python sparkAPI.py`, and you can get the following output:

![](../../figures/C2-2-spark_4.png)

Note that besides the LLM's answer, the output of the official example also contains log lines marking the end of the answer ("#### close session", "### closed ###"). If you only want the answer itself, you can modify the source code to remove them.

Based on the `sparkAPI.py` file, we also encapsulate a `get_completion` function for calling in subsequent chapters.

First, execute the following code to read the key configuration of the `.env` file.

In [19]:
import os
import sparkAPI

from dotenv import load_dotenv, find_dotenv

# Read local/project environment variables.

# find_dotenv() finds and locates the path of the .env file
# load_dotenv() reads the .env file and loads the environment variables in it into the current running environment
# If you set a global environment variable, this line of code will have no effect.
_ = load_dotenv(find_dotenv())

The Spark large model API currently has four versions, V1.5, V2.0, V3.0, and V3.5, and token usage is metered separately for each. The `get_completion` function is encapsulated as follows:

In [20]:
def gen_spark_params(model):
    '''
    Build the request parameters for the Spark model
    '''

    spark_url_tpl = "wss://spark-api.xf-yun.com/{}/chat"
    model_params_dict = {
        # v1.5 version
        "v1.5": {
            "domain": "general", # selects the large model version
            "spark_url": spark_url_tpl.format("v1.1") # service address of the cloud environment
        },
        # v2.0 version
        "v2.0": {
            "domain": "generalv2", # selects the large model version
            "spark_url": spark_url_tpl.format("v2.1") # service address of the cloud environment
        },
        # v3.0 version
        "v3.0": {
            "domain": "generalv3", # selects the large model version
            "spark_url": spark_url_tpl.format("v3.1") # service address of the cloud environment
        },
        # v3.5 version
        "v3.5": {
            "domain": "generalv3.5", # selects the large model version
            "spark_url": spark_url_tpl.format("v3.5") # service address of the cloud environment
        }
    }
    return model_params_dict[model]


def get_completion(prompt, model="v3.5", temperature = 0.1):
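    '''
    Get the result of a Spark model call

    Request parameters:
        prompt: the prompt
        model: the model to call; defaults to v3.5, and other models such as v3.0 can be chosen as needed
        temperature: the temperature coefficient of the model output, which controls the randomness of the output; the range is 0~1.0 and it cannot be set to 0. The lower the temperature, the more consistent the output.
    '''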

    response = sparkAPI.main(
        appid=os.environ["SPARK_APPID"],
        api_secret=os.environ["SPARK_API_SECRET"],
        api_key=os.environ["SPARK_API_KEY"],
        gpt_url=gen_spark_params(model)["spark_url"],
        domain=gen_spark_params(model)["domain"],
        query=prompt
    )
    return response

In [21]:
get_completion("你好")

你好！有什么我能帮忙的吗？


It should be noted that in the official example `sparkAPI.py` file, the `temperature` parameter does not support external input, but is fixed to 0.5. If you do not want to use the default value, you can support external parameter input by modifying the source code, which is not explained here.
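
For readers who do want to make the temperature configurable, the change amounts to passing it into the function that builds the request body. The sketch below is only an illustration: it assumes the request body uses the `header` / `parameter` / `payload` layout described in iFlytek's Web API documentation, the `gen_params` signature and the `max_tokens` default are hypothetical, and the downloaded `sparkAPI.py` may differ, so check its source before adapting it.

```python
def gen_params(appid, domain, question, temperature=0.5, max_tokens=2048):
    """Build a Spark chat request body with a caller-supplied temperature (assumed layout)."""
    # The range (0, 1] follows the docstring of get_completion above.
    if not 0 < temperature <= 1:
        raise ValueError("temperature must be in (0, 1]")
    return {
        "header": {"app_id": appid},
        "parameter": {
            "chat": {
                "domain": domain,
                "temperature": temperature,
                "max_tokens": max_tokens,
            }
        },
        # question is a list of {"role": ..., "content": ...} messages.
        "payload": {"message": {"text": question}},
    }


body = gen_params(
    "your-app-id", "generalv3.5",
    [{"role": "user", "content": "你好"}],
    temperature=0.1,
)
print(body["parameter"]["chat"]["temperature"])  # 0.1
```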

## 4. Using Zhipu GLM

Zhipu AI is a company built on technology transferred from the Department of Computer Science at Tsinghua University, dedicated to creating a new generation of general-purpose cognitive intelligence models. The company co-developed GLM-130B, a bilingual ultra-large-scale pre-trained model with over 100 billion parameters, and built a high-precision general knowledge graph, forming a cognitive engine driven by both data and knowledge. ChatGLM (chatglm.cn) was built on this model.

The ChatGLM series of models, including ChatGLM-130B, ChatGLM-6B and ChatGLM2-6B (an upgraded version of ChatGLM-6B) models, support relatively complex natural language instructions and can solve difficult reasoning problems. Among them, the ChatGLM-6B model has been downloaded more than 3 million times from Huggingface (as of June 24, 2023). The model has ranked first in the Hugging Face (HF) global large model download list for 12 consecutive days, and has had a great impact in the open source community at home and abroad.

### 4.1 API Application Guidelines

First, go to [Zhipu AI Open Platform](https://open.bigmodel.cn/overview), click `Start using` or `Development workbench` to register:

![](../../figures/C2-2-zhipuai_home.png)

Newly registered users can get a free experience package of 1 million tokens with a validity period of 1 month. After personal real-name authentication, they can also get an additional experience package of 4 million tokens. Zhipu AI provides experience entrances for two different models, GLM-4 and GLM-3-Turbo. You can click the `Experience now` button to experience it directly.

![Zhipu AI Console](../../figures/C2-2-zhipuai_overview.png)

If you need to use API key to build an application, you need to click the `View API key` button on the right to enter our personal API management list. In this interface, you can see the application name and `API key` corresponding to the API we obtained.

![Zhipu AI API Management](../../figures/C2-2-zhipuai_api.png)

Click `Add new API key` and enter a name to generate a new API key.

### 4.2 Calling Zhipu GLM API

Zhipu AI provides SDK and native HTTP to implement model API calls. It is recommended to use SDK for calls to get a better programming experience.

First, we need to configure the key information, set the `API key` obtained earlier to the `ZHIPUAI_API_KEY` parameter in the `.env` file, and then run the following code to load the configuration information.

In [22]:
import os

from dotenv import load_dotenv, find_dotenv

# Read local/project environment variables.

# find_dotenv() finds and locates the path of the .env file
# load_dotenv() reads the .env file and loads the environment variables in it into the current running environment
# If you set a global environment variable, this line of code will have no effect.
_ = load_dotenv(find_dotenv())

Zhipu's call parameter passing is similar to other methods, and also requires passing in a messages list, including role and prompt. We encapsulate the following `get_completion` function for subsequent use.

In [23]:
from zhipuai import ZhipuAI

client = ZhipuAI(
    api_key=os.environ["ZHIPUAI_API_KEY"]
)

def gen_glm_params(prompt):
    '''
    Build the messages request parameter for the GLM model

    Request parameters:
        prompt: the user prompt
    '''
    messages = [{"role": "user", "content": prompt}]
    return messages


def get_completion(prompt, model="glm-4", temperature=0.95):
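    '''
    Get the result of a GLM model call

    Request parameters:
        prompt: the prompt
        model: the model to call; defaults to glm-4, and other models such as glm-3-turbo can be chosen as needed
        temperature: the temperature coefficient of the model output, which controls the randomness of the output; the range is 0~1.0 and it cannot be set to 0. The lower the temperature, the more consistent the output.
    '''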

    messages = gen_glm_params(prompt)
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature
    )
    if len(response.choices) > 0:
        return response.choices[0].message.content
    return "generate answer error"

In [25]:
get_completion("你好")

'你好！有什么可以帮助您的吗？'

Here is a brief introduction to the parameters passed to zhipuai:

- `messages (list)`: when calling the dialogue model, the list of current conversation messages is passed to the model as the prompt. Each message is a key-value structure such as {"role": "user", "content": "你好"}. Messages must be ordered from oldest to newest, and input that exceeds the model's maximum input length is automatically truncated

- `temperature (float)`: the sampling temperature, which controls the randomness of the output. It must be a positive number in the range (0.0, 1.0), cannot be equal to 0, and defaults to 0.95. The larger the value, the more random and creative the output; the smaller the value, the more stable and deterministic the output

- `top_p (float)`: an alternative to temperature sampling, called nucleus sampling. The value must lie in the open interval (0.0, 1.0), so it cannot equal 0 or 1; the default is 0.7. The model only considers the tokens that make up the top `top_p` probability mass. For example, 0.1 means the decoder only samples from the candidate tokens that make up the top 10% of the probability mass

- `request_id (string)`: a unique identifier supplied by the caller to distinguish each request. If it is not supplied, the platform generates one by default

- **It is recommended to adjust either `top_p` or `temperature` according to your application scenario, but not both at the same time**
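
One way to follow this recommendation in code is to build the sampling arguments through a small helper that refuses to set both values. This is a sketch, not part of the zhipuai SDK: `sampling_kwargs` is a hypothetical helper, the `top_p` check uses the open interval stated above, and treating 1.0 as an allowed `temperature` is an assumption based on the `get_completion` docstring.

```python
def sampling_kwargs(temperature=None, top_p=None):
    """Return request kwargs that set at most one of temperature / top_p."""
    if temperature is not None and top_p is not None:
        raise ValueError("adjust either temperature or top_p, not both")
    if temperature is not None:
        # Assumption: 0 is excluded, 1.0 is allowed.
        if not 0.0 < temperature <= 1.0:
            raise ValueError("temperature must be greater than 0 and at most 1.0")
        return {"temperature": temperature}
    if top_p is not None:
        if not 0.0 < top_p < 1.0:
            raise ValueError("top_p must be in the open interval (0.0, 1.0)")
        return {"top_p": top_p}
    return {}  # nothing set: rely on the platform defaults


print(sampling_kwargs(temperature=0.2))  # {'temperature': 0.2}
print(sampling_kwargs(top_p=0.7))        # {'top_p': 0.7}
```

The returned dictionary could be unpacked into the request, for example `client.chat.completions.create(model=model, messages=messages, **sampling_kwargs(top_p=0.7))`.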