# Python SDK to access the ZhipuAI GLM-4 Vision API

This cookbook shows how to use the Python SDK to call the GLM-4V API for simple visual understanding and analysis tasks.

In [9]:
import os

# Uncomment and fill in your key if ZHIPUAI_API_KEY is not already set in the environment.
# os.environ["ZHIPUAI_API_KEY"] = "your api key"
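
The client in the next cell is created with no arguments, so the key is expected to come from the `ZHIPUAI_API_KEY` environment variable shown above. A minimal sketch of a check that fails early when the variable is missing; `require_api_key` is a hypothetical helper for this tutorial, not part of the `zhipuai` package:

```python
import os


def require_api_key(env_var="ZHIPUAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not provided by the zhipuai SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell "
            f"or assign os.environ[{env_var!r}] before creating the client."
        )
    return key
```

Calling `require_api_key()` before constructing the client turns a confusing authentication failure into an immediate, readable error.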

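First, we need to convert the image to base64 so that it can be uploaded. Here we use the PIL library to load the image and the standard `base64` module to encode it.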

In [17]:
import base64
import io
from zhipuai import ZhipuAI
from PIL import Image

client = ZhipuAI()


def image_to_base64(image_path):
    """
    Convert an image to base64 encoding.
    """
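    with Image.open(image_path) as image:
        buffered = io.BytesIO()
        # JPEG has no alpha channel, so convert modes such as RGBA or P to RGB first.
        # Use format="PNG" (without the conversion) if you need to keep transparency.
        image.convert("RGB").save(buffered, format="JPEG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
    return img_str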


#base64_image = image_to_base64("data/zR.jpg")
base64_image = image_to_base64("data/5.jpg")
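
`image_to_base64` decodes the image and re-encodes it as JPEG through PIL. When the file on disk is already in a suitable format, a sketch that skips the re-encoding step and encodes the raw bytes directly could look like this. `file_to_base64` is a hypothetical name, and whether a given file format is accepted by GLM-4V is an assumption to check against the official documentation:

```python
import base64
from pathlib import Path


def file_to_base64(image_path):
    """Base64-encode the raw bytes of a file without decoding the image."""
    raw = Path(image_path).read_bytes()
    return base64.b64encode(raw).decode("ascii")
```

This avoids a lossy second JPEG compression pass, at the cost of sending the file exactly as stored, whatever its size.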

In [18]:
base64_image

'/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBD...' (output truncated)

In [19]:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                # English alternative: "What does this image show?"
                "text": "描述这张图像,使用中文"  # "Describe this image, in Chinese"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": base64_image
                }

            }
        ]
    }
]
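
The same message layout can be wrapped in a small function so that the prompt and the image are easy to swap. This is a sketch that mirrors the dictionary structure of the cell above; the name `build_vision_message` and the placeholder strings are illustrative only:

```python
def build_vision_message(prompt, image_b64):
    """Build one user message holding a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_b64}},
        ],
    }


# Placeholder values; pass your own prompt and the output of image_to_base64.
example_messages = [build_vision_message("Describe this image.", "<base64 string>")]
```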

The text and the image are now assembled. Following the [official documentation](https://open.bigmodel.cn/dev/api#glm-4v), we pass in the corresponding parameters and get the model's answer.

In [20]:
response = client.chat.completions.create(
    model="glm-4v",
    messages=messages,
)

This call returns the model's description of the image.

In [21]:
response

Completion(model='glm-4v', created=1710486253, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content='这张图片展示了一个银行内部的监控画面，拍摄时间为2021年9月27日下午3点12分47秒。可以看到三位客户正在柜台前办理业务，其中两位戴着口罩。柜台上摆放着各种金融设备，如点钞机、电脑和电话等。地面上有一个红色的垃圾桶，上面写着“废弃”二字。此外，还有一个提示标语：“东西湖额头湾现金合席1”，可能指的是该窗口的服务区域或功能。整个场景体现了银行内部繁忙而有序的工作环境。', role='assistant', tool_calls=None))], request_id='8477877108634215080', id='8477877108634215080', usage=CompletionUsage(prompt_tokens=1040, completion_tokens=113, total_tokens=1153))
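
The printed object nests the answer inside `choices[0].message.content`. A minimal sketch for pulling out just the text and the finish reason, based on the attribute names visible in the output above. `extract_reply` is a hypothetical helper, and the stand-in object exists only so the sketch runs without calling the API:

```python
from types import SimpleNamespace


def extract_reply(response):
    """Return (text, finish_reason) from the first choice of a completion."""
    choice = response.choices[0]
    return choice.message.content, choice.finish_reason


# Stand-in with the same attribute layout as the printed Completion object.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(role="assistant", content="example description"),
        )
    ]
)
text, reason = extract_reply(fake_response)
print(text, reason)
```

With the real object from the cell above, `extract_reply(response)` would be used the same way.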

You can ask more questions about this image, keeping the earlier questions and answers in the message history. Next, we append a new turn to the conversation.

In [15]:
messages += [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What color are the hair and dress of this woman?"
            }
        ]
    },
]
messages[-1]

{'role': 'user',
 'content': [{'type': 'text',
   'text': 'What color are the hair and dress of this woman?'}]}
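
The cell above appends only the new user question. If you also want the model's previous answer to be part of the history, one option is to append it as an assistant message before the next question. This is a sketch: `add_turn` is a hypothetical helper, the example texts are placeholders, and the assistant message layout (a list with one text part) is an assumption to verify against the official documentation:

```python
def add_turn(history, assistant_text, next_question):
    """Append the model's previous answer, then the next user question."""
    history.append(
        {"role": "assistant", "content": [{"type": "text", "text": assistant_text}]}
    )
    history.append(
        {"role": "user", "content": [{"type": "text", "text": next_question}]}
    )
    return history


# Placeholder conversation, used only to show the resulting structure.
example_history = [
    {"role": "user", "content": [{"type": "text", "text": "Describe this image."}]}
]
add_turn(example_history, "<previous model answer>", "<next question>")
```

In the notebook, `assistant_text` would be `response.choices[0].message.content` from the previous request.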

Now we send the request again and look at the result returned by the model.

In [16]:
response = client.chat.completions.create(
    model="glm-4v",
    messages=messages,
)
response

Completion(model='glm-4v', created=1710477041, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content='The woman has black hair and she is wearing a blue and white uniform.', role='assistant', tool_calls=None))], request_id='8477893704388048795', id='8477893704388048795', usage=CompletionUsage(prompt_tokens=1058, completion_tokens=16, total_tokens=1074))

In [24]:
messages += [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "How old is the woman?"
            }
        ]
    },
]
messages[-1]

{'role': 'user',
 'content': [{'type': 'text', 'text': 'How old is the woman?'}]}

In [25]:
response = client.chat.completions.create(
    model="glm-4v",
    messages=messages,
)
response

Completion(model='glm-4v', created=1710469176, choices=[CompletionChoice(index=0, finish_reason='stop', message=CompletionMessage(content="The woman in the image appears to be a mature adult. However, it's important to note that age perception can vary based on individual interpretation and cultural differences.", role='assistant', tool_calls=None))], request_id='8477887897592219907', id='8477887897592219907', usage=CompletionUsage(prompt_tokens=1070, completion_tokens=34, total_tokens=1104))
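
Each response carries a `usage` field, and the prompt token count grows as the history gets longer (1040, 1058, 1070 in the three outputs above). A small sketch for totalling usage over several requests; `total_usage` is a hypothetical helper, and the stand-in objects only reproduce the numbers printed above:

```python
from types import SimpleNamespace


def total_usage(responses):
    """Sum prompt and completion token counts over several completions."""
    prompt = sum(r.usage.prompt_tokens for r in responses)
    completion = sum(r.usage.completion_tokens for r in responses)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


# Stand-ins carrying the usage numbers from the three outputs above.
recorded = [
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=1040, completion_tokens=113)),
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=1058, completion_tokens=16)),
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=1070, completion_tokens=34)),
]
print(total_usage(recorded))
# {'prompt_tokens': 3168, 'completion_tokens': 163, 'total_tokens': 3331}
```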