## LongCoT Cookbook with Seed1.5-VL

**Seed1.5-VL** can generate both fast, concise replies and in-depth responses that use long Chain-of-Thought (LongCoT) reasoning. Users switch between these modes by setting a single request parameter, `thinking`.

| Field Name | Field Type | Description | Default Value |
| ---- | ---- | ---- | ---- |
| thinking | Object | Configuration of the thinking mode |  |
| &emsp;└─ type | String | Accepts two values: `enabled` and `disabled`.<br><br>**•** Use `enabled` to turn thinking mode on and `disabled` to turn it off. Any other string causes the request to return an error.| `enabled`<br><br>Thinking mode is on by default. |
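Because the service rejects any `type` other than `enabled` or `disabled`, it can help to validate the value locally before sending a request. The sketch below builds the `thinking` object from the table; `thinking_config` is a hypothetical helper, not part of any SDK.

```python
VALID_THINKING_TYPES = {"enabled", "disabled"}


def thinking_config(mode: str = "enabled") -> dict:
    """Build the `thinking` request object, failing fast on invalid values.

    Hypothetical helper: the API itself reports an error for other strings,
    this just surfaces the mistake before a network round trip.
    """
    if mode not in VALID_THINKING_TYPES:
        raise ValueError(
            f"thinking.type must be one of {sorted(VALID_THINKING_TYPES)}, got {mode!r}"
        )
    return {"type": mode}


# Request-body fragments for the two modes
long_cot_body = {"thinking": thinking_config("enabled")}
fast_body = {"thinking": thinking_config("disabled")}
print(long_cot_body)  # {'thinking': {'type': 'enabled'}}
print(fast_body)      # {'thinking': {'type': 'disabled'}}
```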

In [None]:
# Copyright (c) 2025 Bytedance Ltd. and/or its affiliates
# SPDX-License-Identifier: Apache-2.0
import os
import base64
import requests
from openai import OpenAI

### 0. Obtain the API Key
Register and set up a service at https://www.volcengine.com/product/doubao to obtain an API key.
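Rather than pasting the key into the notebook, you may prefer to read it from an environment variable. This is a minimal sketch; the variable name `ARK_API_KEY` and the helper `load_api_key` are assumptions chosen for illustration.

```python
import os


def load_api_key(var_name: str = "ARK_API_KEY") -> str:
    """Read the API key from an environment variable (variable name is an assumption)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

With the variable exported in your shell, `api_key = load_api_key()` can replace the hard-coded string in the next cell.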

#### HTTP POST method

In [None]:
# Please set the API key here
api_key = "your api key"
api_url = "https://ark.cn-beijing.volces.com/api/v3/chat/completions"
model = "doubao-1-5-thinking-vision-pro-250428"

In [None]:

# Encode a local image file as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


def inference_image(text_content, image_path, enable_thinking_mode='enabled'):
    """Send a text + image prompt over HTTP and return the first choice."""
    headers = {
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    }
    base64_image = encode_image(image_path)
    image_format = image_path.rsplit('.', 1)[-1].lower()
    if image_format not in ('jpg', 'jpeg', 'png', 'webp'):
        raise ValueError(f"Unsupported image format: {image_format}")
    data = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": text_content
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/{image_format};base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        # 'enabled' turns on LongCoT reasoning; 'disabled' turns it off
        "thinking": {"type": enable_thinking_mode}
    }

    response = requests.post(api_url, headers=headers, json=data)

    # Fail loudly instead of returning a dict the callers cannot index
    if response.status_code != 200:
        raise RuntimeError(
            f"Request failed with status code {response.status_code}: {response.text}"
        )
    return response.json()["choices"][0]


##### 1. LongCoT Mode
In LongCoT mode, the model reasons through the problem before answering, which helps it handle more complex and difficult tasks.
LongCoT mode is enabled by default for the **Seed1.5-VL** APIs.

In [None]:
# example
image_path = "samples/001.png"
text_prompts = "can you solve this Rebus puzzle?"

result = inference_image(text_prompts, image_path)
print("Seed1.5-VL:", result["message"]["content"])
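The cell above prints only the final answer. Ark's thinking models are described as returning the reasoning trace separately; the sketch below assumes it arrives in a `message["reasoning_content"]` field (an assumption, so it is read with `.get()` and may be `None`). `split_reasoning` is a hypothetical helper that also surfaces the error dict an older version of `inference_image` might return.

```python
def split_reasoning(choice: dict) -> tuple:
    """Return (reasoning, answer) from one element of response["choices"].

    Assumes any reasoning trace is in message["reasoning_content"]; that
    field name is an assumption, so a missing field yields None.
    """
    if "error" in choice:
        raise RuntimeError(f"{choice['error']}: {choice.get('details', '')}")
    message = choice["message"]
    return message.get("reasoning_content"), message.get("content")
```

Usage: `reasoning, answer = split_reasoning(result)`, then print `reasoning` only when it is not `None`.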


##### 2. Non-LongCoT Mode
In Non-LongCoT mode, the model skips the extended reasoning step and returns faster, more concise replies.

In [None]:
# example
image_path = "samples/001.png"
text_prompts = "can you solve this Rebus puzzle?"

result = inference_image(text_prompts, image_path, enable_thinking_mode="disabled")
print("Seed1.5-VL:", result["message"]["content"])
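To see the speed difference between the two modes on your own prompts, you can time both calls. This is a minimal sketch using `time.perf_counter`; `timed` is a hypothetical helper and measured latencies will vary with network and server load.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, compare `timed(inference_image, text_prompts, image_path)` with `timed(inference_image, text_prompts, image_path, enable_thinking_mode="disabled")`.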


#### OpenAI SDK method (not supported yet)

In [None]:
# Please set the API key here
os.environ['OPENAI_API_KEY'] = 'your api key'
seed_vl_version = "doubao-1-5-thinking-vision-pro-250428"
client = OpenAI(
    base_url="https://ark.cn-beijing.volces.com/api/v3",
    api_key=os.environ.get("OPENAI_API_KEY"),
)

In [None]:
# encode image into base64 encoding
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        image = base64.b64encode(image_file.read()).decode('utf-8')
    return image
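The data URLs in this notebook use the file extension directly, so a `.jpg` file becomes `data:image/jpg;...`, while the registered MIME type is `image/jpeg`. Whether the service accepts `image/jpg` is not stated here; the hypothetical helper below is one way to normalize the type, assuming only the four formats listed in the code are needed.

```python
import base64
from pathlib import Path

# Extensions accepted by this notebook, mapped to standard MIME types
_MIME_TYPES = {"jpg": "image/jpeg", "jpeg": "image/jpeg", "png": "image/png", "webp": "image/webp"}


def image_data_url(image_path: str) -> str:
    """Return a base64 data URL for a local image (hypothetical helper)."""
    ext = Path(image_path).suffix.lstrip(".").lower()
    if ext not in _MIME_TYPES:
        raise ValueError(f"Unsupported image format: {ext!r}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{_MIME_TYPES[ext]};base64,{encoded}"
```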

In [None]:
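def inference_image(prompt, image_path, enable_thinking_mode='enabled'):
    """Send a text + image prompt through the OpenAI SDK and return the first choice."""
    base64_image = encode_image(image_path)
    image_format = image_path.rsplit('.', 1)[-1].lower()
    if image_format not in ('jpg', 'jpeg', 'png', 'webp'):
        raise ValueError(f"Unsupported image format: {image_format}")

    response = client.chat.completions.create(
        model=seed_vl_version,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/{image_format};base64,{base64_image}"
                        },
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        # `thinking` is not a standard OpenAI parameter, so the SDK must forward
        # it through extra_body (passing it as a keyword raises TypeError)
        extra_body={"thinking": {"type": enable_thinking_mode}},
    )
    return response.choices[0]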

##### 1. LongCoT Mode
In LongCoT mode, the model reasons through the problem before answering, which helps it handle more complex and difficult tasks.
LongCoT mode is enabled by default for the **Seed1.5-VL** APIs.

In [None]:
image_path = "samples/001.png"
text_prompts = "can you solve this Rebus puzzle?"

In [None]:
result = inference_image(text_prompts, image_path) # enable_thinking_mode is set to 'enabled' by default
print("Seed1.5-VL:", result.message.content)
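With the SDK, the returned choice is an object rather than a dict. Assuming the provider adds a non-standard `reasoning_content` attribute to the message (an assumption; the OpenAI SDK models generally keep unknown fields as attributes), it can be read with `getattr` so the code still works when the field is absent. `split_reasoning_sdk` is a hypothetical helper.

```python
def split_reasoning_sdk(choice):
    """Return (reasoning, answer) from an OpenAI-SDK choice object.

    Assumes an optional, provider-specific `reasoning_content` attribute on
    the message; getattr() returns None when it is missing.
    """
    message = choice.message
    return getattr(message, "reasoning_content", None), message.content
```

Usage: `reasoning, answer = split_reasoning_sdk(result)`.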

##### 2. Non-LongCoT Mode
In Non-LongCoT mode, the model skips the extended reasoning step and returns faster, more concise replies.

In [None]:
image_path = "samples/001.png"
text_prompts = "can you solve this Rebus puzzle?"

In [None]:
result = inference_image(text_prompts, image_path, enable_thinking_mode='disabled')
print("Seed1.5-VL:", result.message.content)