## Better performance from reasoning models using the Responses API 

We've recently released two new state-of-the-art reasoning models, o3 and o4-mini, that excel at combining reasoning capabilities with agentic tool use. What a lot of folks don't know is that you can improve their performance by fully leveraging our (relatively) new Responses API. This cookbook demonstrates how to get the most out of these two models and dives a little deeper into how reasoning and function calling work for them behind the scenes. By giving the model access to previous reasoning items, we can make sure it is operating at maximum intelligence and lowest cost.


We introduced the Responses API at launch with a separate [cookbook](https://cookbook.openai.com/examples/responses_api/responses_example) along with the [API reference](https://platform.openai.com/docs/api-reference/responses). The short takeaway is that, by design, the Responses API isn't that different from the Completions API, with a few improvements and added features. We've also recently rolled out encrypted content for Responses, which we will get into here and which makes it even more useful for folks who cannot use the Responses API in a stateful way!

## How Reasoning Models work

Before we dive into how the Responses API can help us, it is useful to first review how [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses) work behind the scenes. Reasoning models like o3 and o4-mini take time to think through a problem before answering. Through this thinking process, the model is able to break a complex problem down and work through it step by step, increasing its performance on these tasks. During the thinking process, the model produces a long internal chain of thought that encodes the reasoning logic for the problem. For safety reasons, the reasoning tokens are only exposed to end users in summarized rather than raw form.

In a multistep conversation, the reasoning tokens are discarded after each turn, while the input and output tokens from each step are fed into the next.

![reasoning-context](../../images/reasoning-tokens.png)
Diagram borrowed from our [doc](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#how-reasoning-works)
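
To make the bookkeeping in the diagram concrete, here is a minimal sketch of how the carried-over context grows when reasoning tokens are dropped after each turn. The `Turn` class, the `context_sizes` helper, and the token counts for the second and third turns are hypothetical and only serve the illustration; they are not part of the API.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical per-turn token counts, for illustration only.
    input_tokens: int           # new user input added in this turn
    reasoning_tokens: int       # hidden chain-of-thought produced in this turn
    visible_output_tokens: int  # assistant output that the user actually sees


def context_sizes(turns):
    """Return how many tokens are in context at the start of each turn."""
    carried = 0
    sizes = []
    for turn in turns:
        sizes.append(carried + turn.input_tokens)
        # Only the input and the visible output are carried into the next turn;
        # the reasoning tokens are discarded.
        carried += turn.input_tokens + turn.visible_output_tokens
    return sizes


turns = [Turn(10, 128, 20), Turn(15, 200, 30), Turn(5, 90, 10)]
print(context_sizes(turns))  # [10, 45, 80]
```

Note that the reasoning token counts never show up in the result: in this simplified model they cost output tokens in the turn that produces them, but they do not enlarge later prompts.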

Let us examine the response object returned:

In [2]:
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

In [3]:
response = client.responses.create(
    model="o4-mini",
    input="tell me a joke",
)


In [9]:
import json

print(json.dumps(response.model_dump(), indent=2))


{
  "id": "resp_6820f382ee1c8191bc096bee70894d040ac5ba57aafcbac7",
  "created_at": 1746989954.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "o4-mini-2025-04-16",
  "object": "response",
  "output": [
    {
      "id": "rs_6820f383d7c08191846711c5df8233bc0ac5ba57aafcbac7",
      "summary": [],
      "type": "reasoning",
      "status": null
    },
    {
      "id": "msg_6820f3854688819187769ff582b170a60ac5ba57aafcbac7",
      "content": [
        {
          "annotations": [],
          "text": "Why don\u2019t scientists trust atoms?  \nBecause they make up everything!",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "max_output_tokens": null,
  "previous_response_id": null,
  "reasoning": {
    "effort": "medium",
    "generate

You can see from the JSON dump of the response object that, in addition to the `output_text`, this single API call also produced a reasoning item. This represents the reasoning tokens produced by the model. By default, it is exposed as an ID, in this instance `rs_6820f383d7c08191846711c5df8233bc0ac5ba57aafcbac7`. Since the Responses API is stateful, the reasoning tokens are persisted: all you have to do is include these items, along with their associated IDs, in subsequent messages for subsequent responses to have access to the same reasoning items. If you use `previous_response_id` for multi-turn conversations, the model will also have access to all the reasoning items produced previously.

Note that you can also see how many reasoning tokens the model produced for this response. With a total of 10 input tokens, we produced 148 output tokens, of which 128 are reasoning tokens that you don't see in the final assistant message.
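
The arithmetic behind those numbers can be sketched as follows. The dictionary below is a hand-written stand-in for the `usage` section of the response, filled with the figures quoted above; its field names are an assumption about the shape of that object (the JSON dump above is cut off before it), `total_tokens` is simply 10 + 148, and `visible_output_tokens` is a hypothetical helper.

```python
def visible_output_tokens(usage: dict) -> int:
    """Output tokens that actually appear in the assistant message."""
    reasoning = usage["output_tokens_details"]["reasoning_tokens"]
    return usage["output_tokens"] - reasoning


# Stand-in for the usage section of the response (field names assumed).
usage = {
    "input_tokens": 10,
    "output_tokens": 148,
    "output_tokens_details": {"reasoning_tokens": 128},
    "total_tokens": 158,
}

print(visible_output_tokens(usage))  # 20
```

In other words, only 20 of the 148 output tokens in this example are visible in the joke itself; the rest were spent on reasoning.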

But wait! From the above diagram, didn't you say that reasoning from previous turns is discarded? Then why does passing it back in matter for subsequent turns?

If you've been paying attention, you probably have that question, and it is a great one. For normal multi-turn conversations, including reasoning items and tokens is not necessary: the model is trained so that it does not need the reasoning tokens from previous turns to produce the best output. This changes when we consider the possibility of tool use. A single turn may include function calls as well, even though each one involves an additional round trip outside of the API. In this case, it is necessary to include the reasoning items (either via `previous_response_id` or by explicitly including the reasoning item in `input`). To illustrate this, let's cook up a quick function calling example.

In [14]:
import requests

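def get_weather(latitude, longitude):
    """Return the current temperature (in Celsius) at the given coordinates via Open-Meteo."""
    response = requests.get(f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m", timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    data = response.json()
    return data['current']['temperature_2m']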


tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for provided coordinates in celsius.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number"},
            "longitude": {"type": "number"}
        },
        "required": ["latitude", "longitude"],
        "additionalProperties": False
    },
    "strict": True
}]

context = [{"role": "user", "content": "What's the weather like in Paris today?"}]

response = client.responses.create(
    model="o4-mini",
    input=context,
    tools=tools,
)


response.output

[ResponseReasoningItem(id='rs_68210c71a95c81919cc44afadb9d220400c77cc15fd2f785', summary=[], type='reasoning', status=None),
 ResponseFunctionToolCall(arguments='{"latitude":48.8566,"longitude":2.3522}', call_id='call_9ylqPOZUyFEwhxvBwgpNDqPT', name='get_weather', type='function_call', id='fc_68210c78357c8191977197499d5de6ca00c77cc15fd2f785', status='completed')]

Here we see that after reasoning for a bit, o4-mini has decided that it needs additional information, which it can obtain by calling a function. We can go ahead and call that function and pass the output back to the model. The important thing to note is that for the model to have maximum intelligence, we need to pass the reasoning item back, which we can do simply by adding all of the output back into the context being passed back.

In [15]:
context += response.output # Add the response to the context (including the reasoning item)

tool_call = response.output[1]
args = json.loads(tool_call.arguments)


# calling the function
result = get_weather(args["latitude"], args["longitude"]) 

context.append({                               
    "type": "function_call_output",
    "call_id": tool_call.call_id,
    "output": str(result)
})

# we are calling the api again with the added function call output. Note that while this is another API call, we consider this as a single turn in the conversation.
response_2 = client.responses.create(
    model="o4-mini",
    input=context,
    tools=tools,
)

print(response_2.output_text)

The current temperature in Paris is 16.3°C. If you’d like more details—like humidity, wind speed, or a brief description of the sky—just let me know!
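
The cell above picks the function call by position (`response.output[1]`), which works here because the output contains exactly one reasoning item followed by one call. As a slightly more general sketch, one could loop over the output items and answer every `function_call` item found. The `run_function_calls` helper, the registry, and the `SimpleNamespace` stand-ins for the SDK objects are all hypothetical and only mimic the attributes shown in the output above.

```python
import json
from types import SimpleNamespace


def run_function_calls(output_items, registry):
    """Build a function_call_output item for every function call in the output."""
    results = []
    for item in output_items:
        if getattr(item, "type", None) != "function_call":
            continue  # skip reasoning items and assistant messages
        func = registry[item.name]
        args = json.loads(item.arguments)
        results.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": str(func(**args)),
        })
    return results


# Stand-ins for the SDK objects, so the sketch runs without calling the API.
fake_output = [
    SimpleNamespace(type="reasoning", id="rs_example"),
    SimpleNamespace(
        type="function_call",
        name="get_weather",
        call_id="call_example",
        arguments='{"latitude": 48.8566, "longitude": 2.3522}',
    ),
]
# A canned weather function replaces the real HTTP call.
fake_registry = {"get_weather": lambda latitude, longitude: 16.3}

print(run_function_calls(fake_output, fake_registry))
```

With a real response, the returned items would be appended to the context after the model output (reasoning items included), just as in the cell above.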


It is hard to illustrate the improved model intelligence in this toy example, since the model will probably do the right thing with or without the reasoning item being included, so we ran some tests ourselves. In a more comprehensive benchmark like SWE-bench, we were able to get about a **3% improvement** by including the reasoning items for the same prompt and setup.
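
To make the notion of a turn more tangible, the sketch below picks out the reasoning items that belong to the current turn, under the assumption that a turn starts at the most recent user message and spans every function call and function call output after it. The `reasoning_in_current_turn` helper and the example IDs are hypothetical, and the context is modeled as plain dictionaries rather than SDK objects.

```python
def reasoning_in_current_turn(context):
    """Return the ids of reasoning items that appear after the last user message."""
    last_user = max(
        (i for i, item in enumerate(context) if item.get("role") == "user"),
        default=-1,
    )
    return [
        item["id"]
        for item in context[last_user + 1:]
        if item.get("type") == "reasoning"
    ]


# Two turns; the second one is still in progress (a function call is pending).
example_context = [
    {"role": "user", "content": "What's the weather like in Paris today?"},
    {"type": "reasoning", "id": "rs_turn1_a"},
    {"type": "function_call", "call_id": "call_1", "name": "get_weather", "arguments": "{}"},
    {"type": "function_call_output", "call_id": "call_1", "output": "16.3"},
    {"role": "assistant", "content": "It is 16.3 degrees Celsius in Paris."},
    {"role": "user", "content": "And in London?"},
    {"type": "reasoning", "id": "rs_turn2_a"},
    {"type": "function_call", "call_id": "call_2", "name": "get_weather", "arguments": "{}"},
]

print(reasoning_in_current_turn(example_context))  # ['rs_turn2_a']
```

Under this assumption, `rs_turn2_a` is the item that matters when the output of `call_2` is sent back, while `rs_turn1_a` belongs to a finished turn.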

## Caching
As illustrated above, reasoning models produce both reasoning tokens and completion tokens, which are treated differently in the API today. This also has implications for cache utilization and latency. To illustrate the point, we include this helpful sketch.


![reasoning-context](../../images/responses_cache.png)


Note that in turn 2, reasoning items from turn 1 are ignored and stripped, since the model does not reuse reasoning items from previous turns. This is why it is impossible to get a full cache hit on the fourth API call in the diagram above: the prompt now excludes those reasoning items. That being said, we can still include them without harm, as the API automatically strips reasoning items that are irrelevant to the current turn. Keep in mind that caching only becomes relevant for prompts that are longer than 1024 tokens. In our tests, we were able to get cache utilization to go from 40% to 80% of the input prompt by moving from the Completions API to the Responses API. With better cache utilization comes better economics, as cached tokens are billed significantly less than uncached ones: for `o4-mini`, cached input tokens are 75% cheaper than uncached input tokens. It also improves latency.
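
As a rough illustration of what that means for cost, the sketch below compares 40% and 80% cache utilization on a hypothetical 10,000-token prompt, using the 75% discount quoted above. Costs are expressed in "uncached token equivalents" rather than dollars, so no actual price is assumed; the `relative_input_cost` helper and the prompt size are made up for the example.

```python
def relative_input_cost(prompt_tokens, cached_tokens, cached_discount=0.75):
    """Input cost in units of one uncached token, given how many tokens hit the cache."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    # Cached tokens are billed at (1 - discount) of the uncached rate.
    return uncached_tokens + cached_tokens * (1 - cached_discount)


prompt_tokens = 10_000  # hypothetical prompt size, well above the caching threshold

cost_at_40 = relative_input_cost(prompt_tokens, cached_tokens=4_000)
cost_at_80 = relative_input_cost(prompt_tokens, cached_tokens=8_000)

print(cost_at_40)  # 7000.0
print(cost_at_80)  # 4000.0
```

In this example, doubling cache utilization takes the input cost from 70% to 40% of the fully uncached price.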

## Encrypted Reasoning Items

For organizations that cannot use the Responses API in a stateful way due to compliance and data requirement constraints (e.g., if your organization is under [Zero Data Retention](https://openai.com/enterprise-privacy/)), we've recently rolled out [encrypted reasoning items](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#encrypted-reasoning-items), which allow you to reap all the benefits mentioned above while continuing to use the Responses API in a stateless way.

To leverage this, all you have to do is include `[ "reasoning.encrypted_content" ]` as part of the `include` field. By doing so, we will pass you an encrypted version of the reasoning tokens, which you can then pass back just like you passed back reasoning items before.

If your org is under Zero Data Retention (ZDR), OpenAI automatically enforces `store=false` settings at the API level. When a user’s request comes into the Responses API, we first check for any `encrypted_content` included in the payload. If present, this content is decrypted in-memory using keys to which only OpenAI has access. This decrypted reasoning content (i.e., chain-of-thought) is never written to disk and is used solely to inform the model’s next response. Once the model generates its output, any new reasoning tokens it produces are immediately encrypted and returned to the client as part of the response payload. At that point, all transient data from the request—including both decrypted inputs and model outputs—is securely discarded. No intermediate state is persisted to disk, ensuring full compliance with ZDR.

Here is a quick modified version of the above code snippet to demonstrate this:

In [39]:
context = [{"role": "user", "content": "What's the weather like in Paris today?"}]

response = client.responses.create(
    model="o3",
    input=context,
    tools=tools,
    store=False, #store=false, just like how ZDR is enforced
    include=["reasoning.encrypted_content"] # Encrypted chain of thought is passed back in the response
)

In [34]:
# take a look at the encrypted reasoning item
print(response.output[0]) 

ResponseReasoningItem(id='rs_6821243503d481919e1b385c2a154d5103d2cbc5a14f3696', summary=[], type='reasoning', status=None, encrypted_content='gAAAAABoISQ24OyVRYbkYfukdJoqdzWT-3uiErKInHDC-lgAaXeky44N77j7aibc2elHISjAvX7OmUwMU1r7NgaiHSVWL5BtWgXVBp4BMFkWZpXpZY7ff5pdPFnW3VieuF2cSo8Ay7tJ4aThGUnXkNM5QJqk6_u5jwd-W9cTHjucw9ATGfGqD2qHrXyj6NEW9RmpWHV2SK41d5TpUYdN0xSuIUP98HBVZ2VGgD4MIocUm6Lx0xhRl9KUx19f7w4Sn7SCpKUQ0zwXze8UsQOVvv1HQxk_yDosbIg1SylEj38H-DNLil6yUFlWI4vGWcPn1bALXphTR2EwYVR52nD1rCFEORUd7prS99i18MUMSAhghIVv9OrpbjmfxJh8bSQaHu1ZDTMWcfC58H3i8KnogmI7V_h2TKAiLTgSQIkYRHnV3hz1XwaUqYAIhBvP6c5UxX-j_tpYpB_XCpD886L0XyJxCmfr9cwitipOhHr8zfLVwMI4ULu-P3ftw7QckIVzf71HFLNixrQlkdgTn-zM6aQl5BZcJgwwn3ylJ5ji4DQTS1H3AiTrFsEt4kyiBcE2d7tYA_m3G8L-e4-TuTDdJZtLaz-q8J12onFaKknGSyU6px8Ki4IPqnWIJw8SaFMJ5fSUYJO__myhp7lbbQwuOZHIQuvKutM-QUuR0cHus_HtfWtZZksqvVCVNBYViBxD2_KvKJvR-nN62zZ8sNiydIclt1yJfIMkiRErfRTzv92hQaUtdqz80UiW7FBcN2Lnzt8awXCz1pnGyWy_hNQe8C7W35zRxJDwFdb-f3VpanJT0tNmU5bfEWSXcIVmiMZL1clwzVNryf9Gk482LaWPwhVYrh

With `include=["reasoning.encrypted_content"]` set, we now see an `encrypted_content` field in the reasoning item being passed back. This encrypted content represents the model's reasoning state, persisted entirely on the client side with OpenAI retaining no data. We can then pass this back just as we did with the reasoning item before.

In [40]:
context += response.output # Add the response to the context (including the encrypted chain of thought)
tool_call = response.output[1]
args = json.loads(tool_call.arguments)



result = 20 #mocking the result of the function call

context.append({                               
    "type": "function_call_output",
    "call_id": tool_call.call_id,
    "output": str(result)
})

response_2 = client.responses.create(
    model="o3",
    input=context,
    tools=tools,
    store=False,
    include=["reasoning.encrypted_content"]
)

print(response_2.output_text)

It’s currently about 20 °C in Paris.


With a simple change to the `include` field, we can now pass back the encrypted reasoning item and use it to improve the model's performance in intelligence, cost and latency.
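
When working statelessly, an easy mistake is to send back a reasoning item that was produced without `include=["reasoning.encrypted_content"]`, in which case it carries only an ID and no reasoning state. A minimal sanity check, assuming the context items have already been converted to plain dictionaries (for example with `model_dump()`), could look like the sketch below. The `missing_encrypted_content` helper and the example IDs are hypothetical.

```python
def missing_encrypted_content(context):
    """Return the ids of reasoning items that carry no encrypted_content."""
    return [
        item.get("id")
        for item in context
        if item.get("type") == "reasoning" and not item.get("encrypted_content")
    ]


# Plain-dict stand-ins for the items that would be sent back as input.
stateless_context = [
    {"role": "user", "content": "What's the weather like in Paris today?"},
    {"type": "reasoning", "id": "rs_with_payload", "summary": [], "encrypted_content": "gAAAA..."},
    {"type": "reasoning", "id": "rs_without_payload", "summary": []},
    {"type": "function_call_output", "call_id": "call_example", "output": "20"},
]

print(missing_encrypted_content(stateless_context))  # ['rs_without_payload']
```

An empty list means every reasoning item in the context carries its encrypted payload.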

Now you should be fully equipped with knowledge to be able to fully utilize our latest reasoning models!