Rate limits going out faster than usual #14373
Replies: 3 comments 2 replies
-
|
Had the same experience on Business: the new GPT 5.4 models burn through quota noticeably faster than the previous generation. A few things that helped me stretch the weekly budget:

1. Task-tier your model usage. Not every request needs the medium/large model. Code formatting, lint fixes, and simple refactors run fine on lighter models. Reserve the heavy models for architecture decisions and complex debugging. I've been using a routing layer that automatically picks the cheapest model that can handle each task, and it cut my API spend roughly in half with no quality loss that I can notice.
2. Check your
3. Monitor actual per-task token usage. The weekly percentage is too coarse: one runaway agent task can eat 40% of your quota in a single session. The dashboard doesn't show per-task breakdowns, which makes it hard to diagnose where the burn is actually happening.

For the routing/cost optimization piece, I ended up building InferCut. It sits between your code and the API, routes simple tasks to cheaper models, and passes requests through unchanged when cost reduction isn't possible. Zero risk, works with existing OpenAI keys. Mainly aimed at teams spending $500+/mo on LLM APIs, but the free tier handles individual use too.

The quota burn rate issue is real though. Hope OpenAI gives us better per-task visibility soon. |
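For anyone who wants to try the task-tiering idea from point 1 without a third-party tool, here is a minimal sketch. The tier names, relative costs, and task labels are all hypothetical placeholders (they are not real model identifiers or prices), and a real router would need a better way to classify tasks than a fixed set of labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    # Hypothetical tier: the name and relative cost are placeholders,
    # not real model identifiers or prices.
    name: str
    relative_cost: float


LIGHT = Tier("light-model", 1.0)
HEAVY = Tier("heavy-model", 10.0)

# Task kinds the comment above describes as fine on lighter models.
# The label strings themselves are assumptions made for this sketch.
LIGHT_TASKS = {"format", "lint_fix", "simple_refactor"}


def pick_tier(task_kind: str) -> Tier:
    """Return the cheapest tier assumed able to handle this task kind."""
    return LIGHT if task_kind in LIGHT_TASKS else HEAVY


def estimated_cost(task_kinds: list[str]) -> float:
    """Sum the relative cost of routing each task to its chosen tier."""
    return sum(pick_tier(kind).relative_cost for kind in task_kinds)


if __name__ == "__main__":
    for kind in ["format", "lint_fix", "complex_debugging"]:
        print(kind, "->", pick_tier(kind).name)
```

Unknown task kinds fall through to the heavy tier on purpose, so a misclassified task costs more rather than producing a worse answer.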
-
|
One thing that helped me diagnose this was auditing my Codex session files across comparable time periods. I picked 4 sessions from a day a few months ago and 4 sessions from a recent day, then asked ChatGPT to audit and compare them: token usage, repeated loops, unnecessary retries, inefficient tool calls, excessive context loading, failed assumptions, etc. In my case, it helped me discover that some of the burn was not only the model itself, but also changes I had introduced into my workflow over time: new processes, extra instructions, skills, validation steps, and repo context that Codex was repeatedly consuming. So it may be worth checking whether the newer sessions are doing more hidden work than the old ones, even if the prompts feel similar from the outside. The weekly quota percentage is too coarse to understand what actually happened. |
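If you would rather do the old-versus-new comparison yourself than hand it to ChatGPT, a rough sketch follows. It assumes you have already pulled per-session counts out of your session files into plain dictionaries; the metric names (`input_tokens`, `tool_calls`) are made up for illustration and do not reflect the actual Codex session file format.

```python
from statistics import mean


def compare_periods(old_sessions, new_sessions):
    """Return {metric: new_mean / old_mean} for metrics present in both periods.

    Each session is a dict of metric name -> number. The metric names are
    whatever you extracted yourself; nothing here parses real session files.
    """
    shared = set(old_sessions[0]) & set(new_sessions[0])
    ratios = {}
    for metric in sorted(shared):
        old_avg = mean(s[metric] for s in old_sessions)
        new_avg = mean(s[metric] for s in new_sessions)
        if old_avg > 0:  # skip metrics with no baseline to compare against
            ratios[metric] = new_avg / old_avg
    return ratios


if __name__ == "__main__":
    # Made-up numbers purely to show the shape of the input.
    old = [{"input_tokens": 100, "tool_calls": 2},
           {"input_tokens": 300, "tool_calls": 4}]
    new = [{"input_tokens": 400, "tool_calls": 3},
           {"input_tokens": 800, "tool_calls": 9}]
    for metric, ratio in compare_periods(old, new).items():
        print(f"{metric}: {ratio:.1f}x")
```

A ratio well above 1 for context or tool-call metrics, with similar prompts, would point at workflow changes rather than the model alone.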
-
|
Before the June 4 reset, 100-150M tokens were 10-14% of my weekly allowance on the $100 plan. After the reset, a 150M-token day was 40% of my weekly allowance. I asked on Reddit and Discord, and multiple people noticed the same issue. I used the same mode and the same settings; the limit got cut overnight. One user showed his token usage on the same plan: 280 million tokens used, costing him 48% of his daily usage. That puts our $100 allowance at about 300-500 million tokens, while some people get 1.5 billion tokens on the exact same plan. I used to get 2.5 billion with the 2x promo and 1.2 billion after the promo ended, both on the $100 plan. Before that I was on the $20 plan, which gave 245 million per week using the same 5.5 Xhigh. |
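The back-of-the-envelope math behind these estimates can be sketched as below. It assumes the quota percentage scales linearly with tokens used, which may not hold if different token types are weighted differently, and it treats the other user's 48% as the same kind of quota fraction as the weekly figures. The function name is just for illustration.

```python
def implied_allowance_millions(tokens_used_millions: float, fraction_of_quota: float) -> float:
    """Estimate the total allowance from tokens used and the quota share they consumed.

    Assumes quota is consumed linearly per token, which is a simplification.
    """
    if not 0 < fraction_of_quota <= 1:
        raise ValueError("fraction_of_quota must be in (0, 1]")
    return tokens_used_millions / fraction_of_quota


if __name__ == "__main__":
    # Figures quoted in the comment above: (label, tokens in millions, quota share).
    reports = [
        ("before reset, low end", 100, 0.10),
        ("before reset, high end", 150, 0.14),
        ("after reset, 150M day", 150, 0.40),
        ("other user, same plan", 280, 0.48),
    ]
    for label, tokens, share in reports:
        print(f"{label}: ~{implied_allowance_millions(tokens, share):.0f}M tokens")
```

Under that linear assumption the pre-reset figures imply an allowance of roughly 1 billion tokens, while the two post-reset reports imply roughly 375M and 583M, in the same ballpark as the 300-500 million estimate above.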
-
Has anybody experienced the same issue? I ran one request with GPT 5.4 medium. My weekly limit was at 90%, and 15 minutes later it had suddenly dropped to 50%. I'm on the Business plan.