### Runtime setup
The following environment variables enable stable retries and quiet streaming.

- `SCILLM_DISABLE_AIOHTTP=1` (httpx-only async stability)
- `SCILLM_FORCE_HTTPX_STREAM=1`
- `LITELLM_MAX_RETRIES=3`, `LITELLM_RETRY_AFTER=1`, `LITELLM_TIMEOUT=45`
- Requires `tenacity` installed for backoff.

In [None]:
import os
os.environ.setdefault('SCILLM_DISABLE_AIOHTTP','1')
os.environ.setdefault('SCILLM_FORCE_HTTPX_STREAM','1')
os.environ.setdefault('LITELLM_MAX_RETRIES','3')
os.environ.setdefault('LITELLM_RETRY_AFTER','1')
os.environ.setdefault('LITELLM_TIMEOUT','45')
try:
    import tenacity  # noqa: F401
    print('tenacity: ok')
except ImportError:
    print('tenacity missing — run: pip install tenacity')
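
The values set above are plain strings in `os.environ`. As a rough sketch of what they imply for latency, the snippet below parses them into numbers and computes an upper bound on how long a single call could block. The helper names `read_retry_settings` and `worst_case_wait_s` are hypothetical, and the arithmetic assumes a fixed delay between attempts and that `LITELLM_MAX_RETRIES` counts retries on top of the first attempt; the actual backoff policy may differ.

```python
import os

def read_retry_settings(env=None):
    """Parse the retry-related variables into typed values (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "max_retries": int(env.get("LITELLM_MAX_RETRIES", "3")),
        "retry_after_s": float(env.get("LITELLM_RETRY_AFTER", "1")),
        "timeout_s": float(env.get("LITELLM_TIMEOUT", "45")),
    }

def worst_case_wait_s(settings):
    """Upper bound on wall time: every attempt times out, fixed delay between attempts.

    Assumption: max_retries counts retries in addition to the first attempt.
    """
    attempts = settings["max_retries"] + 1
    return attempts * settings["timeout_s"] + settings["max_retries"] * settings["retry_after_s"]

settings = read_retry_settings()
print(settings, worst_case_wait_s(settings))
```

If that bound is too long for interactive use, lowering `LITELLM_TIMEOUT` is likely to have the largest effect.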


# Chutes — OpenAI-Compatible

Minimal chat using the OpenAI-compatible path.

## 1) Sync completion

Minimal, blocking call using `scillm.completion`. Good for quick sanity.

In [None]:
import os
from scillm import completion
resp = completion(
  model=os.environ['CHUTES_MODEL'],
  api_base=os.environ['CHUTES_API_BASE'],
  api_key=os.environ['CHUTES_API_KEY'],
  custom_llm_provider='openai_like',
  messages=[{'role':'user','content':'Say OK'}],
  max_tokens=8,
  temperature=0,
)
print(resp.choices[0].message.get('content',''))
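
The cell above reads three variables with `os.environ[...]`, so an unset one surfaces as a bare `KeyError`. A small pre-flight check can report everything that is missing at once. This is only a sketch; `missing_env` is a hypothetical helper, not part of `scillm`.

```python
import os

# Variables the sync completion cell reads directly from the environment.
REQUIRED = ("CHUTES_MODEL", "CHUTES_API_BASE", "CHUTES_API_KEY")

def missing_env(names, env=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env(REQUIRED)
if missing:
    print("Set these before running the completion cell:", ", ".join(missing))
else:
    print("Chutes environment looks complete")
```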

### Router + Fallbacks (Text) — Recommended
Use a Router with a shared `model_name` for the primary and its alternates. On a capacity error (HTTP 429/503), the Router cools down the failing deployment and tries the next one.

In [None]:
import os
from litellm import Router
# Env defaults (set once):
# SCILLM_CHUTES_CANONICALIZE_OPENAI_AUTH=1
# LITELLM_MAX_RETRIES=3 LITELLM_RETRY_AFTER=2
# SCILLM_COOLDOWN_429_S=120 SCILLM_RATE_LIMIT_QPS=2
def _chutes_deployment(model, order):
  # One deployment entry; every entry shares the "chutes/text" model_name.
  return {"model_name": "chutes/text",
   "litellm_params": {"custom_llm_provider": "openai_like",
     "model": model,
     "api_base": os.environ['CHUTES_API_BASE'],
     "api_key": os.environ['CHUTES_API_KEY'],
     "order": order}}

model_list = [_chutes_deployment(os.environ['CHUTES_TEXT_MODEL'], 1)]
# Register the alternate only when it is set; an empty model name is not a usable fallback.
alt1 = os.environ.get('CHUTES_TEXT_MODEL_ALT1', '')
if alt1:
  model_list.append(_chutes_deployment(alt1, 2))
router_text = Router(model_list=model_list)
out = router_text.completion(
  model='chutes/text',
  messages=[{"role":"user","content":'Return only {\"ok\": true} as JSON.'}],
  response_format={"type":"json_object"},
)
print(out.choices[0].message.get('content',''))
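
To make the cooldown-and-fallback behavior concrete, here is a toy simulation of ordered failover with a per-deployment cooldown. It is a sketch of the idea only and not how litellm's `Router` is implemented; `CooldownPool` and `fake_send` are hypothetical names, and no network calls are made.

```python
import time

class CooldownPool:
    """Toy ordered failover with per-deployment cooldown (not litellm's Router)."""

    def __init__(self, names, cooldown_s=120.0, clock=time.monotonic):
        self.names = list(names)      # deployments, tried in this order
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.cool_until = {}          # name -> time when it is usable again

    def call(self, send):
        """send(name) returns (status, body); the first non-capacity reply wins."""
        for name in self.names:
            if self.clock() < self.cool_until.get(name, 0.0):
                continue              # still cooling down: skip it
            status, body = send(name)
            if status in (429, 503):
                # Capacity error: put this deployment on cooldown, try the next.
                self.cool_until[name] = self.clock() + self.cooldown_s
                continue
            return name, status, body
        raise RuntimeError("all deployments are cooling down or at capacity")

# Fake transport: the primary is at capacity, the alternate answers.
def fake_send(name):
    return (429, None) if name == "primary" else (200, '{"ok": true}')

pool = CooldownPool(["primary", "alt1"], cooldown_s=120.0)
print(pool.call(fake_send))
```

A second `pool.call(fake_send)` within the cooldown window skips `primary` without contacting it, which is the point of the cooldown: a saturated deployment is not hit again on every request.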


### Auto Router (one‑liner) — Recommended for fallbacks
Why: it removes the need to list ALT1/ALT2 deployments by hand. It discovers models from your Chutes `/v1/models` endpoint, filters them by capability (text/VLM; `require_json`/`require_tools`), and ranks them by availability/utilization. You get automatic failover on capacity errors (429/503) using your standard env backoff settings.

How: set env once, then build a Router with a single call.
- `SCILLM_CHUTES_CANONICALIZE_OPENAI_AUTH=1`
- `LITELLM_MAX_RETRIES=3 LITELLM_RETRY_AFTER=2`
- `SCILLM_COOLDOWN_429_S=120 SCILLM_RATE_LIMIT_QPS=2`
- optional: `SCILLM_DISABLE_AIOHTTP=1 LITELLM_TIMEOUT=45`

Notes:
- Honors multiple bases via `CHUTES_API_BASE_1/CHUTES_API_KEY_1` (and _2, _3...) if present.
- Override odd model capabilities with `SCILLM_MODEL_CAPS_JSON`.
- Use `kind="text"` or `kind="vlm"`; add `require_json=True` to enforce JSON mode.
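
As an illustration of the numbered-base convention in the notes above, the sketch below collects `CHUTES_API_BASE_n`/`CHUTES_API_KEY_n` pairs from the environment. `numbered_endpoints` is a hypothetical helper, and stopping at the first missing pair is an assumption; how `auto_router_from_env` actually scans these variables is not shown in this notebook.

```python
import os

def numbered_endpoints(env=None, limit=10):
    """Collect (base, key) pairs from CHUTES_API_BASE_1/CHUTES_API_KEY_1, _2, ...

    Hypothetical helper. Assumption: scanning stops at the first index
    where either the base or the key is missing.
    """
    env = os.environ if env is None else env
    endpoints = []
    for i in range(1, limit + 1):
        base = env.get(f"CHUTES_API_BASE_{i}")
        key = env.get(f"CHUTES_API_KEY_{i}")
        if not base or not key:
            break
        endpoints.append((base, key))
    return endpoints

# Report only the count so that keys are never printed.
print(len(numbered_endpoints()), "numbered Chutes endpoint(s) configured")
```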


In [None]:
import os
from scillm.extras import auto_router_from_env

# Build a text Router with automatic fallbacks
router_text = auto_router_from_env(kind='text', require_json=True)
out = router_text.completion(
  model='chutes/text',
  messages=[{"role":"user","content":'Return only {\"ok\": true} as JSON.'}],
  response_format={"type":"json_object"},
)
print(out.choices[0].message.get('content',''))

# (Optional) Vision:
# router_vlm = auto_router_from_env(kind='vlm', require_json=True)
# out_v = router_vlm.completion(
#   model='chutes/vlm',
#   messages=[{"role":"user","content":[{"type":"text","text":'Return only {\"ok\": true} as JSON.'}]}],
#   response_format={"type":"json_object"},
# )
# print(out_v.choices[0].message.get('content',''))
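
Requesting `response_format={"type":"json_object"}` does not by itself mean the caller has checked the reply. A minimal sketch of validating the returned content before using it follows; `parse_json_object` is a hypothetical helper, and the sample strings stand in for the model's `content`.

```python
import json

def parse_json_object(content):
    """Return the parsed dict if content is a JSON object, else None."""
    try:
        data = json.loads(content or "")
    except json.JSONDecodeError:
        return None
    # JSON mode is expected to yield an object; reject arrays and scalars.
    return data if isinstance(data, dict) else None

print(parse_json_object('{"ok": true}'))
print(parse_json_object('OK'))
```

In the cells above, the argument would be `out.choices[0].message.get('content','')`.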
