> Tip: For single‑model, dataset‑safe retries (keeps calling until capacity clears), see the companion notebook: `15_tenacious_single_model.ipynb`.


# 00 Quickstart — SciLLM + Chutes (JSON)

Minimal first‑success notebook. It uses Auto Router to discover a healthy candidate and returns strict JSON. Set these environment variables in your shell before running:

- `SCILLM_CHUTES_CANONICALIZE_OPENAI_AUTH=1`
- `LITELLM_MAX_RETRIES=3 LITELLM_RETRY_AFTER=2`
- `SCILLM_COOLDOWN_429_S=120 SCILLM_RATE_LIMIT_QPS=2`
- optional: `SCILLM_DISABLE_AIOHTTP=1 LITELLM_TIMEOUT=45`

Requires: `CHUTES_API_BASE`, `CHUTES_API_KEY`.
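
A quick preflight check can report exactly which required variables are unset before any call is made. This is a minimal sketch: the variable names come from the list above, while `missing_env` is a hypothetical helper and not part of SciLLM.

```python
import os

# Names taken from the "Requires" line; the helper itself is hypothetical.
REQUIRED = ("CHUTES_API_BASE", "CHUTES_API_KEY")

def missing_env(names, env=None):
    """Return the names from `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [n for n in names if not env.get(n)]

missing = missing_env(REQUIRED)
print("missing:", missing if missing else "none")
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.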


In [None]:
import os
from scillm.extras import auto_router_from_env
base = os.getenv('CHUTES_API_BASE'); key = os.getenv('CHUTES_API_KEY')
if not (base and key):
    print('Missing CHUTES_API_BASE/CHUTES_API_KEY — set them in your shell and rerun.')
else:
    router = auto_router_from_env(kind='text', require_json=True)
    out = router.completion(
        model='chutes/text',
        messages=[{'role':'user','content':'Return only {\"ok\": true} as JSON.'}],
        response_format={'type':'json_object'},
    )
    print(out.choices[0].message.get('content',''))
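
`response_format={'type':'json_object'}` asks for JSON, but the cell above only prints the reply as a string. A minimal sketch of parsing it before use follows; `parse_json_object` is a hypothetical helper, not a SciLLM function, and it assumes the content is either a JSON string or missing.

```python
import json

def parse_json_object(content):
    """Parse `content` as JSON and require a top-level object; return None otherwise."""
    try:
        value = json.loads(content)
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers malformed JSON (JSONDecodeError).
        return None
    return value if isinstance(value, dict) else None

print(parse_json_object('{"ok": true}'))
```

Returning `None` instead of raising keeps a notebook cell running when a model replies with prose or a bare array.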


### Troubleshooting — curl header check (DevOps)
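Checks whether your tenant accepts `Authorization: Bearer` or `x-api-key` on `/models`, requested relative to `CHUTES_API_BASE` (so the base is expected to already end in `/v1`). Prints the HTTP status line and a short slice of model IDs for each header style.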

In [None]:
import os, json, subprocess
base=os.getenv('CHUTES_API_BASE'); key=os.getenv('CHUTES_API_KEY')
if not (base and key):
    print('Missing CHUTES_API_BASE/CHUTES_API_KEY: set and rerun.')
else:
    # Try both auth header styles against the same endpoint.
    for label, header in [('Bearer', 'authorization: Bearer %s' % key), ('x-api-key', 'x-api-key: %s' % key)]:
        print(label + ':')
        # `-D -` writes response headers to stdout ahead of the body, so split them below.
        # Passing an argument list (no shell) avoids quoting problems with the key.
        cmd=['curl', '-sS', '-D', '-', '-o', '-', '-H', header, '%s/models' % base]
        r=subprocess.run(cmd, capture_output=True, text=True)
        # Headers end at the first blank line (text=True normalizes line endings to '\n').
        head, _, body = r.stdout.partition('\n\n')
        print(head.splitlines()[0] if head else 'HTTP/2 ?')
        try:
            data=json.loads(body)
            ids=[d.get('id','') for d in data.get('data',[])][:8]
            print('ids:', ids)
        except Exception:
            print('body bytes:', len(body))
        print()
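
Where `curl` is not installed, the same two header styles can be checked with the standard library. The sketch below is an assumption‑laden alternative, not the notebook's method: `models_request` and `status_of` are hypothetical helpers, and the `/models` path relative to the base mirrors the curl cell.

```python
import urllib.error
import urllib.request

def models_request(base, key, style="bearer"):
    """Build (but do not send) a GET request for `<base>/models` with the chosen auth header."""
    if style == "bearer":
        headers = {"Authorization": "Bearer %s" % key}
    elif style == "x-api-key":
        headers = {"x-api-key": key}
    else:
        raise ValueError("unknown auth style: %r" % style)
    return urllib.request.Request(base.rstrip("/") + "/models", headers=headers, method="GET")

def status_of(req, timeout=15):
    """Send the request and return the HTTP status code, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Usage would look like `status_of(models_request(base, key, "x-api-key"))`, run once per style to compare the two status codes.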


### Standard Calls Overview

This notebook shows multiple standard ways to make calls:
- Text JSON (direct OpenAI‑compatible)
- Inline multimodal (image_url)
- Small batch (sequential, safe)
- Router one‑liner (already above)
- Codex‑Agent JSON call (last section)


### Text JSON (direct OpenAI‑compatible)

Minimal strict‑JSON call against your OpenAI‑compatible base. Uses `response_format={'type':'json_object'}` to request JSON output.


In [None]:
import os
from scillm import completion
base=os.getenv('CHUTES_API_BASE'); key=os.getenv('CHUTES_API_KEY'); text_model=os.getenv('CHUTES_TEXT_MODEL')
if not (base and key and text_model):
    print('Missing CHUTES_API_BASE/CHUTES_API_KEY/CHUTES_TEXT_MODEL — set and rerun.')
else:
    resp = completion(
      model=text_model, api_base=base, api_key=key, custom_llm_provider='openai_like',
      messages=[{'role':'user','content':'Return only {\"ok\": true} as JSON.'}],
      response_format={'type':'json_object'}, temperature=0, max_tokens=16,
    )
    print(resp.choices[0].message.get('content',''))


### Inline Multimodal (image_url)

Send an OpenAI‑style content list with an `image_url` part. This works for VLMs on compatible gateways.


In [None]:
import os
from scillm import completion
base=os.getenv('CHUTES_API_BASE'); key=os.getenv('CHUTES_API_KEY'); vlm_model=os.getenv('CHUTES_VLM_MODEL','')
if not (base and key and vlm_model):
    print('Missing CHUTES_* VLM vars — set CHUTES_VLM_MODEL and rerun.')
else:
    msg=[{'type':'text','text':'Return only {\"ok\": true} as JSON.'},
         {'type':'image_url','image_url':{'url':'https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Fronalpstock_big.jpg/640px-Fronalpstock_big.jpg'}}]
    resp = completion(
      model=vlm_model, api_base=base, api_key=key, custom_llm_provider='openai_like',
      messages=[{'role':'user','content': msg}],
      response_format={'type':'json_object'}, temperature=0, max_tokens=32,
    )
    print(resp.choices[0].message.get('content',''))
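
The `image_url` part above points at a public URL. For a local image, the OpenAI content format also allows a base64 data URL in the same field; whether a given gateway or VLM accepts it is an assumption to verify. A minimal sketch, where `image_part_from_bytes` is a hypothetical helper:

```python
import base64

def image_part_from_bytes(data, mime="image/jpeg"):
    """Wrap raw image bytes as an OpenAI-style `image_url` content part using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": "data:%s;base64,%s" % (mime, encoded)}}

# Placeholder bytes stand in for a real file read with open(path, "rb").read().
part = image_part_from_bytes(b"abc", mime="image/png")
print(part["image_url"]["url"])
```

The returned dict can take the place of the URL‑based `image_url` entry in the content list.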


### Small Batch (sequential, safe)

Demonstrates a tiny sequential batch of strict‑JSON prompts. For production batches, use Router, which adds fallbacks and backoff.


In [None]:
import os
from scillm import completion
base=os.getenv('CHUTES_API_BASE'); key=os.getenv('CHUTES_API_KEY'); text_model=os.getenv('CHUTES_TEXT_MODEL')
prompts=[
  'Return only {\"ok\": true} as JSON.',
  'Return only {\"ok\": \"batch\"} as JSON.'
]
outs=[]
if not (base and key and text_model):
    print('Missing CHUTES envs — set and rerun.')
else:
    for p in prompts:
        r = completion(
          model=text_model, api_base=base, api_key=key, custom_llm_provider='openai_like',
          messages=[{'role':'user','content': p}], response_format={'type':'json_object'},
          temperature=0, max_tokens=16,
        )
        outs.append(r.choices[0].message.get('content',''))
    print(outs)
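
The loop above stops at the first exception. As an illustration of the backoff idea only (this is not how Router implements it), a hypothetical wrapper can retry a single call with doubling delays. The defaults echo the `LITELLM_MAX_RETRIES=3` and `LITELLM_RETRY_AFTER=2` values from the setup list but are not read from the environment.

```python
import time

def call_with_backoff(fn, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call `fn()` up to `attempts` times, doubling the wait after each failure."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** i))

# Demo with a stand-in function that fails twice, then succeeds.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("busy")
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))
```

In the batch loop, each request could be wrapped as `call_with_backoff(lambda: completion(...))`; the injectable `sleep` argument keeps the helper testable without real waiting.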


### Codex‑Agent (OpenAI‑compatible) — JSON call

The most common SciLLM path for reasoning tasks. Points at a Codex agent base (OpenAI‑compatible) and returns strict JSON.
Set in your shell: `CODEX_AGENT_API_BASE`, `CODEX_AGENT_API_KEY`. Optional: `CODEX_AGENT_MODEL` (defaults to the `gpt-5` alias).


In [None]:
import os
from scillm import completion
cbase=os.getenv('CODEX_AGENT_API_BASE'); ckey=os.getenv('CODEX_AGENT_API_KEY'); cmodel=os.getenv('CODEX_AGENT_MODEL','gpt-5')
if not (cbase and ckey):
    print('Missing CODEX_AGENT_API_BASE/CODEX_AGENT_API_KEY — set and rerun.')
else:
    out = completion(
      model=cmodel, api_base=cbase, api_key=ckey, custom_llm_provider='codex-agent',
      messages=[{'role':'user','content':'Return only {\"ok\": true} as JSON.'}],
      response_format={'type':'json_object'}, reasoning_effort='high', temperature=0, max_tokens=64,
    )
    print(out.choices[0].message.get('content',''))
