### Why/When/Gotchas/Troubleshooting
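- Why: stream partial tokens so interactive UIs can render output as it arrives.
- When: long responses and live rendering.
- Gotchas: notebooks already run an event loop, so a bare `asyncio.run()` raises there. Use `nest_asyncio` or a loop-aware pattern. In smoke tests, stop after a few chunks to keep output short.
- Troubleshooting: set `SCILLM_ALLOW_STREAM_SMOKE=1` to run the live smoke test. If it fails, check the `Authorization: Bearer` header and confirm that the model supports streaming.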
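The loop-aware pattern can be sketched with the standard library alone. `run_coro` is a hypothetical helper, not part of SciLLM. When no loop is running it calls `asyncio.run()`. Otherwise, as in Jupyter, it runs the coroutine on a fresh loop in a worker thread and blocks until that loop finishes.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")

def run_coro(coro: Coroutine[Any, Any, T]) -> T:
    """Hypothetical helper: run a coroutine from sync code, with or without a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread (plain script): the ordinary path.
        return asyncio.run(coro)
    # A loop is already running (e.g. Jupyter), where asyncio.run() would raise.
    # Run the coroutine on its own loop in a worker thread and wait for the result.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()

print(run_coro(asyncio.sleep(0, result="done")))  # done
```

In a notebook you would call `run_coro(main())` instead of `asyncio.run(main())`. The worker thread's loop is separate from the notebook's, so do not hand the coroutine any asyncio objects that were created on the notebook's loop.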

### Runtime setup
Set the following environment variables to enable stable retries and quiet streaming:

- `SCILLM_FORCE_HTTPX_STREAM=1`
- `LITELLM_MAX_RETRIES=3`, `LITELLM_RETRY_AFTER=1`, `LITELLM_TIMEOUT=45`
- Backoff requires the `tenacity` package.
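This section does not spell out how SciLLM turns these values into retries. The hypothetical `retry_with_backoff` below is only an illustration. It assumes `LITELLM_MAX_RETRIES` counts retries after the first attempt and that `LITELLM_RETRY_AFTER` is a base delay in seconds that doubles on each retry. It uses the standard library instead of `tenacity` so it runs without extra installs.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(fn: Callable[[], T], *, sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical retry loop; the env-var semantics here are assumptions, not SciLLM's."""
    max_retries = int(os.environ.get("LITELLM_MAX_RETRIES", "3"))
    base_delay = float(os.environ.get("LITELLM_RETRY_AFTER", "1"))
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults above
    raise AssertionError("unreachable")

# Demo: a function that fails twice, with a no-op sleep so it finishes instantly.
attempts = []

def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda s: None), len(attempts))  # ok 3
```

The injectable `sleep` parameter keeps the demo and tests fast. Real code would keep the default `time.sleep`.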

```python
import os

# Defaults only: values already set in the environment take precedence.
os.environ.setdefault('SCILLM_FORCE_HTTPX_STREAM', '1')
os.environ.setdefault('LITELLM_MAX_RETRIES', '3')
os.environ.setdefault('LITELLM_RETRY_AFTER', '1')
os.environ.setdefault('LITELLM_TIMEOUT', '45')

try:
    import tenacity  # noqa: F401  (only checking that it is installed)
    print('tenacity: ok')
except ImportError:
    print('tenacity missing; run: pip install tenacity')
```
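Before you run the streaming cell, a quick preflight can report which of the Chutes variables it reads are unset. Without this check, a missing variable only surfaces as a `KeyError` in the middle of the call. `missing_chutes_env` is a hypothetical helper. It checks exactly the names that the streaming cell reads.

```python
import os
from typing import List, Mapping

def missing_chutes_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical preflight: names the streaming cell needs that are unset or empty."""
    missing = [name for name in ("CHUTES_API_BASE", "CHUTES_API_KEY") if not env.get(name)]
    # The streaming cell takes the model from CHUTES_VLM_MODEL, falling back to CHUTES_MODEL.
    if not (env.get("CHUTES_VLM_MODEL") or env.get("CHUTES_MODEL")):
        missing.append("CHUTES_VLM_MODEL or CHUTES_MODEL")
    return missing

problems = missing_chutes_env()
print("env: ok" if not problems else f"env missing: {', '.join(problems)}")
```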


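```python
import asyncio
import os

from scillm import acompletion

MAX_CHUNKS = 2  # throttle: stop after a couple of non-empty chunks


def chunk_text(ev):
    """Return the text carried by one stream event ('' if none)."""
    def get(obj, key):
        return obj.get(key) if isinstance(obj, dict) else getattr(obj, key, None)

    choices = get(ev, 'choices')
    # OpenAI-style chunks nest the text under choices[0].delta; otherwise try ev.delta, then ev itself.
    d = get(choices[0], 'delta') if choices else (get(ev, 'delta') or ev)
    return (get(d, 'content') if d is not None else None) or ''


async def main():
    stream = await acompletion(
        # `or` avoids a KeyError on CHUTES_MODEL when only CHUTES_VLM_MODEL is set.
        model=os.environ.get('CHUTES_VLM_MODEL') or os.environ['CHUTES_MODEL'],
        api_base=os.environ['CHUTES_API_BASE'],
        api_key=None,  # auth goes in the explicit Bearer header below
        custom_llm_provider='openai_like',
        extra_headers={'Authorization': f"Bearer {os.environ['CHUTES_API_KEY']}"},
        messages=[{'role': 'user', 'content': 'In one word, say OK'}],
        temperature=0,
        max_tokens=8,
        stream=True,
        timeout=30,
    )
    count = 0
    try:
        async for ev in stream:
            text = chunk_text(ev)
            if text:
                print(text, end='')
                count += 1
                if count >= MAX_CHUNKS:
                    break
    finally:
        # Release the underlying stream even when we stop early.
        if hasattr(stream, 'aclose'):
            await stream.aclose()
    print()


if os.environ.get('SCILLM_ALLOW_STREAM_SMOKE') == '1':
    try:
        import nest_asyncio  # lets asyncio.run() work inside Jupyter's already-running loop
        nest_asyncio.apply()
    except ImportError:
        pass  # without it, asyncio.run() only works outside a running loop (plain scripts)
    asyncio.run(main())
else:
    print('Skipped: set SCILLM_ALLOW_STREAM_SMOKE=1 to run this live streaming test.')
```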
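You can check offline why the early `break` is paired with `await stream.aclose()` in a `finally` block. The sketch below uses a hypothetical async generator in place of the provider stream. It records when its cleanup runs. An explicit `aclose()` runs that cleanup at once, instead of leaving it until the generator is garbage-collected.

```python
import asyncio

async def fake_stream(log):
    """Hypothetical stand-in for a provider stream that records its cleanup."""
    try:
        for token in ["O", "K", "!", "?"]:
            yield token
    finally:
        # A real client would release its HTTP connection here.
        log.append("closed")

async def take_first(n):
    """Consume at most n tokens, then close the stream explicitly."""
    log, seen = [], []
    stream = fake_stream(log)
    try:
        async for token in stream:
            seen.append(token)
            if len(seen) >= n:
                break
    finally:
        # Runs the generator's finally block now, not whenever it is garbage-collected.
        await stream.aclose()
    return seen, log

print(asyncio.run(take_first(2)))  # (['O', 'K'], ['closed'])
```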



### Troubleshooting with curl (Chutes)

If a SciLLM call fails, verify the tenant directly:

1) List models
```bash
curl -sS -H "Authorization: Bearer $CHUTES_API_KEY" \
  "$CHUTES_API_BASE/models" \
  | jq '{count: (if type == "array" then . else (.data // []) end | length)}'
```
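Once the list comes back, check that the ids in `CHUTES_TEXT_MODEL` and `CHUTES_VLM_MODEL` appear in it. `model_ids` is a hypothetical helper that pulls the ids out of a saved `/models` response. It assumes the OpenAI-style `{"data": [{"id": ...}]}` envelope and also accepts a bare list. The sample body and model names are made up.

```python
import json
from typing import Any, List

def model_ids(payload: Any) -> List[str]:
    """Hypothetical helper: model ids from a /models response (envelope or bare list)."""
    items = payload if isinstance(payload, list) else (payload or {}).get("data") or []
    return [item["id"] for item in items if isinstance(item, dict) and "id" in item]

body = '{"object": "list", "data": [{"id": "model-a"}, {"id": "model-b"}]}'
ids = model_ids(json.loads(body))
print(len(ids), ids)  # 2 ['model-a', 'model-b']
```

For example, save the response with `> models.json` and test whether `os.environ['CHUTES_TEXT_MODEL'] in model_ids(json.load(open('models.json')))`.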

2) Minimal JSON chat (non-stream)
```bash
curl -sS -X POST -H "Authorization: Bearer $CHUTES_API_KEY" \
  -H "Content-Type: application/json" \
  "$CHUTES_API_BASE/chat/completions" \
  -d '{
    "model": "'"$CHUTES_TEXT_MODEL"'",
    "messages": [{"role":"user","content":"Return only {\"ok\":true} as JSON."}],
    "response_format": {"type":"json_object"},
    "max_tokens": 16,
    "temperature": 0
  }' | jq '.choices[0].message.content // empty'
```
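Quotes nested inside a JSON body that is typed by hand in a shell string are easy to break. Building the body in Python avoids this, because `json.dumps` does the escaping. `json_chat_payload` and `is_ok_reply` are hypothetical helpers, and `"my-text-model"` is a placeholder id.

```python
import json

def json_chat_payload(model: str) -> str:
    """Hypothetical helper: serialize the body of the non-streaming JSON check."""
    body = {
        "model": model,
        # json.dumps escapes the inner quotes of {"ok":true} for us.
        "messages": [{"role": "user", "content": 'Return only {"ok":true} as JSON.'}],
        "response_format": {"type": "json_object"},
        "max_tokens": 16,
        "temperature": 0,
    }
    return json.dumps(body)

def is_ok_reply(content: str) -> bool:
    """True if the model's reply parses as JSON and equals {"ok": true}."""
    try:
        return json.loads(content) == {"ok": True}
    except json.JSONDecodeError:
        return False

print(json_chat_payload("my-text-model"))
```

Save the printed body to a file, send it with `curl ... -d @body.json`, and pass the extracted `.choices[0].message.content` to `is_ok_reply`.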

3) Streaming (text): watch for `data:` lines
```bash
curl -sN -X POST -H "Authorization: Bearer $CHUTES_API_KEY" \
  -H "Content-Type: application/json" \
  "$CHUTES_API_BASE/chat/completions" \
  -d '{
    "model": "'"$CHUTES_TEXT_MODEL"'",
    "messages": [{"role":"user","content":"Tell me a 10-word story."}],
    "stream": true
  }'
```
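To read the stream as text rather than as raw events, you can parse the `data:` lines offline. `sse_text` is a hypothetical helper. It assumes the OpenAI-compatible chunk shape (`choices[].delta.content`) and a `data: [DONE]` terminator. The sample lines are made up for illustration.

```python
import json
from typing import Iterable

def sse_text(lines: Iterable[str]) -> str:
    """Hypothetical helper: join content deltas from SSE lines, stopping at [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, ': keep-alive' comments, etc.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            parts.append((choice.get("delta") or {}).get("content") or "")
    return "".join(parts)

sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Once"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" upon"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(sse_text(sample))  # Once upon
```

Redirect the curl output to `stream.txt` and call `sse_text(open('stream.txt'))` to get the story as one string.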

4) Multimodal (image_url)
```bash
IMG_URL=${SCILLM_DEMO_IMAGE:-https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Fronalpstock_big.jpg/320px-Fronalpstock_big.jpg}
curl -sS -X POST -H "Authorization: Bearer $CHUTES_API_KEY" \
  -H "Content-Type: application/json" \
  "$CHUTES_API_BASE/chat/completions" \
  -d '{
    "model": "'"$CHUTES_VLM_MODEL"'",
    "messages": [{
      "role":"user",
      "content": [
        {"type":"text","text":"Say OK and a color in the image."},
        {"type":"image_url","image_url": {"url": "'"$IMG_URL"'"}}
      ]
    }],
    "max_tokens": 32,
    "temperature": 0
  }' | jq '.choices[0].message.content // empty'
```
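If the server cannot fetch the image URL, the request can carry the image inline instead. Many OpenAI-compatible servers accept a base64 `data:` URL in `image_url.url`, but whether your Chutes model does is an assumption you should verify. `image_data_url` and `image_message` are hypothetical helpers. The bytes below are a stand-in, not a real image.

```python
import base64
import mimetypes

def image_data_url(data: bytes, filename: str) -> str:
    """Hypothetical helper: encode raw image bytes as a base64 data: URL."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"

def image_message(text: str, url: str) -> dict:
    """Same user-message shape as the curl example: a text part plus an image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }

url = image_data_url(b"\x89PNG\r\n", "pixel.png")  # stand-in bytes (PNG signature prefix)
msg = image_message("Say OK and a color in the image.", url)
print(url)  # data:image/png;base64,iVBORw0K
```

For a real file, pass `open(path, "rb").read()` and the file's name, then place `msg` in the `messages` list of the request.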
