  • AI Lab
  • Shanghai, China
sitianjia/README.md

Ma Xiaoming

Applied AI researcher. Most of my work lives in the gap between agent demos and agents that don't fall over on Friday night.



What I work on

LLM agents in production: the part nobody tweets about. Picking the right tools out of fifty. Evaluating tool use with deterministic checks instead of a hand-wavy LLM judge. Recording traces that survive a postmortem. Building flows you can resume after a worker dies on step 7.

The four repos pinned here are pieces of the same picture: how to take an agent from "works in a notebook" to "shipped, observable, and debuggable".
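To make "resume after a worker dies on step 7" concrete, here is a minimal sketch of step-level checkpointing. It is an illustration only, not flowmind's actual API: `run_flow`, the checkpoint file layout, and the `next_step` key are hypothetical names, and it assumes the flow state is JSON-serializable.

```python
import json
import os
from typing import Callable, Dict, List, Optional

# A step takes the current state and returns the updated state.
Step = Callable[[Dict], Dict]


def run_flow(steps: List[Step], checkpoint_path: str,
             state: Optional[Dict] = None) -> Dict:
    """Run steps in order, persisting progress after every completed step.

    If a checkpoint file already exists, execution resumes from the first
    step that has not completed yet (hypothetical layout, for illustration).
    """
    start = 0
    state = dict(state or {})
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            saved = json.load(f)
        start, state = saved["next_step"], saved["state"]

    for i in range(start, len(steps)):
        state = steps[i](state)
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump({"next_step": i + 1, "state": state}, f)
        os.replace(tmp_path, checkpoint_path)
    return state
```

Calling `run_flow` again with the same `checkpoint_path` after a crash skips every step that already finished, which matters when each step is a paid tool call.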

What I'm thinking about

  • Why "demo-good" and "prod-good" are different problems, and how the gap shows up empirically
  • Cheap routing layers that protect the LLM from its own option-explosion problem
  • Trace formats that one engineer can read and one machine can grep
  • Checkpointing patterns for agents whose tool calls cost real money
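As one illustration of a cheap routing layer, the sketch below ranks tools by plain token overlap between the query and each tool's name and description, then keeps the top K. This is an assumed baseline, not tool-router's implementation; `pick_tools`, `tokenize`, and the example tool names are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_tools(query: str, tools: Dict[str, str], k: int = 3) -> List[str]:
    """Return up to k tool names, ranked by token overlap with the query.

    `tools` maps a tool name to its description. Tools with zero overlap
    are dropped, so the result may contain fewer than k names.
    """
    query_tokens = tokenize(query)
    scored = []
    for name, description in tools.items():
        overlap = len(query_tokens & tokenize(name + " " + description))
        scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]


# Hypothetical tool catalogue, for illustration only.
TOOLS = {
    "get_weather": "Look up the current weather forecast for a city",
    "send_email": "Send an email message to a recipient",
    "search_flights": "Search flights between two airports",
}

selected = pick_tools("what is the weather forecast in Shanghai", TOOLS, k=2)
```

Only the selected tools would then be passed to the LLM. A real router would likely swap the overlap score for embeddings or a small classifier, but the interface stays the same.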

Tools I keep reaching for

Python · PyTorch · vLLM · OpenAI / Anthropic / Qwen SDKs · Pydantic · pytest · Jinja2 · Rich · Docker · tmux · a stubborn refusal to add a database

Pinned

  • agent-eval-kit — YAML cases, replayable traces, deterministic checks for tool-using agents
  • tool-router — pick K tools before they ever hit the LLM
  • agent-tape — structured trace recorder + replay; jsonl on disk, no service
  • flowmind — declarative agent flows in YAML, with checkpointing
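The "jsonl on disk, no service" and "deterministic checks" ideas can be sketched together in a few lines. This is a hedged illustration rather than the agent-tape or agent-eval-kit API: the `Tape` class, the event fields (`kind`, `name`, `args`), and `check_tool_sequence` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List


class Tape:
    """Append-only trace: one JSON object per line in a plain file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def record(self, kind: str, **fields: Any) -> None:
        """Append one event; sorted keys keep lines stable for grep and diff."""
        event = {"kind": kind, **fields}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event, sort_keys=True) + "\n")

    def replay(self) -> Iterator[Dict[str, Any]]:
        """Yield recorded events in the order they were written."""
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


def check_tool_sequence(tape: Tape, expected: List[str]) -> bool:
    """Deterministic check: did the agent call exactly these tools, in order?"""
    called = [e["name"] for e in tape.replay() if e["kind"] == "tool_call"]
    return called == expected
```

A run would call `tape.record("tool_call", name="get_weather", args={"city": "Shanghai"})` at each step, and an eval case would then assert on `check_tool_sequence(tape, [...])` without involving a model.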


Not on Twitter. Not looking for a job. PRs welcome, issues even more welcome — I read all of them.

Pinned repositories

  1. agent-eval-kit (Public, Python): A small, opinionated eval harness for tool-using LLM agents, with YAML cases, deterministic checks, and replayable traces.

  2. agent-tape (Public, Python): Structured trace recorder and replay for LLM agents; jsonl on disk, no service.

  3. flowmind (Public, Python): Declarative agent flows in YAML, with checkpointing and resume; diff-able and hot-swappable.

  4. tool-router (Public, Python): Pick K tools before they hit the LLM; a cheap routing layer between user query and agent.