Skip to content

Conversation

@ultmaster
Copy link
Contributor

@ultmaster ultmaster commented Nov 21, 2025

Support Tinker by $150 with personal Tinker API key.

Support Azure via Managed Identity.

Copilot AI review requested due to automatic review settings November 21, 2025 17:33
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds GitHub Actions workflows to automate CI testing for Tinker and Azure examples. Key improvements include parameterizable test configurations with --ci flags for faster verification, fixed dependency versions for reproducibility, enhanced timeout configurations, and improved error handling.

  • Adds automated CI workflows for Tinker and Azure examples
  • Introduces --ci mode for faster test execution with smaller datasets
  • Pins crewai version and removes git commit hash from tinker-cookbook dependency
  • Improves LiteLLM configuration with longer timeouts and better error handling

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
uv.lock Pins crewai to 1.2.0, unpins tinker-cookbook commit, adds new dependencies (ipython, pyvis, etc.)
scripts/litellm_run.sh Makes port configurable via CLI argument with proper variable substitution
pyproject.toml Pins crewai to ==1.2.0, removes git rev from tinker-cookbook
examples/tinker/tests/test_tinker_llm.py Adds threaded store, launch mode config, and trajectory assertions
examples/tinker/q20_train.py Adds CI mode support with smaller datasets and batch sizes
examples/tinker/q20_evaluate.py Adds CI mode with sample limits, timeout increases, and result validation
examples/tinker/hello.py Adds CI mode with reduced dataset sizes
examples/tinker/agl_tinker/llm.py Switches to tinker tokenizer utility, adds timeout config, fixes callback setting
examples/azure/train_capital_agent.py Adds CLI args for iterations and cleanup, adds try-finally for cleanup
examples/azure/aoai_finetune.py Updates to new Azure API format, adds cleanup_deployments method
.github/workflows/examples-tinker.yml New workflow for Tinker CI with multiple test scenarios
.github/workflows/examples-azure.yml New workflow for Azure OpenAI examples
.github/workflows/benchmark.yml Placeholder benchmark workflow

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +318 to +320
# If not adding return token ids, we need to add the opentelemetry callback.
# Otherwise, we set it to default.
callbacks=["opentelemetry"] if not add_return_token_ids else None,
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment logic is inverted or unclear. The code sets callbacks to ['opentelemetry'] when add_return_token_ids is False, but the comment says 'If not adding return token ids, we need to add the opentelemetry callback. Otherwise, we set it to default.' This suggests the callback should be added when NOT adding token ids, which matches the code. However, the original code at line 309 was callbacks=['opentelemetry'] if add_return_token_ids else None, which is the opposite. Either clarify the comment to explain why the logic was inverted, or verify this is the correct behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +379 to 388
method={
"type": "supervised",
"supervised": {
"hyperparameters": {
"batch_size": self.finetune_batch_size,
"learning_rate_multiplier": self.finetune_learning_rate,
"n_epochs": self.finetune_epochs,
}
},
},
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API structure change from hyperparameters as a direct parameter to nested under method.supervised.hyperparameters appears to be a breaking change to the Azure OpenAI API. Verify this is compatible with the API version specified in the workflow (AZURE_OPENAI_API_VERSION: 2025-04-01-preview) and consider documenting the minimum required API version.

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +154
sleep 5
python q20_train.py runner --n-runners 4 &
sleep 5
python q20_train.py algo --model qwen4b --ci
sleep 5

pkill -f agl && echo "SIGTERM sent to agl" || echo "No agl process found"
while pgrep -f agl; do
echo "Waiting for agl to finish..."
sleep 5
done
pkill -f q20_train.py && echo "SIGTERM sent to q20_train.py" || echo "No q20_train.py process found"
while pgrep -f q20_train.py; do
echo "Waiting for q20_train.py to finish..."
sleep 5
done
echo "q20_train.py has finished."
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process cleanup logic uses pkill -f with broad pattern matching (agl and q20_train.py) which could inadvertently kill unrelated processes on a shared runner. Consider using more specific process identifiers, storing PIDs from the background jobs, or adding a unique identifier to the commands to ensure only the correct processes are terminated.

Suggested change
sleep 5
python q20_train.py runner --n-runners 4 &
sleep 5
python q20_train.py algo --model qwen4b --ci
sleep 5
pkill -f agl && echo "SIGTERM sent to agl" || echo "No agl process found"
while pgrep -f agl; do
echo "Waiting for agl to finish..."
sleep 5
done
pkill -f q20_train.py && echo "SIGTERM sent to q20_train.py" || echo "No q20_train.py process found"
while pgrep -f q20_train.py; do
echo "Waiting for q20_train.py to finish..."
sleep 5
done
echo "q20_train.py has finished."
AGL_PID=$!
sleep 5
python q20_train.py runner --n-runners 4 &
RUNNER_PID=$!
sleep 5
python q20_train.py algo --model qwen4b --ci
sleep 5
# Cleanup: terminate background processes by PID
if kill -0 "$AGL_PID" 2>/dev/null; then
kill "$AGL_PID" && echo "SIGTERM sent to agl (PID $AGL_PID)" || echo "Failed to send SIGTERM to agl"
else
echo "No agl process found (PID $AGL_PID not running)"
fi
while kill -0 "$AGL_PID" 2>/dev/null; do
echo "Waiting for agl (PID $AGL_PID) to finish..."
sleep 5
done
if kill -0 "$RUNNER_PID" 2>/dev/null; then
kill "$RUNNER_PID" && echo "SIGTERM sent to q20_train.py runner (PID $RUNNER_PID)" || echo "Failed to send SIGTERM to q20_train.py runner"
else
echo "No q20_train.py runner process found (PID $RUNNER_PID not running)"
fi
while kill -0 "$RUNNER_PID" 2>/dev/null; do
echo "Waiting for q20_train.py runner (PID $RUNNER_PID) to finish..."
sleep 5
done
echo "q20_train.py runner has finished."

Copilot uses AI. Check for mistakes.
@ultmaster ultmaster merged commit 4cf8fb9 into main Nov 21, 2025
14 checks passed
totoluo pushed a commit to totoluo/agent-lightning that referenced this pull request Dec 6, 2025
totoluo pushed a commit to totoluo/agent-lightning that referenced this pull request Dec 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants