Problem Statement
Strands agents need the ability to learn from their workflow execution and operational experience. To do so, the Strands SDK needs to address two problems:
- Trajectory Data Utilization: Collect agent execution traces into training datasets for continuous model improvement (a hypothetical record shape is sketched below).
- Trajectory-based Training: Fine-tune models on real agent execution data to build domain-specific agents that outperform generic models on customer tasks while reducing costs.
This enhancement extends the current model-driven framework toward a learning-driven framework for continuous agent improvement.
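For a concrete sense of what trajectory capture would produce, the snippet below sketches one possible record shape serialized to JSONL; the field names and structure are illustrative assumptions, not an existing Strands schema.

import json

# Hypothetical shape of a single captured trajectory record (illustrative only).
trajectory_record = {
    "system_prompt": "You are a helpful assistant.",
    "user_request": "What is 17 * 24?",
    "steps": [
        {"type": "tool_call", "tool": "calculator", "input": {"expression": "17 * 24"}, "output": "408"},
        {"type": "response", "text": "17 * 24 = 408"},
    ],
    "outcome": {"success": True, "reward": 1.0},
}

# Appending records like this to a JSONL file yields a dataset usable for fine-tuning.
with open("trajectories.jsonl", "a") as f:
    f.write(json.dumps(trajectory_record) + "\n")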
Proposed Solution
Add training capabilities to the Strands Agent SDK to enable continuous learning through model fine-tuning on captured agent trajectories. This is done by integrating open-source frameworks such as rLLM and veRL.
Example API usage
from rllm.agents import StrandsAgent
from rllm.environments.tools.strands_env import StrandsEnv
from rllm.rewards.reward_fn import math_reward_fn
from rllm.trainer.agent_trainer import AgentTrainer
from strands_tools import calculator

# Agent configuration passed through to the Strands agent.
agent_args = {
    "tools": [calculator],
    "system_prompt": "You are a helpful assistant.",
}

# training_config, dataset, and validation_dataset are assumed to be defined elsewhere.
trainer = AgentTrainer(
    agent_class=StrandsAgent,
    env_class=StrandsEnv,
    agent_args=agent_args,
    env_args={"reward_fn": math_reward_fn},
    config=training_config,
    train_dataset=dataset,
    val_dataset=validation_dataset,
)
trainer.train()
Implementation Components
- Trajectory Capture System: Automatically record agent interactions, tool calls, and outcomes
- Training Environment: StrandsEnv wrapper that interfaces with existing agent execution engine and Strands SDK
- Training Pipeline: Integration with proven RL/SFT frameworks
- Reward Function Framework: Configurable reward systems for different domains
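As a rough illustration of the Reward Function Framework item above, a domain reward could be a plain callable that scores a finished trajectory. The Trajectory and ToolCallRecord types, their fields, and the factory signature are assumptions made for this sketch, not actual Strands or rLLM interfaces.

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical trajectory types for illustration; the real SDK types may differ.
@dataclass
class ToolCallRecord:
    tool_name: str
    arguments: dict
    succeeded: bool

@dataclass
class Trajectory:
    user_request: str
    tool_calls: list[ToolCallRecord] = field(default_factory=list)
    final_answer: str = ""
    task_succeeded: bool = False

def make_reward_fn(success_weight: float = 1.0,
                   step_penalty: float = 0.01) -> Callable[[Trajectory], float]:
    """Build a configurable reward that trades task success against tool-call efficiency."""
    def reward_fn(traj: Trajectory) -> float:
        reward = success_weight if traj.task_succeeded else 0.0
        reward -= step_penalty * len(traj.tool_calls)  # discourage unnecessary tool calls
        return reward
    return reward_fn

A factory like this would let different domains express their own trade-offs (success vs. efficiency, latency, safety checks) without changing the training pipeline itself.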
Use Case
Primary Use Cases
- Performance Improvement through Experience
  - Learn which tool calls, with which parameter values, work best in which situations
  - Improve the sequence and ordering of actions in workflows from final reward signals (success/failure)
  - Get better at parsing and responding to specific types of user requests
- Cost Optimization through Specialized Models
  - Train smaller, domain-specific models (e.g., 4B parameters) that can potentially match the performance of large API models (e.g., 120B+) on specific tasks, given sufficient trajectory and interaction data
  - Replace remote API calls with faster, lower-cost local inference on the same network
  - Reduce token usage through more efficient reasoning patterns learned from training data
- Operational Independence
  - Highly optimized, locally trained models can eliminate the rate-limiting constraints that throttle high-volume (multi-)agent deployments
  - Business Continuity and Consistency: avoid workflow disruptions or quality variations when external API providers update models, deprecate versions, or change APIs in ways that break existing integrations
- Domain Specialization: Train agents for specific business contexts and agentic workflows
  - Train agents on company-specific workflows, terminology, and success patterns
  - Adapt reasoning approaches to specific industry contexts (e.g., financial analysis vs. code generation)
  - Create specialized tool usage patterns based on actual operational requirements and constraints
Alternative Solutions
Manual Prompt and Context Engineering:
- Pros: currently the widely used approach; fine-grained control; no additional SDK complexity; immediately implementable
- Cons: labor-intensive; requires manual tuning for each use case; static rules do not adapt to new situations; no learning from real interaction data; context strategies do not improve based on actual performance outcomes
Additional Context
Technical Considerations
- Reward Design Complexity: Not all problems have easily quantifiable rewards (see the sketch after this list)
- Scope Limitations: Primarily beneficial for local model fine-tuning scenarios
- Infrastructure Requirements: Requires compute resources for training (SageMaker AI integration, etc.)
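To make the reward-design consideration concrete, the sketch below contrasts a task with a checkable ground truth against an open-ended task that only has proxy signals; both functions and their heuristics are illustrative assumptions, not part of any SDK.

def exact_match_reward(predicted: str, expected: str) -> float:
    # Easy case: tasks such as arithmetic have a verifiable answer.
    return 1.0 if predicted.strip() == expected.strip() else 0.0

def report_quality_reward(report: str) -> float:
    # Hard case: open-ended outputs only admit proxy signals (length, structure, references).
    score = 0.0
    if len(report.split()) >= 100:
        score += 0.5   # minimum substance
    if "## Summary" in report:
        score += 0.25  # expected structure
    if "http" in report:
        score += 0.25  # at least one reference
    return score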