What are the best practices for the integration of ACE components? #11

kaleyroy · 2025-10-29T08:10:45Z

Hello everyone,
I have read the ACE framework documentation and have some questions about its practical application.

The main question is:

Currently, the ACE framework implementation involves the following three core components:

Generator - Executes tasks using learned strategies from the playbook

Reflector - Analyzes what worked and what didn't after each execution

Curator - Updates the playbook with new strategies based on reflection

In real-world Agent development, how should these three components be used together? What are the best practices?

Using Offline Adaptation as an example, here are my understanding and my questions:

STEP-1: Adaptation

Sample → Generator (produces answer) → Environment (evaluates) → Reflector (analyzes) → Curator (updates playbook)

REPEAT [STEP-1]

Until all samples are processed, then save the playbook for future use

STEP-2: Generation

Load playbook → New Sample(+ context) → Generator (produces answer)

My Understanding

Before Agent deployment, we train on existing data (Adaptation), build the Playbook, and save it.
After Agent deployment, we use the strategies in the previously trained Playbook to generate answers for new Samples (Generation).

My Questions

After Agent deployment (STEP-2), do we simply end after using the Generator to produce answers, or do we need to go through Reflector (analyzes) → Curator (updates playbook) each time to keep the Agent self-improving?

So in practical Agent application development, how should we combine these three components? And which of the following patterns is the best practice?

Offline Adaptation + Generation
Offline Adaptation + [Generation + Reflector (analyzes) → Curator (updates playbook)]

Code Example (PATTERN-1 vs PATTERN-2)

# Initialize the LLM client
llm = LiteLLMClient(model="gpt-4o-mini")

# Create the three ACE components
generator = Generator(llm)
reflector = Reflector(llm)
curator = Curator(llm)

# Create training samples
samples = [
    Sample(
        question="What is the capital of France?",
        context="Answer this question",
        ground_truth="Paris"
    ),
    ...
]


# Set up the environment (evaluates answers)
environment = SimpleEnvironment()
# Create an adapter to orchestrate everything
adapter = OfflineAdapter(
    generator,
    reflector,
    curator
)
# Train the agent (it learns strategies from these examples)
results = adapter.run(samples, environment, epochs=2)

# Save the learned strategies
adapter.playbook.save_to_file("trained_playbook.json")


# Test with a new sample 
test_sample = Sample(
    question="What is 5 + 3?",
    context="Provide the answer"
)

# ===================================================================
# STEP-2: [PATTERN-1]
# Offline Adaptation + Generation
# ===================================================================

# Generate an answer using the trained playbook
output = generator.generate(
    question=test_sample.question,
    context=test_sample.context,
    playbook=adapter.playbook
)

# ===================================================================
# STEP-2: [PATTERN-2]
# Offline Adaptation + [Generation + Reflector (analyzes) → Curator (updates playbook)]
# ===================================================================

# Reuse the playbook learned during offline adaptation
playbook = adapter.playbook

def self_improving(test_sample: Sample):
    # Generate response
    output = generator.generate(
        question=test_sample.question,
        context=test_sample.context,
        playbook=playbook     
    )
    # Reflect on the response
    reflection = reflector.reflect(
        question=test_sample.question,
        generator_output=output,
        playbook=playbook,
        ground_truth=None,  # No ground truth provided
        feedback=None  # No feedback provided               
    )
    # Update the playbook
    curator_output = curator.curate(
        reflection=reflection,
        playbook=playbook,
        question_context="",
        progress="self-learning",            
    )
    playbook.apply_delta(curator_output.delta)

    return output

# Generate an answer using the trained playbook
# and then analyze it + update the playbook
output = self_improving(test_sample)


# *********************************************************
# Print the answer and reasoning
print("Answer:", output.final_answer)
print("Reasoning:", output.reasoning)

Lanzelot1 · 2025-10-31T02:49:42Z

Great questions! Let's go through each one step by step to clear up any confusion.

Role Clarification

Your description of the three roles is correct! To clarify further:

Generator: The execution engine that does the actual work of your agentic system (data processing, tool calling, etc.)
Reflector: Analyzes what worked/didn't after execution
Curator: Updates the playbook based on reflection insights

Offline Adaptation (Step 1)

You've correctly identified the workflow! The core idea is to train on existing data in an isolated environment, building a pre-filled playbook BEFORE deployment. This makes your agent immediately more effective when it starts handling real tasks.

Online Adaptation (Step 2)

Your implementation in PATTERN-1 would be correct if you just want to freeze the playbook and no longer want ACE to improve it:

# When you have a trained playbook and just want to use it
result = generator.generate(
    question="What is 5 + 3?",
    context="",
    playbook=trained_playbook  # Uses strategies WITHOUT updating
)

PATTERN-2 is the correct approach when you want ACE to continuously improve the playbook after each run. However, you're orchestrating the components manually, and ACE already handles this automatically through OnlineAdapter!

Your self_improving() implementation:

def self_improving(test_sample: Sample):
    output = generator.generate(...)
    reflection = reflector.reflect(...)  
    curator_output = curator.curate(...)
    playbook.apply_delta(curator_output.delta)

This is exactly what OnlineAdapter does internally! You don't need to write this code - just use:

# OnlineAdapter handles the ENTIRE loop automatically
adapter = OnlineAdapter(
    playbook=Playbook(),  # Can start empty or pre-trained
    generator=Generator(llm),
    reflector=Reflector(llm),
    curator=Curator(llm)
)

# Each call automatically does: Generate → Reflect → Curate → Update Playbook
results = adapter.run(samples, environment)

We realize the current README doesn't explain this clearly - it shows initializing components but not how they work together. We're actively working on:

Clearer documentation with complete examples
Starter templates for common patterns
Better demo examples showing both learning and inference modes

Hope this helps! If anything's still unclear, please follow up. Your questions are invaluable for improving our docs - they've already highlighted several areas we need to clarify!

1 reply

kaleyroy Nov 11, 2025
Author

@Lanzelot1

Thank you so much for the detailed and clear explanation!
Your introduction to OnlineAdapter was the key insight and cleared up my confusion.

While trying to replace my custom self_improving() process with OnlineAdapter, I’ve run into some issues I’d like to discuss with you. ^_^

In your previous reply, you provided this example:

# OnlineAdapter handles the ENTIRE loop automatically
adapter = OnlineAdapter(
    playbook=Playbook(),  # Can start empty or pre-trained
    generator=Generator(llm),
    reflector=Reflector(llm),
    curator=Curator(llm)
)

# Each call automatically does: Generate → Reflect → Curate → Update Playbook
results = adapter.run(samples, environment)

For instance, in the simple_qa task, we would define a SimpleEnvironment to provide feedback on the generated result.

class SimpleEnvironment(TaskEnvironment):
    """Minimal environment for testing."""

    def evaluate(self, sample, generator_output):
        correct = sample.ground_truth.lower() in generator_output.final_answer.lower()
        return EnvironmentResult(
            feedback="Correct!" if correct else "Incorrect",
            ground_truth=sample.ground_truth,
        )

The code above works fine during offline training. However, when OnlineAdapter is used in production, users will ask new questions whose correct answers we don’t know, which makes real-time evaluation and feedback impossible.

I checked the source code for OnlineAdapter.run(...) and found that after the generation step it calls TaskEnvironment.evaluate(), which depends on sample.ground_truth. Once the agent is deployed in production, we won’t have ground-truth answers for new questions, so we can’t provide correct, real-time feedback.

    def _process_sample(
        self,
        sample: Sample,
        environment: TaskEnvironment,
        *,
        epoch: int,
        total_epochs: int,
        step_index: int,
        total_steps: int,
    ) -> AdapterStepResult:
        generator_output = self.generator.generate(
            question=sample.question,
            context=sample.context,
            playbook=self.playbook,
            reflection=self._reflection_context(),
            sample=sample,
        )
        # ===================================================================
        # Depends on sample.ground_truth & environment feedback
        # ===================================================================
        env_result = environment.evaluate(sample, generator_output)
        reflection = self.reflector.reflect(
            question=sample.question,
            generator_output=generator_output,
            playbook=self.playbook,
            ground_truth=env_result.ground_truth,
            feedback=env_result.feedback,
            max_refinement_rounds=self.max_refinement_rounds,
        )
        self._apply_bullet_tags(reflection)
        self._update_recent_reflections(reflection)
        curator_output = self.curator.curate(
            reflection=reflection,
            playbook=self.playbook,
            question_context=self._question_context(sample, env_result),
            progress=self._progress_string(
                epoch, total_epochs, step_index, total_steps
            ),
        )

        # ... other code

        self.playbook.apply_delta(curator_output.delta)

        return AdapterStepResult(
            sample=sample,
            generator_output=generator_output,
            environment_result=env_result,
            reflection=reflection,
            curator_output=curator_output,
            playbook_snapshot=self.playbook.as_prompt(),
            epoch=epoch,
            step=step_index,
        )

So, in a real-world production environment, how should we use OnlineAdapter to answer new questions and keep improving the playbook without relying on ground-truth answers?

BTW,
It’s great to hear that the documentation is being improved, and I’m really looking forward to the new examples and templates.
Thanks again for your valuable time. Your reply was a huge help!

baijiu-in-my-cup · 2025-11-23T07:18:52Z

Why do I feel that the work is to make AI outputs more reliable and easier to control? After all, ground-truth answers to new questions must be provided by humans.

1 reply

kaleyroy Nov 24, 2025
Author

Why do I feel that the work is to make AI outputs more reliable and easier to control? After all, ground-truth answers to new questions must be provided by humans.

@baijiu-in-my-cup
Yes, at first, I had the same idea as you.

I thought that in the OnlineAdapter scenario, the user provides answers and feedback to the Reflector in real-time. The Reflector would then combine the user’s feedback and answers to analyze them and generate suggestions, and then the Curator would update the playbook.
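
A minimal sketch of the idea in the paragraph above, under stated assumptions: the environment builds its feedback from the user's reaction rather than from a known answer, and leaves ground_truth empty. The Sample, GeneratorOutput, and EnvironmentResult classes below are hypothetical stand-ins that only mirror the names used in this thread so the snippet runs on its own; UserFeedbackEnvironment and its record() method are invented for illustration and are not part of ACE. Whether the real Reflector and Curator produce useful updates from feedback alone is not confirmed here.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the ACE types named in this thread, defined here
# only so the sketch is self-contained; the real classes may differ.
@dataclass
class Sample:
    question: str
    context: str = ""
    ground_truth: Optional[str] = None


@dataclass
class GeneratorOutput:
    final_answer: str
    reasoning: str = ""


@dataclass
class EnvironmentResult:
    feedback: str
    ground_truth: Optional[str] = None


class UserFeedbackEnvironment:
    """Hypothetical environment: feedback comes from the user, not from a label."""

    def __init__(self):
        # Maps question -> (accepted, free-text comment)
        self._signals = {}

    def record(self, question, accepted, comment=""):
        """Store the user's reaction to the answer given for a question."""
        self._signals[question] = (accepted, comment)

    def evaluate(self, sample, generator_output):
        signal = self._signals.get(sample.question)
        if signal is None:
            # Nothing to learn from yet; the caller could skip the update step.
            return EnvironmentResult(feedback="No user feedback available.")
        accepted, comment = signal
        verdict = "User accepted the answer." if accepted else "User rejected the answer."
        feedback = f"{verdict} {comment}".strip()
        # ground_truth stays None unless the sample happens to carry one.
        return EnvironmentResult(feedback=feedback, ground_truth=sample.ground_truth)


env = UserFeedbackEnvironment()
sample = Sample(question="What is 5 + 3?", context="Provide the answer")
output = GeneratorOutput(final_answer="9")

env.record(sample.question, accepted=False, comment="The result should be 8.")
result = env.evaluate(sample, output)
print(result.feedback)
```

The design choice is to keep the evaluate(sample, generator_output) shape shown earlier in the thread, so that only the source of the feedback changes, not the surrounding loop.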

So, at the start of this discussion, I suggested trying the [PATTERN-2] approach in a production environment, where the self_improving function doesn’t depend on user answers and feedback.

# ===================================================================
# STEP-2: [PATTERN-2]
# Offline Adaptation + [Generation + Reflector (analyzes) → Curator (updates playbook)]
# ===================================================================

# Reuse the playbook learned during offline adaptation
playbook = adapter.playbook

def self_improving(test_sample: Sample):
    # Generate response
    output = generator.generate(
        question=test_sample.question,
        context=test_sample.context,
        playbook=playbook
    )
    # Reflect on the response
    reflection = reflector.reflect(
        question=test_sample.question,
        generator_output=output,
        playbook=playbook,
        ground_truth=None,  # **No ground truth provided**
        feedback=None  # **No feedback provided**
    )
    # Update the playbook
    curator_output = curator.curate(
        reflection=reflection,
        playbook=playbook,
        question_context="",
        progress="self-learning",
    )
    playbook.apply_delta(curator_output.delta)

    return output

# Generate an answer using the trained playbook
# and then analyze it + update the playbook
output = self_improving(test_sample)


# *********************************************************
# Print the answer and reasoning
print("Answer:", output.final_answer)
print("Reasoning:", output.reasoning)

Later, @Lanzelot1 suggested using OnlineAdapter directly, without needing a custom self_improving function.

This is exactly what OnlineAdapter does internally! You don't need to write this code - just use:

# OnlineAdapter handles the ENTIRE loop automatically
adapter = OnlineAdapter(
    playbook=Playbook(),  # Can start empty or pre-trained
    generator=Generator(llm),
    reflector=Reflector(llm),
    curator=Curator(llm)
)

# Each call automatically does: Generate → Reflect → Curate → Update Playbook
results = adapter.run(samples, environment)

When I tried to use OnlineAdapter to replace the self_improving function, I found that it depends on ground_truth and environment feedback.

So, I followed up with @Lanzelot1 and asked a question similar to yours.
