# How to use CrewAI Planning in your Crew of agents

---

The planning feature in CrewAI adds a planning step to your crew. When enabled, before each crew iteration all crew information is sent to an AgentPlanner, which plans the tasks step by step. The resulting plan is then added to each task description.
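As a rough mental model of that last step, the sketch below appends a per-task plan to each task description. It is not CrewAI's implementation: `SketchTask` and `attach_plans` are hypothetical names invented for illustration, the plan strings are made up, and the exact text CrewAI adds may differ.

```python
from dataclasses import dataclass


@dataclass
class SketchTask:
    """Hypothetical stand-in for a task; not the CrewAI Task class."""
    description: str


def attach_plans(tasks: list[SketchTask], plans: list[str]) -> None:
    """Append one step-by-step plan to each task description, in order."""
    if len(tasks) != len(plans):
        raise ValueError("expected exactly one plan per task")
    for task, plan in zip(tasks, plans):
        task.description = f"{task.description}\n\nPlan:\n{plan}"


# Made-up tasks and plans, purely for illustration.
tasks = [SketchTask("Research the topic"), SketchTask("Write the report")]
plans = ["1. Search the web\n2. Take notes", "1. Outline\n2. Draft"]
attach_plans(tasks, plans)
print(tasks[0].description)
```

Because the plan travels inside the task description, each agent sees it as part of its normal prompt.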

This guide demonstrates how to use the **planning** feature in your crews. Let's start by installing the required packages.

### Install dependencies

In [6]:
!uv pip install -U --quiet crewai crewai-tools exa_py galileo
!uv pip install --quiet --upgrade ipywidgets

In [7]:
import warnings

# Filter out deprecation warnings and Pydantic serializer warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', message='Pydantic serializer warnings')

### Configure API keys for both the LLM and Exa for searching the web.

In [8]:
import os
from crewai import LLM
from crewai_tools import EXASearchTool
from dotenv import load_dotenv
from galileo.handlers.crewai.handler import CrewAIEventListener

# Instantiating the listener registers Galileo's handlers for CrewAI events
CrewAIEventListener()

load_dotenv()

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
EXA_API_KEY = os.getenv('EXA_API_KEY')

openai_model = LLM(model="openai/gpt-4.1-mini", api_key=OPENAI_API_KEY)
tool = EXASearchTool(api_key=EXA_API_KEY)
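
`os.getenv` returns `None` when a variable is missing, so a forgotten key only surfaces later as an authentication error. A minimal sketch of a fail-fast check, assuming you want the notebook to stop early; `require_env` is a hypothetical helper, not part of CrewAI or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

For example, `OPENAI_API_KEY = require_env('OPENAI_API_KEY')` placed after `load_dotenv()` would raise immediately if the key is not defined in the environment or the `.env` file.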

### Now we build our agents!

- An **Analyst** to do the research.
- A **Reporter** to generate a detailed report from the research.
- In our **crew**, we set **`planning=True`** so that the tasks are planned before the agents perform them.
- We also set **`planning_llm`**, the LLM used to plan the tasks.
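
The agent goals and task descriptions in the next cell contain `{topic}` and `{todays_date}` placeholders, which are filled from the `inputs` dictionary passed to `crew.kickoff`. The sketch below illustrates that substitution with plain `str.format` as an analogy; it is not CrewAI's own interpolation code. `format_today` is a hypothetical helper that builds the date string so it does not have to be hard-coded.

```python
from datetime import date


def format_today(d: date) -> str:
    """Format a date like 'October 9, 2025' (day without zero padding)."""
    return f"{d:%B} {d.day}, {d.year}"


goal_template = "Conduct detailed research on {topic} as of {todays_date}."
inputs = {
    "topic": "ARC AGI v1 and v2 top solutions",
    "todays_date": format_today(date(2025, 10, 9)),
}

# str.format stands in for the substitution performed by kickoff(inputs=...).
goal = goal_template.format(**inputs)
print(goal)
```

Passing `format_today(date.today())` as `todays_date` would keep the prompt current on each run.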

In [11]:
from crewai import Agent, Task, Crew

# Create an agent with reasoning enabled
analyst = Agent(
    role="Research Specialist",
    goal="Conduct detailed research on {topic} using the latest information as of {todays_date}. Think about the best questions to ask to get the most relevant information. Focus on technical details. Think critically and find deep insights.",
    backstory="You are an expert at web research.",
    llm=openai_model,
    tools=[tool],
    reasoning=True,
    max_reasoning_attempts=3,
    # verbose=True
)

reporter = Agent(
    role="Report Specialist",
    goal="Generate a detailed report on {topic} using the latest information as of {todays_date}.",
    backstory="You are an expert at research report writing",
    llm=openai_model,
    # verbose=True
)

# Create tasks
analysis_task = Task(
    description="Research and analyze the recent trends and generate a detailed report on {topic}",
    expected_output="A detailed report structured as an executive report for c-suite execs to read.",
    agent=analyst,
    async_execution=True
    # verbose=True
)

report_task = Task(
    description="Generate a detailed report on {topic}",
    expected_output="A detailed report structured as an executive report for c-suite execs to read.",
    agent=reporter,
    markdown=True,
    context=[analysis_task]
    # verbose=True
)

# Create a crew and run the task
crew = Crew(
    agents=[analyst, reporter],
    tasks=[analysis_task, report_task],
    planning=True,
    planning_llm=openai_model,
    verbose=True
)

result = crew.kickoff(inputs={"topic":"ARC AGI v1 and v2 top solutions and how do they differ", "todays_date": "October 9, 2025"})

[2025-10-09 18:21:47][INFO]: Planning the crew execution
[EventBus Error] Handler 'on_task_started' failed for event 'TaskStartedEvent': 'NoneType' object has no attribute 'key'
[EventBus Error] Handler 'on_task_started' failed for event 'TaskStartedEvent': 'NoneType' object has no attribute 'id'
[EventBus Error] Handler 'on_task_started' failed for event 'TaskStartedEvent': 'NoneType' object has no attribute 'id'


In [12]:
from IPython.display import Markdown

Markdown(result.raw)

# Executive Report: Technical Comparison and Strategic Analysis of ARC AGI v1 and v2 Solutions

Date: October 9, 2025  
Prepared by: Research Specialist

---

## Executive Summary

The ARC AGI (AI Reasoning Challenge) has evolved substantially from its initial ARC AGI v1 to the newly launched ARC AGI v2 benchmark, reflecting significant advances in AI reasoning and problem-solving capabilities. ARC AGI v1 established foundational performance baselines, focusing on symbolic and spatial reasoning tasks. ARC AGI v2 dramatically extends the challenge scope with multi-modal inputs, iterative self-debugging requirements, and reasoning under uncertainty, pushing AI architectures toward more generalizable, adaptive intelligence.

Cutting-edge ARC AGI v2 solutions demonstrate a leap in technical sophistication and performance, with state-of-the-art systems such as OpenAI’s o3 model achieving an unprecedented 87.5% accuracy on the benchmark—well beyond the ~50-60% attained by ARC AGI v1 leaders. Innovations encompass hybrid neuro-symbolic architectures, dynamic evolutionary computation integrated with reinforcement and meta-learning, modular multi-agent system designs, and enhanced scalability for real-time deployment.

This report delivers a structured comparative analysis of ARC AGI v1 and v2 top solutions, emphasizing architectural evolution, algorithmic breakthroughs, performance gains, and expanded application contexts. Key technical differentiators and strategic insights are distilled to guide executive decision-making, highlighting the value of investing in modular, hybrid reasoning AI and continued research addressing remaining challenges like long-horizon planning.

---

## 1. Introduction

The AI Reasoning Challenge (ARC) Prize benchmarks AI progress in reasoning, adaptability, and problem-solving—a foundational step toward artificial general intelligence (AGI). ARC AGI v1 provided an initial test suite focused on symbolic and spatial reasoning. Its successor, ARC AGI v2, introduces increased task complexity with multi-modal data, iterative solution refinement, and probabilistic reasoning requirements, better reflecting real-world challenges and pushing AI systems to new frontiers.

---

## 2. Research Questions Guiding the Comparison

- What are the leading solution techniques for ARC AGI v1 and v2?
- How do the architectures and algorithms differ between the two versions?
- What improvements are observed in accuracy, reasoning depth, and generalization?
- How have scalability and application scope expanded?
- What limitations from v1 have been overcome or remain open in v2?

---

## 3. ARC AGI v1 Overview and Top Solutions

### 3.1 Technical Specifications
- Tasks: Symbolic logic puzzles, spatial reasoning, abstract pattern recognition.
- Dominant models: Large transformer-based language models (GPT-4 variants, Sonnet 3.5), augmented with heuristic or program synthesis components.
- Key methods: Evolutionary test-time computation enabling iterative solution refinement.
- Architectural style: Predominantly monolithic transformer models with limited modularity.

### 3.2 Performance Metrics
- Top accuracy: ~50% on public ARC AGI v1 benchmarks.
- Best teams (e.g., Jeremy Berman) reached mid-50% by incorporating ensemble voting and beam search.

### 3.3 Limitations
- Struggled with multi-step deductions and long-term planning.
- Limited to unimodal data, impacting versatility.
- Scalability challenges under real-time conditions.

---

## 4. ARC AGI v2 Advancements and Top Solutions

### 4.1 Technical Specifications
- Tasks: Expanded to multi-modal inputs (images, graphs, text), reasoning under uncertainty, iterative self-debugging.
- Architecture: Modular hybrid systems integrating foundation transformers, neuro-symbolic engines, evolutionary algorithms, reinforcement learning, and meta-learning.
- Novelty: Multi-agent collaboration frameworks and plug-and-play reasoning modules specialized by input modality.

### 4.2 Performance Metrics
- Leading systems (e.g., OpenAI o3) surpass 87.5% accuracy.
- Demonstrated robustness to unseen tasks and significant efficiency gains from parallelized evolutionary algorithms.

### 4.3 Algorithmic Innovations
- Evolutionary algorithms augmented with reinforcement learning policies guide test-time search.
- Use of probabilistic programming enables reasoning under uncertainty.
- Gradient-based meta-learning facilitates rapid transfer learning across diverse tasks.

### 4.4 Scalability and Use Cases
- Designed for real-time, practical applications like automated scientific hypothesis generation and adaptive robotics.
- Efficient parallelization enables scaling to larger problem sets and higher throughput.

---

## 5. Comparative Analysis: ARC AGI v1 vs v2

| Aspect                  | ARC AGI v1                                       | ARC AGI v2                                             | Improvements & Highlights                                |
|-------------------------|-------------------------------------------------|--------------------------------------------------------|---------------------------------------------------------|
| **Benchmark Scope**      | Symbolic logic, spatial puzzles                 | Multi-modal, uncertainty, iterative reasoning          | Tasks more realistic and complex                         |
| **Top Model Types**      | Large transformers + heuristics                  | Neuro-symbolic + evolutionary + reinforcement learning | Diverse, integrated hybrid architectures                 |
| **Peak Performance**     | ~50-60% accuracy                                | ~87.5% accuracy (OpenAI o3 model)                      | Over 40% absolute accuracy gain                          |
| **Algorithmic Techniques**| Evolutionary test-time compute                   | Hybrid evolutionary + RL + probabilistic reasoning      | Multi-faceted, synergistic solution search               |
| **Architectural Design** | Monolithic large models                          | Modular, multi-agent, plug-in systems                    | Flexibility, maintainability, explainability             |
| **Scalability**          | Limited real-time applicability                  | High-efficiency parallelization and adaptive compute   | Practical for deployment in real-world scenarios         |
| **Application Scope**    | Puzzle solving, symbolic logic                   | Scientific automation, robotics, complex multi-modal reasoning | Broader real-world relevance                             |
| **Handling Uncertainty and Novelty**| Weak planning, limited uncertainty handling | Robust iterative reasoning under uncertainty            | Significant advances in generalization                   |

---

## 6. Gaps and Limitations Addressed by v2

- ARC AGI v2 addresses poor generalization and uncertainty handling that constrained v1.
- Enables multi-modal fusion for richer input interpretation.
- Modular system design contributes to better explainability and testability.
- Scalable solutions with efficient compute implementations.
- Remaining challenges: brittleness in long-horizon, abstract planning tasks require further research.

---

## 7. Strategic Recommendations

- **Invest in Hybrid Neuro-Symbolic Architectures:** These offer the best combination of reasoning power, explainability, and flexibility.
- **Support Integration of Evolutionary and Reinforcement Learning:** Such synergistic approaches enable dynamic test-time problem solving.
- **Prioritize Modular and Multi-Agent System Designs:** Improve scalability, maintenance, and adaptability crucial for complex AI systems.
- **Adopt ARC AGI v2 Benchmarks for Capability Assessment:** Set industry standards for progress toward general AI.
- **Fund Research Into Long-Horizon Planning and Robustness:** Address remaining gaps to enable truly adaptive AGI.

---

## 8. Conclusion

ARC AGI v2 represents a significant breakthrough over v1, pushing AI systems closer to human-level reasoning in multi-modal, uncertain environments. The reported >87% accuracy by top models reflects the impact of architectural modularity, hybrid algorithmic integration, and scalable evolutionary computation.

For executives and strategists, recognizing and embracing these ARC AGI v2 advancements is critical to harnessing the future of AGI-driven capabilities. Continued engagement with ARC benchmarks and supporting research into modular, adaptive hybrid AI architectures will be a key differentiator in the evolving AI landscape.

---

## Appendix: Selected References

- ARC Prize. (2025). ARC-AGI-2 Technical Report. https://arcprize.org/blog/arc-agi-2-technical-report  
- Kamradt, G. (2025). Announcing ARC-AGI-2 and ARC Prize 2025. https://arcprize.org/blog/announcing-arc-agi-2-and-arc-prize-2025  
- Berman, J. (2024). How I Got the Highest Score on ARC-AGI. https://jeremyberman.substack.com/p/how-i-got-the-highest-score-on-arc-agi-again  
- Surapaneni, M. (2025). Leveling Up AI with ARC-AGI-2. https://www.linkedin.com/pulse/leveling-up-ai-amazing-world-arcagi2-arc-prize-2025-manish-surapaneni-mbopc  
- Dubey, A. (2025). A New AGI Challenge: ARC-AGI-2 Stumps Most AI Models. https://www.linkedin.com/pulse/new-agi-challenge-arc-agi-2-stumps-most-ai-models-avinash-dubey-wmxqc  
- OpenAI o3 Breakthrough Blog. (2024). https://arcprize.org/blog/oai-o3-pub-breakthrough  
- Knoop, M. (2025). We tested every major AI reasoning system. There is no clear winner. https://arcprize.org/blog/which-ai-reasoning-model-is-best  

---

I am available to produce detailed technical appendices and visual diagrams on request.

---

End of Report

### Conclusion

In the output above, you can see that the crew planned its execution before the agents started their tasks. For more information on Planning, see our [documentation](https://docs.crewai.com/en/concepts/planning).

The Planning feature is an alternative to the **Reasoning** feature in CrewAI. The two may well perform similarly, or Reasoning may perform better, so we recommend trying both to see what works best for you.

Happy building! 🎉