# History and data analysis collaboration system

### A multi-agent approach for complex historical questions

Understanding historical events often requires both qualitative reasoning (context, causality, trends) and quantitative insights (data, statistics, measurements). Historians interpret narratives and contexts, while data analysts look for patterns and relationships within numbers.

This notebook presents a multi-agent AI system where two AI agents—one skilled in history research and the other in data analysis—work together to tackle a complex historical question.

The goal is to simulate how experts from different domains collaborate to arrive at a comprehensive, data-backed answer. Each agent contributes according to its domain of expertise, coordinated through a controlled task flow.

In [1]:
import os
import time

from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage
from typing import List, Dict
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

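# Make sure the OpenAI API key is available (assigning None to os.environ raises TypeError)
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your environment or .env file")
os.environ["OPENAI_API_KEY"] = api_key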

### Initialize the language model
The language model serves as the core intelligence of our multi-agent system.

In [2]:
# Initialize the language model that will power our agents
llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18", max_tokens=1000, temperature=0.7)

We configure the `gpt-4o-mini` model with two parameters: a cap of 1,000 tokens per response (`max_tokens=1000`), which leaves room for substantial answers, and a temperature of 0.7 to balance creativity with consistency.
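
To make the temperature setting more concrete, the sketch below shows the general technique of temperature-scaled softmax sampling on a toy set of scores. It illustrates the idea only and is not a description of how OpenAI's service is implemented; `softmax_with_temperature` and `toy_logits` are hypothetical names introduced here.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities after dividing each score by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens
toy_logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the highest-scoring option, while higher temperatures flatten the distribution. A value such as 0.7 sits between those extremes.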

### Define the base Agent class
At the heart of our collaborative system is a general-purpose `Agent` class that can be specialized for different domains. Each agent is initialized with a `name`, `role`, and list of `skills`, which define its persona and capabilities. This design makes it easy to create specialized agents with distinct personalities, expertise areas, and communication styles while maintaining a consistent interface for collaboration.

The key capability of an agent is its ability to "process" a task: it interprets the task through its defined persona and, optionally, in the context of a prior multi-turn dialogue. Replaying that dialogue lets later responses build on earlier ones, mimicking the way domain experts build on earlier discussions.

In [3]:
# Initialize an AI agent with specific identity and capabilities
class Agent:
    def __init__(self, name: str, role: str, skills: List[str]):
        self.name = name  # Agent's name (used for identification)
        self.role = role  # Agent's role (defines their area of expertise)
        self.skills = skills  # List of capabilities or areas the agent is proficient in
        self.llm = llm  # Reference to the language model used for generating responses

    # Process a given task within the context of previous conversation
    def process(self, task: str, context: List[Dict] = None) -> str:
        # Start the message history with a system message describing the agent's persona and competencies
        messages = [
            SystemMessage(content=f"You are {self.name}, a {self.role}. Your skills include: {', '.join(self.skills)}. Respond to the task based on your role and skills.")
        ]

        # Add previous conversation context if available
        if context:
            for msg in context:
                if msg['role'] == 'human':
                    messages.append(HumanMessage(content=msg['content']))
                elif msg['role'] == 'ai':
                    messages.append(AIMessage(content=msg['content']))

        # Add the current task as a human message
        messages.append(HumanMessage(content=task))

        # Get response from the language model
        response = self.llm.invoke(messages)
        # Return the text content of the model's response
        return response.content

Here, we construct a reusable `Agent` class that standardizes how domain-specific agents interact with the language model. The class encapsulates identity information (like name, role, and skills) and uses this to frame its responses.
* **Initialization**: When an `Agent` is created, it stores its name, role, and list of domain-specific skills, and it keeps a reference to the shared language model (in this case, OpenAI’s GPT-4o-mini) that powers its responses.
* **Task processing**: When the agent is asked to `process` a task:
  * It starts with a system message: an instruction that tells the model which kind of expert to act as. Placing the `SystemMessage` first anchors the model in the agent’s persona, so that responses reflect that agent’s domain knowledge and communication style.
  * If there is any previous conversation, it is replayed so the model sees the full context. This makes responses more coherent, which matters most in multi-step workflows. Context entries whose role is neither `human` nor `ai` are skipped.
  * The new task is added at the end as a user question.
  * This whole conversation (system prompt, previous context, and the current task) gets sent to the model, which generates a response from the agent’s point of view.
* **Final output**: The model’s answer is returned as plain text, ready for the rest of the system to use.
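
The message assembly described above can be sketched without any model call. The version below is a simplified illustration that uses plain `(role, content)` tuples in place of LangChain's message classes; `build_messages` is a hypothetical helper, not part of the notebook.

```python
from typing import Dict, List, Optional, Tuple


def build_messages(
    name: str,
    role: str,
    skills: List[str],
    task: str,
    context: Optional[List[Dict[str, str]]] = None,
) -> List[Tuple[str, str]]:
    """Assemble the conversation in the same order Agent.process does."""
    # 1. The persona instruction always comes first
    messages = [
        ("system", f"You are {name}, a {role}. Your skills include: {', '.join(skills)}.")
    ]
    # 2. Replay earlier turns; entries with other roles are skipped, as in the class
    for msg in context or []:
        if msg["role"] in ("human", "ai"):
            messages.append((msg["role"], msg["content"]))
    # 3. The current task goes last, as a human message
    messages.append(("human", task))
    return messages


example = build_messages(
    name="Clio",
    role="History Research Specialist",
    skills=["deep knowledge of historical events"],
    task="Summarize the causes of urban growth.",
    context=[{"role": "ai", "content": "History Agent: earlier notes"}],
)
for speaker, text in example:
    print(f"{speaker}: {text}")
```

The resulting list always has the system message at index 0 and the current task at the end, with any replayed context in between.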


### Define specialized agents
With our base architecture established, we can now create specialized agents for our two primary domains: historical research and data analysis. Each agent inherits the core functionality while defining their specific expertise areas.

In [4]:
# Initialize a specialized agent for historical research
class HistoryResearchAgent(Agent):
    def __init__(self):
        super().__init__(
            name="Clio",
            role="History Research Specialist",
            skills=[
                "deep knowledge of historical events",
                "understanding of historical contexts",
                "identifying historical trends"
            ]
        )

# Initialize a specialized agent for data analysis
class DataAnalysisAgent(Agent):
    def __init__(self):
        super().__init__(
            name="Data",
            role="Data Analysis Expert",
            skills=[
                "interpreting numerical data",
                "statistical analysis",
                "data visualization description"
            ]
        )

These classes inherit from the base `Agent` and predefine specific expertise. `Clio` specializes in historical context; `Data` excels at quantitative reasoning.

The specialized agents demonstrate role-based AI design. By giving each agent a distinct name and a curated skill set, we create AI personas that approach problems from different angles. The specialization takes effect through the system prompt: the name, role, and skills are written into the persona instruction, which steers how the model frames each response and keeps each agent's contributions focused on its own domain.
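
The same subclassing pattern extends to further specialists. The sketch below uses a stand-in base class that only builds the persona text and makes no model call; `PersonaAgent`, `EconomicHistoryAgent`, and the name "Ceres" are hypothetical and are not part of the notebook's workflow.

```python
from typing import List


class PersonaAgent:
    """Hypothetical stand-in for Agent: stores identity and builds the persona text only."""

    def __init__(self, name: str, role: str, skills: List[str]):
        self.name = name
        self.role = role
        self.skills = skills

    def system_prompt(self) -> str:
        # Same wording as the system message in Agent.process
        return (
            f"You are {self.name}, a {self.role}. "
            f"Your skills include: {', '.join(self.skills)}. "
            "Respond to the task based on your role and skills."
        )


class EconomicHistoryAgent(PersonaAgent):
    """Hypothetical third specialist, defined the same way as Clio and Data."""

    def __init__(self):
        super().__init__(
            name="Ceres",
            role="Economic History Specialist",
            skills=["interpreting trade records", "comparing wage and price series"],
        )


print(EconomicHistoryAgent().system_prompt())
```

Only the constructor arguments change between specialists, so the persona text is the single place where their behavior diverges.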

### Define the different functions for the collaboration system
The heart of our system lies in the structured workflow that guides agent collaboration. Each function represents a specific phase in the research process, with agents taking turns to contribute their expertise while building upon previous insights.

#### Research historical context
The history agent provides background information for the task.

In [5]:
# Establish historical background and context
def research_historical_context(history_agent, task: str, context: list) -> list:
    print("🏛️ History Agent: Researching historical context...")

    # Formulate a task focused on historical context
    history_task = f"Provide relevant historical context and information for the following task: {task}"

    # Get historical context from the history agent
    history_result = history_agent.process(history_task)

    # Add the result to the ongoing conversation context
    context.append({"role": "ai", "content": f"History Agent: {history_result}"})

    # Provide user feedback on progress
    print(f"📜 Historical context provided: {history_result[:100]}...\n")
    return context

This function creates a research-oriented prompt and captures the historical summary from the agent, appending it to the ongoing context.

#### Identify data needs
Now, the data agent interprets the historical background to determine what data is required.

In [6]:
# Determine what quantitative data would be most valuable
def identify_data_needs(data_agent, task: str, context: list) -> list:
    print("📊 Data Agent: Identifying data needs based on historical context...")

    # Extract the most recent historical context
    historical_context = context[-1]["content"]

    # Task the data agent with identifying relevant data needs
    # (the original question is restated because it is not stored in the context)
    data_need_task = f"Based on the historical context, what specific data or statistical information would be helpful to answer the original question? Original question: {task} Historical context: {historical_context}"

    # Get data requirements from the data agent
    data_need_result = data_agent.process(data_need_task, context)

    # Add to conversation context
    context.append({"role": "ai", "content": f"Data Agent: {data_need_result}"})

    # Provide user feedback on progress
    print(f"🔍 Data needs identified: {data_need_result[:100]}...\n")
    return context

This stage generates a data inquiry based on the historical input, determining relevant metrics (e.g., urbanization rates, population stats).

#### Provide historical data
The history agent now responds with data or numeric context based on the earlier data needs.

In [7]:
# Supply relevant historical data based on identified needs
def provide_historical_data(history_agent, task: str, context: list) -> list:
    print("🏛️ History Agent: Providing relevant historical data...")

    # Extract the data needs from the previous step
    data_needs = context[-1]["content"]

    # Task the history agent with providing specific data
    data_provision_task = f"Based on the data needs identified, provide relevant historical data or statistics. Data needs: {data_needs}"

    # Get historical data from the history agent
    data_provision_result = history_agent.process(data_provision_task, context)

    # Add to conversation context
    context.append({"role": "ai", "content": f"History Agent: {data_provision_result}"})

    # Provide user feedback on progress
    print(f"📊 Historical data provided: {data_provision_result[:100]}...\n")
    return context

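This augments the analysis with data aligned with the identified needs. The figures come from the language model's own knowledge rather than from a dataset lookup, so they may be approximations and should be verified before being relied on.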

#### Analyze data
The data agent interprets the provided historical data, identifying patterns or correlations.

In [8]:
# Analyze the provided historical data for patterns and insights
def analyze_data(data_agent, task: str, context: list) -> list:
    print("📈 Data Agent: Analyzing historical data...")

    # Extract the historical data from the previous step
    historical_data = context[-1]["content"]

    # Task the data agent with analyzing the provided data
    # (the original task is restated because it is not stored in the context)
    analysis_task = f"Analyze the historical data provided and describe any trends or insights relevant to the original task. Original task: {task} Historical data: {historical_data}"

    # Get analysis from the data agent
    analysis_result = data_agent.process(analysis_task, context)

    # Add to conversation context
    context.append({"role": "ai", "content": f"Data Agent: {analysis_result}"})

    # Provide user feedback on progress
    print(f"💡 Data analysis results: {analysis_result[:100]}...\n")
    return context

The data agent provides reasoning or statistical interpretation to support conclusions.

#### Synthesize final answer
Finally, the history agent synthesizes everything into a comprehensive answer.

In [9]:
# Combine all insights into a comprehensive final answer
def synthesize_final_answer(history_agent, task: str, context: list) -> str:
    print("🏛️ History Agent: Synthesizing final answer...")

    # Task the history agent with creating a comprehensive synthesis
    # (the original task is restated because it is not stored in the context)
    synthesis_task = f"Based on all the historical context, data, and analysis, provide a comprehensive answer to the original task. Original task: {task}"

    # Get final synthesis from the history agent
    final_result = history_agent.process(synthesis_task, context)
    return final_result

This aggregates all findings into a final, holistic response from the historical perspective.

This five-phase workflow creates a natural progression from broad context to specific analysis. Each function serves a distinct purpose in the research process, and the sequential design ensures that each agent has the information they need to contribute effectively. The context parameter acts as a shared memory, allowing agents to build upon each other's work rather than operating in isolation. The alternating pattern between history and data agents creates a dialogue that mirrors real-world interdisciplinary collaboration, where different experts contribute their unique perspectives to gradually build understanding.
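
The way the shared context grows can be observed without calling a model. The sketch below replaces the real agents with a stub that records how many context entries it was shown at each call; `StubAgent` and `run_step` are hypothetical names, and `run_step` collapses the first four step functions into one generic helper for brevity.

```python
from typing import Dict, List, Optional


class StubAgent:
    """Hypothetical stand-in for Agent: no model call, records what it was shown."""

    def __init__(self, label: str):
        self.label = label
        self.seen_context_sizes: List[int] = []

    def process(self, task: str, context: Optional[List[Dict]] = None) -> str:
        # Record how many earlier entries were visible when this call was made
        self.seen_context_sizes.append(len(context or []))
        return f"reply #{len(self.seen_context_sizes)} to '{task}'"


def run_step(agent: StubAgent, prompt: str, context: List[Dict]) -> List[Dict]:
    """Generic version of a step: ask the agent, then append its answer to the context."""
    result = agent.process(prompt, context)
    context.append({"role": "ai", "content": f"{agent.label}: {result}"})
    return context


history = StubAgent("History Agent")
data = StubAgent("Data Agent")

context: List[Dict] = []
plan = [
    (history, "research context"),
    (data, "identify data needs"),
    (history, "provide data"),
    (data, "analyze data"),
]
for agent, prompt in plan:
    context = run_step(agent, prompt, context)

print(len(context))                # entries accumulated after four steps
print(history.seen_context_sizes)  # context sizes the history agent saw
print(data.seen_context_sizes)     # context sizes the data agent saw
```

Each agent sees every entry added before its turn, which is what allows the alternating steps to build on one another.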

### Orchestration system class
The collaboration system brings together all components into a cohesive workflow that manages the entire research process from initial question to final synthesis.

The key idea here is that each step builds upon the results of the previous one. This design follows a pipeline pattern: the system maintains a shared `context` that is passed through each step and grows as the agents contribute. The class also includes basic safeguards: a timeout check before each step, which stops the workflow once its time budget is spent, and error handling that turns an exception into an error message.

In [10]:
class HistoryDataCollaborationSystem:
    # Initialize the collaboration system with specialized agents - creates instances of both history and data analysis agents
    def __init__(self):
        # Create specialized agents
        self.history_agent = HistoryResearchAgent()
        self.data_agent = DataAnalysisAgent()

    # Execute the complete collaboration workflow to solve a complex question
    def solve(self, task: str, timeout: int = 300) -> str:
        print(f"\n👥 Starting collaboration to solve: {task}\n")

        start_time = time.time()  # Record start time to track timeout
        # Initialize context tracking
        context = []

        # Define the workflow steps with their corresponding agents
        steps = [
            (research_historical_context, self.history_agent),
            (identify_data_needs, self.data_agent),
            (provide_historical_data, self.history_agent),
            (analyze_data, self.data_agent),
            (synthesize_final_answer, self.history_agent)
        ]

        # Execute each step in sequence
        for step_func, agent in steps:
            # Check the elapsed time before starting each step
            if time.time() - start_time > timeout:
                return "Operation timed out. The process took too long to complete."
            try:
                # Execute the current step
                result = step_func(agent, task, context)
                # The final step returns a string instead of the context list
                if isinstance(result, str):
                    print("\n✅ Collaboration complete. Final answer synthesized.\n")
                    return result  # This is the final answer
                # Update context for next step
                context = result

            except Exception as e:
                return f"Error during collaboration: {str(e)}"

        # Fallback, reached only if no step returned a final answer string
        return context[-1]["content"] if context else ""

The orchestration system implements a pipeline pattern where each step depends on the output of the previous step. This class is the engine that drives the collaborative process. At a high level, it performs the following:
- It starts by initializing two intelligent agents: one focused on historical research and the other on data analysis. These agents are built from the generic `Agent` class but are configured with their own names, roles, and skill sets.
- The `solve` method is where the orchestration happens. It accepts a `task` — usually a complex historical question — and then runs through a series of predefined steps, each handled by the appropriate expert.
- The `steps` list defines the workflow.
- At every step, the shared `context` is updated with the latest output. Think of this like a shared notebook between the agents — each one reads what's already there, adds their own input, and passes it on.
- The return value handling distinguishes between intermediate steps (which return updated context) and the final step (which returns the synthesized answer).
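- There’s a timeout guard to prevent the system from running too long, which is important when dealing with multiple steps and external model calls. The guard is checked before each step starts, so it stops the workflow between steps but does not interrupt a model call that is already in progress.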
- Lastly, there is error handling, so if any one step fails unexpectedly, the system doesn’t crash — it just returns a helpful error message instead.
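
The control flow described in this list can be exercised in isolation with stub steps. The sketch below is a simplified illustration of the pipeline pattern, not the class itself: `run_pipeline`, `gather`, `summarize`, and `broken` are hypothetical names, the context is a plain list of strings, and `time.monotonic` is used as the clock instead of `time.time`.

```python
import time
from typing import Callable, List, Union

Step = Callable[[str, List[str]], Union[List[str], str]]


def run_pipeline(steps: List[Step], task: str, timeout: float = 300.0) -> str:
    """Run steps in order; a step returning a string ends the pipeline with that answer."""
    start = time.monotonic()
    context: List[str] = []
    for step in steps:
        # The time budget is checked between steps, not during a step
        if time.monotonic() - start > timeout:
            return "Operation timed out. The process took too long to complete."
        try:
            result = step(task, context)
        except Exception as exc:
            # A failing step becomes an error message instead of a crash
            return f"Error during collaboration: {exc}"
        if isinstance(result, str):
            return result  # final answer
        context = result  # intermediate step: carry the updated context forward
    return context[-1] if context else ""


def gather(task: str, context: List[str]) -> List[str]:
    """Intermediate stub step: returns the context with one more entry."""
    return context + [f"notes on {task}"]


def summarize(task: str, context: List[str]) -> str:
    """Final stub step: returns a string, which ends the pipeline."""
    return f"answer built from {len(context)} note(s)"


def broken(task: str, context: List[str]) -> List[str]:
    """Stub step that always fails, to exercise the error path."""
    raise ValueError("no data available")


print(run_pipeline([gather, summarize], "urbanization"))
print(run_pipeline([gather, broken, summarize], "urbanization"))
print(run_pipeline([gather, summarize], "urbanization", timeout=-1.0))
```

Passing a negative `timeout` forces the time check to fail before the first step, which makes the timeout path easy to try without waiting.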


### Example usage
Now we can see our collaboration system in action with a complex historical question that requires both contextual understanding and quantitative analysis.

In [11]:
# Create an instance of the collaboration system
collaboration_system = HistoryDataCollaborationSystem()

# Define a complex historical question that requires both historical knowledge and data analysis
question = "How did urbanization rates in Europe compare to those in North America during the Industrial Revolution, and what were the main factors influencing these trends?"

# Solve the question using the collaboration system
result = collaboration_system.solve(question)

# Print the result
print(result)


👥 Starting collaboration to solve: How did urbanization rates in Europe compare to those in North America during the Industrial Revolution, and what were the main factors influencing these trends?

🏛️ History Agent: Researching historical context...
📜 Historical context provided: During the Industrial Revolution, which began in the late 18th century and extended into the 19th ce...

📊 Data Agent: Identifying data needs based on historical context...
🔍 Data needs identified: To provide a comprehensive analysis of urbanization trends during the Industrial Revolution in Europ...

🏛️ History Agent: Providing relevant historical data...
📊 Historical data provided: Here’s a compilation of relevant historical data and statistics that relate to the urbanization tren...

📈 Data Agent: Analyzing historical data...
💡 Data analysis results: Data Agent: Analyzing the historical data provided reveals several significant trends and insights r...

🏛️ History Agent: Synthesizing final answer...
Certai

This example showcases the system's ability to handle multifaceted historical questions that require both broad contextual knowledge and specific data analysis. The question about urbanization during the Industrial Revolution is ideal because it demands understanding of historical processes, comparative analysis between regions, and interpretation of demographic data. The system's output will demonstrate how the collaborative approach can produce more comprehensive answers than either agent could provide independently.