<a href="https://colab.research.google.com/github/mahalingamagesthian/learningai/blob/main/6_AgenticAIPythonInterpreter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## A Coding Implementation to Build an AI Agent with Live Python Execution and Automated Validation

This notebook is inspired by https://www.marktechpost.com/2025/05/25/a-coding-implementation-to-build-an-ai-agent-with-live-python-execution-and-automated-validation/

In [None]:
!pip install langchain langchain-anthropic langchain-core anthropic

In [2]:
import os
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import Tool
from langchain_core.prompts import PromptTemplate
from langchain_anthropic import ChatAnthropic
import sys
import io
import re
import json
from typing import Dict, Any, List

The `PythonREPLTool` class below is a compact example of a basic interactive Python environment: it runs code, captures output and errors, and keeps state between commands.

`class PythonREPLTool`: This line defines a new class named `PythonREPLTool`. In Python, classes are blueprints for creating objects, and an object is an instance of a class. This class simulates a Python "Read-Eval-Print Loop" (REPL), the interactive interpreter you get when you type `python` in your terminal, so you can execute Python code programmatically and capture its output.

`def __init__(self):`: This special method is the initializer (commonly called the constructor). Python calls it automatically whenever you create a new object (an "instance") of the `PythonREPLTool` class, and its job is to set up the object's attributes (variables unique to each object).
* `self.globals_dict = { ... }`: This line initializes an instance variable called `globals_dict`. This dictionary stores the "global" variables and modules that are available when this tool executes Python code.

  * `'__builtins__': __builtins__`: `__builtins__` gives access to Python's built-in functions (such as `print()`, `len()`, `sum()`, and `int()`), so code run by this tool can use them. Strictly speaking, `exec()` and `eval()` insert a reference to the builtins automatically when this key is missing, but setting it explicitly makes the intent clear.
  * `'json': json, 're': re`: These entries make the `json` module (for working with JSON data) and the `re` module (for regular expressions) available to the executed code. To expose other modules (e.g., `math`, `datetime`), add them here.
* `self.locals_dict = {}`: This line initializes another instance variable called `locals_dict`, which stores the "local" variables created while executing code. In a typical REPL, variables defined in one line are available in subsequent lines, and this dictionary keeps that state between calls. Because the two dictionaries are separate objects, `exec()` stores top-level assignments and `def` statements from the executed code in this dictionary, not in `globals_dict`.
* `self.execution_history = []`: This initializes an empty list that records every piece of code executed by this tool, along with its output, return value, and any errors, so you can review past interactions with the REPL.
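The split between a globals dictionary and a locals dictionary has a consequence that is easy to check directly. The standalone experiment below is not part of the original notebook (`glob`, `loc`, and `shared` are illustrative names): when `exec()` receives two separate dictionaries, top-level names land in the locals one, and a recursive function defined that way cannot find itself, because function bodies resolve names through their globals. Passing a single dictionary behaves like ordinary module-level code. This matters later, when `ResultValidator` looks variables up through `globals()`.

```python
# Standalone experiment: where do top-level names go when exec() gets two dicts?
glob = {"__builtins__": __builtins__}  # plays the role of globals_dict
loc = {}                               # plays the role of locals_dict

exec("x = 10\ndef double(n):\n    return n * 2", glob, loc)
print("x" in glob, "x" in loc)            # False True
print("double" in glob, "double" in loc)  # False True

# A function body resolves free names through its globals (glob), not loc,
# so a recursive function defined this way cannot find itself.
exec("def fact(n):\n    return 1 if n <= 1 else n * fact(n - 1)", glob, loc)
try:
    loc["fact"](3)
except NameError as err:
    print("NameError:", err)  # NameError: name 'fact' is not defined

# One shared dict behaves like module-level code, so recursion works.
shared = {"__builtins__": __builtins__}
exec("def fact(n):\n    return 1 if n <= 1 else n * fact(n - 1)", shared)
print(shared["fact"](5))  # 120
```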

`def run(self, code: str) -> str:`: This is the core method of the class. It takes a string, `code`, containing the Python code to execute, and is type-hinted to return a string.

* `try:`: This starts the outer `try`/`except` block. Code inside `try` is executed, and if an exception occurs, the matching `except` block runs instead. This outer block catches any exception raised while running the submitted code that the inner block does not handle, such as a `NameError` or `ZeroDivisionError`.
* `old_stdout = sys.stdout` and `old_stderr = sys.stderr`: These lines save the original standard output stream (`sys.stdout`, where `print()` normally writes) and standard error stream (`sys.stderr`, where error messages normally go), so they can be restored after redirection.
* `sys.stdout = captured_output = io.StringIO()` and `sys.stderr = captured_error = io.StringIO()`: These lines redirect `sys.stdout` and `sys.stderr` to `io.StringIO()` objects, in-memory text buffers that behave like files. Any `print()` calls or text written to standard error by the executed code now go into these buffers instead of the console, so the output can be captured programmatically.
* `execution_result = None`: Initializes a variable to store the return value of the executed code (if any).
* `try:` (inner `try`/`except`): This inner block performs the actual execution and distinguishes between expressions and statements.

  * `result = eval(code, self.globals_dict, self.locals_dict)`: This is the first attempt to execute the code.
    * `eval()`: This built-in function evaluates a string as a Python expression. An expression is something that produces a value (e.g., `"1 + 1"`, `"my_list[0]"`, `"my_function()"`, `"x > 5"`).
    * `self.globals_dict`: Provides the global namespace for the evaluation.
    * `self.locals_dict`: Provides the local namespace for the evaluation.
  * `execution_result = result`: Stores the result if `eval()` succeeded.
  * `if result is not None: print(result)`: If the evaluated expression returned a value other than `None`, it is printed explicitly. This mimics a REPL, which echoes the result of an expression.
  * `except SyntaxError:`: If `eval()` fails with a `SyntaxError` (meaning the input is not a valid expression but might be one or more valid statements), this block runs.
    * `exec(code, self.globals_dict, self.locals_dict)`: This built-in function executes a string as one or more Python statements. A statement doesn't necessarily produce a value (e.g., `x = 10`, `if x > 5: print("Hello")`, `for i in range(5): pass`). Together, the two calls let the tool handle both expressions and statements.
* `output = captured_output.getvalue()`: Retrieves everything printed to `sys.stdout` (the `StringIO` buffer) during execution.
* `error_output = captured_error.getvalue()`: Retrieves everything written to `sys.stderr` during execution, such as warnings or text printed with `file=sys.stderr`. Exceptions are handled by the outer `except` block instead, so their tracebacks do not appear here.
* `sys.stdout = old_stdout` and `sys.stderr = old_stderr`: These lines restore the original streams so that later `print()` calls and errors from outside this `run` method reach the real console.
* `self.execution_history.append({ ... })`: Appends a dictionary describing the just-completed execution to `execution_history`: the original code, the captured output, the result (if any), and any `error_output`.
* `response = f"..."`: Builds a formatted string that summarizes the execution, using f-strings to embed the values.
  * It always includes the executed code.
  * If `error_output` is non-empty, it is included.
  * It includes the console output, or "No console output" if the output is empty after stripping whitespace.
  * `if execution_result is not None and not output.strip()`: Adds a "Return Value" line only when `eval()` produced a non-`None` value and there is no console output, to avoid reporting the same value twice. Because `run()` already prints every non-`None` result, in practice this branch only fires when the result's string form is blank, such as an empty string.
* `return response`: Returns the formatted execution summary.
* `except Exception as e:`: This is the `except` block for the outer `try`. It catches any exception that the inner block did not handle: runtime errors from `eval()` or `exec()` (e.g., `NameError`, `TypeError`, `ZeroDivisionError`), and also a `SyntaxError` raised by `exec()` when the code is not valid Python at all.
  * `sys.stdout = old_stdout` and `sys.stderr = old_stderr`: Again, it is critical to restore the original streams immediately, even when an error occurs.
  * `error_info = f"..."`: Builds a formatted error message.
  * `self.execution_history.append({ ... })`: Records the error details in the history.
  * `return error_info`: Returns the error summary string.
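The capture-and-fallback flow described above can be condensed into a short sketch. It is not the notebook's code: it swaps the manual `sys.stdout` reassignment for `contextlib.redirect_stdout` and `contextlib.redirect_stderr`, which restore the real streams even if the snippet raises, and it uses a single namespace dictionary. The function name `run_snippet` is hypothetical.

```python
import contextlib
import io
import traceback


def run_snippet(code: str, namespace: dict) -> dict:
    """Evaluate `code` as an expression if possible, else execute it as statements."""
    out, err = io.StringIO(), io.StringIO()
    result = None
    # Both context managers restore the original streams on exit, even after an exception.
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            try:
                # Expressions such as "1 + 1" produce a value.
                result = eval(code, namespace)
            except SyntaxError:
                # Statements such as "x = 10" are not expressions, so run them instead.
                exec(code, namespace)
        except Exception:
            # Runtime errors (NameError, ZeroDivisionError, ...) go to the captured stderr.
            traceback.print_exc()
    return {"output": out.getvalue(), "error": err.getvalue(), "result": result}


ns = {}
print(run_snippet("x = 6 * 7", ns))       # {'output': '', 'error': '', 'result': None}
print(run_snippet("x", ns)["result"])     # 42
print("ZeroDivisionError" in run_snippet("1 / 0", ns)["error"])  # True
```

Because the context managers handle restoration, there is no need to repeat `sys.stdout = old_stdout` in every branch, which is the bookkeeping the original `run()` does by hand.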

`def get_execution_history(self) -> List[Dict[str, Any]]:`: This method is a simple "getter." It returns the execution_history list, allowing users of this class to inspect all past code executions and their results.
  * `List[Dict[str, Any]]`: This is a type hint indicating that the method returns a list, where each element of the list is a dictionary, and within each dictionary, keys are strings and values can be of any type.

`def clear_history(self):`: This method resets `execution_history` to an empty list, which is useful when you want a fresh log for a new "session". It does not reset the namespaces, so variables and functions defined by earlier code remain available.

In [3]:
class PythonREPLTool:
    def __init__(self):
        self.globals_dict = {
            '__builtins__': __builtins__,
            'json': json,
            're': re
        }
        self.locals_dict = {}
        self.execution_history = []

    def run(self, code: str) -> str:
        try:
            old_stdout = sys.stdout
            old_stderr = sys.stderr
            sys.stdout = captured_output = io.StringIO()
            sys.stderr = captured_error = io.StringIO()

            execution_result = None

            try:
                result = eval(code, self.globals_dict, self.locals_dict)
                execution_result = result
                if result is not None:
                    print(result)
            except SyntaxError:
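                exec(code, self.globals_dict, self.locals_dict)

            # With two separate dicts, top-level assignments and defs land in
            # locals_dict; copy them into globals_dict so later runs (and the
            # globals() lookups in ResultValidator's generated code) can see them.
            self.globals_dict.update(self.locals_dict)
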
            output = captured_output.getvalue()
            error_output = captured_error.getvalue()

            sys.stdout = old_stdout
            sys.stderr = old_stderr

            self.execution_history.append({
                'code': code,
                'output': output,
                'result': execution_result,
                'error': error_output
            })

            response = f"**Code Executed:**\n```python\n{code}\n```\n\n"
            if error_output:
                response += f"**Errors/Warnings:**\n{error_output}\n\n"
            response += f"**Output:**\n{output if output.strip() else 'No console output'}"

            if execution_result is not None and not output.strip():
                response += f"\n**Return Value:** {execution_result}"

            return response

        except Exception as e:
            sys.stdout = old_stdout
            sys.stderr = old_stderr

            error_info = f"**Code Executed:**\n```python\n{code}\n```\n\n**Runtime Error:**\n{str(e)}\n**Error Type:** {type(e).__name__}"

            self.execution_history.append({
                'code': code,
                'output': '',
                'result': None,
                'error': str(e)
            })

            return error_info

    def get_execution_history(self) -> List[Dict[str, Any]]:
        return self.execution_history

    def clear_history(self):
        self.execution_history = []

Alright, let's dive into this `ResultValidator` class! This class is designed to work hand-in-hand with the `PythonREPLTool` you previously learned about. Its main purpose is to automate the validation of results generated by code executed within that REPL.

This is super useful in scenarios where an AI agent or a user generates code, and you want to programmatically check if the output or the defined variables meet certain criteria.

`class ResultValidator:`: This line defines a new class called `ResultValidator`. This class will contain methods to perform different types of validation on the results of Python code execution.

* `def __init__(self, python_repl: PythonREPLTool):`: This is the constructor for the `ResultValidator` class.
 * It takes one argument: `python_repl`.
 * `python_repl: PythonREPLTool` is a type hint indicating that the `python_repl` argument is expected to be an instance of the `PythonREPLTool` class. The validator relies on this object to run validation code and to access its execution history.
 * `self.python_repl = python_repl`: This line stores the passed `PythonREPLTool` instance as an attribute of the `ResultValidator` object. This makes the `PythonREPLTool` accessible within all methods of the `ResultValidator`.

* `def validate_mathematical_result(...) -> str:`: This method is designed to validate numerical results, typically extracted from the output of a previously run code snippet.
 * `description: str`: A string explaining what is being validated (e.g., "Check sum of numbers").
 * `expected_properties: Dict[str, Any]`: A dictionary containing the properties you expect the numbers to have. Examples include:
   * `'count': 5` (expecting 5 numbers)
   * `'max_value': 100` (expecting max value to be <= 100)
   * `'min_value': 10` (expecting min value to be >= 10)
   * `'sum_range': [50, 100]` (expecting the sum to be between 50 and 100, inclusive)
 * `-> str`: Type hint indicating it returns a string (the formatted output from the `PythonREPLTool.run` method).
 * `validation_code = f"""..."""`: This is the core of the method. It constructs a **multi-line f-string** that represents a complete Python script. This script will then be passed to and executed by the python_repl instance.
  * `# Validation for: {description}`: A comment that labels the validation.
  * `validation_results = {}`: An empty dictionary to store the results of the validation checks.
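  * `history = {self.python_repl.execution_history}`: **Crucially**, this line embeds the current execution history of the `PythonREPLTool` into the generated `validation_code` as a literal, which lets the validation script look at the previous code's output. The f-string inserts the history's `repr()`, so this works only while every recorded value has a `repr()` that is valid Python (strings, numbers, lists, dictionaries, `None`); a recorded result such as a function object, whose `repr()` looks like `<function ...>`, makes the generated script fail.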
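  * `if history: last_execution = history[-1]`: Gets the most recent execution record. Because each validation run is itself appended to the history, a second validation call in a row inspects the first call's output rather than the original computation.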
  * `print(f"Last execution output: {{last_execution['output']}}")`: Prints the captured output of the previous execution for context.
  * `import re` and `numbers = re.findall(r'\d+(?:\.\d+)?', last_execution['output'])`: This uses a regular expression to find every run of digits (with an optional decimal part) in `last_execution['output']`, effectively extracting the numbers that were printed. The pattern ignores minus signs, so `-3` is read as `3`, and it picks up every number in the text, including numbers inside labels.
  * `numbers = [float(n) for n in numbers]`: Converts the extracted number strings into floating-point numbers.
  * `for prop, expected_value in {expected_properties}.items():`: This loop iterates through the `expected_properties` dictionary provided by the user.
  * Conditional `if`/`elif` blocks: These check for specific property names (`'count'`, `'max_value'`, `'min_value'`, `'sum_range'`) and apply the corresponding validation logic:
   * `'count'`: Checks if the number of extracted numbers matches expected_value.
   * `'max_value'`: Checks if the maximum extracted number is less than or equal to expected_value.
   * `'min_value'`: Checks if the minimum extracted number is greater than or equal to expected_value.
   * `'sum_range'`: Checks if the sum of extracted numbers falls within the specified min_sum and max_sum range.
   * The result of each check is stored in `validation_results` (e.g., `'count_check': True`) and printed by the generated validation script.
 * `print("Validation Summary:")`: Prints a header followed by each key and value in `validation_results`.
 * `validation_results`: The last line of `validation_code` is a bare `validation_results`. In an interactive REPL this would echo the dictionary, but here it has no effect: the script contains statements, so `eval()` raises `SyntaxError` and `PythonREPLTool.run` falls back to `exec()`, which always returns `None` and discards the value of a trailing expression. The results reach the caller only through the printed summary in the returned string.
 * `return self.python_repl.run(validation_code)`: Finally, this line executes the constructed validation_code using the associated `PythonREPLTool` instance and returns its output.
* `def validate_data_analysis(...) -> str:`: This method is designed to validate the existence and properties of variables that might have been created during a data analysis task within the REPL.
 * `description: str`: A description of the data analysis being validated.
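 * `expected_structure: Dict[str, Any]`: A dictionary whose keys are variable names you expect to be present in the REPL's global scope and whose values are their expected types, given as the bare class name that `type(x).__name__` returns (e.g., `'int'`, `'list'`, `'DataFrame'`). A qualified name such as `'pandas.DataFrame'` would never match.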
  * `validation_code = f"""..."""`: Again, this constructs a Python script that will be executed by `python_repl`.
  * `required_vars = {list(expected_structure.keys())}`: Embeds the list of variable names to check, taken from `expected_structure`.
  * `if var_name in globals():`: This is key! `globals()` is a built-in Python function that returns a dictionary representing the current global symbol table. In the context of `PythonREPLTool`, `globals()` within the executed code will reflect the `self.globals_dict` passed to exec/eval. This allows the validation script to inspect variables defined by previous code runs.
  * `var_value = globals()[var_name]`: Retrieves the value of the variable.
  * `validation_results[f'{{var_name}}_type'] = type(var_value).__name__`: Records the actual type of the found variable.
  * Type-specific validations: Records the length of a list or tuple, the keys of a dictionary, or the value of an `int` or `float`.
  * `print(f"Found {{len(existing_vars)}}/{{len(required_vars)}} required variables")`: Prints how many of the expected variables were found.
  * `for var_name, expected_type in {expected_structure}.items(): ... validation_results[f'{{var_name}}_type_match'] = actual_type == expected_type`: This loop performs an additional check to see if the actual type of the found variables matches the expected_type provided in `expected_structure`.
  * `return self.python_repl.run(validation_code)`: Executes the validation script and returns its output.

* `def validate_algorithm_correctness(...) -> str:`: This method is designed to test Python functions (algorithms) that have been defined in previous code executions within the REPL. It runs a set of predefined test cases against these functions.
 * `description: str`: A description of the algorithm being validated.
 * `test_cases: List[Dict[str, Any]]`: A list of dictionaries, where each dictionary represents a single test case. Each test case dictionary should typically contain:
   * `'name'`: (Optional) A name for the test case.
   * `'input'`: The input value(s) to pass to the function.
   * `'expected'`: The expected output from the function.
   * `'function'`: The name of the function (as a string) to be tested.
 * `validation_code = f"""..."""`: This constructs the Python script for running the tests.
 * `test_results = []`: A list to store the outcome of each individual test.
 * `for i, test_case in enumerate(test_cases):`: Loops through each test case provided.
 * `test_name`, `input_val`, `expected`, `function_name`: Extract the details of the current test case; when `'name'` is missing, the test is labeled `Test 1`, `Test 2`, and so on.
 * `if function_name and function_name in globals():`: Checks if the function name exists in the REPL's global scope.
 * `func = globals()[function_name]`: Retrieves the function object.
 * `if callable(func):`: Ensures that what we got from globals() is actually a function (or any callable object).
 * `if isinstance(input_val, (list, tuple)): result = func(*input_val) else: result = func(input_val)`: This handles function arguments. If `input_val` is a list or tuple, the `*` (unpacking, or "splat") operator spreads it into separate positional arguments (e.g., an input of `[1, 2]` calls `func(1, 2)`); otherwise, `input_val` is passed as a single argument. To test a function that takes one list, wrap the list: `'input': [[3, 1, 2]]`.
 * `passed = result == expected`: Compares the actual result with the expected result.
 * `test_results.append({ ... })`: Stores the details of the test case, including whether it passed.
 * `status = "✓ PASS" if passed else "✗ FAIL"`: Generates a status message for printing.
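 * **Error Handling:** A `try`/`except` block records exceptions raised by the function call as failed tests. A missing or non-callable function only prints an error message; it is not added to `test_results`, so it does not count toward `total_tests`.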

**Summary**: After all tests, it calculates `passed_tests`, `total_tests`, and `success_rate`, and prints a summary.
 * `test_results`: This list of individual test outcomes is the script's last expression. Because the script runs through `exec()`, which always returns `None`, the list is not returned as a value; only the printed report reaches the caller.
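To make the argument convention concrete, here is a direct-Python sketch of the same test loop, run against an ordinary dictionary instead of generated code. `run_test_cases` and the sample functions are hypothetical. It follows the notebook's unpacking rule for list and tuple inputs but, unlike the generated script, records a missing function as a failed test so that it counts toward the total.

```python
from typing import Any, Dict, List


def run_test_cases(namespace: Dict[str, Any], test_cases: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Look up each function by name in `namespace` and compare its result to 'expected'."""
    results = []
    for i, case in enumerate(test_cases):
        name = case.get('name', f'Test {i + 1}')
        func = namespace.get(case.get('function'))
        if not callable(func):
            # Design choice (differs from the notebook): count missing functions as failures.
            results.append({'test_name': name, 'passed': False, 'error': 'function not found'})
            continue
        args = case.get('input')
        try:
            # Same convention as the notebook: list/tuple inputs are unpacked into positional args.
            actual = func(*args) if isinstance(args, (list, tuple)) else func(args)
            results.append({'test_name': name, 'passed': actual == case.get('expected'), 'actual': actual})
        except Exception as e:
            results.append({'test_name': name, 'passed': False, 'error': str(e)})
    passed = sum(r['passed'] for r in results)
    return {'tests_passed': passed, 'total_tests': len(results), 'results': results}


ns = {'add': lambda a, b: a + b, 'total': sum}
report = run_test_cases(ns, [
    {'function': 'add', 'input': [2, 3], 'expected': 5},         # unpacked: add(2, 3)
    {'function': 'total', 'input': [[1, 2, 3]], 'expected': 6},  # wrapped: sum([1, 2, 3])
    {'function': 'total', 'input': [1, 2, 3], 'expected': 6},    # unpacked: sum(1, 2, 3) raises TypeError
    {'function': 'missing', 'input': 1, 'expected': 1},
])
print(report['tests_passed'], report['total_tests'])  # 2 4
```

The third case shows the pitfall: a function that expects a single list receives three separate arguments unless the list is wrapped.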

This `ResultValidator` demonstrates several useful concepts:

* `Class Collaboration`: One class (`ResultValidator`) uses an instance of another (`PythonREPLTool`).
* `Dynamic Code Generation`: Python code is built as a string and then executed with `eval()` or `exec()`.
* `Introspection`: `globals()` is used to inspect the state of the executed Python environment.
* `String Formatting`: f-strings are used heavily to build the generated code, with doubled braces (`{{ }}`) for braces that must appear literally in the output.
* `Error Handling`: `try`/`except` blocks keep a failing snippet or test from crashing the whole run.
* `Regular Expressions`: These parse specific patterns (numbers) from text.
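The number-extraction step of `validate_mathematical_result` can also be written as plain Python, which makes its behavior easy to test in isolation. `check_numeric_properties` is a hypothetical helper that mirrors the generated script's checks under the same property names; because it is an ordinary function, its result dictionary is returned directly rather than only printed.

```python
import re
from typing import Any, Dict, List


def check_numeric_properties(output: str, expected: Dict[str, Any]) -> Dict[str, Any]:
    """Extract numbers from `output` and apply count/max/min/sum checks."""
    # Same pattern as the notebook: digits with an optional decimal part (no sign).
    numbers: List[float] = [float(n) for n in re.findall(r'\d+(?:\.\d+)?', output)]
    results: Dict[str, Any] = {'extracted_numbers': numbers}
    if not numbers:
        return results
    for prop, value in expected.items():
        if prop == 'count':
            results['count_check'] = len(numbers) == value
        elif prop == 'max_value':
            results['max_check'] = max(numbers) <= value
        elif prop == 'min_value':
            results['min_check'] = min(numbers) >= value
        elif prop == 'sum_range':
            low, high = value
            results['sum_check'] = low <= sum(numbers) <= high
    return results


report = check_numeric_properties("Primes: 2, 3, 5, 7\nTotal: 17",
                                  {'count': 5, 'max_value': 20, 'sum_range': [30, 40]})
print(report)
# {'extracted_numbers': [2.0, 3.0, 5.0, 7.0, 17.0], 'count_check': True, 'max_check': True, 'sum_check': True}
```

Note that `Total: 17` contributes a number too: the regex sees every number in the text, so the checks are only as reliable as the output format, and a negative value such as `-3.5` is extracted as `3.5`.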


In [4]:
class ResultValidator:
    def __init__(self, python_repl: PythonREPLTool):
        self.python_repl = python_repl

    def validate_mathematical_result(self, description: str, expected_properties: Dict[str, Any]) -> str:
        """Validate mathematical computations"""
        # rf-string: keeps the regex backslashes (\d, \.) literal in the generated code
        validation_code = rf"""
# Validation for: {description}
validation_results = {{}}


# Get the last execution results
history = {self.python_repl.execution_history}
if history:
    last_execution = history[-1]
    print(f"Last execution output: {{last_execution['output']}}")

    # Extract numbers from the output
    import re
    numbers = re.findall(r'\d+(?:\.\d+)?', last_execution['output'])
    if numbers:
        numbers = [float(n) for n in numbers]
        validation_results['extracted_numbers'] = numbers

        # Validate expected properties
        for prop, expected_value in {expected_properties}.items():
            if prop == 'count':
                actual_count = len(numbers)
                validation_results[f'count_check'] = actual_count == expected_value
                print(f"Count validation: Expected {{expected_value}}, Got {{actual_count}}")
            elif prop == 'max_value':
                if numbers:
                    max_val = max(numbers)
                    validation_results[f'max_check'] = max_val <= expected_value
                    print(f"Max value validation: {{max_val}} <= {{expected_value}} = {{max_val <= expected_value}}")
            elif prop == 'min_value':
                if numbers:
                    min_val = min(numbers)
                    validation_results[f'min_check'] = min_val >= expected_value
                    print(f"Min value validation: {{min_val}} >= {{expected_value}} = {{min_val >= expected_value}}")
            elif prop == 'sum_range':
                if numbers:
                    total = sum(numbers)
                    min_sum, max_sum = expected_value
                    validation_results[f'sum_check'] = min_sum <= total <= max_sum
                    print(f"Sum validation: {{min_sum}} <= {{total}} <= {{max_sum}} = {{min_sum <= total <= max_sum}}")


print("Validation Summary:")
for key, value in validation_results.items():
    print(f"{{key}}: {{value}}")


validation_results
"""
        return self.python_repl.run(validation_code)

    def validate_data_analysis(self, description: str, expected_structure: Dict[str, Any]) -> str:
        """Validate data analysis results"""
        validation_code = f"""
# Data Analysis Validation for: {description}
validation_results = {{}}


# Check if required variables exist in global scope
required_vars = {list(expected_structure.keys())}
existing_vars = []


for var_name in required_vars:
    if var_name in globals():
        existing_vars.append(var_name)
        var_value = globals()[var_name]
        validation_results[f'{{var_name}}_exists'] = True
        validation_results[f'{{var_name}}_type'] = type(var_value).__name__

        # Type-specific validations
        if isinstance(var_value, (list, tuple)):
            validation_results[f'{{var_name}}_length'] = len(var_value)
        elif isinstance(var_value, dict):
            validation_results[f'{{var_name}}_keys'] = list(var_value.keys())
        elif isinstance(var_value, (int, float)):
            validation_results[f'{{var_name}}_value'] = var_value

        print(f"✓ Variable '{{var_name}}' found: {{type(var_value).__name__}} = {{var_value}}")
    else:
        validation_results[f'{{var_name}}_exists'] = False
        print(f"✗ Variable '{{var_name}}' not found")


print(f"Found {{len(existing_vars)}}/{{len(required_vars)}} required variables")


# Additional structure validation
for var_name, expected_type in {expected_structure}.items():
    if var_name in globals():
        actual_type = type(globals()[var_name]).__name__
        validation_results[f'{{var_name}}_type_match'] = actual_type == expected_type
        print(f"Type check '{{var_name}}': Expected {{expected_type}}, Got {{actual_type}}")


validation_results
"""
        return self.python_repl.run(validation_code)

    def validate_algorithm_correctness(self, description: str, test_cases: List[Dict[str, Any]]) -> str:
        """Validate algorithm implementations with test cases"""
        validation_code = f"""
# Algorithm Validation for: {description}
validation_results = {{}}
test_results = []


test_cases = {test_cases}


for i, test_case in enumerate(test_cases):
    test_name = test_case.get('name', f'Test {{i+1}}')
    input_val = test_case.get('input')
    expected = test_case.get('expected')
    function_name = test_case.get('function')

    print(f"\\nRunning {{test_name}}:")
    print(f"Input: {{input_val}}")
    print(f"Expected: {{expected}}")

    try:
        if function_name and function_name in globals():
            func = globals()[function_name]
            if callable(func):
                if isinstance(input_val, (list, tuple)):
                    result = func(*input_val)
                else:
                    result = func(input_val)

                passed = result == expected
                test_results.append({{
                    'test_name': test_name,
                    'input': input_val,
                    'expected': expected,
                    'actual': result,
                    'passed': passed
                }})

                status = "✓ PASS" if passed else "✗ FAIL"
                print(f"Actual: {{result}}")
                print(f"Status: {{status}}")
            else:
                print(f"✗ ERROR: '{{function_name}}' is not callable")
        else:
            print(f"✗ ERROR: Function '{{function_name}}' not found")

    except Exception as e:
        print(f"✗ ERROR: {{str(e)}}")
        test_results.append({{
            'test_name': test_name,
            'error': str(e),
            'passed': False
        }})


# Summary
passed_tests = sum(1 for test in test_results if test.get('passed', False))
total_tests = len(test_results)
validation_results['tests_passed'] = passed_tests
validation_results['total_tests'] = total_tests
validation_results['success_rate'] = passed_tests / total_tests if total_tests > 0 else 0


print(f"=== VALIDATION SUMMARY ===")
print(f"Tests passed: {{passed_tests}}/{{total_tests}}")
print(f"Success rate: {{validation_results['success_rate']:.1%}}")


test_results
"""
        return self.python_repl.run(validation_code)

In [5]:
# 1. Create an instance of PythonREPLTool
python_repl = PythonREPLTool()
# 2. Create an instance of ResultValidator, linking it to the REPL
validator = ResultValidator(python_repl)

# Understanding the `Tool` Concept
First, it's important to understand what a `Tool` represents in this context:
* A `Tool` is a callable function or object that an AI agent can use to perform specific actions or retrieve information.
* Each `Tool` has a `name` (how the AI refers to it), a `description` (how the AI understands what it does and when to use it), and a `func` (the actual Python function that is called when the tool is used).

These tools let an AI model, which only produces text, execute Python code or run validation checks: the agent framework translates the model's textual decisions into actual function calls.
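A tool boils down to a name, a description, and a function from string to string. The sketch below models that with a small dataclass; `SimpleTool`, `render_tool_list`, and `call_tool` are hypothetical stand-ins, not LangChain's `Tool` class, which adds argument schemas, callbacks, and error handling on top of the same idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleTool:
    """Illustrative stand-in for a tool: a name, a description, and a str -> str function."""
    name: str
    description: str
    func: Callable[[str], str]


def render_tool_list(tools: Dict[str, SimpleTool]) -> str:
    # Roughly what a model is shown: one "name: description" line per tool.
    return "\n".join(f"{t.name}: {t.description}" for t in tools.values())


def call_tool(tools: Dict[str, SimpleTool], name: str, tool_input: str) -> str:
    # The model only emits text; the framework maps the chosen name back to a callable.
    if name not in tools:
        return f"Unknown tool: {name}"
    return tools[name].func(tool_input)


tools = {
    "echo": SimpleTool("echo", "Return the input unchanged.", lambda s: s),
    "word_count": SimpleTool("word_count", "Count words in the input.", lambda s: str(len(s.split()))),
}
print(render_tool_list(tools))
print(call_tool(tools, "word_count", "tools turn text into actions"))  # 5
```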

## Explaining the `python_tool` Definition

This defines a tool that allows an AI agent (or any part of your program) to run Python code interactively.

 * `name="python_repl"`: This gives the tool a short, descriptive name, `python_repl`. An AI agent refers to this tool by this name when it decides to execute Python code.
 * `description="Execute Python code and return both the code and its output. Maintains state between executions."`: This is a crucial part. It provides a human-readable (and, more importantly, model-readable) explanation of what the `python_repl` tool does.
   * `"Execute Python code and return both the code and its output"`: States the primary action and what to expect as a result.
   * `"Maintains state between executions"`: This is a very important detail. It tells the AI that a variable defined in one call to `python_repl` still exists in subsequent calls, which is supported by the `self.globals_dict` and `self.locals_dict` dictionaries of `PythonREPLTool`.
 * `func=python_repl.run`: This assigns the Python function that runs when the tool is called.
   * `python_repl`: The `PythonREPLTool` instance created earlier with `python_repl = PythonREPLTool()`.
   * `.run`: The method of that instance that takes a string of Python code, executes it, and returns a formatted string with the code, output, and any errors.

In essence, `python_tool` gives your AI a console to type Python commands into and get results back, with everything it has done so far remembered.

## Explaining the `validation_tool` Definition

This defines a tool for checking the accuracy or properties of previous computational results.

 * `name="result_validator"`: This is the name an AI agent uses when it needs to verify a computation.
 * `description="Validate the results of previous computations with specific test cases and expected properties."`:
   * This description outlines the tool's purpose: it checks previous results.
   * `with specific test cases and expected properties`: This hints at how validation works, implying it needs criteria to check against.
 * `func=lambda query: validator.validate_mathematical_result(query, {})`:
   * This is more nuanced because of the `lambda`.
   * `lambda query:`: Defines a small anonymous function that takes one argument, `query`. Many tool frameworks pass a single string from the AI (often called `query` or `input`) that conveys its intent.
   * `validator`: The `ResultValidator` instance created earlier.
   * `.validate_mathematical_result(query, {})`: The `ResultValidator` method that gets called.
     * `query`: The lambda passes the AI's input as the `description` argument, so the AI can say what it is validating. That text is pasted into a `#` comment line of the generated script, so a query containing a newline would turn the rest of it into executable code.
     * `{}`: An empty dictionary passed as the `expected_properties` argument, which keeps the tool simple.

`What this implies:` With an empty `expected_properties` dictionary, the property-checking loop has nothing to iterate over. The tool therefore only prints the previous execution's output and the numbers extracted from it; no pass/fail checks run. The tool description promises "specific test cases and expected properties", but this wiring cannot receive either.
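One way to let the agent supply real criteria through the single string argument is a small adapter that accepts either plain text or JSON. This is a hypothetical design, not the notebook's: `parse_validation_query` and the JSON keys `description` and `expected_properties` are assumptions. It also collapses newlines so the description cannot escape the generated script's comment line.

```python
import json
from typing import Any, Dict, Tuple


def parse_validation_query(query: str) -> Tuple[str, Dict[str, Any]]:
    """Accept plain text, or JSON like {"description": "...", "expected_properties": {"count": 3}}."""
    description, expected = query, {}
    try:
        payload = json.loads(query)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict):
        description = str(payload.get("description", ""))
        props = payload.get("expected_properties", {})
        expected = props if isinstance(props, dict) else {}
    # The description ends up in a "# ..." comment in generated code, so keep it on one line.
    description = " ".join(description.splitlines())
    return description, expected


print(parse_validation_query('{"description": "sum check", "expected_properties": {"count": 3}}'))
# ('sum check', {'count': 3})
print(parse_validation_query("check the\nresult"))
# ('check the result', {})
```

Wiring it in would look like `func=lambda query: validator.validate_mathematical_result(*parse_validation_query(query))`, with the tool description updated to tell the model about the optional JSON format.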

In [6]:
python_tool = Tool(
    name="python_repl",
    description="Execute Python code and return both the code and its output. Maintains state between executions.",
    func=python_repl.run
)


validation_tool = Tool(
    name="result_validator",
    description="Validate the results of previous computations with specific test cases and expected properties.",
    func=lambda query: validator.validate_mathematical_result(query, {})
)

# How They Work Together (Implicitly)
When an AI agent (like a large language model) is given access to these Tool objects, it receives their name and description. Based on a user's prompt or its internal reasoning, the AI decides which tool to use and what query (or input string) to provide to that tool.

For example:

If the user asks, `"What is 5 + 7?"`, the AI might decide to use python_repl with the query `"print(5 + 7)"`.
After getting the output from `python_repl`, if the AI wants to verify the result, it might then call `result_validator` with a query like `"Checking the sum of 5 and 7"`.
This modular approach makes AI agents more powerful by giving them specific capabilities beyond just generating text. They can now act on information and verify their actions.
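The name-to-function lookup behind this can be sketched in plain Python. This is a simplified illustration, not LangChain's internals: `run_python` and `validate` are hypothetical toy stand-ins for `python_repl.run` and the validator lambda, and `dispatch` plays the role of the executor choosing a tool by the name the model emits.

```python
import contextlib
import io
from typing import Callable, Dict


def run_python(code: str) -> str:
    """Toy stand-in for python_repl.run: execute code and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # A fresh globals dict on each call, so unlike the real REPL no state is kept.
        exec(code, {})
    return buffer.getvalue().strip()


def validate(query: str) -> str:
    """Toy stand-in for result_validator: reports what it was asked to check."""
    return f"Validation requested: {query}"


# The executor only needs tool names mapped to callables that take one string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "python_repl": run_python,
    "result_validator": validate,
}


def dispatch(action: str, action_input: str) -> str:
    """Look up the tool the model named and call it with the model's input."""
    tool = TOOLS.get(action)
    if tool is None:
        return f"Unknown tool: {action}"
    return tool(action_input)


print(dispatch("python_repl", "print(5 + 7)"))  # 12
print(dispatch("result_validator", "Checking the sum of 5 and 7"))
```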


We are setting up the core of an Agentic AI system. This `prompt_template` is absolutely crucial because it dictates how your AI model (which you're calling "Claude" here) will think and interact with the tools you've defined (`python_repl` and `result_validator`).

Let's break down each part of this `prompt_template` line by line.

# Explanation of prompt_template
This entire block is a multi-line string that defines the `"persona"` and `"instructions"` for your AI agent. It tells the LLM how to behave and what its capabilities are.

1. `You are Claude, an advanced AI assistant with Python execution and result validation capabilities.`

   * `Purpose`: This is the system instruction, or persona setting. It tells the model what role to take on (`"Claude"`) and what its core abilities are, which helps the LLM align its responses and reasoning with that role.

   * `Impact`: From the first line, the LLM knows it has two key capabilities: `"Python execution"` and `"result validation"`.

2. `You can execute Python code to solve complex problems and then validate your results to ensure accuracy.`

   * `Purpose`: Reinforces the capabilities above and states what they are for: solving problems and ensuring accuracy. It also signals early that validation is an expected step, not an optional extra.

3. `Available tools: {tools}`

   * `Purpose`: `{tools}` is a placeholder. When the `PromptTemplate` is rendered, it is filled in with information about the tools the AI has access to.
   * `How it works (with partial_variables below)`: The `partial_variables` section of your `PromptTemplate` provides the string `"python_repl - Execute Python code\nresult_validator - Validate computation results"`, where `\n` is a newline. So, when the template is rendered with these values, the LLM sees:

     ```
     Available tools:
     python_repl - Execute Python code
     result_validator - Validate computation results
     ```

   * `Impact`: This gives the LLM the names and descriptions of the tools it can use, helping it decide which tool is appropriate for a given task.

4. `Use this format:`

   * `Purpose`: This is a strict instruction on the format the LLM must use for its output, and it is critical for agentic setups. The code that runs the agent (often called the "agent executor" or "orchestrator") parses the LLM's output against this exact format to detect when the LLM wants to perform an `Action` (use a tool) or has produced a `Final Answer`.

   Each line of the format works as follows:

   * `Question: the input question you must answer`
     * `Purpose`: Shows the LLM where the user's initial question will appear.
   * `Thought: analyze what needs to be done`
     * `Purpose`: Instructs the LLM to start by explaining its reasoning. This makes the agent's behavior more transparent and helps the LLM chain its thoughts logically.
   * `Action: {tool_names}`
     * `Purpose`: Tells the LLM that when it wants to use a tool, it should output `Action:` followed by the name of that tool.
     * `How it works (with partial_variables)`: `{tool_names}` is replaced with `"python_repl, result_validator"`.
   * `Action Input: [your input]`
     * `Purpose`: After choosing an action, the LLM must provide the input for that tool. For `python_repl`, this is the Python code; for `result_validator`, it is the description or validation parameters.
   * `Observation: [result]`
     * `Purpose`: The agent executor inserts the executed tool's output here. The LLM then "observes" this result and continues reasoning; this is how the agent "sees" the effects of its actions.
   * `... (repeat Thought/Action/Action Input/Observation as needed)`
     * `Purpose`: Tells the LLM explicitly that problem solving is iterative: it can go through multiple cycles of thinking, acting, and observing until the problem is solved.
   * `Thought: I should validate my results`
     * `Purpose`: Steers the LLM's reasoning toward validation, reinforcing the role of the `result_validator` tool.
   * `Action: [validation if needed]`
     * `Purpose`: Instructs the LLM to call the validation tool when appropriate.
   * `Action Input: [validation parameters]`
     * `Purpose`: The specific parameters for the validation tool.
   * `Observation: [validation results]`
     * `Purpose`: Where the output of the validation tool is inserted.
   * `Thought: I now have the complete answer`
     * `Purpose`: Instructs the LLM to signal when it believes the task is finished.
   * `Final Answer: [comprehensive answer with validation confirmation]`
     * `Purpose`: The final, comprehensive answer the LLM gives the user, ideally confirming any validations performed.

5. `Question: {input}`
   * `Purpose`: Another placeholder (`{input}`) where the actual user query for the agent is inserted.

6. `{agent_scratchpad}`
   * `Purpose`: A crucial placeholder. The agent scratchpad accumulates the `Thought`, `Action`, `Action Input`, and `Observation` steps taken so far in the current run. It serves as the agent's working memory for the current interaction, letting the LLM see its previous steps and their results.

# Explaining the PromptTemplate Instantiation

 * This part creates an instance of LangChain's `PromptTemplate` class (imported from `langchain_core.prompts` at the top of the notebook).

   * `template=prompt_template`: Uses the multi-line string we just analyzed as the base template for the prompt.
   * `input_variables=["input", "agent_scratchpad"]`: The variables the `PromptTemplate` expects to receive at runtime (when the prompt is actually used) to fill in the placeholders:
     * `input`: the user's initial question.
     * `agent_scratchpad`: the dynamic history of the agent's actions and observations.
   * `partial_variables={ ... }`: Variables that are pre-filled when the `PromptTemplate` is defined and are not expected to change between calls.
     * `"tools": "python_repl - Execute Python code\nresult_validator - Validate computation results"`: As explained before, this string lists the tools and their descriptions for the LLM; the `\n` puts each tool on its own line.
     * `"tool_names": "python_repl, result_validator"`: A comma-separated list of the tool names. Placing `{tool_names}` directly after `Action:` is slightly non-standard for strict parsing, because the model must output a single tool name on that line, but it does remind the LLM of the available names. A more common phrasing is `Action: the action to take, should be one of [{tool_names}]`.
     * In practice, `create_react_agent` (used in the agent class below) fills in `tools` and `tool_names` itself from the `Tool` objects it receives, so these hand-written values are most likely overridden by the tools' actual names and descriptions.


In essence, this `PromptTemplate` is your instruction manual for the AI agent. It tells the AI:

* Who it is (`You are Claude...`).
* What capabilities it has (Python execution, result validation).
* What tools are available (`python_repl`, `result_validator`).
* Crucially, what exact output format to use when thinking, performing actions, and providing a final answer.
* How to iterate through problem-solving steps.
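Filling a template from fixed partial values plus per-call input variables can be approximated with Python's built-in `str.format`. This is only an approximation of what `PromptTemplate` does, using a trimmed, hypothetical version of the template; it shows why `tools` and `tool_names` are supplied once while `input` and `agent_scratchpad` change on every call.

```python
# Trimmed-down template with the same placeholders as prompt_template.
TEMPLATE = """Available tools:
{tools}

Action: {tool_names}

Question: {input}
{agent_scratchpad}"""

# Fixed once, like partial_variables.
PARTIALS = {
    "tools": "python_repl - Execute Python code\nresult_validator - Validate computation results",
    "tool_names": "python_repl, result_validator",
}


def render(question: str, agent_scratchpad: str = "") -> str:
    """Fill the fixed partials plus the per-call input variables."""
    return TEMPLATE.format(**PARTIALS, input=question, agent_scratchpad=agent_scratchpad)


print(render("What is 5 + 7?"))
```

Because `\n` is a real newline escape here, each tool lands on its own line in the rendered prompt.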





In [7]:
prompt_template = """You are Claude, an advanced AI assistant with Python execution and result validation capabilities.


You can execute Python code to solve complex problems and then validate your results to ensure accuracy.


Available tools:
{tools}


Use this format:
Question: the input question you must answer
Thought: analyze what needs to be done
Action: {tool_names}
Action Input: [your input]
Observation: [result]
... (repeat Thought/Action/Action Input/Observation as needed)
Thought: I should validate my results
Action: [validation if needed]
Action Input: [validation parameters]
Observation: [validation results]
Thought: I now have the complete answer
Final Answer: [comprehensive answer with validation confirmation]


Question: {input}
{agent_scratchpad}"""


prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["input", "agent_scratchpad"],
    partial_variables={
        "tools": "python_repl - Execute Python code\nresult_validator - Validate computation results",
        "tool_names": "python_repl, result_validator"
    }
)

This is where all the pieces come together! This `AdvancedClaudeCodeAgent` class encapsulates the entire Agentic AI system. It sets up the Large Language Model, integrates your custom tools, and defines how the agent will operate.

This code uses LangChain: `ChatAnthropic` comes from the `langchain_anthropic` package, and `create_react_agent` and `AgentExecutor` come from `langchain.agents`.

Let's break it down. The imports this class needs (`os`, `create_react_agent`, `AgentExecutor`, `Tool`, `PromptTemplate`, `ChatAnthropic`, and the `typing` helpers) are already in the import cell at the top of the notebook.

`__init__ Function (Constructor)`

 * `def __init__(self, anthropic_api_key=None):`: This is the constructor for your `AdvancedClaudeCodeAgent` class. It takes an optional `anthropic_api_key` argument.
 * `if anthropic_api_key: os.environ["ANTHROPIC_API_KEY"] = anthropic_api_key`: If an API key is passed when the agent is created, it is stored in the `ANTHROPIC_API_KEY` environment variable. LangChain's `ChatAnthropic` class (like many other LLM integrations) reads this variable to authenticate with the Anthropic API. Reading the key from the environment lets you keep it out of your source code; passing it as a string literal, as the examples in this notebook do, still hardcodes it.
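If you would rather not hardcode the key, one option is to read it from the environment and prompt for it only when it is missing. This is a minimal sketch using the standard library's `getpass`; the helper name `ensure_anthropic_key` is hypothetical.

```python
import os
from getpass import getpass


def ensure_anthropic_key() -> str:
    """Return ANTHROPIC_API_KEY, prompting (without echo) only if it is not set."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        key = getpass("Anthropic API key: ")
        # Store it so ChatAnthropic can pick it up from the environment.
        os.environ["ANTHROPIC_API_KEY"] = key
    return key
```

After calling it, you could create the agent with `AdvancedClaudeCodeAgent()` and no argument, since the constructor only sets the variable when a key is passed.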

`self.llm = ChatAnthropic(...):`

 * This initializes the Large Language Model (LLM) component of your agent.
 * `ChatAnthropic`: This is a LangChain class that provides an interface to Anthropic's Claude chat models.
 * `model="claude-3-opus-20240229"`: Specifies the exact Claude model to use. Claude 3 Opus was Anthropic's most capable model when it was released in early 2024 and is known for strong reasoning. Newer models have been released since, and older model IDs are eventually retired, so check Anthropic's current model list before running this.
 * `temperature=0`: Controls the randomness of the LLM's output. A temperature of 0 makes the output as deterministic as possible (though identical output across runs is not guaranteed), which is generally desirable for code execution and validation tasks where correctness is paramount. Higher temperatures produce more varied but potentially less reliable output.
 * `max_tokens=4000`: Sets the maximum number of tokens (words/sub-words) that the LLM is allowed to generate in a single response. This helps control costs and prevent excessively long outputs.

`self.agent = create_react_agent(...):`
 * This is where the agent's "brain" or "reasoning engine" is created. LangChain's create_react_agent function constructs an agent that follows the ReAct (Reasoning and Acting) prompting paradigm.
 * `llm=self.llm`: The LLM created above is passed in. This LLM will be responsible for generating the "Thought" and "Action" steps based on the prompt.
 * `tools=[python_tool, validation_tool]:` This provides the list of executable tools that the agent can use. These are the Tool objects you defined in the previous step. The agent knows it can select one of these by outputting its name.
 * `prompt=prompt`: The PromptTemplate you designed earlier is passed here. This template is what guides the LLM's behavior and output format, ensuring it adheres to the "Thought-Action-Observation" loop.

 `self.agent_executor = AgentExecutor(...):`
 * This is the orchestrator that brings everything together. The `AgentExecutor` is responsible for:
   * Taking the user's initial input.
   * Passing the input and the current agent_scratchpad to `self.agent`, which formats the prompt and sends it to the LLM.
   * Parsing the LLM's output (Thought:, Action:, Action Input:).
   * Executing the specified tool (python_tool or validation_tool) with the provided input.
   * Taking the Observation: (the tool's output) and appending it to the agent_scratchpad.
   * Repeating this loop until the LLM produces a Final Answer: or reaches max_iterations.
 * `agent=self.agent`: The agent brain to use.
 * `tools=[python_tool, validation_tool]`: The executor also needs the list of tools so it knows which actual Python functions to call when the agent requests an action.
 * `verbose=True`: This is super helpful for debugging! When verbose is True, the `AgentExecutor` will print out all the intermediate steps (Thoughts, Actions, Action Inputs, Observations) to your console, allowing you to see the agent's reasoning process.
 * `handle_parsing_errors=True`: If the LLM's output doesn't perfectly match the expected Action: or Action Input: format, this setting tells the executor to try and handle it gracefully (e.g., by telling the LLM it made a parsing error, rather than crashing).
 * `max_iterations=8`: Sets a limit on how many `Thought-Action-Observation` loops the agent can perform before giving up. This prevents infinite loops.
 * `return_intermediate_steps=True:` If True, the result returned by the `agent_executor.invoke()` call will include a list of all the intermediate steps taken by the agent.
* `self.python_repl = python_repl`:
* `self.validator = validator`: These lines store the instances of your `PythonREPLTool` and `ResultValidator` directly as attributes of the `AdvancedClaudeCodeAgent`. This provides convenient access to their methods (like get_execution_history or manual validation) if needed from outside the agent executor's loop.
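The executor loop described above can be sketched without LangChain to make the control flow concrete. This is a simplified toy, not `AgentExecutor`'s actual implementation: the regex parser, the messages, and the scripted fake LLM are all assumptions made for illustration. It shows the moving parts: call the model, stop on `Final Answer:`, otherwise parse `Action`/`Action Input` and run the tool, then append the `Observation` to the scratchpad, up to `max_iterations` times.

```python
import re
from typing import Callable, Dict

# Matches "Action: <tool>" followed on the next line by "Action Input: <input>".
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)


def run_react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_iterations: int = 8,
) -> str:
    """Toy Thought-Action-Observation loop (not LangChain's implementation)."""
    scratchpad = ""
    for _ in range(max_iterations):
        output = llm(f"Question: {question}\n{scratchpad}")
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            # Similar in spirit to handle_parsing_errors=True: report the problem back to the model.
            observation = "Invalid format: expected 'Action:' and 'Action Input:' lines"
        else:
            name, tool_input = match.group(1).strip(), match.group(2).strip()
            tool = tools.get(name)
            observation = tool(tool_input) if tool else f"Unknown tool: {name}"
        # The scratchpad grows with every step so the model can see its history.
        scratchpad += f"{output}\nObservation: {observation}\n"
    return "Stopped: reached max_iterations without a Final Answer."


# A scripted "LLM" that first calls a tool, then answers.
replies = iter([
    "Thought: I need to compute this\nAction: python_repl\nAction Input: print(5 + 7)",
    "Thought: I now have the complete answer\nFinal Answer: 12",
])
print(run_react_loop(lambda prompt: next(replies), {"python_repl": lambda code: "12"}, "What is 5 + 7?"))
```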

# `run` Function

 * `def run(self, query: str) -> str:`: This is the primary method for interacting with your agent. You'll call this to give the agent a user question.
   * `query: str`: The input question from the user.
   * `-> str`: Type hint indicating it returns a string (the agent's final answer).
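 * `try...except`: Any exception raised while the agent runs is caught and returned as a string of the form `"Error: ..."` instead of propagating to the caller.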
 * `result = self.agent_executor.invoke({"input": query})`:
   * This is the critical line that starts the agent's entire process.
   * `self.agent_executor.invoke(...):` Calls the AgentExecutor to run the agent.
   * `{"input": query}`: The invoke method expects a dictionary as input, where the key corresponds to the `input_variables` defined in your `PromptTemplate`.
   * The invoke method will return a dictionary containing the final output and potentially intermediate steps `(if return_intermediate_steps=True)`.
* `return result["output"]`: Returns the Final Answer generated by the agent.
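Because the executor is created with `return_intermediate_steps=True`, the dictionary returned by `invoke` also carries an `intermediate_steps` entry: a list of `(action, observation)` pairs, where each action records the tool name and tool input. `run()` only returns `result["output"]`, so a helper like the one below (hypothetical, not part of the agent class) could be used to inspect the trace. The demo uses `SimpleNamespace` objects as stand-ins for LangChain's `AgentAction`, with only the `tool` and `tool_input` attributes, and the values are illustrative.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def format_steps(result: Dict[str, Any]) -> List[str]:
    """Turn result["intermediate_steps"] into readable one-line summaries."""
    lines = []
    for i, (action, observation) in enumerate(result.get("intermediate_steps", []), start=1):
        lines.append(f"{i}. {action.tool}({action.tool_input!r}) -> {str(observation).strip()}")
    return lines


# Stand-in for what agent_executor.invoke(...) returns; values are illustrative.
fake_result = {
    "output": "The sum of squares from 1 to 5 is 55.",
    "intermediate_steps": [
        (SimpleNamespace(tool="python_repl", tool_input="print(sum(i**2 for i in range(1, 6)))"), "55"),
    ],
}
for line in format_steps(fake_result):
    print(line)
```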

# `validate_last_result` Function

 * `def validate_last_result(...) -> str:`: This method provides a way to manually trigger validation outside of the agent's automatic `Thought-Action-Observation` loop. This could be useful for debugging or for a human reviewer to perform a specific validation.
 * `description: str`: A description for the validation.
 * `validation_params: Dict[str, Any]`: A dictionary containing the parameters for the validation. This method cleverly inspects the keys in `validation_params` to decide which specific validation method from `self.validator` to call `(validate_algorithm_correctness, validate_data_analysis, or validate_mathematical_result)`.
 * `if/elif/else logic`: Checks for the presence of `test_cases` (for algorithm validation), `expected_structure` (for data analysis validation), or falls back to `validate_mathematical_result` if neither of those specific keys is found.
 * `return self.validator.validate_...`: Calls the appropriate validation method on the `ResultValidator` instance (`self.validator`) and returns its result.

# `get_execution_summary` Function
 * `def get_execution_summary(self) -> Dict[str, Any]:`: This utility method summarizes all Python code executions performed by the `python_repl` instance this agent uses.
 * `history = self.python_repl.get_execution_history()`: Retrieves the full execution history from the `PythonREPLTool`.
 * `return { ... }`: Returns a dictionary summarizing the history:
   * `total_executions`: Total number of times `python_repl.run()` was called.
   * `successful_executions`: Count of executions that had no recorded error.
   * `failed_executions`: Count of executions that had an error.
   * `execution_details`: The raw history list itself, providing all the granular details.
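The counting logic in `get_execution_summary` can be checked on its own with a hand-made history. This sketch assumes each history entry is a dict whose `'error'` value is falsy on success (the only key the method relies on); the `'code'` keys and the sample entries are hypothetical. It also adds a success rate that stays defined when the history is empty, since dividing by `total_executions` fails when no code has run yet.

```python
from typing import Any, Dict, List


def summarize(history: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Mirror get_execution_summary's counts and add a zero-safe success rate."""
    total = len(history)
    successful = sum(1 for h in history if not h["error"])
    failed = total - successful
    # Avoid ZeroDivisionError when nothing has been executed yet.
    rate = successful / total * 100 if total else 0.0
    return {
        "total_executions": total,
        "successful_executions": successful,
        "failed_executions": failed,
        "success_rate": rate,
    }


sample_history = [
    {"code": "print(1 + 1)", "error": None},
    {"code": "print(undefined_name)", "error": "NameError: name 'undefined_name' is not defined"},
]
print(summarize(sample_history))
print(summarize([]))
```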

In [8]:
class AdvancedClaudeCodeAgent:
    def __init__(self, anthropic_api_key=None):
        if anthropic_api_key:
            os.environ["ANTHROPIC_API_KEY"] = anthropic_api_key

        self.llm = ChatAnthropic(
            model="claude-3-opus-20240229",
            temperature=0,
            max_tokens=4000
        )

        self.agent = create_react_agent(
            llm=self.llm,
            tools=[python_tool, validation_tool],
            prompt=prompt
        )

        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=[python_tool, validation_tool],
            verbose=True,
            handle_parsing_errors=True,
            max_iterations=8,
            return_intermediate_steps=True
        )

        self.python_repl = python_repl
        self.validator = validator

    def run(self, query: str) -> str:
        try:
            result = self.agent_executor.invoke({"input": query})
            return result["output"]
        except Exception as e:
            return f"Error: {str(e)}"

    def validate_last_result(self, description: str, validation_params: Dict[str, Any]) -> str:
        """Manually validate the last computation result"""
        if 'test_cases' in validation_params:
            return self.validator.validate_algorithm_correctness(description, validation_params['test_cases'])
        elif 'expected_structure' in validation_params:
            return self.validator.validate_data_analysis(description, validation_params['expected_structure'])
        else:
            return self.validator.validate_mathematical_result(description, validation_params)

    def get_execution_summary(self) -> Dict[str, Any]:
        """Get summary of all executions"""
        history = self.python_repl.get_execution_history()
        return {
            'total_executions': len(history),
            'successful_executions': len([h for h in history if not h['error']]),
            'failed_executions': len([h for h in history if h['error']]),
            'execution_details': history
        }

In [9]:
# Assuming all previous classes (PythonREPLTool, ResultValidator)
# and objects (python_repl, validator, python_tool, validation_tool, prompt)
# are defined and available in your script or imported.

# Initialize the agent
# Replace "YOUR_ANTHROPIC_API_KEY" with your actual key
agent = AdvancedClaudeCodeAgent(anthropic_api_key="YOUR_ANTHROPIC_API_KEY")

# Now, ask the agent a question that requires Python execution and perhaps validation
user_question_1 = "Calculate the sum of squares for numbers from 1 to 5. Then validate that the final sum is 55."
print(f"\n--- Agent Query 1 ---")
print(f"User: {user_question_1}")
agent_response_1 = agent.run(user_question_1)
print(f"\nAgent Final Response:\n{agent_response_1}")

print("\n" + "="*50 + "\n")


--- Agent Query 1 ---
User: Calculate the sum of squares for numbers from 1 to 5. Then validate that the final sum is 55.


[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mQuestion: Calculate the sum of squares for numbers from 1 to 5. Then validate that the final sum is 55.

Thought: To calculate the sum of squares from 1 to 5, I can use a simple loop in Python to square each number and accumulate the sum. Then I'll validate that the final sum equals 55.

Action: python_repl
Action Input:
sum_squares = 0
for i in range(1, 6):
    sum_squares += i**2
print(sum_squares)
[0m[36;1m[1;3m**Code Executed:**n```pythonnsum_squares = 0
for i in range(1, 6):
    sum_squares += i**2
print(sum_squares)
n```nn**Output:**n55
[0m[32;1m[1;3mThe Python code correctly calculated the sum of squares from 1 to 5 as 55. To validate this, I will use the result_validator tool.

Action: result_validator
Action Input: 
The sum of squares of the first 5 positive integers should equal 55.
1^2 + 

In [None]:
# Assuming all previous classes (PythonREPLTool, ResultValidator)
# and objects (python_repl, validator, python_tool, validation_tool, prompt)
# are defined and available in your script or imported.

# Initialize the agent
# Replace "YOUR_ANTHROPIC_API_KEY" with your actual key
agent = AdvancedClaudeCodeAgent(anthropic_api_key="YOUR_ANTHROPIC_API_KEY")

# Now, ask the agent a question that requires Python execution and perhaps validation
user_question_1 = "Calculate the sum of squares for numbers from 1 to 5. Then validate that the final sum is 55."
print(f"\n--- Agent Query 1 ---")
print(f"User: {user_question_1}")
agent_response_1 = agent.run(user_question_1)
print(f"\nAgent Final Response:\n{agent_response_1}")

print("\n" + "="*50 + "\n")

user_question_2 = """
Define a Python function named 'is_prime' that takes an integer and returns True if it's prime, False otherwise.
Then, test it with is_prime(7) and is_prime(4).
Validate that is_prime(7) is True and is_prime(4) is False.
"""
print(f"\n--- Agent Query 2 ---")
print(f"User: {user_question_2}")
agent_response_2 = agent.run(user_question_2)
print(f"\nAgent Final Response:\n{agent_response_2}")

print("\n" + "="*50 + "\n")

# You can also manually inspect the execution history
print("\n--- Python REPL Execution Summary ---")
print(agent.get_execution_summary())

# Or trigger manual validation
# print("\n--- Manual Validation ---")
# print(agent.validate_last_result(
#     "Checking last python run output",
#     {'expected_structure': {'df': 'DataFrame'}}
# ))


This is the **main execution block** of your Agentic AI application! This is the part of the script that you would actually run to put your `AdvancedClaudeCodeAgent` into action and see it solve complex problems.

Let's break down this `if __name__ == "__main__":` block:

# `if __name__ == "__main__":` Block

  * `if __name__ == "__main__":`: This is a standard Python idiom. It means: "If this script is being run directly (not imported as a module into another script), then execute the code inside this block."
    * `Purpose`: It ensures that the code within this block runs only when the file is the main program being executed. This is good practice when a file contains both executable code and definitions (like classes) that could be imported elsewhere. Note that inside a Jupyter or Colab notebook, `__name__` is always `"__main__"`, so this block runs whenever its cell is executed.
  * `API_KEY = "Use Your Own Key Here"`:
    * This line declares a variable `API_KEY`.
    * `Crucial`: You must replace `"Use Your Own Key Here"` with your actual Anthropic API key to make the agent functional. Without a valid key, the `ChatAnthropic` model won't be able to connect to Anthropic's services.
  * `agent = AdvancedClaudeCodeAgent(anthropic_api_key=API_KEY):`
    * This is the instantiation of your AdvancedClaudeCodeAgent class.
    * It creates an object named agent that represents your entire AI system.
    * The `API_KEY` is passed to the constructor `(__init__)`, which then sets the environment variable `ANTHROPIC_API_KEY` internally, allowing the `ChatAnthropic` model to authenticate.
  * `print("🚀 Advanced Claude Code Agent with Validation")`
  * `print("=" * 60)`:
    * These lines print a header to the console when the script starts, indicating what the program is. `"=" * 60` creates a line of 60 `=` characters for visual separation.

# Example Queries (query1, query2, query3, query4)
This section defines several complex `query` strings, each representing a distinct, multi-step problem you're asking your `AdvancedClaudeCodeAgent` to solve.

`print("\n🔢 Example 1: Prime Number Analysis with Twin Prime Detection")`: Prints a title for the first example.

`query1 = """..."""`: This defines a multi-line string (using triple quotes """) that contains the detailed instructions for the agent.

  * It asks the agent to find prime numbers, calculate their sum, identify twin primes, calculate average gaps, and find the largest gap.
  * **Crucially, it explicitly instructs the agent to "validate" certain aspects after computation**. This is how your `result_validator` tool gets triggered by the agent. The agent, guided by the `prompt_template`, will recognize the validation instruction and decide to use the `result_validator` tool.

`result1 = agent.run(query1):`
  * This is where the magic happens for query1. You call the `run()` method on your agent object, passing the query1 string.
  * The `agent.run()` method (which in turn calls `agent_executor.invoke()`) will:
    1. Send `query1` to the Claude LLM along with the `prompt_template` and tool definitions.
    2. Run the `Thought-Action-Observation` loop, in which the LLM:
       * **Thought**: reasons about the problem.
       * **Action**: calls `python_repl` to write and execute Python code that finds the primes, sums them, and so on.
       * **Observation**: reads the output from `python_repl`.
       * Repeats the Thought and Action steps with `python_repl` as needed to complete the computational tasks.
       * **Thought**: on reaching the validation instruction in the query, reasons about how to validate.
       * **Action**: calls `result_validator` with an appropriate `Action Input`.
       * **Observation**: reads the validation results.
       * **Thought**: decides it has a complete answer and outputs `Final Answer:`.
  * The `result1` variable stores the Final Answer string returned by the agent.
  * `print(result1)`: Prints the agent's final answer to the console.
  * `print("\n" + "=" * 80 + "\n")`: Prints a separator line for visual clarity between examples.
  * `query2, query3, query4`: These follow the exact same pattern as query1. They define other complex, multi-step problems (`Sales Data Analysis`, `Algorithm Implementation`, `Machine Learning Pipeline`) that challenge the agent's ability to:
   * Understand diverse problem domains.
   * Break down complex tasks into executable Python code steps.
   * Utilize its internal state (variables defined in one `python_repl` call are available in the next).
   * Strategically employ the `result_validator` tool when instructed to validate.
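Since query 1 asks the agent to validate its prime results, it helps to have an independent reference to compare its Final Answer against. This sketch uses a Sieve of Eratosthenes, which is one standard way to compute these values and not necessarily the code the agent will write. It interprets "twin primes" as pairs (p, p + 2) that are both prime, and "average gap" as the mean difference between consecutive primes.

```python
from typing import List, Tuple


def primes_up_to(n: int) -> List[int]:
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            # Cross off multiples starting at i*i; smaller ones were handled earlier.
            for multiple in range(i * i, n + 1, i):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


primes = primes_up_to(200)
gaps = [b - a for a, b in zip(primes, primes[1:])]
# Twin primes are always consecutive primes, so checking neighbours is enough.
twin_pairs: List[Tuple[int, int]] = [(a, b) for a, b in zip(primes, primes[1:]) if b - a == 2]
largest_gap = max(gaps)
largest_gap_at = primes[gaps.index(largest_gap)]

print(len(primes), sum(primes))        # 46 4227
print(len(twin_pairs))                 # 15
print(round(sum(gaps) / len(gaps), 3)) # 4.378
print(largest_gap, largest_gap_at)     # 14 113
```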

# Final Execution Summary
 * `print("📋 Execution Summary")` and `print("-" * 60)`: Print a title and a separator line for the summary.
 * `summary = agent.get_execution_summary():` Calls the get_execution_summary() method on your agent object. This method (which we explained previously) retrieves and processes the history of all python_repl executions performed throughout the execution of query1, query2, query3, and query4.
  * `print(f"Total code executions: {summary['total_executions']}")`: Prints the total number of times the python_repl tool was invoked.
  * `print(f"Successful executions: {summary['successful_executions']}")`: Prints how many of those python_repl invocations completed without an error.
  * `print(f"Failed executions: {summary['failed_executions']}"):` Prints how many of those python_repl invocations resulted in an error.
  * `if summary['failed_executions'] > 0: ...:` If there were any failures, this block iterates through the detailed history and prints the error message for each failed execution. This is invaluable for debugging if your agent encounters issues.
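  * `print(f"\nSuccess rate: {(summary['successful_executions']/summary['total_executions']*100):.1f}%")`: Calculates and prints the overall success rate of the Python code executions. As written, this raises a `ZeroDivisionError` if the agent never called `python_repl` (so `total_executions` is 0); checking the total first avoids that.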


In [None]:
if __name__ == "__main__":
    API_KEY = "Use Your Own Key Here"

    agent = AdvancedClaudeCodeAgent(anthropic_api_key=API_KEY)

    print("🚀 Advanced Claude Code Agent with Validation")
    print("=" * 60)

    print("\n🔢 Example 1: Prime Number Analysis with Twin Prime Detection")
    print("-" * 60)
    query1 = """
    Find all prime numbers between 1 and 200, then:
    1. Calculate their sum
    2. Find all twin prime pairs (primes that differ by 2)
    3. Calculate the average gap between consecutive primes
    4. Identify the largest prime gap in this range
    After computation, validate that we found the correct number of primes and that all identified numbers are actually prime.
    """
    result1 = agent.run(query1)
    print(result1)

    print("\n" + "=" * 80 + "\n")

    print("📊 Example 2: Advanced Sales Data Analysis with Statistical Validation")
    print("-" * 60)
    query2 = """
    Create a comprehensive sales analysis:
    1. Generate sales data for 12 products across 24 months with realistic seasonal patterns
    2. Calculate monthly growth rates, yearly totals, and trend analysis
    3. Identify top 3 performing products and worst 3 performing products
    4. Perform correlation analysis between different products
    5. Create summary statistics (mean, median, standard deviation, percentiles)
    After analysis, validate the data structure, ensure all calculations are mathematically correct, and verify the statistical measures.
    """
    result2 = agent.run(query2)
    print(result2)

    print("\n" + "=" * 80 + "\n")

    print("⚙️ Example 3: Advanced Algorithm Implementation with Test Suite")
    print("-" * 60)
    query3 = """
    Implement and validate a comprehensive sorting and searching system:
    1. Implement quicksort, mergesort, and binary search algorithms
    2. Create test data with various edge cases (empty lists, single elements, duplicates, sorted/reverse sorted)
    3. Benchmark the performance of different sorting algorithms
    4. Implement a function to find the kth largest element using different approaches
    5. Test all implementations with comprehensive test cases including edge cases
    After implementation, validate each algorithm with multiple test cases to ensure correctness.
    """
    result3 = agent.run(query3)
    print(result3)

    print("\n" + "=" * 80 + "\n")

    print("🤖 Example 4: Machine Learning Model with Cross-Validation")
    print("-" * 60)
    query4 = """
    Build a complete machine learning pipeline:
    1. Generate a synthetic dataset with features and target variable (classification problem)
    2. Implement data preprocessing (normalization, feature scaling)
    3. Implement a simple linear classifier from scratch (gradient descent)
    4. Split data into train/validation/test sets
    5. Train the model and evaluate performance (accuracy, precision, recall)
    6. Implement k-fold cross-validation
    7. Compare results with different hyperparameters
    Validate the entire pipeline by ensuring mathematical correctness of gradient descent, proper data splitting, and realistic performance metrics.
    """
    result4 = agent.run(query4)
    print(result4)

    print("\n" + "=" * 80 + "\n")

    print("📋 Execution Summary")
    print("-" * 60)
    summary = agent.get_execution_summary()
    print(f"Total code executions: {summary['total_executions']}")
    print(f"Successful executions: {summary['successful_executions']}")
    print(f"Failed executions: {summary['failed_executions']}")

    if summary['failed_executions'] > 0:
        print("\nFailed executions details:")
        for i, execution in enumerate(summary['execution_details']):
            if execution['error']:
                print(f"  {i+1}. Error: {execution['error']}")

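    # Guard against ZeroDivisionError when no code was executed.
    total = summary['total_executions']
    if total:
        print(f"\nSuccess rate: {summary['successful_executions'] / total * 100:.1f}%")
    else:
        print("\nSuccess rate: n/a (no code executions recorded)")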

# Overall Purpose of this Block

This `if __name__ == "__main__":` block serves as the driver for your Agentic AI application. It demonstrates how to:

1. Initialize your AI agent.
2. Provide complex instructions (queries) to the agent.
3. Observe the agent's high-level responses (the Final Answer).
4. Get a summary of the underlying Python code execution, giving you insights into how effectively the agent used its python_repl tool.

When you run this script, you'll see a lot of output because `verbose=True` is set in the `AgentExecutor`. This will show you the LLM's Thought process, the Action it takes, the `Action Input` it generates, and the `Observation` (output) it receives from your `python_repl` and `result_validator` tools. This provides a fascinating window into how an LLM can use external tools to solve problems step-by-step.