In [23]:
import builtins
import contextlib
import io
import math
import os
from typing import Any
import requests
import json
from langchain_openai import ChatOpenAI
from langgraph_codeact import create_codeact

import numpy as np
import pandas as pd
import scipy
import sklearn

from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")

exec_globals = builtins.__dict__.copy()
exec_globals.update({
    "np": np,
    "pd": pd,
    "scipy": scipy,
    "sklearn": sklearn,
    "math": math,
})

def eval_code(code: str, context: dict[str, Any]) -> tuple[str, dict[str, Any]]:
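    """
    Execute model-generated Python code and capture everything it prints to stdout.
    Errors are reported with a full traceback, and the updated variable scope is
    returned so later turns can reuse variables. Note: exec() is not sandboxed,
    so only run this in a trusted or isolated environment.
    """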
    import traceback

    # The execution scope starts with the globally available libraries.
    exec_scope = exec_globals.copy()
    # It is then updated with the context from previous turns.
    exec_scope.update(context)

    stdout_io = io.StringIO()
    try:
        with contextlib.redirect_stdout(stdout_io):
            # Execute the code in the prepared scope.
            # New variables will be added to exec_scope.
            exec(code, exec_scope)
        
        output = stdout_io.getvalue()
        if not output:
            output = "<code ran, no output printed to stdout>"

    except Exception:
        # Capture the full traceback on error.
        output = f"Error during execution:\n{traceback.format_exc()}"

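    # exec_scope now holds the post-execution state. Class objects are dropped
    # before the scope is returned as context for the next turn, because classes
    # defined inside exec() may not be serializable. Other unpicklable values
    # (e.g. imported modules) are NOT removed by this check.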
    context_after_exec = {
        k: v for k, v in exec_scope.items() if not isinstance(v, type)
    }
    
    return output, context_after_exec

llm = ChatOpenAI(
    openai_api_base="https://openrouter.ai/api/v1",
    openai_api_key=api_key,
    model="google/gemini-2.5-pro",
    streaming=False,
    max_completion_tokens=20000,
    request_timeout=120,
)
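The filter at the end of `eval_code` drops only class objects (`isinstance(v, type)`), so imported modules and lambdas still pass through to the next turn. If the agent's checkpointer serializes state with `pickle` (an assumption: the notebook does not show which checkpointer is used), a stricter filter can test each value directly. A minimal sketch; `filter_picklable` is a hypothetical helper, not part of `langgraph_codeact`:

```python
import math
import pickle
from typing import Any


def filter_picklable(scope: dict[str, Any]) -> dict[str, Any]:
    """Keep only entries that pickle.dumps can serialize (hypothetical helper)."""
    kept: dict[str, Any] = {}
    for name, value in scope.items():
        if name.startswith("__"):
            # Skip dunder entries such as __builtins__ and __name__.
            continue
        try:
            pickle.dumps(value)
        except Exception:
            # Modules, lambdas, open file handles, etc. fail here and are dropped.
            continue
        kept[name] = value
    return kept


scope = {"x": 1, "data": [1.0, 2.0], "math": math, "f": lambda: 0}
print(sorted(filter_picklable(scope)))  # ['data', 'x']
```

Pickling every value costs time proportional to its size, so large arrays make this check noticeably slower than the `isinstance` test.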

In [22]:
print(json.loads(response.content.strip())["choices"][0]["message"]["content"])

That is, without a doubt, one of the most profound and enduring questions humanity has ever asked. There is no single, universally accepted answer, and that is the beauty of it. The question itself pushes us to think about our values, our purpose, and our place in the universe.

Instead of one answer, let's explore the most powerful perspectives that have been offered throughout history. They generally fall into a few key categories.

### 1. The Religious & Spiritual Perspective: Meaning is Given
This viewpoint suggests that meaning is given to us by a higher power or cosmic order. Our purpose is to discover and fulfill that predefined role.

*   **Abrahamic Religions (Christianity, Islam, Judaism):** The meaning of life is to know, love, and serve God. Life on Earth is a test or a journey, with the ultimate goal being salvation, paradise, or unity with God. The purpose is found in devotion, righteous action, and adherence to divine commandments.
*   **Eastern Religions (Hinduism, Budd

In [None]:
"""
You are a chemical kinetics expert analyzing the kinetic data.
    Your task is to develop a set of features to classify the reaction into one of the following four reaction classes: M6, M8, M11, M14.
    **All of your reasoning, hypotheses, and feature ideas must be grounded in the principles of chemical kinetics and the provided reaction mechanisms.**
    
    You should respond with the following format:
    iteration: <iteration_number>/100
    <reflection>
    For example,
    - observations for the previous trial.
    - whether the previous aim was achieved.
    </reflection>
    <hypothesis>
    For example,
    - M6 and M8 could be distinguished by ~. ~ could be known by comparing ~.
    - ...
    </hypothesis>
    <features>
    The features that you want to try in this iteration.
    - M6 has this kind of features.
    - M8 has this kind of features.
    - M11 and M14 could be distinguished by ~.
    - ...
    </features>

    ```python
    <write your code here>
    ```

    Your mission is to find the best possible set of features to classify the reactions. To do this systematically and avoid errors, you must follow this multi-stage workflow:

    **Stage 1: Baseline Analysis**
    1.  **Load Dev Data**: Use `get_dev_data_and_y()` to get the development dataset.
    2.  **Generate Existing Features**: Use the provided tools (like `rpka_analysis.analyze_rpka_features` and `vtna_analysis.get_vtna_fingerprint`) to generate a set of baseline features.
    3.  **Baseline Evaluation**: Create a feature matrix from these baseline features and evaluate it using `evaluate_features_m6_m8_m11_m14` on the **evaluation set**. This will give you a baseline accuracy and show you which of the existing features are most important. This is your starting point. To do this, you must first generate features for the dev set, then generate features for the eval set, and pass the eval features to the evaluation function.

    **Stage 2: Iterative Feature Engineering**
    This is a cycle. Repeat these steps to improve your feature set.

    1.  **Hypothesize**: Based on the baseline results and your chemical knowledge, form a hypothesis for a new feature.
        -   **Idea 1: Combine existing features.** Can you create a more powerful feature by combining existing ones (e.g., `feature_A / feature_B`, `feature_A * feature_C`)?
        -   **Idea 2: Engineer new features from scratch.** Look at the raw data directly. Print out the `time_data` and `product_data` for a few samples from different classes. Can you see a shape or a pattern that your current features don't capture?
        -   **Idea 3: Try new techniques.** The provided tools are a good start, but they don't include everything. You could try:
            -   *Curve Fitting*: Fit a mathematical model (e.g., exponential, sigmoid) to the concentration profiles and use the fitted parameters as features.
            -   *Data Preprocessing*: Apply smoothing (e.g., a moving average) to the raw data before calculating rates to reduce noise.

    2.  **Implement**: Write the Python code to calculate your new feature(s) for all samples in the development set.

    3.  **Sanity Check**: Before running a full evaluation, you MUST check your new feature. For each class, calculate and print basic statistics (min, max, mean, std) of your new feature using the development data. 
        -   **Error Checking**: Does the `max` value look absurdly large? Is the `std` zero? This can help you catch bugs in your calculation.
        -   **Separation Power**: Is there a visible difference in the `mean` value between the classes? This gives you a hint if the feature is likely to be useful.

    4.  **Evaluate**: If the feature looks reasonable, add it to your feature matrix and test its impact. Don't feel constrained to a small number of features: given the size of the dataset (~5000 samples per class), exploring feature sets with 20, 30, or more potentially useful features is a good strategy. Random Forest tolerates many features reasonably well, and scoring on the separate evaluation set should expose any overfitting to the development data. After adding your new feature(s), regenerate the full feature matrix for the evaluation set and pass it to `evaluate_features_m6_m8_m11_m14`. Did the accuracy improve?

    **Stage 3: Final Report**
    Once you have reached the target accuracy of 0.85 or are satisfied with your feature set, generate a final report detailing your best features, the logic behind them, and the final accuracy achieved.

    **Important Rules:**
    -   **Data Separation**: Always perform your feature development and sanity checks on the **development set** (`get_dev_data_and_y`). Use the **evaluation set** only for scoring with `evaluate_features_m6_m8_m11_m14`, never for designing or checking features.
    -   **Performance Constraint**: Your feature generation code must be efficient. The entire feature extraction process for all samples in a dataset (either dev or eval) should complete in **under 20 minutes**. Be mindful of this when implementing computationally expensive methods like curve fitting. If a method is too slow, consider applying it to a subset of the data or finding a faster alternative.

    Features may be numerical values, boolean values, or string labels.
    Results from the previous trial suggest that parameters obtained by fitting the kinetic data may be among the most important features.

    Think about what features would be useful to distinguish between the mechanisms. Look closely at the differences in the reaction mechanisms to inspire new features. For example, M11 and M14 involve catalyst deactivation, which might be observable in the kinetic profiles as a decrease in reaction rate over time that is not due to substrate depletion.

    YOU MUST NOT USE ANY PLOT TOOLS. YOU CANNOT ACCESS THE IMAGE FILES.

    Iterate until you achieve an accuracy of 0.85 or the iteration number reaches 100.
    Finally, you must output a report that explains your hypotheses, features, and the classification accuracy achieved by the Random Forest.
    Please end your final response with "<<DONE>>".

    You MUST NOT generate fake evaluation results at any point in the process.
    Your thinking process is monitored, and if you generate fake evaluation results, you will be punished.

    # Packages that you can use
    - sklearn
    - numpy
    - scipy
    - math

    # Possible reaction classes
    - M6 Mechanism: cat<=>cat*;k1,0|S+cat*<=>cat*S;k1,k-1|cat*S<=>P+cat*;k2,k-2
    - M8 Mechanism: S+cat*<=>cat*S;k1,k-1|cat*S<=>P+cat*;k2,k-2|cat+L<=>cat*;k3,k-3
    ## Mechanisms with catalyst deactivation steps
    - M11 Mechanism: S+cat<=>catS;k1,k-1|catS<=>P+cat;k2,k-2|S+cat<=>inactive catS;k-3,0
    - M14 Mechanism: S+cat<=>catS;k1,k-1|catS<=>P+cat;k2,k-2|catS<=>inactive catS;k-3,0

    # Reaction data
    The kinetic data consists of four runs, which include three experiments with varying initial catalyst concentrations ([cat]₀ = 1–10 mol%) while keeping the substrate concentration constant, and one additional same-excess experiment with a reduced initial substrate concentration and added product. 
    If the reaction involves an inhibitor, its concentration is not recorded.
    {
        "run_1": {
            "initial_concentration_of_catalyst": 0.014,
            "time_data": [
                0.0,
                0.022,
                0.081,
                0.094,
                0.096,
                0.139,
                0.141,
                0.146,
                0.177,
                0.201,
                0.233,
                0.264,
                0.275,
                0.291,
                0.297,
                0.303,
                0.316,
                0.386,
                0.644,
                0.944,
                1.0
            ],
            "substrate_data": [
                1.0,
                0.989,
                0.89,
                0.861,
                0.857,
                0.76,
                0.755,
                0.744,
                0.674,
                0.624,
                0.561,
                0.506,
                0.488,
                0.462,
                0.453,
                0.446,
                0.427,
                0.339,
                0.149,
                0.058,
                0.049
            ],
            "product_data": [
                0.0,
                0.011,
                0.11,
                0.139,
                0.143,
                0.24,
                0.245,
                0.256,
                0.326,
                0.376,
                0.439,
                0.494,
                0.512,
                0.538,
                0.547,
                0.554,
                0.573,
                0.661,
                0.851,
                0.942,
                0.951
            ]
        },
"""
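The curve-fitting idea in the prompt's Stage 2 (Idea 3) can be sketched with `scipy.optimize.curve_fit`. The model used here, a first-order rise P(t) = P_inf * (1 - exp(-k*t)), is an assumption chosen for illustration, not a model the prompt prescribes; the fitted `P_inf`, `k`, and the fit's RMSE are candidate features, and a large RMSE flags profiles a single exponential cannot describe (e.g. an induction period). The data are the `run_1` time and product values from the prompt; `fit_first_order` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import curve_fit

# run_1 values copied from the prompt (time, [P]).
t = np.array([0.0, 0.022, 0.081, 0.094, 0.096, 0.139, 0.141, 0.146, 0.177, 0.201,
              0.233, 0.264, 0.275, 0.291, 0.297, 0.303, 0.316, 0.386, 0.644, 0.944, 1.0])
p = np.array([0.0, 0.011, 0.11, 0.139, 0.143, 0.24, 0.245, 0.256, 0.326, 0.376,
              0.439, 0.494, 0.512, 0.538, 0.547, 0.554, 0.573, 0.661, 0.851, 0.942, 0.951])


def first_order_rise(t, p_inf, k):
    """Assumed model: P(t) = p_inf * (1 - exp(-k * t))."""
    return p_inf * (1.0 - np.exp(-k * t))


def fit_first_order(t, p):
    """Return (p_inf, k, rmse), or NaNs if the fit fails."""
    try:
        # Non-negative bounds keep exp(-k t) from overflowing during the search.
        params, _ = curve_fit(first_order_rise, t, p, p0=(1.0, 1.0),
                              bounds=([0.0, 0.0], [np.inf, np.inf]))
    except (RuntimeError, ValueError):
        return np.nan, np.nan, np.nan
    residuals = first_order_rise(t, *params) - p
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return float(params[0]), float(params[1]), rmse


p_inf, k, rmse = fit_first_order(t, p)
print(f"P_inf={p_inf:.3f}  k={k:.3f}  rmse={rmse:.4f}")
```

Fitting every run of every sample multiplies the cost, so it is worth timing this on a slice of the dev set against the prompt's 20-minute budget.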
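For the preprocessing idea in the prompt (a moving average before computing rates), one possible sketch follows. Because the sampling times in `run_1` are irregular, it averages the time stamps with the same window and passes them to `np.gradient`, which accepts non-uniform coordinates. The window of 3 points is an arbitrary choice, and `moving_average` / `smoothed_rate` are hypothetical helpers.

```python
import numpy as np


def moving_average(y, window=3):
    """Simple moving average; 'valid' mode avoids zero-padding artifacts at the ends."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(y, dtype=float), kernel, mode="valid")


def smoothed_rate(t, conc, window=3):
    """Return (t_smooth, d[conc]/dt) after smoothing both time and concentration."""
    t_s = moving_average(t, window)
    c_s = moving_average(conc, window)
    # np.gradient handles non-uniform spacing when given the coordinate array.
    return t_s, np.gradient(c_s, t_s)


# First eight run_1 points from the prompt (time, [P]).
t = [0.0, 0.022, 0.081, 0.094, 0.096, 0.139, 0.141, 0.146]
p = [0.0, 0.011, 0.11, 0.139, 0.143, 0.24, 0.245, 0.256]
t_s, rate = smoothed_rate(t, p)
print(np.round(rate, 2))
```

The "valid" mode shortens each series by `window - 1` points, which keeps the ends honest at the cost of losing the very first and last time points.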
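The prompt's sanity check (per-class min, max, mean, and std of a new feature on the dev set) needs only NumPy. This sketch assumes the feature values and class labels arrive as equal-length 1-D arrays, since the notebook does not show what `get_dev_data_and_y()` returns; `feature_stats_by_class` and `print_feature_stats` are hypothetical helpers, and the toy values are made up.

```python
import numpy as np


def feature_stats_by_class(values, labels):
    """Return {class: (min, max, mean, std, n_non_finite)} for one feature."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for cls in np.unique(labels):
        v = values[labels == cls]
        finite = v[np.isfinite(v)]
        n_bad = int(v.size - finite.size)  # NaN/inf values often signal a bug
        if finite.size == 0:
            stats[cls] = (np.nan, np.nan, np.nan, np.nan, n_bad)
        else:
            stats[cls] = (finite.min(), finite.max(), finite.mean(), finite.std(), n_bad)
    return stats


def print_feature_stats(name, values, labels):
    print(f"feature: {name}")
    for cls, (lo, hi, mean, std, n_bad) in feature_stats_by_class(values, labels).items():
        flag = "  <-- zero std" if std == 0 else ""
        print(f"  {cls}: min={lo:.4g} max={hi:.4g} mean={mean:.4g} "
              f"std={std:.4g} non-finite={n_bad}{flag}")


# Toy values (made up, not real kinetic features).
print_feature_stats("toy_feature",
                    [0.1, 0.2, 0.3, 1.0, 1.0, np.nan],
                    ["M6", "M6", "M6", "M8", "M8", "M8"])
```

Counting non-finite values separately keeps a few NaNs from silently hiding inside the mean and standard deviation.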
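The prompt's hint that M11 and M14 may show a rate decrease not explained by substrate depletion suggests one simple probe: compare the apparent rate constant k_obs = rate / [S] late in the reaction with its early value. With a stable catalyst, k_obs stays constant under first-order kinetics and rises as [S] falls under saturation (Michaelis-Menten) kinetics; deactivation pushes it down, though product inhibition can also lower it. This is a heuristic sketch, not a validated feature: the conversion windows (10-30% and 60-80%) and the name `kobs_late_early_ratio` are assumptions, and the test curves are synthetic.

```python
import numpy as np


def kobs_late_early_ratio(t, s, early=(0.1, 0.3), late=(0.6, 0.8)):
    """Median k_obs = (-d[S]/dt) / [S] in a late conversion window over an early one.

    Windows are fractional conversions relative to the first substrate point.
    Returns NaN if either window contains no points.
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(s, dtype=float)
    rate = -np.gradient(s, t)  # consumption rate, positive while [S] falls
    conversion = 1.0 - s / s[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        k_obs = rate / s

    def window_median(lo, hi):
        mask = (conversion >= lo) & (conversion <= hi) & np.isfinite(k_obs)
        return np.median(k_obs[mask]) if mask.any() else np.nan

    return window_median(*late) / window_median(*early)


# Synthetic first-order decay (made-up k = 3) with a stable catalyst, and one
# whose activity decays as exp(-2 t): d[S]/dt = -3 exp(-2 t) [S].
t = np.linspace(0.0, 1.0, 201)
stable = np.exp(-3.0 * t)
deactivating = np.exp(-1.5 * (1.0 - np.exp(-2.0 * t)))
print(kobs_late_early_ratio(t, stable), kobs_late_early_ratio(t, deactivating))
```

On real, sparse data the finite-difference rates are noisy, so smoothing the profile first is likely to help.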
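The same-excess run described in the prompt supports the standard overlay test from reaction progress kinetic analysis: shift the same-excess run in time so that it starts where the standard run reaches the same substrate concentration, then measure how far the two curves diverge. Because product is added in this experiment, a mismatch is commonly read as a sign of catalyst deactivation rather than product inhibition. The sketch assumes both runs use the same catalyst loading and that [S] decreases monotonically; `same_excess_overlay_rmse` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np


def same_excess_overlay_rmse(t_std, s_std, t_se, s_se):
    """RMSE between a time-shifted same-excess run and the standard run.

    Assumes [S] decreases monotonically in the standard run so it can be
    inverted with np.interp (which needs increasing x values, hence [::-1]).
    """
    t_std, s_std = np.asarray(t_std, float), np.asarray(s_std, float)
    t_se, s_se = np.asarray(t_se, float), np.asarray(s_se, float)
    # Time at which the standard run reaches the same-excess starting [S].
    t_shift = np.interp(s_se[0], s_std[::-1], t_std[::-1])
    shifted = t_se + t_shift
    # Compare only where the shifted run lies inside the standard run's time range.
    inside = shifted <= t_std[-1]
    s_std_at = np.interp(shifted[inside], t_std, s_std)
    return float(np.sqrt(np.mean((s_se[inside] - s_std_at) ** 2)))


# Synthetic first-order data (made-up k = 3): a stable catalyst overlays closely.
t = np.linspace(0.0, 1.0, 101)
standard = np.exp(-3.0 * t)
same_excess = 0.6 * np.exp(-3.0 * t)
print(same_excess_overlay_rmse(t, standard, t, same_excess))
```

Normalizing the RMSE by the initial substrate concentration would make the value comparable across datasets with different scales.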