# `README.md`

# A Qualitative Reasoning Engine for Rumour Impacts on Investment Decisions

<!-- PROJECT SHIELDS -->
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Type Checking: mypy](https://img.shields.io/badge/type_checking-mypy-blue)](http://mypy-lang.org/)
[![Jupyter](https://img.shields.io/badge/Jupyter-%23F37626.svg?style=flat&logo=Jupyter&logoColor=white)](https://jupyter.org/)
[![arXiv](https://img.shields.io/badge/arXiv-2509.00588-b31b1b.svg)](https://arxiv.org/abs/2509.00588)
[![Year](https://img.shields.io/badge/Year-2025-purple)](https://github.com/chirindaopensource/rumours_complex_investment_decisions)
[![Discipline](https://img.shields.io/badge/Discipline-Computational%20Finance%20%26%20AI-blue)](https://github.com/chirindaopensource/rumours_complex_investment_decisions)
[![Research](https://img.shields.io/badge/Research-Qualitative%20Reasoning-green)](https://github.com/chirindaopensource/rumours_complex_investment_decisions)
[![Methodology](https://img.shields.io/badge/Methodology-Constraint%20Programming-orange)](https://github.com/chirindaopensource/rumours_complex_investment_decisions)
[![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=flat&logo=pandas&logoColor=white)](https://pandas.pydata.org/)
[![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=flat&logo=numpy&logoColor=white)](https://numpy.org/)
[![SciPy](https://img.shields.io/badge/SciPy-%23025596?style=flat&logo=scipy&logoColor=white)](https://scipy.org/)
[![NetworkX](https://img.shields.io/badge/NetworkX-2A628F.svg?style=flat)](https://networkx.org/)
[![Matplotlib](https://img.shields.io/badge/Matplotlib-11557c.svg?style=flat&logo=Matplotlib&logoColor=white)](https://matplotlib.org/)

---

**Repository:** `https://github.com/chirindaopensource/rumours_complex_investment_decisions`

**Owner:** 2025 Craig Chirinda (Open Source Projects)

This repository contains an **independent**, professional-grade Python implementation of the research methodology from the 2025 paper entitled **"Information-Nonintensive Models of Rumour Impacts on Complex Investment Decisions"** by:

*   Nina Bočková
*   Karel Doubravský
*   Barbora Volná
*   Mirko Dohnal

The project provides a complete, end-to-end computational framework that serves as a formal, auditable, and collaborative "reasoning engine" for exploring the complete set of possible future dynamics of a complex system. It delivers a modular and extensible pipeline that replicates the paper's entire workflow: from rigorous data validation and preprocessing, through the qualitative translation of system dynamics, to the formulation and solution of complex Constraint Satisfaction Problems (CSPs), and finally to the construction and analysis of a transitional graph that maps the entire state space of possible system behaviors.

## Table of Contents

- [Introduction](#introduction)
- [Theoretical Background](#theoretical-background)
- [Features](#features)
- [Methodology Implemented](#methodology-implemented)
- [Core Components (Notebook Structure)](#core-components-notebook-structure)
- [Key Callable: execute_full_project_pipeline](#key-callable-execute_full_project_pipeline)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Input Data Structure](#input-data-structure)
- [Usage](#usage)
- [Output Structure](#output-structure)
- [Project Structure](#project-structure)
- [Customization](#customization)
- [Contributing](#contributing)
- [Recommended Extensions](#recommended-extensions)
- [License](#license)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)

## Introduction

This project provides a Python implementation of the methodologies presented in the 2025 paper "Information-Nonintensive Models of Rumour Impacts on Complex Investment Decisions." The core of this repository is the Jupyter notebook `rumours_complex_investment_decisions_draft.ipynb`, which contains the functions needed to replicate the paper's findings, from initial data validation to the final generation of a strategic decision framework.

The paper addresses a critical challenge in finance and economics: how to model complex systems under severe information constraints where traditional quantitative models fail. This codebase operationalizes the paper's qualitative reasoning approach, allowing users to:
-   Rigorously validate and cleanse input financial data and correlation matrices.
-   Programmatically translate systems of differential equations (like a rumour-spreading model) into a parameter-free, qualitative format.
-   Formulate and solve complex Constraint Satisfaction Problems (CSPs) to derive all logically possible scenarios for a system's behavior.
-   Systematically resolve inconsistencies in data-driven models using a greedy heuristic algorithm.
-   Integrate multiple sub-models (e.g., a financial model and a social dynamics model) to study their emergent interactions.
-   Construct and analyze a "transitional graph" that represents the complete map of all possible dynamic pathways the system can follow over time.
-   Perform a multi-criteria decision analysis on the qualitative scenarios to derive actionable, strategic insights.

## Theoretical Background

The implemented methods are grounded in Artificial Intelligence (Qualitative Reasoning), Constraint Programming, and Financial Modeling.

**1. Qualitative State Representation (Trend Analysis):**
The core concept is the abstraction of a continuous variable's state into a qualitative "trend triplet": `(Value, DX, DDX)`. This captures the variable's sign (assumed positive, `+`), its first derivative (trend/velocity: `+`, `0`, `-`), and its second derivative (acceleration: `+`, `0`, `-`). A **scenario** is a complete snapshot of the system where every variable is assigned a trend triplet.
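
A minimal sketch of this representation, assuming a plain tuple encoding of the triplet (the notebook's own data structures may differ), enumerates the nine triplets a single variable can take and the unconstrained scenarios for two hypothetical variables `X` and `Y`:

```python
from itertools import product

SIGNS = ("+", "0", "-")


def trend_triplets():
    """All triplets (Value, DX, DDX) with the value fixed at '+'."""
    return [("+", dx, ddx) for dx, ddx in product(SIGNS, repeat=2)]


def all_scenarios(variables):
    """Yield every unconstrained scenario as a dict: variable -> triplet."""
    triplets = trend_triplets()
    for combo in product(triplets, repeat=len(variables)):
        yield dict(zip(variables, combo))


triplets = trend_triplets()
scenarios = list(all_scenarios(["X", "Y"]))  # "X" and "Y" are illustrative names

print(len(triplets))   # 9
print(len(scenarios))  # 81
print(scenarios[0])    # {'X': ('+', '+', '+'), 'Y': ('+', '+', '+')}
```

Because every variable contributes nine triplets, the unconstrained space for `n` variables has `9**n` scenarios, which is why the constraints matter: they are what prune this space down to a usable set.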

**2. Constraint Satisfaction Problem (CSP) Formulation:**
The dynamics of the system are not defined by numerical equations but by a set of logical constraints.
-   **Pairwise Influences:** Derived from data (e.g., correlation signs) or expert knowledge, these take the form of `SUP(X, Y)` (an increase in X supports an increase in Y) or `RED(X, Y)` (an increase in X reduces Y).
-   **Qualitative Algebraic Equations:** Derived from ODEs by eliminating parameters and applying qualitative arithmetic, e.g.:
    $$ \frac{dX}{dt} = -\alpha \frac{XY}{N} \quad \xrightarrow{\text{translation}} \quad DX + XY = 0 $$
The goal is to find all scenarios (assignments of triplets to variables) that simultaneously satisfy all constraints. This is a classic CSP.
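
The translation and the constraint search can be illustrated with a small brute-force sketch. It does not use the `python-constraint` solver the notebook depends on, and it rests on two labelled assumptions: `SUP(X, Y)` is read as "the trends of X and Y have the same sign" and `RED(X, Y)` as "the trends have opposite signs". The third variable `Z` is hypothetical. Since `X` and `Y` are positive, `XY` is `+`, so `DX + XY = 0` can only hold when `DX` is `-`.

```python
from itertools import product

SIGNS = ("+", "0", "-")


def q_mul(a, b):
    """Qualitative product of two signs."""
    if a == "0" or b == "0":
        return "0"
    return "+" if a == b else "-"


def q_add(a, b):
    """Qualitative sum: the set of signs the sum could take."""
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    if a == b:
        return {a}
    return set(SIGNS)  # opposite signs: the result is ambiguous


def sup(dx, dy):
    """Assumed reading of SUP(X, Y): both trends have the same sign."""
    return dx == dy


def red(dx, dy):
    """Assumed reading of RED(X, Y): opposite trends (or both constant)."""
    return q_mul(dx, dy) == "-" or (dx == "0" and dy == "0")


def equation_holds(dx, x="+", y="+"):
    """DX + XY = 0 holds if '0' is a possible value of the qualitative sum."""
    return "0" in q_add(dx, q_mul(x, y))


# Keep every trend assignment that satisfies all three constraints.
solutions = [
    {"DX": dx, "DY": dy, "DZ": dz}
    for dx, dy, dz in product(SIGNS, repeat=3)
    if equation_holds(dx) and red(dx, dy) and sup(dy, dz)
]
print(solutions)  # [{'DX': '-', 'DY': '+', 'DZ': '+'}]
```

Exhaustive enumeration is only practical for a handful of variables; a dedicated CSP solver prunes the search instead of visiting every combination.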

**3. Transitional Graph:**
The set of all valid scenarios forms the nodes of a directed graph, `H = (S, T)`. An edge exists from scenario `S_i` to `S_j` if and only if the system can transition from state `S_i` to `S_j` in one time step, according to a set of predefined rules (Table 2 of the paper) that enforce continuity and smoothness. This graph represents the complete dynamic landscape of the model.
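
The construction can be sketched with a deliberately simplified continuity rule. This is an assumption standing in for the paper's Table 2, not a transcription of it: between two consecutive scenarios, each of `DX` and `DDX` may stay the same or move to a neighbouring sign, but may not jump directly between `+` and `-`.

```python
from itertools import permutations

LEVEL = {"-": -1, "0": 0, "+": 1}


def sign_step_ok(a, b):
    """A sign may stay put or move to a neighbouring value, never jump + <-> -."""
    return abs(LEVEL[a] - LEVEL[b]) <= 1


def can_transition(s_i, s_j):
    """True if every variable's (DX, DDX) pair changes continuously."""
    return all(
        sign_step_ok(a, b)
        for var in s_i
        for a, b in zip(s_i[var][1:], s_j[var][1:])
    )


def build_transition_graph(scenarios):
    """Return adjacency lists: scenario index -> list of reachable indices."""
    graph = {i: [] for i in range(len(scenarios))}
    for i, j in permutations(range(len(scenarios)), 2):
        if can_transition(scenarios[i], scenarios[j]):
            graph[i].append(j)
    return graph


# Four made-up scenarios over one variable, ordered from growth to decline.
scenarios = [
    {"X": ("+", "+", "+")},
    {"X": ("+", "+", "0")},
    {"X": ("+", "0", "-")},
    {"X": ("+", "-", "-")},
]
graph = build_transition_graph(scenarios)
print(graph)  # {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Under this simplified rule the four scenarios form a chain: accelerating growth can only reach decline by passing through the intermediate states.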

## Features

The provided Jupyter notebook (`rumours_complex_investment_decisions_draft.ipynb`) implements the full research pipeline, including:

-   **Modular, Task-Based Architecture:** The entire pipeline is broken down into 29 distinct, modular tasks, from data validation to robustness analysis.
-   **Professional-Grade Data Validation:** A comprehensive validation suite ensures all inputs (data and configurations) conform to the required schema before execution.
-   **Auditable Data Preprocessing:** A non-destructive cleansing and preprocessing pipeline for both raw data and correlation matrices, returning a detailed log of all transformations.
-   **Custom Qualitative Reasoning Engine:** Implements a qualitative arithmetic engine and custom `Constraint` classes to translate algebraic qualitative equations into a solvable format.
-   **Systematic Inconsistency Resolution:** A faithful implementation of the paper's greedy heuristic algorithm to make over-constrained, data-driven models tractable.
-   **High-Fidelity Model Transcription:** Meticulous, programmatic transcription of all expert-defined models and rules from the paper's tables (Tables 2, 4, and 7) and equations (Eq. 11).
-   **Complete Graph Construction and Analysis:** Automated construction of the final transitional graph and a comprehensive analysis of its topological properties (partitions, cycles, etc.).
-   **Data-Driven Strategic Analysis:** A full suite of functions to perform multi-criteria scoring, ranking, and strategy evaluation on the final set of scenarios.
-   **Comprehensive Robustness Framework:** A master orchestrator to systematically test the sensitivity of the results to changes in parameters and alternative model specifications.

## Methodology Implemented

The core analytical steps directly implement the methodology from the paper:

1.  **Data and Configuration Validation (Tasks 1-3, 6):** Ingests and rigorously validates all raw data and the `config.yaml` file.
2.  **Data Preprocessing (Tasks 4-5):** Cleanses the raw data and preprocesses the correlation matrix for numerical stability.
3.  **CIM Construction (Tasks 7-9):** Derives an initial model from the correlation data, resolves inconsistencies, integrates expert knowledge, and solves for the 7 CIM scenarios.
4.  **RRM Construction (Tasks 10-12):** Translates the rumour-spreading ODEs into qualitative constraints and solves for the 211 RRM scenarios.
5.  **Model Integration (Tasks 13-14):** Merges the CIM and RRM, adds cross-model constraints, and solves for the final 14 integrated scenarios.
6.  **Dynamic Analysis (Tasks 15-18):** Analyzes the integrated scenarios, constructs the transitional graph, and validates its structure.
7.  **Decision Support and Reporting (Tasks 19-24):** Performs multi-criteria analysis, generates strategic recommendations, and creates final tables and visualizations.
8.  **Meta-Validation (Tasks 25-29):** Executes a final, exhaustive suite of correctness, reproducibility, and robustness checks on the entire pipeline and its results.

## Core Components (Notebook Structure)

The `rumours_complex_investment_decisions_draft.ipynb` notebook is structured as a logical pipeline with modular orchestrator functions for each of the major tasks. All functions are self-contained, fully documented, and designed for professional-grade execution.

## Key Callable: execute_full_project_pipeline

The central function in this project is `execute_full_project_pipeline`. It orchestrates the entire analytical workflow, providing a single entry point for running the baseline study replication and the advanced robustness checks.

```python
def execute_full_project_pipeline(
    raw_df: pd.DataFrame,
    correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Executes the entire end-to-end research pipeline and robustness analysis.
    """
    # ... (implementation is in the notebook)
```

## Prerequisites

-   Python 3.9+
-   Core dependencies: `pandas`, `numpy`, `scipy`, `networkx`, `matplotlib`, `pyyaml`, `python-constraint`.

## Installation

1.  **Clone the repository:**
    ```sh
    git clone https://github.com/chirindaopensource/rumours_complex_investment_decisions.git
    cd rumours_complex_investment_decisions
    ```

2.  **Create and activate a virtual environment (recommended):**
    ```sh
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    ```

3.  **Install Python dependencies:**
    ```sh
    pip install pandas numpy scipy networkx matplotlib pyyaml python-constraint
    ```

## Input Data Structure

The pipeline requires three inputs:
1.  **`raw_df`:** A `pandas.DataFrame` containing historical financial data for the 10 CIM variables.
2.  **`correlation_matrix_df`:** A 10x10 `pandas.DataFrame` with the Pearson correlation matrix for the CIM variables.
3.  **`master_input_specification`:** A Python dictionary loaded from the `config.yaml` file, which controls all aspects of the pipeline.

A mock data generation code block is provided in the main notebook to create valid example DataFrames for testing the pipeline.

## Usage

The `rumours_complex_investment_decisions_draft.ipynb` notebook provides a complete, step-by-step guide. The core workflow is:

1.  **Prepare Inputs:** Load or generate the `raw_df` and `correlation_matrix_df`. Load the configuration from `config.yaml`.
2.  **Execute Pipeline:** Call the top-level orchestrator function, `execute_full_project_pipeline`.

    ```python
    # This single call runs the entire project.
    final_project_report = execute_full_project_pipeline(
        raw_df=raw_df,
        correlation_matrix_df=correlation_matrix_df,
        master_input_specification=master_input_specification
    )
    ```
3.  **Inspect Outputs:** Programmatically access any result from the returned dictionary. For example, to view the final strategic decision framework:
    ```python
    decision_framework = final_project_report['baseline_pipeline_report']['task_reports']['task_21_strategy_report']['outputs']['decision_framework']
    print(decision_framework)
    ```
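
Deeply nested lookups like the one above raise `KeyError` as soon as one key is missing, for example when a task was skipped. A small helper, sketched here as a hypothetical convenience that is not part of the notebook, walks the path defensively. The `mock_report` below only mimics the nesting shown in the example; its contents are made up.

```python
from typing import Any, Dict, Sequence


def get_nested(report: Dict[str, Any], path: Sequence[str], default: Any = None) -> Any:
    """Walk `path` through nested dicts, returning `default` if a key is missing."""
    current: Any = report
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Stand-in report with the same nesting as the usage example (values are placeholders).
mock_report = {
    "baseline_pipeline_report": {
        "task_reports": {
            "task_21_strategy_report": {
                "outputs": {"decision_framework": "placeholder framework"}
            }
        }
    }
}

FRAMEWORK_PATH = (
    "baseline_pipeline_report",
    "task_reports",
    "task_21_strategy_report",
    "outputs",
    "decision_framework",
)

print(get_nested(mock_report, FRAMEWORK_PATH))
print(get_nested(mock_report, ("robustness_analysis_report",), default="not available"))
```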

## Output Structure

The `execute_full_project_pipeline` function returns a single, comprehensive dictionary containing all generated artifacts, including:
-   `baseline_pipeline_report`: A dictionary containing the detailed reports from every task in the main pipeline run, including the final scenario lists, the transitional graph object, and the strategic analysis.
-   `robustness_analysis_report`: A dictionary containing the summary DataFrames from each of the executed robustness and sensitivity checks.

## Project Structure

```
rumours_complex_investment_decisions/
│
├── rumours_complex_investment_decisions_draft.ipynb   # Main implementation notebook
├── config.yaml                                        # Master configuration file
├── requirements.txt                                   # Python package dependencies
├── LICENSE                                            # MIT license file
└── README.md                                          # This documentation file
```

## Customization

The pipeline is highly customizable via the `config.yaml` file. Users can easily modify all relevant parameters, such as CSP solver settings, inconsistency removal limits, and the definitions of expert knowledge, without altering the core Python code.

## Contributing

Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.

## Recommended Extensions

Future extensions could include:

-   **Automated Report Generation:** Creating a function that takes the dictionary returned by `execute_full_project_pipeline` and generates a full PDF or HTML report summarizing the findings.
-   **Generalization:** Refactoring the code to handle arbitrary model definitions (variables and constraints) specified entirely within the `config.yaml` file, turning it into a general-purpose qualitative reasoning engine.
-   **Alternative Solvers:** Integrating more powerful CSP solvers like Google's OR-Tools to handle larger and more complex models.
-   **Policy Experiments:** Adding new expert constraints to the integrated model to simulate the impact of policy interventions (e.g., a public information campaign to counter a rumour) and observing how the transitional graph changes.

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

## Citation

If you use this code or the methodology in your research, please cite the original paper:

```bibtex
@article{bockova2025information,
  title={{Information-Nonintensive Models of Rumour Impacts on Complex Investment Decisions}},
  author={Bo{\v{c}}kov{\'a}, Nina and Doubravsk{\'y}, Karel and Voln{\'a}, Barbora and Dohnal, Mirko},
  journal={arXiv preprint arXiv:2509.00588},
  year={2025}
}
```

For the implementation itself, you may cite this repository:
```
Chirinda, C. (2025). A Qualitative Reasoning Engine for Rumour Impacts on Investment Decisions.
GitHub repository: https://github.com/chirindaopensource/rumours_complex_investment_decisions
```

## Acknowledgments

-   Credit to **Nina Bočková, Karel Doubravský, Barbora Volná, and Mirko Dohnal** for their innovative research, which forms the entire basis for this computational replication.
-   This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including **Pandas, NumPy, SciPy, NetworkX, Matplotlib**, and the **python-constraint** library, whose work makes complex computational science accessible and robust.

---

*This README was generated based on the structure and content of `rumours_complex_investment_decisions_draft.ipynb` and follows best practices for research software documentation.*



# Paper

Title: "*Information-Nonintensive Models of Rumour Impacts on Complex Investment Decisions*"

Authors: Nina Bočková, Karel Doubravský, Barbora Volná, Mirko Dohnal

E-Journal Submission Date: 30 August 2025

Link: https://arxiv.org/abs/2509.00588

Abstract:

This paper develops a qualitative framework for analysing the impact of rumours on complex investment decisions (CID) under severe information constraints. The proposed trend-based models rely on minimal data inputs in the form of increasing, decreasing, or constant relations. Sets of trend rules generate scenarios, and permitted transitions between them form a directed graph that represents system behaviour over time. The approach is applied in three interconnected models: financial CID, rumour-spreading dynamics, and their integration.


# Summary


### **Summary of "Information-Nonintensive Models of Rumour Impacts on Complex Investment Decisions"**

#### **High-Level Objective**

The central goal of this paper is to develop a formal, yet non-numerical, framework for analyzing the behavior of Complex Investment Decisions (CID) when influenced by the spread of rumours. The authors argue that in many real-world scenarios, precise quantitative data is unavailable ("severe information constraints"), rendering traditional econometric and financial models ineffective. Their proposed solution is a "trend-based" modeling approach that relies only on qualitative relationships (e.g., "increasing," "decreasing") to generate a complete set of possible future scenarios.

#### **The Methodological Core: Trend-Based Reasoning**

The authors build their framework on the principles of Qualitative Reasoning, a branch of AI. The methodology can be broken down as follows:

1.  **Trend Quantifiers:** Instead of continuous numerical values, variables are described by their qualitative state. The paper focuses on the first and second time derivatives of variables. A variable's state is captured by a triplet `(Value, D, DD)`, where `D` is the first derivative (trend) and `DD` is the second derivative (acceleration of the trend). These are discretized into a three-valued logic:
    *   `+` (Positive / Increasing)
    *   `0` (Zero / Constant)
    *   `-` (Negative / Decreasing)
    *   For example, the triplet `(+, +, -)` for a variable like "Return on Assets" (ROA) would mean ROA is positive, increasing, but at a slowing rate (decelerating growth).

2.  **Pairwise Relations:** The system's dynamics are not defined by differential equations with numerical parameters, but by a set of qualitative influence rules between pairs of variables (X, Y). The primary relations are:
    *   `SUP XY`: An increase in X has a *supporting* (positive) effect on Y.
    *   `RED XY`: An increase in X has a *reducing* (negative) effect on Y.
    *   These can be refined using second-order information (the `σ` shapes in Figure 1), which describe concavity/convexity.

3.  **Scenarios:** A "scenario" is a snapshot of the entire system at a moment in time. It is defined as a vector of trend triplets for all `n` variables in the model. For example, `{(X1,+,+,-), (X2,+,-,-), ..., (Xn,+,0,0)}`. The model's solution is the complete set `S` of all scenarios that are consistent with the defined pairwise relations.

4.  **Transitional Graph:** The model also defines allowable transitions between these scenarios based on continuity principles (e.g., a variable cannot jump from "increasing" to "decreasing" without passing through "constant"). These permitted transitions form the edges `T` of a directed graph `H = (S, T)`, where the scenarios are the nodes. This graph represents all possible temporal evolutions (histories and futures) of the system.

#### **The Case Study Application**

The authors apply this framework to a three-part case study:

1.  **Complex Investment Submodel (CIM):** They start with a set of 10 financial variables (e.g., Underpricing, Return on Assets, Market Capitalization). The pairwise relations are derived from a correlation matrix taken from a previous study (reference [46] in the paper), with an ad-hoc heuristic to remove inconsistencies: the relation with the smallest absolute correlation coefficient is removed iteratively until the model is consistent. This model yields 7 possible scenarios.

2.  **Rumour-Related Submodel (RRM):** They take a standard 5-variable system of nonlinear ordinary differential equations (ODEs) describing rumour dynamics (involving Ignorants, Spreaders, Sceptics, etc.). They "translate" these ODEs into a set of qualitative trend equations, effectively abstracting away all numerical parameters. This purely qualitative model generates a large set of 211 scenarios.

3.  **Integrated Model (IM):** They combine the variables and rules from the CIM and RRM and add three new "common-sense" qualitative rules to link the financial and rumour variables (e.g., `RED W REP` - an increase in Sceptics (`W`) reduces the Number of IPOs Underwritten (`REP`)). This final integrated model produces 14 distinct scenarios.
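
The inconsistency heuristic described for the CIM can be sketched as follows. The consistency criterion is an assumption made for illustration: a set of relations counts as consistent if some assignment of non-zero trends satisfies all of them, with `SUP` read as "same trend sign" and `RED` as "opposite trend sign". The variables `A`, `B`, `C` and their correlations are invented.

```python
from itertools import product


def relations_from_correlations(corr):
    """Positive correlation -> SUP, negative -> RED; zero correlations are skipped."""
    return [
        ("SUP" if c > 0 else "RED", x, y, c)
        for (x, y), c in corr.items()
        if c != 0
    ]


def is_consistent(variables, relations):
    """Assumed criterion: some assignment of non-zero trends satisfies every relation."""
    for signs in product((1, -1), repeat=len(variables)):
        trend = dict(zip(variables, signs))
        if all(
            (trend[x] == trend[y]) == (kind == "SUP")
            for kind, x, y, _ in relations
        ):
            return True
    return False


def resolve_inconsistencies(variables, relations):
    """Drop the weakest relation (smallest |c|) until the rest are consistent."""
    kept = sorted(relations, key=lambda r: abs(r[3]), reverse=True)
    removed = []
    while kept and not is_consistent(variables, kept):
        removed.append(kept.pop())  # last element has the smallest |c|
    return kept, removed


# A and B move together, B and C move together, yet A and C are weakly opposed.
corr = {("A", "B"): 0.8, ("B", "C"): 0.6, ("A", "C"): -0.2}
kept, removed = resolve_inconsistencies(
    ["A", "B", "C"], relations_from_correlations(corr)
)
print(kept)     # [('SUP', 'A', 'B', 0.8), ('SUP', 'B', 'C', 0.6)]
print(removed)  # [('RED', 'A', 'C', -0.2)]
```

The brute-force consistency check visits `2**n` sign assignments, so it only suits small models; it is used here to keep the example short.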

#### **Key Findings and Interpretation**

The main output of the case study is the transitional graph for the 14 scenarios of the Integrated Model (Figure 2). The authors use this graph to perform a qualitative trade-off analysis.

*   They identify that no single scenario is optimal for all desired outcomes. For instance, scenarios that are best for the `REP` variable (steep, accelerating growth `(+,+,+)`) are simultaneously the worst for `ROA` and `UND` (steep, accelerating decline `(+,-,-)`).
*   The transitional graph is partitioned into two disconnected subgraphs. This implies that if the system is in one set of states (e.g., scenarios 6-14, characterized by gradual changes), it can never transition to the other set (scenarios 1-5, characterized by steep changes), and vice versa. This provides a powerful qualitative insight into the system's long-term behavior.
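
Such a partition can be detected by computing the weakly connected components of the graph. The sketch below uses a plain adjacency dictionary and breadth-first search; the toy graph is invented and does not reproduce the edges of the paper's Figure 2.

```python
from collections import deque


def weak_components(graph):
    """Group the nodes of a directed graph into weakly connected components."""
    # Ignore edge direction: store each edge in both directions.
    undirected = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            undirected[src].add(dst)
            undirected.setdefault(dst, set()).add(src)

    seen, components = set(), []
    for start in undirected:
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in undirected[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(sorted(component))
    return components


# Invented example: two groups of scenarios with no edges between them.
toy_graph = {1: [2], 2: [3], 3: [1], 4: [5], 5: [4]}
print(weak_components(toy_graph))  # [[1, 2, 3], [4, 5]]
```

More than one component means that no sequence of transitions leads from a scenario in one group to a scenario in another.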


### **Critical Analysis**

The paper is a thought-provoking exercise in applying a non-standard methodology to a complex problem. However, it has significant strengths and equally significant weaknesses.

#### **Strengths:**

1.  **Formalizing Vague Knowledge:** The primary contribution is providing a structured, formal language for reasoning with imprecise, subjective, or "common-sense" knowledge. This is a genuine challenge in finance and economics, where expert intuition often plays a role that is difficult to capture in standard quantitative models.
2.  **Explanatory Power and Transparency:** Unlike black-box machine learning models, the trend-based approach is transparent. The output—a graph of possible scenarios—is highly interpretable and can serve as a powerful tool for strategic discussion among decision-makers. It answers "what-if" questions in a qualitative manner.
3.  **No Parameter Estimation Required:** The model sidesteps the notoriously difficult problem of parameter estimation in complex, non-stationary systems. By design, it is "information-nonintensive."

#### **Weaknesses and Critical Questions:**

1.  **Scalability (A Computer Science Concern):** The authors acknowledge that finding the solution is a "combinatorial task." The state space grows exponentially with the number of variables: with the value fixed at `+`, each variable has `3^2 = 9` possible triplets, so `n` variables give `9^n = 3^(2n)` candidate scenarios before constraints are applied. The 5-variable RRM already yielded 211 scenarios. A realistic financial model with dozens of variables would be computationally intractable. The paper offers no discussion of algorithms or heuristics to manage this complexity.
2.  **Drastic Loss of Information (A Finance & Econometrics Concern):** The central premise is also the model's greatest weakness. In finance, *magnitudes matter*. A 1% market dip and a 30% crash are both qualitatively "decreasing" (`-`), but their implications are worlds apart. The model cannot distinguish between them. It has no concept of volatility, risk, probability, or stochasticity—the very foundations of modern finance. By abstracting away all numbers, it throws out the baby, the bathwater, and the entire bathroom.
3.  **Lack of Falsifiability (An Econometric Concern):** The model produces a *superset* of all possible behaviors. This makes it extremely difficult to validate or falsify. If almost any outcome can be mapped to one of the many predicted scenarios, the model has very little predictive power in a scientific sense. A model that predicts everything predicts nothing. How would one statistically test the validity of this model against real-world data? The paper does not address this.
4.  **Ad-Hoc Nature of Model Construction:** The heuristic for resolving inconsistencies in the CIM (removing the smallest correlation) is arbitrary and lacks theoretical justification. Furthermore, the "translation" of the RRM's ODEs into qualitative constraints is a lossy process whose validity is not rigorously established. The addition of "common-sense" rules is subjective and depends entirely on the modeler's judgment.

### **Conclusion**

This paper presents a fascinating conceptual framework for exploring complex systems under extreme data scarcity. It is best viewed not as a predictive tool for financial markets, but as a **structured reasoning and scenario-planning tool**. It could help decision-makers map out the possible consequences of qualitative events like rumour propagation and understand the fundamental trade-offs in the system's dynamics.

However, its practical applicability is severely limited by its inability to handle magnitudes, risk, and probability. It is a complement to, not a substitute for, traditional modeling. It may be prudent for the authors to explore hybrid approaches that might integrate this qualitative framework with probabilistic or fuzzy-logic-based methods to reclaim some of the critical information that is lost in this purely deterministic, qualitative world.

# Import Essential Modules

In [None]:
#!/usr/bin/env python3
# =============================================================================#
#
#  Qualitative Reasoning Engine for Rumour Impacts on Investment Decisions
#
#  This module provides a complete, production-grade implementation of the
#  qualitative reasoning framework presented in "Information-Nonintensive Models
#  of Rumour Impacts on Complex Investment Decisions" by Bočková et al. (2025).
#  It delivers a formal, auditable, and collaborative "reasoning engine" for
#  exploring the complete set of possible future dynamics of a complex system,
#  particularly when that system is governed by subjective expertise, ambiguous
#  relationships, and a profound lack of reliable quantitative data.
#
#  Core Methodological Components:
#  • Qualitative abstraction of variables into trend triplets (Value, DX, DDX).
#  • Constraint Satisfaction Problem (CSP) formulation for scenario generation.
#  • Heuristic algorithm for resolving inconsistencies in data-driven constraints.
#  • Integration of financial (CIM) and social (RRM) dynamic models.
#  • High-fidelity transcription of expert knowledge and common-sense rules.
#  • Construction of a transitional graph representing all valid system evolutions.
#  • Multi-criteria decision analysis on the qualitative scenario space.
#
#  Technical Implementation Features:
#  • Modular, multi-stage pipeline with task-specific orchestrators.
#  • Custom constraint classes for qualitative algebra within a CSP solver.
#  • Comprehensive, end-to-end validation and self-assessment framework.
#  • High-fidelity replication of paper's tables and figures.
#  • Robust data validation, cleansing, and preprocessing pipelines.
#  • Framework for sensitivity and alternative specification analysis.
#
#  Paper Reference:
#  Bočková, N., Doubravský, K., Volná, B., & Dohnal, M. (2025).
#  Information-Nonintensive Models of Rumour Impacts on Complex Investment Decisions.
#  arXiv preprint arXiv:2509.00588.
#  https://arxiv.org/abs/2509.00588
#
#  Author: CS Chirinda
#  License: MIT
#  Version: 1.0.0
#
# =============================================================================#

# Standard Library Imports
import copy
import itertools
import math
import random
import time
import warnings
from collections import Counter
from typing import (
    Any, Dict, List, Optional, Set, Tuple, Callable
)

# Third-Party Library Imports
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
from constraint import Problem, Constraint, AllDifferentConstraint
from pandas.io.formats.style import Styler
from scipy import stats


# Implementation

## Draft 1

### **Analysis of Inputs, Processes and Outputs (IPO) of Key Orchestrator Callables in the Pipeline**

#### **1. `validate_raw_dataframe_and_schema` (Task 1 Orchestrator)**

*   **Inputs:**
    *   `raw_df`: A `pandas.DataFrame` containing the initial, unprocessed financial data for the 10 CIM variables.
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **Structural Validation:** Verifies that `raw_df` has the correct dimensions (at least 30 rows, exactly 10 columns), the exact required column names, and that all columns are numeric.
    2.  **Domain Validation:** Checks each cell of the `raw_df` to ensure its value complies with the mathematical domain of its respective variable (e.g., `AGE` must be strictly positive, `LIS` must be a natural number).
    3.  **Statistical Assessment:** Computes key statistical properties of the data, including a check for missing values, identification of outliers using the IQR method, and calculation of skewness and kurtosis.
*   **Outputs:**
    *   A dictionary (`final_report`) that serves as a comprehensive audit of the raw data's quality. It contains boolean flags for the success of each check, detailed error messages, a list of row indices that violate domain constraints, and a summary DataFrame of statistical metrics.
*   **Data Transformation:** This function is purely for validation and assessment. It **does not transform** the input `raw_df`. It reads the data and produces a new data structure—the report dictionary—that describes the state of the input.
*   **Methodological Role:** This callable implements the foundational data quality assurance step. Before any modeling can begin, the integrity of the input data must be verified. This function ensures that the data conforms to the basic structural and domain-specific assumptions required by all subsequent stages of the research pipeline. It is the first gate in ensuring a valid and reproducible analysis.
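
The three kinds of checks can be sketched without any third-party dependency. This is an illustrative stand-in, not the notebook's implementation: rows are plain dictionaries instead of a `pandas.DataFrame`, only the `AGE` and `LIS` rules mentioned above are encoded, and treating natural numbers as starting at 1 is an assumption. The function names are hypothetical.

```python
import statistics

# Domain rules for two of the ten variables (LIS >= 1 is an assumed convention).
DOMAIN_RULES = {
    "AGE": lambda v: v > 0,
    "LIS": lambda v: v >= 1 and float(v).is_integer(),
}


def structural_errors(rows, required_columns, min_rows=30):
    """Check row count, exact column set, and numeric cells."""
    errors = []
    if len(rows) < min_rows:
        errors.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if set(row) != set(required_columns):
            errors.append(f"row {i}: unexpected column set")
        elif not all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in row.values()
        ):
            errors.append(f"row {i}: non-numeric value")
    return errors


def domain_violations(rows, rules):
    """Return the indices of rows that break at least one domain rule."""
    return [
        i for i, row in enumerate(rows)
        if not all(rule(row[col]) for col, rule in rules.items())
    ]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


rows = [
    {"AGE": 12.0, "LIS": 3},
    {"AGE": 7.5, "LIS": 1},
    {"AGE": -2.0, "LIS": 2},   # violates AGE > 0
    {"AGE": 9.0, "LIS": 2.5},  # violates the natural-number rule for LIS
]
print(structural_errors(rows, ["AGE", "LIS"], min_rows=3))      # []
print(domain_violations(rows, DOMAIN_RULES))                     # [2, 3]
print(iqr_outliers([10, 11, 12, 13, 14, 15, 16, 17, 18, 100]))   # [100]
```

Like the orchestrator described above, none of these functions modify their input; they only report on it.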

#### **2. `validate_correlation_matrix_and_integrity` (Task 2 Orchestrator)**

*   **Inputs:**
    *   `correlation_matrix_df`: A `pandas.DataFrame` representing the correlation matrix `C`, as defined in Equation (7):
        $$ C = \begin{pmatrix} c_{11} & \cdots & c_{1n} \\ \vdots & \ddots & \vdots \\ c_{n1} & \cdots & c_{nn} \end{pmatrix} $$
    *   `raw_df`: The raw financial data, used for empirical cross-validation.
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **Structural Validation:** Verifies that the input `correlation_matrix_df` possesses the required mathematical properties of a correlation matrix: it must be square (10x10), symmetric ($C = C^T$), and have a unit diagonal ($c_{ii} = 1$).
    2.  **Coefficient Validation:** Checks that all coefficients are within the valid range, $c_{ij} \in [-1, 1]$, and, critically, that the matrix is positive semi-definite by ensuring all its eigenvalues $\lambda_i$ are non-negative.
    3.  **Cross-Validation:** Computes an empirical correlation matrix from `raw_df` and compares it to the provided matrix to check for consistency. It also assesses the statistical significance of the provided correlations by computing their t-statistics, $t_{ij} = c_{ij}\sqrt{\frac{n-2}{1-c_{ij}^2}}$, where $n$ is the sample size.
*   **Outputs:**
    *   A `final_report` dictionary containing a detailed audit of the matrix's integrity, including the results of all structural and mathematical checks, and the outputs of the cross-validation (e.g., the empirical matrix, difference matrix, and p-value matrix).
*   **Data Transformation:** This is a validation function. It **does not transform** the input `correlation_matrix_df`. It produces a new report dictionary that characterizes the input.
*   **Methodological Role:** This callable ensures the integrity of the primary numerical input for the CIM model. The entire qualitative structure of the CIM is derived from the signs of the correlations in this matrix. Therefore, validating its mathematical correctness and consistency with the underlying raw data is a non-negotiable step for ensuring the fidelity of the model.
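
As a rough illustration of these checks, the sketch below tests the structural properties of a candidate matrix and evaluates the t-statistic formula with NumPy. The function names and the tolerance `tol` are assumptions made for this example, not the project's actual API.

```python
import numpy as np


def check_correlation_matrix(C, tol: float = 1e-8) -> dict:
    """Check the basic mathematical properties of a correlation matrix."""
    C = np.asarray(C, dtype=float)
    report = {"square": C.ndim == 2 and C.shape[0] == C.shape[1]}
    if not report["square"]:
        return report
    report["symmetric"] = bool(np.allclose(C, C.T, atol=tol))
    report["unit_diagonal"] = bool(np.allclose(np.diag(C), 1.0, atol=tol))
    report["in_range"] = bool(np.all(np.abs(C) <= 1.0 + tol))
    # eigvalsh expects a symmetric matrix, so symmetrize before decomposing.
    eigenvalues = np.linalg.eigvalsh((C + C.T) / 2.0)
    report["positive_semidefinite"] = bool(eigenvalues.min() >= -tol)
    return report


def t_statistics(C, n: int) -> np.ndarray:
    """Evaluate t_ij = c_ij * sqrt((n - 2) / (1 - c_ij^2)) off the diagonal."""
    C = np.asarray(C, dtype=float)
    t = np.full(C.shape, np.nan)  # the diagonal is left undefined
    off_diagonal = ~np.eye(C.shape[0], dtype=bool)
    with np.errstate(divide="ignore"):
        t[off_diagonal] = C[off_diagonal] * np.sqrt(
            (n - 2) / (1.0 - C[off_diagonal] ** 2)
        )
    return t


valid = np.array([[1.0, 0.5], [0.5, 1.0]])
# Pairwise signs that cannot hold together: this matrix has a negative eigenvalue.
invalid = np.array([[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]])

valid_report = check_correlation_matrix(valid)
invalid_report = check_correlation_matrix(invalid)
t_matrix = t_statistics(valid, n=30)
```

The second matrix passes the symmetry, diagonal, and range checks yet fails the eigenvalue test, which is why the positive semi-definiteness check is listed separately.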

#### **3. `cleanse_and_standardize_raw_data` (Task 4 Orchestrator)**

*   **Inputs:**
    *   `raw_df`: The raw financial data DataFrame.
    *   `master_input_specification`: The global configuration dictionary, from which all cleansing rules and parameters are derived.
*   **Processes:**
    1.  **Anomaly Treatment:** Performs listwise deletion of rows with missing values, removes exact duplicate rows, and applies a numerical floor to enforce positivity on specified variables.
    2.  **Type Standardization:** Converts columns representing discrete counts (`LIS`, `REP`) to integer data types and ensures all other continuous variables are standardized to `float64` with a fixed precision.
    3.  **Domain Filtering:** Filters the dataset, removing rows where variables fall outside of plausible, expert-defined economic bounds (e.g., $0.1 \leq \text{AGE} \leq 100$).
*   **Outputs:**
    *   A tuple containing:
        1.  `final_df`: A new, cleansed, and standardized `pandas.DataFrame`.
        2.  `final_report`: A dictionary providing a complete audit trail of the cleansing process, detailing the number of rows removed at each stage.
*   **Data Transformation:** This function performs a significant transformation. It takes the raw, potentially messy `raw_df` and transforms it into a clean, validated, and numerically stable `final_df` that is ready for analysis. The number of rows is typically reduced, and the data types and precision of the columns are altered.
*   **Methodological Role:** This callable implements the essential data preprocessing stage. It ensures that the data used for any empirical calculations (like the cross-validation in Task 2) is of high quality, free from common issues like missing values or duplicates, and conforms to basic economic plausibility, thereby strengthening the validity of the entire study.
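
The three cleansing stages could be sketched as follows. The function `cleanse`, its parameters, and the default floor and precision values are hypothetical; in the project these rules come from `master_input_specification`.

```python
import numpy as np
import pandas as pd


def cleanse(df, positive_cols, integer_cols, bounds, floor=1e-10, decimals=6):
    """Drop bad rows, standardize dtypes, and filter by plausibility bounds."""
    report = {"initial_rows": len(df)}

    # Stage 1: listwise deletion, duplicate removal, positivity floor.
    out = df.dropna()
    report["dropped_missing"] = report["initial_rows"] - len(out)
    rows_before = len(out)
    out = out.drop_duplicates().copy()
    report["dropped_duplicates"] = rows_before - len(out)
    for col in positive_cols:
        out[col] = out[col].clip(lower=floor)

    # Stage 2: integer counts and fixed-precision floats.
    for col in integer_cols:
        out[col] = out[col].round().astype("int64")
    float_cols = [c for c in out.columns if c not in integer_cols]
    if float_cols:
        out[float_cols] = out[float_cols].astype("float64").round(decimals)

    # Stage 3: keep only rows inside every (low, high) bound, inclusive.
    rows_before = len(out)
    keep = pd.Series(True, index=out.index)
    for col, (low, high) in bounds.items():
        keep &= out[col].between(low, high)
    out = out[keep].reset_index(drop=True)
    report["dropped_out_of_bounds"] = rows_before - len(out)
    report["final_rows"] = len(out)
    return out, report


messy = pd.DataFrame({
    "AGE": [5.0, np.nan, 5.0, 150.0, 20.0],
    "LIS": [100.0, 200.0, 100.0, 300.0, 400.0],
})
clean_df, cleansing_report = cleanse(
    messy, positive_cols=["AGE"], integer_cols=["LIS"], bounds={"AGE": (0.1, 100.0)}
)
print(cleansing_report)
```

Recording the row count after each stage gives the kind of audit trail that the `final_report` is described as providing.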

#### **4. `iteratively_remove_inconsistencies` (Task 8 Orchestrator)**

*   **Inputs:**
    *   `initial_correlation_matrix_df`: The preprocessed but logically inconsistent correlation matrix.
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **Iterative Loop:** The function enters a loop that continues until a consistent model is found or a limit is reached.
    2.  **Greedy Heuristic:** Inside the loop, it executes the core heuristic described in Section 3 of the paper:
        > *Remove the correlation coefficient $c_{ij}$ with the smallest absolute value from the correlation matrix (7). Then test the trend model derived from the updated correlation matrix. If the resulting trend solution is still the steady-state scenario (5), repeat this heuristic.*
    3.  **Re-solving:** At each step, it re-formulates and re-solves the CIM's CSP to check for the emergence of non-trivial solutions.
*   **Outputs:**
    *   A `final_report` dictionary containing the final, logically consistent correlation matrix, the corresponding list of qualitative constraints, the set of non-trivial scenarios that caused the algorithm to terminate, and a detailed log of each iteration.
*   **Data Transformation:** This function transforms the `initial_correlation_matrix_df` by iteratively setting certain elements to zero, effectively pruning the corresponding constraints from the model until it becomes solvable.
*   **Methodological Role:** This callable is the implementation of the paper's novel solution to the problem of over-constrained, data-driven qualitative models. It is the core algorithm that makes the CIM tractable by relaxing the weakest constraints until a logically consistent set of dynamics can be derived.
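
A compact sketch of the greedy heuristic is given below. The solver is abstracted behind a caller-supplied predicate `is_trivial`, which is an assumption of this example: in the project this role is played by re-solving the CIM's CSP, whereas the toy predicate used here is only a stand-in so that the loop can run.

```python
import numpy as np


def prune_weakest_correlations(C, is_trivial, max_iterations=None):
    """Zero out the weakest off-diagonal coefficient until the model is non-trivial."""
    C = np.array(C, dtype=float)  # work on a copy of the input
    n = C.shape[0]
    limit = n * (n - 1) // 2 if max_iterations is None else max_iterations
    removed = []
    for _ in range(limit):
        if not is_trivial(C):
            break
        # Consider each pair once; zeros are already-removed (or masked) entries.
        candidates = np.triu(np.abs(C), k=1)
        candidates[candidates == 0.0] = np.inf
        if not np.isfinite(candidates).any():
            break  # nothing left to remove
        i, j = np.unravel_index(np.argmin(candidates), candidates.shape)
        removed.append((int(i), int(j), float(C[i, j])))
        C[i, j] = C[j, i] = 0.0  # keep the matrix symmetric
    return C, removed


def toy_is_trivial(matrix):
    """Stand-in for the solver: 'trivial' while more than one link remains."""
    return np.count_nonzero(np.triu(matrix, k=1)) > 1


toy_matrix = np.array([
    [1.0, 0.8, -0.1],
    [0.8, 1.0, 0.3],
    [-0.1, 0.3, 1.0],
])
pruned_matrix, removal_log = prune_weakest_correlations(toy_matrix, toy_is_trivial)
print(removal_log)
```

The removal log plays the same role as the per-iteration log mentioned in the outputs: it records which coefficient was dropped at each step and what its value was.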

#### **5. `finalize_and_solve_cim_model` (Task 9 Orchestrator)**

*   **Inputs:**
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **Constraint Construction:** It programmatically constructs the exact set of 14 expert-defined CIM constraints as specified in Table 4 of the paper. This step represents the "semi-subjective expert knowledge" that refines the purely data-driven model.
    2.  **CSP Formulation:** It formulates a CSP for the 10 CIM variables using these 14 constraints, including the logic for the `σ+-` shape constraint.
    3.  **Solution and Validation:** It solves the CSP and asserts that the number of resulting scenarios is exactly 7, thus replicating the paper's result for the CIM sub-model.
*   **Outputs:**
    *   A `final_report` dictionary containing the final list of 14 CIM constraints and the validated list of 7 CIM scenarios.
*   **Data Transformation:** This function transforms the declarative, expert-defined model specification (Table 4) into an explicit set of solutions (the 7 scenarios).
*   **Methodological Role:** This callable constructs and solves the first of the three main models in the paper: the **Complex Investment Submodel (CIM)**. It is the definitive implementation of the financial component of the integrated system.
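
To give a feel for what solving a qualitative CSP involves, here is a deliberately simplified toy. It uses three hypothetical variables, a single trend sign per variable, and two invented relations; it does not reproduce the 14 constraints of Table 4 or the `σ+-` shape constraint, and it enumerates assignments by brute force rather than using a constraint-solving library.

```python
from itertools import product

SIGNS = ("+", "0", "-")
OPPOSITE = {"+": "-", "0": "0", "-": "+"}


def same_direction(x: str, y: str) -> bool:
    """Both variables trend the same way."""
    return x == y


def opposite_direction(x: str, y: str) -> bool:
    """The variables trend in opposite ways (steady stays steady)."""
    return y == OPPOSITE[x]


def solve(variables, constraints):
    """Enumerate every sign assignment that satisfies all binary constraints."""
    solutions = []
    for values in product(SIGNS, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(rel(assignment[a], assignment[b]) for rel, a, b in constraints):
            solutions.append(assignment)
    return solutions


toy_variables = ("A", "B", "C")
toy_constraints = [
    (same_direction, "A", "B"),
    (opposite_direction, "B", "C"),
]
toy_solutions = solve(toy_variables, toy_constraints)
for solution in toy_solutions:
    print(solution)
```

Even in this toy, the all-steady assignment satisfies every constraint, so a model is only informative when solutions other than the steady state exist.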

#### **6. `generate_and_validate_rrm_scenarios` (Task 12 Orchestrator)**

*   **Inputs:**
    *   `rrm_csp_problem`: The fully formulated `constraint.Problem` object for the RRM.
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **Solution:** It runs the CSP solver on the `rrm_csp_problem` to find all scenarios that satisfy the 5 qualitative equations derived from the rumour-spreading ODEs.
    2.  **Validation:** It asserts that the number of solutions found is exactly 211, replicating the paper's result for the RRM sub-model.
    3.  **Quality Check:** It re-verifies the constraints on a sample of the solutions.
*   **Outputs:**
    *   A `final_report` dictionary containing the complete list of 211 RRM scenarios.
*   **Data Transformation:** This function transforms the formal RRM CSP object into its complete solution set.
*   **Methodological Role:** This callable constructs and solves the second of the three main models: the **Rumour-Related Submodel (RRM)**. It generates the complete universe of possible dynamic behaviors for the rumour-spreading system, considered in isolation.

#### **7. `formulate_and_solve_integrated_model` (Task 14 Orchestrator)**

*   **Inputs:**
    *   `integrated_variables`: The unified list of all 15 CIM and RRM variables.
    *   `integrated_constraints`: The complete list of 22 constraints, including the 3 crucial cross-model integration constraints from Table 7.
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **CSP Formulation:** It constructs the final, large-scale CSP for the **Integrated Model (IM)**, as defined by the union operation in Equation (12):
        $$ \text{IM} = \text{CIM} \cup \text{RRM} $$
        ...plus the additional constraints from Table 7.
    2.  **Solution:** It solves this 15-variable, 22-constraint problem.
    3.  **Validation:** It asserts that the number of solutions is exactly 14 and verifies the variable grouping property described in the paper's analysis.
*   **Outputs:**
    *   A `final_report` dictionary containing the final, validated list of 14 integrated scenarios.
*   **Data Transformation:** This function transforms the combined set of all model constraints into the final, integrated solution set.
*   **Methodological Role:** This is the central computational step of the entire research paper. It solves the third and most important model, the **Integrated Model (IM)**, revealing the emergent dynamics that arise from the interaction between the financial and rumour systems. The dramatic reduction from a theoretical maximum of 1477 scenarios ($7 \times 211$, i.e., every CIM scenario paired with every RRM scenario) to just 14 is the key quantitative result of this integration.

#### **8. `build_and_analyze_transition_graph` (Task 17 Orchestrator)**

*   **Inputs:**
    *   `graph_components`: A dictionary containing the graph nodes and the list of valid edges.
*   **Processes:**
    1.  **Graph Construction:** It assembles the final transitional graph `H = (S, T)` as defined in Equation (6), where `S` is the set of 14 scenarios (nodes) and `T` is the set of valid transitions (edges).
    2.  **Graph Analysis:** It applies standard graph theory algorithms to analyze the structure of `H`, identifying its partitions, cycles, and other key properties.
*   **Outputs:**
    *   A `final_report` dictionary containing the final `networkx.DiGraph` object and a detailed report on its structural properties.
*   **Data Transformation:** This function transforms the set of scenarios and the rules of temporal evolution into a single, unified graph data structure that represents the complete dynamic landscape of the system.
*   **Methodological Role:** This callable constructs the primary analytical artifact of the paper: the **transitional graph**. This graph, shown in Figure 2, is the "reasoning engine" itself. Any possible future or past behavior of the system is represented as a path within this graph. Its structure, particularly its disconnected partitions, dictates the strategic conclusions of the study.
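
The kind of structural analysis described here can be sketched with NetworkX on a small toy graph. The six nodes and their edges below are invented for illustration and are not the transitions of Figure 2.

```python
import networkx as nx

# Toy transitional graph with two groups of scenarios and no edge between them.
H = nx.DiGraph()
H.add_nodes_from(range(1, 7))
H.add_edges_from([(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)])

# Partitions: components when edge direction is ignored.
partitions = sorted(
    (sorted(component) for component in nx.weakly_connected_components(H)),
    key=lambda component: component[0],
)

# Reachability: every node that can be reached from scenario 4 along directed edges.
reachable_from_4 = nx.descendants(H, 4)

# Cycles: a directed graph has at least one cycle if it is not acyclic.
has_cycle = not nx.is_directed_acyclic_graph(H)

print(partitions)
print(reachable_from_4)
```

When two partitions are disconnected, no node of one appears in the descendants of a node of the other, which is the property behind the strategic conclusions mentioned above.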

#### **9. `analyze_and_visualize_graph_structure` (Task 18 Orchestrator)**

*   **Inputs:**
    *   `graph_analysis_results`: The dictionary output from Task 17, containing the fully constructed `networkx.DiGraph` and its detailed analysis report.
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **Partition Validation:** It programmatically validates the central structural claim of the paper. It retrieves the computed partitions from the `graph_analysis_report` and asserts that there are exactly two, with memberships `{1, 2, 3, 4, 5}` and `{6, 7, 8, 9, 10, 11, 12, 13, 14}`. This is a direct, programmatic check of the finding that "this subset of scenarios cannot be reached from scenarios 6–14".
    2.  **Visualization:** It calls the `visualize_transition_graph` helper function, which uses a manually defined layout to create a high-fidelity, publication-quality replica of Figure 2 from the paper.
    3.  **Summary Generation:** It compiles a high-level summary of the graph's key properties (node count, edge count, partition count, etc.).
*   **Outputs:**
    *   A `final_report` dictionary containing the validation status of the partition structure, the `matplotlib.figure.Figure` object of the visualization, and the final structural summary.
*   **Data Transformation:** This function transforms the abstract graph object and its analytical report into two key presentational artifacts: a hard validation of a core research claim and a visual representation of the system's dynamics.
*   **Methodological Role:** This callable serves the crucial role of **results validation and presentation**. It moves beyond mere computation to programmatically verify the key structural finding of the paper (the disconnected state spaces) and to create the primary visual aid (Figure 2) used to communicate the model's dynamic behavior.

#### **10. `analyze_decision_variables_and_strategies` (Task 19 Orchestrator)**

*   **Inputs:**
    *   `integrated_scenarios`: The list of 14 final scenarios.
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **Problem Definition:** It formally defines the multi-objective decision problem by specifying the target variables (`REP`, `ROA`, `UND`) and the ideal qualitative state (`(+,+,+)`).
    2.  **Feasibility Check:** It programmatically iterates through all 14 scenarios to prove the paper's assertion that no single scenario achieves the ideal state for all three variables simultaneously, thus confirming that "a compromise is inevitable."
    3.  **Strategy Classification:** It classifies each of the 14 scenarios into one of two groups—"Aggressive Growth" or "Conservative Growth"—based on the qualitative state of the `REP` variable.
*   **Outputs:**
    *   A `final_report` dictionary containing the formal problem definition and a `classification_map` that assigns a strategy to each scenario ID.
*   **Data Transformation:** This function transforms the raw list of scenarios into a structured analytical framework. It annotates the solution space with strategic labels, preparing it for quantitative evaluation.
*   **Methodological Role:** This callable implements the **setup for the decision analysis**. It frames the problem from an investor's perspective, validates the core premise of the trade-off, and segments the solution space into strategically meaningful categories, directly mirroring the analytical narrative of the paper's final section.
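
A minimal sketch of the feasibility check and classification follows. It assumes that each target variable carries a trend triplet, that the ideal state is `("+", "+", "+")` for every target, and that a scenario counts as "Aggressive Growth" exactly when `REP` is in its ideal state. The two toy scenarios are invented, and the classification rule is an assumption for illustration only.

```python
TARGETS = ("REP", "ROA", "UND")
IDEAL = ("+", "+", "+")

# Invented scenarios: scenario ID -> trend triplet per target variable.
toy_scenarios = {
    1: {"REP": ("+", "+", "+"), "ROA": ("+", "-", "-"), "UND": ("+", "-", "-")},
    2: {"REP": ("+", "+", "-"), "ROA": ("+", "+", "+"), "UND": ("+", "+", "+")},
}


def achieves_ideal(scenario: dict) -> bool:
    """True only if every target variable is in the ideal state."""
    return all(scenario[variable] == IDEAL for variable in TARGETS)


def classify(scenario: dict) -> str:
    """Assumed rule: the strategy label depends on the state of REP alone."""
    return "Aggressive Growth" if scenario["REP"] == IDEAL else "Conservative Growth"


compromise_is_inevitable = not any(
    achieves_ideal(scenario) for scenario in toy_scenarios.values()
)
classification_map = {
    scenario_id: classify(scenario) for scenario_id, scenario in toy_scenarios.items()
}
print(compromise_is_inevitable, classification_map)
```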

#### **11. `evaluate_and_rank_scenarios` (Task 20 Orchestrator)**

*   **Inputs:**
    *   `integrated_scenarios`: The list of 14 final scenarios.
    *   `strategy_classification`: The map of scenario IDs to strategy names from Task 19.
    *   `master_input_specification`: The global configuration dictionary.
*   **Processes:**
    1.  **Scoring:** It applies a 9-point numerical scoring system to the qualitative trend triplet of each target variable in each scenario.
    2.  **Ranking:** It calculates a total weighted score for each scenario based on the formula $S_i = \sum w_k \cdot \text{Score}(V_{ik})$ and ranks the scenarios from most to least desirable.
    3.  **Performance Analysis:** It computes descriptive statistics and correlations for the scores and uses `groupby` to calculate the aggregate performance (mean and standard deviation of scores) for each of the two strategies.
*   **Outputs:**
    *   A `final_report` dictionary containing a ranked `pandas.DataFrame` of all scenarios with their scores, a correlation matrix of the objective scores, and a summary table of the performance of each strategy.
*   **Data Transformation:** This is a critical transformation from a qualitative to a quantitative domain. It converts the symbolic trend triplets into numerical desirability scores, allowing for direct comparison and statistical analysis.
*   **Methodological Role:** This callable provides the **quantitative machinery for the decision analysis**. It moves beyond classification to evaluation, providing the numerical evidence needed to assess the trade-offs and compare the risk/return profiles of the "Aggressive" versus "Conservative" strategies.
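
The scoring and ranking formula can be made concrete with a short pandas sketch. The partial score table `TRIPLET_SCORE`, the equal weights, and the three toy scenarios are all hypothetical; the project's actual 9-point mapping and weights are not reproduced here.

```python
import pandas as pd

# Hypothetical excerpt of a 9-point mapping from trend triplets to scores.
TRIPLET_SCORE = {"+++": 9, "++0": 8, "+--": 1}
# Hypothetical equal weights w_k for the three target variables.
WEIGHTS = {"REP": 1.0, "ROA": 1.0, "UND": 1.0}

scenarios = pd.DataFrame(
    {
        "REP": ["+++", "++0", "++0"],
        "ROA": ["+--", "+++", "++0"],
        "UND": ["+--", "++0", "++0"],
        "strategy": ["Aggressive Growth", "Conservative Growth", "Conservative Growth"],
    },
    index=pd.Index([1, 2, 3], name="scenario_id"),
)

# Score(V_ik): translate each triplet into its numerical score.
scores = scenarios[list(WEIGHTS)].apply(lambda column: column.map(TRIPLET_SCORE))

# S_i = sum over k of w_k * Score(V_ik)
scenarios["total_score"] = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

ranked = scenarios.sort_values("total_score", ascending=False)
strategy_summary = ranked.groupby("strategy")["total_score"].agg(["mean", "std"])
print(ranked[["strategy", "total_score"]])
print(strategy_summary)
```

With a single scenario in a group, the standard deviation is undefined and pandas reports `NaN`, so a strategy summary is only meaningful when each group has several scenarios.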

#### **12. `generate_investment_strategy_report` (Task 21 Orchestrator)**

*   **Inputs:**
    *   `analysis_report_task20`: The report from the previous task, containing the ranked scenarios and strategy performance metrics.
    *   `graph_analysis_report_task17`: The report containing the graph's partition structure.
*   **Processes:**
    1.  **Strategy-Specific Validation:** It programmatically validates the qualitative claims about each strategy (e.g., asserts that all "Aggressive" scenarios have `REP` score 9 and `ROA`/`UND` score 1).
    2.  **Reachability Validation:** It programmatically validates the "strategy lock-in" by asserting that the set of scenarios in each strategy perfectly corresponds to the set of nodes in each of the graph's disconnected partitions.
    3.  **Framework Construction:** It synthesizes all these validated findings into a final, human-readable decision framework that summarizes the characteristics, trade-offs, and irreversibility of the two strategic choices.
*   **Outputs:**
    *   A `final_report` dictionary containing the structured `decision_framework`.
*   **Data Transformation:** This function transforms a collection of disparate quantitative and structural analyses into a single, coherent, high-level strategic narrative.
*   **Methodological Role:** This callable represents the **synthesis and conclusion of the entire research pipeline**. It brings together the scenario data, the quantitative rankings, and the graph topology to generate the final, actionable insights for a decision-maker, which is the ultimate goal of the paper.

#### **13. `execute_full_project_pipeline` (Top-Level Orchestrator)**

*   **Inputs:**
    *   `raw_df`, `correlation_matrix_df`, `master_input_specification`.
*   **Processes:**
    1.  **Sequential Execution:** It calls every single one of the preceding task orchestrators in the correct, logical order.
    2.  **Data Flow Management:** It correctly unpacks the critical output artifacts from each step (e.g., `clean_df`, `final_cim_constraints`, `integrated_scenarios`, `final_transition_graph`) and passes them as inputs to the subsequent steps.
    3.  **Fail-Fast Logic:** It checks the `overall_status` of each critical step and aborts the pipeline if a failure is detected.
    4.  **Robustness Analysis Execution:** If the baseline pipeline completes successfully, it proceeds to call the master orchestrator for the comprehensive robustness analysis (Task 29).
*   **Outputs:**
    *   A single, master `pipeline_report` dictionary containing the complete, nested reports from every task performed during the run.
*   **Data Transformation:** This function orchestrates the entire transformation of raw inputs into the final, comprehensive research report.
*   **Methodological Role:** This is the **master entry point for the entire project**. It embodies the full, end-to-end research methodology as a single, executable, and fully auditable process. It ensures that the entire complex sequence of data preparation, modeling, solving, and analysis is performed with rigorous control and in the correct order.
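
The sequential execution and fail-fast behaviour might look roughly like the sketch below. The step functions, the shared `context` dictionary, and the report keys other than `overall_status` are hypothetical simplifications of the real orchestrator.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(steps: List[Tuple[str, Step]], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order, pass outputs forward, and stop at the first failure."""
    report: Dict[str, Any] = {"overall_status": True, "task_reports": {}}
    for name, step in steps:
        result = step(context)
        report["task_reports"][name] = result
        if not result.get("overall_status", False):
            # Fail fast: later steps depend on the artifacts of this one.
            report["overall_status"] = False
            report["failed_at"] = name
            break
        # Make this step's artifacts available to the following steps.
        context.update(result.get("outputs", {}))
    return report


# Hypothetical steps used only to exercise the control flow.
def validate(context):
    return {"overall_status": len(context["rows"]) >= 3, "outputs": {}}


def cleanse_rows(context):
    cleaned = [row for row in context["rows"] if row is not None]
    return {"overall_status": True, "outputs": {"clean_rows": cleaned}}


def analyze(context):
    return {"overall_status": True, "outputs": {"count": len(context["clean_rows"])}}


toy_steps = [("validate", validate), ("cleanse", cleanse_rows), ("analyze", analyze)]
ok_report = run_pipeline(toy_steps, {"rows": [1, None, 2, 3]})
failed_report = run_pipeline(toy_steps, {"rows": [1]})
print(ok_report["overall_status"], failed_report.get("failed_at"))
```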

<br> <br>

### **Example Usage: Executing the End-to-End Pipeline**

This example demonstrates the complete workflow for running the qualitative reasoning engine. It covers the three essential stages:
1.  **Input Preparation:** Loading the configuration from `config.yaml` and creating synthetic, yet plausible, input data (`raw_df` and `correlation_matrix_df`).
2.  **Pipeline Execution:** Calling the single, top-level orchestrator function.
3.  **Result Inspection:** Performing a high-level inspection of the comprehensive report generated by the pipeline.

#### **Step 1: Input Preparation**

Before the pipeline can be executed, we must prepare its three required inputs.

**1.1. Loading the Master Input Specification from `config.yaml`**

The entire pipeline is governed by the `master_input_specification` dictionary. A critical best practice is to manage this configuration externally. We will load it from the `config.yaml` file that was previously created. This requires the `PyYAML` library.

```python
# Import the necessary library for YAML parsing.
import yaml
from typing import Dict, Any

# Define the path to the configuration file.
# This assumes the YAML file is in the same directory as the execution script/notebook.
config_path = 'config.yaml'

# Open and read the YAML file into a Python dictionary.
# Keeping the configuration in an external file avoids hard-coding nested settings.
try:
    with open(config_path, 'r', encoding='utf-8') as f:
        # yaml.safe_load parses plain YAML without constructing arbitrary Python objects.
        master_input_specification: Dict[str, Any] = yaml.safe_load(f)
    print("Successfully loaded 'master_input_specification' from config.yaml.")
except FileNotFoundError:
    print(f"ERROR: Configuration file not found at '{config_path}'.")
    # Re-raise: the later steps cannot run without the configuration.
    raise
except yaml.YAMLError as e:
    print(f"ERROR: Failed to parse YAML configuration file: {e}")
    raise
```

**1.2. Creating a Synthetic `raw_df`**

In a real-world scenario, this `DataFrame` would be loaded from a database or a CSV file containing historical financial data for a cohort of IPOs. For this example, we will generate a synthetic but structurally and domain-compliant dataset. This ensures we have a valid input for the pipeline to process.

```python
# Import pandas and numpy for data generation.
import pandas as pd
import numpy as np

# Define the 10 required columns for the CIM model.
cim_columns = ['UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI']

# Generate 100 samples of plausible data.
np.random.seed(42) # Ensure reproducibility of the synthetic data.
num_samples = 100

# Create a dictionary of synthetic data that respects the domain of each variable.
synthetic_data = {
    'UND': np.random.lognormal(mean=2.0, sigma=1.0, size=num_samples),      # Must be >= 0
    'AGE': np.random.uniform(low=1.0, high=50.0, size=num_samples),         # Must be > 0
    'TA':  np.random.lognormal(mean=15.0, sigma=2.0, size=num_samples),     # Must be > 0
    'MAR': np.random.lognormal(mean=16.0, sigma=2.0, size=num_samples),     # Must be > 0
    'LIS': np.random.randint(low=50, high=5000, size=num_samples),          # Must be a natural number
    'QUA': np.random.uniform(low=1.0, high=10.0, size=num_samples),         # Must be > 0
    'REP': np.random.randint(low=1, high=100, size=num_samples),           # Must be a natural number
    'BOO': np.random.uniform(low=0.1, high=5.0, size=num_samples),          # Must be > 0
    'ROA': np.random.normal(loc=0.05, scale=0.1, size=num_samples),         # Can be negative
    'PRI': np.random.uniform(low=0.5, high=10.0, size=num_samples),         # Must be > 0
}

# Create the pandas DataFrame.
raw_df = pd.DataFrame(synthetic_data, columns=cim_columns)

print("\nGenerated a synthetic 'raw_df' with the following structure:")
print(raw_df.info())
```

**1.3. Creating a Synthetic `correlation_matrix_df`**

This matrix is the primary input for the CIM's initial structure. In the actual research, this would be sourced from prior empirical work (e.g., Reference [46]). Here, we will use the empirical correlation of our synthetic `raw_df` as a plausible stand-in.

```python
# Calculate the Pearson correlation matrix from our synthetic raw data.
correlation_matrix_df = raw_df.corr(method='pearson')

print("\nGenerated a synthetic 'correlation_matrix_df' for the pipeline:")
print(correlation_matrix_df.head())
```

#### **Step 2: Pipeline Execution**

With all three inputs prepared, we can now execute the entire end-to-end pipeline with a single function call. This function encapsulates all 29 tasks, from validation and cleansing to modeling, analysis, and reporting.

```python
# =============================================================================
# NOTE: This step assumes that the top-level orchestrator function
# `execute_full_project_pipeline` and all its dependencies are defined and
# available in the current execution scope.
# =============================================================================

print("\n--- EXECUTING THE FULL END-TO-END PIPELINE ---")
print("This process is computationally intensive and may take several minutes.")

# Call the top-level orchestrator with the prepared inputs.
full_pipeline_report = execute_full_project_pipeline(
    raw_df=raw_df,
    correlation_matrix_df=correlation_matrix_df,
    master_input_specification=master_input_specification
)

print("\n--- PIPELINE EXECUTION COMPLETE ---")
```

#### **Step 3: Result Inspection**

The output of the pipeline is a single, comprehensive dictionary containing the results and status of every task. We can inspect this report to retrieve the key findings.

```python
# --- High-Level Inspection of the Final Report ---

# Check the final overall status of the entire project.
print(f"\nFinal Project Status: {full_pipeline_report.get('overall_status')}")
print(f"Summary Message: {full_pipeline_report.get('summary_message')}")

# --- Retrieving Key Artifacts from the Report ---

# The report is a deeply nested dictionary. We can use a helper or direct
# access to retrieve specific results for inspection.

# Example 1: Retrieve the final, ranked table of scenarios (from Task 20).
try:
    ranked_scenarios_df = full_pipeline_report['baseline_pipeline_report']['task_reports']['task_20_scenario_ranking']['outputs']['full_analysis_table']
    print("\n--- Final Ranked Scenarios (Top 5) ---")
    print(ranked_scenarios_df.head())
except KeyError:
    print("\nCould not retrieve ranked scenarios table. The pipeline may have failed before this step.")

# Example 2: Retrieve the final strategic decision framework (from Task 21).
try:
    decision_framework = full_pipeline_report['baseline_pipeline_report']['task_reports']['task_21_strategy_report']['outputs']['decision_framework']
    print("\n--- Strategic Decision Framework Summary ---")
    # Print a summary for one of the strategies.
    aggressive_strategy = decision_framework.get('Aggressive Growth', {})
    print("\nStrategy: Aggressive Growth")
    print(f"  - Associated Scenarios: {aggressive_strategy.get('associated_scenarios')}")
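    average_score = aggressive_strategy.get('quantitative_profile', {}).get('average_total_score')
    # Format the score only when it is present; formatting None would raise a TypeError.
    print(f"  - Average Score: {average_score:.2f}" if average_score is not None else "  - Average Score: n/a")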
    print(f"  - Summary: {aggressive_strategy.get('summary')}")
    
    # Print the lock-in conclusion.
    lock_in = decision_framework.get('STRATEGY_LOCK_IN_CONCLUSION', {})
    print(f"\nStrategic Lock-In: {lock_in.get('is_choice_irreversible')}")
    print(f"  - Implication: {lock_in.get('implication')}")
except KeyError:
    print("\nCould not retrieve the decision framework. The pipeline may have failed before this step.")

# Example 3: Retrieve the generated graph visualization (from Task 23).
import matplotlib.pyplot as plt  # Required for plt.show() below.

try:
    graph_figure = full_pipeline_report['baseline_pipeline_report']['task_reports']['task_23_visualization']['outputs']['graph_visualization_figure']
    if graph_figure:
        print("\n--- Transitional Graph Visualization ---")
        # In a Jupyter environment, this would display the plot.
        # In a script, we can save it to a file.
        graph_figure.savefig("transitional_graph.png")
        print("Graph visualization has been saved to 'transitional_graph.png'")
        plt.show() # Display the plot
    else:
        print("\nGraph visualization was not generated (e.g., missing libraries).")
except KeyError:
    print("\nCould not retrieve the graph visualization. The pipeline may have failed before this step.")

```
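
Because the report is deeply nested, a small lookup helper can replace the repeated `try`/`except KeyError` blocks. The helper `get_nested` below is a hypothetical convenience function, not part of the project, and the toy report only imitates the shape of the keys used above.

```python
from typing import Any, Dict, Sequence


def get_nested(report: Dict[str, Any], path: Sequence[str], default: Any = None) -> Any:
    """Walk a nested dictionary along `path`; return `default` if any key is missing."""
    current: Any = report
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Toy report imitating the nesting of the pipeline output.
toy_report = {
    "baseline_pipeline_report": {
        "task_reports": {
            "task_21_strategy_report": {"outputs": {"decision_framework": {"ok": True}}}
        }
    }
}

framework = get_nested(
    toy_report,
    ["baseline_pipeline_report", "task_reports", "task_21_strategy_report",
     "outputs", "decision_framework"],
)
missing = get_nested(toy_report, ["baseline_pipeline_report", "no_such_key", "outputs"])
print(framework, missing)
```

Returning a default instead of raising keeps inspection code short, at the cost of hiding which key along the path was missing.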




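# Task 1: Raw DataFrame Validation and Schema Verification

# Imports required by the Task 1 functions below.
from typing import Any, Dict, Set, Tuple

import pandas as pd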

# =============================================================================
# Task 1, Step 1: Structural Integrity Validation
# =============================================================================

def validate_structural_integrity(
    raw_df: pd.DataFrame,
    expected_columns: Set[str],
    min_rows: int
) -> Tuple[bool, Dict[str, Any]]:
    """
    Performs structural integrity validation on the input DataFrame.

    This function executes Step 1 of the validation pipeline, checking for
    correct shape, column names, minimum sample size, and numeric data types.
    It performs all checks and aggregates the results into a comprehensive
    report rather than failing at the first error.

    Args:
        raw_df (pd.DataFrame): The raw input DataFrame to be validated.
        expected_columns (Set[str]): A set of exact column names expected
                                     in the DataFrame.
        min_rows (int): The minimum required number of rows for statistical
                        validity.

    Returns:
        Tuple[bool, Dict[str, Any]]: A tuple containing:
            - bool: True if all structural checks pass, False otherwise.
            - Dict[str, Any]: A detailed dictionary containing the status
              of each check and descriptive error messages for failures.
    """
    # Initialize a dictionary to hold validation results and messages.
    validation_report = {
        "overall_status": True,
        "checks": {}
    }

    # --- Preliminary Check: Ensure input is a pandas DataFrame ---
    if not isinstance(raw_df, pd.DataFrame):
        # If not a DataFrame, report a fatal error and return immediately.
        validation_report["overall_status"] = False
        validation_report["fatal_error"] = "Input is not a pandas DataFrame."
        return False, validation_report

    # --- Check 1: Column Count Verification ---
    # Verify that the DataFrame has exactly the expected number of columns.
    # Equation/Rule: raw_df.shape[1] == 10
    num_columns = raw_df.shape[1]
    is_column_count_valid = (num_columns == len(expected_columns))
    validation_report["checks"]["column_count"] = {
        "status": is_column_count_valid,
        "expected": len(expected_columns),
        "actual": num_columns,
        "message": "OK" if is_column_count_valid else f"Expected {len(expected_columns)} columns, but found {num_columns}."
    }
    if not is_column_count_valid:
        validation_report["overall_status"] = False

    # --- Check 2: Column Name and Presence Verification ---
    # Normalize column names (cast to str, strip whitespace, uppercase) for robust
    # comparison; the str() cast avoids an AttributeError on non-string labels.
    actual_columns = {str(col).strip().upper() for col in raw_df.columns}
    normalized_expected_columns = {str(col).strip().upper() for col in expected_columns}

    # Verify if the set of actual column names matches the expected set.
    # Equation/Rule: set(raw_df.columns) == expected_columns
    are_columns_valid = (actual_columns == normalized_expected_columns)
    if are_columns_valid:
        validation_report["checks"]["column_names"] = {
            "status": True,
            "message": "All expected columns are present."
        }
    else:
        # If columns do not match, identify missing and unexpected columns.
        validation_report["overall_status"] = False
        missing_cols = normalized_expected_columns - actual_columns
        unexpected_cols = actual_columns - normalized_expected_columns
        message = ""
        if missing_cols:
            message += f"Missing columns: {sorted(list(missing_cols))}. "
        if unexpected_cols:
            message += f"Unexpected columns: {sorted(list(unexpected_cols))}. "
        validation_report["checks"]["column_names"] = {
            "status": False,
            "message": message.strip()
        }

    # --- Check 3: Minimum Sample Size Verification ---
    # Verify that the DataFrame has at least the minimum required number of rows.
    # Equation/Rule: raw_df.shape[0] >= 30
    num_rows = raw_df.shape[0]
    is_min_rows_valid = (num_rows >= min_rows)
    validation_report["checks"]["min_sample_size"] = {
        "status": is_min_rows_valid,
        "expected": f">= {min_rows}",
        "actual": num_rows,
        "message": "OK" if is_min_rows_valid else f"Insufficient samples. Expected at least {min_rows} rows, but found {num_rows}."
    }
    if not is_min_rows_valid:
        validation_report["overall_status"] = False

    # --- Check 4: Data Type Validation ---
    # Verify that all columns in the DataFrame have a numeric data type.
    # Equation/Rule: All columns must be numeric (pd.api.types.is_numeric_dtype()).
    non_numeric_columns = [
        col for col in raw_df.columns if not pd.api.types.is_numeric_dtype(raw_df[col])
    ]
    are_types_valid = not non_numeric_columns
    validation_report["checks"]["numeric_data_types"] = {
        "status": are_types_valid,
        "message": "OK" if are_types_valid else f"Non-numeric columns found: {non_numeric_columns}."
    }
    if not are_types_valid:
        validation_report["overall_status"] = False

    # Return the final status and the detailed report.
    return validation_report["overall_status"], validation_report
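
The column-name check above reduces to two set differences on normalised names. A minimal sketch on a toy frame, assuming hypothetical column names (`FOO` and the reduced expected set are illustrative only):

```python
import pandas as pd

# Hypothetical toy frame: one expected column is missing, one is unexpected,
# and one name carries stray whitespace and lower-case letters.
df = pd.DataFrame({" und ": [1.0], "AGE": [2.0], "FOO": [3.0]})
expected = {"UND", "AGE", "TA"}

# Normalise names the same way the check does before comparing sets.
actual = {str(col).strip().upper() for col in df.columns}

missing_cols = sorted(expected - actual)      # expected but absent
unexpected_cols = sorted(actual - expected)   # present but not expected

print("Missing:", missing_cols)
print("Unexpected:", unexpected_cols)
```

Note that normalisation only affects the comparison; the DataFrame itself keeps its original column labels, so later steps that index by exact name may still need the columns renamed.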


# =============================================================================
# Task 1, Step 2: Domain Constraint Validation
# =============================================================================

def validate_domain_constraints(
    raw_df: pd.DataFrame
) -> Tuple[bool, Dict[str, Any]]:
    """
    Performs domain constraint validation for each specified variable.

    This function executes Step 2 of the validation pipeline, checking if
    the values in each column adhere to their predefined economic and
    mathematical domains (e.g., positivity, integer-only).

    Args:
        raw_df (pd.DataFrame): The input DataFrame, assumed to have passed
                               structural integrity checks.

    Returns:
        Tuple[bool, Dict[str, Any]]: A tuple containing:
            - bool: True if all domain constraints are met, False otherwise.
            - Dict[str, Any]: A detailed report mapping each column to its
              validation status and a list of indices for any violating rows.
    """
    # Define the domain constraints for each variable as specified.
    constraints = {
        'UND': {'type': 'real_non_negative', 'rule': lambda s: s >= 0},
        'AGE': {'type': 'real_positive', 'rule': lambda s: s > 0},
        'TA': {'type': 'real_positive', 'rule': lambda s: s > 0},
        'MAR': {'type': 'real_positive', 'rule': lambda s: s > 0},
        'LIS': {'type': 'natural_number', 'rule': lambda s: (s >= 1) & np.isclose(s % 1, 0)},
        'QUA': {'type': 'real_positive', 'rule': lambda s: s > 0},
        'REP': {'type': 'natural_number', 'rule': lambda s: (s >= 1) & np.isclose(s % 1, 0)},
        'BOO': {'type': 'real_positive', 'rule': lambda s: s > 0},
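        # ROA may take any real value. The rule must still return a boolean
        # Series so that `.all()` and mask indexing below work.
        'ROA': {'type': 'real', 'rule': lambda s: pd.Series(pd.api.types.is_numeric_dtype(s), index=s.index)},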
        'PRI': {'type': 'real_positive', 'rule': lambda s: s > 0},
    }

    # Initialize a dictionary to hold the validation report.
    validation_report = {
        "overall_status": True,
        "variable_reports": {}
    }

    # Iterate through each column defined in the constraints.
    for col, spec in constraints.items():
        # Check if the column exists in the DataFrame to prevent KeyErrors.
        if col not in raw_df.columns:
            validation_report["variable_reports"][col] = {
                "status": False,
                "message": "Column not found in DataFrame.",
                "violating_indices": []
            }
            validation_report["overall_status"] = False
            continue

        # Apply the validation rule to the column series.
        # This creates a boolean mask where True indicates a valid value.
        is_valid_mask = spec['rule'](raw_df[col])

        # Check if all values in the column are valid.
        if is_valid_mask.all():
            # If all are valid, report success for this column.
            validation_report["variable_reports"][col] = {
                "status": True,
                "message": "OK",
                "violating_indices": []
            }
        else:
            # If any value is invalid, update the overall status to False.
            validation_report["overall_status"] = False
            # Identify the indices of the rows that violate the constraint.
            violating_indices = raw_df[~is_valid_mask].index.tolist()
            # Report the failure with details.
            validation_report["variable_reports"][col] = {
                "status": False,
                "message": f"Violates domain constraint: '{spec['type']}'.",
                "violating_indices": violating_indices
            }

    # Return the final status and the detailed report.
    return validation_report["overall_status"], validation_report
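
Each rule above is expected to return a boolean mask aligned with the column, from which the violating row labels are read off. A small sketch of the natural-number rule on a hypothetical `LIS`-style series (the values and index labels are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical 'LIS'-style column: values must be natural numbers (1, 2, 3, ...).
lis = pd.Series([1.0, 2.0, 2.5, 0.0, np.nan], index=["a", "b", "c", "d", "e"])

# True marks a valid value. Comparisons with NaN evaluate to False,
# so missing values are flagged as violations as well.
is_valid_mask = (lis >= 1) & np.isclose(lis % 1, 0)

# Row labels of every value that breaks the constraint.
violating_indices = lis[~is_valid_mask].index.tolist()
print(violating_indices)
```

Here `2.5` fails the integer test, `0.0` fails the lower bound, and the `NaN` fails both.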


# =============================================================================
# Task 1, Step 3: Statistical Quality Assessment
# =============================================================================

def assess_statistical_quality(
    raw_df: pd.DataFrame
) -> Tuple[bool, pd.DataFrame]:
    """
    Performs a statistical quality assessment of the DataFrame.

    This function executes Step 3 of the validation pipeline. It checks for
    missing values, flags outliers with a median-centred IQR rule, and
    calculates skewness and kurtosis for each variable's distribution.

    Args:
        raw_df (pd.DataFrame): The input DataFrame, assumed to be clean and
                               structurally valid.

    Returns:
        Tuple[bool, pd.DataFrame]: A tuple containing:
            - bool: True if no missing values are found, False otherwise.
            - pd.DataFrame: A summary DataFrame with variables as rows and
              statistical metrics as columns.
    """
    # --- Check 1: Missing Value Analysis ---
    # Calculate the total number of missing values in the entire DataFrame.
    # Equation/Rule: raw_df.isnull().sum().sum() == 0
    total_missing_values = raw_df.isnull().sum().sum()
    has_no_missing_values = (total_missing_values == 0)

    # Initialize a list to store statistical summaries for each column.
    stats_summary_list = []

    # Get the total number of rows for calculating outlier percentage.
    num_rows = len(raw_df)

    # Iterate through each column to calculate its statistics.
    for col in raw_df.columns:
        # Select the column series.
        series = raw_df[col]

        # --- Outlier Detection Using IQR Method ---
        # Calculate the first quartile (Q1) and third quartile (Q3).
        # Equation: Q1 = quantile(X, 0.25), Q3 = quantile(X, 0.75)
        q1 = series.quantile(0.25)
        q3 = series.quantile(0.75)

        # Calculate the Interquartile Range (IQR).
        # Equation: IQR = Q3 - Q1
        iqr = q3 - q1

        # Calculate the median.
        median = series.median()

        # Define the outlier condition.
        # Equation: |X_ij - median(X_i)| > 3 * IQR_i
        # Handle the case where IQR is zero to avoid flagging all non-median values.
        if np.isclose(iqr, 0):
            outlier_mask = pd.Series(False, index=series.index)
        else:
            outlier_mask = np.abs(series - median) > (3 * iqr)

        # Count the number of outliers.
        outlier_count = outlier_mask.sum()

        # Calculate the percentage of outliers.
        outlier_percentage = (outlier_count / num_rows) * 100 if num_rows > 0 else 0

        # --- Distribution Assessment ---
        # Calculate skewness with scipy.stats.skew (default bias=True, i.e. the
        # biased sample estimator; pass bias=False for the bias-corrected one).
        # Equation: skewness = E[(X-μ)³]/σ³
        skewness = stats.skew(series.dropna())

        # Calculate kurtosis (Fisher's definition, excess kurtosis) using scipy.stats.
        # Equation: kurtosis = E[(X-μ)⁴]/σ⁴ - 3
        kurtosis = stats.kurtosis(series.dropna())

        # Append the results for the current column to the summary list.
        stats_summary_list.append({
            "variable": col,
            "missing_values": series.isnull().sum(),
            "outlier_count": outlier_count,
            "outlier_percentage": outlier_percentage,
            "skewness": skewness,
            "kurtosis": kurtosis
        })

    # Create a DataFrame from the summary list and set the variable name as the index.
    summary_df = pd.DataFrame(stats_summary_list).set_index("variable")

    # Return the missing value status and the summary DataFrame.
    return has_no_missing_values, summary_df
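
The outlier rule used above is centred on the median and scaled by the IQR, which differs from the more common Tukey fences around Q1 and Q3. A worked sketch on a made-up series, assuming pandas' default linear quantile interpolation:

```python
import pandas as pd

# Hypothetical sample: nine ordinary values and one extreme value.
series = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 100.0])

q1 = series.quantile(0.25)   # 12.25 with linear interpolation
q3 = series.quantile(0.75)   # 16.75 with linear interpolation
iqr = q3 - q1                # 4.5
median = series.median()     # 14.5

# A value is flagged when it lies more than 3 * IQR away from the median.
outlier_mask = (series - median).abs() > 3 * iqr

print(series[outlier_mask].tolist())
```

With a threshold of `3 * 4.5 = 13.5`, only the value `100.0` (distance `85.5` from the median) is flagged.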


# =============================================================================
# Task 1: Orchestrator Function
# =============================================================================

def validate_raw_dataframe_and_schema(
    raw_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the complete validation pipeline for the raw input DataFrame.

    This function serves as the main entry point for Task 1. It sequentially
    executes the three validation steps:
    1. Structural Integrity Validation
    2. Domain Constraint Validation
    3. Statistical Quality Assessment

    It aggregates the results from each step into a single, comprehensive
    report dictionary.

    Args:
        raw_df (pd.DataFrame): The raw input DataFrame of financial data.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary containing expected columns and other parameters.

    Returns:
        Dict[str, Any]: A nested dictionary containing the overall validation
                        status and detailed reports from each validation step.
    """
    # Initialize the final report dictionary.
    final_report = {
        "task_name": "Task 1: Raw DataFrame Validation and Schema Verification",
        "overall_status": "SUCCESS",
        "steps": {}
    }

    # --- Step 1: Structural Integrity Validation ---
    # Expected columns and the minimum row count are hard-coded here; they are
    # not currently read from master_input_specification.
    expected_columns = {
        'UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'
    }
    min_rows = 30  # As per task specification

    # Execute the structural validation function.
    struct_ok, struct_report = validate_structural_integrity(
        raw_df=raw_df,
        expected_columns=expected_columns,
        min_rows=min_rows
    )
    # Store the report from this step.
    final_report["steps"]["structural_integrity"] = struct_report
    # If structural validation fails, it's a critical error.
    # We stop here and report failure.
    if not struct_ok:
        final_report["overall_status"] = "FAILURE"
        final_report["failure_reason"] = "Structural integrity validation failed. Further checks aborted."
        return final_report

    # --- Step 2: Domain Constraint Validation ---
    # Execute the domain constraint validation function.
    domain_ok, domain_report = validate_domain_constraints(raw_df=raw_df)
    # Store the report from this step.
    final_report["steps"]["domain_constraints"] = domain_report
    # If domain validation fails, update the overall status but continue.
    if not domain_ok:
        final_report["overall_status"] = "FAILURE"
        # We can still proceed to statistical analysis, but the overall result is a failure.

    # --- Step 3: Statistical Quality Assessment ---
    # Execute the statistical quality assessment function.
    stats_ok, stats_report = assess_statistical_quality(raw_df=raw_df)
    # Store the report from this step.
    final_report["steps"]["statistical_quality"] = {
        "no_missing_values": stats_ok,
        "summary_statistics": stats_report
    }
    # If missing values are found, update the overall status.
    if not stats_ok:
        final_report["overall_status"] = "FAILURE"

    # Provide a final summary message based on the overall status.
    if final_report["overall_status"] == "SUCCESS":
        final_report["summary_message"] = "All validation checks passed successfully."
    else:
        # If any check failed, construct a summary of failures.
        failure_reasons = []
        if not struct_ok: failure_reasons.append("structural integrity")
        if not domain_ok: failure_reasons.append("domain constraints")
        if not stats_ok: failure_reasons.append("missing values detected")
        final_report["summary_message"] = f"Validation failed due to issues in: {', '.join(failure_reasons)}."

    # Return the complete, aggregated report.
    return final_report


In [None]:
# Task 2: Correlation Matrix Validation and Integrity Verification

# =============================================================================
# Task 2, Step 1: Matrix Structure Validation
# =============================================================================

def validate_matrix_structure(
    correlation_matrix_df: pd.DataFrame,
    expected_variables: Set[str]
) -> Tuple[bool, Dict[str, Any]]:
    """
    Validates the fundamental structural properties of a correlation matrix.

    This function executes Step 1 of the validation, ensuring the matrix is
    square, has the correct dimensions, possesses identical and correctly
    ordered indices and columns, is symmetric, and has a diagonal of ones.
    Symmetry and diagonal checks use `np.allclose` with `atol=1e-10` (in
    addition to NumPy's default relative tolerance).

    Args:
        correlation_matrix_df (pd.DataFrame): The correlation matrix to validate.
        expected_variables (Set[str]): A set of the exact variable names
                                        expected in the matrix's index and columns.

    Returns:
        Tuple[bool, Dict[str, Any]]: A tuple containing:
            - bool: True if all structural checks pass, False otherwise.
            - Dict[str, Any]: A detailed report on each structural check.
    """
    # Initialize a dictionary to hold validation results and messages.
    report = {"overall_status": True, "checks": {}}

    # --- Preliminary Check: Ensure input is a pandas DataFrame ---
    if not isinstance(correlation_matrix_df, pd.DataFrame):
        report["overall_status"] = False
        report["fatal_error"] = "Input is not a pandas DataFrame."
        return False, report

    # --- Check 1: Dimensional Consistency (Squareness and Size) ---
    # A correlation matrix must be square with dimensions matching the variable count.
    # Equation/Rule: correlation_matrix_df.shape == (10, 10)
    shape = correlation_matrix_df.shape
    n_expected = len(expected_variables)
    is_square_and_correct_size = (shape[0] == shape[1] == n_expected)
    report["checks"]["is_square_and_correct_size"] = {
        "status": is_square_and_correct_size,
        "expected": (n_expected, n_expected),
        "actual": shape,
        "message": "OK" if is_square_and_correct_size else f"Matrix shape is not the expected ({n_expected}, {n_expected})."
    }
    if not is_square_and_correct_size:
        report["overall_status"] = False
        # If shape is wrong, further checks are unreliable.
        return False, report

    # --- Check 2: Index-Column Alignment and Content ---
    # The index and columns must contain the same set of expected variables.
    # Equation/Rule: set(index) == set(columns) == expected_variables
    index_set = set(correlation_matrix_df.index)
    columns_set = set(correlation_matrix_df.columns)
    are_sets_identical = (index_set == columns_set == expected_variables)

    # Also check if the order is identical, which is a stricter requirement.
    is_order_identical = correlation_matrix_df.index.equals(correlation_matrix_df.columns)

    if are_sets_identical and is_order_identical:
        report["checks"]["index_column_alignment"] = {"status": True, "message": "OK"}
    else:
        report["overall_status"] = False
        message = ""
        if not are_sets_identical:
            missing = expected_variables - index_set
            extra = index_set - expected_variables
            message += f"Variable mismatch. Missing: {missing if missing else 'None'}. Extra: {extra if extra else 'None'}. "
        if not is_order_identical:
            message += "Index and column orders do not match."
        report["checks"]["index_column_alignment"] = {"status": False, "message": message.strip()}


    # --- Check 3: Symmetry Verification ---
    # The matrix must be symmetric, i.e., C = C^T.
    # Equation/Rule: np.allclose(C, C.T, atol=1e-10)
    matrix_values = correlation_matrix_df.values
    is_symmetric = np.allclose(matrix_values, matrix_values.T, atol=1e-10)
    report["checks"]["is_symmetric"] = {
        "status": is_symmetric,
        "message": "OK" if is_symmetric else "Matrix is not symmetric within tolerance."
    }
    if not is_symmetric:
        report["overall_status"] = False

    # --- Check 4: Diagonal Unity Check ---
    # All diagonal elements of a correlation matrix must be 1.0.
    # Equation/Rule: np.allclose(np.diag(C), 1.0, atol=1e-10)
    diagonal_values = np.diag(matrix_values)
    is_diag_one = np.allclose(diagonal_values, 1.0, atol=1e-10)
    report["checks"]["is_diagonal_unity"] = {
        "status": is_diag_one,
        "message": "OK" if is_diag_one else "Diagonal elements are not all 1.0 within tolerance."
    }
    if not is_diag_one:
        report["overall_status"] = False

    return report["overall_status"], report
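
One detail of the symmetry and diagonal checks is worth illustrating: `np.allclose` combines `atol` with its default relative tolerance `rtol=1e-05`, so the effective tolerance is looser than `1e-10` for non-zero entries. A small sketch on a hypothetical 3x3 matrix (variable names reused from the model only for illustration):

```python
import numpy as np
import pandas as pd

names = ["UND", "AGE", "TA"]
corr = pd.DataFrame(
    [[1.0, 0.2, -0.1],
     [0.2, 1.0, 0.3],
     [-0.1, 0.3, 1.0]],
    index=names, columns=names,
)

# Introduce an asymmetry of 1e-7, far larger than atol=1e-10.
nearly_symmetric = corr.copy()
nearly_symmetric.loc["UND", "AGE"] += 1e-7
values = nearly_symmetric.to_numpy()

# Default rtol=1e-05 is still applied: |a - b| <= atol + rtol * |b|.
passes_default = bool(np.allclose(values, values.T, atol=1e-10))
# Setting rtol=0 makes atol the only tolerance.
passes_strict = bool(np.allclose(values, values.T, rtol=0.0, atol=1e-10))
unit_diagonal = bool(np.allclose(np.diag(values), 1.0, atol=1e-10))

print(passes_default, passes_strict, unit_diagonal)
```

If a strict absolute tolerance is the intent, passing `rtol=0` explicitly is one way to get it; whether that is desirable here is a design choice for the validator.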

# =============================================================================
# Task 2, Step 2: Correlation Coefficient Range and Validity
# =============================================================================

def validate_matrix_coefficients(
    correlation_matrix_df: pd.DataFrame
) -> Tuple[bool, Dict[str, Any]]:
    """
    Validates the mathematical properties of the correlation coefficients.

    This function executes Step 2, checking that all coefficients are within
    the valid range [-1, 1], that the matrix is positive semi-definite, and
    that the numerical precision meets a minimum standard.

    Args:
        correlation_matrix_df (pd.DataFrame): The correlation matrix, assumed
                                              to be structurally valid.

    Returns:
        Tuple[bool, Dict[str, Any]]: A tuple containing:
            - bool: True if all coefficient checks pass, False otherwise.
            - Dict[str, Any]: A detailed report on each mathematical check.
    """
    report = {"overall_status": True, "checks": {}}
    matrix_values = correlation_matrix_df.values

    # --- Check 1: Coefficient Range Validation ---
    # All values c_ij must be in the range [-1, 1].
    # Equation/Rule: c_ij ∈ [-1, 1]
    # We use a small tolerance to account for floating point representation.
    is_in_range = (
        (matrix_values >= -1.0 - 1e-12) & (matrix_values <= 1.0 + 1e-12)
    ).all()
    report["checks"]["coefficient_range"] = {
        "status": is_in_range,
        "min_value": matrix_values.min(),
        "max_value": matrix_values.max(),
        "message": "OK" if is_in_range else "Coefficients found outside the valid [-1, 1] range."
    }
    if not is_in_range:
        report["overall_status"] = False

    # --- Check 2: Positive Semi-Definite Check ---
    # A valid correlation matrix must be positive semi-definite (all eigenvalues >= 0).
    # Equation/Rule: λ_i ≥ 0 for all eigenvalues λ_i
    try:
        # First, ensure all values are finite to prevent linalg errors.
        if not np.isfinite(matrix_values).all():
            raise ValueError("Matrix contains non-finite (NaN or inf) values.")

        # Calculate eigenvalues. eigvalsh assumes a symmetric matrix (verified
        # in Step 1) and always returns real eigenvalues.
        eigenvalues = np.linalg.eigvalsh(matrix_values)
        min_eigenvalue = eigenvalues.min()

        # Check if the minimum eigenvalue is non-negative within a tolerance.
        is_psd = min_eigenvalue >= -1e-12
        report["checks"]["is_positive_semi_definite"] = {
            "status": is_psd,
            "min_eigenvalue": min_eigenvalue,
            "message": "OK" if is_psd else f"Matrix is not positive semi-definite. Minimum eigenvalue is {min_eigenvalue:.2e}."
        }
        if not is_psd:
            report["overall_status"] = False

    except (np.linalg.LinAlgError, ValueError) as e:
        # Handle cases where eigenvalue decomposition fails.
        report["overall_status"] = False
        report["checks"]["is_positive_semi_definite"] = {
            "status": False,
            "min_eigenvalue": None,
            "message": f"Eigenvalue decomposition failed: {e}"
        }

    # --- Check 3: Numerical Precision Assessment ---
    # Check that the matrix carries at least 3 decimal places of precision.
    # This is checked by seeing whether rounding to 2 places changes any value.
    # We only check off-diagonal elements for this property.
    off_diagonal_mask = ~np.eye(matrix_values.shape[0], dtype=bool)
    off_diagonal_values = matrix_values[off_diagonal_mask]

    # True if at least one off-diagonal value has more than 2 decimal places.
    has_sufficient_precision = np.any(~np.isclose(off_diagonal_values, np.round(off_diagonal_values, 2)))

    report["checks"]["numerical_precision"] = {
        "status": has_sufficient_precision,
        "message": "OK" if has_sufficient_precision else "Precision appears low; all values have 2 or fewer decimal places."
    }
    # This is a soft check, so it does not alter the overall status.

    return report["overall_status"], report
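
The range check and the positive semi-definite check are independent: a symmetric matrix with a unit diagonal and all entries in [-1, 1] can still fail to be a valid correlation matrix. A minimal sketch with a hand-built 3x3 example (the matrix is illustrative, not taken from the project data):

```python
import numpy as np

# Symmetric, unit diagonal, all entries within [-1, 1] ...
c = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

in_range = bool(((c >= -1.0) & (c <= 1.0)).all())

# ... but one eigenvalue is negative, so the matrix is not PSD.
# eigvalsh is used because the matrix is symmetric.
eigenvalues = np.linalg.eigvalsh(c)
min_eigenvalue = float(eigenvalues.min())
is_psd = bool(min_eigenvalue >= -1e-12)

print(in_range, is_psd, round(min_eigenvalue, 6))
```

Intuitively, the first variable cannot be strongly positively correlated with the second and strongly negatively correlated with the third while the second and third are themselves strongly positively correlated.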

# =============================================================================
# Task 2, Step 3: Cross-Validation with Raw Data
# =============================================================================

def cross_validate_matrix_with_data(
    correlation_matrix_df: pd.DataFrame,
    raw_df: pd.DataFrame,
    consistency_tolerance: float = 0.05
) -> Tuple[bool, Dict[str, Any]]:
    """
    Cross-validates a provided correlation matrix against one computed from raw data.

    This function executes Step 3, performing three checks:
    1. Computes an empirical correlation matrix from the raw data.
    2. Verifies consistency by checking if the absolute difference between the
       provided and empirical matrices is within a given tolerance.
    3. Computes t-statistics and p-values for the provided correlations to
       assess their statistical significance.

    Args:
        correlation_matrix_df (pd.DataFrame): The provided correlation matrix.
        raw_df (pd.DataFrame): The raw data used for empirical calculation.
        consistency_tolerance (float): The maximum allowed absolute difference
                                       between corresponding correlation coefficients.

    Returns:
        Tuple[bool, Dict[str, Any]]: A tuple containing:
            - bool: True if the consistency check passes, False otherwise.
            - Dict[str, Any]: A detailed report including the empirical matrix,
              difference matrix, and significance test results.
    """
    report = {"overall_status": True, "results": {}}

    # --- Step 3.1: Empirical Correlation Computation ---
    # Ensure columns are in a canonical (sorted) order for comparison.
    canonical_order = sorted(correlation_matrix_df.columns)
    provided_matrix = correlation_matrix_df.reindex(index=canonical_order, columns=canonical_order)

    # Compute the empirical matrix from the raw data.
    # Equation/Rule: C_empirical = raw_df.corr(method='pearson')
    # Drop rows with any NaNs to ensure a consistent sample size for all pairs.
    clean_raw_df = raw_df[canonical_order].dropna()
    n_samples = len(clean_raw_df)

    if n_samples < 3:
        report["overall_status"] = False
        report["fatal_error"] = f"Insufficient non-NaN samples ({n_samples}) to compute correlations."
        return False, report

    empirical_matrix = clean_raw_df.corr(method='pearson')
    report["results"]["empirical_correlation_matrix"] = empirical_matrix
    report["results"]["sample_size_used"] = n_samples

    # --- Step 3.2: Consistency Verification ---
    # Calculate the absolute difference between the two matrices.
    # Equation/Rule: np.abs(C_provided - C_empirical) <= tolerance
    diff_matrix = np.abs(provided_matrix - empirical_matrix)
    max_diff = diff_matrix.max().max()
    is_consistent = max_diff <= consistency_tolerance

    report["overall_status"] = is_consistent
    report["results"]["consistency_check"] = {
        "status": is_consistent,
        "max_absolute_difference": max_diff,
        "tolerance": consistency_tolerance,
        "message": "OK" if is_consistent else "Maximum difference exceeds tolerance."
    }
    report["results"]["difference_matrix"] = diff_matrix

    # --- Step 3.3: Statistical Significance Testing ---
    # Compute t-statistics for the provided correlation coefficients.
    # Equation/Rule: t_ij = c_ij * sqrt((n-2) / (1 - c_ij^2))
    c = provided_matrix.values
    # Add a small epsilon to the denominator to prevent division by zero if c_ij is +/- 1.
    denominator = 1 - c**2 + 1e-12
    t_stats_values = c * np.sqrt((n_samples - 2) / denominator)

    # The diagonal is undefined (corr=1), so set it to NaN.
    np.fill_diagonal(t_stats_values, np.nan)

    # Calculate two-tailed p-values from the t-statistics.
    # The degrees of freedom for the t-distribution is n-2.
    p_values = stats.t.sf(np.abs(t_stats_values), df=n_samples - 2) * 2

    report["results"]["t_statistic_matrix"] = pd.DataFrame(t_stats_values, index=canonical_order, columns=canonical_order)
    report["results"]["p_value_matrix"] = pd.DataFrame(p_values, index=canonical_order, columns=canonical_order)

    return report["overall_status"], report
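
The significance test in Step 3.3 applies the t-statistic formula to each coefficient. A sketch for a single pair of synthetic variables, cross-checked against `scipy.stats.pearsonr` (the data, seed, and sample size are assumptions made for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic data: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(size=40)
n = len(x)

# Reference values from SciPy.
r, p_scipy = stats.pearsonr(x, y)

# Manual computation: t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
t_stat = r * np.sqrt((n - 2) / (1 - r**2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r={r:.4f}, t={t_stat:.4f}, p_manual={p_manual:.6f}, p_scipy={p_scipy:.6f}")
```

The two p-values should agree up to floating-point error, since the t-test and SciPy's default p-value for `pearsonr` rest on the same distributional assumption.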

# =============================================================================
# Task 2: Orchestrator Function
# =============================================================================

def validate_correlation_matrix_and_integrity(
    correlation_matrix_df: pd.DataFrame,
    raw_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the complete validation pipeline for the correlation matrix.

    This function serves as the main entry point for Task 2. It sequentially
    executes the three validation steps:
    1. Matrix Structure Validation
    2. Coefficient Range and Validity
    3. Cross-Validation with Raw Data

    It aggregates the results into a single, comprehensive report.

    Args:
        correlation_matrix_df (pd.DataFrame): The correlation matrix to validate.
        raw_df (pd.DataFrame): The raw data for cross-validation.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary containing expected variables and other parameters.

    Returns:
        Dict[str, Any]: A nested dictionary containing the overall validation
                        status and detailed reports from each validation step.
    """
    final_report = {
        "task_name": "Task 2: Correlation Matrix Validation and Integrity Verification",
        "overall_status": "SUCCESS",
        "steps": {}
    }

    # Define the set of expected variables for the CIM model.
    expected_variables = {
        'UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'
    }

    # --- Step 1: Matrix Structure Validation ---
    struct_ok, struct_report = validate_matrix_structure(
        correlation_matrix_df=correlation_matrix_df,
        expected_variables=expected_variables
    )
    final_report["steps"]["matrix_structure"] = struct_report
    if not struct_ok:
        final_report["overall_status"] = "FAILURE"
        final_report["failure_reason"] = "Matrix structure validation failed. Further checks aborted."
        return final_report

    # --- Step 2: Coefficient Range and Validity ---
    coeff_ok, coeff_report = validate_matrix_coefficients(
        correlation_matrix_df=correlation_matrix_df
    )
    final_report["steps"]["coefficient_validity"] = coeff_report
    if not coeff_ok:
        final_report["overall_status"] = "FAILURE"
        # We can still proceed to cross-validation, but the overall result is a failure.

    # --- Step 3: Cross-Validation with Raw Data ---
    cross_val_ok, cross_val_report = cross_validate_matrix_with_data(
        correlation_matrix_df=correlation_matrix_df,
        raw_df=raw_df
    )
    final_report["steps"]["cross_validation"] = cross_val_report
    if not cross_val_ok:
        final_report["overall_status"] = "FAILURE"

    # Provide a final summary message.
    if final_report["overall_status"] == "SUCCESS":
        final_report["summary_message"] = "All correlation matrix validation checks passed successfully."
    else:
        failure_reasons = []
        if not struct_ok: failure_reasons.append("matrix structure")
        if not coeff_ok: failure_reasons.append("coefficient validity (e.g., not PSD)")
        if not cross_val_ok: failure_reasons.append("consistency with raw data")
        final_report["summary_message"] = f"Validation failed due to issues in: {', '.join(failure_reasons)}."

    return final_report


In [None]:
# Task 3: Master Input Specification Dictionary Validation

# =============================================================================
# Task 3, Helper Function: Recursive Validator
# =============================================================================

def _recursively_validate_spec_dict(
    spec_dict: Dict[str, Any],
    schema: Dict[str, Any],
    path: str = ""
) -> List[str]:
    """
    A recursive helper function to validate a nested dictionary against a schema.

    This function traverses both the specification dictionary and a parallel
    schema dictionary. It validates key presence, data types, and specific
    value constraints defined in the schema. It is the core engine for
    Steps 1 and 2.

    Args:
        spec_dict (Dict[str, Any]): The dictionary (or sub-dictionary) to validate.
        schema (Dict[str, Any]): A dictionary defining the expected structure,
                                 types, and value constraints.
        path (str): The current dot-notation path, used for error reporting.

    Returns:
        List[str]: A list of all validation error messages found. An empty
                   list indicates success.
    """
    errors = []

    # --- Structural Check: Key Mismatch ---
    # Ensure the set of keys in the dictionary matches the schema exactly.
    spec_keys = set(spec_dict.keys())
    schema_keys = set(schema.keys())
    if spec_keys != schema_keys:
        missing = schema_keys - spec_keys
        extra = spec_keys - schema_keys
        if missing:
            errors.append(f"Path '{path}': Missing required keys: {sorted(list(missing))}")
        if extra:
            errors.append(f"Path '{path}': Found unexpected keys: {sorted(list(extra))}")
        # Do not proceed further down this path if keys are fundamentally wrong.
        return errors

    # --- Type, Value, and Recursive Checks for each key ---
    for key, schema_value in schema.items():
        # Construct the full path for the current key for error messages.
        current_path = f"{path}.{key}" if path else key
        actual_value = spec_dict[key]

        # --- Type Validation ---
        # The schema must specify an expected type.
        expected_type = schema_value.get("type")
        if not isinstance(actual_value, expected_type):
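            # expected_type may be a tuple of types, which has no __name__.
            expected_name = getattr(expected_type, "__name__", str(expected_type))
            errors.append(
                f"Path '{current_path}': Invalid type. "
                f"Expected {expected_name}, but found {type(actual_value).__name__}."
            )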
            # If type is wrong, further checks on this key are unreliable.
            continue

        # --- Value Validation (if a rule is provided) ---
        # The schema can provide a validation function (lambda or regular).
        validator = schema_value.get("validator")
        if validator:
            is_valid, message = validator(actual_value)
            if not is_valid:
                errors.append(f"Path '{current_path}': {message}")

        # --- Recursive Validation for nested dictionaries ---
        # If the schema specifies a nested structure, recurse.
        nested_schema = schema_value.get("nested_schema")
        if nested_schema:
            # This applies to nested dictionaries.
            if isinstance(actual_value, dict):
                errors.extend(
                    _recursively_validate_spec_dict(
                        spec_dict=actual_value,
                        schema=nested_schema,
                        path=current_path
                    )
                )
            # This applies to lists of dictionaries.
            elif isinstance(actual_value, list):
                for i, item in enumerate(actual_value):
                    if isinstance(item, dict):
                        errors.extend(
                            _recursively_validate_spec_dict(
                                spec_dict=item,
                                schema=nested_schema,
                                path=f"{current_path}[{i}]"
                            )
                        )
                    else:
                        errors.append(f"Path '{current_path}[{i}]': Expected item to be a dict, but found {type(item).__name__}.")

    return errors
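

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): the schema-entry contract
# -----------------------------------------------------------------------------
# The recursive validator above appears to expect each schema entry to carry a
# "type", an optional "validator" returning an (is_valid, message) pair, and an
# optional "nested_schema". The function below is a minimal sketch of that
# contract. Every name in it (_sketch_schema_entry_contract, "limits",
# "timeout") is hypothetical and exists only for illustration.

def _sketch_schema_entry_contract() -> list:
    """Returns the messages a toy validator produces for a good and a bad value."""
    # Hypothetical schema: one validated integer leaf nested under one dict.
    sketch_schema = {
        "limits": {"type": dict, "nested_schema": {
            "timeout": {"type": int, "validator": lambda v: (
                v >= 60, f"timeout must be >= 60, but is {v}."
            )},
        }},
    }
    leaf = sketch_schema["limits"]["nested_schema"]["timeout"]

    messages = []
    for candidate in (120, 5):
        # A validator returns a pair: (passed?, message to report on failure).
        is_valid, message = leaf["validator"](candidate)
        if not is_valid:
            messages.append(message)
    return messages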


# =============================================================================
# Task 3, Step 3: Integration Constraint Validation
# =============================================================================

def validate_integration_constraints(
    spec_dict: Dict[str, Any]
) -> List[str]:
    """
    Performs specific validation on the integration constraints section.

    This function executes Step 3, verifying the count, content, and variable
    references of the cross-model integration constraints defined in the
    master specification.

    Args:
        spec_dict (Dict[str, Any]): The master input specification dictionary.

    Returns:
        List[str]: A list of validation error messages. An empty list
                   indicates success.
    """
    errors = []
    path = "empirical_data.expert_knowledge.integration_constraints"

    try:
        # Safely navigate to the list of constraints.
        constraints_list = spec_dict["empirical_data"]["expert_knowledge"]["integration_constraints"]["constraints"]

        # --- Check 1: Constraint Count ---
        # Verify that there are exactly 3 integration constraints.
        # Rule: constraint_count == 3
        if len(constraints_list) != 3:
            errors.append(f"Path '{path}.constraints': Expected 3 integration constraints, but found {len(constraints_list)}.")
            return errors # Fatal error for this section

        # --- Check 2: Mathematical Form Validation ---
        # Verify the exact set of required mathematical forms.
        # Rule: Forms must be 'σ_{+-}(Z_2, REP)', 'σ_{--}(Z_1, UND)', 'RED(W, REP)'
        # Both sides are compared with spaces stripped; otherwise the expected
        # forms (which contain spaces) could never equal the normalized actual forms.
        expected_forms = {
            form.replace(" ", "")
            for form in ('σ_{+-}(Z_2, REP)', 'σ_{--}(Z_1, UND)', 'RED(W, REP)')
        }
        actual_forms = {c.get("mathematical_form", "").replace(" ", "") for c in constraints_list}

        if actual_forms != expected_forms:
            missing = expected_forms - actual_forms
            extra = actual_forms - expected_forms
            message = "Mathematical form mismatch. "
            if missing:
                message += f"Missing: {sorted(missing)}. "
            if extra:
                message += f"Extra: {sorted(extra)}."
            errors.append(f"Path '{path}.constraints': {message.strip()}")

        # --- Check 3: Variable Reference Consistency ---
        # Define the universe of valid variables.
        cim_vars = {'UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'}
        rrm_vars = {'X', 'Y', 'W', 'Z1', 'Z2'}
        all_valid_vars = cim_vars.union(rrm_vars)

        for i, constraint in enumerate(constraints_list):
            current_path = f"{path}.constraints[{i}]"
            involved_vars = constraint.get("variables_involved")
            if not isinstance(involved_vars, list):
                errors.append(f"Path '{current_path}.variables_involved': Expected a list, but found {type(involved_vars).__name__}.")
                continue

            # Check if all referenced variables are valid.
            for var in involved_vars:
                if var not in all_valid_vars:
                    errors.append(f"Path '{current_path}.variables_involved': Found undefined variable reference '{var}'.")

    except KeyError as e:
        errors.append(f"Structural error: Missing key {e} required for integration constraint validation.")
    except Exception as e:
        errors.append(f"An unexpected error occurred during integration constraint validation: {e}")

    return errors
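

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): spacing-insensitive comparison
# -----------------------------------------------------------------------------
# Comparing mathematical forms as strings only works when both sides receive the
# same normalization. The helper below is a hypothetical example of one such
# normalization (removing all whitespace); it is an assumption about what
# "equal forms" should mean, not a rule stated by the specification.

def _sketch_normalize_form(form: str) -> str:
    """Removes all whitespace so that forms compare independently of spacing."""
    return "".join(form.split())


def _sketch_forms_match(expected: list, actual: list) -> bool:
    """Compares two collections of forms after normalizing both sides."""
    return (
        {_sketch_normalize_form(f) for f in expected}
        == {_sketch_normalize_form(f) for f in actual}
    )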


# =============================================================================
# Task 3: Orchestrator Function
# =============================================================================

def validate_master_input_specification(
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the complete validation of the master input specification dictionary.

    This function serves as the main entry point for Task 3. It validates the
    entire nested structure, data types, and specific parameter values against
    a predefined schema, and then performs a detailed check on the critical
    integration constraints.

    Args:
        master_input_specification (Dict[str, Any]): The configuration
            dictionary to be validated.

    Returns:
        Dict[str, Any]: A report containing the overall validation status and
                        a list of all found errors.
    """
    # --- Define the validation schema for Steps 1 and 2 ---
    # This schema mirrors the expected structure and defines validation rules.
    validation_schema = {
        "empirical_data": {"type": dict, "nested_schema": {
            "rrm_system": {"type": dict, "nested_schema": {
                "state_variables": {"type": dict, "validator": lambda v: (
                    set(v.keys()) == {'X', 'Y', 'W', 'Z1', 'Z2'},
                    f"Expected state variables ['X', 'Y', 'W', 'Z1', 'Z2'], found {list(v.keys())}"
                )},
            }},
            "transition_rules": {"type": dict, "nested_schema": {
                "rules": {"type": list, "validator": lambda v: (
                    len(v) == 9, f"Expected 9 transition rules, but found {len(v)}."
                )},
            }},
            "expert_knowledge": {"type": dict, "nested_schema": {
                "integration_constraints": {"type": dict} # Specific validation is separate
            }}
        }},
        "computational_configuration": {"type": dict, "nested_schema": {
            "csp_solver": {"type": dict, "nested_schema": {
                "search_space_management": {"type": dict, "nested_schema": {
                    "search_timeout_seconds": {"type": int, "validator": lambda v: (
                        v >= 60, f"search_timeout_seconds must be >= 60, but is {v}."
                    )}
                }}
            }},
            "scenario_generation": {"type": dict, "nested_schema": {
                "expected_solution_counts": {"type": dict, "validator": lambda v: (
                    v.get("cim_scenarios") == 7 and
                    v.get("rrm_scenarios") == 211 and
                    v.get("im_scenarios") == 14,
                    f"Mismatch in expected scenario counts. Got: {v}"
                )}
            }}
        }},
        "model_integration": {"type": dict},
        "analysis_framework": {"type": dict, "nested_schema": {
             "validation_framework": {"type": dict, "nested_schema": {
                "numerical_precision_validation": {"type": dict, "nested_schema": {
                    "floating_point_tolerance": {"type": float, "validator": lambda v: (
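                        # np.isclose defaults to atol=1e-8, which would accept any
                        # value near zero; use a purely relative comparison instead.
                        np.isclose(v, 1e-12, rtol=1e-9, atol=0.0), f"floating_point_tolerance must be 1e-12, but is {v}."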
                    )}
                }}
            }}
        }},
        "output_configuration": {"type": dict},
        "execution_control": {"type": dict}
    }

    # --- Execute Validation Steps ---
    # Step 1 & 2: Hierarchical Structure, Type, and Value Validation
    structural_and_value_errors = _recursively_validate_spec_dict(
        spec_dict=master_input_specification,
        schema=validation_schema
    )

    # Step 3: Specific Integration Constraint Validation
    integration_errors = validate_integration_constraints(
        spec_dict=master_input_specification
    )

    # --- Aggregate Results ---
    all_errors = structural_and_value_errors + integration_errors

    # --- Construct Final Report ---
    final_report = {
        "task_name": "Task 3: Master Input Specification Dictionary Validation",
        "overall_status": "SUCCESS" if not all_errors else "FAILURE",
        "errors_found": len(all_errors),
        "error_details": all_errors
    }

    return final_report
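

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): comparing very small floats
# -----------------------------------------------------------------------------
# Value rules such as "tolerance must be 1e-12" need care: with its default
# absolute tolerance (atol=1e-8), np.isclose treats every value within about
# 1e-8 of the target as equal. The hypothetical helper below shows a purely
# relative comparison; the chosen rtol is an assumption for illustration.

import numpy as np  # Repeated here so the sketch runs on its own.


def _sketch_tolerance_check(value: float, target: float = 1e-12) -> bool:
    """Returns True only if `value` matches `target` to a relative 1e-9."""
    return bool(np.isclose(value, target, rtol=1e-9, atol=0.0))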


In [None]:
# Task 4: Raw Data Cleansing and Standardization

# =============================================================================
# Task 4, Step 1: Missing Value and Anomaly Treatment
# =============================================================================

def treat_missing_values_and_anomalies(
    raw_df: pd.DataFrame,
    positive_domain_vars: List[str],
    epsilon: float = 1e-6
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """
    Handles missing values, enforces positivity, and removes duplicates.

    This function executes Step 1 of the cleansing pipeline. It performs:
    1. Complete Case Analysis: Removes rows with any NaN values.
    2. Positive-Domain Flooring: Raises values below `epsilon` to `epsilon` in the specified columns.
    3. Duplicate Removal: Deletes rows that are exact duplicates.

    Args:
        raw_df (pd.DataFrame): The raw input DataFrame.
        positive_domain_vars (List[str]): A list of columns that must contain
                                           strictly positive values.
        epsilon (float): A small positive constant to use as a floor for
                         the positive domain variables.

    Returns:
        Tuple[pd.DataFrame, Dict[str, Any]]: A tuple containing:
            - pd.DataFrame: The cleansed DataFrame.
            - Dict[str, Any]: A report detailing the number of rows affected
              at each stage of the cleaning process.
    """
    # --- Input Validation ---
    if not isinstance(raw_df, pd.DataFrame):
        raise TypeError("Input 'raw_df' must be a pandas DataFrame.")

    # Work on a copy to avoid modifying the original DataFrame (side effects).
    df = raw_df.copy()

    # Initialize a report to audit the cleaning process.
    initial_rows = len(df)
    report = {
        "initial_rows": initial_rows,
        "rows_after_nan_removal": 0,
        "rows_after_duplicate_removal": 0,
        "final_rows": 0,
        "warnings": []
    }

    # --- 1. Complete Case Analysis (Listwise Deletion) ---
    # Equation/Rule: cleaned_df = raw_df[raw_df.notnull().all(axis=1)]
    df.dropna(inplace=True)
    rows_after_nan = len(df)
    report["rows_after_nan_removal"] = rows_after_nan

    # Issue a warning if a significant portion of data was dropped.
    rows_dropped_nan = initial_rows - rows_after_nan
    if initial_rows > 0 and (rows_dropped_nan / initial_rows) > 0.10:
        warning_msg = (
            f"{(rows_dropped_nan / initial_rows):.1%} of rows ({rows_dropped_nan}) "
            "were dropped due to missing values. Consider imputation for future use."
        )
        warnings.warn(warning_msg)
        report["warnings"].append(warning_msg)

    # --- 2. Positive-Domain Flooring ---
    # Equation/Rule: X_cleaned = max(X_raw, ε)
    for col in positive_domain_vars:
        if col in df.columns:
            # Ensure the column is numeric before applying a numeric operation.
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = np.maximum(df[col], epsilon)
            else:
                raise TypeError(f"Column '{col}' is not numeric and cannot be capped.")

    # --- 3. Duplicated Row Removal ---
    # Equation/Rule: cleaned_df.drop_duplicates(keep='first')
    df.drop_duplicates(keep='first', inplace=True)
    rows_after_duplicates = len(df)
    report["rows_after_duplicate_removal"] = rows_after_duplicates

    # --- Finalization ---
    # Reset the index to ensure it is a clean, contiguous sequence.
    df.reset_index(drop=True, inplace=True)
    report["final_rows"] = len(df)

    return df, report
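

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): why the step order matters
# -----------------------------------------------------------------------------
# The three operations above interact: flooring runs before duplicate removal,
# so rows that differ only in values below epsilon become identical and are
# then collapsed. The toy data and the function name below are hypothetical.

import numpy as np  # Repeated here so the sketch runs on its own.
import pandas as pd


def _sketch_cleaning_order(epsilon: float = 1e-6) -> pd.DataFrame:
    """Applies dropna, flooring, and de-duplication to a five-row toy frame."""
    toy = pd.DataFrame({
        "PRI": [1.0, np.nan, -2.0, 0.0, 1.0],
        "LIS": [3, 4, 5, 5, 3],
    })
    toy = toy.dropna()                                # Removes the NaN row.
    toy["PRI"] = np.maximum(toy["PRI"], epsilon)      # -2.0 and 0.0 both become epsilon.
    toy = toy.drop_duplicates(keep="first")           # The two floored rows collapse.
    return toy.reset_index(drop=True)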

# =============================================================================
# Task 4, Step 2: Data Type Optimization and Precision Standardization
# =============================================================================

def optimize_data_types_and_precision(
    cleaned_df: pd.DataFrame,
    discrete_vars: List[str],
    continuous_vars: List[str],
    precision: int = 10
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """
    Standardizes data types and numerical precision of the DataFrame.

    This function executes Step 2 of the cleansing pipeline. It:
    1. Converts specified discrete variables to a 64-bit integer type.
    2. Standardizes specified continuous variables to a 64-bit float type
       with a fixed number of decimal places.

    Args:
        cleaned_df (pd.DataFrame): The DataFrame after initial cleaning (Step 1).
        discrete_vars (List[str]): Columns to be converted to integers.
        continuous_vars (List[str]): Columns to be standardized as floats.
        precision (int): The number of decimal places for continuous variables.

    Returns:
        Tuple[pd.DataFrame, Dict[str, Any]]: A tuple containing:
            - pd.DataFrame: The DataFrame with optimized types.
            - Dict[str, Any]: A report on the type conversion process.
    """
    # Work on a copy to prevent side effects.
    df = cleaned_df.copy()
    report = {"status": "SUCCESS", "messages": []}

    # --- 1. Discrete Variable Conversion ---
    # Equation/Rule: cleaned_df[['LIS', 'REP']] = cleaned_df[['LIS', 'REP']].astype('int64')
    for col in discrete_vars:
        if col in df.columns:
            # Pre-validation: Check for NaNs and non-integer values before casting.
            if df[col].isnull().any():
                raise ValueError(f"Column '{col}' contains NaNs and cannot be cast to integer.")
            # Compare against the rounded values: a modulo test would reject
            # values just below an integer (e.g. 2.9999999999).
            rounded = df[col].round()
            if not np.all(np.isclose(df[col], rounded)):
                raise ValueError(f"Column '{col}' contains non-integer values and cannot be cast to integer.")

            # Perform the type conversion on the rounded values, because
            # astype('int64') truncates toward zero.
            try:
                df[col] = rounded.astype('int64')
                report["messages"].append(f"Column '{col}' successfully converted to int64.")
            except (ValueError, TypeError) as e:
                report["status"] = "FAILURE"
                report["messages"].append(f"Failed to convert column '{col}' to int64: {e}")
                raise

    # --- 2. Continuous Variable Precision Standardization ---
    # Equation/Rule: cleaned_df[vars] = cleaned_df[vars].round(precision).astype('float64')
    for col in continuous_vars:
        if col in df.columns:
            try:
                df[col] = df[col].round(precision).astype('float64')
                report["messages"].append(f"Column '{col}' successfully standardized to float64 with {precision} precision.")
            except (ValueError, TypeError) as e:
                report["status"] = "FAILURE"
                report["messages"].append(f"Failed to standardize column '{col}' to float64: {e}")
                raise e

    return df, report
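

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): testing for integer values
# -----------------------------------------------------------------------------
# A float column holds "integer values" when every entry is close to its nearest
# integer. A remainder test (x % 1 close to 0) misses values just below an
# integer, whose remainder is close to 1 instead. The helper name is hypothetical.

import numpy as np  # Repeated here so the sketch runs on its own.


def _sketch_is_integer_valued(values) -> bool:
    """Returns True if every value is within np.isclose tolerance of an integer."""
    arr = np.asarray(values, dtype=float)
    return bool(np.all(np.isclose(arr, np.round(arr))))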

# =============================================================================
# Task 4, Step 3: Domain-Specific Data Validation
# =============================================================================

def validate_and_filter_domain_specifics(
    typed_df: pd.DataFrame,
    domain_bounds: Dict[str, Tuple[float, float]]
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """
    Filters the DataFrame based on domain-specific economic bounds.

    This function executes Step 3 of the cleansing pipeline. It removes rows
    where variable values fall outside predefined, plausible economic ranges.

    Args:
        typed_df (pd.DataFrame): The DataFrame after type standardization (Step 2).
        domain_bounds (Dict[str, Tuple[float, float]]): A dictionary mapping
            column names to a tuple of (lower_bound, upper_bound).

    Returns:
        Tuple[pd.DataFrame, Dict[str, Any]]: A tuple containing:
            - pd.DataFrame: The final, fully cleansed and validated DataFrame.
            - Dict[str, Any]: A report detailing rows removed for each constraint.
    """
    # Work on a copy.
    df = typed_df.copy()
    initial_rows = len(df)
    report = {
        "initial_rows": initial_rows,
        "final_rows": 0,
        "rows_removed": 0,
        "removal_details": {}
    }

    # Initialize a boolean mask to keep all rows initially.
    keep_mask = pd.Series(True, index=df.index)

    # Iterate through the domain constraints.
    for col, (lower_bound, upper_bound) in domain_bounds.items():
        if col in df.columns:
            # Create a mask for the current column's valid range.
            # Equation/Rule: lower_bound <= X <= upper_bound
            col_mask = (df[col] >= lower_bound) & (df[col] <= upper_bound)

            # Identify rows that violate this specific constraint.
            violating_indices = df[~col_mask].index
            if not violating_indices.empty:
                report["removal_details"][col] = violating_indices.tolist()

            # Update the master keep_mask by combining with the current column's mask.
            keep_mask &= col_mask

    # Apply the final mask to filter the DataFrame.
    final_df = df[keep_mask].copy()

    # Update the report with final counts.
    final_rows = len(final_df)
    report["final_rows"] = final_rows
    report["rows_removed"] = initial_rows - final_rows

    # Reset the index of the final DataFrame.
    final_df.reset_index(drop=True, inplace=True)

    return final_df, report

# =============================================================================
# Task 4: Orchestrator Function
# =============================================================================

def cleanse_and_standardize_raw_data(
    raw_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """
    Orchestrates the complete data cleansing and standardization pipeline.

    This function serves as the main entry point for Task 4. It executes a
    three-step cleansing process. In this version the variable properties and
    economic bounds that drive the pipeline are defined inline; they are intended
    to be retrieved from the `master_input_specification` so that it acts as the
    single source of truth for the model's data requirements. The pipeline includes:
    1. Anomaly Treatment: Handles missing values, duplicates, and enforces positivity.
    2. Type Optimization: Standardizes data types and numerical precision.
    3. Domain Filtering: Filters data based on plausible economic bounds.

    Args:
        raw_df (pd.DataFrame): The raw input DataFrame of financial data.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary. This must contain definitions for CIM variables, their
            domains, and economic filtering bounds under the appropriate paths.

    Returns:
        Tuple[pd.DataFrame, Dict[str, Any]]: A tuple containing:
            - pd.DataFrame: The final, clean, and standardized DataFrame.
            - Dict[str, Any]: A nested dictionary containing a comprehensive
              cleansing report and detailed audit trails from each step.

    Note:
        TypeError, ValueError, and KeyError raised during cleansing are caught
        rather than propagated. In that case the report's `overall_status` is
        set to "FAILURE" and the original `raw_df` is returned for inspection.
    """
    # Initialize the final report dictionary.
    final_report = {
        "task_name": "Task 4 (Remediated): Raw Data Cleansing and Standardization",
        "overall_status": "SUCCESS",
        "steps": {}
    }

    try:
        # --- Step 1: Derive Configuration ---
        # The configuration below is defined inline for this self-contained example.
        # A production system would carry a full schema for CIM variables and
        # retrieve these values via _get_nested_param from a hypothetical
        # 'empirical_data.cim_system.variables' path in the master specification.

        # Define the canonical set of CIM variables for this model.
        cim_variables = {
            'UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'
        }

        # Define variable properties based on their economic interpretation.
        # In a fully config-driven system, this would be parsed from metadata.
        variable_properties = {
            'UND': {'domain': 'real_non_negative', 'type': 'continuous'},
            'AGE': {'domain': 'real_positive', 'type': 'continuous'},
            'TA':  {'domain': 'real_positive', 'type': 'continuous'},
            'MAR': {'domain': 'real_positive', 'type': 'continuous'},
            'LIS': {'domain': 'natural_number', 'type': 'discrete'},
            'QUA': {'domain': 'real_positive', 'type': 'continuous'},
            'REP': {'domain': 'natural_number', 'type': 'discrete'},
            'BOO': {'domain': 'real_positive', 'type': 'continuous'},
            'ROA': {'domain': 'real', 'type': 'continuous'},
            'PRI': {'domain': 'real_positive', 'type': 'continuous'},
        }

        # Programmatically build the variable lists from the properties.
        positive_vars = [
            var for var, prop in variable_properties.items()
            if prop['domain'] in ['real_positive', 'real_non_negative']
        ]
        discrete_vars = [
            var for var, prop in variable_properties.items() if prop['type'] == 'discrete'
        ]
        continuous_vars = [
            var for var, prop in variable_properties.items() if prop['type'] == 'continuous'
        ]

        # Retrieve the economic domain bounds directly from the master specification.
        # This makes the filtering criteria fully configurable.
        # A hypothetical path is used here for demonstration.
        # domain_bounds = _get_nested_param(
        #     master_input_specification,
        #     'analysis_framework.data_cleansing.economic_bounds'
        # )
        # For this self-contained example, we define it as if retrieved.
        domain_bounds = {
            'BOO': (0.01, 50.0),
            'PRI': (0.1, 100.0),
            'AGE': (0.1, 100.0),
            'LIS': (10, 10000)
        }

        # --- Step 2: Execute Cleansing Pipeline with Derived Configuration ---

        # Execute Step 1 of cleansing: Handle NaNs, duplicates, and positivity.
        df_step1, report_step1 = treat_missing_values_and_anomalies(
            raw_df=raw_df,
            positive_domain_vars=positive_vars
        )
        final_report["steps"]["anomaly_treatment"] = report_step1

        # Execute Step 2 of cleansing: Optimize data types and precision.
        df_step2, report_step2 = optimize_data_types_and_precision(
            cleaned_df=df_step1,
            discrete_vars=discrete_vars,
            continuous_vars=continuous_vars
        )
        final_report["steps"]["type_optimization"] = report_step2

        # Execute Step 3 of cleansing: Filter based on economic domain bounds.
        final_df, report_step3 = validate_and_filter_domain_specifics(
            typed_df=df_step2,
            domain_bounds=domain_bounds
        )
        final_report["steps"]["domain_filtering"] = report_step3

        # Create a final summary message for the report.
        final_report["summary"] = (
            f"Initial rows: {report_step1['initial_rows']}. "
            f"Final clean rows: {report_step3['final_rows']}."
        )

    except (TypeError, ValueError, KeyError) as e:
        # Catch any critical error during the process and report failure.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = (
            f"Data cleansing failed. This could be due to a missing configuration "
            f"in the master specification or a data quality issue. Details: {e}"
        )
        # Return the original DataFrame in case of failure to allow for inspection.
        return raw_df, final_report

    # Return the final, fully cleansed DataFrame and the comprehensive report.
    return final_df, final_report


In [None]:
# Task 5: Correlation Matrix Preprocessing and Normalization

# =============================================================================
# Task 5, Step 1: Numerical Stability Enhancement
# =============================================================================

def enhance_matrix_numerical_stability(
    correlation_matrix_df: pd.DataFrame,
    min_eigenvalue_threshold: float = 1e-12,
    precision: int = 6
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """
    Enhances the numerical stability of a correlation matrix.

    This function executes Step 1 of the preprocessing pipeline. It performs a
    sequence of operations to ensure the matrix is robust for downstream tasks:
    1. Enforces perfect symmetry.
    2. Performs eigenvalue regularization if the matrix is not positive
       semi-definite, followed by re-normalization to restore the diagonal of ones.
    3. Standardizes the numerical precision by rounding.

    Args:
        correlation_matrix_df (pd.DataFrame): The input correlation matrix,
                                              assumed to be structurally valid.
        min_eigenvalue_threshold (float): The smallest acceptable eigenvalue.
                                          If the minimum eigenvalue is below this,
                                          regularization is triggered.
        precision (int): The number of decimal places to round the final
                         coefficients to.

    Returns:
        Tuple[pd.DataFrame, Dict[str, Any]]: A tuple containing:
            - pd.DataFrame: The processed, numerically stable correlation matrix.
            - Dict[str, Any]: A report detailing the transformations applied.
    """
    # --- Input Validation ---
    if not isinstance(correlation_matrix_df, pd.DataFrame):
        raise TypeError("Input must be a pandas DataFrame.")

    # Work on a copy to avoid side effects.
    df = correlation_matrix_df.copy()
    report = {
        "regularization_applied": False,
        "min_eigenvalue_before": None,
        "min_eigenvalue_after": None,
        "symmetrization_applied": False,
        "precision_standardized_to": precision
    }

    # --- 1. Symmetry Enforcement ---
    # Equation/Rule: C_sym = (C + C^T) / 2
    # This averages out any minor floating-point asymmetries.
    # Work on a float NumPy copy: in-place writes through `df.values` may hit
    # a copy or a read-only view, depending on dtypes and pandas settings.
    matrix = df.to_numpy(dtype=float, copy=True)
    if not np.allclose(matrix, matrix.T):
        matrix = (matrix + matrix.T) / 2.0
        report["symmetrization_applied"] = True

    # --- 2. Eigenvalue Regularization ---
    # A valid correlation matrix must be positive semi-definite.
    try:
        # eigvalsh is the routine for symmetric matrices: it returns real
        # eigenvalues in ascending order.
        min_eigenvalue = float(np.linalg.eigvalsh(matrix)[0])
        report["min_eigenvalue_before"] = min_eigenvalue

        # Equation/Rule: If λ_min < threshold, C_reg = C + (threshold - λ_min) * I
        if min_eigenvalue < min_eigenvalue_threshold:
            report["regularization_applied"] = True
            # Calculate the regularization factor (epsilon).
            epsilon = min_eigenvalue_threshold - min_eigenvalue
            # Add epsilon * I to the matrix to shift eigenvalues up.
            matrix[np.diag_indices_from(matrix)] += epsilon

            # Re-normalize the matrix to restore the diagonal of ones.
            # This is critical as regularization breaks the unit diagonal.
            # Equation: c'_ij = c_ij / sqrt(c_ii * c_jj)
            inv_diag_sqrt = 1.0 / np.sqrt(np.diag(matrix))
            matrix = matrix * np.outer(inv_diag_sqrt, inv_diag_sqrt)

            # Recalculate the minimum eigenvalue to confirm the fix.
            report["min_eigenvalue_after"] = float(np.linalg.eigvalsh(matrix)[0])

    except np.linalg.LinAlgError as e:
        raise RuntimeError(f"Linear algebra error during stability enhancement: {e}") from e

    # --- 3. Precision Standardization ---
    # Equation/Rule: correlation_matrix_df = correlation_matrix_df.round(precision)
    matrix = np.round(matrix, precision)

    # Final check to enforce perfect 1s on the diagonal after all operations.
    np.fill_diagonal(matrix, 1.0)

    # Rebuild the DataFrame with the original labels.
    df = pd.DataFrame(matrix, index=df.index, columns=df.columns)

    return df, report
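

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): shift, then re-normalize
# -----------------------------------------------------------------------------
# A worked example of the regularization rule above on a toy matrix that is not
# positive semi-definite (its smallest eigenvalue is -0.8). Adding
# (threshold - λ_min) * I lifts the spectrum, and dividing by
# sqrt(c_ii * c_jj) restores the unit diagonal. The function name and the
# larger threshold (1e-3, chosen to make the effect visible) are assumptions.

import numpy as np  # Repeated here so the sketch runs on its own.


def _sketch_shift_and_renormalize(matrix: np.ndarray, threshold: float = 1e-3) -> np.ndarray:
    """Returns a symmetric, unit-diagonal matrix with λ_min raised above zero."""
    sym = (matrix + matrix.T) / 2.0
    lam_min = np.linalg.eigvalsh(sym)[0]  # eigvalsh sorts eigenvalues ascending.
    if lam_min < threshold:
        sym = sym + (threshold - lam_min) * np.eye(sym.shape[0])
        scale = 1.0 / np.sqrt(np.diag(sym))
        sym = sym * np.outer(scale, scale)
    return sym


# Pairwise correlations of 0.9, 0.9, and -0.9 are mutually inconsistent.
_SKETCH_TOY_MATRIX = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])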

# =============================================================================
# Task 5, Step 2: Correlation Magnitude Assessment and Categorization
# =============================================================================

def assess_correlation_magnitudes(
    processed_matrix_df: pd.DataFrame,
    weak_threshold: float = 0.05,
    strong_threshold: float = 0.8,
    zero_threshold: float = 1e-6
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """
    Categorizes correlations by magnitude and standardizes near-zero values.

    This function executes Step 2 of the preprocessing pipeline. It:
    1. Identifies weak and strong correlations based on thresholds.
    2. Converts correlations very close to zero to be exactly zero.

    Args:
        processed_matrix_df (pd.DataFrame): The numerically stable matrix from Step 1.
        weak_threshold (float): The absolute value below which a correlation is 'weak'.
        strong_threshold (float): The absolute value above which a correlation is 'strong'.
        zero_threshold (float): The absolute value below which a correlation is set to 0.0.

    Returns:
        Tuple[pd.DataFrame, Dict[str, Any]]: A tuple containing:
            - pd.DataFrame: The matrix with near-zero values standardized.
            - Dict[str, Any]: A report containing lists of weak and strong correlations.
    """
    df = processed_matrix_df.copy()
    report = {
        "weak_correlations": [],
        "strong_correlations": [],
        "zeros_standardized": 0
    }

    # --- 1. Zero Correlation Processing ---
    # Equation/Rule: Convert |c_ij| < 1e-6 to 0.0
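    near_zero_mask = np.abs(df) < zero_threshold
    # Count off-diagonal entries only. The unit diagonal is never near zero, so
    # subtracting the dimension would undercount; each symmetric pair counts twice.
    off_diagonal = ~np.eye(df.shape[0], dtype=bool)
    report["zeros_standardized"] = int((near_zero_mask.values & off_diagonal).sum())
    df[near_zero_mask] = 0.0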

    # --- 2. Weak and Strong Correlation Identification ---
    # Create a mask for the upper triangle to avoid duplicate pairs.
    upper_triangle_mask = np.triu(np.ones_like(df, dtype=bool), k=1)

    # Equation/Rule: Flag correlations with |c_ij| < 0.05
    weak_mask = (np.abs(df) < weak_threshold) & (df != 0.0) & upper_triangle_mask

    # Equation/Rule: Flag correlations with |c_ij| > 0.8
    strong_mask = (np.abs(df) > strong_threshold) & upper_triangle_mask

    # Use stack() to extract the (row, col, value) tuples. Masked-out cells are
    # NaN; drop them explicitly because some pandas versions keep NaN in stack().
    report["weak_correlations"] = [
        (idx[0], idx[1], val) for idx, val in df[weak_mask].stack().dropna().items()
    ]
    report["strong_correlations"] = [
        (idx[0], idx[1], val) for idx, val in df[strong_mask].stack().dropna().items()
    ]

    return df, report
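

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): listing unique pairs
# -----------------------------------------------------------------------------
# Because a correlation matrix is symmetric, each pair needs to be reported only
# once. Iterating over the strict upper triangle (k=1) skips the diagonal and
# the mirrored entries. This index-based variant is an alternative shown for
# illustration; the function name and toy labels are hypothetical.

import numpy as np  # Repeated here so the sketch runs on its own.
import pandas as pd


def _sketch_upper_triangle_pairs(corr: pd.DataFrame, threshold: float = 0.8) -> list:
    """Returns (row, col, value) for pairs whose |value| exceeds `threshold`."""
    values = corr.to_numpy()
    rows, cols = np.triu_indices_from(values, k=1)
    return [
        (corr.index[i], corr.columns[j], float(values[i, j]))
        for i, j in zip(rows, cols)
        if abs(values[i, j]) > threshold
    ]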

# =============================================================================
# Task 5, Step 3: Matrix Conditioning and Invertibility Assessment
# =============================================================================

def assess_matrix_conditioning(
    final_matrix_df: pd.DataFrame
) -> Dict[str, Any]:
    """
    Calculates key metrics to assess the matrix's numerical condition.

    This function executes Step 3 of the preprocessing pipeline. It computes:
    1. The matrix condition number.
    2. The determinant.
    3. The matrix rank.

    Args:
        final_matrix_df (pd.DataFrame): The final, preprocessed correlation matrix.

    Returns:
        Dict[str, Any]: A report containing the computed conditioning metrics
                        and an interpretation.
    """
    report = {
        "condition_number": None,
        "determinant": None,
        "rank": None,
        "interpretation": "Matrix appears to be well-conditioned."
    }
    matrix_values = final_matrix_df.values

    try:
        # --- 1. Condition Number Calculation ---
        # Equation/Rule: κ(C) = σ_max / σ_min (2-norm condition number).
        # For a symmetric positive-definite C this equals λ_max / λ_min.
        report["condition_number"] = np.linalg.cond(matrix_values)

        # --- 2. Determinant Evaluation ---
        # Equation/Rule: det(C)
        report["determinant"] = np.linalg.det(matrix_values)

        # --- 3. Rank Verification ---
        # Equation/Rule: rank(C)
        report["rank"] = np.linalg.matrix_rank(matrix_values)

        # --- Interpretation ---
        if report["condition_number"] > 1000 or report["rank"] < final_matrix_df.shape[0]:
            report["interpretation"] = "Warning: Matrix is ill-conditioned or singular."
        elif np.isclose(report["determinant"], 0):
            report["interpretation"] = "Warning: Matrix is near-singular (determinant is close to zero)."

    except np.linalg.LinAlgError as e:
        # Handle cases where the matrix is singular and diagnostics fail.
        report["interpretation"] = f"FAILURE: Linear algebra error during assessment: {e}"
        report["condition_number"] = float('inf')
        report["determinant"] = 0.0
        # Rank can still be computed for singular matrices.
        report["rank"] = np.linalg.matrix_rank(matrix_values)

    return report
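
# --- Illustrative sketch (not part of the pipeline) ---
# The same three diagnostics on a hypothetical 2x2 correlation matrix
# [[1, rho], [rho, 1]]. For this matrix the eigenvalues are 1 + rho and
# 1 - rho, so the condition number is (1 + |rho|) / (1 - |rho|) and the
# determinant is 1 - rho**2. The helper name is an assumption.
import numpy as np


def _sketch_conditioning(rho: float) -> dict:
    """Returns condition number, determinant and rank for [[1, rho], [rho, 1]]."""
    c = np.array([[1.0, rho], [rho, 1.0]])
    return {
        "condition_number": float(np.linalg.cond(c)),
        "determinant": float(np.linalg.det(c)),
        "rank": int(np.linalg.matrix_rank(c)),
    }


# Usage:
#   _sketch_conditioning(0.5)  # condition number 3, determinant 0.75, rank 2
#   _sketch_conditioning(1.0)  # perfectly collinear pair: rank drops to 1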

# =============================================================================
# Task 5: Orchestrator Function
# =============================================================================

def preprocess_and_normalize_correlation_matrix(
    correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """
    Orchestrates the complete preprocessing pipeline for the correlation matrix.

    This function serves as the main entry point for Task 5. It executes the
    three preprocessing steps in sequence to ensure the matrix is numerically
    stable and well-characterized before being used in the model.

    Args:
        correlation_matrix_df (pd.DataFrame): The input correlation matrix,
                                              assumed to be structurally valid.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary (currently unused but included for API consistency).

    Returns:
        Tuple[pd.DataFrame, Dict[str, Any]]: A tuple containing:
            - pd.DataFrame: The final, preprocessed, and normalized matrix.
            - Dict[str, Any]: A nested dictionary containing the overall
              preprocessing report and detailed audit trails from each step.
    """
    final_report = {
        "task_name": "Task 5: Correlation Matrix Preprocessing and Normalization",
        "overall_status": "SUCCESS",
        "steps": {}
    }

    try:
        # --- Step 1: Numerical Stability Enhancement ---
        stable_df, report_step1 = enhance_matrix_numerical_stability(
            correlation_matrix_df=correlation_matrix_df
        )
        final_report["steps"]["stability_enhancement"] = report_step1

        # --- Step 2: Correlation Magnitude Assessment ---
        assessed_df, report_step2 = assess_correlation_magnitudes(
            processed_matrix_df=stable_df
        )
        final_report["steps"]["magnitude_assessment"] = report_step2

        # --- Step 3: Matrix Conditioning Assessment ---
        report_step3 = assess_matrix_conditioning(
            final_matrix_df=assessed_df
        )
        final_report["steps"]["conditioning_assessment"] = report_step3

        if "FAILURE" in report_step3.get("interpretation", ""):
            final_report["overall_status"] = "FAILURE"

    except (TypeError, ValueError, RuntimeError) as e:
        # Catch critical errors and report failure.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = str(e)
        # Return the original DataFrame in case of failure.
        return correlation_matrix_df, final_report

    return assessed_df, final_report


# Task 6: Parameter Configuration Preprocessing

# =============================================================================
# Task 6, Helper Function: Safe Nested Dictionary Access
# =============================================================================

def _get_nested_param(
    spec_dict: Dict[str, Any],
    path: str
) -> Any:
    """
    Safely retrieves a value from a nested dictionary using a dot-separated path.

    This utility function provides a robust mechanism for accessing potentially
    deeply nested values within a dictionary structure. It parses a string path
    and traverses the dictionary accordingly. If any key along the specified
    path is missing or if a non-dictionary object is encountered mid-path,
    it raises a precise KeyError, indicating the exact point of failure. This
    is crucial for validating and accessing complex configuration objects.

    Args:
        spec_dict (Dict[str, Any]): The nested dictionary to search within.
                                     Must be a valid dictionary.
        path (str): A string representing the desired path, with keys
                    separated by dots (e.g., 'level1.level2.key').

    Returns:
        Any: The value found at the terminal key of the specified path. The
             type of the returned value depends on what is stored in the
             dictionary.

    Raises:
        TypeError: If the initial `spec_dict` is not a dictionary or if the
                   `path` is not a non-empty string.
        KeyError: If any key along the path does not exist in the corresponding
                  dictionary level, or if an intermediate value is not a
                  dictionary, preventing further traversal.
    """
    # --- Input Validation ---
    # Ensure the primary input is a dictionary.
    if not isinstance(spec_dict, dict):
        raise TypeError("Input 'spec_dict' must be a dictionary.")

    # Ensure the path is a non-empty string.
    if not isinstance(path, str) or not path:
        raise TypeError("Input 'path' must be a non-empty string.")

    # --- Path Traversal ---
    # Split the dot-separated path string into a list of individual keys.
    keys = path.split('.')

    # Initialize the traversal starting from the top-level dictionary.
    current_value = spec_dict

    # Iterate through each key in the path to descend into the nested structure.
    for i, key in enumerate(keys):

        # Check if the current object is a dictionary and contains the next key.
        if isinstance(current_value, dict) and key in current_value:

            # If valid, update the current value to the next level down.
            current_value = current_value[key]

        else:

            # If the key is missing or the object is not a dictionary, construct a precise error message.
            # This identifies the exact point of failure in the path.
            failed_path = '.'.join(keys[:i+1])

            # Raise a KeyError with a detailed message indicating the invalid path.
            raise KeyError(f"Parameter path not found or invalid. Failed at: '{failed_path}'")

    # If the loop completes successfully, return the final retrieved value.
    return current_value
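
# --- Illustrative sketch (not part of the pipeline) ---
# A compact alternative way to express dot-path traversal, using
# functools.reduce. Unlike the helper above it does not report where the path
# failed: a missing key raises a plain KeyError and a non-subscriptable
# intermediate value raises TypeError. The helper name and the example value
# 4096 are assumptions.
from functools import reduce
from typing import Any, Dict


def _sketch_get_by_path(config: Dict[str, Any], path: str) -> Any:
    """Follows a dot-separated path through nested dictionaries."""
    return reduce(lambda node, key: node[key], path.split("."), config)


# Usage with a hypothetical configuration fragment:
#   cfg = {"computational_configuration": {"csp_solver": {
#       "search_space_management": {"memory_limit_mb": 4096}}}}
#   _sketch_get_by_path(
#       cfg,
#       "computational_configuration.csp_solver.search_space_management.memory_limit_mb",
#   )  # returns 4096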


# =============================================================================
# Task 6, Step 1: CSP Solver Configuration Optimization
# =============================================================================

def preprocess_csp_solver_config(
    spec_dict: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """
    Validates and optimizes CSP solver parameters in the specification.

    This function executes Step 1 of the preprocessing. It dynamically adjusts
    memory and timeout settings based on model complexity and validates critical
    fixed parameters.

    Args:
        spec_dict (Dict[str, Any]): The master input specification dictionary.

    Returns:
        Tuple[Dict[str, Any], Dict[str, Any]]: A tuple containing:
            - Dict[str, Any]: The processed specification dictionary with
              optimized values.
            - Dict[str, Any]: A report detailing the optimizations and
              validations performed.
    """
    # Work on a deep copy to ensure the original specification is not mutated.
    processed_spec = copy.deepcopy(spec_dict)
    report = {"optimizations": [], "validations": []}

    try:
        # --- Determine Model Complexity (Number of Variables) ---
        # This is needed for dynamic parameter calculation.
        cim_vars = {'UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'}
        rrm_vars = set(_get_nested_param(processed_spec, 'empirical_data.rrm_system.state_variables').keys())
        num_variables = len(cim_vars.union(rrm_vars))

        # --- Memory Allocation Calculation ---
        # Equation/Rule: memory = min(8192, 64 * num_variables^2)
        mem_path = 'computational_configuration.csp_solver.search_space_management.memory_limit_mb'
        original_mem = _get_nested_param(processed_spec, mem_path)
        calculated_mem = min(8192, 64 * num_variables**2)
        # Update the value in the processed dictionary.
        _get_nested_param(processed_spec, 'computational_configuration.csp_solver.search_space_management')['memory_limit_mb'] = calculated_mem
        report["optimizations"].append(
            f"'{mem_path}' adjusted from {original_mem} to {calculated_mem} based on {num_variables} variables."
        )

        # --- Timeout Adjustment ---
        # Equation/Rule: timeout = min(7200, 3600 * ceil(log2(num_variables)))
        time_path = 'computational_configuration.csp_solver.search_space_management.search_timeout_seconds'
        original_time = _get_nested_param(processed_spec, time_path)
        # The 7200-second cap prevents excessively long runs. Because the model
        # always contains at least the 10 CIM variables, ceil(log2(n)) >= 4 and
        # the cap is the value that is actually applied.
        calculated_time = min(7200, 3600 * math.ceil(math.log2(num_variables)))
        # Update the value in the processed dictionary.
        _get_nested_param(processed_spec, 'computational_configuration.csp_solver.search_space_management')['search_timeout_seconds'] = calculated_time
        report["optimizations"].append(
            f"'{time_path}' adjusted from {original_time} to {calculated_time} based on {num_variables} variables."
        )

        # --- Algorithm Parameter Validation ---
        # Equation/Rule: constraint_satisfaction_threshold = 1.0
        thresh_path = 'computational_configuration.csp_solver.constraint_handling.constraint_satisfaction_threshold'
        threshold = _get_nested_param(processed_spec, thresh_path)
        if not np.isclose(threshold, 1.0):
            raise ValueError(f"'{thresh_path}' must be 1.0 for exact satisfaction, but found {threshold}.")
        report["validations"].append(f"'{thresh_path}' successfully validated as 1.0.")

    except (KeyError, TypeError, ValueError) as e:
        # If any part of this fails, it's a critical configuration error.
        # Chain the original exception so the root cause stays in the traceback.
        raise ValueError(f"Failed to preprocess CSP solver config: {e}") from e

    return processed_spec, report
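
# --- Illustrative sketch (not part of the pipeline) ---
# Worked arithmetic for the two sizing rules above, isolated from the
# configuration dictionary. The helper name is an assumption; the formulas are
# the ones used in preprocess_csp_solver_config.
import math


def _sketch_solver_limits(num_variables: int) -> tuple:
    """Returns (memory_limit_mb, search_timeout_seconds) for a variable count."""
    memory_mb = min(8192, 64 * num_variables ** 2)
    timeout_s = min(7200, 3600 * math.ceil(math.log2(num_variables)))
    return memory_mb, timeout_s


# Usage:
#   _sketch_solver_limits(10)  # (6400, 7200): 64 * 100 = 6400; 3600 * 4 capped at 7200
#   _sketch_solver_limits(12)  # (8192, 7200): 64 * 144 = 9216 capped at 8192
#   _sketch_solver_limits(2)   # (256, 3600): the only size below both caps here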

# =============================================================================
# Task 6, Step 2: Transition Rule Parameter Validation
# =============================================================================

def _is_valid_triplet_string(state_str: Optional[str]) -> bool:
    """
    Validates the syntactic correctness of a trend triplet string.

    This helper function checks if a given string conforms to the expected
    format '(V,DX,DDX)', such as '(+,+,+)'. It validates the enclosing
    parentheses, the comma separators, the number of components, and the
    symbols used for each component.

    Args:
        state_str (Optional[str]): The string to validate. Can be None, in
                                   which case it is considered valid (as in
                                   an empty alternative path).

    Returns:
        bool: True if the string is a syntactically valid trend triplet
              representation, False otherwise.
    """
    # A None value is valid, representing an empty alternative path.
    if state_str is None:
        return True

    # The value must be a string.
    if not isinstance(state_str, str):
        return False

    # The string must be enclosed in parentheses.
    s = state_str.strip()
    if not (s.startswith('(') and s.endswith(')')):
        return False

    # Remove parentheses and split by comma to get the components.
    components = s[1:-1].split(',')

    # There must be exactly three components.
    if len(components) != 3:
        return False

    # Strip whitespace from each component.
    val, dx, ddx = [c.strip() for c in components]

    # Define the set of valid symbols for the derivatives.
    valid_derivative_symbols = {'+', '0', '-'}

    # Validate each component against the allowed symbols.
    # Per the paper's model, the value component for transitions is always '+'.
    if val != '+':
        return False
    if dx not in valid_derivative_symbols:
        return False
    if ddx not in valid_derivative_symbols:
        return False

    # If all checks pass, the string is valid.
    return True
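
# --- Illustrative sketch (not part of the pipeline) ---
# Roughly the same syntax check expressed as one regular expression: a leading
# '+' value followed by two derivative symbols from {+, 0, -}, with optional
# whitespace around each component. The names below are assumptions, and the
# None case handled by the helper above is deliberately left out.
import re

_SKETCH_TRIPLET_RE = re.compile(r"\(\s*\+\s*,\s*[+0-]\s*,\s*[+0-]\s*\)")


def _sketch_is_triplet(text: str) -> bool:
    """Returns True if text looks like '(+,DX,DDX)'."""
    return _SKETCH_TRIPLET_RE.fullmatch(text.strip()) is not None


# Usage:
#   _sketch_is_triplet("(+, 0, -)")  # True
#   _sketch_is_triplet("(-,+,+)")    # False: the value component must be '+'
#   _sketch_is_triplet("(+,+)")      # False: only two components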

def validate_transition_rule_params(
    spec_dict: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Validates the structure, content, and syntax of the transition rules.

    This function executes Step 2 of the configuration preprocessing. It performs
    an exhaustive validation of the transition rules defined in the master
    specification to ensure they are complete and correctly formatted. The checks include:
    1.  **Completeness**: Verifies that exactly 9 rules with unique rule numbers
        from 1 to 9 are present.
    2.  **Branching Logic**: Ensures that deterministic rules have no alternative
        paths defined.
    3.  **Syntactic Correctness**: Validates that every state-representing
        string (e.g., '(+,+,+)') is syntactically well-formed.

    Args:
        spec_dict (Dict[str, Any]): The master input specification dictionary.

    Returns:
        Dict[str, Any]: A report of the validation.

    Raises:
        ValueError: If any validation check fails, containing a comprehensive
                    list of all identified errors.
    """
    # Initialize a list to aggregate any and all errors found.
    errors = []

    try:
        # --- 1. Rule Completeness Verification ---
        # Safely retrieve the list of rules from the specification dictionary.
        rules_list = _get_nested_param(spec_dict, 'empirical_data.transition_rules.rules')

        # Check if the retrieved object is a list.
        if not isinstance(rules_list, list):
            # This is a fatal structural error, so we raise immediately.
            raise TypeError("The path 'empirical_data.transition_rules.rules' must point to a list.")

        # Check for the correct number of rules.
        if len(rules_list) != 9:
            errors.append(f"Expected exactly 9 transition rules, but found {len(rules_list)}.")

        # Check for correct and unique rule numbers from 1 to 9.
        rule_numbers = {rule.get('rule_number') for rule in rules_list if isinstance(rule, dict)}
        if rule_numbers != set(range(1, 10)):
            errors.append(f"Rule numbers are incorrect or missing. Expected {{1, 2, ..., 9}}, but found {rule_numbers}.")

        # --- 2. Rule Structure, Branching, and Syntax Validation ---
        # Define the set of rule numbers that must be deterministic.
        deterministic_rules = {1, 4, 6, 9}

        # Iterate through each rule to perform detailed checks.
        for i, rule in enumerate(rules_list):
            # The rule itself must be a dictionary.
            if not isinstance(rule, dict):
                errors.append(f"Item at index {i} in rules list is not a dictionary.")
                continue # Skip to the next item

            rule_num = rule.get('rule_number')

            # --- Check Branching Logic ---
            # Deterministic rules must not have alternative paths.
            if rule_num in deterministic_rules:
                if rule.get('alternative_1') is not None or rule.get('alternative_2') is not None:
                    errors.append(f"Rule {rule_num}: Is defined as deterministic but has alternative paths specified.")

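            # --- Check Syntactic Correctness of State Strings ---
            # Validate every key that is supposed to hold a triplet string.
            # 'from_state' and 'primary_to_state' are treated as mandatory; only
            # the alternative paths may be None (no branch defined).
            required_state_keys = ('from_state', 'primary_to_state')
            state_keys_to_check = ['from_state', 'primary_to_state', 'alternative_1', 'alternative_2']
            for key in state_keys_to_check:
                state_str = rule.get(key)
                if state_str is None and key in required_state_keys:
                    errors.append(f"Rule {rule_num}: Required state '{key}' is missing.")
                elif not _is_valid_triplet_string(state_str):
                    errors.append(f"Rule {rule_num}: State string for '{key}' is malformed: '{state_str}'.")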

    except (KeyError, AttributeError, TypeError) as e:
        # Catch errors related to missing keys or incorrect data types, so that
        # every failure surfaces as the ValueError documented above.
        raise ValueError(f"Structural error in transition rules definition: {e}") from e

    # --- Final Report Generation ---
    # If any errors were found during the process, raise a single, comprehensive exception.
    if errors:
        error_summary = "; ".join(errors)
        raise ValueError(f"Transition rule validation failed with {len(errors)} errors: {error_summary}")

    # If no errors were found, return a success report.
    return {"status": "SUCCESS", "message": "All 9 transition rules are structurally and syntactically valid."}

# =============================================================================
# Task 6, Step 3: Expert Knowledge Parameter Preprocessing
# =============================================================================

def preprocess_expert_knowledge_params(
    spec_dict: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """
    Validates and standardizes parameters within the expert knowledge section.

    This function executes Step 3 of the preprocessing. It standardizes the
    precision of confidence levels and validates their range.

    Args:
        spec_dict (Dict[str, Any]): The master input specification dictionary.

    Returns:
        Tuple[Dict[str, Any], Dict[str, Any]]: A tuple containing:
            - Dict[str, Any]: The processed specification dictionary.
            - Dict[str, Any]: A report detailing the standardizations.
    """
    processed_spec = copy.deepcopy(spec_dict)
    report = {"standardizations": [], "validations": []}

    try:
        # --- Heuristic Confidence Standardization ---
        # Equation/Rule: Ensure confidence levels are in [0,1] with 2 decimal places.
        heuristics_list = _get_nested_param(processed_spec, 'empirical_data.expert_knowledge.heuristics')
        for i, heuristic in enumerate(heuristics_list):
            confidence = heuristic.get('confidence_level')
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
                raise TypeError(f"Heuristic {i}: confidence_level must be numeric, but found {type(confidence).__name__}.")
            if not (0.0 <= confidence <= 1.0):
                raise ValueError(f"Heuristic {i}: confidence_level must be in [0, 1], but found {confidence}.")

            # Standardize precision.
            standardized_confidence = round(confidence, 2)
            heuristic['confidence_level'] = standardized_confidence
            report["standardizations"].append(f"Heuristic {i} confidence_level standardized to {standardized_confidence}.")

        # --- Integration Constraint Validation (Sanity Check) ---
        # A full validation was done in Task 3. This is a quick check for presence.
        _get_nested_param(processed_spec, 'empirical_data.expert_knowledge.integration_constraints.constraints')
        report["validations"].append("Integration constraints structure is present.")

    except (KeyError, TypeError, ValueError, AttributeError) as e:
        # AttributeError covers heuristics entries that are not dictionaries.
        raise ValueError(f"Failed to preprocess expert knowledge config: {e}") from e

    return processed_spec, report

# =============================================================================
# Task 6: Orchestrator Function
# =============================================================================

def preprocess_parameter_configuration(
    master_input_specification: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """
    Orchestrates the complete preprocessing of the master configuration dictionary.

    This function serves as the main entry point for Task 6. It executes the
    three preprocessing and validation steps for different sections of the
    configuration, producing a final, validated, and optimized specification.

    Args:
        master_input_specification (Dict[str, Any]): The raw configuration dict.

    Returns:
        Tuple[Dict[str, Any], Dict[str, Any]]: A tuple containing:
            - Dict[str, Any]: The final, processed configuration dictionary.
            - Dict[str, Any]: A nested dictionary containing the overall
              preprocessing report and detailed audit trails from each step.
    """
    final_report = {
        "task_name": "Task 6: Parameter Configuration Preprocessing",
        "overall_status": "SUCCESS",
        "steps": {}
    }

    # Start with a deep copy to ensure the original object is untouched.
    processed_spec = copy.deepcopy(master_input_specification)

    try:
        # --- Step 1: CSP Solver Configuration ---
        processed_spec, report_step1 = preprocess_csp_solver_config(
            spec_dict=processed_spec
        )
        final_report["steps"]["csp_solver_config"] = report_step1

        # --- Step 2: Transition Rule Validation ---
        report_step2 = validate_transition_rule_params(
            spec_dict=processed_spec
        )
        final_report["steps"]["transition_rules_validation"] = report_step2

        # --- Step 3: Expert Knowledge Preprocessing ---
        processed_spec, report_step3 = preprocess_expert_knowledge_params(
            spec_dict=processed_spec
        )
        final_report["steps"]["expert_knowledge_preprocessing"] = report_step3

    except (ValueError, TypeError, KeyError) as e:
        # Catch any critical failure during preprocessing.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Preprocessing failed: {e}"

        # Return the original spec in case of failure.
        return master_input_specification, final_report

    return processed_spec, final_report


# Task 7: Initial Constraint Set Generation from Correlation Matrix

# =============================================================================
# Task 7, Step 1: Correlation-to-Constraint Mapping
# =============================================================================

def map_correlation_to_constraints(
    correlation_matrix_df: pd.DataFrame,
    zero_tolerance: float = 1e-9
) -> List[Dict[str, Any]]:
    """
    Translates a numerical correlation matrix into a symbolic constraint set.

    This function executes Step 1 of the constraint generation process. It
    iterates through the upper triangle of the correlation matrix and creates
    a qualitative constraint ('SUP' for positive, 'RED' for negative) for
    each non-zero correlation.

    Args:
        correlation_matrix_df (pd.DataFrame): The preprocessed, numerically
            stable correlation matrix.
        zero_tolerance (float): The tolerance below which a correlation's
            absolute value is considered zero, generating no constraint.

    Returns:
        List[Dict[str, Any]]: A list of dictionaries, where each dictionary
            represents a single qualitative constraint.
    """
    # --- Input Validation ---
    if not isinstance(correlation_matrix_df, pd.DataFrame):
        raise TypeError("Input must be a pandas DataFrame.")
    if not correlation_matrix_df.index.equals(correlation_matrix_df.columns):
        raise ValueError("Correlation matrix must have identical index and columns.")

    # Initialize the list to store constraint definitions.
    constraints = []

    # Get the variable names from the matrix index.
    variables = correlation_matrix_df.index.tolist()

    # Iterate over the upper triangle of the matrix to avoid duplicate constraints.
    for i in range(len(variables)):
        for j in range(i + 1, len(variables)):
            # Get the two variable names for the current pair.
            var1, var2 = variables[i], variables[j]

            # Retrieve the correlation coefficient.
            corr_value = correlation_matrix_df.loc[var1, var2]

            # Determine the constraint type based on the sign of the correlation.
            # Values within +/- zero_tolerance (and NaN) produce no constraint.
            constraint_type = None
            # Equation/Rule: If c_ij > zero_tolerance, create SUP(X_i, X_j)
            if corr_value > zero_tolerance:
                constraint_type = 'SUP'
            # Equation/Rule: If c_ij < -zero_tolerance, create RED(X_i, X_j)
            elif corr_value < -zero_tolerance:
                constraint_type = 'RED'

            # If a meaningful correlation exists, create and store the constraint.
            if constraint_type:
                constraints.append({
                    'type': constraint_type,
                    # Store variables in a canonical (sorted) order for consistency.
                    'variables': tuple(sorted((var1, var2))),
                    'source': 'correlation',
                    'value': corr_value
                })

    return constraints
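
# --- Illustrative sketch (not part of the pipeline) ---
# The sign-to-constraint mapping on a hypothetical 3-variable matrix, using
# itertools.combinations to visit each unordered pair once. The helper name
# and the correlation values are assumptions; 'UND', 'AGE' and 'TA' are
# borrowed from the CIM variable names used elsewhere in this notebook.
from itertools import combinations

import pandas as pd


def _sketch_pairs_to_constraints(matrix: pd.DataFrame, tol: float = 1e-9) -> list:
    """Maps each off-diagonal pair to ('SUP' | 'RED', var_a, var_b)."""
    out = []
    for a, b in combinations(matrix.index, 2):
        value = float(matrix.loc[a, b])
        if value > tol:
            out.append(("SUP", a, b))
        elif value < -tol:
            out.append(("RED", a, b))
    return out


# Usage:
#   names = ["UND", "AGE", "TA"]
#   m = pd.DataFrame([[1.0, 0.4, -0.3], [0.4, 1.0, 0.0], [-0.3, 0.0, 1.0]],
#                    index=names, columns=names)
#   _sketch_pairs_to_constraints(m)
#   # [('SUP', 'UND', 'AGE'), ('RED', 'UND', 'TA')]; the zero AGE-TA entry
#   # yields no constraint.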

# =============================================================================
# Task 7, Step 2: Constraint Satisfaction Problem Formulation
# =============================================================================

def formulate_initial_csp(
    variables: List[str],
    initial_constraints: List[Dict[str, Any]]
) -> Problem:
    """
    Formulates the initial Constraint Satisfaction Problem (CSP).

    This function executes Step 2. It takes a list of variables and a set of
    symbolic constraints and translates them into a formal CSP object using
    the `python-constraint` library.

    Args:
        variables (List[str]): The list of variable names for the CSP.
        initial_constraints (List[Dict[str, Any]]): The list of symbolic
            constraints generated from the correlation matrix.

    Returns:
        Problem: A `constraint.Problem` object representing the fully
                 formulated but unsolved CSP.
    """
    # Initialize the CSP solver object.
    problem = Problem()

    # --- 1. Variable Domain Definition ---
    # Define the domain for the first and second derivatives ('+', '0', '-').
    derivative_domain = ['+', '0', '-']

    # The value component is always positive ('+').
    # The full domain is the Cartesian product of the component domains.
    # Equation/Rule: Domain(X_i) = {+} x {+, 0, -} x {+, 0, -}
    trend_triplet_domain = list(itertools.product(
        ['+'], derivative_domain, derivative_domain
    ))

    # Add each variable to the CSP problem with its defined domain.
    for var in variables:
        problem.addVariable(var, trend_triplet_domain)

    # --- 2. Constraint Encoding ---
    # Define the logic for the SUP and RED constraints.
    # The solver calls these predicates to check each candidate assignment.

    def sup_constraint(xi, xj):
        # SUP(Xi, Xj): If Xi is increasing, Xj cannot be decreasing.
        # This means the combination (DXi='+', DXj='-') is forbidden.
        return not (xi[1] == '+' and xj[1] == '-')

    def red_constraint(xi, xj):
        # RED(Xi, Xj): If Xi is increasing, Xj cannot also be increasing.
        # This means the combination (DXi='+', DXj='+') is forbidden.
        return not (xi[1] == '+' and xj[1] == '+')

    # Add each constraint from the list to the CSP problem.
    for const in initial_constraints:
        const_type = const['type']
        const_vars = const['variables']

        if const_type == 'SUP':
            problem.addConstraint(sup_constraint, const_vars)
        elif const_type == 'RED':
            problem.addConstraint(red_constraint, const_vars)
        else:
            # Raise an error for any unrecognized constraint type.
            raise ValueError(f"Unknown constraint type '{const_type}' encountered.")

    return problem
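
# --- Illustrative sketch (not part of the pipeline) ---
# The triplet domain and the two predicates above, checked by brute force for
# a single pair of variables without any solver library. Each variable has
# 1 * 3 * 3 = 9 triplets, so a pair has 81 joint assignments; each constraint
# forbids one (DXi, DXj) combination, which removes 3 * 3 = 9 of them.
# All names below are assumptions introduced for the example.
from itertools import product

_SKETCH_DOMAIN = list(product(["+"], ["+", "0", "-"], ["+", "0", "-"]))
_SKETCH_FORBIDDEN_DX = {"SUP": ("+", "-"), "RED": ("+", "+")}


def _sketch_count_allowed(kind: str) -> int:
    """Counts the (xi, xj) pairs that satisfy a SUP or RED constraint."""
    forbidden = _SKETCH_FORBIDDEN_DX[kind]
    return sum(
        1
        for xi, xj in product(_SKETCH_DOMAIN, repeat=2)
        if (xi[1], xj[1]) != forbidden
    )


# Usage:
#   len(_SKETCH_DOMAIN)           # 9
#   _sketch_count_allowed("SUP")  # 72
#   _sketch_count_allowed("RED")  # 72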

# =============================================================================
# Task 7, Step 3: Preliminary Inconsistency Detection
# =============================================================================

def detect_initial_inconsistency(
    csp_problem: Problem
) -> Tuple[bool, List[Dict[str, Tuple]]]:
    """
    Performs a preliminary check for inconsistency in the formulated CSP.

    This function executes Step 3. It solves the CSP and checks if the
    solution set is either empty or contains only the trivial steady-state
    scenario, both of which indicate an inconsistent constraint set.

    Args:
        csp_problem (Problem): The fully formulated CSP object.

    Returns:
        Tuple[bool, List[Dict[str, Tuple]]]: A tuple containing:
            - bool: True if the CSP is inconsistent, False otherwise.
            - List[Dict[str, Tuple]]: The list of all solutions found by the solver.
    """
    # --- 1. Solve the CSP ---
    # The getSolutions() method performs a backtracking search to find all valid scenarios.
    solutions = csp_problem.getSolutions()

    # --- 2. Define the Steady State Triplet ---
    # Equation/Rule: S_steady = {(X1, 0, 0), ..., (Xn, 0, 0)}
    # This is the scenario where all first and second derivatives are '0'.
    steady_state_triplet = ('+', '0', '0')

    # --- 3. Inconsistency Check ---
    # The constraint set is inconsistent if no solutions are found...
    is_inconsistent = len(solutions) == 0

    # ...or if the only solution found is the trivial steady-state scenario.
    # The check reads the variables from the solution itself, so it relies only
    # on the output of getSolutions() and not on other Problem accessors.
    if not is_inconsistent:
        is_inconsistent = len(solutions) == 1 and all(
            triplet == steady_state_triplet for triplet in solutions[0].values()
        )

    return is_inconsistent, solutions
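
# --- Illustrative sketch (not part of the pipeline) ---
# A brute-force look at what the check above sees for two variables linked by
# one SUP constraint. It also illustrates two properties of the SUP/RED
# predicates as defined in formulate_initial_csp: any assignment in which all
# first derivatives are '0' satisfies them, and the raw search space grows as
# 9**n (3,486,784,401 assignments for 10 variables), so exhaustive
# enumeration is only practical for small models. Names are assumptions.
from itertools import product


def _sketch_enumerate(num_variables: int, sup_pairs: list) -> list:
    """Enumerates all assignments that satisfy the given SUP index pairs."""
    domain = list(product(["+"], ["+", "0", "-"], ["+", "0", "-"]))
    return [
        assignment
        for assignment in product(domain, repeat=num_variables)
        if all(
            not (assignment[i][1] == "+" and assignment[j][1] == "-")
            for i, j in sup_pairs
        )
    ]


def _sketch_is_steady(assignment: tuple) -> bool:
    """True if every variable is in the steady-state triplet ('+', '0', '0')."""
    return all(triplet == ("+", "0", "0") for triplet in assignment)


# Usage:
#   sols = _sketch_enumerate(2, [(0, 1)])
#   len(sols)                                # 72
#   sum(_sketch_is_steady(s) for s in sols)  # 1, the steady state itself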

# =============================================================================
# Task 7: Orchestrator Function
# =============================================================================

def generate_and_test_initial_constraint_set(
    correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the generation and preliminary testing of the initial constraint set.

    This function serves as the main entry point for Task 7. It executes the
    three steps in sequence:
    1. Maps the correlation matrix to a symbolic constraint list.
    2. Formulates these constraints into a formal CSP object.
    3. Solves the CSP to detect if the initial constraint set is inconsistent.

    Args:
        correlation_matrix_df (pd.DataFrame): The preprocessed correlation matrix.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary (currently unused; variable names are taken from the
            matrix columns).

    Returns:
        Dict[str, Any]: A report containing the generated constraints, the
                        formulated CSP, the solutions found, and the final
                        inconsistency status.
    """
    final_report = {
        "task_name": "Task 7: Initial Constraint Set Generation from Correlation Matrix",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Step 1: Correlation-to-Constraint Mapping ---
        initial_constraints = map_correlation_to_constraints(
            correlation_matrix_df=correlation_matrix_df
        )
        final_report["outputs"]["initial_constraints"] = initial_constraints
        final_report["outputs"]["initial_constraint_count"] = len(initial_constraints)

        # --- Step 2: CSP Formulation ---
        # Get the list of variables from the matrix columns.
        cim_variables = correlation_matrix_df.columns.tolist()
        csp_problem = formulate_initial_csp(
            variables=cim_variables,
            initial_constraints=initial_constraints
        )
        # Record that formulation succeeded.
        final_report["outputs"]["csp_problem_formulated"] = True
        # Keep a reference to the problem object (no copy is made) so the next
        # task can use it directly.
        final_report["outputs"]["csp_problem_object"] = csp_problem

        # --- Step 3: Preliminary Inconsistency Detection ---
        is_inconsistent, solutions = detect_initial_inconsistency(
            csp_problem=csp_problem
        )
        final_report["outputs"]["is_inconsistent"] = is_inconsistent
        final_report["outputs"]["solutions_found"] = solutions
        final_report["outputs"]["solution_count"] = len(solutions)

        if is_inconsistent:
            final_report["summary_message"] = "Initial constraint set is inconsistent. Inconsistency removal is required."
        else:
            final_report["summary_message"] = "Initial constraint set is consistent."

    except (TypeError, ValueError, KeyError) as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Constraint generation failed: {e}"

    return final_report
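
The inconsistency rule applied in Task 7 can be illustrated without the solver. The following is a minimal sketch, not the notebook's implementation: it enumerates triplets by brute force instead of calling the CSP library, and the predicates `both_rise`, `never_both_rise`, and `both_flat` are hypothetical toy constraints chosen only to exercise each branch of the rule.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]
Constraint = Tuple[Callable[[Triplet, Triplet], bool], str, str]

DERIVATIVES = ['+', '0', '-']
# Same domain shape as the notebook: positive value, any first/second derivative.
TRIPLETS: List[Triplet] = list(itertools.product(['+'], DERIVATIVES, DERIVATIVES))
STEADY: Triplet = ('+', '0', '0')


def enumerate_scenarios(
    variables: List[str], constraints: List[Constraint]
) -> List[Dict[str, Triplet]]:
    """Brute-force every assignment and keep those satisfying all constraints."""
    scenarios = []
    for combo in itertools.product(TRIPLETS, repeat=len(variables)):
        assignment = dict(zip(variables, combo))
        if all(pred(assignment[a], assignment[b]) for pred, a, b in constraints):
            scenarios.append(assignment)
    return scenarios


def is_inconsistent(variables: List[str], scenarios: List[Dict[str, Triplet]]) -> bool:
    """Inconsistent means: no scenario at all, or only the trivial steady state."""
    steady = {var: STEADY for var in variables}
    return len(scenarios) == 0 or (len(scenarios) == 1 and scenarios[0] == steady)


# Hypothetical toy predicates (assumptions for illustration only).
def both_rise(a: Triplet, b: Triplet) -> bool:
    return a[1] == '+' and b[1] == '+'


def never_both_rise(a: Triplet, b: Triplet) -> bool:
    return not (a[1] == '+' and b[1] == '+')


def both_flat(a: Triplet, b: Triplet) -> bool:
    return a == STEADY and b == STEADY


variables = ['A', 'B']
contradictory = enumerate_scenarios(
    variables, [(both_rise, 'A', 'B'), (never_both_rise, 'A', 'B')]
)
trivial = enumerate_scenarios(variables, [(both_flat, 'A', 'B')])
usable = enumerate_scenarios(variables, [(never_both_rise, 'A', 'B')])

print(is_inconsistent(variables, contradictory))  # no scenario survives
print(is_inconsistent(variables, trivial))        # only the steady state survives
print(is_inconsistent(variables, usable))         # non-trivial scenarios exist
```

Both the empty solution set and the steady-state-only set count as inconsistent, which is why the Task 8 loop keeps pruning until a non-trivial scenario appears.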


In [None]:
# Task 8: Iterative Inconsistency Removal Algorithm Implementation

# =============================================================================
# Task 8, Step 1: Minimum Absolute Value Identification and Removal
# =============================================================================

def find_and_remove_weakest_correlation(
    correlation_matrix_df: pd.DataFrame
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """
    Identifies and removes the weakest correlation from the matrix.

    This function executes Step 1 of the iterative algorithm. It finds the
    off-diagonal correlation with the minimum absolute value and removes it by
    setting it (and its symmetric counterpart) to zero. It includes a
    deterministic tie-breaking rule.

    Args:
        correlation_matrix_df (pd.DataFrame): The current correlation matrix
                                              in the iteration.

    Returns:
        Tuple[pd.DataFrame, Dict[str, Any]]: A tuple containing:
            - pd.DataFrame: A new DataFrame with the weakest correlation removed.
            - Dict[str, Any]: A report detailing which correlation was removed
              and its value.
    """
    # --- Input Validation ---
    if not isinstance(correlation_matrix_df, pd.DataFrame):
        raise TypeError("Input must be a pandas DataFrame.")

    # Work on a copy to avoid side effects.
    matrix = correlation_matrix_df.copy()
    matrix_values = matrix.values.copy()

    # --- 1. Find Minimum Absolute Value ---
    # To find the minimum off-diagonal, non-zero value, we first get the
    # absolute values.
    abs_matrix = np.abs(matrix_values)

    # Temporarily replace the diagonal and any exact zeros with infinity
    # so they are ignored by the minimum search.
    np.fill_diagonal(abs_matrix, np.inf)
    abs_matrix[abs_matrix == 0] = np.inf

    # Find the minimum absolute value in the entire matrix.
    min_abs_value = np.min(abs_matrix)

    # If min_abs_value is infinity, it means there are no non-zero off-diagonal
    # elements left to remove. This is an edge case for termination.
    if np.isinf(min_abs_value):
        return matrix, {"variables": None, "value": None, "message": "No removable correlations remain."}

    # --- 2. Identify Candidate Pairs and Apply Tie-Breaking ---
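    # Find all indices (i, j) that match this minimum value. The search uses
    # the masked `abs_matrix`, so the diagonal and already-removed (zero)
    # entries can never be selected, even when the minimum is close to zero.
    candidate_indices = np.argwhere(np.isclose(abs_matrix, min_abs_value))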

    # Convert indices to canonically ordered (sorted) variable pairs.
    variables = matrix.columns
    candidate_pairs = {
        tuple(sorted((variables[i], variables[j])))
        for i, j in candidate_indices if i != j
    }

    # Equation/Rule: Tie-breaking using lexicographic variable order.
    # Take the alphabetically first of the unique pairs as the one to remove.
    pair_to_remove = min(candidate_pairs)
    var1, var2 = pair_to_remove

    # --- 3. Symmetric Removal ---
    # Equation/Rule: Remove both c_ij and c_ji from the matrix.
    value_removed = matrix.loc[var1, var2]
    matrix.loc[var1, var2] = 0.0
    matrix.loc[var2, var1] = 0.0

    # --- Reporting ---
    report = {
        "variables": pair_to_remove,
        "value": value_removed,
        "message": f"Removed correlation between {var1} and {var2} with value {value_removed:.4f}."
    }

    return matrix, report

# =============================================================================
# Task 8: Orchestrator Function
# =============================================================================

def iteratively_remove_inconsistencies(
    initial_correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the iterative inconsistency removal algorithm.

    This function implements the core heuristic from the paper. It repeatedly
    removes the weakest correlation from the system and re-tests for logical
    consistency until a non-trivial solution is found or an iteration limit
    is reached.

    Args:
        initial_correlation_matrix_df (pd.DataFrame): The preprocessed matrix
            that was found to be inconsistent.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary, used to retrieve the iteration limit.

    Returns:
        Dict[str, Any]: A comprehensive report detailing the entire iterative
                        process, the final consistent matrix and constraints,
                        and the resulting non-trivial solutions.
    """
    # --- Initialization ---
    # Retrieve the iteration limit from the configuration.
    iteration_limit = _get_nested_param(
        master_input_specification,
        'computational_configuration.inconsistency_removal.algorithm_parameters.iteration_limit'
    )

    # Initialize the main report dictionary.
    final_report = {
        "task_name": "Task 8: Iterative Inconsistency Removal",
        "overall_status": "FAILURE", # Default to failure until success is achieved
        "termination_reason": "",
        "iteration_count": 0,
        "iteration_log": [],
        "final_correlation_matrix": None,
        "final_constraints": None,
        "final_solutions": None
    }

    # Set the starting state for the loop.
    current_matrix = initial_correlation_matrix_df.copy()
    is_consistent = False

    # --- Iteration Loop ---
    # The loop continues until the system is consistent or the limit is reached.
    while not is_consistent and final_report["iteration_count"] < iteration_limit:
        iteration = final_report["iteration_count"]

        # --- Step 1: Find and Remove Weakest Correlation ---
        # This step modifies the correlation matrix for the current iteration.
        current_matrix, removal_report = find_and_remove_weakest_correlation(current_matrix)

        # Stop early when nothing is left to remove: further iterations would
        # re-solve the same problem without making progress.
        if removal_report["variables"] is None:
            final_report["termination_reason"] = (
                "No removable correlations remain; no consistent solution was found."
            )
            break

        # --- Step 2: Re-formulate and Re-solve the CSP ---
        # Regenerate constraints and the CSP problem from the modified matrix.
        variables = current_matrix.columns.tolist()
        current_constraints = map_correlation_to_constraints(current_matrix)
        csp_problem = formulate_initial_csp(variables, current_constraints)

        # --- Step 3: Assess Convergence ---
        # Check if the newly formulated problem is consistent.
        # Equation/Rule: Convergence criterion is non_steady_state_solution_exists.
        is_inconsistent, solutions = detect_initial_inconsistency(csp_problem)
        is_consistent = not is_inconsistent

        # Log the results of the current iteration.
        final_report["iteration_log"].append({
            "iteration": iteration + 1,
            "removed_correlation": removal_report,
            "constraint_count": len(current_constraints),
            "is_consistent": is_consistent,
            "solution_count": len(solutions)
        })

        # Increment the iteration counter.
        final_report["iteration_count"] += 1

    # --- Finalization and Reporting ---
    # After the loop, check the termination condition and set the final status.
    if is_consistent:
        final_report["overall_status"] = "SUCCESS"
        final_report["termination_reason"] = f"Consistent solution set found after {final_report['iteration_count']} iterations."
        final_report["final_correlation_matrix"] = current_matrix
        final_report["final_constraints"] = current_constraints
        final_report["final_solutions"] = solutions
    else:
        # Keep the early-exit reason if one was recorded inside the loop.
        if not final_report["termination_reason"]:
            final_report["termination_reason"] = f"Algorithm terminated after reaching the limit of {iteration_limit} iterations without finding a consistent solution."
        final_report["final_correlation_matrix"] = current_matrix # The last attempted matrix
    return final_report
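
The pruning order produced by the Task 8 heuristic is easiest to see on a toy example. The sketch below is an illustration under stated assumptions, not the notebook's code: it stores correlations in a plain dictionary rather than a DataFrame, compares absolute values exactly instead of with `np.isclose`, and replaces the CSP solve with a hypothetical stand-in predicate (`two_links_left`). The correlation values are made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def weakest_pair(correlations: Dict[Pair, float]) -> Optional[Pair]:
    """Return the non-zero pair with the smallest |value|.

    Ties are broken by the lexicographically first pair, because tuples
    (abs_value, pair) are compared element by element.
    """
    live = [(abs(value), pair) for pair, value in correlations.items() if value != 0.0]
    return min(live)[1] if live else None


def prune_until(
    correlations: Dict[Pair, float],
    is_consistent: Callable[[Dict[Pair, float]], bool],
    iteration_limit: int,
) -> Tuple[Dict[Pair, float], List[Pair]]:
    """Zero out the weakest link repeatedly until the check passes or the limit is hit."""
    current = dict(correlations)  # work on a copy
    removed: List[Pair] = []
    while not is_consistent(current) and len(removed) < iteration_limit:
        pair = weakest_pair(current)
        if pair is None:  # nothing left to remove
            break
        current[pair] = 0.0
        removed.append(pair)
    return current, removed


# Made-up values, keyed by canonically sorted variable pairs.
correlations = {
    ('AGE', 'PRI'): 0.62,
    ('AGE', 'ROA'): -0.15,
    ('BOO', 'LIS'): 0.15,
    ('LIS', 'REP'): 0.48,
}


# Hypothetical stand-in for the CSP check: accept once at most two links survive.
def two_links_left(current: Dict[Pair, float]) -> bool:
    return sum(value != 0.0 for value in current.values()) <= 2


final, removed = prune_until(correlations, two_links_left, iteration_limit=10)
print(removed)
```

The two entries with absolute value 0.15 tie, so the lexicographic rule decides which one goes first; the stronger links are untouched once the stand-in check passes.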


In [None]:
# Task 9: Expert Knowledge Integration and Final Constraint Refinement

# =============================================================================
# Task 9, Step 1: Semi-Subjective Expert Knowledge Incorporation
# =============================================================================

def construct_final_cim_constraint_set() -> List[Dict[str, Any]]:
    """
    Constructs the final, expert-refined CIM constraint set as specified in Table 4.

    This function executes Step 1 by programmatically defining the exact 14
    pairwise trend relations that form the final Complex Investment Model (CIM).
    This is a direct and faithful transcription of the expert-validated model
    from the source paper.

    Returns:
        List[Dict[str, Any]]: A list of 14 dictionaries, each representing one
                              expert-defined qualitative constraint.
    """
    # This data structure is a direct transcription of Table 4 from the paper.
    # Each dictionary represents one of the 14 final constraints.
    # Note on σ(X,Y) notation: We interpret this as Y being a function of X.
    # The paper's notation "σ+- REP PRI" is interpreted as REP = f(PRI),
    # meaning an increase in PRI has a supporting, decelerating effect on REP.
    final_constraints = [
        # Row 1: RED UND TA
        {'type': 'RED', 'variables': ('TA', 'UND'), 'source': 'expert_refined'},
        # Row 2: RED AGE ROA
        {'type': 'RED', 'variables': ('AGE', 'ROA'), 'source': 'expert_refined'},
        # Row 3: SUP QUA TA
        {'type': 'SUP', 'variables': ('QUA', 'TA'), 'source': 'expert_refined'},
        # Row 4: SUP MAR LIS
        {'type': 'SUP', 'variables': ('LIS', 'MAR'), 'source': 'expert_refined'},
        # Row 5: SUP MAR REP
        {'type': 'SUP', 'variables': ('MAR', 'REP'), 'source': 'expert_refined'},
        # Row 6: SUP LIS QUA
        {'type': 'SUP', 'variables': ('LIS', 'QUA'), 'source': 'expert_refined'},
        # Row 7: SUP LIS REP
        {'type': 'SUP', 'variables': ('LIS', 'REP'), 'source': 'expert_refined'},
        # Row 8: σ+- REP PRI
        {'type': 'SHAPE', 'shape': '+-', 'variables': ('PRI', 'REP'), 'source': 'expert_refined'},
        # Row 9: SUP QUA PRI
        {'type': 'SUP', 'variables': ('PRI', 'QUA'), 'source': 'expert_refined'},
        # Row 10: SUP BOO TA
        {'type': 'SUP', 'variables': ('BOO', 'TA'), 'source': 'expert_refined'},
        # Row 11: SUP BOO LIS
        {'type': 'SUP', 'variables': ('BOO', 'LIS'), 'source': 'expert_refined'},
        # Row 12: SUP AGE PRI
        {'type': 'SUP', 'variables': ('AGE', 'PRI'), 'source': 'expert_refined'},
        # Row 13: SUP LIS REP (Note: This is a duplicate in the paper's table, we include it for fidelity)
        {'type': 'SUP', 'variables': ('LIS', 'REP'), 'source': 'expert_refined'},
        # Row 14: SUP QUA REP
        {'type': 'SUP', 'variables': ('QUA', 'REP'), 'source': 'expert_refined'},
    ]

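    # Canonically sort variable tuples for the SUP/RED constraints. SHAPE
    # constraints are directional (independent variable first), so their
    # order is preserved as written above.
    for const in final_constraints:
        if const['type'] != 'SHAPE':
            const['variables'] = tuple(sorted(const['variables']))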

    return final_constraints

# =============================================================================
# Task 9, Step 2: Final CIM Constraint Set Validation
# =============================================================================

def validate_final_cim_constraint_set(
    constraint_set: List[Dict[str, Any]],
    expected_variables: Set[str]
) -> Dict[str, Any]:
    """
    Validates the constructed final CIM constraint set against specifications.

    This function executes Step 2. It verifies the constraint count, the
    distribution of constraint types, and ensures all expected variables
    are included in the model.

    Args:
        constraint_set (List[Dict[str, Any]]): The list of constraints to validate.
        expected_variables (Set[str]): The set of all variable names expected
                                        in the CIM model.

    Returns:
        Dict[str, Any]: A report confirming that validation succeeded.

    Raises:
        ValueError: If any check fails; the message lists every failed check.
    """
    errors = []

    # --- 1. Constraint Count Verification ---
    # Equation/Rule: |C_CIM| = 14
    if len(constraint_set) != 14:
        errors.append(f"Expected exactly 14 constraints, but found {len(constraint_set)}.")

    # --- 2. Constraint Type Distribution ---
    # Verify the mix of SUP, RED, and SHAPE constraints matches Table 4.
    type_counts = Counter(c['type'] for c in constraint_set)
    # Based on Table 4: 11 SUP (including duplicate), 2 RED, 1 SHAPE
    if not (type_counts['SUP'] == 11 and type_counts['RED'] == 2 and type_counts['SHAPE'] == 1):
        errors.append(f"Incorrect distribution of constraint types. Found: {dict(type_counts)}.")

    # --- 3. Variable Coverage Assessment ---
    # Ensure all 10 CIM variables participate in at least one constraint.
    model_vars = {var for const in constraint_set for var in const['variables']}
    if model_vars != expected_variables:
        missing = expected_variables - model_vars
        extra = model_vars - expected_variables
        message = "Variable coverage is incorrect. "
        if missing:
            message += f"Missing: {missing}. "
        if extra:
            message += f"Extra: {extra}."
        errors.append(message)

    # --- Final Report ---
    if errors:
        raise ValueError(f"Final CIM constraint set validation failed: {'; '.join(errors)}")

    return {"status": "SUCCESS", "message": "Constraint set conforms to specifications."}

# =============================================================================
# Task 9, Step 3: CIM CSP Final Formulation and Solution
# =============================================================================

def formulate_and_solve_final_cim_csp(
    final_constraints: List[Dict[str, Any]],
    variables: List[str]
) -> List[Dict[str, Tuple]]:
    """
    Formulates and solves the final CIM CSP, including SHAPE constraints.

    This function executes Step 3. It builds the CSP problem from the final
    14 expert-defined constraints, solves it to find all possible scenarios,
    and validates that the number of solutions matches the paper's result (7).

    Args:
        final_constraints (List[Dict[str, Any]]): The final 14 constraints.
        variables (List[str]): The list of 10 CIM variable names.

    Returns:
        List[Dict[str, Tuple]]: The list of 7 valid CIM scenarios.
    """
    # Initialize the CSP solver object.
    problem = Problem()

    # --- 1. Variable Domain Definition ---
    derivative_domain = ['+', '0', '-']
    trend_triplet_domain = list(itertools.product(['+'], derivative_domain, derivative_domain))
    for var in variables:
        problem.addVariable(var, trend_triplet_domain)

    # --- 2. Constraint Encoding (Extended for SHAPE) ---
    sup_constraint = lambda v1, v2: not (v1[1] == '+' and v2[1] == '-')
    red_constraint = lambda v1, v2: not (v1[1] == '+' and v2[1] == '+')

    # Define logic for shape constraints. The first variable in the tuple is the
    # independent variable (X), the second is the dependent (Y).
    shape_constraints = {
        '+-': lambda x, y: not (x[1] == '+' and not (y[1] == '+' and y[2] == '-'))
        # "If X is increasing, Y MUST be increasing and decelerating."
    }

    for const in final_constraints:
        const_type = const['type']
        # Unpack the pair in its stored order; the predicates receive (v1, v2) in this order.
        v1_name, v2_name = const['variables']

        if const_type == 'SUP':
            problem.addConstraint(sup_constraint, (v1_name, v2_name))
        elif const_type == 'RED':
            problem.addConstraint(red_constraint, (v1_name, v2_name))
        elif const_type == 'SHAPE':
            shape_type = const['shape']
            if shape_type not in shape_constraints:
                raise ValueError(f"Unsupported shape type: '{shape_type}'")
            problem.addConstraint(shape_constraints[shape_type], (v1_name, v2_name))

    # --- 3. Solution Generation and Validation ---
    # Solve the fully defined final CIM problem.
    solutions = problem.getSolutions()

    # Equation/Rule: |S_CIM| = 7
    # The final model must produce exactly 7 scenarios.
    if len(solutions) != 7:
        raise RuntimeError(
            f"CIM model solution failed validation. "
            f"Expected 7 scenarios, but found {len(solutions)}."
        )

    # Sort solutions for deterministic output, keyed on every variable's triplet
    # (in the order given by `variables`) so that ties are broken consistently.
    sorted_solutions = sorted(solutions, key=lambda s: tuple(s[v] for v in variables))

    return sorted_solutions

# =============================================================================
# Task 9: Orchestrator Function
# =============================================================================

def finalize_and_solve_cim_model(
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the construction, validation, and solution of the final CIM.

    This function serves as the main entry point for Task 9. It executes the
    three steps to create the expert-defined 14-constraint model and solve it
    to produce the 7 scenarios specified in the paper.

    Args:
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary. Currently unused: the 10 CIM variables are defined
            inline below.

    Returns:
        Dict[str, Any]: A report containing the final constraint set and the
                        validated list of 7 CIM scenarios.
    """
    final_report = {
        "task_name": "Task 9: Expert Knowledge Integration and Final Constraint Refinement",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # Define the 10 variables of the CIM model.
        cim_variables = sorted([
            'UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'
        ])

        # --- Step 1: Construct the final 14-constraint set from Table 4 ---
        final_cim_constraints = construct_final_cim_constraint_set()
        final_report["outputs"]["final_cim_constraints"] = final_cim_constraints

        # --- Step 2: Validate the constructed constraint set ---
        validation_report = validate_final_cim_constraint_set(
            constraint_set=final_cim_constraints,
            expected_variables=set(cim_variables)
        )
        final_report["outputs"]["constraint_set_validation"] = validation_report

        # --- Step 3: Formulate and solve the final CIM CSP ---
        cim_scenarios = formulate_and_solve_final_cim_csp(
            final_constraints=final_cim_constraints,
            variables=cim_variables
        )
        final_report["outputs"]["cim_scenarios"] = cim_scenarios
        final_report["outputs"]["scenario_count"] = len(cim_scenarios)
        final_report["summary_message"] = "Successfully constructed and solved the final CIM, yielding the expected 7 scenarios."

    except (ValueError, TypeError, RuntimeError) as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Final CIM model generation failed: {e}"

    return final_report
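
To see what each Task 9 predicate actually admits, it helps to count surviving triplet pairs for a single two-variable constraint. The sketch below copies the three predicates from `formulate_and_solve_final_cim_csp` as named functions; the helper `count_admitted` and the sample triplets are assumptions introduced for illustration, and the two-variable enumeration says nothing about the full 10-variable model.

```python
import itertools
from typing import Callable, Tuple

Triplet = Tuple[str, str, str]

DERIVATIVES = ['+', '0', '-']
TRIPLETS = list(itertools.product(['+'], DERIVATIVES, DERIVATIVES))


# Predicates copied from the notebook's lambdas (index 1 = first derivative,
# index 2 = second derivative).
def sup_constraint(v1: Triplet, v2: Triplet) -> bool:
    return not (v1[1] == '+' and v2[1] == '-')


def red_constraint(v1: Triplet, v2: Triplet) -> bool:
    return not (v1[1] == '+' and v2[1] == '+')


def shape_plus_minus(x: Triplet, y: Triplet) -> bool:
    # If X is increasing, Y must be increasing and decelerating.
    return not (x[1] == '+' and not (y[1] == '+' and y[2] == '-'))


def count_admitted(predicate: Callable[[Triplet, Triplet], bool]) -> int:
    """Hypothetical helper: how many of the 81 ordered triplet pairs pass."""
    return sum(1 for a, b in itertools.product(TRIPLETS, repeat=2) if predicate(a, b))


admitted = {
    'SUP': count_admitted(sup_constraint),
    'RED': count_admitted(red_constraint),
    'SHAPE +-': count_admitted(shape_plus_minus),
}
print(admitted)

# Sample triplets used to probe argument order.
rising = ('+', '+', '0')
falling = ('+', '-', '0')
rising_decelerating = ('+', '+', '-')

print(sup_constraint(rising, falling))   # rejected
print(sup_constraint(falling, rising))   # admitted: the predicate is order-sensitive
print(shape_plus_minus(rising, rising_decelerating))
```

As written, the SUP predicate rejects (rising, falling) but admits (falling, rising), so the order in which a constraint's variables are stored affects which scenarios survive.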


In [None]:
# Task 10: Differential System Parameter Elimination and Qualitative Translation

# =============================================================================
# Task 10, Step 1 & 2: Combined Qualitative Translation
# =============================================================================

def translate_rrm_odes_to_qualitative_equations() -> List[str]:
    """
    Translates the RRM ODEs into the final qualitative algebraic equations.

    This function executes Steps 1 and 2 of the translation process. Based on
    the principle of fidelity to the source paper, this function does not
    derive the qualitative equations from first principles. Instead, it
    implements the *result* of the paper's translation process, producing the
    exact set of five qualitative equations specified in Equation (11). This
    ensures perfect replication of the model that was actually solved.

    The translation from the original ODEs (Eq. 8) to the qualitative form
    (Eq. 11) involves both constant elimination and qualitative arithmetic,
    as described in the paper.

    Returns:
        List[str]: A list containing the five qualitative RRM equations as strings.
    """
    # This is a direct, high-fidelity transcription of Equation (11) from the paper.
    # This represents the final state after applying constant elimination and
    # qualitative arithmetic rules to the original ODE system (Equation 8).

    # Equation (11), First line: dX/dt = -α(XY/N)  ==>  DX + XY = 0
    eq1 = "DX + XY = 0"

    # Equation (11), Second line: dY/dt = ...  ==>  DY + YY + YZ1 + YZ2 = XY
    eq2 = "DY + YY + YZ1 + YZ2 = XY"

    # Equation (11), Third line: dW/dt = ...  ==>  DW + XY + W = XY
    eq3 = "DW + XY + W = XY"

    # Equation (11), Fourth line: dZ1/dt = ... ==>  DZ1 = YY + YZ1 + W
    eq4 = "DZ1 = YY + YZ1 + W"

    # Equation (11), Fifth line: dZ2/dt = ...  ==>  DZ2 + W = W + YZ2
    eq5 = "DZ2 + W = W + YZ2"

    qualitative_equations = [eq1, eq2, eq3, eq4, eq5]

    return qualitative_equations

# =============================================================================
# Task 10, Step 3: Complete RRM Qualitative System Construction
# =============================================================================

def structure_qualitative_rrm_system(
    qualitative_equations: List[str],
    rrm_variables: Set[str]
) -> List[Dict[str, Any]]:
    """
    Parses the qualitative equation strings into a structured format for the CSP.

    This function executes Step 3. It takes the list of equation strings and
    converts each one into a dictionary with 'equation_string', 'LHS', and
    'RHS' keys: the first holds the original string, the other two hold lists
    of the terms on each side of the equation. It also validates that every
    term present in the equations is a recognized symbol.

    Args:
        qualitative_equations (List[str]): The list of equation strings from Eq. (11).
        rrm_variables (Set[str]): The set of expected RRM state variables
                                   (e.g., {'X', 'Y', 'W', 'Z1', 'Z2'}).

    Returns:
        List[Dict[str, Any]]: A list of structured equation dictionaries.

    Raises:
        ValueError: If an equation string is malformed or contains unknown variables.
    """
    structured_system = []

    # Define the complete set of valid term symbols based on the variables.
    # This includes state variables (X), derivatives (DX), and product terms (XY).
    valid_symbols = rrm_variables.copy()
    valid_symbols.update({f"D{var}" for var in rrm_variables})
    # Generate all possible pairwise product terms.
    for v1 in rrm_variables:
        for v2 in rrm_variables:
            # Canonically order product terms (e.g., XY, not YX)
            valid_symbols.add("".join(sorted((v1, v2))))
    valid_symbols.add('0') # The zero term is also valid.

    for i, eq_str in enumerate(qualitative_equations):
        # --- 1. Parse the Equation String ---
        # Split the equation into Left-Hand Side and Right-Hand Side.
        if '=' not in eq_str:
            raise ValueError(f"Equation {i+1} ('{eq_str}') is malformed: missing '='.")
        lhs_str, rhs_str = eq_str.split('=', 1)

        # Split each side into its constituent terms.
        lhs_terms = [term.strip() for term in lhs_str.split('+')]
        rhs_terms = [term.strip() for term in rhs_str.split('+')]

        # --- 2. Validate Terms ---
        # Check every parsed term against the set of valid symbols.
        all_terms = lhs_terms + rhs_terms
        for term in all_terms:
            if term not in valid_symbols:
                raise ValueError(
                    f"Equation {i+1} ('{eq_str}') contains unrecognized term: '{term}'."
                )

        # --- 3. Store in Structured Format ---
        structured_system.append({
            'equation_string': eq_str,
            'LHS': lhs_terms,
            'RHS': rhs_terms
        })

    return structured_system

# =============================================================================
# Task 10: Orchestrator Function
# =============================================================================

def translate_and_structure_rrm_system(
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the complete translation of the RRM ODEs into a structured system.

    This function serves as the main entry point for Task 10. It executes the
    steps to produce the exact qualitative algebraic equations from the paper
    and then parses them into a structured format suitable for CSP formulation.

    Args:
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary, used to retrieve the RRM variable names.

    Returns:
        Dict[str, Any]: A report containing the final structured qualitative
                        system for the RRM.
    """
    final_report = {
        "task_name": "Task 10: Differential System Parameter Elimination and Qualitative Translation",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Step 1 & 2: Translate RRM ODEs to Qualitative Equations ---
        # This step directly implements the result (Equation 11) from the paper.
        qualitative_equations = translate_rrm_odes_to_qualitative_equations()
        final_report["outputs"]["qualitative_equations_as_strings"] = qualitative_equations

        # --- Step 3: Structure the Qualitative System ---
        # Retrieve the set of RRM variable names for validation.
        rrm_variables = set(_get_nested_param(
            master_input_specification,
            'empirical_data.rrm_system.state_variables'
        ).keys())

        # Parse the equation strings into a structured list of dictionaries.
        structured_rrm_system = structure_qualitative_rrm_system(
            qualitative_equations=qualitative_equations,
            rrm_variables=rrm_variables
        )
        final_report["outputs"]["structured_rrm_system"] = structured_rrm_system
        final_report["summary_message"] = "Successfully translated RRM ODEs into a validated, structured qualitative system."

    except (ValueError, TypeError, KeyError) as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"RRM translation failed: {e}"

    return final_report
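
The parsing and validation step of Task 10 can be tried in isolation. The following is a condensed sketch of the same logic as `structure_qualitative_rrm_system`, split into two hypothetical helpers (`build_valid_symbols` and `parse_equation`) that do not exist in the notebook; the variable set is the example given in the docstring above.

```python
from typing import Dict, List, Set


def build_valid_symbols(rrm_variables: Set[str]) -> Set[str]:
    """State variables, their derivatives (D-prefixed), sorted products, and '0'."""
    symbols = set(rrm_variables)
    symbols.update(f"D{var}" for var in rrm_variables)
    symbols.update(
        "".join(sorted((v1, v2))) for v1 in rrm_variables for v2 in rrm_variables
    )
    symbols.add('0')
    return symbols


def parse_equation(eq_str: str, valid_symbols: Set[str]) -> Dict[str, List[str]]:
    """Split 'a + b = c' into term lists and reject unknown terms."""
    if '=' not in eq_str:
        raise ValueError(f"'{eq_str}' is malformed: missing '='.")
    lhs_str, rhs_str = eq_str.split('=', 1)
    lhs_terms = [term.strip() for term in lhs_str.split('+')]
    rhs_terms = [term.strip() for term in rhs_str.split('+')]
    for term in lhs_terms + rhs_terms:
        if term not in valid_symbols:
            raise ValueError(f"'{eq_str}' contains unrecognized term: '{term}'.")
    return {'LHS': lhs_terms, 'RHS': rhs_terms}


symbols = build_valid_symbols({'X', 'Y', 'W', 'Z1', 'Z2'})
parsed = parse_equation("DY + YY + YZ1 + YZ2 = XY", symbols)
print(parsed)

# A product written in non-canonical order ('YX' instead of 'XY') is rejected.
try:
    parse_equation("DX + YX = 0", symbols)
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```

Because terms are split on `+` only, an equation written with a minus sign would leave the sign inside a term and fail validation; that is consistent with Equation (11), where every term is moved to the side on which it appears as a sum.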


In [None]:
# Task 11: RRM Constraint Satisfaction Problem Formulation

# =============================================================================
# Task 11, Helper: Qualitative Arithmetic Engine
# =============================================================================

def _qualitative_add(qualitative_values: List[str]) -> Set[str]:
    """
    Performs qualitative addition on a list of qualitative trend values.

    This function implements the qualitative addition rules as described in
    Equation (10) of the source paper. It determines the set of possible
    outcomes when summing multiple qualitative trends ('+', '0', '-'). The
    core logic is that if trends of opposite sign are present, the result is
    ambiguous and could be positive, negative, or zero.

    Args:
        qualitative_values (List[str]): A list of strings, where each string
            is a valid qualitative trend value ('+', '0', or '-').

    Returns:
        Set[str]: A set containing all possible qualitative outcomes of the sum.
                  - Returns {'+'} if only positive and zero trends are present.
                  - Returns {'-'} if only negative and zero trends are present.
                  - Returns {'0'} if only zero trends are present.
                  - Returns {'+', '0', '-'} if both positive and negative
                    trends are present (ambiguous outcome).

    Raises:
        TypeError: If `qualitative_values` is not a list.
        ValueError: If the input list contains invalid qualitative symbols.
    """
    # --- Input Validation ---
    # Ensure the input is a list.
    if not isinstance(qualitative_values, list):
        raise TypeError("Input 'qualitative_values' must be a list.")

    # Check for invalid symbols within the list for robustness.
    valid_symbols = {'+', '0', '-'}
    if not all(val in valid_symbols for val in qualitative_values):
        invalid = next((val for val in qualitative_values if val not in valid_symbols), None)
        raise ValueError(f"Input list contains an invalid qualitative symbol: '{invalid}'")

    # --- Qualitative Addition Logic ---
    # Count the number of positive ('+') and negative ('-') terms in the list.
    pos_count = qualitative_values.count('+')
    neg_count = qualitative_values.count('-')

    # Equation/Rule: (+) + (-) -> {+, 0, -}
    # If both positive and negative terms exist, the result is ambiguous.
    if pos_count > 0 and neg_count > 0:
        return {'+', '0', '-'}

    # Equation/Rule: (+) + (+) -> {+} and (+) + (0) -> {+}
    # If only positive terms exist (and possibly zeros), the result is positive.
    elif pos_count > 0:
        return {'+'}

    # Equation/Rule: (-) + (-) -> {-} and (-) + (0) -> {-}
    # If only negative terms exist (and possibly zeros), the result is negative.
    elif neg_count > 0:
        return {'-'}

    # Equation/Rule: (0) + (0) -> {0}
    # If the list contains only zeros (or is empty), the result is zero.
    else:
        return {'0'}


def _qualitative_multiply(v1: str, v2: str) -> str:
    """
    Performs qualitative multiplication of two qualitative trend values.

    This function implements the standard rules of sign multiplication, which
    is the basis for handling product terms in the qualitative equations.

    - (+) * (+) -> (+)
    - (-) * (-) -> (+)
    - (+) * (-) -> (-)
    - (X) * (0) -> (0)

    Args:
        v1 (str): The first qualitative value ('+', '0', or '-').
        v2 (str): The second qualitative value ('+', '0', or '-').

    Returns:
        str: The resulting qualitative value ('+', '0', or '-').

    Raises:
        ValueError: If either input is not a valid qualitative symbol.
    """
    # --- Input Validation ---
    valid_symbols = {'+', '0', '-'}
    if v1 not in valid_symbols or v2 not in valid_symbols:
        raise ValueError(f"Invalid qualitative symbol provided. Got: '{v1}', '{v2}'.")

    # --- Qualitative Multiplication Logic ---
    # Any multiplication by zero results in zero.
    if '0' in (v1, v2):
        return '0'

    # If the signs are the same, the result is positive.
    if v1 == v2:
        return '+'

    # If the signs are different, the result is negative.
    return '-'


# =============================================================================
# Task 11, Helper: Custom CSP Constraint Classes
# =============================================================================

class QualitativeEquationConstraint(Constraint):
    """
    A custom CSP constraint to enforce a qualitative algebraic equation.

    This class provides the core logic for translating a symbolic qualitative
    equation (e.g., "DY + YY = XY") into a callable that the CSP solver can
    use to prune its search space. It evaluates the equation based on the
    currently assigned values for its variables and returns True if the
    equation *could* be satisfied according to the rules of qualitative
    arithmetic. An equation is considered satisfied if the set of possible
    outcomes for the left-hand side has a non-empty intersection with the
    set of possible outcomes for the right-hand side.
    """
    def __init__(self, lhs_terms: List[str], rhs_terms: List[str], all_vars: List[str]):
        """
        Initializes the constraint with the parsed terms of the equation.

        Args:
            lhs_terms (List[str]): A list of term strings on the left-hand side
                                   of the equation (e.g., ['DY', 'YY']).
            rhs_terms (List[str]): A list of term strings on the right-hand side
                                   of the equation (e.g., ['XY']).
            all_vars (List[str]): The complete list of all variable names in the
                                  CSP, used to determine the scope of this constraint.

        Raises:
            TypeError: If terms are not strings or `all_vars` is not a list.
        """
        # --- Input Validation ---
        if not isinstance(lhs_terms, list) or not isinstance(rhs_terms, list):
            raise TypeError("LHS and RHS terms must be provided as lists of strings.")
        if not all(isinstance(term, str) for term in lhs_terms + rhs_terms):
            raise TypeError("All terms within the lists must be strings.")
        if not isinstance(all_vars, list):
            raise TypeError("Argument 'all_vars' must be a list of strings.")

        # Store the parsed terms for later evaluation.
        self._lhs_terms = lhs_terms
        self._rhs_terms = rhs_terms

        # Determine the scope of this constraint: the unique, sorted set of
        # variables that appear in any of its terms. This is crucial for the
        # CSP solver to know when to trigger this constraint check.
        self._variables = sorted([
            v for v in all_vars
            if any(v in term for term in lhs_terms + rhs_terms)
        ])

        # Initialize the parent Constraint class provided by the library.
        super().__init__()

    def __call__(
        self,
        variables: List[str],
        domains: Dict[str, List[Any]],
        assignments: Dict[str, Any],
        forwardcheck: bool = False
    ) -> bool:
        """
        The callback function executed by the CSP solver to check the constraint.

        This method is the core of the constraint. It is called by the solver
        repeatedly during the search process whenever a variable in its scope
        is assigned a value.

        Args:
            variables (List[str]): The list of variable names in the constraint's
                                   scope, as registered with the solver.
            domains (Dict[str, List[Any]]): A dictionary mapping variables to their
                                            current possible domains.
            assignments (Dict[str, Any]): A dictionary of the current assignments
                                          of values (trend triplets) to variables.
            forwardcheck (bool): A flag indicating if the solver is in
                                 forward-checking mode (not used in this logic).

        Returns:
            bool: True if the constraint is satisfied given the current
                  assignments or if it cannot be fully evaluated yet. False
                  if the current assignments create a definitive violation.
        """
        # Extract only the assignments relevant to this constraint's variables.
        current_assignments = {var: assignments[var] for var in self._variables if var in assignments}

        # If not all variables required by this constraint have been assigned a
        # value yet, we cannot fully evaluate the equation. In this case, we
        # must return True to allow the search to continue.
        if len(current_assignments) != len(self._variables):
            return True

        # Define a local helper function to evaluate a single symbolic term
        # based on the current assignments.
        def evaluate_term(term: str) -> str:
            """Evaluates a term like 'DX' or 'XY' into a qualitative value ('+', '0', '-')."""
            # Case 1: Derivative term (e.g., 'DX', 'DY').
            if term.startswith('D') and len(term) > 1:
                var_name = term[1:]
                # The first derivative (DX) is the second element (index 1) of the trend triplet.
                return current_assignments[var_name][1]

            # Case 2: Product term (e.g., 'XY', 'YY').
            elif len(term) == 2 and term[0] in self._variables and term[1] in self._variables:
                v1_name, v2_name = term[0], term[1]
                # The product is of the first derivatives (trends) of the variables.
                dx1 = current_assignments[v1_name][1]
                dx2 = current_assignments[v2_name][1]
                return _qualitative_multiply(dx1, dx2)

            # Case 3: Simple state variable term (e.g., 'W' or 'Z1').
            # In this model's context, a standalone variable in a dynamic
            # equation represents its trend (first derivative). Matching on
            # the full name also covers multi-character variables.
            elif term in self._variables:
                return current_assignments[term][1]

            # Case 4: The zero constant.
            elif term == '0':
                return '0'

            # If the term format is unrecognized, it indicates a setup error.
            raise ValueError(f"Cannot evaluate unknown term format: '{term}'")

        try:
            # Evaluate the qualitative sum of all terms on the Left-Hand Side.
            # This returns a set of possible outcomes (e.g., {'+'}).
            lhs_values = [evaluate_term(t) for t in self._lhs_terms]
            lhs_sum_outcomes = _qualitative_add(lhs_values)

            # Evaluate the qualitative sum of all terms on the Right-Hand Side.
            rhs_values = [evaluate_term(t) for t in self._rhs_terms]
            rhs_sum_outcomes = _qualitative_add(rhs_values)

            # The constraint is satisfied if the set of possible outcomes for the LHS
            # has a non-empty intersection with the set of possible outcomes for the RHS.
            # This correctly handles ambiguity (e.g., {+,0,-} is compatible with {+}).
            return bool(lhs_sum_outcomes.intersection(rhs_sum_outcomes))

        except (KeyError, IndexError):
            # This defensive block handles potential errors if assignments are
            # malformed or a variable is missing. In a correct run, this
            # should not be reached. A violation is reported by returning False.
            return False


class PopulationConservationConstraint(Constraint):
    """
    Custom CSP constraint to enforce population conservation on derivatives.

    This constraint enforces the qualitative equivalent of the conservation law
    for a closed system: dX/dt + dY/dt + dW/dt + dZ1/dt + dZ2/dt = 0. It
    checks if the qualitative sum of the first derivatives (trends) of all
    population variables *can possibly be zero*. This is a crucial constraint
    for ensuring the physical realism of the model's scenarios.
    """
    def __init__(self, rrm_vars: List[str]):
        """
        Initializes the constraint with the list of all population variables.

        Args:
            rrm_vars (List[str]): The list of variable names to be included
                                  in the conservation sum.

        Raises:
            TypeError: If `rrm_vars` is not a list.
        """
        # --- Input Validation ---
        if not isinstance(rrm_vars, list):
            raise TypeError("Input 'rrm_vars' must be a list of strings.")

        # The scope of this constraint includes all specified population variables.
        self._variables = sorted(rrm_vars)

        # Initialize the parent Constraint class.
        super().__init__()

    def __call__(
        self,
        variables: List[str],
        domains: Dict[str, List[Any]],
        assignments: Dict[str, Any],
        forwardcheck: bool = False
    ) -> bool:
        """
        The callback function executed by the CSP solver to check the constraint.
        """
        # Extract the assigned values for the variables in this constraint's scope.
        current_assignments = {var: assignments[var] for var in self._variables if var in assignments}

        # If not all population variables have been assigned a value yet,
        # we cannot evaluate the sum, so we must return True.
        if len(current_assignments) != len(self._variables):
            return True

        # Collect the first derivative (DX, which is at index 1 of the triplet)
        # for each assigned variable.
        derivatives = [assign[1] for assign in current_assignments.values()]

        # Calculate the set of possible outcomes for the qualitative sum of these derivatives.
        possible_outcomes = _qualitative_add(derivatives)

        # The conservation law is satisfied if '0' is among the possible outcomes.
        # This correctly handles ambiguous cases like (+, -) -> {+, 0, -}.
        return '0' in possible_outcomes

# =============================================================================
# Task 11: Orchestrator Function
# =============================================================================

def formulate_rrm_csp(
    structured_rrm_system: List[Dict[str, List[str]]],
    rrm_variables: List[str]
) -> Dict[str, Any]:
    """
    Orchestrates the complete formulation of the RRM Constraint Satisfaction Problem.

    This function serves as the main entry point for Task 11. It constructs a
    formal CSP object that represents the entire RRM system by:
    1. Defining the five RRM state variables and their 9-state trend triplet domains.
    2. Adding a custom n-ary constraint for population conservation based on the
       sum of the variables' first derivatives.
    3. Translating each of the five structured qualitative equations into a
       custom `QualitativeEquationConstraint` and adding it to the problem.

    Args:
        structured_rrm_system (List[Dict[str, List[str]]]): The structured
            qualitative equations generated in Task 10.
        rrm_variables (List[str]): The list of RRM state variable names
                                   (e.g., ['X', 'Y', 'W', 'Z1', 'Z2']).

    Returns:
        Dict[str, Any]: A report containing the fully formulated but unsolved
                        `constraint.Problem` object, ready for the solver.
    """
    # Initialize the final report dictionary.
    final_report = {
        "task_name": "Task 11: RRM Constraint Satisfaction Problem Formulation",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Input Validation ---
        if not rrm_variables:
            raise ValueError("rrm_variables list cannot be empty.")

        # Initialize the CSP solver object from the `python-constraint` library.
        problem = Problem()

        # --- Step 1: RRM Variable Domain Definition ---
        # Define the domain for the first and second derivatives ('+', '0', '-').
        derivative_domain = ['+', '0', '-']

        # The value component is always positive ('+'). The full domain is the
        # Cartesian product of the component domains, resulting in 9 possible states.
        trend_triplet_domain = list(itertools.product(['+'], derivative_domain, derivative_domain))

        # Add each RRM variable to the CSP problem with its defined domain.
        for var in rrm_variables:
            problem.addVariable(var, trend_triplet_domain)

        # --- Step 1 (cont.): Add Population Conservation Constraint ---
        # This constraint ensures the sum of changes in population can be zero.
        conservation_constraint = PopulationConservationConstraint(rrm_variables)
        problem.addConstraint(conservation_constraint, rrm_variables)

        # --- Step 2: Qualitative Equation Constraint Generation ---
        # Iterate through the structured equations and add each as a custom constraint.
        for eq_dict in structured_rrm_system:
            lhs_terms = eq_dict['LHS']
            rhs_terms = eq_dict['RHS']

            # Create an instance of our custom constraint class for the equation.
            equation_constraint = QualitativeEquationConstraint(lhs_terms, rhs_terms, rrm_variables)

            # Add the constraint to the problem, specifying its scope (the variables it involves).
            problem.addConstraint(equation_constraint, equation_constraint._variables)

        # --- Step 3: RRM CSP Complete Specification and Validation ---
        # Perform a final sanity check on the constructed problem.
        num_vars = len(problem.getVariables())
        num_constraints = len(problem.getConstraints())

        # The final problem must have 5 variables and 6 constraints (5 equations + 1 conservation).
        if num_vars != 5 or num_constraints != 6:
            raise RuntimeError(
                f"CSP formulation mismatch. Expected 5 variables and 6 constraints, "
                f"but found {num_vars} variables and {num_constraints} constraints."
            )

        # Store the final, formulated CSP object in the report.
        final_report["outputs"]["rrm_csp_problem"] = problem
        final_report["summary_message"] = "Successfully formulated the RRM CSP with 5 variables and 6 constraints."

    except (ValueError, TypeError, KeyError, RuntimeError) as e:
        # Catch any potential errors during formulation (including the
        # count mismatch raised above) and report failure.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"RRM CSP formulation failed: {e}"

    # Return the comprehensive report.
    return final_report
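
The qualitative arithmetic and the intersection test used by the constraint classes above can be tried in isolation. The following is a minimal, standard-library-only sketch and not the notebook's implementation: `q_add`, `q_mul`, and `equation_may_hold` are hypothetical stand-ins for `_qualitative_add`, `_qualitative_multiply`, and the final check in `QualitativeEquationConstraint.__call__`, and a brute-force loop over `itertools.product` takes the place of the CSP solver.

```python
from itertools import product
from typing import List, Set

SIGNS = ("+", "0", "-")


def q_add(values: List[str]) -> Set[str]:
    """Return the set of signs the sum of the given signs could take."""
    has_pos = "+" in values
    has_neg = "-" in values
    if has_pos and has_neg:
        return {"+", "0", "-"}  # opposite signs: the outcome is ambiguous
    if has_pos:
        return {"+"}
    if has_neg:
        return {"-"}
    return {"0"}  # only zeros, or an empty list


def q_mul(a: str, b: str) -> str:
    """Sign multiplication: zero absorbs, equal signs give '+', else '-'."""
    if "0" in (a, b):
        return "0"
    return "+" if a == b else "-"


def equation_may_hold(lhs: List[str], rhs: List[str]) -> bool:
    """An equation is kept if both sides share at least one possible sign."""
    return bool(q_add(lhs) & q_add(rhs))


# The 9-state trend triplet domain: value fixed at '+', two free derivatives.
domain = list(product(["+"], SIGNS, SIGNS))

# Brute-force view of the conservation rule for five first derivatives:
# keep a combination only if its qualitative sum can be zero.
conserving = [d for d in product(SIGNS, repeat=5) if "0" in q_add(list(d))]

print(len(domain), len(conserving))
print(equation_may_hold(["+", "-"], ["+"]), equation_may_hold(["+"], ["-"]))
```

The conservation filter keeps a combination only when every derivative is zero or when at least one variable rises while another falls, which is why the ambiguous result `{'+', '0', '-'}` counts as compatible with zero.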



# Task 12: RRM Scenario Generation and Validation

# =============================================================================
# Task 12, Step 1: Comprehensive RRM Solution Enumeration
# =============================================================================

def generate_rrm_scenarios(
    rrm_csp_problem: Problem,
    rrm_variables: List[str]
) -> List[Dict[str, Tuple]]:
    """
    Generates the complete set of scenarios for the RRM by solving the CSP.

    This function executes Step 1 of the task. It calls the CSP solver's
    backtracking search algorithm to perform an exhaustive enumeration of all
    valid assignments that satisfy the RRM's qualitative constraints. The
    resulting scenarios are sorted to ensure a deterministic output order.

    Args:
        rrm_csp_problem (Problem): The fully formulated RRM CSP object from Task 11.
        rrm_variables (List[str]): The RRM variable names. They are sorted
                                   internally to define the canonical key
                                   order used for sorting the output.

    Returns:
        List[Dict[str, Tuple]]: A list of all valid RRM scenarios. Each scenario
                                is a dictionary mapping variable names to their
                                assigned trend triplet.
    """
    # --- Input Validation ---
    if not isinstance(rrm_csp_problem, Problem):
        raise TypeError("Input 'rrm_csp_problem' must be a constraint.Problem object.")

    # --- Solve the CSP ---
    # The getSolutions() method performs a complete backtracking search to find
    # all possible valid assignments (scenarios).
    solutions = rrm_csp_problem.getSolutions()

    # --- Sort for Deterministic Output ---
    # To ensure the output is always in the same order, we sort the list of
    # solutions. The sort key is a tuple of the assigned trend triplets,
    # ordered by the canonical (alphabetical) order of variable names.
    canonical_order = sorted(rrm_variables)
    sorted_solutions = sorted(
        solutions,
        key=lambda s: tuple(s[var] for var in canonical_order)
    )

    return sorted_solutions

# =============================================================================
# Task 12, Step 2 & 3: RRM Scenario Validation and Quality Assessment
# =============================================================================

def validate_rrm_scenarios(
    scenarios: List[Dict[str, Tuple]],
    rrm_csp_problem: Problem,
    expected_count: int,
    sample_size_for_recheck: int = 20
) -> Dict[str, Any]:
    """
    Validates the generated RRM scenarios against the paper's specifications.

    This function executes Steps 2 and 3. It performs three critical checks:
    1.  **Count Validation**: Verifies if the number of scenarios matches the
        expected count from the paper (211). This is the primary check.
    2.  **Structural Validation**: Checks a sample of scenarios to ensure they
        have the correct format (e.g., valid trend triplets).
    3.  **Constraint Re-check**: Re-applies all original constraints to a
        sample of solutions to provide a sanity check on the solver's output.

    Args:
        scenarios (List[Dict[str, Tuple]]): The list of generated RRM scenarios.
        rrm_csp_problem (Problem): The original CSP object used to generate them.
        expected_count (int): The exact number of scenarios expected.
        sample_size_for_recheck (int): The number of random scenarios to use
                                       for the structural and constraint re-checks.

    Returns:
        Dict[str, Any]: A report detailing the outcome of all validation checks.

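    Raises:
        RuntimeError: If the scenario count does not match the expected count
                      or a sampled scenario violates a constraint.
        ValueError: If a sampled scenario contains a malformed trend triplet.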
    """
    report = {
        "status": "SUCCESS",
        "checks": {}
    }

    # --- 1. Scenario Count Validation ---
    # Equation/Rule: |S_RRM| = 211
    # This is a critical, non-negotiable validation step.
    actual_count = len(scenarios)
    if actual_count != expected_count:
        # If the count is wrong, the model replication has failed. This is a fatal error.
        raise RuntimeError(
            f"RRM scenario count validation FAILED. "
            f"Expected exactly {expected_count} scenarios, but generated {actual_count}."
        )
    report["checks"]["scenario_count"] = {
        "status": "SUCCESS",
        "expected": expected_count,
        "actual": actual_count
    }

    # Proceed with quality checks only if the count is correct and there are scenarios.
    if not scenarios:
        report["summary_message"] = "Validation passed (0 scenarios expected and found), but quality checks were skipped."
        return report

    # --- 2. Structural Validation on a Sample ---
    # Select a random sample of scenarios to check for structural integrity.
    sample_indices = random.sample(range(actual_count), min(actual_count, sample_size_for_recheck))
    sample_scenarios = [scenarios[i] for i in sample_indices]

    valid_symbols = {'+', '0', '-'}
    for i, scenario in enumerate(sample_scenarios):
        for var, triplet in scenario.items():
            # Check that each assigned value is a tuple of length 3.
            if not (isinstance(triplet, tuple) and len(triplet) == 3):
                raise ValueError(f"Sample scenario {i} has malformed triplet for '{var}': {triplet}")
            # Check that the symbols within the triplet are valid.
            if not (triplet[0] == '+' and triplet[1] in valid_symbols and triplet[2] in valid_symbols):
                raise ValueError(f"Sample scenario {i} has invalid symbols in triplet for '{var}': {triplet}")

    report["checks"]["structural_validation"] = {
        "status": "SUCCESS",
        "message": f"Checked {len(sample_scenarios)} random scenarios for structural integrity."
    }

    # --- 3. Constraint Satisfaction Re-check on a Sample ---
    # This re-applies the same constraint objects as a sanity check on the
    # solver's output.
    constraints = rrm_csp_problem.getConstraints()
    for i, scenario in enumerate(sample_scenarios):
        for const in constraints:
            # The __call__ method of our custom constraints checks satisfaction.
            if not const(const._variables, {}, scenario):
                raise RuntimeError(
                    f"Constraint re-check FAILED. Scenario {i} ({scenario}) "
                    f"was found to violate a constraint: {type(const).__name__} on {const._variables}"
                )

    report["checks"]["constraint_recheck"] = {
        "status": "SUCCESS",
        "message": f"Re-verified all constraints on {len(sample_scenarios)} random scenarios."
    }

    return report

# =============================================================================
# Task 12: Orchestrator Function
# =============================================================================

def generate_and_validate_rrm_scenarios(
    rrm_csp_problem: Problem,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the generation and validation of all RRM scenarios.

    This function serves as the main entry point for Task 12. It:
    1. Solves the RRM CSP to generate the complete set of valid scenarios.
    2. Rigorously validates the output against the paper's specifications,
       most importantly the expected scenario count of 211.

    Args:
        rrm_csp_problem (Problem): The fully formulated RRM CSP from Task 11.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary, used to retrieve expected counts and variable names.

    Returns:
        Dict[str, Any]: A report containing the final list of validated RRM
                        scenarios and the results of the validation checks.
    """
    final_report = {
        "task_name": "Task 12: RRM Scenario Generation and Validation",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Retrieve necessary parameters from the configuration ---
        expected_count = _get_nested_param(
            master_input_specification,
            'empirical_data.rrm_system.qualitative_translation.expected_scenario_count'
        )
        rrm_variables = list(_get_nested_param(
            master_input_specification,
            'empirical_data.rrm_system.state_variables'
        ).keys())

        # --- Step 1: Comprehensive RRM Solution Enumeration ---
        scenarios = generate_rrm_scenarios(
            rrm_csp_problem=rrm_csp_problem,
            rrm_variables=rrm_variables
        )
        final_report["outputs"]["rrm_scenarios"] = scenarios

        # --- Step 2 & 3: Validation and Quality Assessment ---
        validation_report = validate_rrm_scenarios(
            scenarios=scenarios,
            rrm_csp_problem=rrm_csp_problem,
            expected_count=expected_count
        )
        final_report["outputs"]["validation_report"] = validation_report

        final_report["summary_message"] = (
            f"Successfully generated and validated {len(scenarios)} RRM scenarios, "
            "matching the expected count."
        )

    except (TypeError, ValueError, RuntimeError, KeyError) as e:
        # Catch any failure during generation or validation.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"RRM scenario generation or validation failed: {e}"
        final_report["outputs"]["rrm_scenarios"] = [] # Ensure output is an empty list on failure

    return final_report
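
The deterministic ordering and the sampled re-check from Task 12 can be sketched without the solver. This is an illustrative sketch under stated assumptions: `sort_scenarios`, `recheck_sample`, and the toy rule `no_joint_growth` are hypothetical names, the checks are plain callables rather than `Constraint` objects, and the sample is drawn from a seeded `random.Random` instance (the function above uses the unseeded module-level `random.sample`).

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]
Scenario = Dict[str, Triplet]


def sort_scenarios(scenarios: List[Scenario], variables: Sequence[str]) -> List[Scenario]:
    """Order scenarios by their triplets, read in alphabetical variable order."""
    order = sorted(variables)
    return sorted(scenarios, key=lambda s: tuple(s[v] for v in order))


def recheck_sample(
    scenarios: List[Scenario],
    checks: List[Callable[[Scenario], bool]],
    sample_size: int = 20,
    seed: int = 0,
) -> int:
    """Re-apply every check to a reproducible random sample; return its size."""
    rng = random.Random(seed)  # assumption: seeded so a failure can be reproduced
    sample = rng.sample(scenarios, min(len(scenarios), sample_size))
    for scenario in sample:
        for check in checks:
            if not check(scenario):
                raise RuntimeError(f"Scenario {scenario} violates {check.__name__}")
    return len(sample)


def no_joint_growth(scenario: Scenario) -> bool:
    """Toy rule (hypothetical): X and Y may not both be increasing."""
    return not (scenario["X"][1] == "+" and scenario["Y"][1] == "+")


toy = [
    {"X": ("+", "0", "0"), "Y": ("+", "-", "0")},
    {"X": ("+", "+", "0"), "Y": ("+", "-", "0")},
]
ordered = sort_scenarios(toy, ["Y", "X"])
checked = recheck_sample(ordered, [no_joint_growth])
print(ordered[0]["X"], checked)
```

The sort key compares the sign symbols as ordinary strings, so the resulting order follows their code points (`'+'` before `'-'` before `'0'`) rather than any numeric meaning. That is enough for a stable, repeatable listing, which is all the ordering is used for here.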


# Task 13: CIM-RRM Variable Namespace Integration

# =============================================================================
# Task 13, Step 1: Combined Variable Set Construction
# =============================================================================

def construct_integrated_variable_namespace(
    cim_variables: Set[str],
    rrm_variables: Set[str]
) -> List[str]:
    """
    Constructs and validates the unified variable namespace for the Integrated Model.

    This function executes Step 1 of the integration process. It combines the
    variable sets from the CIM and RRM, validates their integrity (ensuring
    they are disjoint and the total count is correct), and returns a
    canonically sorted list of all variables in the integrated model.

    Args:
        cim_variables (Set[str]): A set of the 10 CIM variable names.
        rrm_variables (Set[str]): A set of the 5 RRM variable names.

    Returns:
        List[str]: A single, canonically sorted list containing all 15 unique
                   variable names of the Integrated Model.

    Raises:
        TypeError: If either input is not a set.
        ValueError: If the variable sets are not disjoint or if the final
                    count is not the expected 15.
    """
    # --- Input Validation ---
    if not isinstance(cim_variables, set) or not isinstance(rrm_variables, set):
        raise TypeError("Input variables must be provided as sets.")

    # --- 1. Namespace Conflict Check ---
    # The two sub-models must have completely independent variable names.
    # Equation/Rule: V_CIM ∩ V_RRM = ∅
    if not cim_variables.isdisjoint(rrm_variables):
        conflicting_vars = cim_variables.intersection(rrm_variables)
        raise ValueError(f"Variable name conflict: The following variables exist in both CIM and RRM: {conflicting_vars}")

    # --- 2. Combined Variable Set Construction ---
    # Equation/Rule: V_IM = V_CIM ∪ V_RRM
    integrated_variables = cim_variables.union(rrm_variables)

    # --- 3. Total Variable Count Validation ---
    # The final integrated model should have exactly 15 variables (10 CIM + 5 RRM).
    # Equation/Rule: |V_IM| = 15
    if len(integrated_variables) != 15:
        raise ValueError(f"Expected 15 total variables for the integrated model, but found {len(integrated_variables)}.")

    # Return the unified set as a canonically sorted list for deterministic ordering.
    return sorted(integrated_variables)

# =============================================================================
# Task 13, Step 2 & 3: Constraint Set Integration
# =============================================================================

def construct_integrated_constraint_set(
    final_cim_constraints: List[Dict[str, Any]],
    structured_rrm_system: List[Dict[str, List[str]]],
    integrated_variables: List[str]
) -> List[Dict[str, Any]]:
    """
    Constructs the final constraint set for the Integrated Model.

    This function executes Steps 2 and 3 of the integration. It performs:
    1.  **Union**: Combines the constraint sets from the CIM and RRM.
    2.  **Addition**: Transcribes and adds the 3 expert-defined cross-model
        constraints from Table 7.
    3.  **Validation**: Ensures the final constraint count is correct (22) and
        all variable references are valid.

    Args:
        final_cim_constraints (List[Dict[str, Any]]): The final 14 expert-defined
            constraints for the CIM.
        structured_rrm_system (List[Dict[str, List[str]]]): The 5 structured
            qualitative equations for the RRM.
        integrated_variables (List[str]): The unified list of all 15 model
            variables, used for validation.

    Returns:
        List[Dict[str, Any]]: The final, complete list of 22 constraints for
                              the Integrated Model.
    """
    # --- Step 2: Constraint Set Union Operation ---
    # Equation/Rule: C_base = C_CIM ∪ C_RRM
    # Start with the 14 CIM constraints.
    base_constraints = final_cim_constraints.copy()

    # Add the 5 RRM constraints, maintaining a consistent dictionary structure.
    for rrm_eq in structured_rrm_system:
        base_constraints.append({
            'type': 'RRM_EQUATION',
            'equation': rrm_eq, # Store the structured equation
            'source': 'rrm_translation'
        })

    # Validate the count after the union.
    if len(base_constraints) != 19: # 14 CIM + 5 RRM
        raise RuntimeError(f"Expected 19 base constraints after union, but found {len(base_constraints)}.")

    # --- Step 3: Integration Constraint Addition ---
    # This is a direct transcription of Table 7 from the paper.
    # Note on the σ(X, Y) notation: we interpret it as Y = f(X). For example,
    # "σ+- Z2 REP" means REP = f(Z2): an increase in Z2 has a supporting but
    # decelerating effect on REP.
    integration_constraints = [
        # Row 1: σ+- Z2 REP
        {'type': 'SHAPE', 'shape': '+-', 'variables': ('Z2', 'REP'), 'source': 'integration_expert'},
        # Row 2: σ-- Z1 UND
        {'type': 'SHAPE', 'shape': '--', 'variables': ('Z1', 'UND'), 'source': 'integration_expert'},
        # Row 3: RED W REP
        {'type': 'RED', 'variables': ('W', 'REP'), 'source': 'integration_expert'},
    ]

    # --- Validation of Integration Constraints ---
    integrated_vars_set = set(integrated_variables)
    for const in integration_constraints:
        for var in const['variables']:
            if var not in integrated_vars_set:
                raise ValueError(f"Integration constraint references an unknown variable: '{var}' in {const}")
        # Canonically sort variable tuples for RED/SUP types for consistency.
        if const['type'] in ['RED', 'SUP']:
            const['variables'] = tuple(sorted(const['variables']))

    # Combine the base set with the new integration constraints.
    final_integrated_constraints = base_constraints + integration_constraints

    # Final validation of the total constraint count.
    # Equation/Rule: |C_IM| = 14 + 5 + 3 = 22
    if len(final_integrated_constraints) != 22:
        raise RuntimeError(f"Expected 22 total integrated constraints, but found {len(final_integrated_constraints)}.")

    return final_integrated_constraints

# =============================================================================
# Task 13: Orchestrator Function
# =============================================================================

def integrate_cim_rrm_namespaces(
    final_cim_constraints: List[Dict[str, Any]],
    structured_rrm_system: List[Dict[str, List[str]]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the complete integration of the CIM and RRM namespaces.

    This function serves as the main entry point for Task 13. It combines the
    variables and constraints from the two sub-models and adds the crucial
    cross-model constraints that link their dynamics.

    Args:
        final_cim_constraints (List[Dict[str, Any]]): The final 14 constraints
            for the CIM sub-model.
        structured_rrm_system (List[Dict[str, List[str]]]): The 5 structured
            equations for the RRM sub-model.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary, used to retrieve variable names.

    Returns:
        Dict[str, Any]: A report containing the unified variable list and the
                        final, complete set of 22 constraints for the
                        Integrated Model.
    """
    final_report = {
        "task_name": "Task 13: CIM-RRM Variable Namespace Integration",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Step 1: Combined Variable Set Construction ---
        # The CIM variable names are fixed here; the RRM variable names are
        # retrieved from the configuration.
        cim_variables = {
            'UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'
        }
        rrm_variables = set(_get_nested_param(
            master_input_specification,
            'empirical_data.rrm_system.state_variables'
        ).keys())

        # Construct and validate the unified namespace.
        integrated_variables = construct_integrated_variable_namespace(
            cim_variables=cim_variables,
            rrm_variables=rrm_variables
        )
        final_report["outputs"]["integrated_variables"] = integrated_variables

        # --- Step 2 & 3: Constraint Set Integration ---
        # Combine the sub-model constraints and add the integration constraints.
        integrated_constraints = construct_integrated_constraint_set(
            final_cim_constraints=final_cim_constraints,
            structured_rrm_system=structured_rrm_system,
            integrated_variables=integrated_variables
        )
        final_report["outputs"]["integrated_constraints"] = integrated_constraints

        final_report["summary_message"] = (
            f"Successfully integrated namespaces, creating a model with "
            f"{len(integrated_variables)} variables and {len(integrated_constraints)} constraints."
        )

    except (TypeError, ValueError, KeyError, RuntimeError) as e:
        # Catch any failure during the integration process.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Namespace integration failed: {e}"

    return final_report
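
# -----------------------------------------------------------------------------
# Illustrative sketch (an assumption, not the pipeline's code): one way a
# unified namespace could be built from two variable sets. The helper name
# `_sketch_merge_namespaces` is hypothetical; the actual logic lives in
# `construct_integrated_variable_namespace`, whose internals may differ.
# -----------------------------------------------------------------------------
def _sketch_merge_namespaces(cim_variables, rrm_variables):
    """Returns the sorted union of two disjoint variable-name sets."""
    overlap = set(cim_variables) & set(rrm_variables)
    if overlap:
        # A shared name would make both sub-models refer to one CSP variable.
        raise ValueError(f"Variable names overlap: {sorted(overlap)}")
    return sorted(set(cim_variables) | set(rrm_variables))

# Possible usage with the 10 CIM names above and the 5 RRM names used in
# Task 14; the result is a sorted list of 15 names:
# _sketch_merge_namespaces(
#     {'UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'},
#     {'X', 'Y', 'W', 'Z1', 'Z2'},
# )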


In [None]:
# Task 14: Integrated Model CSP Formulation and Solution

# =============================================================================
# Task 14, Step 1 & 2: Integrated CSP Formulation and Solution
# =============================================================================

def formulate_and_solve_integrated_csp(
    integrated_variables: List[str],
    integrated_constraints: List[Dict[str, Any]]
) -> List[Dict[str, Tuple]]:
    """
    Formulates and solves the complete Integrated Model CSP.

    This function executes Steps 1 and 2. It constructs the final, large-scale
    CSP with all 15 variables and 22 constraints (CIM, RRM, and integration).
    It then calls the solver to perform an exhaustive search for all valid
    scenarios.

    Args:
        integrated_variables (List[str]): The unified and sorted list of all 15
                                          CIM and RRM variable names.
        integrated_constraints (List[Dict[str, Any]]): The complete list of 22
            constraints governing the integrated model.

    Returns:
        List[Dict[str, Tuple]]: A list of all valid integrated scenarios,
                                canonically sorted for deterministic output.

    Raises:
        ValueError: If an unknown constraint type is encountered.
        RuntimeError: If the formulated CSP does not have the expected number
                      of variables or constraints.
    """
    # Initialize the CSP solver object.
    problem = Problem()

    # --- 1. Variable and Domain Definition ---
    # Define the 9-state trend triplet domain.
    derivative_domain = ['+', '0', '-']
    trend_triplet_domain = list(itertools.product(['+'], derivative_domain, derivative_domain))

    # Add all 15 integrated variables to the problem.
    for var in integrated_variables:
        problem.addVariable(var, trend_triplet_domain)

    # --- 2. Constraint Encoding and Addition ---
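    # Define the logic for all supported constraint types. Each predicate
    # receives two trend triplets and inspects the first derivative (index 1)
    # and, for SHAPE constraints, the second derivative (index 2).
    # SUP forbids the first variable rising while the second falls;
    # RED forbids both variables rising together.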
    sup_constraint = lambda v1, v2: not (v1[1] == '+' and v2[1] == '-')
    red_constraint = lambda v1, v2: not (v1[1] == '+' and v2[1] == '+')
    shape_constraints = {
        '+-': lambda x, y: not (x[1] == '+' and not (y[1] == '+' and y[2] == '-')),
        '--': lambda x, y: not (x[1] == '+' and not (y[1] == '-' and y[2] == '-')),
    }

    # Separate RRM variables to define the conservation constraint's scope.
    rrm_variables = [v for v in integrated_variables if v in {'X', 'Y', 'W', 'Z1', 'Z2'}]
    problem.addConstraint(PopulationConservationConstraint(rrm_variables), rrm_variables)

    # Iterate through the master list of 22 constraints and add them.
    for const in integrated_constraints:
        const_type = const['type']

        if const_type in ['SUP', 'RED']:
            v1, v2 = const['variables']
            logic = sup_constraint if const_type == 'SUP' else red_constraint
            problem.addConstraint(logic, (v1, v2))

        elif const_type == 'SHAPE':
            shape_type = const['shape']
            v1, v2 = const['variables'] # Note: Order matters here, Y(X) -> (X, Y)
            if shape_type not in shape_constraints:
                raise ValueError(f"Unsupported SHAPE type: '{shape_type}'")
            problem.addConstraint(shape_constraints[shape_type], (v1, v2))

        elif const_type == 'RRM_EQUATION':
            eq_dict = const['equation']
            eq_constraint = QualitativeEquationConstraint(eq_dict['LHS'], eq_dict['RHS'], rrm_variables)
            problem.addConstraint(eq_constraint, eq_constraint._variables)

        else:
            # This handles any unexpected constraint type in the integrated list.
            raise ValueError(f"Unknown constraint type '{const_type}' in integrated set.")

    # --- 3. Final Formulation Validation ---
    # Sanity check the constructed problem before solving.
    num_vars = len(problem.getVariables())
    num_constraints = len(problem.getConstraints())
    # The solver holds every entry of `integrated_constraints` (expected to be
    # 14 CIM + 5 RRM equations + 3 integration = 22) plus the single
    # population-conservation constraint added above.
    expected_constraints = len(integrated_constraints) + 1
    if num_vars != 15 or num_constraints != expected_constraints:
        raise RuntimeError(
            f"Integrated CSP formulation mismatch. Expected 15 vars and "
            f"{expected_constraints} constraints, but found {num_vars} vars "
            f"and {num_constraints} constraints."
        )

    # --- 4. Integrated Scenario Generation ---
    # Execute the backtracking search to find all solutions.
    solutions = problem.getSolutions()

    # Sort the solutions for a deterministic, canonical output order.
    sorted_solutions = sorted(
        solutions,
        key=lambda s: tuple(s[var] for var in integrated_variables)
    )

    return sorted_solutions
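
# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): how the pairwise SUP and RED
# predicates above prune the 9-state trend-triplet domain. The helper name
# `_sketch_count_allowed_pairs` is hypothetical. It enumerates pairs directly
# with `itertools` rather than calling the CSP solver.
# -----------------------------------------------------------------------------
import itertools


def _sketch_count_allowed_pairs():
    """Counts the triplet pairs admitted by the SUP and RED predicates."""
    derivatives = ['+', '0', '-']
    # Same domain as above: positive value, free first and second derivatives.
    domain = list(itertools.product(['+'], derivatives, derivatives))

    # Same predicate logic as `sup_constraint` and `red_constraint`.
    def sup(v1, v2):
        return not (v1[1] == '+' and v2[1] == '-')

    def red(v1, v2):
        return not (v1[1] == '+' and v2[1] == '+')

    pairs = list(itertools.product(domain, repeat=2))
    return {
        "domain_size": len(domain),
        "all_pairs": len(pairs),
        "sup_allowed": sum(1 for v1, v2 in pairs if sup(v1, v2)),
        "red_allowed": sum(1 for v1, v2 in pairs if red(v1, v2)),
    }

# Each predicate rejects 3 x 3 = 9 of the 81 ordered pairs, so a single SUP or
# RED constraint on its own leaves 72 pairs; the strong pruning reported for
# the Integrated Model comes from combining many such constraints.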

# =============================================================================
# Task 14, Step 3: Integration Result Validation
# =============================================================================

def validate_integrated_scenarios(
    scenarios: List[Dict[str, Tuple]],
    expected_count: int
) -> Dict[str, Any]:
    """
    Validates the generated integrated scenarios against the paper's results.

    This function executes Step 3. It performs two critical validations:
    1.  **Scenario Count**: Verifies the number of scenarios is exactly 14.
    2.  **Variable Grouping**: Confirms that variables within specified groups
        exhibit identical trend behavior across all 14 scenarios.

    Args:
        scenarios (List[Dict[str, Tuple]]): The list of generated integrated scenarios.
        expected_count (int): The exact number of scenarios expected (14).

    Returns:
        Dict[str, Any]: A report detailing the outcome of all validation checks.

    Raises:
        RuntimeError: If the scenario count or grouping behavior does not match
                      the expected results from the paper.
    """
    report = {"status": "SUCCESS", "checks": {}}

    # --- 1. Scenario Count Validation ---
    # Equation/Rule: |S_IM| = 14
    actual_count = len(scenarios)
    if actual_count != expected_count:
        raise RuntimeError(
            f"Integrated Model solution FAILED. "
            f"Expected exactly {expected_count} scenarios, but found {actual_count}."
        )
    report["checks"]["scenario_count"] = {
        "status": "SUCCESS", "expected": expected_count, "actual": actual_count
    }

    # --- 2. Variable Grouping Analysis ---
    # Define the variable groups as specified in the paper's analysis.
    group1 = {'REP', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'BOO', 'PRI'}
    group2 = {'UND', 'ROA'}

    for i, scenario in enumerate(scenarios):
        # For each scenario, check that all variables in a group have the same triplet.
        # Group 1 check:
        group1_triplets = {scenario[var] for var in group1}
        if len(group1_triplets) != 1:
            raise RuntimeError(
                f"Variable grouping validation FAILED in scenario {i+1}. "
                f"Variables in Group 1 {group1} do not have identical triplets. Found: {group1_triplets}"
            )

        # Group 2 check:
        group2_triplets = {scenario[var] for var in group2}
        if len(group2_triplets) != 1:
            raise RuntimeError(
                f"Variable grouping validation FAILED in scenario {i+1}. "
                f"Variables in Group 2 {group2} do not have identical triplets. Found: {group2_triplets}"
            )

    report["checks"]["variable_grouping"] = {
        "status": "SUCCESS", "message": "Grouping behavior for Group 1 and Group 2 confirmed across all scenarios."
    }

    return report

# =============================================================================
# Task 14: Orchestrator Function
# =============================================================================

def formulate_and_solve_integrated_model(
    integrated_variables: List[str],
    integrated_constraints: List[Dict[str, Any]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the formulation, solution, and validation of the Integrated Model.

    This function serves as the main entry point for Task 14. It constructs and
    solves the final, large-scale CSP and rigorously validates the results
    against the paper's key findings.

    Args:
        integrated_variables (List[str]): The unified list of all 15 model variables.
        integrated_constraints (List[Dict[str, Any]]): The complete list of 22
            constraints for the Integrated Model.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary, used to retrieve expected counts.

    Returns:
        Dict[str, Any]: A report containing the final list of 14 validated
                        integrated scenarios and the results of the validation checks.
    """
    final_report = {
        "task_name": "Task 14: Integrated Model CSP Formulation and Solution",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Step 1 & 2: Formulate and Solve the Integrated CSP ---
        scenarios = formulate_and_solve_integrated_csp(
            integrated_variables=integrated_variables,
            integrated_constraints=integrated_constraints
        )
        final_report["outputs"]["integrated_scenarios"] = scenarios

        # --- Step 3: Validate the Results ---
        expected_count = _get_nested_param(
            master_input_specification,
            'scenario_generation.expected_solution_counts.im_scenarios'
        )
        validation_report = validate_integrated_scenarios(
            scenarios=scenarios,
            expected_count=expected_count
        )
        final_report["outputs"]["validation_report"] = validation_report

        final_report["summary_message"] = (
            f"Successfully generated and validated {len(scenarios)} integrated scenarios, "
            "matching the expected count and variable grouping behavior."
        )

    except (TypeError, ValueError, RuntimeError, KeyError) as e:
        # Catch any failure during the process.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Integrated model solution failed: {e}"
        final_report["outputs"]["integrated_scenarios"] = []

    return final_report


In [None]:
# Task 15: Integrated Model Solution Analysis and Interpretation

# =============================================================================
# Task 15, Step 1: Scenario Structure Analysis
# =============================================================================

def structure_and_represent_integrated_scenarios(
    integrated_scenarios: List[Dict[str, Tuple]],
    variable_groups: Dict[str, Set[str]],
    representative_map: Dict[str, str]
) -> pd.DataFrame:
    """
    Transforms the raw list of integrated scenarios into a structured DataFrame.

    This function executes Step 1 of the analysis. It creates a DataFrame
    that mirrors the structure of Table 8 in the paper by:
    1. Using representative variables for the identified CIM groups.
    2. Including all individual RRM variables.
    3. Formatting the trend triplets into a clean string representation.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The list of 14 valid
            integrated scenarios from the CSP solver.
        variable_groups (Dict[str, Set[str]]): A dictionary defining the
            groups of variables with identical behavior (e.g.,
            {'group1': {'REP', 'AGE', ...}}).
        representative_map (Dict[str, str]): A dictionary mapping a group name
            to its chosen representative variable (e.g., {'group1': 'REP'}).

    Returns:
        pd.DataFrame: A DataFrame where each row is a scenario and columns
                      are the representative and RRM variables, indexed from 1 to N.
    """
    # --- Input Validation ---
    if not integrated_scenarios:
        return pd.DataFrame() # Return empty DataFrame if there are no scenarios

    # --- Data Extraction and Formatting ---
    analysis_data = []

    # Columns are the CIM group representatives followed by the RRM variables.
    cim_reps = sorted(representative_map.values())

    # Identify the RRM variables present by intersecting with the known set.
    all_vars_in_scenarios = set(integrated_scenarios[0].keys())
    rrm_vars_present = sorted(all_vars_in_scenarios.intersection({'X', 'Y', 'W', 'Z1', 'Z2'}))

    # Define the final column order for the DataFrame.
    final_columns = cim_reps + rrm_vars_present

    for scenario in integrated_scenarios:
        row_data = {}
        # Extract the triplet for each representative variable.
        for rep_var in cim_reps:
            row_data[rep_var] = "".join(scenario[rep_var])

        # Extract the triplet for each RRM variable.
        for rrm_var in rrm_vars_present:
            row_data[rrm_var] = "".join(scenario[rrm_var])

        analysis_data.append(row_data)

    # --- DataFrame Construction ---
    # Create the DataFrame from the processed data.
    analysis_df = pd.DataFrame(analysis_data, columns=final_columns)

    # Set the index to be scenario numbers (1-based).
    analysis_df.index = pd.RangeIndex(start=1, stop=len(analysis_df) + 1, name="Scenario No.")

    return analysis_df

# =============================================================================
# Task 15, Step 2: Solution Space Reduction Analysis
# =============================================================================

def analyze_solution_space_reduction(
    num_cim_scenarios: int,
    num_rrm_scenarios: int,
    num_im_scenarios: int
) -> Dict[str, Any]:
    """
    Quantifies the pruning effect of the model integration.

    This function executes Step 2 of the analysis. It calculates the theoretical
    maximum number of scenarios and compares it to the actual number found,
    computing the solution space reduction factor.

    Args:
        num_cim_scenarios (int): The number of valid scenarios for the CIM (7).
        num_rrm_scenarios (int): The number of valid scenarios for the RRM (211).
        num_im_scenarios (int): The number of valid scenarios for the Integrated
                                Model (14).

    Returns:
        Dict[str, Any]: A dictionary containing the analysis metrics.
    """
    # --- Input Validation ---
    if not all(isinstance(n, int) and n >= 0 for n in [num_cim_scenarios, num_rrm_scenarios, num_im_scenarios]):
        raise TypeError("Scenario counts must be non-negative integers.")

    # --- 1. Theoretical Maximum Calculation ---
    # The theoretical space is the Cartesian product of the two sub-models' solution spaces.
    # Equation: Theoretical Max = |S_CIM| * |S_RRM|
    theoretical_max = num_cim_scenarios * num_rrm_scenarios

    # --- 2. Reduction Factor Calculation ---
    # The reduction factor is the fraction of the theoretical space that
    # remains valid after integration (a smaller value means stronger pruning).
    # Equation: Reduction Factor = |S_IM| / Theoretical Max
    if theoretical_max > 0:
        reduction_factor = num_im_scenarios / theoretical_max
    else:
        reduction_factor = 0.0 if num_im_scenarios == 0 else float('inf')

    # --- Report Generation ---
    report = {
        "cim_scenario_count": num_cim_scenarios,
        "rrm_scenario_count": num_rrm_scenarios,
        "theoretical_max_scenarios": theoretical_max,
        "actual_integrated_scenarios": num_im_scenarios,
        "reduction_factor": reduction_factor,
        "retained_percentage": f"{reduction_factor:.2%}"
    }

    return report
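
# -----------------------------------------------------------------------------
# Illustrative sketch: the arithmetic above, worked for the scenario counts
# quoted in the docstring (7 CIM, 211 RRM, 14 integrated). The helper name
# `_sketch_retained_fraction` is hypothetical and exists only for illustration.
# -----------------------------------------------------------------------------
def _sketch_retained_fraction(num_cim=7, num_rrm=211, num_im=14):
    """Returns (theoretical_max, retained_fraction) for the given counts."""
    theoretical_max = num_cim * num_rrm  # 7 * 211 = 1477 combinations
    return theoretical_max, num_im / theoretical_max

# With the default counts, 14 of 1477 combinations survive, which formats as
# "0.95%" under the same `:.2%` specifier used in the report.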

# =============================================================================
# Task 15: Orchestrator Function
# =============================================================================

def analyze_and_interpret_integrated_solutions(
    integrated_scenarios: List[Dict[str, Tuple]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the analysis and interpretation of the integrated model's solutions.

    This function serves as the main entry point for Task 15. It:
    1. Structures the raw scenario solutions into a representative DataFrame.
    2. Analyzes the dramatic reduction in the solution space due to integration.
    3. Performs high-level validation of the economic interpretations.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The final 14 validated
            scenarios for the Integrated Model.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary.

    Returns:
        Dict[str, Any]: A report containing the analysis DataFrame and metrics.
    """
    final_report = {
        "task_name": "Task 15: Integrated Model Solution Analysis and Interpretation",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Step 1: Scenario Structure Analysis ---
        # Define variable groups and representatives as specified in the paper.
        variable_groups = {
            'group1': {'REP', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'BOO', 'PRI'},
            'group2': {'UND', 'ROA'}
        }
        representative_map = {'group1': 'REP', 'group2': 'ROA'}

        # Create the analysis DataFrame.
        analysis_df = structure_and_represent_integrated_scenarios(
            integrated_scenarios=integrated_scenarios,
            variable_groups=variable_groups,
            representative_map=representative_map
        )
        final_report["outputs"]["analysis_dataframe"] = analysis_df

        # --- Step 2: Solution Space Reduction Analysis ---
        # Retrieve the required scenario counts from the configuration.
        num_cim = _get_nested_param(master_input_specification, 'scenario_generation.expected_solution_counts.cim_scenarios')
        num_rrm = _get_nested_param(master_input_specification, 'scenario_generation.expected_solution_counts.rrm_scenarios')
        num_im = len(integrated_scenarios)

        # Calculate and store the reduction metrics.
        reduction_report = analyze_solution_space_reduction(num_cim, num_rrm, num_im)
        final_report["outputs"]["solution_space_reduction"] = reduction_report

        # --- Step 3: Economic Interpretation Validation (Programmatic Check) ---
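        # This is a simple check to verify a key claim from the paper's analysis.
        # Claim: Scenarios with REP=(+,+,+) have ROA=(+,-,-).
        # Note: if no scenario has REP='+++', `.all()` on the empty selection is
        # True, so the claim passes vacuously and `scenarios_checked` is 0.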
        scenarios_with_optimal_rep = analysis_df[analysis_df['REP'] == '+++']
        is_claim_valid = (scenarios_with_optimal_rep['ROA'] == '+--').all()

        final_report["outputs"]["economic_interpretation_validation"] = {
            "claim": "In scenarios where REP is optimal ('+++'), ROA is pessimal ('+--').",
            "is_claim_valid": bool(is_claim_valid),
            "scenarios_checked": len(scenarios_with_optimal_rep)
        }
        if not is_claim_valid:
            final_report["overall_status"] = "WARNING"
            final_report["summary_message"] = "Economic interpretation check failed to validate a key claim from the paper."
        else:
            final_report["summary_message"] = "Successfully analyzed scenarios, quantified solution space reduction, and validated key economic interpretations."

    except (TypeError, ValueError, KeyError) as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Solution analysis failed: {e}"

    return final_report


In [None]:
# Task 16: Transition Rule Implementation and Graph Node Definition

# =============================================================================
# Task 16, Step 1: Transition Rule Table Implementation
# =============================================================================

def implement_transition_rules() -> Dict[Tuple[str, str, str], List[Tuple[str, str, str]]]:
    """
    Implements the transition rules from Table 2 as a computable lookup dictionary.

    This function executes Step 1 by performing a high-fidelity transcription
    of the 9 transition rules for a positively valued variable into a Python
    dictionary. This structure provides an efficient (O(1) average time) lookup
    for determining the valid next states from any given state.

    Returns:
        Dict[Tuple, List[Tuple]]: A dictionary where keys are "from" state
                                  triplets and values are lists of valid "to"
                                  state triplets.
    """
    # This is a direct transcription of Table 2 from the paper.
    # Keys are the 'From' states, values are a list of all possible 'To' states.
    # Tuples are used for keys to ensure hashability.
    transition_rule_map = {
        # Rule 1: +++ -> ++0
        ('+', '+', '+'): [('+', '+', '0')],
        # Rule 2: ++0 -> +++, ++-
        ('+', '+', '0'): [('+', '+', '+'), ('+', '+', '-')],
        # Rule 3: ++- -> ++0, +0-, +00
        ('+', '+', '-'): [('+', '+', '0'), ('+', '0', '-'), ('+', '0', '0')],
        # Rule 4: +0+ -> +++
        ('+', '0', '+'): [('+', '+', '+')],
        # Rule 5: +00 -> +++, +--
        ('+', '0', '0'): [('+', '+', '+'), ('+', '-', '-')],
        # Rule 6: +0- -> +--
        ('+', '0', '-'): [('+', '-', '-')],
        # Rule 7: +-+ -> +-0, +0+, +00
        ('+', '-', '+'): [('+', '-', '0'), ('+', '0', '+'), ('+', '0', '0')],
        # Rule 8: +-0 -> +-+, +--
        ('+', '-', '0'): [('+', '-', '+'), ('+', '-', '-')],
        # Rule 9: +-- -> +-0
        ('+', '-', '-'): [('+', '-', '0')],
    }

    # --- Validation ---
    # Ensure all 9 trend triplets of a positively valued variable (value '+',
    # first and second derivatives each in {'+', '0', '-'}) have a rule.
    if len(transition_rule_map) != 9:
        raise ValueError(f"Transition rule map is incomplete. Expected 9 rules, found {len(transition_rule_map)}.")

    return transition_rule_map
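
# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): a breadth-first walk over a
# rule table to see which triplets can eventually follow a given start state.
# The names `_sketch_reachable_triplets` and `_SKETCH_RULES` are hypothetical;
# the table is a compact string form of the nine rules transcribed above.
# -----------------------------------------------------------------------------
from collections import deque


def _sketch_reachable_triplets(start, rules):
    """Returns every state reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for next_state in rules.get(state, []):
            if next_state not in seen:
                seen.add(next_state)
                queue.append(next_state)
    return seen


_SKETCH_RULES = {
    '+++': ['++0'],
    '++0': ['+++', '++-'],
    '++-': ['++0', '+0-', '+00'],
    '+0+': ['+++'],
    '+00': ['+++', '+--'],
    '+0-': ['+--'],
    '+-+': ['+-0', '+0+', '+00'],
    '+-0': ['+-+', '+--'],
    '+--': ['+-0'],
}

# Under these nine rules every triplet can eventually reach every other one,
# so any disconnection in the scenario graph comes from the requirement that
# all variables transition simultaneously, not from the single-variable rules.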

# =============================================================================
# Task 16, Step 2 & 3: Graph Node Definition and Transition Identification
# =============================================================================

def define_graph_nodes_and_identify_transitions(
    integrated_scenarios: List[Dict[str, Tuple]],
    transition_rules: Dict[Tuple[str, str, str], List[Tuple[str, str, str]]]
) -> Tuple[nx.DiGraph, List[Tuple[int, int]]]:
    """
    Defines graph nodes from scenarios and identifies all valid transitions.

    This function executes Steps 2 and 3. It first creates a directed graph
    and populates it with nodes, where each node represents one of the 14
    integrated scenarios. It then exhaustively checks every possible pair of
    scenarios to determine if a valid transition exists between them according
    to the provided rules.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The list of 14 valid
            integrated scenarios.
        transition_rules (Dict[Tuple, List[Tuple]]): The computable lookup map
            of transition rules.

    Returns:
        Tuple[nx.DiGraph, List[Tuple[int, int]]]: A tuple containing:
            - nx.DiGraph: The graph object populated with 14 nodes, each
              containing its full scenario data as an attribute.
            - List[Tuple[int, int]]: A list of all valid directed edges,
              represented as (source_node_id, target_node_id) tuples.
    """
    # --- Step 2: Graph Node Construction from Scenarios ---
    # Initialize a directed graph object using networkx.
    graph = nx.DiGraph()

    # The paper uses 1-based indexing for scenarios.
    for i, scenario_data in enumerate(integrated_scenarios):
        scenario_id = i + 1
        # Add a node to the graph for each scenario.
        # The full scenario dictionary is stored as a node attribute for later analysis.
        graph.add_node(scenario_id, scenario_data=scenario_data)

    # --- Step 3: Valid Transition Identification ---
    # This is the core algorithm for determining the graph's structure (edges).
    valid_edges = []

    # Get the list of all variable names from the first scenario.
    all_variables = sorted(integrated_scenarios[0].keys())

    # Get the list of all node IDs (1 to 14).
    node_ids = list(graph.nodes)

    # Iterate through all ordered pairs of nodes (scenarios) to check for transitions.
    for source_id in node_ids:
        for target_id in node_ids:
            # A transition to oneself is not considered in this model's dynamics.
            if source_id == target_id:
                continue

            # Assume the transition is valid until a violation is found.
            is_transition_globally_valid = True

            # Retrieve the full scenario data from the node attributes.
            source_scenario = graph.nodes[source_id]['scenario_data']
            target_scenario = graph.nodes[target_id]['scenario_data']

            # A transition from source to target is valid ONLY IF the transition
            # is valid for EVERY variable simultaneously.
            for var in all_variables:
                source_triplet = source_scenario[var]
                target_triplet = target_scenario[var]

                # A variable's state can remain the same.
                if source_triplet == target_triplet:
                    continue

                # Check if the target triplet is a valid evolution from the source triplet.
                allowed_transitions = transition_rules.get(source_triplet, [])
                if target_triplet not in allowed_transitions:
                    # If even one variable has an invalid transition, the entire
                    # scenario-to-scenario transition is invalid.
                    is_transition_globally_valid = False
                    # Fail fast and break the inner loop over variables.
                    break

            # If the transition was found to be valid for all variables,
            # add the corresponding edge to our list.
            if is_transition_globally_valid:
                valid_edges.append((source_id, target_id))

    return graph, valid_edges
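
# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): the per-pair test used above,
# condensed into one expression and applied to toy two-variable scenarios.
# The helper name and the toy scenarios are hypothetical; the three rules are
# a subset of the table returned by `implement_transition_rules`.
# -----------------------------------------------------------------------------
def _sketch_is_scenario_transition_valid(source, target, rules):
    """True if every variable keeps its triplet or follows a listed rule."""
    return all(
        source[var] == target[var] or target[var] in rules.get(source[var], [])
        for var in source
    )


_SKETCH_TOY_RULES = {
    ('+', '+', '+'): [('+', '+', '0')],
    ('+', '+', '0'): [('+', '+', '+'), ('+', '+', '-')],
    ('+', '+', '-'): [('+', '+', '0'), ('+', '0', '-'), ('+', '0', '0')],
}
_SKETCH_A = {'REP': ('+', '+', '+'), 'ROA': ('+', '+', '0')}
_SKETCH_B = {'REP': ('+', '+', '0'), 'ROA': ('+', '+', '0')}
_SKETCH_C = {'REP': ('+', '+', '-'), 'ROA': ('+', '+', '0')}

# A -> B is valid (REP follows +++ -> ++0 while ROA is unchanged), whereas
# A -> C is not, because +++ -> ++- skips the intermediate state ++0.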

# =============================================================================
# Task 16: Orchestrator Function
# =============================================================================

def prepare_transition_graph_components(
    integrated_scenarios: List[Dict[str, Tuple]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the preparation of the transitional graph's components.

    This function serves as the main entry point for Task 16. It:
    1. Implements the transition rules from the paper into a computable format.
    2. Creates the graph nodes, one for each of the 14 integrated scenarios.
    3. Identifies all valid directed edges between the nodes based on the rules.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The final 14 validated
            scenarios for the Integrated Model.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary (used for consistency, not direct parameters here).

    Returns:
        Dict[str, Any]: A report containing the graph with nodes defined and
                        the complete list of valid edges to be added.
    """
    final_report = {
        "task_name": "Task 16: Transition Rule Implementation and Graph Node Definition",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Input Validation ---
        if not integrated_scenarios or len(integrated_scenarios) != 14:
            raise ValueError("Input must be the list of 14 integrated scenarios.")

        # --- Step 1: Implement Transition Rules ---
        transition_rules = implement_transition_rules()
        final_report["outputs"]["transition_rule_map"] = transition_rules

        # --- Step 2 & 3: Define Nodes and Identify Transitions ---
        graph_with_nodes, valid_edges = define_graph_nodes_and_identify_transitions(
            integrated_scenarios=integrated_scenarios,
            transition_rules=transition_rules
        )

        final_report["outputs"]["graph_with_nodes"] = graph_with_nodes
        final_report["outputs"]["identified_edges"] = valid_edges

        final_report["summary_message"] = (
            f"Successfully defined {len(graph_with_nodes.nodes)} graph nodes and "
            f"identified {len(valid_edges)} potential transitions."
        )

    except (TypeError, ValueError, KeyError) as e:
        # Catch any failure during the process.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Graph component preparation failed: {e}"

    return final_report


In [None]:
# Task 17: Directed Graph Edge Construction and Connectivity Analysis

# =============================================================================
# Task 17, Step 1, 2 & 3: Graph Construction and Analysis
# =============================================================================

def construct_and_analyze_transition_graph(
    graph_with_nodes: nx.DiGraph,
    identified_edges: List[Tuple[int, int]],
    path_cutoff: int = 5
) -> Tuple[nx.DiGraph, Dict[str, Any]]:
    """
    Constructs the full transitional graph and performs a comprehensive analysis.

    This function executes all steps of Task 17. It:
    1.  Constructs the final graph by adding the identified edges.
    2.  Performs connectivity analysis to find partitions (weakly connected
        components) and strongly connected components.
    3.  Computes other key graph properties, including cycles and terminal nodes.

    Args:
        graph_with_nodes (nx.DiGraph): The graph object from Task 16,
            populated with 14 nodes.
        identified_edges (List[Tuple[int, int]]): The list of valid directed
            edges identified in Task 16.
        path_cutoff (int): The maximum length for path enumeration to prevent
                           excessive computation time in graphs with cycles.

    Returns:
        Tuple[nx.DiGraph, Dict[str, Any]]: A tuple containing:
            - nx.DiGraph: The final, fully constructed transitional graph.
            - Dict[str, Any]: A comprehensive report detailing the graph's
              structural properties and connectivity analysis.
    """
    # --- Input Validation ---
    if not isinstance(graph_with_nodes, nx.DiGraph):
        raise TypeError("Input 'graph_with_nodes' must be a networkx.DiGraph object.")

    # Work on a copy of the graph to avoid modifying the input object.
    graph = graph_with_nodes.copy()

    # --- Step 1: Edge Set Construction ---
    # Add all the valid, pre-identified edges to the graph in one operation.
    graph.add_edges_from(identified_edges)

    # --- Initialize the analysis report ---
    analysis_report = {
        "node_count": graph.number_of_nodes(),
        "edge_count": graph.number_of_edges(),
        "connectivity": {},
        "properties": {}
    }

    # --- Step 2: Graph Connectivity Analysis ---
    # Find partitions, which are the weakly connected components. These are
    # subgraphs where a path exists between any two nodes, ignoring edge direction.
    partitions = [
        sorted(list(component))
        for component in nx.weakly_connected_components(graph)
    ]
    analysis_report["connectivity"]["partitions"] = sorted(partitions)

    # Find strongly connected components (SCCs). These are subgraphs where every
    # node is reachable from every other node within that component.
    sccs = [
        sorted(list(component))
        for component in nx.strongly_connected_components(graph)
    ]
    analysis_report["connectivity"]["strongly_connected_components"] = sorted(sccs)

    # Build a full reachability map. This can be slow for large graphs but is
    # cheap for this small one. Descendants are sorted for deterministic output.
    reachability_map = {
        node: sorted(nx.descendants(graph, node))
        for node in graph.nodes()
    }
    analysis_report["connectivity"]["reachability_map"] = reachability_map

    # --- Step 3: Graph Property Computation ---
    # Find all simple cycles in the graph. A simple cycle has no repeated nodes.
    cycles = list(nx.simple_cycles(graph))
    analysis_report["properties"]["simple_cycles"] = cycles

    # Identify terminal (sink) nodes, which are states with no exit transitions.
    # A node is a terminal node if its out-degree is 0.
    terminal_nodes = [
        node for node in graph.nodes() if graph.out_degree(node) == 0
    ]
    analysis_report["properties"]["terminal_nodes"] = terminal_nodes

    # Path enumeration is computationally expensive and is omitted from the main
    # report but could be performed on-demand for specific source-target pairs.
    analysis_report["properties"]["path_enumeration_note"] = (
        f"Full path enumeration not performed by default. Use nx.all_simple_paths "
        f"with a cutoff (e.g., {path_cutoff}) for specific queries."
    )

    return graph, analysis_report

# =============================================================================
# Task 17: Orchestrator Function
# =============================================================================

def build_and_analyze_transition_graph(
    graph_components: Dict[str, Any],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the construction and analysis of the final transitional graph.

    This function serves as the main entry point for Task 17. It takes the
    outputs of Task 16 (the graph with nodes and the list of valid edges)
    and performs the final assembly and a deep structural analysis.

    Args:
        graph_components (Dict[str, Any]): The output dictionary from Task 16,
            containing 'graph_with_nodes' and 'identified_edges'.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary (unused here, for API consistency).

    Returns:
        Dict[str, Any]: A report containing the final, fully constructed
                        `networkx.DiGraph` object and a detailed analysis of
                        its structural properties.
    """
    final_report = {
        "task_name": "Task 17: Directed Graph Edge Construction and Connectivity Analysis",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Input Retrieval ---
        # Retrieve the necessary components from the previous task's output.
        graph_with_nodes = graph_components.get("graph_with_nodes")
        identified_edges = graph_components.get("identified_edges")

        # --- Input Validation ---
        if not isinstance(graph_with_nodes, nx.DiGraph) or identified_edges is None:
            raise ValueError("Input 'graph_components' is missing required keys or has incorrect types.")

        # --- Execute Construction and Analysis ---
        final_graph, analysis_report = construct_and_analyze_transition_graph(
            graph_with_nodes=graph_with_nodes,
            identified_edges=identified_edges
        )

        # --- Populate Final Report ---
        final_report["outputs"]["final_transition_graph"] = final_graph
        final_report["outputs"]["graph_analysis_report"] = analysis_report

        final_report["summary_message"] = (
            f"Successfully constructed and analyzed the transition graph. "
            f"Found {len(analysis_report['connectivity']['partitions'])} partition(s)."
        )

    except (TypeError, ValueError, KeyError) as e:
        # Catch any failure during the process.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Graph construction and analysis failed: {e}"

    return final_report
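
The connectivity measures collected by `construct_and_analyze_transition_graph` can be tried out on a small toy graph. The sketch below is illustrative only: its five edges are made up and are not the model's 14-node transitional graph. It also shows the kind of on-demand `nx.all_simple_paths` query with a `cutoff` that the report's path-enumeration note refers to.

```python
import networkx as nx

# Hypothetical toy edges, not taken from the paper's transitional graph.
toy_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)]
toy_graph = nx.DiGraph(toy_edges)

# Weakly connected components ignore edge direction (the "partitions").
partitions = sorted(sorted(c) for c in nx.weakly_connected_components(toy_graph))

# Strongly connected components require mutual reachability along edge direction.
sccs = sorted(sorted(c) for c in nx.strongly_connected_components(toy_graph))

# Simple cycles and sink (terminal) nodes.
cycles = list(nx.simple_cycles(toy_graph))
sinks = sorted(n for n in toy_graph.nodes if toy_graph.out_degree(n) == 0)

# Bounded path enumeration for one source-target pair; cutoff counts edges.
paths_within_3 = list(nx.all_simple_paths(toy_graph, source=1, target=4, cutoff=3))
paths_within_2 = list(nx.all_simple_paths(toy_graph, source=1, target=4, cutoff=2))

print(partitions)
print(sccs)
print(sinks)
print(paths_within_3)
```

In this toy graph, nodes 1 to 3 form one strongly connected component inside the larger weak component `{1, 2, 3, 4}`, which is why the two notions are reported separately.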


In [None]:
# Task 18: Graph Visualization and Structural Analysis

# =============================================================================
# Task 18, Step 1: Partition Structure Validation
# =============================================================================

def validate_graph_partition_structure(
    graph_analysis_report: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Validates the partition structure of the graph against the paper's claims.

    This function executes Step 1 of the task. It rigorously checks that the
    computed partitions (weakly connected components) of the transitional graph
    exactly match the two disconnected subgraphs described in the paper.

    Args:
        graph_analysis_report (Dict[str, Any]): The analysis report from Task 17,
            containing the computed partitions.

    Returns:
        Dict[str, Any]: A report confirming the validation status.

    Raises:
        RuntimeError: If the partition structure does not exactly match the
                      expected structure.
    """
    # --- 1. Define the Expected Structure ---
    # As per the paper's analysis, there should be two distinct partitions.
    expected_partition_1 = {1, 2, 3, 4, 5}
    expected_partition_2 = {6, 7, 8, 9, 10, 11, 12, 13, 14}
    expected_partitions_set = {
        frozenset(expected_partition_1),
        frozenset(expected_partition_2)
    }

    # --- 2. Retrieve and Check Partition Count ---
    # Retrieve the computed partitions from the analysis report.
    computed_partitions = graph_analysis_report.get("connectivity", {}).get("partitions", [])

    # The number of partitions must be exactly 2.
    if len(computed_partitions) != 2:
        raise RuntimeError(
            f"Partition validation FAILED. Expected 2 partitions, but found {len(computed_partitions)}."
        )

    # --- 3. Check Partition Membership ---
    # Convert the computed partitions to a set of frozensets for order-agnostic comparison.
    computed_partitions_set = {frozenset(p) for p in computed_partitions}

    # The set of computed partitions must be identical to the set of expected partitions.
    if computed_partitions_set != expected_partitions_set:
        raise RuntimeError(
            f"Partition validation FAILED. The membership of the computed partitions "
            f"does not match the expected structure. "
            f"Expected: {expected_partitions_set}, Found: {computed_partitions_set}"
        )

    # --- 4. Return Success Report ---
    report = {
        "status": "SUCCESS",
        "message": "Graph partition structure successfully validated against the paper's specification.",
        "validated_partitions": computed_partitions
    }
    return report

# =============================================================================
# Task 18, Step 2: Graph Visualization
# =============================================================================

def visualize_transition_graph(
    graph: nx.DiGraph,
    title: str = "Transitional Graph of the Integrated Model"
) -> plt.Figure:
    """
    Generates a high-quality visualization of the transitional graph.

    This function executes Step 2. It creates a visual representation of the
    graph that is a faithful replica of Figure 2 from the paper. This is
    achieved by manually specifying the layout positions of each node to
    ensure a clear, hierarchical, and partitioned structure.

    Args:
        graph (nx.DiGraph): The final, fully constructed transitional graph.
        title (str): The title for the plot.

    Returns:
        plt.Figure: The matplotlib Figure object containing the rendered graph.
    """
    # --- 1. Define the Custom Layout ---
    # These positions are manually defined to replicate Figure 2.
    # The layout clearly separates the two partitions.
    pos = {
        1: (0, 2), 2: (2, 3), 3: (4, 2), 4: (2, 2), 5: (2, 1),
        6: (6, 2), 7: (8, 3), 8: (8, 1), 9: (6, 0), 10: (8, 2),
        11: (10, 3), 12: (8, 0), 13: (10, 1), 14: (10, 2)
    }

    # --- 2. Create the Plot ---
    # Initialize a matplotlib figure with a specific size for good aspect ratio.
    fig, ax = plt.subplots(figsize=(12, 7))

    # --- 3. Define Professional Styling Parameters ---
    node_style = {
        "node_size": 1200,
        "node_color": "white",
        "edgecolors": "black",
        "linewidths": 1.5
    }
    label_style = {
        "font_size": 12,
        "font_family": "serif",
        "font_weight": "bold"
    }
    edge_style = {
        "width": 1.5,
        "arrowstyle": "-|>",
        "arrowsize": 15,
        "edge_color": "black",
        "node_size": 1200 # To ensure arrows stop at the node edge
    }

    # --- 4. Draw the Graph Components ---
    # Draw the nodes.
    nx.draw_networkx_nodes(graph, pos, ax=ax, **node_style)

    # Draw the edges with arrows.
    nx.draw_networkx_edges(graph, pos, ax=ax, **edge_style)

    # Draw the node labels.
    nx.draw_networkx_labels(graph, pos, ax=ax, **label_style)

    # --- 5. Finalize the Plot ---
    # Set the title and remove the axes for a clean, publication-quality look.
    ax.set_title(title, fontsize=16, fontweight='bold', family='serif')
    ax.axis('off')
    fig.tight_layout()

    return fig

# =============================================================================
# Task 18: Orchestrator Function
# =============================================================================

def analyze_and_visualize_graph_structure(
    graph_analysis_results: Dict[str, Any],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the final validation and visualization of the transition graph.

    This function serves as the main entry point for Task 18. It:
    1.  Validates that the computed graph structure matches the paper's
        description of two disconnected partitions.
    2.  Generates a high-quality visualization replicating Figure 2.
    3.  Compiles a final summary of the graph's key structural properties.

    Args:
        graph_analysis_results (Dict[str, Any]): The output from Task 17,
            containing the final graph and its analysis report.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary (unused here, for API consistency).

    Returns:
        Dict[str, Any]: A report containing the validation status, the
                        matplotlib Figure object of the visualization, and a
                        final structural summary.
    """
    final_report = {
        "task_name": "Task 18: Graph Visualization and Structural Analysis",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Input Retrieval ---
        final_graph = graph_analysis_results.get("outputs", {}).get("final_transition_graph")
        analysis_report = graph_analysis_results.get("outputs", {}).get("graph_analysis_report")

        # --- Input Validation ---
        if not isinstance(final_graph, nx.DiGraph) or not isinstance(analysis_report, dict):
            raise ValueError("Input is missing the final graph or its analysis report.")

        # --- Step 1: Partition Structure Validation ---
        validation_report = validate_graph_partition_structure(analysis_report)
        final_report["outputs"]["partition_validation"] = validation_report

        # --- Step 2: Graph Visualization ---
        figure = visualize_transition_graph(final_graph)
        final_report["outputs"]["graph_visualization_figure"] = figure

        # --- Step 3: Graph Analysis Documentation (Summary) ---
        # Extract key metrics from the detailed analysis for a high-level summary.
        summary = {
            "node_count": analysis_report.get("node_count"),
            "edge_count": analysis_report.get("edge_count"),
            "partition_count": len(analysis_report.get("connectivity", {}).get("partitions", [])),
            "cycle_count": len(analysis_report.get("properties", {}).get("simple_cycles", [])),
            "terminal_node_count": len(analysis_report.get("properties", {}).get("terminal_nodes", []))
        }
        final_report["outputs"]["graph_structural_summary"] = summary

        final_report["summary_message"] = "Successfully validated the graph's partition structure and generated visualization."

    except (TypeError, ValueError, KeyError, RuntimeError) as e:
        # Catch any failure during the process.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Graph structural analysis or visualization failed: {e}"

    return final_report
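
The order-agnostic comparison in `validate_graph_partition_structure` relies on converting each partition to a `frozenset`, so neither the order of the partitions nor the order of nodes within them matters. A minimal sketch of that idea follows; `partitions_match` is a hypothetical helper introduced for illustration and is not part of the codebase.

```python
from typing import FrozenSet, Iterable, Set


def partitions_match(
    computed: Iterable[Iterable[int]],
    expected: Iterable[Iterable[int]],
) -> bool:
    """Return True if both collections contain exactly the same node groups."""
    computed_set: Set[FrozenSet[int]] = {frozenset(p) for p in computed}
    expected_set: Set[FrozenSet[int]] = {frozenset(p) for p in expected}
    return computed_set == expected_set


# Expected structure as defined in the validation step above.
expected = [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10, 11, 12, 13, 14}]

# Same partitions, listed in a different order: still a match.
shuffled = [[14, 13, 12, 11, 10, 9, 8, 7, 6], [5, 4, 3, 2, 1]]
print(partitions_match(shuffled, expected))

# Node 5 moved to the other partition: no longer a match.
moved = [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]]
print(partitions_match(moved, expected))
```

Comparing sets of frozensets checks membership in one step; the separate count check in the function above mainly serves to produce a more specific error message.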


In [None]:
# Task 19: Decision Variable Analysis and Optimization Framework

# =============================================================================
# Task 19, Step 1 & 2: Optimization Problem Definition and Feasibility
# =============================================================================

def define_and_validate_optimization_problem(
    integrated_scenarios: List[Dict[str, Tuple]],
    target_variables: List[str],
    optimal_triplet: Tuple[str, str, str]
) -> Dict[str, Any]:
    """
    Defines the multi-objective optimization problem and validates its feasibility.

    This function executes Steps 1 and 2. It formally specifies the decision
    problem by identifying the target variables and the ideal qualitative state.
    Crucially, it programmatically verifies the paper's claim that this ideal
    state is not simultaneously achievable for all target variables, thus
    proving that a compromise is inevitable and an MCDA approach is required.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The list of 14 valid
            integrated scenarios.
        target_variables (List[str]): The list of variable names to be maximized.
        optimal_triplet (Tuple[str, str, str]): The trend triplet representing
            the ideal state for a maximized variable (e.g., ('+', '+', '+')).

    Returns:
        Dict[str, Any]: A report dictionary that defines the optimization
                        problem and confirms the feasibility assessment.

    Raises:
        RuntimeError: If a scenario is found where the optimal state is
                      simultaneously achieved, contradicting the paper's premise.
    """
    # --- 1. Define the Optimization Problem ---
    problem_definition = {
        "target_variables": target_variables,
        "optimization_direction": "maximization",
        "optimal_target_triplet": optimal_triplet,
        "feasibility_check": {}
    }

    # --- 2. Feasibility Assessment ---
    # This step programmatically verifies the paper's claim that no single
    # scenario is optimal for all target variables simultaneously.
    for scenario_id, scenario in enumerate(integrated_scenarios, start=1):
        # Check if all target variables in this scenario match the optimal triplet.
        is_optimal_in_scenario = all(
            scenario.get(var) == optimal_triplet for var in target_variables
        )

        if is_optimal_in_scenario:
            # Such a scenario contradicts the paper's premise; this is a
            # critical failure in the model replication.
            raise RuntimeError(
                f"Feasibility check FAILED. Scenario {scenario_id} achieves the optimal "
                f"state {optimal_triplet} for all target variables simultaneously. "
                "This contradicts the paper's premise that a compromise is necessary."
            )

    # If the loop completes without finding a simultaneous optimum, the claim is validated.
    problem_definition["feasibility_check"] = {
        "status": "CONFIRMED",
        "message": "No single scenario achieves the optimal state for all target variables. A compromise is inevitable."
    }

    return problem_definition

# =============================================================================
# Task 19, Step 3: Investment Strategy Classification
# =============================================================================

def classify_scenarios_into_strategies(
    integrated_scenarios: List[Dict[str, Tuple]],
    classification_variable: str,
    strategy_definitions: Dict[str, Tuple]
) -> Dict[int, str]:
    """
    Classifies each scenario into a predefined investment strategy.

    This function executes Step 3. It categorizes each of the 14 scenarios
    based on the qualitative behavior of a key decision variable ('REP'),
    according to the strategy definitions from the paper's analysis.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The list of 14 valid
            integrated scenarios.
        classification_variable (str): The name of the variable whose behavior
            determines the strategy (e.g., 'REP').
        strategy_definitions (Dict[str, Tuple]): A dictionary mapping strategy
            names to the specific trend triplet that defines them.

    Returns:
        Dict[int, str]: A dictionary mapping each scenario ID (1-14) to its
                        assigned strategy name (e.g., 'Aggressive Growth').
    """
    # --- Input Validation ---
    if not integrated_scenarios:
        return {}

    classification_map = {}

    # Iterate through each scenario using 1-based scenario IDs.
    for scenario_id, scenario in enumerate(integrated_scenarios, start=1):

        # Get the trend triplet for the key classification variable.
        key_triplet = scenario.get(classification_variable)

        # Find which strategy this triplet corresponds to.
        assigned_strategy = "Unclassified"
        for strategy_name, defining_triplet in strategy_definitions.items():
            if key_triplet == defining_triplet:
                assigned_strategy = strategy_name
                break

        # Store the classification.
        classification_map[scenario_id] = assigned_strategy

        # Issue a warning if a scenario could not be classified.
        if assigned_strategy == "Unclassified":
            warnings.warn(
                f"Scenario {scenario_id} could not be classified into a known strategy "
                f"based on the behavior of '{classification_variable}' ({key_triplet})."
            )

    return classification_map

# =============================================================================
# Task 19: Orchestrator Function
# =============================================================================

def analyze_decision_variables_and_strategies(
    integrated_scenarios: List[Dict[str, Tuple]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the analysis of decision variables and scenario classification.

    This function serves as the main entry point for Task 19. It:
    1.  Defines the core multi-objective decision problem and validates that
        no perfect, compromise-free solution exists.
    2.  Classifies all 14 scenarios into distinct investment strategies based
        on the behavior of the 'REP' variable.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The final 14 validated
            scenarios for the Integrated Model.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary.

    Returns:
        Dict[str, Any]: A report containing the formal problem definition and
                        the complete scenario-to-strategy classification map.
    """
    final_report = {
        "task_name": "Task 19: Decision Variable Analysis and Optimization Framework",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Step 1 & 2: Define and Validate the Optimization Problem ---
        # Retrieve the problem definition from the master configuration.
        target_vars = _get_nested_param(master_input_specification, 'analysis_framework.decision_analysis.target_variables.primary_variables')
        optimal_triplet_str = _get_nested_param(master_input_specification, 'analysis_framework.decision_analysis.target_variables.optimal_target_triplet')
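        # Convert string '(+,+,+)' to tuple ('+', '+', '+'), tolerating
        # whitespace around the parentheses and between the signs.
        optimal_triplet = tuple(
            part.strip() for part in optimal_triplet_str.strip().strip('()').split(',')
        )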

        problem_definition = define_and_validate_optimization_problem(
            integrated_scenarios=integrated_scenarios,
            target_variables=target_vars,
            optimal_triplet=optimal_triplet
        )
        final_report["outputs"]["optimization_problem_definition"] = problem_definition

        # --- Step 3: Classify Scenarios into Strategies ---
        # Define the strategies based on the paper's analysis of 'REP'.
        strategy_definitions = {
            "Aggressive Growth": ('+', '+', '+'),
            "Conservative Growth": ('+', '+', '-')
        }

        classification_map = classify_scenarios_into_strategies(
            integrated_scenarios=integrated_scenarios,
            classification_variable='REP',
            strategy_definitions=strategy_definitions
        )
        final_report["outputs"]["scenario_strategy_classification"] = classification_map

        final_report["summary_message"] = "Successfully defined the optimization problem and classified all scenarios into investment strategies."

    except (TypeError, ValueError, KeyError, RuntimeError) as e:
        # Catch any failure during the analysis.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Decision variable analysis failed: {e}"

    return final_report
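
Two small pieces of Task 19 are easy to check in isolation: turning a configuration string such as `'(+,+,+)'` into a triplet tuple, and testing whether any scenario reaches the optimal triplet for every target variable at once. The sketch below uses made-up scenarios. The helper names `parse_triplet` and `has_simultaneous_optimum` and the variable name `'X'` are assumptions for illustration, and the parser tolerates spaces, which the actual configuration format may not need.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, ...]


def parse_triplet(text: str) -> Triplet:
    """Convert '(+,+,-)' or '(+, +, -)' into ('+', '+', '-')."""
    inner = text.strip().strip("()")
    return tuple(part.strip() for part in inner.split(","))


def has_simultaneous_optimum(
    scenarios: List[Dict[str, Triplet]],
    target_variables: List[str],
    optimal: Triplet,
) -> bool:
    """True if some scenario shows the optimal triplet for every target."""
    return any(
        all(s.get(var) == optimal for var in target_variables)
        for s in scenarios
    )


optimal = parse_triplet("(+, +, +)")

# Hypothetical scenarios: each is optimal for at most one variable.
toy_scenarios = [
    {"REP": ("+", "+", "+"), "X": ("+", "+", "-")},
    {"REP": ("+", "+", "-"), "X": ("+", "+", "+")},
]

print(optimal)
print(has_simultaneous_optimum(toy_scenarios, ["REP", "X"], optimal))
```

One caveat of this style of check: with an empty list of target variables, `all(...)` is vacuously true, so every scenario would count as a simultaneous optimum.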


In [None]:
# Task 20: Scenario Evaluation and Ranking System

# =============================================================================
# Task 20, Step 1: Multi-Criteria Scoring Implementation
# =============================================================================

def score_and_rank_scenarios(
    integrated_scenarios: List[Dict[str, Tuple]],
    target_variables: List[str],
    weights: Dict[str, float]
) -> pd.DataFrame:
    """
    Scores and ranks scenarios based on a multi-criteria scoring system.

    This function executes Step 1 of the evaluation. It translates the
    qualitative trend triplet for each target variable into a numerical score
    based on a predefined desirability scale. It then computes a total weighted
    score for each scenario and ranks them accordingly.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The list of 14 valid
            integrated scenarios.
        target_variables (List[str]): The list of variable names to be scored.
        weights (Dict[str, float]): A dictionary mapping each target variable
            to its weight in the total score. Weights should sum to 1.0.

    Returns:
        pd.DataFrame: A DataFrame containing the scores for each target variable
                      and the total weighted score for each scenario, indexed by
                      scenario ID and sorted by the total score.
    """
    # --- 1. Define the Triplet-to-Score Mapping ---
    # This is a direct implementation of the 9-point desirability scale for maximization.
    score_map = {
        ('+', '+', '+'): 9,  # Optimal: steep accelerating growth
        ('+', '+', '0'): 8,
        ('+', '+', '-'): 7,
        ('+', '0', '+'): 6,
        ('+', '0', '0'): 5,
        ('+', '0', '-'): 4,
        ('+', '-', '+'): 3,
        ('+', '-', '0'): 2,
        ('+', '-', '-'): 1,  # Worst: steep accelerating decline
    }

    # --- 2. Calculate Scores for Each Scenario ---
    scored_data = []
    for scenario_id, scenario in enumerate(integrated_scenarios, start=1):
        row_data = {"scenario_id": scenario_id}
        total_score = 0.0

        # Calculate the raw score for each target variable.
        for var in target_variables:
            triplet = scenario.get(var)
            # Use .get() for safe lookup, defaulting to 0 for unknown triplets.
            raw_score = score_map.get(triplet, 0)
            row_data[f"{var}_score"] = raw_score

            # Add to the total weighted score.
            # Equation: S_i = w1*Score(V1) + w2*Score(V2) + ...
            total_score += raw_score * weights.get(var, 0)

        row_data["total_score"] = total_score
        scored_data.append(row_data)

    # --- 3. Create and Format the Output DataFrame ---
    if not scored_data:
        return pd.DataFrame()

    # Create the DataFrame from the scored data.
    scores_df = pd.DataFrame(scored_data).set_index("scenario_id")

    # Sort the DataFrame by the total score in descending order to rank the scenarios.
    scores_df.sort_values(by="total_score", ascending=False, inplace=True)

    return scores_df

# =============================================================================
# Task 20, Step 2 & 3: Performance and Decision Support Analysis
# =============================================================================

def analyze_scenario_performance(
    scores_df: pd.DataFrame,
    strategy_classification: Dict[int, str]
) -> Dict[str, Any]:
    """
    Performs a detailed analysis of scenario scores and strategy performance.

    This function executes Steps 2 and 3. It augments the scoring data with
    strategy classifications and computes key decision support metrics, including:
    1.  Correlation of objective scores to quantify trade-offs.
    2.  Descriptive statistics of scores.
    3.  Aggregate performance metrics (mean, std) for each strategy.

    Args:
        scores_df (pd.DataFrame): The DataFrame of scenario scores from Step 1.
        strategy_classification (Dict[int, str]): A map of scenario ID to
            strategy name from Task 19.

    Returns:
        Dict[str, Any]: A report containing the performance analysis results.
    """
    # --- Input Validation ---
    if scores_df.empty:
        return {"status": "SKIPPED", "message": "Input scores_df is empty."}

    # --- 1. Augment DataFrame with Strategy Information ---
    # Merge the strategy classification into the scores DataFrame.
    analysis_df = scores_df.copy()
    analysis_df['strategy'] = analysis_df.index.map(strategy_classification)

    # --- 2. Scenario Performance Analysis ---
    # Get the list of score columns for analysis.
    score_columns = [col for col in analysis_df.columns if col.endswith('_score') and col != 'total_score']

    # Calculate the correlation matrix of the objective scores.
    # This numerically demonstrates the trade-offs between objectives.
    score_correlation = analysis_df[score_columns].corr()

    # Calculate descriptive statistics for each score column.
    score_distribution = analysis_df[score_columns + ['total_score']].describe()

    # --- 3. Decision Support Metrics (Strategy-level Analysis) ---
    # Group by strategy to analyze aggregate performance.
    strategy_performance = analysis_df.groupby('strategy')[score_columns + ['total_score']].agg(['mean', 'std'])

    # --- 4. Assemble the Report ---
    report = {
        "full_analysis_table": analysis_df,
        "performance_metrics": {
            "score_correlation_matrix": score_correlation,
            "score_distribution_stats": score_distribution,
        },
        "strategy_summary": {
            "performance_by_strategy": strategy_performance,
            "interpretation_note": "Higher 'mean' indicates better average performance. Higher 'std' indicates greater variability (risk) within the strategy."
        }
    }

    return report

# =============================================================================
# Task 20: Orchestrator Function
# =============================================================================

def evaluate_and_rank_scenarios(
    integrated_scenarios: List[Dict[str, Tuple]],
    strategy_classification: Dict[int, str],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the complete scenario evaluation and ranking pipeline.

    This function serves as the main entry point for Task 20. It:
    1.  Scores each of the 14 scenarios based on a multi-criteria framework.
    2.  Ranks the scenarios by overall desirability.
    3.  Performs a detailed performance analysis to quantify trade-offs and
        compare the aggregate performance of the identified investment strategies.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The final 14 validated
            scenarios for the Integrated Model.
        strategy_classification (Dict[int, str]): The map of scenario IDs to
            strategy names from Task 19.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary.

    Returns:
        Dict[str, Any]: A comprehensive report containing the ranked scenario
                        table and all performance analysis metrics.
    """
    final_report = {
        "task_name": "Task 20: Scenario Evaluation and Ranking System",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Configuration Retrieval ---
        # Get the target variables for the decision analysis.
        target_variables = _get_nested_param(
            master_input_specification,
            'analysis_framework.decision_analysis.target_variables.primary_variables'
        )

        # Define the weights for each objective. The paper implies equal weighting.
        weights = {var: 1.0 / len(target_variables) for var in target_variables}

        # --- Step 1: Multi-Criteria Scoring and Ranking ---
        scores_df = score_and_rank_scenarios(
            integrated_scenarios=integrated_scenarios,
            target_variables=target_variables,
            weights=weights
        )
        final_report["outputs"]["ranked_scenarios_table"] = scores_df

        # --- Step 2 & 3: Performance and Decision Support Analysis ---
        performance_report = analyze_scenario_performance(
            scores_df=scores_df,
            strategy_classification=strategy_classification
        )
        # Merge the detailed analysis results into the main report.
        final_report["outputs"].update(performance_report)

        final_report["summary_message"] = "Successfully scored, ranked, and analyzed all scenarios and strategies."

    except (TypeError, ValueError, KeyError) as e:
        # Catch any failure during the analysis.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Scenario evaluation failed: {e}"

    return final_report
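
The weighted total in `score_and_rank_scenarios` follows `S_i = w1*Score(V1) + w2*Score(V2) + ...`. The sketch below works through that arithmetic on two made-up scenarios with two hypothetical target variables, `'A'` and `'B'`, under equal weights. Only the three score-map entries it needs are copied from the scale above; `weighted_score` is an illustrative helper, not a function from the codebase.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]

# Subset of the 9-point desirability scale defined above.
score_map: Dict[Triplet, int] = {
    ("+", "+", "+"): 9,
    ("+", "0", "0"): 5,
    ("+", "-", "-"): 1,
}


def weighted_score(scenario: Dict[str, Triplet], weights: Dict[str, float]) -> float:
    """Sum of weight * score over the weighted variables (unknown triplets score 0)."""
    return sum(
        w * score_map.get(scenario.get(var), 0)
        for var, w in weights.items()
    )


# Hypothetical variables 'A' and 'B' with equal weights summing to 1.0.
weights = {"A": 0.5, "B": 0.5}

toy_scenarios: List[Dict[str, Triplet]] = [
    {"A": ("+", "+", "+"), "B": ("+", "-", "-")},  # 0.5*9 + 0.5*1 = 5.0
    {"A": ("+", "0", "0"), "B": ("+", "+", "+")},  # 0.5*5 + 0.5*9 = 7.0
]

# Rank scenario IDs (1-based) by total score, best first.
totals = {i: weighted_score(s, weights) for i, s in enumerate(toy_scenarios, start=1)}
ranking = sorted(totals, key=totals.get, reverse=True)

print(totals)
print(ranking)
```

A scenario that is optimal on one variable but worst on another (scenario 1 here) ends up mid-scale, which is the kind of trade-off the correlation matrix in `analyze_scenario_performance` is meant to expose.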


In [None]:
# Task 21: Investment Strategy Recommendations and Analysis

# =============================================================================
# Task 21, Step 1, 2 & 3: Combined Strategy Analysis and Framework
# =============================================================================

def generate_strategic_recommendations(
    analysis_report: Dict[str, Any],
    graph_analysis_report: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Synthesizes all prior analyses into a final strategic decision framework.

    This function executes all steps of Task 21. It provides a detailed,
    data-driven breakdown of each investment strategy, validates the "strategy
    lock-in" phenomenon by linking classifications to the graph's partitions,
    and constructs a final, human-readable summary of the strategic choice
    as described in the paper.

    Args:
        analysis_report (Dict[str, Any]): The comprehensive analysis report
            from Task 20, containing ranked scenarios and performance metrics.
        graph_analysis_report (Dict[str, Any]): The analysis report from Task 17,
            containing the graph's partition structure.

    Returns:
        Dict[str, Any]: A dictionary containing the detailed decision framework.

    Raises:
        RuntimeError: If the programmatic validation of the paper's analytical
                      claims fails, indicating an inconsistency in the results.
    """
    # --- Input Retrieval ---
    # Extract the necessary data components from the input reports.
    try:
        full_analysis_table = analysis_report["full_analysis_table"]
        strategy_performance = analysis_report["strategy_summary"]["performance_by_strategy"]
        partitions = graph_analysis_report["connectivity"]["partitions"]
        # Level 0 of the column index repeats each name once per aggregate
        # (e.g. mean, std), so de-duplicate while preserving column order.
        score_columns = dict.fromkeys(strategy_performance.columns.get_level_values(0))
        target_variables = [
            col.replace('_score', '')
            for col in score_columns
            if '_score' in col and 'total' not in col
        ]
    except KeyError as e:
        raise ValueError(f"Input reports are missing required data. Missing key: {e}") from e

    # --- 1. Strategy-Specific Analysis and Validation ---
    # This step programmatically validates the qualitative claims made in the paper.
    decision_framework = {}

    # Get the strategy classification map.
    strategy_classification = full_analysis_table['strategy'].to_dict()

    for strategy_name, group_df in full_analysis_table.groupby('strategy'):
        # Retrieve the performance metrics for this strategy.
        perf_metrics = strategy_performance.loc[strategy_name]

        # --- Validate the defining qualitative patterns for each strategy ---
        if strategy_name == "Aggressive Growth":
            # Claim: REP is (+++), ROA/UND are (+--).
            if not (group_df['REP_score'] == 9).all():
                raise RuntimeError("Validation failed: Not all 'Aggressive' scenarios have REP score of 9 (+++).")
            if not (group_df['ROA_score'] == 1).all() or not (group_df['UND_score'] == 1).all():
                raise RuntimeError("Validation failed: Not all 'Aggressive' scenarios have ROA/UND score of 1 (+--).")

        elif strategy_name == "Conservative Growth":
            # Claim: REP is (++-), ROA/UND are (+-+).
            if not (group_df['REP_score'] == 7).all():
                raise RuntimeError("Validation failed: Not all 'Conservative' scenarios have REP score of 7 (++-).")
            if not (group_df['ROA_score'] == 3).all() or not (group_df['UND_score'] == 3).all():
                raise RuntimeError("Validation failed: Not all 'Conservative' scenarios have ROA/UND score of 3 (+-+).")

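        # --- 3. Construct the Decision Framework Entry ---
        # REP patterns validated above; any other strategy has no validated pattern.
        rep_pattern = {"Aggressive Growth": "+++", "Conservative Growth": "++-"}.get(
            strategy_name, "not validated"
        )
        decision_framework[strategy_name] = {
            "associated_scenarios": sorted(group_df.index.tolist()),
            "primary_characteristic": f"Qualitative state of 'REP' is consistently '{rep_pattern}'.",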
            "key_trade_off": f"Performance of '{target_variables[1]}' and '{target_variables[2]}' is directly opposed to 'REP'.",
            "quantitative_profile": {
                "average_total_score": perf_metrics[('total_score', 'mean')],
                "score_variability (std_dev)": perf_metrics[('total_score', 'std')],
                "interpretation": "Represents the strategy's average desirability and consistency."
            },
            "summary": f"A {'high-reward, high-trade-off' if strategy_name == 'Aggressive Growth' else 'balanced, moderate-growth'} strategy."
        }

    # --- 2. Reachability-Constrained Optimization (Strategy Lock-in) ---
    # This step validates that each disconnected partition of the graph contains
    # scenarios from exactly one strategy.
    for partition in partitions:
        # Collect the distinct strategies in this partition. Iterating instead of
        # indexing also supports partitions supplied as sets.
        partition_strategies = {strategy_classification.get(node) for node in partition}
        if len(partition_strategies) > 1:
            raise RuntimeError(f"Strategy Lock-in validation FAILED. Partition {partition} contains scenarios from multiple strategies.")

    # Add the lock-in conclusion to the framework.
    decision_framework["STRATEGY_LOCK_IN_CONCLUSION"] = {
        "is_choice_irreversible": True,
        "reason": "The two strategies correspond to two disconnected partitions (weakly connected components) in the transitional graph.",
        "implication": "Once a scenario within one strategy is entered, it is impossible to transition to a scenario in the other strategy. The initial choice of strategy is critical and path-dependent."
    }

    return decision_framework
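

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): the lock-in check in isolation
# -----------------------------------------------------------------------------
# A partition passes when all of its scenarios share one strategy label. The
# helper name, scenario ids, and labels below are hypothetical sample data and
# are stricter than the check above in one respect: unlabeled scenarios fail.

def _sketch_partitions_are_locked_in(partitions, strategy_by_scenario):
    """Return True if every partition maps to exactly one known strategy label."""
    for partition in partitions:
        labels = {strategy_by_scenario.get(node) for node in partition}
        # A missing label (None), an empty partition, or mixed labels all fail.
        if len(labels) != 1 or None in labels:
            return False
    return True

_sketch_labels = {1: "Aggressive Growth", 2: "Aggressive Growth", 3: "Conservative Growth"}
# Scenarios 1 and 2 share a strategy, so this partitioning is consistent.
_sketch_ok = _sketch_partitions_are_locked_in([{1, 2}, {3}], _sketch_labels)
# Scenarios 1 and 3 belong to different strategies, so this one is not.
_sketch_bad = _sketch_partitions_are_locked_in([{1, 3}, {2}], _sketch_labels)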

# =============================================================================
# Task 21: Orchestrator Function
# =============================================================================

def generate_investment_strategy_report(
    analysis_report_task20: Dict[str, Any],
    graph_analysis_report_task17: Dict[str, Any],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the final synthesis of results into a strategic report.

    This function serves as the main entry point for Task 21. It integrates
    the quantitative scenario rankings with the graph's structural analysis
    to produce a final, actionable decision framework that outlines the
    available investment strategies and their profound implications.

    Args:
        analysis_report_task20 (Dict[str, Any]): The comprehensive analysis
            report from Task 20.
        graph_analysis_report_task17 (Dict[str, Any]): The analysis report
            from Task 17.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary.

    Returns:
        Dict[str, Any]: A report containing the final, detailed decision
                        framework.
    """
    final_report = {
        "task_name": "Task 21: Investment Strategy Recommendations and Analysis",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Execute the synthesis and analysis ---
        decision_framework = generate_strategic_recommendations(
            analysis_report=analysis_report_task20["outputs"],
            graph_analysis_report=graph_analysis_report_task17["outputs"]["graph_analysis_report"]
        )

        final_report["outputs"]["decision_framework"] = decision_framework
        final_report["summary_message"] = "Successfully synthesized results into a final strategic decision framework."

    except (TypeError, ValueError, KeyError, RuntimeError) as e:
        # Catch any failure during the final analysis.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Strategic analysis failed: {e}"

    return final_report


In [None]:
# Task 22: Scenario Table Generation and Formatting

# =============================================================================
# Task 22, Helper Function: Triplet Formatting
# =============================================================================

def _format_triplet(triplet: Tuple[str, str, str]) -> str:
    """
    Formats a trend triplet tuple into the paper's string representation.

    This helper converts the internal tuple format (e.g., ('+', '+', '-'))
    into the compact string format used in the paper's tables (e.g., '++-').

    Args:
        triplet (Tuple[str, str, str]): The trend triplet tuple.

    Returns:
        str: The formatted string representation.
    """
    # Tables 5 and 8 print the full triplet in compact form (e.g. "+++"), so all
    # three elements are joined without separators.
    # Example: ('+', '+', '-') -> "++-"
    return "".join(triplet)

# =============================================================================
# Task 22, Step 1, 2 & 3: Scenario Table Generation
# =============================================================================

def generate_scenario_tables(
    cim_scenarios: List[Dict[str, Tuple]],
    integrated_scenarios: List[Dict[str, Tuple]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Generates and formats publication-quality tables for CIM and IM scenarios.

    This function executes all steps of Task 22. It creates two DataFrames
    that are high-fidelity replicas of the paper's Table 5 (CIM) and Table 8 (IM).
    It handles the selection of representative variables, formats the data,
    and returns both the raw DataFrames and styled objects for presentation.

    Args:
        cim_scenarios (List[Dict[str, Tuple]]): The list of 7 valid CIM scenarios.
        integrated_scenarios (List[Dict[str, Tuple]]): The list of 14 valid IM scenarios.
        master_input_specification (Dict[str, Any]): The main configuration dictionary.

    Returns:
        Dict[str, Any]: A dictionary containing the raw and styled DataFrames
                        for both tables, along with explanatory metadata.
    """
    # --- Input Validation ---
    if len(cim_scenarios) != 7:
        raise ValueError("Expected exactly 7 CIM scenarios.")
    if len(integrated_scenarios) != 14:
        raise ValueError("Expected exactly 14 Integrated Model scenarios.")

    # --- 1. Generate CIM Scenario Table (replica of Table 5) ---
    # Columns are sorted alphabetically for a deterministic layout; this may differ from the paper's column order.
    cim_vars = sorted(['UND', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'REP', 'BOO', 'ROA', 'PRI'])

    # Transform the list of scenario dicts into a list of lists for the DataFrame.
    cim_data = [
        [_format_triplet(scenario[var]) for var in cim_vars]
        for scenario in cim_scenarios
    ]

    # Create the DataFrame.
    cim_df = pd.DataFrame(cim_data, columns=cim_vars, index=pd.RangeIndex(start=1, stop=8, name="No."))

    # --- 2. Generate Integrated Model Scenario Table (replica of Table 8) ---
    # Define the representative and RRM variables for the columns.
    rep_vars = ['REP', 'ROA']
    rrm_vars = sorted(['X', 'Y', 'W', 'Z1', 'Z2'])
    im_columns = rep_vars + rrm_vars

    # Transform the IM scenarios into the required format.
    im_data = [
        [_format_triplet(scenario[var]) for var in im_columns]
        for scenario in integrated_scenarios
    ]

    # Create the DataFrame.
    im_df = pd.DataFrame(im_data, columns=im_columns, index=pd.RangeIndex(start=1, stop=15, name="No."))

    # --- 3. Create Annotations and Apply Styling ---
    # Define the explanatory note for the IM table regarding variable grouping.
    im_table_annotation = (
        "Note: In the Integrated Model scenarios, 'REP' is representative of the group "
        "{'REP', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'BOO', 'PRI'}, and 'ROA' is "
        "representative of {'UND', 'ROA'}, as these groups exhibit identical "
        "trend behavior across all 14 scenarios."
    )

    # Define the styling to be applied to both tables.
    style_properties = {'text-align': 'center', 'font-family': 'serif'}

    # Apply the styling. The .style attribute returns a Styler object.
    cim_styled = cim_df.style.set_properties(**style_properties).set_table_styles(
        [{'selector': 'th', 'props': [('text-align', 'center')]}]
    )
    im_styled = im_df.style.set_properties(**style_properties).set_table_styles(
        [{'selector': 'th', 'props': [('text-align', 'center')]}]
    )

    # --- 4. Assemble the Final Output ---
    output = {
        "cim_table": {
            "raw_dataframe": cim_df,
            "styled_object": cim_styled,
            "title": "Table 5 Replica: Scenarios of the trend-based CIM"
        },
        "im_table": {
            "raw_dataframe": im_df,
            "styled_object": im_styled,
            "title": "Table 8 Replica: Scenarios of the trend-based IM",
            "annotation": im_table_annotation
        }
    }

    return output

# =============================================================================
# Task 22: Orchestrator Function
# =============================================================================

def generate_final_scenario_tables(
    cim_scenarios: List[Dict[str, Tuple]],
    integrated_scenarios: List[Dict[str, Tuple]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the generation of all final, publication-quality scenario tables.

    This function serves as the main entry point for Task 22. It takes the raw
    scenario solutions for both the CIM and the Integrated Model and transforms
    them into formatted and styled DataFrames that are faithful replicas of the
    tables presented in the source paper.

    Args:
        cim_scenarios (List[Dict[str, Tuple]]): The list of 7 CIM scenarios.
        integrated_scenarios (List[Dict[str, Tuple]]): The list of 14 IM scenarios.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary.

    Returns:
        Dict[str, Any]: A report containing the generated raw and styled tables
                        for both the CIM and Integrated Model.
    """
    final_report = {
        "task_name": "Task 22: Scenario Table Generation and Formatting",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Execute the table generation function ---
        table_outputs = generate_scenario_tables(
            cim_scenarios=cim_scenarios,
            integrated_scenarios=integrated_scenarios,
            master_input_specification=master_input_specification
        )

        final_report["outputs"] = table_outputs
        final_report["summary_message"] = "Successfully generated and formatted scenario tables for CIM and IM."

    except (TypeError, ValueError, KeyError) as e:
        # Catch any failure during the table generation process.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Table generation failed: {e}"

    return final_report


In [None]:
# Task 23: Graph Visualization and Network Diagram Generation

# =============================================================================
# Task 23: Orchestrator Function
# =============================================================================

def orchestrate_graph_visualization(
    final_transition_graph: nx.DiGraph,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the generation of the final transitional graph visualization.

    This function serves as the main entry point for Task 23. It calls the
    dedicated visualization function to produce a high-quality, publication-ready
    plot of the transitional graph that is a faithful replica of Figure 2 from
    the source paper.

    Args:
        final_transition_graph (nx.DiGraph): The final, fully constructed
            transitional graph object from Task 17.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary (unused here, for API consistency).

    Returns:
        Dict[str, Any]: A report containing the matplotlib Figure object of the
                        visualization and a summary message.
    """
    # Initialize the final report dictionary.
    final_report = {
        "task_name": "Task 23: Graph Visualization and Network Diagram Generation",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    try:
        # --- Input Validation ---
        # Ensure the input is a valid networkx DiGraph.
        if not isinstance(final_transition_graph, nx.DiGraph):
            raise TypeError("Input 'final_transition_graph' must be a networkx.DiGraph object.")

        # Ensure the graph is not empty.
        if not final_transition_graph.nodes:
            raise ValueError("Input graph has no nodes and cannot be visualized.")

        # --- Execute the Visualization Function ---
        # Call the previously defined function to generate the plot.
        # This function contains the custom layout and styling to replicate Figure 2.
        figure = visualize_transition_graph(
            graph=final_transition_graph,
            title="Transitional Graph of the Integrated Model"
        )

        # --- Populate the Final Report ---
        # Store the generated matplotlib Figure object in the report outputs.
        final_report["outputs"]["graph_visualization_figure"] = figure

        # Provide a summary message confirming successful execution.
        final_report["summary_message"] = "Successfully generated the transitional graph visualization."

    except ImportError:
        # Handle cases where optional visualization libraries are not installed.
        final_report["overall_status"] = "WARNING"
        final_report["error_message"] = "Visualization failed: `matplotlib` or `networkx` is not installed. Skipping plot generation."
        final_report["outputs"]["graph_visualization_figure"] = None

    except (TypeError, ValueError, KeyError) as e:
        # Catch any other failure during the visualization process.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Graph visualization failed: {e}"
        final_report["outputs"]["graph_visualization_figure"] = None

    # Return the comprehensive report.
    return final_report


In [None]:
# Task 24: Summary Statistics and Model Characteristics Documentation

# =============================================================================
# Task 24: Orchestrator Function
# =============================================================================

def generate_final_summary_report(
    all_task_reports: Dict[str, Dict[str, Any]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Generates a final, comprehensive summary of all model characteristics and results.

    This function serves as the main entry point for Task 24. It aggregates the
    key findings from the entire pipeline—from data validation to graph analysis—
    into a single, structured, and auditable summary report. It covers quantitative
    model statistics, key behavioral patterns, and a final validation checklist.

    Args:
        all_task_reports (Dict[str, Dict[str, Any]]): A dictionary containing the
            output reports from all previous tasks, keyed by task name or number.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary.

    Returns:
        Dict[str, Any]: A final, nested dictionary containing the comprehensive
                        summary of the entire model replication process.
    """
    # Initialize the final report structure.
    final_report = {
        "task_name": "Task 24: Summary Statistics and Model Characteristics Documentation",
        "overall_status": "SUCCESS",
        "summary": {}
    }

    try:
        # --- Step 1: Quantitative Model Summary ---
        # Retrieve data from previous reports and the master specification.
        im_constraints = all_task_reports["task_13"]["outputs"]["integrated_constraints"]
        im_variables = all_task_reports["task_13"]["outputs"]["integrated_variables"]
        cim_scenarios_report = all_task_reports["task_9"]
        rrm_scenarios_report = all_task_reports["task_12"]
        im_scenarios_report = all_task_reports["task_14"]
        reduction_report = all_task_reports["task_15"]["outputs"]["solution_space_reduction"]

        quantitative_summary = {
            "variable_counts": {
                "cim_variables": 10,
                "rrm_variables": 5,
                "total_variables": len(im_variables)
            },
            "constraint_counts": {
                "cim_constraints": 14,
                "rrm_constraints": 6, # 5 equations + 1 conservation
                "integration_constraints": 3,
                "total_constraints": len(im_constraints)
            },
            "scenario_counts": {
                "cim_scenarios": cim_scenarios_report["outputs"]["scenario_count"],
                "rrm_scenarios": rrm_scenarios_report["outputs"]["scenario_count"],
                "integrated_scenarios": im_scenarios_report["outputs"]["scenario_count"]
            },
            "complexity_metrics": {
                "theoretical_max_scenarios": reduction_report["theoretical_max_scenarios"],
                "solution_space_retained_percentage": reduction_report["retained_percentage"]
            }
        }
        final_report["summary"]["quantitative_model_summary"] = quantitative_summary

        # --- Step 2: Behavioral Pattern Analysis ---
        graph_analysis_report = all_task_reports["task_17"]["outputs"]["graph_analysis_report"]

        behavioral_summary = {
            "variable_grouping_patterns": {
                "description": "Two groups of CIM variables were confirmed to have identical trend behavior across all 14 integrated scenarios.",
                "group_1": "REP, AGE, TA, MAR, LIS, QUA, BOO, PRI",
                "group_2": "UND, ROA"
            },
            "cross_model_interaction_effects": {
                "description": "The 3 integration constraints dramatically pruned the solution space, demonstrating a strong interaction between the financial and rumour models.",
                "details": f"Only {reduction_report['retained_percentage']} of the theoretical scenarios were valid."
            },
            "temporal_dynamics_summary": {
                "description": "The transitional graph reveals the system's dynamic pathways.",
                "partition_count": len(graph_analysis_report["connectivity"]["partitions"]),
                "cycle_count": len(graph_analysis_report["properties"]["simple_cycles"]),
                "terminal_node_count": len(graph_analysis_report["properties"]["terminal_nodes"])
            }
        }
        final_report["summary"]["behavioral_pattern_analysis"] = behavioral_summary

        # --- Step 3: Validation and Quality Metrics ---
        # This checks the success status of key validation tasks in the pipeline.
        validation_summary = {
            "constraint_satisfaction_verification": "SUCCESS" if all_task_reports["task_14"]["overall_status"] == "SUCCESS" else "FAILURE",
            "expected_scenario_count_validation": {
                "CIM (Expected 7)": "SUCCESS" if quantitative_summary["scenario_counts"]["cim_scenarios"] == 7 else "FAILURE",
                "RRM (Expected 211)": "SUCCESS" if quantitative_summary["scenario_counts"]["rrm_scenarios"] == 211 else "FAILURE",
                "IM (Expected 14)": "SUCCESS" if quantitative_summary["scenario_counts"]["integrated_scenarios"] == 14 else "FAILURE",
            },
            "graph_partition_validation": all_task_reports["task_18"]["outputs"]["partition_validation"]["status"],
            "economic_interpretation_validation": all_task_reports["task_15"]["outputs"]["economic_interpretation_validation"]["is_claim_valid"]
        }
        final_report["summary"]["validation_and_quality_summary"] = validation_summary

        # Check if any validation failed to update the overall status.
        top_level_checks = [
            validation_summary["constraint_satisfaction_verification"],
            validation_summary["graph_partition_validation"],
            validation_summary["economic_interpretation_validation"],
        ]
        count_checks = validation_summary["expected_scenario_count_validation"].values()
        all_passed = (
            all(v == "SUCCESS" or v is True for v in top_level_checks)
            and all(v == "SUCCESS" for v in count_checks)
        )
        if not all_passed:
            final_report["overall_status"] = "WARNING"
            final_report["summary_message"] = "Final summary generated, but one or more key validation checkpoints failed. Review sub-reports."
        else:
            final_report["summary_message"] = "Successfully aggregated all model statistics and validation results."

    except (KeyError, TypeError) as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Failed to generate final summary report. A required prior task report may be missing or malformed. Details: {e}"

    return final_report
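

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the pipeline): mixed status aggregation
# -----------------------------------------------------------------------------
# The validation summary above mixes two status conventions: the string
# "SUCCESS" and the boolean True. The sketch below shows the acceptance rule on
# hypothetical inputs; the helper name `_sketch_all_checks_passed` is an
# assumption made for illustration.

def _sketch_all_checks_passed(statuses):
    """Return True only if every status is the string "SUCCESS" or the bool True."""
    return all(status == "SUCCESS" or status is True for status in statuses)

_sketch_pass = _sketch_all_checks_passed(["SUCCESS", True])        # both conventions accepted
_sketch_fail_bool = _sketch_all_checks_passed(["SUCCESS", False])  # a False flag fails
_sketch_fail_str = _sketch_all_checks_passed(["FAILURE", True])    # a "FAILURE" string fails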


In [None]:
# Task 25: Solution Completeness and Correctness Validation

# =============================================================================
# Task 25: Orchestrator and Executor Function
# =============================================================================

def validate_solution_completeness_and_correctness(
    cim_scenarios: List[Dict[str, Tuple]],
    rrm_scenarios: List[Dict[str, Tuple]],
    integrated_scenarios: List[Dict[str, Tuple]],
    final_cim_constraints: List[Dict[str, Any]],
    structured_rrm_system: List[Dict[str, List[str]]],
    integrated_constraints: List[Dict[str, Any]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Performs a final, exhaustive validation of the entire solution set.

    This function serves as the main entry point for Task 25. It acts as a
    holistic, end-to-end check on the outputs of the entire modeling pipeline.
    The mathematical consistency check is exhaustive for all models and
    covers all 6 constraints of the RRM.

    Args:
        cim_scenarios (List[Dict[str, Tuple]]): The 7 scenarios for the CIM.
        rrm_scenarios (List[Dict[str, Tuple]]): The 211 scenarios for the RRM.
        integrated_scenarios (List[Dict[str, Tuple]]): The 14 integrated scenarios.
        final_cim_constraints (List[Dict[str, Any]]): The 14 CIM constraints.
        structured_rrm_system (List[Dict[str, List[str]]]): The 5 structured RRM equations.
        integrated_constraints (List[Dict[str, Any]]): The 22 integrated constraints.
        master_input_specification (Dict[str, Any]): The main configuration dictionary.

    Returns:
        Dict[str, Any]: A comprehensive report detailing the success or failure
                        of each validation check.
    """
    # Initialize the final report.
    final_report = {
        "task_name": "Task 25 (Remediated): Solution Completeness and Correctness Validation",
        "overall_status": "SUCCESS",
        "validation_checks": {}
    }

    # This list will aggregate all error messages found during validation.
    all_errors = []

    # --- Internal Helper for Mathematical Consistency Check ---
    def _check_consistency(
        scenarios_to_check: List[Dict[str, Tuple]],
        constraints_as_dicts: List[Dict[str, Any]],
        variables: List[str],
        model_name: str,
        extra_constraints: Optional[List[Constraint]] = None
    ) -> List[str]:
        """A helper to exhaustively check every scenario against every constraint."""
        local_errors = []

        # Define the logic for simple constraint types.
        sup = lambda v1, v2: not (v1[1] == '+' and v2[1] == '-')
        red = lambda v1, v2: not (v1[1] == '+' and v2[1] == '+')
        shapes = {
            '+-': lambda x, y: not (x[1] == '+' and not (y[1] == '+' and y[2] == '-')),
            '--': lambda x, y: not (x[1] == '+' and not (y[1] == '-' and y[2] == '-')),
        }

        # Check every scenario.
        for i, scen in enumerate(scenarios_to_check):
            # Check against constraints defined as dictionaries.
            for const_def in constraints_as_dicts:
                satisfied = True
                ctype = const_def['type']

                if ctype in ['SUP', 'RED']:
                    v1_name, v2_name = const_def['variables']
                    logic = sup if ctype == 'SUP' else red
                    if not logic(scen[v1_name], scen[v2_name]): satisfied = False

                elif ctype == 'SHAPE':
                    v1_name, v2_name = const_def['variables']
                    if not shapes[const_def['shape']](scen[v1_name], scen[v2_name]): satisfied = False

                elif ctype == 'RRM_EQUATION':
                    eq = QualitativeEquationConstraint(const_def['equation']['LHS'], const_def['equation']['RHS'], variables)
                    if not eq(variables, {}, scen): satisfied = False

                if not satisfied:
                    local_errors.append(f"{model_name} Scenario {i+1} failed constraint: {const_def}")

            # Check against any extra, pre-instantiated constraint objects.
            if extra_constraints:
                for const_obj in extra_constraints:
                    if not const_obj(const_obj._variables, {}, scen):
                        local_errors.append(f"{model_name} Scenario {i+1} failed constraint: {type(const_obj).__name__}")

        return local_errors

    try:
        # --- Step 1: Expected Outcome Verification ---
        # Compare the actual scenario counts with the expected counts from the configuration.
        expected_cim_count = _get_nested_param(master_input_specification, 'scenario_generation.expected_solution_counts.cim_scenarios')
        expected_rrm_count = _get_nested_param(master_input_specification, 'scenario_generation.expected_solution_counts.rrm_scenarios')
        expected_im_count = _get_nested_param(master_input_specification, 'scenario_generation.expected_solution_counts.im_scenarios')

        if len(cim_scenarios) != expected_cim_count: all_errors.append(f"CIM count mismatch: expected {expected_cim_count}, got {len(cim_scenarios)}")
        if len(rrm_scenarios) != expected_rrm_count: all_errors.append(f"RRM count mismatch: expected {expected_rrm_count}, got {len(rrm_scenarios)}")
        if len(integrated_scenarios) != expected_im_count: all_errors.append(f"IM count mismatch: expected {expected_im_count}, got {len(integrated_scenarios)}")

        group1 = {'REP', 'AGE', 'TA', 'MAR', 'LIS', 'QUA', 'BOO', 'PRI'}
        group2 = {'UND', 'ROA'}
        for i, scenario in enumerate(integrated_scenarios):
            if len({scenario[var] for var in group1}) != 1: all_errors.append(f"IM Scenario {i+1}: Group 1 variables are not identical.")
            if len({scenario[var] for var in group2}) != 1: all_errors.append(f"IM Scenario {i+1}: Group 2 variables are not identical.")

        final_report["validation_checks"]["outcome_verification"] = "SUCCESS" if not all_errors else f"FAILURE: {all_errors}"

        # --- Step 2: Mathematical Consistency Validation (Remediated) ---
        initial_error_count = len(all_errors)

        # Check CIM model (14 constraints)
        cim_vars = list(cim_scenarios[0].keys())
        all_errors.extend(_check_consistency(cim_scenarios, final_cim_constraints, cim_vars, "CIM"))

        # Check RRM model (5 equation constraints + 1 conservation constraint)
        rrm_vars = list(rrm_scenarios[0].keys())
        # Instantiate the extra conservation constraint for exhaustive checking.
        conservation_check = PopulationConservationConstraint(rrm_vars)
        # The `structured_rrm_system` contains the 5 equation constraints.
        all_errors.extend(_check_consistency(
            rrm_scenarios, structured_rrm_system, rrm_vars, "RRM", extra_constraints=[conservation_check]
        ))

        # Check Integrated Model (22 constraints)
        im_vars = list(integrated_scenarios[0].keys())
        # The conservation constraint is also part of the IM.
        all_errors.extend(_check_consistency(
            integrated_scenarios, integrated_constraints, im_vars, "IM", extra_constraints=[conservation_check]
        ))

        if len(all_errors) > initial_error_count:
            final_report["validation_checks"]["mathematical_consistency"] = f"FAILURE: New errors found: {all_errors[initial_error_count:]}"
        else:
            final_report["validation_checks"]["mathematical_consistency"] = "SUCCESS"

        # --- Step 3: Cross-Model Integration Validation ---
        # This is implicitly covered by the full IM check above. We report its status.
        final_report["validation_checks"]["cross_model_integration"] = final_report["validation_checks"]["mathematical_consistency"]

        # --- Final Status ---
        if all_errors:
            final_report["overall_status"] = "FAILURE"
            final_report["summary_message"] = f"One or more final validation checks failed. Found {len(all_errors)} inconsistencies."
            final_report["error_details"] = all_errors
        else:
            final_report["summary_message"] = "All final validation checks for solution completeness and correctness passed successfully."

    except (KeyError, TypeError, IndexError) as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Final validation failed due to an unexpected error: {e}"

    return final_report
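
The group checks in Task 25 rely on a small idiom: collapsing the triplets of a variable group into a set, which has exactly one element only when every variable in the group shares the same triplet. A minimal, self-contained sketch of that idiom follows; the helper name `find_group_violations` and the toy scenarios are illustrative assumptions, not part of the pipeline or the paper's tables.

```python
from typing import Dict, Iterable, List, Tuple

Triplet = Tuple[str, str, str]


def find_group_violations(
    scenarios: List[Dict[str, Triplet]], group: Iterable[str]
) -> List[int]:
    """Return 1-based indices of scenarios whose group variables differ."""
    group = list(group)
    violations = []
    for index, scenario in enumerate(scenarios, start=1):
        # A set of the group's triplets has one element exactly when
        # every variable in the group carries the same triplet.
        if len({scenario[var] for var in group}) != 1:
            violations.append(index)
    return violations


# Toy data (hypothetical values chosen only to exercise both branches).
toy_scenarios = [
    {"UND": ("+", "+", "-"), "ROA": ("+", "+", "-")},
    {"UND": ("+", "0", "0"), "ROA": ("+", "-", "0")},
]
print(find_group_violations(toy_scenarios, ["UND", "ROA"]))
```

Returning indices rather than formatted strings keeps the check reusable; the caller can decide how to phrase the error message.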


In [None]:
# Task 26: Methodological Rigor and Reproducibility Validation

# =============================================================================
# Task 26: Orchestrator and Executor Function
# =============================================================================

def validate_methodological_rigor_and_reproducibility(
    all_task_reports: Dict[str, Dict[str, Any]],
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Performs a meta-validation of the pipeline's methodological rigor.

    This function serves as the main entry point for Task 26. It does not
    compute new results, but instead audits the execution and configuration of
    the entire pipeline to ensure it was faithful to the source paper's
    methodology. It checks:
    1.  The correctness of the core algorithm implementations.
    2.  The fidelity of key data transcriptions from the paper.
    3.  The deterministic nature of the results.

    Args:
        all_task_reports (Dict[str, Dict[str, Any]]): A dictionary containing the
            output reports from all previous tasks.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary.

    Returns:
        Dict[str, Any]: A final report on the methodological integrity of the
                        replication.
    """
    # Initialize the final report.
    final_report = {
        "task_name": "Task 26: Methodological Rigor and Reproducibility Validation",
        "overall_status": "SUCCESS",
        "validation_checks": {}
    }
    all_errors = []

    try:
        # --- Step 1: Algorithmic Implementation Verification ---
        # 1a. Verify the Inconsistency Removal Algorithm's greedy choice at each step.
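        # Report keys follow the orchestrator's `task_reports` naming (e.g. "task_08_inconsistency_removal").
        removal_report = all_task_reports["task_08_inconsistency_removal"]
        # Use .get(): a skipped removal step may not carry an "overall_status" key.
        if removal_report.get("overall_status") == "SUCCESS":
            matrix = all_task_reports["task_05_corr_matrix_preprocessing"]["outputs"]["final_correlation_matrix"]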
            for i, log_entry in enumerate(removal_report["outputs"]["iteration_log"]):
                # Find the actual weakest link in the matrix before this step's removal.
                temp_matrix, temp_report = find_and_remove_weakest_correlation(matrix)
                # Check if the algorithm made the same choice as our verifier.
                if temp_report["variables"] != log_entry["removed_correlation"]["variables"]:
                    all_errors.append(f"Task 8, Iteration {i+1}: Greedy choice was incorrect.")
                matrix = temp_matrix # Update matrix for the next iteration check.

        # 1b. Verify the Graph Construction algorithm by sampling edges.
        graph = all_task_reports["task_17_graph_analysis"]["outputs"]["final_transition_graph"]
        rules = all_task_reports["task_16_graph_components_prep"]["outputs"]["transition_rule_map"]
        # Select the first edge and the first non-edge (a deterministic spot check, not a random sample).
        edge_to_check = list(graph.edges())[0]
        non_edges = list(nx.non_edges(graph))
        non_edge_to_check = non_edges[0] if non_edges else None

        _, edge_check_list = define_graph_nodes_and_identify_transitions(
            all_task_reports["task_14_integrated_model_solution"]["outputs"]["integrated_scenarios"], rules
        )
        if edge_to_check not in edge_check_list:
            all_errors.append(f"Task 16/17: Edge {edge_to_check} in final graph is invalid according to rules.")
        if non_edge_to_check and non_edge_to_check in edge_check_list:
            all_errors.append(f"Task 16/17: Non-edge {non_edge_to_check} should be invalid but was identified as valid.")

        final_report["validation_checks"]["algorithmic_verification"] = "SUCCESS" if not all_errors else f"FAILURE: {all_errors}"

        # --- Step 2: Parameter Configuration (Transcription) Validation ---
        # Remember how many errors existed before this step so its status reflects only new errors.
        errors_before_transcription = len(all_errors)

        # Compare the generated CIM constraints against a hardcoded ground truth.
        generated_cim_constraints = all_task_reports["task_09_final_cim_solution"]["outputs"]["final_cim_constraints"]
        ground_truth_cim_constraints = construct_final_cim_constraint_set() # Re-run the trusted generator

        # Convert to a canonical, order-insensitive format for comparison.
        set_generated = {frozenset(d.items()) for d in generated_cim_constraints}
        set_ground_truth = {frozenset(d.items()) for d in ground_truth_cim_constraints}
        if set_generated != set_ground_truth:
            all_errors.append("Task 9: The generated final CIM constraints do not match the hard-coded ground truth from Table 4.")

        final_report["validation_checks"]["transcription_fidelity"] = (
            "SUCCESS" if len(all_errors) == errors_before_transcription
            else f"FAILURE: {all_errors[errors_before_transcription:]}"
        )

        # --- Step 3: Reproducibility Testing ---
        # Remember how many errors existed before this step so its status reflects only new errors.
        errors_before_reproducibility = len(all_errors)

        # Re-run a complex, deterministic part of the pipeline and check for identical output.
        im_vars = all_task_reports["task_13_namespace_integration"]["outputs"]["integrated_variables"]
        im_const = all_task_reports["task_13_namespace_integration"]["outputs"]["integrated_constraints"]

        # Run 1 is from the original report.
        solutions1 = all_task_reports["task_14_integrated_model_solution"]["outputs"]["integrated_scenarios"]
        # Run 2 is a fresh execution.
        solutions2 = formulate_and_solve_integrated_csp(im_vars, im_const)

        if solutions1 != solutions2:
            all_errors.append("Task 14: Integrated model solution is not deterministic. Two runs produced different results.")

        final_report["validation_checks"]["reproducibility"] = (
            "SUCCESS" if len(all_errors) == errors_before_reproducibility
            else f"FAILURE: {all_errors[errors_before_reproducibility:]}"
        )

        # --- Final Status ---
        if all_errors:
            final_report["overall_status"] = "FAILURE"
            final_report["summary_message"] = f"Methodological validation failed with {len(all_errors)} errors."
            final_report["error_details"] = all_errors
        else:
            final_report["summary_message"] = "All checks for methodological rigor, transcription fidelity, and reproducibility passed."

    except (KeyError, TypeError, IndexError) as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Methodological validation failed due to a missing or malformed prior task report. Details: {e}"

    return final_report
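
The transcription check in Task 26 compares two lists of constraint dictionaries without regard to list order or key order by converting each dictionary to a `frozenset` of its items. The sketch below illustrates that comparison in isolation. The field names (`relation`, `x`, `y`) and the relation labels are hypothetical placeholders, since the actual constraint schema is defined elsewhere in the pipeline.

```python
from typing import Any, Dict, FrozenSet, List, Set, Tuple


def canonical_form(
    constraints: List[Dict[str, Any]]
) -> Set[FrozenSet[Tuple[str, Any]]]:
    """Convert constraint dicts into an order-insensitive, hashable form."""
    # frozenset(d.items()) ignores key order; the outer set ignores list order.
    return {frozenset(constraint.items()) for constraint in constraints}


# Hypothetical constraint records, used only to demonstrate the comparison.
generated = [
    {"relation": "R1", "x": "REP", "y": "AGE"},
    {"relation": "R2", "x": "PRI", "y": "BOO"},
]
ground_truth = [
    {"y": "BOO", "x": "PRI", "relation": "R2"},
    {"relation": "R1", "x": "REP", "y": "AGE"},
]
tampered = [
    {"relation": "R1", "x": "REP", "y": "AGE"},
    {"relation": "R1", "x": "PRI", "y": "BOO"},
]

print(canonical_form(generated) == canonical_form(ground_truth))
print(canonical_form(generated) == canonical_form(tampered))
```

Two caveats apply to this approach: every dictionary value must be hashable (a list-valued field raises `TypeError`), and duplicate constraints collapse into one set element, so a differing number of repeated entries goes undetected.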



In [None]:
# Task 27: Economic and Domain-Specific Validation

# =============================================================================
# Task 27: Orchestrator and Executor Function
# =============================================================================

def validate_economic_and_domain_reasonableness(
    integrated_scenarios: List[Dict[str, Tuple]],
    final_integrated_constraints: List[Dict[str, Any]],
    final_transition_graph: nx.DiGraph,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Performs a final validation of the model's outputs against economic principles.

    This function serves as the main entry point for Task 27. It acts as a
    high-level sanity check on the model's results, ensuring they are not just
    mathematically consistent but also plausible from a domain-specific
    (economic and financial) perspective.

    Args:
        integrated_scenarios (List[Dict[str, Tuple]]): The 14 integrated scenarios.
        final_integrated_constraints (List[Dict[str, Any]]): The 22 integrated constraints.
        final_transition_graph (nx.DiGraph): The final transitional graph.
        master_input_specification (Dict[str, Any]): The main configuration dictionary.

    Returns:
        Dict[str, Any]: A report detailing the results of the domain-specific
                        validation checks.
    """
    # Initialize the final report.
    final_report = {
        "task_name": "Task 27: Economic and Domain-Specific Validation",
        "overall_status": "SUCCESS",
        "validation_checks": {}
    }
    all_errors = []

    try:
        # --- Step 1: Economic Interpretation Validation ---
        # Check for economically contradictory states in the final scenarios.
        economic_errors = []
        for i, scenario in enumerate(integrated_scenarios):
            # Check 1: Price-to-Book (PRI) and Book-to-Market (BOO) are inverses.
            # Their derivatives should generally move in opposite directions.
            dpri = scenario['PRI'][1]
            dboo = scenario['BOO'][1]
            if dpri == '+' and dboo == '+':
                economic_errors.append(f"Scenario {i+1}: Implausible state - PRI and BOO are both increasing.")
            if dpri == '-' and dboo == '-':
                economic_errors.append(f"Scenario {i+1}: Implausible state - PRI and BOO are both decreasing.")

        if economic_errors:
            all_errors.extend(economic_errors)
        final_report["validation_checks"]["scenario_economic_consistency"] = "SUCCESS" if not economic_errors else f"WARNING: {economic_errors}"

        # --- Step 2: Domain Expert Validation Simulation ---
        expert_errors = []
        # Check 1: "No tree can grow to Heaven" heuristic.
        # States of perpetual accelerating growth should not be stable (i.e., part of a cycle).
        accelerating_growth_triplet = ('+', '+', '+')
        cycles = list(nx.simple_cycles(final_transition_graph))
        for cycle in cycles:
            for node_id in cycle:
                scenario = final_transition_graph.nodes[node_id]['scenario_data']
                for var, triplet in scenario.items():
                    if triplet == accelerating_growth_triplet:
                        expert_errors.append(f"Heuristic Violation: Variable '{var}' exhibits accelerating growth in a cycle {cycle}.")

        if expert_errors:
            all_errors.extend(expert_errors)
        final_report["validation_checks"]["heuristic_consistency"] = "SUCCESS" if not expert_errors else f"WARNING: {expert_errors}"

        # --- Step 3: Sensitivity and Robustness Assessment (Methodology Outline) ---
        # This step does not execute tests but outlines the plan for them.
        robustness_plan = {
            "parameter_sensitivity_test_plan": {
                "objective": "Test the stability of the inconsistency removal heuristic.",
                "method": "In Task 8, after removing the weakest correlation, also create a branch that keeps it and removes the second-weakest. Re-run the entire pipeline for this branch and compare the final number of scenarios and graph topology.",
                "expected_outcome": "A robust model would yield a similar number of scenarios and a topologically similar graph."
            },
            "constraint_robustness_test_plan": {
                "objective": "Test the influence of the subjective integration constraints.",
                "method": "Create three branches. In each branch, remove one of the three integration constraints from Task 13. Re-run the IM CSP solver (Task 14) for each branch.",
                "expected_outcome": "The number of scenarios should increase significantly (from 14), demonstrating the powerful pruning effect of these constraints."
            }
        }
        final_report["validation_checks"]["robustness_assessment_plan"] = robustness_plan

        # --- Final Status ---
        if all_errors:
            # We classify these as warnings because they relate to plausibility, not hard errors.
            final_report["overall_status"] = "WARNING"
            final_report["summary_message"] = f"Economic and domain validation completed with {len(all_errors)} plausibility warnings."
            final_report["error_details"] = all_errors
        else:
            final_report["summary_message"] = "All economic and domain-specific validation checks passed."

    except (KeyError, TypeError, IndexError) as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"Economic validation failed due to a missing or malformed input (scenario, constraint, or graph data). Details: {e}"

    return final_report
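
The "No tree can grow to Heaven" heuristic in Task 27 scans every node on every cycle for a variable whose triplet is `('+', '+', '+')`. The sketch below reproduces that scan on plain Python data so it can be read without a graph library. It assumes cycles are supplied as lists of node ids (the shape yielded by `nx.simple_cycles`) and that a mapping from node id to scenario is available; the helper name `find_runaway_variables` and the toy data are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Triplet = Tuple[str, str, str]
ACCELERATING_GROWTH: Triplet = ("+", "+", "+")


def find_runaway_variables(
    cycles: List[List[Hashable]],
    scenario_by_node: Dict[Hashable, Dict[str, Triplet]],
) -> List[Tuple[Hashable, str]]:
    """Return sorted, de-duplicated (node, variable) pairs that violate the heuristic."""
    findings = set()
    for cycle in cycles:
        for node_id in cycle:
            for var, triplet in scenario_by_node[node_id].items():
                if triplet == ACCELERATING_GROWTH:
                    # A node on several cycles is recorded only once here.
                    findings.add((node_id, var))
    return sorted(findings)


# Toy data: node 2 lies on both cycles and carries a runaway variable.
toy_cycles = [[1, 2], [2, 3]]
toy_scenarios = {
    1: {"PRI": ("+", "-", "0")},
    2: {"PRI": ("+", "+", "+")},
    3: {"PRI": ("+", "0", "0")},
}
print(find_runaway_variables(toy_cycles, toy_scenarios))
```

Unlike the function above, which emits one message per cycle, this sketch de-duplicates findings; that choice keeps the warning count from growing with the number of cycles that share a node.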


In [None]:
# Task 28: End-to-End Research Pipeline Orchestrator Function Development

def run_qualitative_grapevine_model_pipeline(
    raw_df: pd.DataFrame,
    correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Executes the complete end-to-end research pipeline for the qualitative model.

    This master orchestrator function serves as the single entry point for the
    entire methodology described in "Information-Nonintensive Models of Rumour
    Impacts on Complex Investment Decisions". It manages the sequential execution
    of all tasks, from initial data validation and cleansing to model construction,
    solution, analysis, and final reporting.

    The pipeline is designed for full auditability and reproducibility, capturing
    the inputs, outputs, and status of each major stage in a comprehensive report.
    It implements a fail-fast mechanism, halting execution if any critical
    validation or processing step fails.

    Args:
        raw_df (pd.DataFrame): The raw input DataFrame of financial data.
        correlation_matrix_df (pd.DataFrame): The provided correlation matrix
            for the CIM variables.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary that governs the entire pipeline's behavior.

    Returns:
        Dict[str, Any]: A comprehensive, nested dictionary containing the
                        reports from every task in the pipeline, providing a
                        complete and auditable record of the entire run.
    """
    # --- 1. Initialization ---
    # Initialize the master report that will store all intermediate results.
    pipeline_report = {
        "pipeline_name": "Qualitative Grapevine Model Replication Pipeline",
        "pipeline_status": "RUNNING",
        "task_reports": {}
    }

    try:
        # --- PHASE 1: DATA PREPARATION AND VALIDATION ---

        # Step 1.1: Validate the master configuration dictionary itself.
        report = validate_master_input_specification(master_input_specification)
        pipeline_report["task_reports"]["task_03_spec_validation"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Master specification validation failed.")

        # Step 1.2: Preprocess configuration (e.g., tune solver params).
        processed_spec, report = preprocess_parameter_configuration(master_input_specification)
        pipeline_report["task_reports"]["task_06_spec_preprocessing"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Parameter configuration preprocessing failed.")

        # Step 1.3 & 1.4: Validate raw data and correlation matrix.
        report = validate_raw_dataframe_and_schema(raw_df, processed_spec)
        pipeline_report["task_reports"]["task_01_raw_df_validation"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Raw DataFrame validation failed.")

        report = validate_correlation_matrix_and_integrity(correlation_matrix_df, raw_df, processed_spec)
        pipeline_report["task_reports"]["task_02_corr_matrix_validation"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Correlation matrix validation failed.")

        # Step 1.5 & 1.6: Cleanse and preprocess the validated data.
        clean_df, report = cleanse_and_standardize_raw_data(raw_df, processed_spec)
        pipeline_report["task_reports"]["task_04_data_cleansing"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Data cleansing failed.")

        processed_corr_df, report = preprocess_and_normalize_correlation_matrix(correlation_matrix_df, processed_spec)
        pipeline_report["task_reports"]["task_05_corr_matrix_preprocessing"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Correlation matrix preprocessing failed.")

        # --- PHASE 2: CIM CONSTRUCTION ---

        # Step 2.1: Generate initial constraints and check for inconsistency.
        report = generate_and_test_initial_constraint_set(processed_corr_df, processed_spec)
        pipeline_report["task_reports"]["task_07_initial_csp_generation"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Initial constraint generation failed.")
        is_inconsistent = report["outputs"]["is_inconsistent"]

        # Step 2.2: Run inconsistency removal if necessary.
        if is_inconsistent:
            report = iteratively_remove_inconsistencies(processed_corr_df, processed_spec)
            pipeline_report["task_reports"]["task_08_inconsistency_removal"] = report
            if report["overall_status"] == "FAILURE": raise RuntimeError("Iterative inconsistency removal failed.")
        else:
            # This path deviates from the paper but is handled gracefully.
            warnings.warn("Initial correlation matrix was already consistent. Skipping inconsistency removal.")
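            # Use the same "overall_status" key as every other task report so downstream checks can read it.
            pipeline_report["task_reports"]["task_08_inconsistency_removal"] = {"overall_status": "SKIPPED"}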

        # Step 2.3: Finalize the CIM with expert knowledge and solve.
        report = finalize_and_solve_cim_model(processed_spec)
        pipeline_report["task_reports"]["task_09_final_cim_solution"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Final CIM model solution failed.")
        cim_scenarios = report["outputs"]["cim_scenarios"]
        final_cim_constraints = report["outputs"]["final_cim_constraints"]

        # --- PHASE 3: RRM CONSTRUCTION ---

        # Step 3.1 - 3.3: Translate, formulate, and solve the RRM.
        report = translate_and_structure_rrm_system(processed_spec)
        pipeline_report["task_reports"]["task_10_rrm_translation"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("RRM translation failed.")
        structured_rrm_system = report["outputs"]["structured_rrm_system"]

        report = formulate_rrm_csp(structured_rrm_system, list(processed_spec['empirical_data']['rrm_system']['state_variables'].keys()))
        pipeline_report["task_reports"]["task_11_rrm_csp_formulation"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("RRM CSP formulation failed.")
        rrm_csp_problem = report["outputs"]["rrm_csp_problem"]

        report = generate_and_validate_rrm_scenarios(rrm_csp_problem, processed_spec)
        pipeline_report["task_reports"]["task_12_rrm_solution"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("RRM scenario generation failed.")
        rrm_scenarios = report["outputs"]["rrm_scenarios"]

        # --- PHASE 4: MODEL INTEGRATION AND SOLUTION ---

        # Step 4.1 & 4.2: Integrate models and solve the final IM CSP.
        report = integrate_cim_rrm_namespaces(final_cim_constraints, structured_rrm_system, processed_spec)
        pipeline_report["task_reports"]["task_13_namespace_integration"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Namespace integration failed.")
        integrated_variables = report["outputs"]["integrated_variables"]
        integrated_constraints = report["outputs"]["integrated_constraints"]

        report = formulate_and_solve_integrated_model(integrated_variables, integrated_constraints, processed_spec)
        pipeline_report["task_reports"]["task_14_integrated_model_solution"] = report
        if report["overall_status"] == "FAILURE": raise RuntimeError("Integrated model solution failed.")
        integrated_scenarios = report["outputs"]["integrated_scenarios"]

        # --- PHASE 5: ANALYSIS AND REPORTING ---

        # This phase uses the final results to generate all analyses and artifacts.
        report = analyze_and_interpret_integrated_solutions(integrated_scenarios, processed_spec)
        pipeline_report["task_reports"]["task_15_solution_analysis"] = report

        report = prepare_transition_graph_components(integrated_scenarios, processed_spec)
        pipeline_report["task_reports"]["task_16_graph_components_prep"] = report
        graph_components = report["outputs"]

        report = build_and_analyze_transition_graph(graph_components, processed_spec)
        pipeline_report["task_reports"]["task_17_graph_analysis"] = report
        final_transition_graph = report["outputs"]["final_transition_graph"]

        report = analyze_and_visualize_graph_structure(report, processed_spec)
        pipeline_report["task_reports"]["task_18_graph_validation"] = report

        report = analyze_decision_variables_and_strategies(integrated_scenarios, processed_spec)
        pipeline_report["task_reports"]["task_19_strategy_analysis"] = report
        strategy_classification = report["outputs"]["scenario_strategy_classification"]

        report = evaluate_and_rank_scenarios(integrated_scenarios, strategy_classification, processed_spec)
        pipeline_report["task_reports"]["task_20_scenario_ranking"] = report

        report = generate_investment_strategy_report(
            pipeline_report["task_reports"]["task_20_scenario_ranking"],
            pipeline_report["task_reports"]["task_17_graph_analysis"],
            processed_spec
        )
        pipeline_report["task_reports"]["task_21_strategy_report"] = report

        report = generate_final_scenario_tables(cim_scenarios, integrated_scenarios, processed_spec)
        pipeline_report["task_reports"]["task_22_table_generation"] = report

        report = orchestrate_graph_visualization(final_transition_graph, processed_spec)
        pipeline_report["task_reports"]["task_23_visualization"] = report

        # --- FINAL VALIDATION CHECKS ---

        # Run the final, exhaustive validation tasks on the complete set of artifacts.
        report = validate_solution_completeness_and_correctness(
            cim_scenarios, rrm_scenarios, integrated_scenarios,
            final_cim_constraints, structured_rrm_system, integrated_constraints,
            processed_spec
        )
        pipeline_report["task_reports"]["task_25_correctness_validation"] = report

        report = validate_methodological_rigor_and_reproducibility(pipeline_report["task_reports"], processed_spec)
        pipeline_report["task_reports"]["task_26_rigor_validation"] = report

        report = validate_economic_and_domain_reasonableness(
            integrated_scenarios, integrated_constraints, final_transition_graph, processed_spec
        )
        pipeline_report["task_reports"]["task_27_economic_validation"] = report

        # --- Finalization ---
        # The final summary aggregates the status of all key validation checkpoints.
        report = generate_final_summary_report(pipeline_report["task_reports"], processed_spec)
        pipeline_report["task_reports"]["task_24_final_summary"] = report

        # Set the final pipeline status based on the summary.
        pipeline_report["pipeline_status"] = report["overall_status"]

    except Exception as e:
        # Catch any unhandled exception from any step in the pipeline.
        pipeline_report["pipeline_status"] = "CRITICAL_FAILURE"
        pipeline_report["error_message"] = f"The pipeline was aborted due to a critical error: {e}"

    # Return the complete, auditable record of the entire pipeline execution.
    return pipeline_report
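
The orchestrator above repeats one pattern for each critical step: run the task, store its report, and raise on `"FAILURE"` so the outer `except` can mark the run as aborted. The following sketch isolates that fail-fast pattern with stub steps. The helper `run_fail_fast` and the step names are hypothetical, and setting the final status to `"SUCCESS"` directly is a simplification (the real pipeline takes it from the Task 24 summary report).

```python
from typing import Any, Callable, Dict, List, Tuple

Report = Dict[str, Any]


def run_fail_fast(steps: List[Tuple[str, Callable[[], Report]]]) -> Dict[str, Any]:
    """Run steps in order, recording each report and aborting on the first failure."""
    pipeline_report: Dict[str, Any] = {"pipeline_status": "RUNNING", "task_reports": {}}
    try:
        for key, step in steps:
            report = step()
            # Store the report before checking it so the failing step stays auditable.
            pipeline_report["task_reports"][key] = report
            if report.get("overall_status") == "FAILURE":
                raise RuntimeError(f"Step '{key}' failed.")
        pipeline_report["pipeline_status"] = "SUCCESS"
    except Exception as exc:
        pipeline_report["pipeline_status"] = "CRITICAL_FAILURE"
        pipeline_report["error_message"] = (
            f"The pipeline was aborted due to a critical error: {exc}"
        )
    return pipeline_report


# Stub steps: the second one fails, so the third must never run.
demo = run_fail_fast([
    ("step_a", lambda: {"overall_status": "SUCCESS"}),
    ("step_b", lambda: {"overall_status": "FAILURE"}),
    ("step_c", lambda: {"overall_status": "SUCCESS"}),
])
print(demo["pipeline_status"], sorted(demo["task_reports"]))
```

Because the failing report is stored before the exception is raised, the returned record still shows which step stopped the run.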


In [None]:
# Task 29: Comprehensive Robustness Analysis Framework

# =============================================================================
# Task 29, Step 1, Helper: Core Pipeline Runner
# =============================================================================

def _run_core_pipeline_for_sensitivity(
    perturbed_corr_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Executes a critical path of the main pipeline for a given input matrix.

    This helper function is designed for sensitivity analysis. It takes a
    (potentially perturbed) correlation matrix and runs the pipeline from
    preprocessing through graph construction to extract key structural output
    metrics. This allows for repeated runs with different inputs without the
    overhead of the final reporting tasks.

    Args:
        perturbed_corr_df (pd.DataFrame): The correlation matrix to test.
        master_input_specification (Dict[str, Any]): The main configuration dict.

    Returns:
        Dict[str, Any]: A dictionary of key output metrics, such as the number
                        of integrated scenarios and graph partitions. Returns
                        a dictionary with an 'error' key on failure.
    """
    try:
        # This sequence mirrors the main orchestrator but stops after graph analysis.

        # Phase 1: Prep
        processed_spec, _ = preprocess_parameter_configuration(master_input_specification)
        processed_corr_df, _ = preprocess_and_normalize_correlation_matrix(perturbed_corr_df, processed_spec)

        # Phase 2: CIM
        report_task7 = generate_and_test_initial_constraint_set(processed_corr_df, processed_spec)
        if report_task7["outputs"]["is_inconsistent"]:
            report_task8 = iteratively_remove_inconsistencies(processed_corr_df, processed_spec)
            if report_task8["overall_status"] == "FAILURE": return {"error": "Inconsistency removal failed"}

        report_task9 = finalize_and_solve_cim_model(processed_spec)
        if report_task9["overall_status"] == "FAILURE": return {"error": "Final CIM solution failed"}
        cim_scenarios = report_task9["outputs"]["cim_scenarios"]
        final_cim_constraints = report_task9["outputs"]["final_cim_constraints"]

        # Phase 3: RRM
        report_task10 = translate_and_structure_rrm_system(processed_spec)
        structured_rrm_system = report_task10["outputs"]["structured_rrm_system"]
        report_task11 = formulate_rrm_csp(structured_rrm_system, list(processed_spec['empirical_data']['rrm_system']['state_variables'].keys()))
        rrm_csp_problem = report_task11["outputs"]["rrm_csp_problem"]
        report_task12 = generate_and_validate_rrm_scenarios(rrm_csp_problem, processed_spec)
        rrm_scenarios = report_task12["outputs"]["rrm_scenarios"]

        # Phase 4: Integration
        report_task13 = integrate_cim_rrm_namespaces(final_cim_constraints, structured_rrm_system, processed_spec)
        integrated_variables = report_task13["outputs"]["integrated_variables"]
        integrated_constraints = report_task13["outputs"]["integrated_constraints"]
        report_task14 = formulate_and_solve_integrated_model(integrated_variables, integrated_constraints, processed_spec)
        if report_task14["overall_status"] == "FAILURE": return {"error": "Integrated model solution failed"}
        integrated_scenarios = report_task14["outputs"]["integrated_scenarios"]

        # Phase 5: Graph Construction
        report_task16 = prepare_transition_graph_components(integrated_scenarios, processed_spec)
        graph_components = report_task16["outputs"]
        report_task17 = build_and_analyze_transition_graph(graph_components, processed_spec)
        if report_task17["overall_status"] == "FAILURE": return {"error": "Graph analysis failed"}

        # Extract the key metrics for sensitivity analysis.
        analysis_report = report_task17["outputs"]["graph_analysis_report"]
        return {
            "num_cim_scenarios": len(cim_scenarios),
            "num_rrm_scenarios": len(rrm_scenarios),
            "num_im_scenarios": len(integrated_scenarios),
            "num_partitions": len(analysis_report["connectivity"]["partitions"]),
            "num_cycles": len(analysis_report["properties"]["simple_cycles"]),
            "error": None
        }
    except Exception as e:
        return {"error": str(e)}

# =============================================================================
# Task 29, Step 1: Orchestrator
# =============================================================================

def analyze_correlation_matrix_sensitivity(
    base_correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any],
    perturbation_magnitudes: List[float] = [0.01, 0.05]
) -> pd.DataFrame:
    """
    Performs a sensitivity analysis on the model's correlation matrix input.

    This function systematically perturbs each off-diagonal element of the
    base correlation matrix, re-runs the core modeling pipeline for each
    perturbed matrix, and records the impact on key structural outputs
    (e.g., number of scenarios, graph partitions). This provides a measure
    of the model's robustness to small changes in its primary numerical input.

    Args:
        base_correlation_matrix_df (pd.DataFrame): The original, validated,
            and preprocessed correlation matrix.
        master_input_specification (Dict[str, Any]): The main configuration dict.
        perturbation_magnitudes (List[float]): A list of absolute values by
            which to perturb the correlation coefficients (e.g., 0.05).

    Returns:
        pd.DataFrame: A DataFrame summarizing the results of the sensitivity
                      analysis, indexed by the perturbation details.
    """
    # --- Initialization ---
    # Get the list of variables for iteration.
    variables = base_correlation_matrix_df.columns.tolist()
    results = []

    # --- Perturbation Loop ---
    # Iterate through each unique off-diagonal element (the upper triangle).
    for i in range(len(variables)):
        for j in range(i + 1, len(variables)):
            var1, var2 = variables[i], variables[j]

            # Iterate through each specified perturbation magnitude.
            for magnitude in perturbation_magnitudes:
                # Test both increasing and decreasing the correlation.
                for direction in [1, -1]:
                    # Create a deep copy of the base matrix for this experiment.
                    perturbed_df = base_correlation_matrix_df.copy()
                    original_value = perturbed_df.loc[var1, var2]

                    # Apply the perturbation.
                    new_value = original_value + (direction * magnitude)

                    # Clip the value to the valid correlation range [-1, 1].
                    new_value = np.clip(new_value, -1.0, 1.0)

                    # Update the matrix symmetrically.
                    perturbed_df.loc[var1, var2] = new_value
                    perturbed_df.loc[var2, var1] = new_value

                    # --- Run the Core Pipeline with the Perturbed Matrix ---
                    # This is the computationally intensive step.
                    metrics = _run_core_pipeline_for_sensitivity(
                        perturbed_df, master_input_specification
                    )

                    # --- Record the Results ---
                    results.append({
                        "perturbed_variable_pair": f"{var1}-{var2}",
                        "perturbation_direction": "+" if direction == 1 else "-",
                        "perturbation_magnitude": magnitude,
                        "original_correlation": original_value,
                        "perturbed_correlation": new_value,
                        "num_im_scenarios": metrics.get("num_im_scenarios"),
                        "num_partitions": metrics.get("num_partitions"),
                        "num_cycles": metrics.get("num_cycles"),
                        "run_status": "SUCCESS" if metrics.get("error") is None else "FAILURE",
                        "error_message": metrics.get("error")
                    })

    # --- Finalize and Return Results ---
    if not results:
        return pd.DataFrame()

    # Create a DataFrame from the results list.
    results_df = pd.DataFrame(results)

    # Set a multi-level index for clear and structured analysis.
    results_df.set_index([
        "perturbed_variable_pair",
        "perturbation_direction",
        "perturbation_magnitude"
    ], inplace=True)

    return results_df
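
A single-entry perturbation can leave the matrix symmetric and within [-1, 1] yet no longer positive semidefinite, in which case it is no longer a valid correlation matrix. The sketch below shows one way to flag such cases before interpreting a sensitivity run. It assumes a plain NumPy array rather than the DataFrame used above, and `perturb_symmetric` and `is_positive_semidefinite` are hypothetical helper names, not part of this project.

```python
import numpy as np


def perturb_symmetric(matrix: np.ndarray, i: int, j: int, delta: float) -> np.ndarray:
    """Return a copy with entries (i, j) and (j, i) shifted by delta, clipped to [-1, 1]."""
    perturbed = matrix.copy()
    new_value = float(np.clip(perturbed[i, j] + delta, -1.0, 1.0))
    perturbed[i, j] = new_value
    perturbed[j, i] = new_value
    return perturbed


def is_positive_semidefinite(matrix: np.ndarray, tol: float = 1e-10) -> bool:
    """Check that the smallest eigenvalue of a symmetric matrix is not below -tol."""
    eigenvalues = np.linalg.eigvalsh(matrix)
    return bool(eigenvalues.min() >= -tol)


# Illustrative 3x3 matrix (made-up values, not the project's data).
base = np.array([
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.9],
    [0.9, 0.9, 1.0],
])

for delta in (0.05, -0.05, -0.5):
    candidate = perturb_symmetric(base, 0, 1, delta)
    print(delta, is_positive_semidefinite(candidate))
```

Recording this flag alongside `run_status` would make it possible to separate structural changes caused by a plausible perturbation from those caused by an invalid input matrix.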


# =============================================================================
# Task 29, Step 1: Systematic Parameter Perturbation
# =============================================================================

def analyze_threshold_parameter_sensitivity(
    base_correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any],
    parameters_to_test: Dict[str, List[Any]]
) -> pd.DataFrame:
    """
    Performs a sensitivity analysis on various numerical threshold parameters.

    This function systematically modifies specified threshold parameters within
    the master configuration (e.g., correlation strength thresholds), re-runs
    the core pipeline for each modification, and records the impact on the
    final model structure.

    Args:
        base_correlation_matrix_df (pd.DataFrame): The original correlation matrix.
        master_input_specification (Dict[str, Any]): The base configuration dict.
        parameters_to_test (Dict[str, List[Any]]): A dictionary where keys are
            dot-notation paths to parameters and values are lists of values
            to test for that parameter.

    Returns:
        pd.DataFrame: A DataFrame summarizing the results of the sensitivity
                      analysis, indexed by the parameter and value tested.
    """
    results = []

    # Iterate through each parameter specified for testing.
    for param_path, values_to_test in parameters_to_test.items():
        # Iterate through each value to be tested for the current parameter.
        for value in values_to_test:
            # Create a deep copy of the configuration for this specific run.
            spec_copy = copy.deepcopy(master_input_specification)

            try:
                # Modify the parameter in the copied configuration: walk the
                # dot-notation path to the parent dict, then set the leaf value.
                keys = param_path.split('.')
                sub_dict = spec_copy
                for key in keys[:-1]:
                    sub_dict = sub_dict[key]
                sub_dict[keys[-1]] = value

                # --- Run the Core Pipeline with the Modified Spec ---
                metrics = _run_core_pipeline_for_sensitivity(
                    base_correlation_matrix_df, spec_copy
                )

                # --- Record the Results ---
                results.append({
                    "parameter_path": param_path,
                    "tested_value": value,
                    "num_im_scenarios": metrics.get("num_im_scenarios"),
                    "num_partitions": metrics.get("num_partitions"),
                    "run_status": "SUCCESS" if metrics.get("error") is None else "FAILURE",
                    "error_message": metrics.get("error")
                })
            except Exception as e:
                # Catch errors if the path is invalid or the run fails unexpectedly.
                results.append({
                    "parameter_path": param_path,
                    "tested_value": value,
                    "run_status": "CRITICAL_FAILURE",
                    "error_message": str(e)
                })

    # --- Finalize and Return Results ---
    if not results:
        return pd.DataFrame()
    return pd.DataFrame(results).set_index(["parameter_path", "tested_value"])
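
The inline path walk above raises `KeyError` for an unknown intermediate key, but a misspelled final key is silently created, so the pipeline would run with an unchanged baseline configuration. A minimal sketch of a stricter setter follows; `set_nested_param` is a hypothetical helper name, and the example configuration is a cut-down illustration rather than the project's full specification.

```python
import copy
from typing import Any, Dict


def set_nested_param(spec: Dict[str, Any], param_path: str, value: Any) -> Dict[str, Any]:
    """Return a deep copy of `spec` with the existing value at `param_path` replaced."""
    updated = copy.deepcopy(spec)
    *parent_keys, leaf_key = param_path.split(".")

    node = updated
    for key in parent_keys:
        node = node[key]  # KeyError if an intermediate key is missing.

    # Refuse to create new keys: a typo in the leaf should fail loudly.
    if leaf_key not in node:
        raise KeyError(f"Unknown parameter path: {param_path}")

    node[leaf_key] = value
    return updated


# Cut-down example configuration (illustrative only).
example_spec = {
    "computational_configuration": {
        "inconsistency_removal": {"algorithm_parameters": {"iteration_limit": 45}}
    }
}
path = "computational_configuration.inconsistency_removal.algorithm_parameters.iteration_limit"
modified = set_nested_param(example_spec, path, 60)
print(modified["computational_configuration"]["inconsistency_removal"])
```

Returning a modified copy keeps the base configuration untouched between runs, which matches the `copy.deepcopy` approach used in the function above.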


# =============================================================================
# Task 29, Step 1: CSP Solver Parameter Testing
# =============================================================================

def analyze_csp_solver_parameter_sensitivity(
    base_correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any],
    timeouts_to_test: List[int],
    memory_limits_to_test: List[int]
) -> pd.DataFrame:
    """
    Evaluates the model's sensitivity to CSP solver resource constraints.

    This function tests the pipeline's performance and success rate under
    different CSP solver timeout and memory limit configurations. This helps
    to identify the minimum resource requirements for a successful run.

    Args:
        base_correlation_matrix_df (pd.DataFrame): The original correlation matrix.
        master_input_specification (Dict[str, Any]): The base configuration dict.
        timeouts_to_test (List[int]): A list of timeout values (in seconds) to test.
        memory_limits_to_test (List[int]): A list of memory limits (in MB) to test.

    Returns:
        pd.DataFrame: A DataFrame summarizing the performance and success rate
                      for each combination of solver parameters.
    """
    results = []

    # Iterate over the Cartesian product of timeout and memory-limit values.
    for timeout in timeouts_to_test:
        for mem_limit in memory_limits_to_test:
            spec_copy = copy.deepcopy(master_input_specification)

            # Modify the solver parameters in the copied configuration.
            solver_params = _get_nested_param(
                spec_copy,
                'computational_configuration.csp_solver.search_space_management'
            )
            solver_params['search_timeout_seconds'] = timeout
            solver_params['memory_limit_mb'] = mem_limit

            # --- Run the Core Pipeline and Time Execution ---
            # Use a monotonic, high-resolution clock for the elapsed time.
            start_time = time.perf_counter()
            # Note: The `python-constraint` library does not natively support
            # timeouts or memory limits. A production system would use a more
            # advanced solver (like OR-Tools) or wrap the call in a separate
            # process with resource limits. This function simulates the test.
            metrics = _run_core_pipeline_for_sensitivity(
                base_correlation_matrix_df, spec_copy
            )
            end_time = time.perf_counter()

            # --- Record the Results ---
            results.append({
                "timeout_sec": timeout,
                "memory_limit_mb": mem_limit,
                "execution_time_sec": end_time - start_time,
                "num_im_scenarios": metrics.get("num_im_scenarios"),
                "run_status": "SUCCESS" if metrics.get("error") is None else "FAILURE",
                "error_message": metrics.get("error")
            })

    if not results:
        return pd.DataFrame()
    return pd.DataFrame(results).set_index(["timeout_sec", "memory_limit_mb"])


# =============================================================================
# Task 29, Step 1: Expert Knowledge Confidence Testing
# =============================================================================

def analyze_expert_knowledge_sensitivity(
    # This function needs the outputs of several prior tasks.
    integrated_variables: List[str],
    base_integrated_constraints: List[Dict[str, Any]],
    confidence_thresholds_to_test: List[float]
) -> pd.DataFrame:
    """
    Assesses the model's sensitivity to the confidence in expert knowledge.

    This function simulates a scenario where expert-defined integration
    constraints are only included if their confidence level meets a certain
    threshold. It then re-solves the Integrated Model to see how the number
    of valid scenarios changes as the bar for including expert knowledge is raised.

    Args:
        integrated_variables (List[str]): The list of all 15 model variables.
        base_integrated_constraints (List[Dict[str, Any]]): The full set of 22
            constraints before any confidence-based filtering.
        confidence_thresholds_to_test (List[float]): A list of confidence
            thresholds to test (e.g., [0.0, 0.7, 0.8]).

    Returns:
        pd.DataFrame: A DataFrame showing how the number of integrated
                      scenarios changes with the confidence threshold.
    """
    results = []

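    # Confidence levels for the expert-defined integration constraints.
    # This is a simplified example; a full implementation would fetch these
    # from the master_input_specification. Keys are alphabetically sorted
    # tuples so that they match the `tuple(sorted(...))` lookup used below;
    # unsorted keys such as ('Z2', 'REP') would never be found.
    confidence_map = {
        ('REP', 'Z2'): 0.75,  # For σ+-
        ('UND', 'Z1'): 0.80,  # For σ--
        ('REP', 'W'): 0.70    # For RED
    }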

    for threshold in confidence_thresholds_to_test:
        # --- Filter Constraints Based on Confidence Threshold ---
        filtered_constraints = []
        for const in base_integrated_constraints:
            is_expert_integration_const = const.get('source') == 'integration_expert'

            if not is_expert_integration_const:
                # Always include non-expert constraints.
                filtered_constraints.append(const)
            else:
                # For expert constraints, check their confidence level.
                # We use a canonical key for the confidence map.
                key = tuple(sorted(const['variables']))
                if confidence_map.get(key, 0.0) >= threshold:
                    filtered_constraints.append(const)

        # --- Re-solve the Integrated Model with the Filtered Constraints ---
        try:
            solutions = formulate_and_solve_integrated_csp(
                integrated_variables, filtered_constraints
            )
            num_scenarios = len(solutions)
            status = "SUCCESS"
            error = None
        except Exception as e:
            num_scenarios = None
            status = "FAILURE"
            error = str(e)

        # --- Record the Results ---
        results.append({
            "confidence_threshold": threshold,
            "num_constraints_included": len(filtered_constraints),
            "num_im_scenarios": num_scenarios,
            "run_status": status,
            "error_message": error
        })

    if not results:
        return pd.DataFrame()
    return pd.DataFrame(results).set_index("confidence_threshold")
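
The filtering step above can be illustrated in isolation. The sketch below uses the three confidence values from the function above and keys them by `frozenset`, which is one order-independent alternative to sorted tuples. The constraint dictionaries are cut-down stand-ins carrying only the `source` and `variables` fields, the `"correlation"` source label and the pair `("A", "B")` are made up for illustration, and `filter_by_confidence` is a hypothetical helper name.

```python
from typing import Any, Dict, FrozenSet, List

# Confidence per expert constraint, keyed independently of variable order.
CONFIDENCE: Dict[FrozenSet[str], float] = {
    frozenset({"Z2", "REP"}): 0.75,
    frozenset({"Z1", "UND"}): 0.80,
    frozenset({"W", "REP"}): 0.70,
}


def filter_by_confidence(
    constraints: List[Dict[str, Any]], threshold: float
) -> List[Dict[str, Any]]:
    """Keep non-expert constraints and expert ones whose confidence meets the threshold."""
    kept = []
    for const in constraints:
        if const.get("source") != "integration_expert":
            kept.append(const)
            continue
        confidence = CONFIDENCE.get(frozenset(const["variables"]), 0.0)
        if confidence >= threshold:
            kept.append(const)
    return kept


constraints = [
    {"source": "correlation", "variables": ("A", "B")},  # made-up non-expert constraint
    {"source": "integration_expert", "variables": ("Z2", "REP")},
    {"source": "integration_expert", "variables": ("UND", "Z1")},  # reversed order on purpose
    {"source": "integration_expert", "variables": ("W", "REP")},
]

for threshold in (0.0, 0.75, 0.9):
    print(threshold, len(filter_by_confidence(constraints, threshold)))
```

With these values, a threshold of 0.9 removes every expert constraint, so that row of a sensitivity table corresponds to a model without any expert integration knowledge.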


# =============================================================================
# Task 29, Step 1: Master Orchestrator
# =============================================================================

def run_parameter_sensitivity_analysis_framework(
    base_correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any],
    # --- Inputs for sub-analyses ---
    final_pipeline_outputs: Dict[str, Any],
    # --- Control parameters for the analyses ---
    corr_perturbations: List[float] = [0.05],
    threshold_params_to_test: Dict[str, List[Any]] = None,
    solver_timeouts: List[int] = [3600, 7200],
    solver_memory_limits: List[int] = [8192],
    confidence_thresholds: List[float] = [0.0, 0.75, 0.9]
) -> Dict[str, Any]:
    """
    Orchestrates a comprehensive suite of parameter sensitivity analyses.

    This master function serves as the entry point for Step 1 of Task 29. It
    executes and aggregates the results from four distinct types of sensitivity
    analysis to provide a holistic view of the model's robustness to changes
    in its inputs and configuration.

    The analyses performed are:
    1.  **Correlation Matrix Sensitivity**: Perturbs the input correlation matrix.
    2.  **Threshold Parameter Sensitivity**: Varies numerical thresholds in the pipeline.
    3.  **CSP Solver Sensitivity**: Tests different resource limits for the solver.
    4.  **Expert Knowledge Sensitivity**: Varies the confidence required to include
        expert-defined constraints.

    Args:
        base_correlation_matrix_df (pd.DataFrame): The original, validated
            correlation matrix.
        master_input_specification (Dict[str, Any]): The base configuration dict.
        final_pipeline_outputs (Dict[str, Any]): A dictionary containing the final
            artifacts from a baseline run of the main pipeline, such as the
            final integrated variables and constraints.
        corr_perturbations (List[float]): Magnitudes for the correlation analysis.
        threshold_params_to_test (Dict[str, List[Any]]): A dictionary mapping
            dot-notation parameter paths to the values to test. If None, the
            inconsistency-removal `iteration_limit` is tested at 30, 45, and 60.
        solver_timeouts (List[int]): Timeout values (seconds) for the solver test.
        solver_memory_limits (List[int]): Memory limits (MB) for the solver test.
        confidence_thresholds (List[float]): Confidence thresholds for the
            expert knowledge test.

    Returns:
        Dict[str, Any]: A comprehensive report containing the results of each
                        sensitivity analysis as a pandas DataFrame.
    """
    # Initialize the final, aggregated report.
    final_report = {
        "task_name": "Task 29, Step 1: Parameter Sensitivity Robustness Testing",
        "overall_status": "SUCCESS",
        "analysis_results": {}
    }

    # Define a default set of threshold parameters to test if none are provided.
    if threshold_params_to_test is None:
        threshold_params_to_test = {
            'computational_configuration.inconsistency_removal.algorithm_parameters.iteration_limit': [30, 45, 60]
        }

    print("--- Starting Comprehensive Sensitivity Analysis (This may take a long time) ---")

    try:
        # --- 1. Correlation Matrix Sensitivity Analysis ---
        print("Running Correlation Matrix Sensitivity Analysis...")
        corr_sensitivity_results = analyze_correlation_matrix_sensitivity(
            base_correlation_matrix_df=base_correlation_matrix_df,
            master_input_specification=master_input_specification,
            perturbation_magnitudes=corr_perturbations
        )
        final_report["analysis_results"]["correlation_sensitivity"] = corr_sensitivity_results
        print("...Correlation Matrix analysis complete.")

        # --- 2. Threshold Parameter Sensitivity Analysis ---
        print("Running Threshold Parameter Sensitivity Analysis...")
        threshold_sensitivity_results = analyze_threshold_parameter_sensitivity(
            base_correlation_matrix_df=base_correlation_matrix_df,
            master_input_specification=master_input_specification,
            parameters_to_test=threshold_params_to_test
        )
        final_report["analysis_results"]["threshold_sensitivity"] = threshold_sensitivity_results
        print("...Threshold Parameter analysis complete.")

        # --- 3. CSP Solver Parameter Sensitivity Analysis ---
        print("Running CSP Solver Parameter Sensitivity Analysis...")
        solver_sensitivity_results = analyze_csp_solver_parameter_sensitivity(
            base_correlation_matrix_df=base_correlation_matrix_df,
            master_input_specification=master_input_specification,
            timeouts_to_test=solver_timeouts,
            memory_limits_to_test=solver_memory_limits
        )
        final_report["analysis_results"]["solver_sensitivity"] = solver_sensitivity_results
        print("...CSP Solver analysis complete.")

        # --- 4. Expert Knowledge Confidence Sensitivity Analysis ---
        print("Running Expert Knowledge Sensitivity Analysis...")
        # Unpack the necessary artifacts from the baseline pipeline run.
        integrated_vars = final_pipeline_outputs["task_13"]["outputs"]["integrated_variables"]
        base_integrated_constraints = final_pipeline_outputs["task_13"]["outputs"]["integrated_constraints"]

        expert_knowledge_results = analyze_expert_knowledge_sensitivity(
            integrated_variables=integrated_vars,
            base_integrated_constraints=base_integrated_constraints,
            confidence_thresholds_to_test=confidence_thresholds
        )
        final_report["analysis_results"]["expert_knowledge_sensitivity"] = expert_knowledge_results
        print("...Expert Knowledge analysis complete.")

        final_report["summary_message"] = "All parameter sensitivity analyses completed successfully."

    except Exception as e:
        # Catch any critical failure during the extensive analysis.
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"A critical error occurred during sensitivity analysis: {e}"

    print("--- Comprehensive Sensitivity Analysis Finished ---")
    return final_report


# =============================================================================
# Task 29, Step 2, Sub-Task 1: Alternative Inconsistency Removal
# =============================================================================

def analyze_alternative_inconsistency_removal_strategies(
    initial_correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any],
    strategies_to_test: List[str] = ['random', 'max_first'],
    num_random_runs: int = 5
) -> pd.DataFrame:
    """
    Analyzes the model's robustness to alternative inconsistency removal strategies.

    This function tests how the final CIM structure changes if the core heuristic
    for resolving constraint inconsistencies is modified. It compares the baseline
    'minimum_absolute_value_first' strategy with alternatives like random removal
    or removing the strongest correlation first.

    Args:
        initial_correlation_matrix_df (pd.DataFrame): The original, inconsistent
            correlation matrix.
        master_input_specification (Dict[str, Any]): The main configuration dict.
        strategies_to_test (List[str]): A list of alternative strategies to run.
        num_random_runs (int): The number of times to run the 'random' strategy
                               to account for its stochasticity.

    Returns:
        pd.DataFrame: A DataFrame summarizing the final model metrics produced
                      by each alternative strategy.
    """
    results = []

    # Define a modified version of the removal function for the 'max_first' strategy.
    def find_and_remove_strongest_correlation(df):
        matrix_values = df.values.copy()
        np.fill_diagonal(matrix_values, 0)  # Ignore the diagonal.
        max_abs_value = np.max(np.abs(matrix_values))
        if np.isclose(max_abs_value, 0):
            return df, {}
        indices = np.argwhere(np.isclose(np.abs(matrix_values), max_abs_value))
        pairs = {tuple(sorted((df.columns[i], df.columns[j]))) for i, j in indices}
        pair_to_remove = sorted(list(pairs))[0]
        df_copy = df.copy()
        df_copy.loc[pair_to_remove[0], pair_to_remove[1]] = 0.0
        df_copy.loc[pair_to_remove[1], pair_to_remove[0]] = 0.0
        return df_copy, {"variables": pair_to_remove}

    # Define a modified version for the 'random' strategy.
    def find_and_remove_random_correlation(df):
        matrix_values = df.values.copy()
        np.fill_diagonal(matrix_values, 0)
        non_zero_indices = np.argwhere(matrix_values != 0)
        if len(non_zero_indices) == 0:
            return df, {}
        i, j = random.choice(non_zero_indices)
        pair_to_remove = tuple(sorted((df.columns[i], df.columns[j])))
        df_copy = df.copy()
        df_copy.loc[pair_to_remove[0], pair_to_remove[1]] = 0.0
        df_copy.loc[pair_to_remove[1], pair_to_remove[0]] = 0.0
        return df_copy, {"variables": pair_to_remove}

    # This is a simplified runner for just the CIM part of the pipeline.
    def run_cim_pipeline(corr_df, removal_func):
        limit = master_input_specification['computational_configuration']['inconsistency_removal']['algorithm_parameters']['iteration_limit']
        for _ in range(limit):
            constraints = map_correlation_to_constraints(corr_df)
            problem = formulate_initial_csp(corr_df.columns.tolist(), constraints)
            is_inconsistent, solutions = detect_initial_inconsistency(problem)
            if not is_inconsistent:
                return {"num_cim_scenarios": len(solutions), "error": None}
            corr_df, _ = removal_func(corr_df)
        return {"error": "Max iterations reached"}

    # Run the experiments
    for strategy in strategies_to_test:
        runs = num_random_runs if strategy == 'random' else 1
        for i in range(runs):
            run_name = f"{strategy}_{i+1}" if strategy == 'random' else strategy
            removal_func = {
                'random': find_and_remove_random_correlation,
                'max_first': find_and_remove_strongest_correlation
            }[strategy]

            metrics = run_cim_pipeline(initial_correlation_matrix_df.copy(), removal_func)
            metrics["strategy"] = run_name
            results.append(metrics)

    if not results:
        return pd.DataFrame()
    return pd.DataFrame(results).set_index("strategy")
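
The 'random' strategy above draws from the global `random` state, so repeated analyses will generally remove different correlations. A minimal sketch of a reproducible variant follows, assuming a plain NumPy array instead of a DataFrame and a caller-supplied `numpy.random.Generator`. `remove_random_correlation` is a hypothetical name, and the example matrix is made up.

```python
from typing import Optional, Tuple

import numpy as np


def remove_random_correlation(
    matrix: np.ndarray, rng: np.random.Generator
) -> Tuple[np.ndarray, Optional[Tuple[int, int]]]:
    """Zero out one randomly chosen non-zero off-diagonal pair, symmetrically."""
    # Consider each unordered pair once by looking only at the upper triangle.
    rows, cols = np.triu_indices_from(matrix, k=1)
    candidates = np.flatnonzero(matrix[rows, cols] != 0)
    if candidates.size == 0:
        return matrix.copy(), None

    pick = rng.choice(candidates)
    i, j = int(rows[pick]), int(cols[pick])

    reduced = matrix.copy()
    reduced[i, j] = 0.0
    reduced[j, i] = 0.0
    return reduced, (i, j)


example = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, -0.3],
    [0.0, -0.3, 1.0],
])

# The same seed always selects the same pair.
_, first_pair = remove_random_correlation(example, np.random.default_rng(42))
_, second_pair = remove_random_correlation(example, np.random.default_rng(42))
print(first_pair == second_pair)
```

Passing one seeded generator per run (for example, seed `k` for the k-th random run) would keep the runs distinct from each other while making the whole experiment repeatable.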

# =============================================================================
# Task 29, Step 2, Sub-Task 2: Alternative Expert Knowledge
# =============================================================================

def analyze_alternative_expert_knowledge(
    integrated_variables: List[str],
    base_integrated_constraints: List[Dict[str, Any]]
) -> pd.DataFrame:
    """
    Analyzes the model's robustness to alternative expert knowledge specifications.

    This function tests the impact of the three crucial, expert-defined
    integration constraints by selectively removing them and re-solving the
    Integrated Model. This quantifies their influence on the final number of
    valid scenarios.

    Args:
        integrated_variables (List[str]): The list of all 15 model variables.
        base_integrated_constraints (List[Dict[str, Any]]): The full set of 22
            constraints from the baseline model.

    Returns:
        pd.DataFrame: A DataFrame showing how the number of scenarios changes
                      with different expert knowledge specifications.
    """
    results = []

    # Experiment 1: Baseline (all 22 constraints)
    try:
        solutions = formulate_and_solve_integrated_csp(integrated_variables, base_integrated_constraints)
        results.append({"specification": "Baseline (All Constraints)", "num_im_scenarios": len(solutions)})
    except Exception as e:
        results.append({"specification": "Baseline (All Constraints)", "num_im_scenarios": f"Error: {e}"})

    # Experiment 2: No Integration Constraints
    no_integration_constraints = [c for c in base_integrated_constraints if c.get('source') != 'integration_expert']
    try:
        solutions = formulate_and_solve_integrated_csp(integrated_variables, no_integration_constraints)
        results.append({"specification": "No Integration Constraints", "num_im_scenarios": len(solutions)})
    except Exception as e:
        results.append({"specification": "No Integration Constraints", "num_im_scenarios": f"Error: {e}"})

    # Experiment 3: Omit one constraint at a time
    integration_constraints_to_test = [c for c in base_integrated_constraints if c.get('source') == 'integration_expert']
    for const_to_omit in integration_constraints_to_test:
        spec_name = f"Omit {const_to_omit['type']}{const_to_omit.get('shape', '')}{const_to_omit['variables']}"
        # Create a constraint set that is missing just this one constraint.
        partial_constraints = [c for c in base_integrated_constraints if c != const_to_omit]
        try:
            solutions = formulate_and_solve_integrated_csp(integrated_variables, partial_constraints)
            results.append({"specification": spec_name, "num_im_scenarios": len(solutions)})
        except Exception as e:
            results.append({"specification": spec_name, "num_im_scenarios": f"Error: {e}"})

    if not results:
        return pd.DataFrame()
    return pd.DataFrame(results).set_index("specification")

# =============================================================================
# Task 29, Step 2: Orchestrator
# =============================================================================

def run_alternative_specification_analysis_framework(
    initial_correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any],
    final_pipeline_outputs: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates a suite of robustness analyses using alternative model specifications.

    This function serves as the entry point for Step 2 of Task 29. It executes
    and aggregates results from analyses that modify the core methodology of
    the model, such as the inconsistency removal strategy and the inclusion of
    expert knowledge.

    Args:
        initial_correlation_matrix_df (pd.DataFrame): The original, inconsistent
            correlation matrix.
        master_input_specification (Dict[str, Any]): The base configuration dict.
        final_pipeline_outputs (Dict[str, Any]): A dictionary containing the final
            artifacts from a baseline run of the main pipeline.

    Returns:
        Dict[str, Any]: A comprehensive report containing the results of each
                        alternative specification analysis as a pandas DataFrame.
    """
    final_report = {
        "task_name": "Task 29, Step 2: Alternative Specification Robustness Assessment",
        "overall_status": "SUCCESS",
        "analysis_results": {}
    }
    print("--- Starting Alternative Specification Analysis ---")

    try:
        # --- 1. Alternative Inconsistency Removal Strategies ---
        print("Running Alternative Inconsistency Removal Analysis...")
        removal_results = analyze_alternative_inconsistency_removal_strategies(
            initial_correlation_matrix_df, master_input_specification
        )
        final_report["analysis_results"]["inconsistency_removal_robustness"] = removal_results
        print("...Inconsistency Removal analysis complete.")

        # --- 2. Alternative Expert Knowledge Specifications ---
        print("Running Alternative Expert Knowledge Analysis...")
        integrated_vars = final_pipeline_outputs["task_13"]["outputs"]["integrated_variables"]
        base_integrated_constraints = final_pipeline_outputs["task_13"]["outputs"]["integrated_constraints"]

        expert_knowledge_results = analyze_alternative_expert_knowledge(
            integrated_variables=integrated_vars,
            base_integrated_constraints=base_integrated_constraints
        )
        final_report["analysis_results"]["expert_knowledge_robustness"] = expert_knowledge_results
        print("...Expert Knowledge analysis complete.")

        final_report["summary_message"] = "All alternative specification analyses completed successfully."

    except Exception as e:
        final_report["overall_status"] = "FAILURE"
        final_report["error_message"] = f"A critical error occurred during alternative specification analysis: {e}"

    print("--- Alternative Specification Analysis Finished ---")
    return final_report


# =============================================================================
# Task 29: Master Orchestrator
# =============================================================================

def run_comprehensive_robustness_analysis_framework(
    base_correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any],
    final_pipeline_outputs: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Orchestrates the complete, end-to-end robustness and sensitivity analysis.

    This master function serves as the single entry point for the entire Task 29.
    It executes the two major suites of robustness checks sequentially:
    1.  **Parameter Sensitivity Analysis (Step 1)**: Tests the model's stability
        against small changes in its numerical inputs and configuration parameters.
    2.  **Alternative Specification Analysis (Step 2)**: Tests the model's
        stability against changes in its core methodological assumptions.

    WARNING: This is a computationally intensive process that may take a very
    long time to complete, as it involves re-running the core of the modeling
    pipeline dozens or hundreds of times.

    Args:
        base_correlation_matrix_df (pd.DataFrame): The original, validated
            correlation matrix that serves as the baseline for perturbations.
        master_input_specification (Dict[str, Any]): The base configuration dict.
        final_pipeline_outputs (Dict[str, Any]): A dictionary containing the final
            artifacts from a baseline run of the main pipeline, required for
            some of the analysis functions.

    Returns:
        Dict[str, Any]: A comprehensive, top-level report containing the
                        detailed results from both major analysis suites.
    """
    # Initialize the final, top-level report for the entire task.
    final_report = {
        "task_name": "Task 29: Comprehensive Robustness Analysis Framework",
        "overall_status": "SUCCESS",
        "outputs": {}
    }

    # Wrap the entire process in a try-except block to handle any critical failures.
    try:
        # --- Execute Step 1: Parameter Sensitivity Analysis ---
        # This function runs a suite of tests on correlation values, thresholds,
        # solver parameters, and expert knowledge confidence.
        parameter_sensitivity_report = run_parameter_sensitivity_analysis_framework(
            base_correlation_matrix_df=base_correlation_matrix_df,
            master_input_specification=master_input_specification,
            final_pipeline_outputs=final_pipeline_outputs
        )

        # Store the complete report from Step 1 in the final output.
        final_report["outputs"]["parameter_sensitivity_report"] = parameter_sensitivity_report

        # If the first suite of analyses failed, update the status but continue
        # to the next step if possible, to provide as much information as we can.
        if parameter_sensitivity_report["overall_status"] == "FAILURE":
            final_report["overall_status"] = "PARTIAL_FAILURE"


        # --- Execute Step 2: Alternative Specification Analysis ---
        # This function runs a suite of tests on the core methodology, such as
        # the inconsistency removal strategy and the inclusion of expert constraints.
        alternative_spec_report = run_alternative_specification_analysis_framework(
            initial_correlation_matrix_df=base_correlation_matrix_df,
            master_input_specification=master_input_specification,
            final_pipeline_outputs=final_pipeline_outputs
        )

        # Store the complete report from Step 2 in the final output.
        final_report["outputs"]["alternative_specification_report"] = alternative_spec_report

        # If the second suite failed, downgrade the overall status: a failure
        # in only one suite is partial, a failure in both suites is complete.
        if alternative_spec_report["overall_status"] == "FAILURE":
            final_report["overall_status"] = (
                "PARTIAL_FAILURE"
                if final_report["overall_status"] == "SUCCESS"
                else "COMPLETE_FAILURE"
            )

        # --- Finalize Summary Message ---
        if final_report["overall_status"] == "SUCCESS":
            final_report["summary_message"] = "All robustness and sensitivity analyses completed successfully."
        else:
            final_report["summary_message"] = "One or more robustness analysis suites encountered errors. Please review the detailed reports."

    except Exception as e:
        # Catch any unexpected, critical error during orchestration.
        final_report["overall_status"] = "CRITICAL_FAILURE"
        final_report["error_message"] = f"The robustness analysis framework failed critically: {e}"

    # Return the final, aggregated report.
    return final_report
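
The status escalation performed by the robustness orchestrator above can be isolated into a small helper. The following is a minimal sketch, not part of the pipeline: `aggregate_suite_statuses` is a hypothetical name, and it assumes each suite report exposes an `overall_status` key whose failing value is `"FAILURE"`, as the code above does. Treating a missing key as a failure is an additional assumption made here.

```python
from typing import Any, Dict, List


def aggregate_suite_statuses(suite_reports: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: collapse per-suite statuses into one overall status."""
    # Count the suites that reported a failure. A report without an
    # "overall_status" key is counted as failed (an assumption of this sketch,
    # not behaviour taken from the pipeline).
    failures = sum(
        1
        for report in suite_reports
        if report.get("overall_status", "FAILURE") == "FAILURE"
    )
    if failures == 0:
        return "SUCCESS"
    if failures < len(suite_reports):
        return "PARTIAL_FAILURE"
    return "COMPLETE_FAILURE"


# Stand-in reports, not real pipeline output.
stand_in_reports = [{"overall_status": "SUCCESS"}, {"overall_status": "FAILURE"}]
print(aggregate_suite_statuses(stand_in_reports))  # prints "PARTIAL_FAILURE"
```

With two suites this reproduces the mapping used above: no failures give `SUCCESS`, one failure gives `PARTIAL_FAILURE`, and two failures give `COMPLETE_FAILURE`.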


# Master Orchestrator

# =============================================================================
# Final Top-Level Orchestrator
# =============================================================================

def execute_full_project_pipeline(
    raw_df: pd.DataFrame,
    correlation_matrix_df: pd.DataFrame,
    master_input_specification: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Executes the entire end-to-end research pipeline and robustness analysis.

    This top-level function serves as the single entry point for the complete
    replication and validation of the research from "Information-Nonintensive
    Models of Rumour Impacts on Complex Investment Decisions". It orchestrates
    a two-phase process:

    1.  **Baseline Pipeline Execution**: It runs the complete, end-to-end model
        generation and analysis pipeline to replicate the paper's core findings.
    2.  **Comprehensive Robustness Analysis**: If the baseline run is successful,
        it proceeds to execute an extensive suite of sensitivity and robustness
        analyses to test the stability of the model's conclusions.

    Args:
        raw_df (pd.DataFrame): The raw input DataFrame of financial data.
        correlation_matrix_df (pd.DataFrame): The provided correlation matrix
            for the CIM variables.
        master_input_specification (Dict[str, Any]): The main configuration
            dictionary that governs the entire pipeline's behavior.

    Returns:
        Dict[str, Any]: A master report containing the detailed, auditable
                        reports from both the baseline pipeline run and the
                        comprehensive robustness analysis.
    """
    # Initialize the master report for the entire project.
    master_report = {
        "project_name": "Replication of Information-Nonintensive Models of Rumour Impacts",
        "overall_status": "PENDING",
        "baseline_pipeline_report": None,
        "robustness_analysis_report": None
    }

    # Wrap the entire execution in a try-except block so that any unexpected
    # error is recorded in the master report instead of propagating to the caller.
    try:
        # --- i) Execute the main end-to-end research pipeline ---
        print("--- [PHASE 1] STARTING: End-to-End Baseline Pipeline Execution ---")
        baseline_report = run_qualitative_grapevine_model_pipeline(
            raw_df=raw_df,
            correlation_matrix_df=correlation_matrix_df,
            master_input_specification=master_input_specification
        )
        # Store the complete report from the baseline run.
        master_report["baseline_pipeline_report"] = baseline_report
        print("--- [PHASE 1] FINISHED: End-to-End Baseline Pipeline Execution ---")

        # --- Check the status of the baseline run before proceeding ---
        # The robustness analysis is only meaningful if the baseline model was
        # successfully generated and validated.
        if baseline_report.get("pipeline_status") != "SUCCESS":
            # If the baseline failed, we stop here.
            master_report["overall_status"] = "FAILURE_IN_BASELINE"
            master_report["summary_message"] = "Baseline pipeline failed to complete successfully. Robustness analysis was not performed."
            return master_report

        # --- ii) Unpack the baseline results for the robustness analysis ---
        # The baseline succeeded, so extract the artifacts the robustness
        # analysis needs, using the safe nested getter.
        print("\n--- [PHASE 2] STARTING: Comprehensive Robustness Analysis ---")
        base_corr_matrix = _get_nested_param(
            baseline_report,
            'task_reports.task_05_corr_matrix_preprocessing.outputs.final_correlation_matrix'
        )
        # The robustness analysis needs the final outputs of the baseline run.
        # We can pass the entire baseline report's task_reports section.
        final_pipeline_outputs = baseline_report["task_reports"]

        # --- iii) Run the comprehensive robustness analysis framework ---
        robustness_report = run_comprehensive_robustness_analysis_framework(
            base_correlation_matrix_df=base_corr_matrix,
            master_input_specification=master_input_specification,
            final_pipeline_outputs=final_pipeline_outputs
        )
        # Store the complete report from the robustness analysis.
        master_report["robustness_analysis_report"] = robustness_report
        print("--- [PHASE 2] FINISHED: Comprehensive Robustness Analysis ---")

        # --- iv) Determine the final overall status ---
        # The baseline succeeded at this point, so the final status depends
        # only on the outcome of the robustness analysis.
        if robustness_report.get("overall_status") == "SUCCESS":
            master_report["overall_status"] = "SUCCESS"
            master_report["summary_message"] = "Baseline pipeline and all robustness analyses completed successfully."
        else:
            master_report["overall_status"] = "SUCCESS_WITH_ROBUSTNESS_WARNINGS"
            master_report["summary_message"] = "Baseline pipeline succeeded, but one or more robustness analyses encountered issues."

    except Exception as e:
        # Catch any unexpected critical failure in the top-level orchestration.
        master_report["overall_status"] = "CRITICAL_FAILURE"
        master_report["summary_message"] = f"The top-level orchestrator failed critically: {e}"

    # Return the final, comprehensive project report.
    return master_report
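
A caller will usually branch on the `overall_status` value of the returned master report. The sketch below illustrates one way to do that, under stated assumptions: `summarize_master_report` and the `EXIT_CODES` mapping are hypothetical and not part of the project, and the example report is a hand-written stand-in shaped like the orchestrator's early-exit branch rather than real output.

```python
from typing import Any, Dict

# Status strings are those set by execute_full_project_pipeline above; the
# mapping to process exit codes is an assumption made for this sketch.
EXIT_CODES = {
    "SUCCESS": 0,
    "SUCCESS_WITH_ROBUSTNESS_WARNINGS": 0,
    "FAILURE_IN_BASELINE": 1,
    "CRITICAL_FAILURE": 2,
}


def summarize_master_report(master_report: Dict[str, Any]) -> int:
    """Hypothetical helper: print a one-line summary and return an exit code."""
    status = master_report.get("overall_status", "PENDING")
    message = master_report.get("summary_message", "No summary message recorded.")
    print(f"[{status}] {message}")
    # Unknown statuses, including a leftover "PENDING", are treated as errors.
    return EXIT_CODES.get(status, 2)


# Stand-in report (placeholder values, not real pipeline output).
example_report = {
    "overall_status": "FAILURE_IN_BASELINE",
    "summary_message": "Baseline pipeline failed to complete successfully. "
                       "Robustness analysis was not performed.",
    "baseline_pipeline_report": None,
    "robustness_analysis_report": None,
}
exit_code = summarize_master_report(example_report)
```

Mapping `SUCCESS_WITH_ROBUSTNESS_WARNINGS` to a zero exit code is a design choice of this sketch; a stricter caller might treat it as a non-zero result and inspect `robustness_analysis_report` before proceeding.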
