# Synthetic Demo: yp-diagnostic

**This is a synthetic illustration only.**

This notebook demonstrates the `yp-diagnostic` package using a synthetic system with a known capacity threshold. It does **not** validate real-world applicability.

---

## Important Disclaimers

> **This is a synthetic illustration.** The system constructed here is artificial and designed solely to demonstrate how the diagnostic coordinate behaves under controlled conditions.

> **This does not validate real-world applicability.** Applying y(p) to any real system requires independent, domain-specific validation that is beyond the scope of this package.

> **This demonstrates regime organization, not prediction.** The diagnostic coordinate y(p) organizes observations by their proximity to x = 1. It does not predict outcomes, model mechanisms, or establish causation.

---

## Scope Reminder

**What this notebook shows:**
- How to compute the diagnostic coordinate y(p) from explicit p1 and p2 values
- How to generate collapse plots and negative controls
- How to propagate uncertainty

**What this notebook does NOT show:**
- Prediction of system failure
- Detection of phase transitions
- Validation of any theoretical claims
- Evidence for universality or mechanism

The diagnostic coordinate y(p) is a **descriptor**, not a predictor.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Import from yp_diagnostic
import sys
sys.path.insert(0, '../src')

from yp_diagnostic import (
    compute_x_y,
    bootstrap_ci,
    delta_method_ci,
    collapse_test,
    negative_control_plot,
    sensitivity_check,
)

# For reproducibility
np.random.seed(42)

---

## 1. Synthetic System: Finite Server with Absorbing Failure

We construct a **finite synthetic system** where:
- **p₁** = server capacity (maximum requests per second the server can handle)
- **p₂** = current load (requests per second being sent to server)
- **Failure** = response time exceeds threshold (absorbing state: once failed, system requires intervention)

### Key Properties of This Synthetic System

1. **Finite**: The system has a defined capacity limit (p₁)
2. **Explicit p₁ and p₂**: Both values are constructed directly, not inferred
3. **Absorbing failure**: Failure becomes increasingly likely as p₂ approaches p₁, and once the system has failed it stays failed until someone intervenes (synthetic definition)

This is a **constructed example** where we explicitly define p₁ and p₂. No inference is performed. The "failure" is a synthetic construct for illustration purposes only.

In [None]:
# Define the synthetic system
# p1: Fixed capacity
# p2: Load values ranging from low to near-capacity

p1_capacity = 1000.0  # requests per second

# Create load values from 10% to 99% of capacity
load_fractions = np.linspace(0.1, 0.99, 50)
p2_loads = p1_capacity * load_fractions

# p1 is constant across all observations
p1_values = np.full_like(p2_loads, p1_capacity)

print(f"Capacity (p1): {p1_capacity} req/sec")
print(f"Load range (p2): {p2_loads.min():.1f} to {p2_loads.max():.1f} req/sec")
print(f"Load fraction range: {load_fractions.min():.2f} to {load_fractions.max():.2f}")

---

## 2. Computing the Diagnostic Coordinate

We compute x = p2/p1 and y = (1-x)^(-1/2) using the `compute_x_y` function.
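
To make the formula concrete, here is a minimal plain-NumPy sketch of the same arithmetic. The helper `diagnostic_xy` is hypothetical and written only for this illustration; it is not the package's `compute_x_y` and performs none of its metadata checks or warnings.

```python
import numpy as np

def diagnostic_xy(p1, p2):
    """Sketch of x = p2/p1 and y = (1 - x)**(-1/2).

    Hypothetical helper for illustration only; not the package API.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    x = p2 / p1
    # y diverges at x = 1 and is not real-valued for x > 1
    if np.any(x >= 1):
        raise ValueError("y is only defined for x < 1")
    y = (1.0 - x) ** -0.5
    return x, y

# Three loads against a capacity of 1000 (same scale as the synthetic system)
x_demo, y_demo = diagnostic_xy([1000.0, 1000.0, 1000.0], [100.0, 750.0, 990.0])
for xi, yi in zip(x_demo, y_demo):
    print(f"x = {xi:.2f} -> y = {yi:.3f}")
```

At x = 0.75 the coordinate is exactly y = 2, and at x = 0.99 it is approximately 10, which gives a feel for how quickly y grows near x = 1.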

**Note:** Required metadata is provided to enforce semantic guardrails.

In [None]:
import warnings

# Compute x and y with required metadata
# Note: Warnings are expected for high x values
with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    x_values, y_values = compute_x_y(
        p1=p1_values,
        p2=p2_loads,
        p1_name="server_capacity_req_per_sec",
        p2_name="current_load_req_per_sec",
        failure_definition="response_time_exceeds_100ms"
    )
    
    # Display any warnings
    for warning in w:
        print(f"Warning: {warning.message}")

print(f"\nx range: {x_values.min():.3f} to {x_values.max():.3f}")
print(f"y range: {y_values.min():.3f} to {y_values.max():.3f}")

In [None]:
# Plot x vs y
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: x vs load fraction (an identity line by construction, since x is the load fraction)
axes[0].plot(load_fractions, x_values, 'b-', linewidth=2)
axes[0].axhline(y=0.9, color='orange', linestyle='--', label='x = 0.9 (warning threshold)')
axes[0].set_xlabel('Load Fraction (p2/p1)')
axes[0].set_ylabel('x = p2/p1')
axes[0].set_title('x as a function of load')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Right: y vs x
axes[1].plot(x_values, y_values, 'r-', linewidth=2)
axes[1].axvline(x=0.9, color='orange', linestyle='--', label='x = 0.9')
axes[1].set_xlabel('x = p2/p1')
axes[1].set_ylabel('y = (1-x)^(-1/2)')
axes[1].set_title('Diagnostic Coordinate y(x)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## Interpretation: What This Plot Shows

The plot above shows:
- **y increases** as x approaches 1
- The increase is **nonlinear**: small changes in x near 1 produce large changes in y
- This **flags** a regime where the ratio p2/p1 is near unity

### What This Does NOT Show

- **No prediction**: y does not predict when or if the system will fail
- **No mechanism**: y does not explain why the system behaves this way
- **No causation**: high y does not cause system degradation
- **No universality**: this behavior is specific to the (1-x)^(-1/2) function, not a general law
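
The nonlinearity can be quantified directly from the formula: differentiating y = (1-x)^(-1/2) gives dy/dx = (1/2)(1-x)^(-3/2). The sketch below, which uses hypothetical helper names rather than anything from the package, compares the change in y produced by the same small step in x at three locations.

```python
def y_of_x(x):
    """y = (1 - x)**(-1/2), valid for x < 1."""
    return (1.0 - x) ** -0.5

def dy_dx(x):
    """Analytic derivative: dy/dx = 0.5 * (1 - x)**(-3/2)."""
    return 0.5 * (1.0 - x) ** -1.5

step = 0.01  # the same absolute change in x at each location
for x0 in (0.50, 0.90, 0.98):
    delta_y = y_of_x(x0 + step) - y_of_x(x0)
    print(f"x = {x0:.2f}: slope = {dy_dx(x0):.2f}, "
          f"change in y for +{step} in x = {delta_y:.4f}")
```

The slope grows without bound as x approaches 1. This is a property of the chosen function and says nothing about the behavior of any system.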

---

## 3. Collapse Plot: Multiple Systems

We compare two systems with different capacities to see if their y(x) curves overlap when plotted against x.

In [None]:
# System A: capacity = 1000
p1_A = np.full(20, 1000.0)
p2_A = 1000.0 * np.linspace(0.1, 0.95, 20)

# System B: capacity = 500 (different absolute scale)
p1_B = np.full(20, 500.0)
p2_B = 500.0 * np.linspace(0.1, 0.95, 20)

# Use collapse_test to compute coordinates
collapse_result = collapse_test(
    p1_arrays=[p1_A, p1_B],
    p2_arrays=[p2_A, p2_B],
    labels=['System A (capacity=1000)', 'System B (capacity=500)']
)

print(f"Number of datasets: {collapse_result['n_datasets']}")
for ds in collapse_result['datasets']:
    print(f"  {ds['label']}: {ds['stats']['n_points']} points, "
          f"x_range=[{ds['stats']['x_min']:.2f}, {ds['stats']['x_max']:.2f}]")

In [None]:
# Plot collapse
fig, ax = plt.subplots(figsize=(8, 6))

colors = ['blue', 'red']
for i, ds in enumerate(collapse_result['datasets']):
    ax.scatter(ds['x'], ds['y'], c=colors[i], label=ds['label'], alpha=0.7, s=50)

ax.set_xlabel('x = p2/p1')
ax.set_ylabel('y = (1-x)^(-1/2)')
ax.set_title('Collapse Plot: Two Systems with Different Capacities')
ax.legend()
ax.grid(True, alpha=0.3)

plt.show()

### Interpretation of Collapse Plot

Both systems overlap on the y(x) curve because y is computed from the **ratio** x = p2/p1, not from absolute values.

**What this shows:** Systems with different absolute capacities produce the same y for the same x.

**What this does NOT show:**
- This is a property of the transformation, not a discovery about the systems
- No claim about "collapse" implying universality or shared mechanism
- The overlap is definitional, not empirical
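
The definitional nature of the overlap can be checked in a few lines. This is a sketch with a hypothetical helper, independent of `collapse_test`: multiplying p1 and p2 by the same factor cancels in the ratio, so y cannot change.

```python
import numpy as np

def y_from_pair(p1, p2):
    """y = (1 - p2/p1)**(-1/2); hypothetical helper, not the package API."""
    return (1.0 - p2 / p1) ** -0.5

fractions = np.linspace(0.1, 0.95, 20)

y_a = y_from_pair(1000.0, 1000.0 * fractions)  # capacity 1000
y_b = y_from_pair(500.0, 500.0 * fractions)    # capacity 500

# Any common rescaling of p1 and p2 leaves x, and therefore y, unchanged
print("Curves agree to rounding error:", np.allclose(y_a, y_b))
```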

---

## 4. Negative Controls

We generate shuffled versions of p2 to compare against the real data. This helps identify whether the observed structure differs from a random baseline.

In [None]:
# Generate negative controls
nc_result = negative_control_plot(
    p1=p1_values,
    p2_real=p2_loads,
    n_shuffles=50,
    random_state=42
)

print(f"Real data: {len(nc_result['real']['x'])} points")
print(f"Shuffled controls: {nc_result['n_shuffles']} replicates")

In [None]:
# Plot negative controls
fig, ax = plt.subplots(figsize=(10, 6))

# Plot shuffled controls (gray, low alpha)
for shuf in nc_result['shuffled']:
    ax.scatter(shuf['x'], shuf['y'], c='gray', alpha=0.1, s=10)

# Plot real data (red, prominent)
ax.scatter(nc_result['real']['x'], nc_result['real']['y'], 
           c='red', alpha=0.8, s=30, label='Real data')

ax.set_xlabel('x = p2/p1')
ax.set_ylabel('y = (1-x)^(-1/2)')
ax.set_title('Negative Controls: Real Data vs. Shuffled')
ax.legend()
ax.grid(True, alpha=0.3)

plt.show()

### Interpretation of Negative Controls

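In this synthetic example, shuffling breaks the pairing between p1 and p2 at each observation.

**What this shows:** In observation order, the real data increase monotonically while the shuffled replicates do not. Note that p1 is constant here: assuming the shuffle is a plain permutation of p2, each replicate contains the same set of x values, so in the (x, y) scatter the shuffled points fall on the same curve as the real data. The control becomes informative only when p1 varies across observations.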

**What this does NOT show:**
- No statistical test is performed (no p-values)
- No claim about significance or meaningfulness
- Shuffling is a diagnostic, not a hypothesis test
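
The effect of shuffling depends on whether p1 varies. The sketch below assumes a shuffle is a random permutation of p2 (the internals of `negative_control_plot` are not shown in this notebook) and uses hypothetical varying-capacity values for the second case.

```python
import numpy as np

rng = np.random.default_rng(0)
p2 = 1000.0 * np.linspace(0.1, 0.99, 50)
p2_shuffled = rng.permutation(p2)

# Case 1: constant p1, as in this notebook
p1_const = np.full_like(p2, 1000.0)
same_x_const = np.allclose(np.sort(p2 / p1_const),
                           np.sort(p2_shuffled / p1_const))

# Case 2: p1 varies per observation (hypothetical values, chosen so x < 1)
p1_var = np.linspace(1200.0, 2000.0, 50)
same_x_var = np.allclose(np.sort(p2 / p1_var),
                         np.sort(p2_shuffled / p1_var))

print("Constant p1, same set of x after shuffle:", same_x_const)
print("Varying p1, same set of x after shuffle:", same_x_var)
```

With constant p1 the shuffle only reorders the observations, while with varying p1 it produces different x values. In both cases every point still lies on the y(x) curve, because y is a deterministic function of x.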

---

## 5. Uncertainty Propagation

We propagate measurement uncertainty from p1 and p2 into y using both the bootstrap and the delta method.

In [None]:
# Single point estimate with uncertainty
p1_point = 1000.0
p2_point = 850.0
p1_std = 50.0  # 5% uncertainty
p2_std = 30.0  # ~3.5% uncertainty

print(f"Point estimates: p1 = {p1_point} ± {p1_std}, p2 = {p2_point} ± {p2_std}")

# Bootstrap CI
boot_result = bootstrap_ci(
    p1=p1_point, p2=p2_point,
    p1_std=p1_std, p2_std=p2_std,
    n_boot=2000,
    random_state=42
)

print(f"\nBootstrap (n={boot_result['n_boot']}):")
print(f"  x = {boot_result['x_mean']:.4f}, 95% CI: ({boot_result['x_ci'][0]:.4f}, {boot_result['x_ci'][1]:.4f})")
print(f"  y = {boot_result['y_mean']:.4f}, 95% CI: ({boot_result['y_ci'][0]:.4f}, {boot_result['y_ci'][1]:.4f})")

In [None]:
# Delta method CI
delta_result = delta_method_ci(
    p1=p1_point, p2=p2_point,
    p1_std=p1_std, p2_std=p2_std
)

print("Delta Method:")
print(f"  x = {delta_result['x']:.4f} ± {delta_result['x_std']:.4f}")
print(f"  y = {delta_result['y']:.4f} ± {delta_result['y_std']:.4f}")
print(f"  x 95% CI: ({delta_result['x_ci'][0]:.4f}, {delta_result['x_ci'][1]:.4f})")
print(f"  y 95% CI: ({delta_result['y_ci'][0]:.4f}, {delta_result['y_ci'][1]:.4f})")

### Interpretation of Uncertainty

**What this shows:** Measurement uncertainty in p1 and p2 propagates to uncertainty in y.

**What this does NOT show:**
- These are confidence intervals for the **diagnostic coordinate**, not for system behavior
- No claim about prediction intervals or future outcomes
- Assumes specific distributional properties (see function docstrings)
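
For orientation, a first-order (delta method) propagation can be written out by hand. The sketch below assumes independent errors in p1 and p2; that is an assumption of this illustration, not a statement about how `delta_method_ci` is implemented, and the function name is hypothetical.

```python
import math

def delta_method_sketch(p1, p2, p1_std, p2_std):
    """First-order error propagation for x = p2/p1 and y = (1 - x)**(-1/2).

    Assumes independent errors in p1 and p2 (assumption of this sketch).
    """
    x = p2 / p1
    # For a ratio of independent quantities, relative variances add
    x_std = x * math.sqrt((p1_std / p1) ** 2 + (p2_std / p2) ** 2)
    y = (1.0 - x) ** -0.5
    # Chain rule with dy/dx = 0.5 * (1 - x)**(-3/2)
    y_std = 0.5 * (1.0 - x) ** -1.5 * x_std
    return x, x_std, y, y_std

x_d, x_d_std, y_d, y_d_std = delta_method_sketch(1000.0, 850.0, 50.0, 30.0)
print(f"x = {x_d:.4f} +/- {x_d_std:.4f}")
print(f"y = {y_d:.4f} +/- {y_d_std:.4f}")
```

Because the slope of y grows rapidly near x = 1, a linear approximation of this kind is expected to become less reliable as x approaches 1, which is one reason to compare it against a resampling estimate.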

---

## 6. Sensitivity Analysis

We examine how sensitive y is to perturbations in p1 and p2.

In [None]:
# Sensitivity check
sens_result = sensitivity_check(
    p1=1000.0,
    p2=850.0,
    perturbation_fractions=[0.01, 0.05, 0.10]
)

print(f"Baseline: x = {sens_result['baseline']['x']:.4f}, y = {sens_result['baseline']['y']:.4f}")

print("\nSensitivity to p1 increase (capacity increase):")
for entry in sens_result['p1_sensitivity']:
    print(f"  +{entry['perturbation']*100:.0f}%: x = {entry['x_new']:.4f}, "
          f"y = {entry['y_new']:.4f}, y change = {entry['y_change_pct']:.2f}%")

print("\nSensitivity to p2 increase (load increase):")
for entry in sens_result['p2_sensitivity']:
    print(f"  +{entry['perturbation']*100:.0f}%: x = {entry['x_new']:.4f}, "
          f"y = {entry['y_new']:.4f}, y change = {entry['y_change_pct']:.2f}%")

### Interpretation of Sensitivity

**What this shows:** 
- Increasing p1 (capacity) decreases y
- Increasing p2 (load) increases y
- Sensitivity grows as x approaches 1 (this follows from the form of y; the cell above evaluates only the baseline x = 0.85)

**What this does NOT show:**
- No guidance on acceptable sensitivity thresholds
- No validation of measurement requirements
- Sensitivity is reported, not judged
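
The dependence of sensitivity on x can be made explicit. To first order, a 1% increase in p2 changes y by about x / (2(1 - x)) percent, and a 1% increase in p1 changes it by the same amount with the opposite sign. The sketch below uses hypothetical helpers (not `sensitivity_check`) to evaluate the exact percentage change at three baselines.

```python
def y_from_pair(p1, p2):
    """y = (1 - p2/p1)**(-1/2); hypothetical helper, not the package API."""
    return (1.0 - p2 / p1) ** -0.5

def pct_change_in_y(p1, p2, p1_factor=1.0, p2_factor=1.0):
    """Percentage change in y after scaling p1 and/or p2."""
    base = y_from_pair(p1, p2)
    new = y_from_pair(p1 * p1_factor, p2 * p2_factor)
    return 100.0 * (new - base) / base

for x0 in (0.50, 0.85, 0.95):
    load_up = pct_change_in_y(1000.0, 1000.0 * x0, p2_factor=1.01)
    capacity_up = pct_change_in_y(1000.0, 1000.0 * x0, p1_factor=1.01)
    print(f"x = {x0:.2f}: +1% load -> {load_up:+.2f}% in y, "
          f"+1% capacity -> {capacity_up:+.2f}% in y")
```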

---

## 7. Validation Protocol: Outcome vs Different Variables

To understand what y(p) captures, we examine a synthetic "outcome" (failure probability) against different variables. This is a **synthetic illustration** of the validation protocol—it does not validate real-world applicability.

### Constructing a Synthetic Outcome

We define a synthetic failure probability that increases as load approaches capacity:

In [None]:
# Construct a synthetic "outcome" (failure probability)
# This is SYNTHETIC - defined by us, not observed from a real system

# Synthetic failure probability: increases sharply as x approaches 1
# This is a CONSTRUCTED relationship for illustration only
def synthetic_failure_prob(x):
    """Synthetic failure probability. NOT a model of any real system."""
    # Logistic-like increase near x=1
    return 1 / (1 + np.exp(-20 * (x - 0.85)))

# Compute synthetic outcomes
synthetic_outcome = synthetic_failure_prob(x_values)

print("Synthetic outcome constructed (for illustration only)")
print(f"Outcome range: {synthetic_outcome.min():.4f} to {synthetic_outcome.max():.4f}")

In [None]:
# Plot outcome vs different variables
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

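# Panel A: Outcome vs p1 (NEGATIVE CONTROL - outcome depends on p1 only through x)
# Since p1 is constant in our synthetic data, we create varied p1 for illustration.
# For p1 < 850 the ratio x exceeds 1, where y is undefined; the logistic outcome is still defined there.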
p1_varied = np.linspace(800, 1200, 50)
p2_fixed = 850.0  # Fixed load
x_for_p1_varied = p2_fixed / p1_varied
outcome_vs_p1 = synthetic_failure_prob(x_for_p1_varied)

axes[0, 0].scatter(p1_varied, outcome_vs_p1, c='gray', alpha=0.7)
axes[0, 0].set_xlabel('p₁ (capacity)')
axes[0, 0].set_ylabel('Synthetic Outcome')
axes[0, 0].set_title('A: Outcome vs p₁\n(Negative Control: varies with p₁ only through x)')
axes[0, 0].grid(True, alpha=0.3)

# Panel B: Outcome vs p2 (NEGATIVE CONTROL - relationship mediated by x)
axes[0, 1].scatter(p2_loads, synthetic_outcome, c='gray', alpha=0.7)
axes[0, 1].set_xlabel('p₂ (load)')
axes[0, 1].set_ylabel('Synthetic Outcome')
axes[0, 1].set_title('B: Outcome vs p₂\n(Negative Control: relationship mediated by x)')
axes[0, 1].grid(True, alpha=0.3)

# Panel C: Outcome vs x
axes[1, 0].scatter(x_values, synthetic_outcome, c='blue', alpha=0.7)
axes[1, 0].set_xlabel('x = p₂/p₁')
axes[1, 0].set_ylabel('Synthetic Outcome')
axes[1, 0].set_title('C: Outcome vs x\n(Direct relationship with ratio)')
axes[1, 0].axvline(x=0.85, color='orange', linestyle='--', alpha=0.5)
axes[1, 0].grid(True, alpha=0.3)

# Panel D: Outcome vs y
axes[1, 1].scatter(y_values, synthetic_outcome, c='red', alpha=0.7)
axes[1, 1].set_xlabel('y = (1-x)^(-1/2)')
axes[1, 1].set_ylabel('Synthetic Outcome')
axes[1, 1].set_title('D: Outcome vs y\n(Diagnostic coordinate)')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.suptitle('Validation Protocol: Synthetic Illustration Only', y=1.02, fontsize=12, fontweight='bold')
plt.show()

### Interpretation of Validation Protocol Plots

**Panel A (Outcome vs p₁):** This is a negative control. The outcome varies with p₁ only because p₁ affects x. There is no direct relationship between p₁ alone and the outcome; the relationship is mediated entirely through x.

**Panel B (Outcome vs p₂):** This is also a negative control. The outcome varies with p₂ only because p₂ affects x. Because p₁ is constant in this dataset, Panel B is a rescaled copy of Panel C; with varying p₁, neither p₁ nor p₂ alone would capture the information that x provides.

**Panel C (Outcome vs x):** The ratio x = p₂/p₁ shows a direct relationship with the synthetic outcome. This illustrates that x is the relevant quantity for this constructed system.

**Panel D (Outcome vs y):** The diagnostic coordinate y transforms x such that the approach to x = 1 is emphasized. The relationship is nonlinear but monotonic.
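
Since y is a strictly increasing function of x on x < 1, Panels C and D order the observations identically; only the spacing along the horizontal axis differs. A minimal check, using the same logistic form as `synthetic_failure_prob` and otherwise hypothetical variable names:

```python
import numpy as np

x_g = np.linspace(0.1, 0.99, 50)
y_g = (1.0 - x_g) ** -0.5
outcome_g = 1.0 / (1.0 + np.exp(-20.0 * (x_g - 0.85)))  # same logistic form

# y is strictly increasing in x, so ranking by y equals ranking by x
print("y strictly increasing in x:", bool(np.all(np.diff(y_g) > 0)))
print("Same ordering by x and by y:",
      np.array_equal(np.argsort(x_g), np.argsort(y_g)))

# The logistic midpoint x = 0.85 maps to a single y value
y_mid = 0.15 ** -0.5
print(f"Outcome midpoint: x = 0.85, y = {y_mid:.3f}")
```

Any outcome that depends on x alone can therefore be plotted against either coordinate without changing the ordering of the points.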

---

### Critical Reminder

> **This is a synthetic illustration.** The relationship between outcome and y was constructed by us, not discovered in data. Real-world validation requires independently collected outcomes and domain-specific analysis.

> **This demonstrates regime organization, not prediction.** The plots show how y organizes synthetic observations. They do not demonstrate that y predicts real outcomes.

---

## Summary

**This notebook is a synthetic illustration only.** It does not validate real-world applicability.

### What This Notebook Demonstrated

1. **Finite system with absorbing failure**: A synthetic server with explicit capacity (p₁) and load (p₂)
2. **Explicit p₁ and p₂ construction**: Values were defined directly, not inferred
3. **Diagnostic coordinate computation**: y(p) = (1 - p₂/p₁)^(-1/2)
4. **Validation protocol plots**: Outcome vs p₁, p₂, x, and y (all synthetic)
5. **Negative controls and sensitivity analysis**

### Key Takeaways

| Shows | Does NOT Show |
|-------|---------------|
| y increases as x → 1 | Prediction of failure |
| Regime organization by x | Mechanism or causation |
| Sensitivity to perturbations | Universality claims |
| Uncertainty propagation | Validation of real systems |

### Final Reminder

> **This is a synthetic illustration.** The diagnostic coordinate y(p) organizes observations by their proximity to x = 1. It does not predict outcomes, model mechanisms, or establish causation.

> **This does not validate real-world applicability.** Any application to real systems requires independent, domain-specific validation that is beyond the scope of this package.

> **This demonstrates regime organization, not prediction.** The y(p) coordinate is a descriptor, not a predictor.

For further details, see `docs/scope.md` and `docs/how_not_to_use.md`.