# Data Reconciliation → Parameter Estimation Workflow

This notebook demonstrates the complete plant-data-to-model-tuning loop using NeqSim:

| Step | What | NeqSim Class |
|------|------|--------------|
| 1 | Collect plant measurements | Python / DCS |
| 2 | Reconcile measurements (close mass/energy balances) | `DataReconciliationEngine` |
| 3 | Detect and remove gross errors | `reconcileWithGrossErrorElimination()` |
| 4 | Feed reconciled values into parameter estimation | `BatchParameterEstimator` |
| 5 | Solve for best-fit model parameters (L-M) | `BatchParameterEstimator.solve()` |
| 6 | Update process model with tuned parameters | Apply `BatchResult` |
| 7 | Repeat at next time interval | Loop |

## Key Java Classes

- **`DataReconciliationEngine`** — Weighted Least Squares reconciliation with gross error detection
- **`BatchParameterEstimator`** — Levenberg-Marquardt batch fitting of model parameters
- **`EnKFParameterEstimator`** — Online Ensemble Kalman Filter for live streaming data
- **`SteadyStateDetector`** — Monitors process variables to confirm steady state before reconciliation
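
To make the idea of a steady-state gate concrete, here is a minimal pure-Python sketch that accepts a window of readings only when its relative standard deviation is small. It is not the algorithm used by `SteadyStateDetector`; the class name `RollingSteadyStateCheck`, the window length, and the 0.5% threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class RollingSteadyStateCheck:
    """Hypothetical steady-state gate based on a rolling relative std. dev."""

    def __init__(self, window=10, max_rel_std=0.005):
        self.values = deque(maxlen=window)
        self.max_rel_std = max_rel_std

    def update(self, value):
        """Add one reading; return True once the full window looks steady."""
        self.values.append(float(value))
        if len(self.values) < self.values.maxlen:
            return False  # not enough history yet
        avg = mean(self.values)
        if avg == 0.0:
            return False  # relative spread is undefined around zero
        return pstdev(self.values) / abs(avg) <= self.max_rel_std


check = RollingSteadyStateCheck(window=5, max_rel_std=0.005)
ramp = [100.0, 105.0, 110.0, 115.0, 120.0]   # process still moving
flat = [120.1, 119.9, 120.0, 120.1, 119.9]   # process settled
flags = [check.update(v) for v in ramp + flat]
print(flags)
```

Reconciliation and estimation would then run only on windows for which the check returns `True`.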

In [None]:
# Cell 1 — Start JVM with latest NeqSim build
from neqsim_dev_setup import neqsim_init
# NOTE: project_root is machine-specific; point it at your local NeqSim checkout.
ns = neqsim_init(project_root=r"C:\Users\ESOL\Documents\GitHub\neqsim2", recompile=False)

In [None]:
# Cell 2 — Import all needed Java classes via jpype
import jpype

# Thermo
SystemSrkEos = jpype.JClass("neqsim.thermo.system.SystemSrkEos")

# Process equipment
ProcessSystem = jpype.JClass("neqsim.process.processmodel.ProcessSystem")
Stream = jpype.JClass("neqsim.process.equipment.stream.Stream")
Compressor = jpype.JClass("neqsim.process.equipment.compressor.Compressor")
Heater = jpype.JClass("neqsim.process.equipment.heatexchanger.Heater")
Separator = jpype.JClass("neqsim.process.equipment.separator.Separator")
Mixer = jpype.JClass("neqsim.process.equipment.mixer.Mixer")

# Data Reconciliation
DataReconciliationEngine = jpype.JClass("neqsim.process.util.reconciliation.DataReconciliationEngine")

# Parameter Estimation
BatchParameterEstimator = jpype.JClass("neqsim.process.calibration.BatchParameterEstimator")

# Java collections
HashMap = jpype.JClass("java.util.HashMap")

print("All classes loaded.")

## Part 1: Build a Process Model

We create a simple gas compression process:
- Feed gas at 15°C, 30 bara
- Compressor with unknown polytropic efficiency (the parameter we want to estimate)
- Discharge at 80 bara

In [None]:
# Cell 3 — Build the process model
fluid = SystemSrkEos(288.15, 30.0)  # 15°C, 30 bara
fluid.addComponent("methane", 0.85)
fluid.addComponent("ethane", 0.10)
fluid.addComponent("propane", 0.05)
fluid.setMixingRule("classic")

feed = Stream("feed", fluid)
feed.setFlowRate(10000.0, "kg/hr")
feed.setTemperature(15.0, "C")
feed.setPressure(30.0, "bara")

comp = Compressor("comp", feed)
comp.setOutletPressure(80.0)  # bara
comp.setPolytropicEfficiency(0.75)  # initial guess
comp.setUsePolytropicCalc(True)

process = ProcessSystem()
process.add(feed)
process.add(comp)
process.run()

print(f"Outlet temperature: {comp.getOutletStream().getTemperature() - 273.15:.1f} °C")
print(f"Power: {comp.getPower('kW'):.1f} kW")

## Part 2: Simulate "Plant Measurements"

In a real application, these come from the DCS/historian. Here we generate synthetic data
at a known true efficiency of 0.78, then add noise to simulate measurement uncertainty.

In [None]:
# Cell 4 — Generate synthetic plant data
import random
random.seed(42)

TRUE_EFFICIENCY = 0.78  # The "real" plant efficiency we want to recover
MEASUREMENT_NOISE_K = 0.3  # standard deviation of the Gaussian measurement noise (K)

# Simulate measurements at different discharge pressures
discharge_pressures = [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
plant_measurements = []  # list of (pressure, measured_temp_K)

for p_out in discharge_pressures:
    comp.setOutletPressure(p_out)
    comp.setPolytropicEfficiency(TRUE_EFFICIENCY)
    process.run()
    
    true_temp = float(comp.getOutletStream().getTemperature())
    noisy_temp = true_temp + random.gauss(0, MEASUREMENT_NOISE_K)
    plant_measurements.append((p_out, noisy_temp))
    print(f"  P_out={p_out:.0f} bara → T_true={true_temp-273.15:.2f}°C, T_meas={noisy_temp-273.15:.2f}°C")

print(f"\nGenerated {len(plant_measurements)} plant data points")

## Part 3: Data Reconciliation

Before feeding measurements into parameter estimation, we reconcile them to close
mass/energy balance constraints. The `DataReconciliationEngine` uses Weighted Least
Squares (WLS) and can detect gross errors (bad sensors).

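The single-stream compressor model has no redundant measurements to reconcile, so the
next cell demonstrates the API on a separate three-stream mixer balance; its results are
not passed to the estimator in Part 5. In a real multi-stream network, reconciliation
would adjust all flow rates to close the mass balance.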
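
For linear constraints the WLS problem has a closed-form solution, which can be sketched in a few lines of NumPy. This is the textbook formula, not necessarily the exact algorithm inside `DataReconciliationEngine`, and the function name `reconcile_linear` is hypothetical. The numbers are the three mixer flows used in the next cell.

```python
import numpy as np


def reconcile_linear(y, sigma, A):
    """Textbook WLS reconciliation subject to linear constraints A @ x = 0.

    y     : measured values
    sigma : measurement standard deviations
    A     : constraint matrix, one row per balance equation
    """
    y = np.asarray(y, dtype=float)
    V = np.diag(np.asarray(sigma, dtype=float) ** 2)  # measurement covariance
    A = np.atleast_2d(np.asarray(A, dtype=float))
    r = A @ y                    # constraint residuals of the raw measurements
    S = A @ V @ A.T              # covariance of those residuals
    x_hat = y - V @ A.T @ np.linalg.solve(S, r)       # reconciled values
    chi2 = float(r @ np.linalg.solve(S, r))           # global test statistic
    return x_hat, chi2


x_hat, chi2 = reconcile_linear(
    y=[5000.0, 5100.0, 10200.0],
    sigma=[100.0, 100.0, 150.0],
    A=[[1.0, 1.0, -1.0]],        # flow_in1 + flow_in2 - flow_out = 0
)
print(np.round(x_hat, 1), round(chi2, 4))
```

The 100 kg/hr imbalance is distributed in proportion to each sensor's variance, so the less certain `flow_out` measurement absorbs the largest correction.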

In [None]:
# Cell 5 — Data Reconciliation (demonstrating the API)
#
# In a real plant with multiple streams, you would reconcile flow rates 
# around a mixer or splitter. Here we show the pattern:

recon = DataReconciliationEngine()

# Add measured flow variables (e.g., three streams that should sum to zero
# at a mixer: flow_in1 + flow_in2 - flow_out = 0)
recon.addVariable("flow_in1", 5000.0, 100.0)   # measured value, uncertainty (kg/hr)
recon.addVariable("flow_in2", 5100.0, 100.0)
recon.addVariable("flow_out", 10200.0, 150.0)   # slightly biased

# Mass balance constraint: flow_in1 + flow_in2 - flow_out = 0
# Constraint matrix row: [1.0, 1.0, -1.0]
recon.addConstraint([1.0, 1.0, -1.0])

result = recon.reconcile()

print("=== Data Reconciliation Results ===")
print(f"Converged: {result.isConverged()}")
print(f"Chi-square: {result.getChiSquareValue():.4f}")
print(f"Global test passed: {result.isGlobalTestPassed()}")
print()

# Show reconciled values  
for i in range(recon.getVariableCount()):
    v = recon.getVariable(i)
    print(f"  {v.getName()}: measured={v.getMeasuredValue():.1f}, "
          f"reconciled={v.getReconciledValue():.1f}, "
          f"adjustment={v.getAdjustment():.2f}")

## Part 4: Gross Error Detection

The reconciliation engine can identify sensors with gross errors using the
normalized residual test. Sensors exceeding the z-threshold (default 1.96 for 95%)
are flagged and can be iteratively removed.
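
The two statistics involved can be sketched with NumPy for the single mixer balance used in this notebook. This follows the textbook global test and measurement test; whether `DataReconciliationEngine` computes exactly these quantities is an assumption, and `measurement_test` is a hypothetical name. The critical values are the standard 95% figures.

```python
import numpy as np

CHI2_95_1DOF = 3.841  # 95% critical value, chi-square with 1 degree of freedom
Z_95 = 1.96           # two-sided 95% critical value of the standard normal


def measurement_test(y, sigma, A):
    """Return the global chi-square statistic and per-sensor normalized adjustments."""
    y = np.asarray(y, dtype=float)
    V = np.diag(np.asarray(sigma, dtype=float) ** 2)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    r = A @ y                                        # raw constraint residuals
    S = A @ V @ A.T                                  # residual covariance
    adjustments = V @ A.T @ np.linalg.solve(S, r)    # measured minus reconciled
    W = V @ A.T @ np.linalg.solve(S, A @ V)          # covariance of the adjustments
    z = np.abs(adjustments) / np.sqrt(np.diag(W))    # normalized adjustments
    chi2 = float(r @ np.linalg.solve(S, r))
    return chi2, z


chi2, z = measurement_test(
    y=[5000.0, 5100.0, 12000.0],   # flow_out carries a gross error
    sigma=[100.0, 100.0, 150.0],
    A=[[1.0, 1.0, -1.0]],
)
print(f"chi2 = {chi2:.2f} (limit {CHI2_95_1DOF}), z = {np.round(z, 2)}")
```

In this sketch the global test fails clearly, but with a single balance equation the three normalized adjustments are identical. The statistic confirms that a gross error exists without identifying the sensor; localizing it requires additional constraints or redundant measurements.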

In [None]:
# Cell 6 — Gross error detection demo
recon2 = DataReconciliationEngine()

# Same flows but with a gross error on flow_out (sensor reads 12000 instead of ~10100)
recon2.addVariable("flow_in1", 5000.0, 100.0)
recon2.addVariable("flow_in2", 5100.0, 100.0)
recon2.addVariable("flow_out_bad", 12000.0, 150.0)  # gross error!

recon2.addConstraint([1.0, 1.0, -1.0])

result2 = recon2.reconcileWithGrossErrorElimination()

print("=== Gross Error Detection ===")
print(f"Global test passed: {result2.isGlobalTestPassed()}")

for i in range(recon2.getVariableCount()):
    v = recon2.getVariable(i)
    flag = " *** GROSS ERROR" if v.isGrossError() else ""
    print(f"  {v.getName()}: measured={v.getMeasuredValue():.1f}, "
          f"reconciled={v.getReconciledValue():.1f}{flag}")

## Part 5: Parameter Estimation with `BatchParameterEstimator`

In a full workflow, the reconciled and cleaned measurements feed the parameter estimator.
Here the synthetic temperatures from Part 2 are passed in directly.
The `BatchParameterEstimator` uses Levenberg-Marquardt optimization to find the
parameter values that minimize the weighted sum of squared differences between
model predictions and plant measurements.
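
Written out, this is the usual weighted least-squares objective. The symbols are introduced here for illustration only: $\theta$ is the vector of tunable parameters, $x_i$ the conditions of data point $i$, $y_{ij}$ the measured value of variable $j$ at that point, $f_j$ the model prediction, and $\sigma_j$ the standard deviation passed to `addMeasuredVariable`. The sketch assumes the weights are the inverse measurement variances.

$$
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \sum_{j=1}^{M}
\left( \frac{y_{ij} - f_j(x_i, \theta)}{\sigma_j} \right)^{2}
$$

In this notebook $N = 6$ (the six discharge pressures), $M = 1$ (outlet temperature), and $\theta$ has a single entry, the polytropic efficiency.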

### How it works internally:

1. **Property paths** like `"comp.polytropicEfficiency"` use reflection to
   call `processSystem.getUnit("comp").setPolytropicEfficiency(value)`
2. **Conditions** (e.g., `"comp.outletPressure"`) are applied before each simulation
3. **Measurements** (e.g., `"comp.outletStream.temperature"`) are read after simulation
4. The L-M optimizer iterates: perturb params → run simulation → compute residuals → update
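
The iteration in step 4 can be sketched as a generic Levenberg-Marquardt loop in NumPy. This is a textbook version, not NeqSim's implementation: it ignores parameter bounds, and the ideal-gas function `toy_outlet_temperature` (with an assumed heat-capacity ratio `kappa=1.3`) is a hypothetical stand-in for running the process simulation.

```python
import numpy as np


def toy_outlet_temperature(eta, p_out, t_in=288.15, p_in=30.0, kappa=1.3):
    """Ideal-gas polytropic outlet temperature (K); stands in for process.run()."""
    return t_in * (p_out / p_in) ** ((kappa - 1.0) / (kappa * eta))


def levenberg_marquardt(residual_fn, theta0, max_iter=100, lam=1e-3):
    """Minimize sum(residual_fn(theta)**2) with a damped Gauss-Newton step."""
    theta = np.asarray(theta0, dtype=float)
    r = residual_fn(theta)
    cost = float(r @ r)
    for _ in range(max_iter):
        # Perturb each parameter to build a forward-difference Jacobian.
        J = np.empty((r.size, theta.size))
        for j in range(theta.size):
            h = 1e-6 * max(1.0, abs(theta[j]))
            bumped = theta.copy()
            bumped[j] += h
            J[:, j] = (residual_fn(bumped) - r) / h
        H = J.T @ J
        g = J.T @ r
        # Small lam behaves like Gauss-Newton, large lam like gradient descent.
        delta = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
        r_new = residual_fn(theta + delta)
        cost_new = float(r_new @ r_new)
        if cost_new < cost:
            improvement = cost - cost_new
            theta, r, cost = theta + delta, r_new, cost_new
            lam *= 0.1           # step accepted: trust the model more
            if improvement < 1e-12:
                break
        else:
            lam *= 10.0          # step rejected: increase damping and retry
    return theta


pressures = np.array([50.0, 60.0, 70.0, 80.0, 90.0, 100.0])
sigma = 0.3
measured = toy_outlet_temperature(0.78, pressures)  # noise-free synthetic data


def residuals(theta):
    """Weighted residuals between toy model and synthetic measurements."""
    return (toy_outlet_temperature(theta[0], pressures) - measured) / sigma


eta_hat = levenberg_marquardt(residuals, [0.65])[0]
print(f"estimated efficiency: {eta_hat:.4f}")
```

Because the synthetic data here are noise-free, the loop recovers the efficiency used to generate them; with noisy data the estimate would scatter around that value.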

### Important: Property path conventions

| Path | Resolves to |
|------|------------|
| `"comp.polytropicEfficiency"` | `comp.setPolytropicEfficiency(double)` / `comp.getPolytropicEfficiency()` |
| `"comp.outletPressure"` | `comp.setOutletPressure(double)` |
| `"comp.outletStream.temperature"` | `comp.getOutletStream().getTemperature()` |
| `"heater1.outletTemperature"` | `heater1.setOutletTemperature(double)` |

**Note:** Only single-argument `(double)` setters work for conditions. Two-argument
setters like `setFlowRate(double, String)` are not supported via reflection paths.

In [None]:
# Cell 7 — Set up and run BatchParameterEstimator

# Reset the efficiency to a wrong initial guess — this is what the estimator starts from
comp.setPolytropicEfficiency(0.65)  # deliberately wrong

estimator = BatchParameterEstimator(process)

# Define what to tune: path, unit, lower_bound, upper_bound, initial_guess
estimator.addTunableParameter(
    "comp.polytropicEfficiency",  # property path
    "",                            # unit (dimensionless)
    0.50,                          # lower bound
    0.95,                          # upper bound  
    0.65                           # initial guess (wrong on purpose)
)

# Define what to match: path, unit, measurement_std_dev
estimator.addMeasuredVariable(
    "comp.outletStream.temperature",  # measurement path
    "K",                               # unit
    MEASUREMENT_NOISE_K                 # measurement uncertainty (std dev)
)

# Add the plant data points
for p_out, t_meas in plant_measurements:
    conditions = HashMap()
    conditions.put("comp.outletPressure", jpype.JDouble(p_out))
    
    measurements = HashMap()
    measurements.put("comp.outletStream.temperature", jpype.JDouble(t_meas))
    
    estimator.addDataPoint(conditions, measurements)

estimator.setMaxIterations(100)

print("Configuration:")
print(f"  Parameters to tune: {list(estimator.getParameterNames())}")
print(f"  Measurements: {list(estimator.getMeasurementNames())}")
print(f"  Data points: {estimator.getDataPointCount()}")
print("  Initial guess: 0.65")
print(f"  True value: {TRUE_EFFICIENCY}")

In [None]:
# Cell 8 — Solve!
result = estimator.solve()

print("\n" + "="*70)
result.printSummary()
print("="*70)

estimated_eff = float(result.getEstimate(0))
uncertainty = float(result.getUncertainty(0))

print(f"\n>>> True efficiency:      {TRUE_EFFICIENCY:.4f}")
print(f">>> Estimated efficiency: {estimated_eff:.4f} ± {uncertainty:.4f}")
print(f">>> Error:                {abs(estimated_eff - TRUE_EFFICIENCY):.6f}")
print(f">>> R-squared:            {float(result.getRSquared()):.6f}")
print(f">>> Converged:            {result.isConverged()}")

## Part 6: Apply Tuned Parameters and Verify

After estimation, update the process model with the fitted parameter values
and verify the model now matches the plant data.

In [None]:
# Cell 9 — Apply tuned parameters and compare
comp.setPolytropicEfficiency(estimated_eff)

print("Model vs Plant comparison after tuning:")
print(f"{'P_out (bara)':>12} {'T_plant (°C)':>12} {'T_model (°C)':>12} {'Error (K)':>12}")
print("-" * 52)

for p_out, t_meas in plant_measurements:
    comp.setOutletPressure(p_out)
    process.run()
    t_model = float(comp.getOutletStream().getTemperature())
    error = (t_meas - t_model)
    print(f"{p_out:12.0f} {t_meas - 273.15:12.2f} {t_model - 273.15:12.2f} {error:12.3f}")

## Part 7: The Complete Loop (Automated)

Here's a Python helper function that wraps the full workflow into a single call.
In production, you would call this periodically (e.g., every hour) as new plant data
arrives from the DCS/historian.

In [None]:
# Cell 10 — Reusable helper: reconcile → estimate → update

def reconcile_and_estimate(process, plant_data, tunable_params, measured_vars, 
                           recon_constraints=None, max_iter=100):
    """
    Full plant-data-to-model-tuning loop.
    
    Parameters
    ----------
    process : ProcessSystem
        The NeqSim process model to calibrate.
    plant_data : list of dict
        Each dict has 'conditions' (path->value) and 'measurements' (path->value).
    tunable_params : list of dict
        Each dict has 'path', 'unit', 'lower', 'upper', 'guess'.
    measured_vars : list of dict
        Each dict has 'path', 'unit', 'std_dev'.
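    recon_constraints : callable, optional
        Function that takes `plant_data` and returns reconciled data in the
        same format. If omitted, the measurements are used as-is.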
    max_iter : int
        Maximum L-M iterations.
        
    Returns
    -------
    dict with 'estimates', 'uncertainties', 'r_squared', 'chi_square', 'converged', 'result'
    """
    # Step 1: Data Reconciliation (if constraints provided)
    reconciled_data = plant_data
    if recon_constraints:
        reconciled_data = recon_constraints(plant_data)
    
    # Step 2: Set up BatchParameterEstimator
    estimator = BatchParameterEstimator(process)
    
    for p in tunable_params:
        estimator.addTunableParameter(
            p['path'], p.get('unit', ''), 
            p['lower'], p['upper'], p['guess']
        )
    
    for m in measured_vars:
        estimator.addMeasuredVariable(m['path'], m.get('unit', ''), m['std_dev'])
    
    for dp in reconciled_data:
        conditions = HashMap()
        for k, v in dp.get('conditions', {}).items():
            conditions.put(k, jpype.JDouble(v))
        
        measurements = HashMap()
        for k, v in dp['measurements'].items():
            measurements.put(k, jpype.JDouble(v))
        
        estimator.addDataPoint(conditions, measurements)
    
    estimator.setMaxIterations(max_iter)
    
    # Step 3: Solve
    result = estimator.solve()
    
    # Step 4: Extract results
    estimates = {}
    uncertainties = {}
    for i, p in enumerate(tunable_params):
        estimates[p['path']] = float(result.getEstimate(i))
        uncertainties[p['path']] = float(result.getUncertainty(i))
    
    return {
        'estimates': estimates,
        'uncertainties': uncertainties,
        'r_squared': float(result.getRSquared()),
        'chi_square': float(result.getChiSquare()),
        'converged': bool(result.isConverged()),
        'result': result  # Java object for further inspection
    }

print("Helper function defined.")

In [None]:
# Cell 11 — Use the helper

# Format plant data as list of dicts
data_points = []
for p_out, t_meas in plant_measurements:
    data_points.append({
        'conditions': {'comp.outletPressure': p_out},
        'measurements': {'comp.outletStream.temperature': t_meas}
    })

# Reset efficiency to bad initial guess
comp.setPolytropicEfficiency(0.60)

# Run the full loop
output = reconcile_and_estimate(
    process=process,
    plant_data=data_points,
    tunable_params=[{
        'path': 'comp.polytropicEfficiency',
        'lower': 0.50, 'upper': 0.95, 'guess': 0.60
    }],
    measured_vars=[{
        'path': 'comp.outletStream.temperature',
        'unit': 'K',
        'std_dev': MEASUREMENT_NOISE_K  # same noise level used to generate the data
    }]
)

print(f"Converged: {output['converged']}")
print(f"Estimates: {output['estimates']}")
print(f"Uncertainties: {output['uncertainties']}")
print(f"R²: {output['r_squared']:.6f}")
print(f"\nTrue value: {TRUE_EFFICIENCY}, Estimated: {output['estimates']['comp.polytropicEfficiency']:.6f}")

## Summary

### What works today

| Capability | Class | Status |
|-----------|-------|--------|
| WLS data reconciliation | `DataReconciliationEngine` | ✅ Working |
| Gross error detection & elimination | `reconcileWithGrossErrorElimination()` | ✅ Working |
| Steady-state detection | `SteadyStateDetector` | ✅ Working |
| Batch parameter estimation (L-M) | `BatchParameterEstimator` | ✅ Working (verified E2E) |
| Online estimation (EnKF) | `EnKFParameterEstimator` | ✅ Working (≤2 measurements) |
| Result statistics | `BatchResult` | ✅ chi², R², CI, covariance |

### Property path reference

| Equipment | Tunable Parameter Path | Measurable Output Path |
|-----------|----------------------|----------------------|
| Compressor | `comp.polytropicEfficiency` | `comp.outletStream.temperature` |
| Compressor | `comp.isentropicEfficiency` | `comp.outletStream.pressure` |
| Heater | `heater.outletTemperature` | `heater.outletStream.temperature` |
| Compressor | `comp.outletPressure` (condition) | `comp.power` |

### Limitations

1. **Two-arg setters** (`setFlowRate(double, String)`) not supported as condition paths
2. **EnKF** limited to ≤2 measurements (matrix inversion hardcoded for 1×1 / 2×2)
3. **No built-in reconciliation→estimation pipeline** — use the Python helper above

### Next steps

- Wrap the helper in a scheduled loop for periodic re-estimation
- Connect to DCS/historian for live data
- Use `EnKFParameterEstimator` for online tracking between batch updates
- Add more measurement points (pressure drop across equipment, power consumption, etc.)
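
The first of these steps could look roughly like the loop below. It is a sketch under stated assumptions: `run_periodically` and its three callables are hypothetical names, the historian fetch is left to the caller, and the estimation callable is expected to return a dict shaped like the output of `reconcile_and_estimate`.

```python
import time


def run_periodically(fetch_data, estimate, apply_estimates,
                     interval_s=3600.0, max_cycles=None, sleep=time.sleep):
    """Hypothetical scheduler: fetch -> estimate -> apply, repeated every interval.

    fetch_data      : callable returning a list of data-point dicts (may be empty)
    estimate        : callable taking that list, returning a dict with
                      'converged' and 'estimates' keys
    apply_estimates : callable receiving the 'estimates' dict
    max_cycles      : stop after this many cycles (None means run forever)
    """
    history = []
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        data = fetch_data()
        if data:  # skip the cycle when no new plant data arrived
            output = estimate(data)
            if output["converged"]:
                apply_estimates(output["estimates"])  # only trust converged fits
            history.append(output)
        cycle += 1
        if max_cycles is None or cycle < max_cycles:
            sleep(interval_s)
    return history


# Demo with stub callables so the sketch runs without a plant connection.
applied = []
history = run_periodically(
    fetch_data=lambda: [{"conditions": {}, "measurements": {}}],
    estimate=lambda data: {"converged": True,
                           "estimates": {"comp.polytropicEfficiency": 0.78}},
    apply_estimates=applied.append,
    interval_s=0.0,
    max_cycles=3,
)
print(len(history), applied[-1])
```

In this notebook, `estimate` would wrap a call to `reconcile_and_estimate(process, data, ...)` and `apply_estimates` would write the fitted efficiency back to `comp`.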