# Parameter Estimation for Reverse Osmosis Systems

**WaterTAP Academy Tutorial**

---

## Learning Objectives

By the end of this tutorial, you will be able to:

1. Understand what parameter estimation is and why it matters for water treatment modeling
2. Prepare experimental data for use with Pyomo's `parmest` tool
3. Define a model function that connects your RO model to pilot plant data
4. Solve parameter estimation problems to fit membrane transport parameters (A and B coefficients)
5. Visualize and validate your results against experimental data

---

## Background

### Why Parameter Estimation?

Reverse osmosis membrane models rely on transport parameters that characterize how water and solutes move through the membrane:

- **A coefficient (water permeability)**: How easily water passes through the membrane (m/Pa/s)
- **B coefficient (salt permeability)**: How easily dissolved solids pass through (m/s)

These parameters vary by:
- Membrane manufacturer and type
- Operating conditions (temperature, pressure, fouling state)
- Age and condition of the membrane

**Parameter estimation** allows us to calibrate our models to real pilot plant data, making our simulations more accurate and useful for design and optimization.
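
To make the roles of A and B concrete, here is a minimal sketch of the solution-diffusion flux relations in their commonly used form, ignoring concentration polarization. The function names, the constant water density, and every numeric input are illustrative assumptions, not pilot plant values, and WaterTAP's own constraints may differ in detail.

```python
# Sketch of the solution-diffusion flux relations (no concentration polarization).
# Function names and all numeric inputs are illustrative placeholders.

def water_flux(A, delta_p, delta_pi, rho_w=1000.0):
    """Water mass flux in kg/m2/s: J_w = rho_w * A * (delta_p - delta_pi)."""
    return rho_w * A * (delta_p - delta_pi)


def salt_flux(B, conc_feed, conc_perm):
    """Salt mass flux in kg/m2/s: J_s = B * (conc_feed - conc_perm)."""
    return B * (conc_feed - conc_perm)


A = 5e-12  # m/Pa/s, water permeability (placeholder)
B = 4e-8   # m/s, salt permeability (placeholder)

# Assumed hydraulic and osmotic pressure differences across the membrane, in Pa
j_w = water_flux(A, delta_p=1.1e6, delta_pi=1.0e5)
# Assumed feed-side and permeate-side salt concentrations, in kg/m3
j_s = salt_flux(B, conc_feed=1.3, conc_perm=0.025)

print(f"water flux: {j_w:.3e} kg/m2/s")
print(f"salt flux:  {j_s:.3e} kg/m2/s")
```

Because both fluxes are linear in their coefficient, a larger A mainly raises permeate flow and a larger B mainly raises permeate TDS, which is why those two measurements are used to fit them.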

### Data Source

This tutorial uses data from the **Orange County Water District (OCWD)** pilot plant, a real-world RO system that provides valuable operational data for model validation.

---

## Part 1: Setting Up Parameter Estimation

### Step 1.1: Import Required Modules

We need modules from three ecosystems:
- **Pyomo**: Core optimization and parameter estimation (`parmest`)
- **IDAES**: Flowsheet infrastructure and utilities
- **WaterTAP**: RO unit models and property packages

In [1]:
# === Pyomo Imports ===
from pyomo.environ import (
    ConcreteModel,
    value,
    TransformationFactory,
    assert_optimal_termination,
)
from pyomo.network import Arc
import pyomo.contrib.parmest.parmest as parmest

# === IDAES Imports ===
from idaes.core import FlowsheetBlock
from idaes.models.unit_models import Feed, Separator
from idaes.core.util.initialization import propagate_state
from idaes.core.util.model_statistics import degrees_of_freedom
import idaes.core.util.scaling as iscale
import idaes.logger as idaeslog

# === WaterTAP Imports ===
from watertap.unit_models.reverse_osmosis_0D import (
    ReverseOsmosis0D as RO,
    ConcentrationPolarizationType,
    MassTransferCoefficient,
)
from watertap.property_models import seawater_prop_pack as props
from watertap.core.solvers import get_solver

# === Data & Visualization ===
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from utility_functions import load_data

# === Suppress verbose logging ===
import logging
import warnings
logging.getLogger("pyomo").setLevel(logging.CRITICAL)
warnings.filterwarnings("ignore")

print("✓ All modules imported successfully!")

✓ All modules imported successfully!


### Step 1.2: Load and Prepare Data

**Key Concept**: `parmest` requires data in one of three formats:
1. **Pandas DataFrame** (we'll use this): each row is one experimental scenario
2. List of dictionaries
3. List of JSON file names (for large parallel computing jobs)

Our data includes:
- **Inputs**: Feed flow rate, TDS concentration, inlet pressure, pressure drop
- **Outputs** (what we're fitting to): Permeate flow rate and TDS concentration

In [2]:
# Load raw data from CSV
raw_data = pd.read_csv("Plant_data.csv")

# Process data (load_data handles unit conversions and filtering)
data, full_data = load_data(raw_data)

print(f"Training data: {len(data)} scenarios")
print(f"Full dataset: {len(full_data)} scenarios")
print("\n--- Training Data Preview ---")
display(data)

Training data: 10 scenarios
Full dataset: 50 scenarios

--- Training Data Preview ---


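| Index | flow_vol_in | mass_frac_TDS_in | pressure_in | deltaP | flow_vol_permeate | mass_frac_TDS_permeate |
|-------|-------------|------------------|-------------|--------|-------------------|------------------------|
| 31 | 6.011 | 0.001276 | 179.121 | 11.701 | 4.687 | 2.4e-05 |
| 32 | 6.035 | 0.001275 | 179.166 | 11.611 | 4.716 | 2.5e-05 |
| 33 | 6.049 | 0.001275 | 179.03 | 11.565 | 4.732 | 2.5e-05 |
| 34 | 6.037 | 0.001276 | 178.895 | 11.656 | 4.718 | 2.4e-05 |
| 35 | 6.024 | 0.001271 | 178.804 | 11.655 | 4.71 | 2.5e-05 |
| 36 | 6.05 | 0.001275 | 178.985 | 11.565 | 4.724 | 2.5e-05 |
| 37 | 6.06 | 0.001272 | 178.985 | 11.656 | 4.74 | 2.5e-05 |
| 38 | 6.034 | 0.001276 | 179.121 | 11.746 | 4.708 | 2.4e-05 |
| 39 | 6.035 | 0.001278 | 179.075 | 11.565 | 4.704 | 2.5e-05 |
| 40 | 6.026 | 0.00128 | 179.211 | 11.61 | 4.712 | 2.5e-05 |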


#### 💡 Understanding the Data Columns

| Column | Description | Units |
|--------|-------------|-------|
| `flow_vol_in` | Feed volumetric flow rate | GPM |
| `mass_frac_TDS_in` | Feed TDS mass fraction | - |
| `pressure_in` | Feed pressure | psi |
| `deltaP` | Pressure drop along the feed channel (feed inlet to retentate outlet) | psi |
| `flow_vol_permeate` | Permeate flow rate (measured) | GPM |
| `mass_frac_TDS_permeate` | Permeate TDS (measured) | - |
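
As an illustration of the list-of-dictionaries format mentioned in Step 1.2, the sketch below turns two rows of the training-data preview into one dictionary per scenario using only the standard library. The variable names are assumptions for this example. The tutorial itself keeps the DataFrame format, and a model function written for DataFrame rows (using `data.iloc[0]`) would need adapting before it could accept dictionaries.

```python
import csv
import io

# Two rows copied from the training-data preview in Step 1.2
preview_csv = """flow_vol_in,mass_frac_TDS_in,pressure_in,deltaP,flow_vol_permeate,mass_frac_TDS_permeate
6.011,0.001276,179.121,11.701,4.687,2.4e-05
6.035,0.001275,179.166,11.611,4.716,2.5e-05
"""

# One dictionary per scenario, with values converted from text to float
scenarios = [
    {name: float(val) for name, val in row.items()}
    for row in csv.DictReader(io.StringIO(preview_csv))
]

for i, scenario in enumerate(scenarios):
    print(i, scenario["flow_vol_in"], scenario["flow_vol_permeate"])
```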

### Step 1.3: Define Unit Conversion Constants

The pilot plant data are reported in US customary units (GPM, psi), but WaterTAP models use SI units internally.

In [3]:
# Initialize solver
solver = get_solver()

# Unit conversion factors
PSI_TO_PA = 6894.75        # psi → Pascal
GPM_TO_M3PS = 6.309e-5     # GPM → m³/s

print(f"1 psi = {PSI_TO_PA:,.2f} Pa")
print(f"1 GPM = {GPM_TO_M3PS:.3e} m³/s")

1 psi = 6,894.75 Pa
1 GPM = 6.309e-05 m³/s
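
The two constants above are rounded. As a cross-check, this sketch rebuilds them from the exact unit definitions (pound-force, inch, US gallon) and reports how far the rounded values are from the derived ones. It assumes GPM means US gallons per minute; the lowercase variable names are chosen for this example only.

```python
# Exact definitions of the underlying units
LBF_IN_N = 4.4482216152605        # 1 pound-force in newtons
INCH_IN_M = 0.0254                # 1 inch in metres
US_GALLON_IN_M3 = 3.785411784e-3  # 1 US gallon in cubic metres

# Derived conversion factors
psi_to_pa = LBF_IN_N / INCH_IN_M**2     # Pa per psi
gpm_to_m3ps = US_GALLON_IN_M3 / 60.0    # (m3/s) per GPM

# Rounded constants used in the tutorial
PSI_TO_PA = 6894.75
GPM_TO_M3PS = 6.309e-5

print(f"derived psi -> Pa:   {psi_to_pa:.4f}")
print(f"derived GPM -> m3/s: {gpm_to_m3ps:.6e}")
print(f"relative difference (psi): {abs(psi_to_pa - PSI_TO_PA) / psi_to_pa:.1e}")
print(f"relative difference (GPM): {abs(gpm_to_m3ps - GPM_TO_M3PS) / gpm_to_m3ps:.1e}")
```

Both rounded constants agree with the derived values to within a few parts per million, far below the scatter in the plant measurements.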


### Step 1.4: Define the Model Function

**Critical**: `parmest` requires a function that:
1. Takes a single data row (as a DataFrame) as input
2. Builds and initializes a Pyomo model
3. Sets operating conditions from the data
4. Returns a model with **0 degrees of freedom**

The parameters we want to estimate (`A_comp` and `B_comp`) must be **fixed** during model building; `parmest` will unfix and optimize them.

In [4]:
def ro_parmest(data):
    """
    Build an RO model configured for parameter estimation.
    
    Args:
        data: DataFrame with a single row of operating conditions
        
    Returns:
        Pyomo ConcreteModel with 0 degrees of freedom
    """
    # ===== BUILD FLOWSHEET =====
    m = ConcreteModel()
    m.fs = FlowsheetBlock(dynamic=False)
    m.fs.properties = props.SeawaterParameterBlock()
    
    # Add units
    m.fs.feed = Feed(property_package=m.fs.properties)
    m.fs.RO = RO(
        property_package=m.fs.properties,
        has_pressure_change=True,
        concentration_polarization_type=ConcentrationPolarizationType.none,
        mass_transfer_coefficient=MassTransferCoefficient.none,
    )
    
    # Connect units
    m.fs.s00 = Arc(source=m.fs.feed.outlet, destination=m.fs.RO.inlet)
    TransformationFactory("network.expand_arcs").apply_to(m)
    
    # ===== INITIAL CONDITIONS (for initialization) =====
    m.fs.feed.properties[0].flow_vol_phase.fix(GPM_TO_M3PS * 8)
    m.fs.feed.properties[0].temperature.fix(273.15 + 25)  # 25 °C
    m.fs.feed.properties[0].pressure.fix(PSI_TO_PA * 188)
    m.fs.feed.properties[0].mass_frac_phase_comp["Liq", "TDS"].fix(0.001)
    
    # Membrane configuration (4 elements @ 7.2 m² each)
    m.fs.RO.area.fix(28.8)
    m.fs.RO.permeate.pressure[0].fix(101325)  # Atmospheric
    m.fs.RO.deltaP.fix(-PSI_TO_PA * 24.6)
    
    # ===== PARAMETERS TO ESTIMATE =====
    # Initial guesses for A and B coefficients
    m.fs.RO.A_comp[0, "H2O"].fix(5e-12)  # Water permeability
    m.fs.RO.B_comp[0, "TDS"].fix(4e-8)   # Salt permeability
    
    # ===== SCALING =====
    m.fs.properties.set_default_scaling(
        "flow_mass_phase_comp", 1e1, index=("Liq", "H2O")
    )
    m.fs.properties.set_default_scaling(
        "flow_mass_phase_comp", 1e6, index=("Liq", "TDS")
    )
    iscale.set_scaling_factor(m.fs.RO.area, 1e-1)
    iscale.calculate_scaling_factors(m)
    
    # ===== INITIALIZE =====
    solver.solve(m.fs.feed)
    propagate_state(m.fs.s00)
    m.fs.RO.initialize(outlvl=idaeslog.ERROR)
    
    # ===== SET ACTUAL OPERATING CONDITIONS FROM DATA =====
    m.fs.feed.properties[0].flow_vol_phase.fix(
        GPM_TO_M3PS * float(data.iloc[0]["flow_vol_in"])
    )
    m.fs.feed.properties[0].pressure.fix(
        PSI_TO_PA * float(data.iloc[0]["pressure_in"])
    )
    m.fs.feed.properties[0].mass_frac_phase_comp["Liq", "TDS"].fix(
        float(data.iloc[0]["mass_frac_TDS_in"])
    )
    m.fs.RO.deltaP.fix(-PSI_TO_PA * float(data.iloc[0]["deltaP"]))
    
    # Verify DOF = 0
    assert degrees_of_freedom(m) == 0, f"DOF = {degrees_of_freedom(m)}, expected 0"
    
    return m

print("✓ Model function defined")

✓ Model function defined
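
The last block of `ro_parmest` converts one data row from plant units to SI and flips the sign of the pressure drop, because the model is given `deltaP` as a negative pressure change. The sketch below isolates that step as a plain function so it can be checked without building a model. `row_to_si` is a hypothetical helper for illustration and is not part of the tutorial code or of WaterTAP.

```python
PSI_TO_PA = 6894.75
GPM_TO_M3PS = 6.309e-5


def row_to_si(row):
    """Convert one scenario (dict of plant-unit values) to SI model inputs."""
    return {
        "flow_vol_in": GPM_TO_M3PS * row["flow_vol_in"],   # m3/s
        "pressure_in": PSI_TO_PA * row["pressure_in"],     # Pa
        "mass_frac_TDS_in": row["mass_frac_TDS_in"],       # dimensionless
        "deltaP": -PSI_TO_PA * row["deltaP"],              # Pa, negative = pressure loss
    }


# First row of the training-data preview in Step 1.2
row = {
    "flow_vol_in": 6.011,
    "mass_frac_TDS_in": 0.001276,
    "pressure_in": 179.121,
    "deltaP": 11.701,
}
si = row_to_si(row)

for name, val in si.items():
    print(f"{name:>18s}: {val:.6g}")
```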


### Step 1.5: Specify Parameters to Estimate

We tell `parmest` which model variables to optimize using their full Pyomo path names.

In [5]:
# Parameter names (must match exactly how they appear in the model)
theta_names = [
    "fs.RO.A_comp[0, 'H2O']",  # Water permeability coefficient
    "fs.RO.B_comp[0, 'TDS']"   # Salt permeability coefficient
]

print("Parameters to estimate:")
for name in theta_names:
    print(f"  • {name}")

Parameters to estimate:
  • fs.RO.A_comp[0, 'H2O']
  • fs.RO.B_comp[0, 'TDS']


### Step 1.6: Define the Objective Function

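We minimize the **sum of squared errors (SSE)** between model predictions and measurements. The function below returns the contribution of a single scenario; `parmest` adds up these contributions over all training scenarios.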

**Important**: We normalize by the standard deviation of each output to ensure both permeate flow and TDS concentration contribute equally to the fit.

$$\text{SSE} = \left(\frac{Q_{perm}^{data} - Q_{perm}^{model}}{\sigma_Q}\right)^2 + \left(\frac{x_{TDS}^{data} - x_{TDS}^{model}}{\sigma_x}\right)^2$$

In [6]:
def SSE(m, data):
    """
    Calculate normalized sum of squared errors.
    
    Args:
        m: Pyomo model
        data: DataFrame with measured values
        
    Returns:
        Pyomo expression for SSE
    """
    # Normalization factors (from full dataset)
    flow_std = np.std(GPM_TO_M3PS * full_data["flow_vol_permeate"])
    tds_std = np.std(full_data["mass_frac_TDS_permeate"])
    
    # Measured values
    flow_measured = GPM_TO_M3PS * float(data.iloc[0]["flow_vol_permeate"])
    tds_measured = float(data.iloc[0]["mass_frac_TDS_permeate"])
    
    # Model predictions
    flow_model = m.fs.RO.mixed_permeate[0.0].flow_vol_phase["Liq"]
    tds_model = m.fs.RO.mixed_permeate[0.0].mass_frac_phase_comp["Liq", "TDS"]
    
    # Normalized SSE
    expr = ((flow_measured - flow_model) / flow_std)**2 + \
           ((tds_measured - tds_model) / tds_std)**2
    
    return expr

print("✓ Objective function defined")

✓ Objective function defined
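
One way to see what dividing by the standard deviation does: the raw squared residual depends on the unit the flow is expressed in, while the normalized residual does not. The sketch below uses the ten permeate flows from the training-data preview and a hypothetical model prediction of 4.700 GPM. Note that the tutorial's `SSE` takes the standard deviation from the full dataset, whereas this example has only the ten preview rows to work with.

```python
from statistics import pstdev  # population standard deviation, the default of np.std

GPM_TO_M3PS = 6.309e-5

# Permeate flow (GPM) from the ten training rows shown in Step 1.2
flow_gpm = [4.687, 4.716, 4.732, 4.718, 4.710, 4.724, 4.740, 4.708, 4.704, 4.712]
flow_si = [GPM_TO_M3PS * q for q in flow_gpm]

# Hypothetical model prediction for the first scenario (not a real result)
predicted_gpm = 4.700
predicted_si = GPM_TO_M3PS * predicted_gpm


def squared_residual(measured, predicted, scale=1.0):
    """Squared residual, optionally divided by a scale such as a standard deviation."""
    return ((measured - predicted) / scale) ** 2


raw_gpm = squared_residual(flow_gpm[0], predicted_gpm)
raw_si = squared_residual(flow_si[0], predicted_si)
norm_gpm = squared_residual(flow_gpm[0], predicted_gpm, pstdev(flow_gpm))
norm_si = squared_residual(flow_si[0], predicted_si, pstdev(flow_si))

print(f"raw, GPM:         {raw_gpm:.3e}")
print(f"raw, m3/s:        {raw_si:.3e}")
print(f"normalized, GPM:  {norm_gpm:.4f}")
print(f"normalized, m3/s: {norm_si:.4f}")
```

The two raw values differ by many orders of magnitude, while the two normalized values agree. Each normalized term is dimensionless, so the flow and TDS terms can be added without one dominating merely because of its units.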


---

## Part 2: Solve the Parameter Estimation Problem

### Step 2.1: Create the Estimator

In [None]:
# Create parmest Estimator object
pest = parmest.Estimator(
    ro_parmest,    # Model function
    data,          # Training data
    theta_names,   # Parameters to estimate
    SSE,           # Objective function
    tee=False      # Suppress solver output
)

print("✓ Estimator created")
print(f"  Training scenarios: {len(data)}")
print(f"  Parameters: {len(theta_names)}")

### Step 2.2: Solve and Display Results

In [None]:
# Solve the parameter estimation problem
print("Solving parameter estimation...")
obj, theta = pest.theta_est()

print("\n" + "="*50)
print("PARAMETER ESTIMATION RESULTS")
print("="*50)
print(f"\nOptimal objective value: {obj:.6f}")
print("\nEstimated parameters:")
print(f"  A (water permeability): {theta.iloc[0]:.6e} m/Pa/s")
print(f"  B (salt permeability):  {theta.iloc[1]:.6e} m/s")
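
Conceptually, `theta_est` searches for the parameter values that minimize the objective summed over all scenarios. The sketch below is a stripped-down analogue of that idea, not a description of how `parmest` works internally. It uses a single parameter, a linear flux model with no osmotic term, and synthetic noise-free "measurements" generated from a known A, so the least-squares answer has a closed form. Every number and name here is made up for illustration.

```python
# Synthetic, noise-free data generated from a known A (illustration only)
A_TRUE = 5e-12   # m/Pa/s
RHO_W = 1000.0   # kg/m3, assumed constant water density

driving_pressures = [0.8e6, 1.0e6, 1.2e6, 1.4e6]            # Pa, made-up values
fluxes = [RHO_W * A_TRUE * dp for dp in driving_pressures]  # kg/m2/s


def fit_A(dps, js, rho=RHO_W):
    """Closed-form least squares for the one-parameter model J = rho * A * dP."""
    num = sum(rho * dp * j for dp, j in zip(dps, js))
    den = sum((rho * dp) ** 2 for dp in dps)
    return num / den


A_est = fit_A(driving_pressures, fluxes)
print(f"true A:      {A_TRUE:.6e} m/Pa/s")
print(f"estimated A: {A_est:.6e} m/Pa/s")
```

The RO model is nonlinear in A and B, so no closed form exists there and the minimization is handed to a numerical solver instead.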

---

## Part 3: Validate and Visualize Results

### Step 3.1: Create Prediction Function with Optimal Parameters

In [None]:
def ro_opt(theta):
    """
    Build an RO model with the optimal estimated parameters.
    
    Args:
        theta: Series containing optimal A and B values
        
    Returns:
        Initialized Pyomo model ready for simulation
    """
    m = ConcreteModel()
    m.fs = FlowsheetBlock(dynamic=False)
    m.fs.properties = props.SeawaterParameterBlock()
    m.fs.feed = Feed(property_package=m.fs.properties)
    m.fs.RO = RO(
        property_package=m.fs.properties,
        has_pressure_change=True,
        concentration_polarization_type=ConcentrationPolarizationType.none,
        mass_transfer_coefficient=MassTransferCoefficient.none,
    )

    m.fs.s00 = Arc(source=m.fs.feed.outlet, destination=m.fs.RO.inlet)
    TransformationFactory("network.expand_arcs").apply_to(m)

    # Initial conditions
    m.fs.feed.properties[0].flow_vol_phase.fix(GPM_TO_M3PS * 8)
    m.fs.feed.properties[0].temperature.fix(273.15 + 25)
    m.fs.feed.properties[0].pressure.fix(PSI_TO_PA * 188)
    m.fs.feed.properties[0].mass_frac_phase_comp["Liq", "TDS"].fix(0.001)

    m.fs.RO.area.fix(28.8)
    m.fs.RO.permeate.pressure[0].fix(101325)
    m.fs.RO.deltaP.fix(-PSI_TO_PA * 24.6)

    # SET OPTIMAL PARAMETERS
    m.fs.RO.A_comp[0, "H2O"].fix(theta.iloc[0])
    m.fs.RO.B_comp[0, "TDS"].fix(theta.iloc[1])

    # Scaling
    m.fs.properties.set_default_scaling(
        "flow_mass_phase_comp", 1e1, index=("Liq", "H2O")
    )
    m.fs.properties.set_default_scaling(
        "flow_mass_phase_comp", 1e6, index=("Liq", "TDS")
    )
    iscale.set_scaling_factor(m.fs.RO.area, 1e-1)
    iscale.calculate_scaling_factors(m)

    # Initialize
    solver.solve(m.fs.feed)
    propagate_state(m.fs.s00)
    m.fs.RO.initialize(outlvl=idaeslog.ERROR)

    assert degrees_of_freedom(m) == 0
    return m

print("✓ Optimal model function defined")

### Step 3.2: Generate Model Predictions

In [None]:
def save_model_results(model_results, m):
    """
    Run model for each data point and save predictions.
    """
    for i in range(model_results.shape[0]):
        # Update operating conditions
        m.fs.feed.properties[0].flow_vol_phase.fix(
            GPM_TO_M3PS * float(model_results.iloc[i]["flow_vol_in"])
        )
        m.fs.feed.properties[0].pressure.fix(
            PSI_TO_PA * float(model_results.iloc[i]["pressure_in"])
        )
        m.fs.feed.properties[0].mass_frac_phase_comp["Liq", "TDS"].fix(
            float(model_results.iloc[i]["mass_frac_TDS_in"])
        )
        m.fs.RO.deltaP.fix(-PSI_TO_PA * float(model_results.iloc[i]["deltaP"]))

        # Solve
        results = solver.solve(m, tee=False)
        assert_optimal_termination(results)

        # Save predictions
        model_results.iloc[i, model_results.columns.get_loc("flow_vol_permeate")] = (
            value(m.fs.RO.mixed_permeate[0.0].flow_vol_phase["Liq"]) / GPM_TO_M3PS
        )
        model_results.iloc[i, model_results.columns.get_loc("mass_frac_TDS_permeate")] = value(
            m.fs.RO.mixed_permeate[0.0].mass_frac_phase_comp["Liq", "TDS"]
        )
    
    return model_results

# Initialize model with optimal parameters and generate predictions
m = ro_opt(theta)
model_results = full_data.copy()
model_results = save_model_results(model_results, m)

print("✓ Model predictions generated for all data points")

### Step 3.3: Visualize Results

In [None]:
# Create figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# === Plot 1: Permeate Flow Rate ===
ax1.scatter(
    full_data.index, 
    full_data["flow_vol_permeate"], 
    label="Measured Data", 
    color="red", 
    s=30,
    alpha=0.7
)
ax1.plot(
    model_results.index, 
    model_results["flow_vol_permeate"], 
    label="Model Prediction",
    color="blue",
    linewidth=2
)
ax1.set_xlabel("Time Node", fontsize=12)
ax1.set_ylabel("Permeate Flow Rate (GPM)", fontsize=12)
ax1.set_title("Permeate Flow: Model vs Data", fontsize=14)
ax1.legend()
ax1.grid(True, alpha=0.3)

# === Plot 2: Permeate TDS ===
ax2.scatter(
    full_data.index,
    full_data["mass_frac_TDS_permeate"] * 1e6,  # Convert to ppm for readability
    label="Measured Data",
    color="red",
    s=30,
    alpha=0.7
)
ax2.plot(
    model_results.index, 
    model_results["mass_frac_TDS_permeate"] * 1e6,
    label="Model Prediction",
    color="blue",
    linewidth=2
)
ax2.set_xlabel("Time Node", fontsize=12)
ax2.set_ylabel("Permeate TDS (ppm)", fontsize=12)
ax2.set_title("Permeate TDS: Model vs Data", fontsize=14)
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Step 3.4: Calculate Model Fit Statistics

In [None]:
# Calculate fit statistics (R², MAE, RMSE) for the model predictions
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Flow rate statistics
r2_flow = r2_score(full_data["flow_vol_permeate"], model_results["flow_vol_permeate"])
mae_flow = mean_absolute_error(full_data["flow_vol_permeate"], model_results["flow_vol_permeate"])
rmse_flow = np.sqrt(mean_squared_error(full_data["flow_vol_permeate"], model_results["flow_vol_permeate"]))

# TDS statistics
r2_tds = r2_score(full_data["mass_frac_TDS_permeate"], model_results["mass_frac_TDS_permeate"])
mae_tds = mean_absolute_error(full_data["mass_frac_TDS_permeate"], model_results["mass_frac_TDS_permeate"])

print("="*50)
print("MODEL FIT STATISTICS")
print("="*50)
print("\nPermeate Flow Rate:")
print(f"  R²:   {r2_flow:.4f}")
print(f"  MAE:  {mae_flow:.4f} GPM")
print(f"  RMSE: {rmse_flow:.4f} GPM")
print("\nPermeate TDS:")
print(f"  R²:   {r2_tds:.4f}")
print(f"  MAE:  {mae_tds*1e6:.2f} ppm")
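
To show what these statistics measure, the sketch below re-implements MAE, RMSE, and R² in plain Python using their standard definitions, which are the ones the scikit-learn functions follow by default. The measured values are the first four permeate flows from the training-data preview; the constant prediction of 4.700 GPM is hypothetical and chosen only to make a point about R².

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


measured = [4.687, 4.716, 4.732, 4.718]   # GPM, from the Step 1.2 preview
predicted = [4.700, 4.700, 4.700, 4.700]  # GPM, hypothetical constant prediction

print(f"MAE:  {mae(measured, predicted):.5f} GPM")
print(f"RMSE: {rmse(measured, predicted):.5f} GPM")
print(f"R2:   {r2(measured, predicted):.3f}")
```

Here the mean absolute error is below half a percent of the measured flow, yet R² is negative, because R² compares the model against simply predicting the mean of the data. When the measurements barely vary, as in a pilot plant at steady operation, R² can look poor even for a close fit, so it is worth reading it alongside MAE and RMSE.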

---

## Part 4: Hands-On Exercise

### 🎯 Your Challenge

Use a **different subset of the data** to estimate parameters and compare results.

**Questions to explore:**
1. How do the estimated parameters change with different training data?
2. Does the model still fit well on data it wasn't trained on?
3. What does this tell you about the robustness of your parameter estimates?

In [None]:
# Here's a different subset of the data
# (rows chosen so they do not overlap the original training rows, indices 31 to 40)
data2 = full_data.iloc[0:10].copy()
print(f"New training data: indices {data2.index[0]} to {data2.index[-1]}")
display(data2)

### Exercise 4.1: Create a New Estimator

Create a parmest Estimator using `data2` instead of `data`.

<details>
<summary>💡 Click for hint</summary>

```python
pest2 = parmest.Estimator(ro_parmest, data2, theta_names, SSE, tee=False)
```
</details>

In [None]:
# YOUR CODE HERE
# pest2 = ...


### Exercise 4.2: Solve and Display Results

Solve the parameter estimation problem and print the optimal parameters.

<details>
<summary>💡 Click for hint</summary>

```python
obj2, theta2 = pest2.theta_est()
print(f"A coefficient: {theta2.iloc[0]:.6e}")
print(f"B coefficient: {theta2.iloc[1]:.6e}")
```
</details>

In [None]:
# YOUR CODE HERE
# obj2, theta2 = ...


### Exercise 4.3: Generate Predictions and Compare

Use `theta2` to generate predictions and compare both models visually.

<details>
<summary>💡 Click for hint</summary>

```python
m2 = ro_opt(theta2)
model_results2 = full_data.copy()
model_results2 = save_model_results(model_results2, m2)
```
</details>

In [None]:
# YOUR CODE HERE


### Exercise 4.4: Create Comparison Plot

Plot both models against the data to see how they compare.

<details>
<summary>💡 Click for hint</summary>

```python
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(full_data.index, full_data["flow_vol_permeate"], label="Data", color="red", s=30)
ax.plot(model_results.index, model_results["flow_vol_permeate"], label="Model 1", linewidth=2)
ax.plot(model_results2.index, model_results2["flow_vol_permeate"], label="Model 2", linewidth=2, linestyle="--")
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
```
</details>

In [None]:
# YOUR CODE HERE


---

## Discussion Questions

1. **Parameter Sensitivity**: How much did the A and B coefficients change between the two training datasets? What might cause these differences?

2. **Generalization**: Does the model trained on one subset still predict well on other time periods? Why or why not?

3. **Practical Applications**: How might you use parameter estimation in a real plant setting? (Hint: think about membrane aging, fouling detection)

4. **Model Limitations**: What limitations does this simple 0D RO model have? What additional phenomena might need to be included for better accuracy?
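
For the first question, a percent change is a simple way to put a number on how far the two estimates sit from each other. The sketch below is one possible helper. The values in `theta_run1` and `theta_run2` are placeholders (the first set reuses the tutorial's initial guesses) and should be replaced with your own `theta` and `theta2` results.

```python
def percent_change(reference, new):
    """Signed change of `new` relative to `reference`, in percent."""
    return 100.0 * (new - reference) / reference


# Placeholder values only; substitute the estimates from your two runs
theta_run1 = {"A": 5.0e-12, "B": 4.0e-8}
theta_run2 = {"A": 5.5e-12, "B": 3.6e-8}

for name in theta_run1:
    change = percent_change(theta_run1[name], theta_run2[name])
    print(f"{name}: {change:+.1f} %")
```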

---

## Additional Resources

- [Pyomo parmest Documentation](https://pyomo.readthedocs.io/en/6.7.0/contributed_packages/parmest/index.html)
- [WaterTAP RO Model Documentation](https://watertap.readthedocs.io/)
- [IDAES Process Modeling Framework](https://idaes-pse.readthedocs.io/)