# Surrogate modeling for corrosion constraints in a WaterTAP MVC model
This tutorial demonstrates how to build surrogate models for general and localized corrosion design constraints and how to integrate them into a WaterTAP model of a mechanical vapor compression (MVC) system treating seawater <sup id="cite_ref-1"><a href="#cite_note-1">1</a></sup>.

## Motivation
MVC is typically used to treat high-salinity feeds. In MVC systems, the evaporator capital cost accounts for approximately 30% of the levelized cost of water (LCOW) and is highly dependent on the material used. The evaporator requires a material that can resist corrosion at high salinities and temperatures, but more corrosion-resistant materials are more expensive. The goal is therefore to select the most cost-effective material that still provides sufficient corrosion resistance. Because the underlying physics of corrosion are highly complex, we use surrogate models to predict general and localized corrosion in the MVC evaporator and inform the cost-optimal design and operation.

## Workflow
This notebook breaks down the three primary steps for developing surrogate models for corrosion and incorporating them in a WaterTAP model: 
1. Data generation of corrosion metrics
2. Fitting corrosion surrogate models
3. Integrating corrosion design constraints

<p align="center">
<img src="corrosion_demo_figures\methods_figure.png" alt="methods" width="624"/>
</p>

<hr />
<ol>
<li id="cite_note-1"> Carson I. Tucker, Oluwamayowa O. Amusat, Adam A. Atia, Alexander V. Dudchenko, Timothy V. Bartholomew, and Meagan S. Mauter. “Incorporating Corrosion Design Constraints in Desalination Process Optimization: A Case Study in Mechanical Vapor Compression.” Desalination 621 (March 2026): 119698. <a href="https://doi.org/10.1016/j.desal.2025.119698">https://doi.org/10.1016/j.desal.2025.119698</a> <a href="#cite_ref-1">↩</a></li>
</ol>

## 1. Data generation of corrosion metrics
For a material to be considered corrosion resistant, we assume the following two constraints must be satisfied:
1. Corrosion rate is less than a maximum corrosion rate of 0.1 mm/yr (sufficiently low general corrosion): $$CR \leq CR_{max}$$
2. The repassivation potential is greater than the corrosion potential (conservative prediction of no localized corrosion): $$V_{rp}-V_{c} \geq 0$$
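
As a minimal sketch of how the two criteria combine, the check below treats a material as corrosion resistant only when both hold. The function name, argument names, and sample values are illustrative assumptions and are not part of the tutorial's code.

```python
CR_MAX_MM_PER_YR = 0.1  # maximum allowable general corrosion rate stated above


def is_corrosion_resistant(cr_mm_per_yr, v_rp, v_c, cr_max=CR_MAX_MM_PER_YR):
    """Return True when both corrosion constraints are satisfied."""
    general_ok = cr_mm_per_yr <= cr_max   # CR <= CR_max
    localized_ok = (v_rp - v_c) >= 0.0    # V_rp - V_c >= 0
    return general_ok and localized_ok


# Hypothetical values chosen only to exercise both outcomes
print(is_corrosion_resistant(0.02, v_rp=0.30, v_c=0.10))  # both constraints hold
print(is_corrosion_resistant(0.02, v_rp=0.05, v_c=0.10))  # localized corrosion predicted
```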

We used [OLI Systems' Corrosion Analyzer](https://www.olisystems.com/software/oli-studio/oli-studio-corrosion-analyzer/) to predict corrosion rate $CR$, repassivation potential $V_{rp}$, and corrosion potential $V_{c}$ data under different operating conditions relevant to the evaporator in MVC. 

We considered the following ranges of inputs:
* Temperature: 25-95 C (15 samples)
* pH: 4-8 (9 samples)
* Recovery: 0-80% (17 samples)
* Dissolved oxygen: 0-8 mg/L (9 samples)
* Materials: carbon steel 1018 → stainless steel 304 → stainless steel 316 → duplex stainless steel 2205 → duplex stainless steel 2507 → nickel alloy 825 → nickel alloy 625 (7 materials listed in order of increasing cost)

We chose these ranges based on the MVC application and on initial exploration:

<p align="center">
<img src="corrosion_demo_figures\material_selection.png" alt="material_selection" width="624"/>
</p>

This script for data generation via [OLI's Cloud API](https://www.olisystems.com/software/oli-cloud-apis/) is included but requires an OLI license to run. 

<img src="corrosion_demo_figures\oli_api_call.png" alt="oli_api" width="624"/>

### 1.1 Data for the tutorial
For the purpose of this demo, we will use data that has been generated as a function of temperature and dissolved oxygen for the repassivation-corrosion potential difference to predict localized corrosion. 
$$V_{rp}-V_{c} = f(T, DO)$$

<p align="center">
<img src="corrosion_demo_figures\synthetic_material_behavior.png" alt="synthetic_material_behavior" width="624"/>
</p>

The code used to generate and plot the synthetic data can be found in the [`week7/synthetic_corrosion_data`](corrosion_demo_figures/) folder.

## 2. Fitting corrosion surrogate models
We use [PySMO](https://idaes-pse.readthedocs.io/en/stable/explanations/modeling_extensions/surrogate/api/pysmo/index.html) to fit [radial basis functions](https://idaes-pse.readthedocs.io/en/stable/explanations/modeling_extensions/surrogate/api/pysmo/pysmo_radialbasisfunctions.html) (RBFs) with a cubic basis for the corrosion rate and the repassivation-corrosion potential difference of each material. For our application, RBFs work well for representing highly nonlinear relationships and modeling diverse operating regimes.

We use an adaptive sampling approach to improve the surrogate accuracy while reducing the number of training points, which reduces the size of the RBF.
In this approach, we select an initial training set and fit the surrogate model. We then add samples to improve the model fit. The supplementary material describes the adaptive sampling procedure.

Here, we only look at how to fit the localized corrosion surrogate model to the generated data using the initial training set:
$$V_{rp}-V_{c} = f(T, DO)$$
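
To illustrate what a cubic RBF fit involves, the sketch below builds an interpolant of the form $\sum_j w_j \lVert x - x_j \rVert^3$ and solves a linear system for the weights $w_j$. It is a plain-Python illustration under simplifying assumptions (inputs already scaled to [0, 1], no regularization), not PySMO's implementation, and the training values are made up.

```python
import math


def cubic_kernel(r):
    """Cubic radial basis function: phi(r) = r**3."""
    return r ** 3


def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        # Pivot on the largest entry; the cubic kernel has zeros on the diagonal
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution
    w = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * w[k] for k in range(row + 1, n))
        w[row] = (aug[row][n] - known) / aug[row][row]
    return w


def fit_rbf(points, values):
    """Weights that make the interpolant pass through every training point."""
    phi = [[cubic_kernel(math.dist(p, q)) for q in points] for p in points]
    return solve_linear_system(phi, values)


def predict_rbf(points, weights, x):
    """Evaluate the weighted sum of cubic kernels centered on the training points."""
    return sum(w * cubic_kernel(math.dist(x, p)) for w, p in zip(weights, points))


# Hypothetical training data: (temperature, DO) scaled to [0, 1], made-up outputs in V
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
train_y = [0.60, 0.30, 0.10, -0.05, 0.20]

weights = fit_rbf(train_x, train_y)
print(predict_rbf(train_x, weights, (0.25, 0.75)))
```

Because the number of weights equals the number of training points, a smaller training set gives a smaller surrogate expression to embed in the optimization model.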

### 2.1 Load the data

In [1]:
from pathlib import Path
import pandas as pd
import fit_surrogates

# load the localized corrosion data
current_directory = Path.cwd()
survey_path = (
    current_directory.parent
    / "week7/synthetic_corrosion_data/synthetic_potential_difference.csv"
)
data = pd.read_csv(survey_path)
print(data.head())
print(f'Number of data points: {len(data)}')

   temperature_C  do_mg_L  synthetic_potential_difference_V
0             25      0.0                          0.595005
1             25      0.5                          0.352455
2             25      1.0                          0.104955
3             25      1.5                          0.101786
4             25      2.0                          0.098571
Number of data points: 255


### 2.2 Assign inputs and output
Our inputs are temperature and dissolved oxygen. The output is the potential difference. We printed the column names above. 

In [2]:
inputs = ["temperature_C", "do_mg_L"]
output = "synthetic_potential_difference_V"
# get the input bounds
input_bounds = {
    name: (data[name].min(), data[name].max())
    for name in inputs
}

### 2.3 Select training set
We first choose an initial training set size and use [Hammersley sampling](https://idaes-pse.readthedocs.io/en/1.5.1/surrogate/pysmo/pysmo_hammersley.html), a space-filling approach, to select the initial training set from the data.
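
To make the idea concrete, here is a hedged sketch of a two-dimensional Hammersley point set followed by a nearest-neighbor selection from an existing grid of data. The helper names are hypothetical and the selection rule is an assumption for illustration; PySMO's `HammersleySampling` may differ in details such as scaling and duplicate handling.

```python
def radical_inverse(i, base=2):
    """Mirror the base-`base` digits of the integer i about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while i > 0:
        result += (i % base) * fraction
        i //= base
        fraction /= base
    return result


def hammersley_2d(n):
    """n points in the unit square: (i / n, radical_inverse(i))."""
    return [(i / n, radical_inverse(i)) for i in range(n)]


def select_nearest(candidates, targets):
    """For each target, pick the closest candidate that has not been chosen yet."""
    remaining = list(candidates)
    chosen = []
    for tx, ty in targets:
        best = min(remaining, key=lambda c: (c[0] - tx) ** 2 + (c[1] - ty) ** 2)
        remaining.remove(best)
        chosen.append(best)
    return chosen


# Hypothetical 5 x 5 grid of (temperature, DO) values scaled to [0, 1]
grid = [(t / 4, d / 4) for t in range(5) for d in range(5)]
print(hammersley_2d(4))
print(select_nearest(grid, hammersley_2d(4)))
```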

In [3]:
from idaes.core.surrogate.pysmo.sampling import (
    HammersleySampling,
)

n = 40 # number of training samples
training_data = HammersleySampling(data, n, sampling_type="selection").sample_points()
print(training_data.head())
print(f'Number of training points: {len(training_data)}')

Sampling type:  selection 

No column information provided. All except last column will be considered as x variables.

Number of unique samples returned by sampling algorithm: 40
   temperature_C  do_mg_L  synthetic_potential_difference_V
0           25.0      0.0                          0.595005
1           25.0      4.0                          0.085714
2           30.0      1.0                          0.084707
3           30.0      2.0                          0.078643
4           30.0      6.0                          0.054214
Number of training points: 40


### 2.4 Fit the surrogate model
We now use PySMO to fit the surrogate model to the training data.

In [4]:
%%capture
from idaes.core.surrogate.pysmo_surrogate import (
    PysmoRBFTrainer,
    PysmoSurrogate,
)

trainer = PysmoRBFTrainer(
    input_labels=inputs,
    output_labels=[output],
    training_dataframe=training_data,
)
# select the basis function for the RBF
basis = 'cubic'
trainer.config.basis_function = basis
# fit the surrogate
rbf_train = trainer.train_surrogate()
# get surrogate object
rbf_surr = PysmoSurrogate(rbf_train, inputs, [output], input_bounds)
# save the surrogate to use later
folder = current_directory.parent / "week7/synthetic_corrosion_data"
surrogate_file = folder / f"{output}_pysmo_rbf_{basis}_surrogate_demo.json"
rbf_surr.save_to_file(surrogate_file, overwrite=True)

2026-01-21 21:29:32 [INFO] idaes.core.surrogate.pysmo_surrogate: Model for output synthetic_potential_difference_V trained successfully


### 2.5 Assess the error metrics
For this surrogate, we want the maximum absolute error to be low, indicating that the surrogate aligns with the generated data. While minimizing the overall error is important, we also care about ensuring that we have a high accuracy in predicting whether localized corrosion occurs. Therefore, we also define a classification accuracy and [balanced accuracy](https://en.wikipedia.org/wiki/Precision_and_recall#Imbalanced_data).

In [5]:
from idaes.core.surrogate.metrics import compute_fit_metrics

# built in function to compute RMSE, MSE, MAE, maxAE, SSE, R2
err = compute_fit_metrics(rbf_surr, data)

# calculate true positives, true negatives, false positives, and false negatives to calculate accuracies
# a "positive" is a point with V_rp - V_c >= 0, i.e., no localized corrosion (same threshold as the design constraint)
data_true = data.copy()
data_true["lc"] = data_true[output] >= 0 # True where the data indicate no localized corrosion
predicted_data = rbf_surr.evaluate_surrogate(data_true) # get the surrogate predictions
predicted_data["lc"] = predicted_data[output] >= 0 # True where the surrogate predicts no localized corrosion
tp = ((data_true["lc"] == True) & (predicted_data["lc"] == True)).sum() # number of true positives
tn = ((data_true["lc"] == False) & (predicted_data["lc"] == False)).sum() # number of true negatives
fp = ((data_true["lc"] == False) & (predicted_data["lc"] == True)).sum() # number of false positives
fn = ((data_true["lc"] == True) & (predicted_data["lc"] == False)).sum() # number of false negatives
classification_accuracy = (tp + tn)/len(data_true)*100 # percentage of correctly classified points
balanced_accuracy = 0.5 * (tp / (tp + fn) + tn / (tn + fp)) * 100 # balanced accuracy accounts for data imbalance

# print errors
print(f"R2: {round(err[output]['R2'], 4)}")
print(f"MSE: {round(err[output]['MSE'], 4)}")
print(f"maxAE: {round(err[output]['maxAE'], 4)}")
print(f"Classification accuracy: {round(classification_accuracy, 4)}")
print(f"Balanced accuracy: {round(balanced_accuracy, 4)}")

R2: 0.9737
MSE: 0.0004
maxAE: 0.1303
Classification accuracy: 98.8235
Balanced accuracy: 99.0506
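
The difference between the two accuracy measures is easiest to see on an imbalanced example. The sketch below uses made-up labels (not the tutorial's data) and a hypothetical helper, `accuracies`, that applies the same formulas as the cell above.

```python
def accuracies(y_true, y_pred):
    """Return (classification accuracy, balanced accuracy) in percent.

    Assumes both classes appear in y_true; otherwise a rate is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)                # true positives
    tn = sum((not t) and (not p) for t, p in pairs)    # true negatives
    fp = sum((not t) and p for t, p in pairs)          # false positives
    fn = sum(t and (not p) for t, p in pairs)          # false negatives
    accuracy = 100 * (tp + tn) / len(pairs)
    balanced = 0.5 * (tp / (tp + fn) + tn / (tn + fp)) * 100
    return accuracy, balanced


# Hypothetical imbalanced labels: 9 points in one class, 1 in the other
y_true = [True] * 9 + [False]
y_pred = [True] * 10  # a model that always predicts the majority class
print(accuracies(y_true, y_pred))  # (90.0, 50.0)
```

The always-positive model looks good on classification accuracy but scores only 50% on balanced accuracy, which is why both are reported.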


### 2.6 Try it yourself - improve the surrogate
We want the maximum absolute error (maxAE) to be less than 0.15 V (SHE) and the balanced accuracy greater than 99%. What can you change to improve the error metrics of the surrogate? Try it and see how the error metrics change. 

<details>
<summary>Click the arrow for hint</summary>

*Increase the number of training points used*

</details>

### 2.7 Adaptive sampling
Please see the adaptive sampling supplementary material for another approach to improving the maximum absolute error and accuracies while reducing the number of training samples. Additionally, a surrogate trained on too many samples may overfit the points, so it is also important to plot the surrogate and verify the trends between points.
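
For intuition only, one possible adaptive sampling loop is sketched below: fit, find the worst-predicted point in the data pool, add it to the training set, and repeat until the maximum absolute error is below a tolerance. This is an assumed scheme with a piecewise-linear stand-in surrogate and a made-up one-dimensional data set; it is not the procedure in the supplementary material.

```python
def interpolate(train, x):
    """Piecewise-linear stand-in surrogate through sorted (x, y) training points."""
    for (x0, y0), (x1, y1) in zip(train, train[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the training range")


def adaptive_sample(pool, tol):
    """Start from the end points; add the worst-predicted pool point until maxAE < tol."""
    train = [pool[0], pool[-1]]
    while True:
        errors = [abs(interpolate(train, x) - y) for x, y in pool]
        max_err = max(errors)
        if max_err < tol:
            return train, max_err
        worst = pool[errors.index(max_err)]
        train = sorted(train + [worst])  # "refit" by adding the new point


# Hypothetical 1-D data set: y = x**2 sampled at 21 evenly spaced points
pool = [(i / 20, (i / 20) ** 2) for i in range(21)]
train, max_err = adaptive_sample(pool, tol=0.02)
print(len(train), "of", len(pool), "points used; maxAE =", round(max_err, 4))
```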

## 3. Integrating corrosion design constraints
We will first build and solve the [MVC flowsheet](https://github.com/watertap-org/watertap/tree/main/watertap/flowsheets/mvc) without corrosion and then add the localized corrosion surrogate. In this demo, we assume the generated data approximates **Duplex Stainless Steel 2205**, the material selected in the code below. We will compare the results without and with corrosion design constraints for seawater at 70 g/kg and 58% recovery.

### 3.1 Import modules and MVC flowsheet

In [2]:
import mvc_corrosion as mvc
from watertap.core.solvers import get_solver
from pyomo.environ import (
    Objective,
    Var,
    Constraint,
    assert_optimal_termination,
    value,
    units as pyunits,
)
from pyomo.util.calc_var_value import calculate_variable_from_constraint
from idaes.core.util.model_statistics import degrees_of_freedom
import idaes.core.util.scaling as iscale
from idaes.core.surrogate.pysmo_surrogate import PysmoSurrogate
from idaes.core.surrogate.surrogate_block import SurrogateBlock

### 3.2 Build and solve MVC flowsheet without corrosion
First, build and solve the MVC flowsheet to obtain an initial solution. The `build_system_demo` helper function scales the model, sets the operating conditions, initializes the flowsheet, and sets it up to minimize the LCOW.

In [3]:
mat = "Duplex stainless 2205"  # evaporator material
rr = 0.58  # recovery (fraction)
wf = 70  # feed salinity (g/kg)
m = mvc.build_system_demo(material=mat, recovery=rr, feed_salinity=wf)

DOF after setting operating conditions:  0
2026-01-21 21:40:01 [INFO] idaes.init.fs.separator_feed.mixed_state: fs.separator_feed.mixed_state State Released.
Initialization termination condition:  optimal
Scaled costs
First solve termination condition:  optimal
DOF for optimization:  4


### 3.3 Minimize the levelized cost of water
In the `build_system_demo` helper function, we use another helper function `set_up_optimization`, which sets the LCOW as the objective and unfixes the evaporator temperature, evaporator area, heat exchanger areas, and compressor pressure ratio. The `display_demo` helper function displays metrics relevant for this tutorial. 

The evaporator material factor is a multiplier on the capital cost of the evaporator based on the cost of the material used. The evaporator material factor is initially assigned a value that increases linearly with the brine salinity. We will change this later to reflect duplex stainless steel 2205. 

In [8]:
# Verify the objective is the LCOW
m.fs.objective.pprint()
# Since we already set up for optimization, we can now solve
results = mvc.solve(m)
mvc.display_demo(m)

objective : Size=1, Index=None, Active=True
    Key  : Active : Sense    : Expression
    None :   True : minimize : fs.costing.LCOW
Levelized cost of water:                  4.88 $/m3
Evaporator (brine, vapor) temperature:    75.00 C
Evaporator material factor:               6.51 


### 3.4 Try it yourself - add corrosion surrogate models
We break this into three steps:
1. Add dimensionless input variables for corrosion surrogates: temperature and dissolved oxygen.
2. Add dimensionless output variables: repassivation-corrosion potential difference (potential_difference).
3. Load fitted surrogates
   
#### 3.4.1 Add input variables
We create variables for the inputs to the surrogate. These variables have to be indexed and dimensionless for the PySMO surrogate. We then need to connect the input variables created to the corresponding variable in the model. 

In [9]:
# Temperature surrogate input
m.fs.temperature_indexed = Var(
        [0],
        initialize=m.fs.evaporator.properties_brine[0].temperature.value,
        units=pyunits.dimensionless,
)

# Add the constraint to connect the surrogate input to the evaporator temperature variable
# Note that the surrogate expects an input in C but the model variable is in K
m.fs.eq_temperature_indexed = Constraint(
        expr=m.fs.evaporator.properties_brine[0].temperature
        == m.fs.temperature_indexed[0] + 273.15
)

#### Try it yourself
We have created the input variable for the dissolved oxygen. Now add the constraint connecting the surrogate input to the dissolved oxygen variable. 

Hint: the dissolved oxygen variable is: `m.fs.dissolved_oxygen`

<details>
<summary>Click for solution</summary>
    
```
# Add a constraint to connect the surrogate input variable created to the dissolved oxygen variable
m.fs.eq_do_indexed = Constraint(
    expr=m.fs.dissolved_oxygen == m.fs.dissolved_oxygen_index[0]
)
```
</details>

In [10]:
# Dissolved oxygen surrogate input
m.fs.dissolved_oxygen_index = Var(
        [0], initialize=0, units=pyunits.dimensionless, bounds=(0, 8)
)

# Add a constraint to connect the surrogate input variable created to the dissolved oxygen variable
# YOUR CODE

#### 3.4.2 Add the output variable
We also need to create the output variable for the PySMO surrogate: the repassivation-corrosion potential difference. The model did not previously include this potential difference, so we create that variable as well. Finally, we write a constraint connecting the two.

In [11]:
# Create the surrogate output variable: m.fs.potential_difference_indexed
m.fs.potential_difference_indexed = Var(
    [0], initialize=0.0, units=pyunits.dimensionless
)
iscale.set_scaling_factor(m.fs.potential_difference_indexed[0], 1e3)

m.fs.potential_difference = Var(
    initialize=0, units=pyunits.V
)
iscale.set_scaling_factor(m.fs.potential_difference, 1e3)

m.fs.eq_potential_difference_indexed = Constraint(
    expr=m.fs.potential_difference == m.fs.potential_difference_indexed[0]
)

#### Try it yourself
To prevent localized corrosion, we need $V_{rp}-V_{c} \geq 0$, so set the lower bound on `m.fs.potential_difference` to 0.

<details>
<summary>Click for solution</summary>

```
m.fs.potential_difference.setlb(0)
```
</details>

In [12]:
# set the bound to prevent localized corrosion
# YOUR CODE

#### 3.4.3 Load the surrogate
Note that the surrogate used was fit using adaptive sampling. You can later change the filename to "synthetic_potential_difference_V_pysmo_rbf_cubic_surrogate_demo.json" to see how the surrogate you fit performs. 

#### Try it yourself
Fill in the code cell below to load the surrogate model, add a surrogate block, and build the surrogate model constraint on the surrogate block.
<details>
<summary>Click for solution</summary>

```
# load the PySMO surrogate object from the file
potential_difference_surrogate = PysmoSurrogate.load_from_file(filename)
# Create SurrogateBlock - where the surrogate model lives in the flowsheet
m.fs.potential_difference_surrogate = SurrogateBlock(concrete=True)
# Build the localized corrosion surrogate on the SurrogateBlock
m.fs.potential_difference_surrogate.build_model(
    potential_difference_surrogate, # the surrogate we loaded
    input_vars=[m.fs.temperature_indexed[0], m.fs.dissolved_oxygen_index[0]], # input variables we created - input order is the same as when fitting
    output_vars=[m.fs.potential_difference_indexed[0]], # output variable we created
)
```
</details>

In [1]:
# where the surrogate is stored
# (the folder is redefined here so this cell also runs if Section 2 was skipped)
from pathlib import Path

folder = Path.cwd().parent / "week7/synthetic_corrosion_data"
filename = folder / "synthetic_potential_difference_V_pysmo_rbf_cubic_surrogate.json"

# load the PySMO surrogate object from the file
# YOUR CODE

# Create SurrogateBlock - where the surrogate model lives in the flowsheet
# YOUR CODE

# Build the localized corrosion surrogate on the SurrogateBlock
# Fill in the function below
m.fs.potential_difference_surrogate.build_model(
    # the surrogate we loaded, 
    # input_vars - input order is the same as when fitting,
    # output_vars,
)

### 3.5 Initialize the surrogate constraint
Using `calculate_variable_from_constraint`, we can solve for the current value of the repassivation-corrosion potential difference. 

In [15]:
# initialize potential difference
calculate_variable_from_constraint(
    m.fs.potential_difference_indexed[0], # variable to solve for
    m.fs.potential_difference_surrogate.pysmo_constraint["synthetic_potential_difference_V"], # surrogate constraint that calculates the variable
)

### 3.6 Fix the corrosion conditions
We fix the level of dissolved oxygen.

In [16]:
do = 0.5  # assume almost all dissolved oxygen has been removed
m.fs.dissolved_oxygen.fix(do)

### 3.7 Update material factor based on material selected

In [17]:
material_factor = {
    "Carbon steel 1018": 1,
    "Stainless steel 304": 3.0,
    "Stainless steel 316": 3.2,
    "Duplex stainless 2205": 3.5,
    "Duplex stainless 2507": 4.0,
    "Nickel alloy 825": 5.0,
    "Nickel alloy 625": 6.0,
}
m.fs.costing.evaporator.material_factor_cost.fix(material_factor[m.fs.material.value])
print(f"Evaporator material: {m.fs.material.value}")
print(
    f"Evaporator material factor: {m.fs.costing.evaporator.material_factor_cost.value}"
)

Evaporator material: Duplex stainless 2205
Evaporator material factor: 3.5
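
The material factors above can also support a simple screening step: among the materials that pass the corrosion checks, pick the one with the lowest factor. The sketch below assumes a hypothetical helper, `cheapest_resistant_material`, and made-up pass/fail flags that are not results from this tutorial.

```python
material_factor = {
    "Carbon steel 1018": 1,
    "Stainless steel 304": 3.0,
    "Stainless steel 316": 3.2,
    "Duplex stainless 2205": 3.5,
    "Duplex stainless 2507": 4.0,
    "Nickel alloy 825": 5.0,
    "Nickel alloy 625": 6.0,
}


def cheapest_resistant_material(material_factor, is_resistant):
    """Return the lowest-factor material flagged as resistant, or None if none pass."""
    candidates = [name for name in material_factor if is_resistant.get(name, False)]
    return min(candidates, key=material_factor.get) if candidates else None


# Hypothetical screening outcome at one operating point (placeholder flags)
is_resistant = {
    "Carbon steel 1018": False,
    "Stainless steel 304": False,
    "Stainless steel 316": False,
    "Duplex stainless 2205": True,
    "Duplex stainless 2507": True,
    "Nickel alloy 825": True,
    "Nickel alloy 625": True,
}
print(cheapest_resistant_material(material_factor, is_resistant))
```

Ranking by material factor alone ignores how the optimal operating conditions change with the material, which is why the tutorial re-solves the flowsheet instead of relying on a lookup like this.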


### 3.8 Update evaporator temperature bounds
When we were not considering corrosion, the evaporator temperature upper bound was 75 C. We can now solve for conditions where localized corrosion does not occur, so we can increase the evaporator temperature bound to 95 C.

### Try it yourself
Increase the evaporator temperature upper bound to 95 C. Note the evaporator temperature is in K.
<details>
<summary>Click for solution</summary>

```
m.fs.evaporator.properties_brine[0].temperature.setub(95 + 273.15)
```
</details>

In [18]:
# Increase the evaporator temperature upper bound to 95 C
# YOUR CODE

### 3.9 Solve model with corrosion constraints
We now solve for a feed concentration of 70 g/kg and a recovery of 58%.

In [20]:
results = mvc.solve(m)
print("Termination condition: ", results.solver.termination_condition)
mvc.display_demo(m)

Termination condition:  optimal
Levelized cost of water:                  3.75 $/m3
Evaporator (brine, vapor) temperature:    78.67 C
Evaporator material factor:               3.50 
Dissolved oxygen:                         0.50 mg/L
Potential difference:                     0.0000 V


### 3.10 Increase the level of dissolved oxygen from 0.5 to 8 mg/L
The result will be infeasible because duplex stainless steel 2205 has poor resistance to localized corrosion at high levels of dissolved oxygen.

### Try it yourself
Increase the dissolved oxygen to 8 mg/L, solve the model, and print the termination condition. 

<details>
<summary>Click for solution</summary>

```
# Set dissolved oxygen to 8 mg/L
m.fs.dissolved_oxygen.fix(8)
# Solve the mvc flowsheet
results = mvc.solve(m)
# Print the termination condition
print("Termination condition: ", results.solver.termination_condition)
```
</details>

In [22]:
# Set dissolved oxygen to 8 mg/L
# YOUR CODE

# Solve the mvc flowsheet
results = #YOUR CODE

# Print the termination condition
# YOUR CODE

model.name="unknown";
    - termination condition: infeasible
    - message from solver: Ipopt 3.13.2\x3a Converged to a locally infeasible
      point. Problem may be infeasible.
The current configuration is infeasible. Please adjust the decision variables.
Termination condition:  infeasible


### 3.11 How do we know that this is due to the localized corrosion bound?
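Remove the bound on the potential difference and re-solve. The repassivation-corrosion potential difference at the returned point is negative, indicating that localized corrosion occurs. Note that the recorded solve below still terminates as infeasible, so the displayed values are the solver's final point rather than a converged optimum.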

In [24]:
m.fs.potential_difference.setlb(None)
results = mvc.solve(m)
print("Termination condition: ", results.solver.termination_condition)
mvc.display_demo(m)

model.name="unknown";
    - termination condition: infeasible
    - message from solver: Ipopt 3.13.2\x3a Converged to a locally infeasible
      point. Problem may be infeasible.
The current configuration is infeasible. Please adjust the decision variables.
Termination condition:  infeasible
Levelized cost of water:                  3.93 $/m3
Evaporator (brine, vapor) temperature:    93.01 C
Evaporator material factor:               3.50 
Dissolved oxygen:                         8.00 mg/L
Potential difference:                     -0.1853 V


### 3.12 Results without vs. with corrosion
When corrosion is not accounted for, the result is the same at every level of dissolved oxygen. At low levels of dissolved oxygen, that result overestimates the cost. At high levels of dissolved oxygen, the cost is similar, but we do not know whether the operating conditions and material selection are corrosion resistant.
<p align="center">
<img src="corrosion_demo_figures\synthetic_lcow_comparison.png" alt="lcow_comparison" width="312"/>
</p>

### 3.13 Sensitivity to dissolved oxygen
Using the [parameter sweep tool](https://watertap.readthedocs.io/en/stable/how_to_guides/how_to_use_parameter_sweep.html#how-to-explore-a-model-with-parameter-sweep), we solve for the cost-optimal design and operation for each material at different levels of dissolved oxygen for the case of 70 g/kg feed and 58% recovery. Going from 8 mg/L to 0.5 mg/L reduces the LCOW by 1.16 $/m3. See the supplementary material for the parameter sweep code used to generate this plot.

<p align="center">
<img src="corrosion_demo_figures\do_sensitivity.png" alt="do_sensitivity" width="312"/>
</p>

## 4. Conclusions
We have fit a surrogate to represent localized corrosion, and we are able to account for corrosion design constraints in selecting materials for the evaporator in MVC. For other applications, other input variables and factors affecting material selection can be incorporated. 

Surrogate model integration is effective when:
* your mechanistic model is too complex to represent in your process optimization model.
* you have access to data or a simulator that can be used to train a surrogate model for your specific case. 