# Optimization of reactor output 

This script uses real weather data between 2013 and 2020 to simulate plant output as a function of the number of wind turbines, the solar panel area, the battery size, and three control parameters. Multiple simulations, structured as a design of experiments (DOE), are run for different scenarios, and a regression model is fitted to predict profit as a function of these factors. Finally, this model is optimized to find the most profitable configuration for the plant.

To start, import the necessary libraries and define a couple of necessary constants.

In [1]:
import numpy as np
import plant_components as pc
from plant_components import pph
import weather_energy_components as wec
import time
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as sm
import re
import gurobipy as gp
from gurobipy import GRB
import warnings

warnings.filterwarnings("ignore")

# Number of years the model spans
years = 8

# Capex lifetime to calculate value of capex per year
capex_life = 10

## Forecast data

Next, an array containing forecast data is generated. This version assumes a perfect forecast, so the actual weather data is used. (A later step of this project will consider various forecasting accuracies.)
Time points are hourly. For each time point, the energy generation over the next 6 hours is considered, split into two parts: the sum over hours 1-3 ahead and the sum over hours 4-6 ahead. Because power generation depends on the wind turbine and solar panel configuration, the forecast must be generated for each combination of wind turbine count and solar panel area.
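The two-window summation can be illustrated on its own with a plain array. This is only a sketch: `forecast_windows` is a hypothetical helper, the input is synthetic, and the real generation values come from `wec.calc_generated_kw`, which is not reproduced here.

```python
import numpy as np

def forecast_windows(generated_kw, horizon=6, split=3):
    """Return an (n_hours - horizon, 2) array of near- and far-window sums.

    Row h holds the sum of generation over hours h+1..h+split and over
    hours h+split+1..h+horizon.
    """
    generated_kw = np.asarray(generated_kw, dtype=float)
    n = len(generated_kw) - horizon
    # csum[k] is the sum of generated_kw[:k]
    csum = np.concatenate(([0.0], np.cumsum(generated_kw)))
    start = np.arange(n) + 1  # index of the first forecast hour (h+1)
    near = csum[start + split] - csum[start]
    far = csum[start + horizon] - csum[start + split]
    return np.column_stack((near, far))

# Synthetic hourly generation (kW), purely illustrative
demo = np.arange(10.0)
windows = forecast_windows(demo)
print(windows[0])  # sums over hours 1-3 and hours 4-6 ahead of hour 0
```

Using cumulative sums avoids the inner loop over the 6 forecast hours, which may matter when the forecast is rebuilt for every wind turbine and solar panel combination.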

In [2]:
# Pre-calculate forecast data
def prep_forecast(parameters):
    wt_list = parameters["wt_list"]
    sp_list = parameters["sp_list"]
    
    forecast_store = np.zeros([len(wt_list), len(sp_list), wec.data_length-6, 2])
    forecast_arr = np.zeros(6)
    
    for i in range(len(wt_list)):
        wt = wt_list[i]
        wind_turbine_specs = {
            "cut_in": 13, # km/h
            "rated_speed": 50, # km/h
            "cut_out": 100, # km/h
            "max_energy": 1000, # kW
            "count": wt,
            "cost": wt*1.5*10**6 # EUR/MW 
            }
        
        for j in range(len(sp_list)):
            sp = sp_list[j]
            solar_panel_specs = {
                "area": sp, # m^2
                "efficiency": 0.1,
                "cost": sp*200/1.1 # EUR; $200/m2 divided by 1.1 to convert USD to EUR
                }
    
            for hour in range(wec.data_length-6):
                for k in range(6): 
                    future_state = wec.Hourly_state(hour+k+1, solar_panel_specs, wind_turbine_specs)
                    forecast_arr[k] = wec.calc_generated_kw(future_state)
                
                forecast_store[i][j][hour][0] = sum(forecast_arr[0:3])
                forecast_store[i][j][hour][1] = sum(forecast_arr[3:6])
                
    return forecast_store


## DOE

The next two functions run the DOE and store the results. The DOE design is saved in a separate Excel spreadsheet, with levels 0 - 2 for each factor. Actual factor values are stored in the 'parameters' dictionary. This allows the DOE design to be independent of the values we choose for each factor.
Levels from 0 - 2 were chosen because most factors have 3 levels, so that quadratic relationships can be considered. Keeping all factors on the same levels ensures no further scaling is needed, and 0 - 2 conveniently corresponds to the indices of the lists that store the values for each factor.

In [3]:
# Run DOE scenario to generate profit at specified conditions in 'run'
def run_scenario(forecast_store, parameters, run):
    
    # factor levels are between 0 and 2 to match indices
    # wt is a special case because only 2 levels are considered
    wt_index = int(run["wt_level"]) if int(run["wt_level"]) == 0 else 1
    sp_index = int(run["sp_level"]) 
    b_index = int(run["b_level"])
    c1_index = int(run["c1_level"])
    c2_index = int(run["c2_level"])
    c3_index = int(run["c3_level"])
    
    wt_list = parameters["wt_list"]
    sp_list = parameters["sp_list"]
    b_list = parameters["b_list"]
    c1_list = parameters["c1_list"]
    c2_list = parameters["c2_list"]
    c3_list = parameters["c3_list"]
    
    wt = wt_list[wt_index]
    sp = sp_list[sp_index]
    b = b_list[b_index]
    c1 = c1_list[c1_index]
    c2 = c2_list[c2_index]
    c3 = c3_list[c3_index]
    
    battery_specs = { 
        "max_charge": b, # kWh
        "cost": 1000*b
        }

    solar_panel_specs = {
        "area": sp, # m^2
        "efficiency": 0.1,
        "cost": sp*200/1.1 # EUR; $200/m2 divided by 1.1 to convert USD to EUR
        }

    wind_turbine_specs = {
        "cut_in": 13, # km/h
        "rated_speed": 50, # km/h
        "cut_out": 100, # km/h
        "max_energy": 1000, # kW
        "count": wt,
        "cost": wt*1.5*10**6 # EUR/MW 
        }
    
    b_sp_constants = {
        "c1": c1,
        "c2": c2,
        "c3": c3
        }

    # initialize counters
    e_from_grid = 0 # kwh
    e_to_grid = 0 # kwh
    total_renewable = 0 # kwh
    total_sx = 0 # mol
    
    # Initiate necessary variables
    r2_prev = 0 
    reactor1_1 = pc.Reactor1()
    reactor1_1.state = "active"
    reactor2 = pc.Reactor2()
    battery = pc.Battery(0.5*battery_specs["max_charge"], battery_specs)
    energy_flow = wec.Energy_flow()
    energy_tally = pph
    r2_e_prev = 0
    p_renew_tmin1 = 0
    
    # Calculate conditions at each hourly state and store in arrays
    for hour in range(wec.data_length-12):
        state = wec.Hourly_state(hour, solar_panel_specs, wind_turbine_specs)
        
        # Energy flowing to the plant
        p_renew_t_actual = wec.calc_generated_kw(state)
        total_renewable += p_renew_t_actual
        
        forecast = (forecast_store[wt_index][sp_index][hour][0],
                    forecast_store[wt_index][sp_index][hour][1])
        
        # Allow for multiple periods per hour
        for i in range(pph):
            
            # Energy distribution for current period
            energy_tally, r2_e_prev, energy_flow = wec.distribute_energy(p_renew_t_actual,
                                                            p_renew_tmin1,
                                                            energy_tally, 
                                                            r2_e_prev, 
                                                            energy_flow, 
                                                            battery, 
                                                            b_sp_constants,
                                                            reactor2,
                                                            forecast)
                   
            # Update battery charge
            battery.charge += wec.battery_charge_differential(energy_flow.to_battery, battery)
            
            # Calculate reactor 2 state
            r2_sx_current = reactor2.react(energy_flow.to_r2, r2_prev)
            total_sx += r2_sx_current/pph
            
            r2_prev = r2_sx_current
            
            # Add up energy taken from grid
            if energy_flow.from_grid > 0:
                e_from_grid += energy_flow.from_grid/pph
                
            if energy_flow.from_grid < 0:
                e_to_grid -= energy_flow.from_grid/pph
                
        p_renew_tmin1 = p_renew_t_actual
    
    # spread capex cost over 10 years, so cost per year is always /10
    capex = (battery_specs["cost"] + solar_panel_specs["cost"] + wind_turbine_specs["cost"])/capex_life
    
    # Target production is 240 kmol S per year (1.92 million mol over 8 years). So, assume that S above this 
    # value is worth 0. Otherwise, it always becomes advantageous to produce more
    # S, even if we just use grid energy to do it.
    revenue = (9.6*min(total_sx, 240000*years) + 0.1*e_to_grid)/years # divide by 8 years because 8 years of training data, -> revenue / year
    
    # Roughly 15 kW required to make 1 mol S
    opex = 0.25*e_from_grid/years # divide by 8 years because 8 years of training data, -> opex / year
    
    profit = revenue - opex - capex
    
    return profit, revenue, opex, capex, total_sx, e_to_grid, e_from_grid

# This is the CPU intensive function that runs run_scenario function for each 
# run in the DOE
def run_doe(doe, parameters, forecast_store = 0, show_run_status = True ):
    
    doe[["profit", "revenue", "opex", "capex", "total_sx", "e_to_grid", "e_from_grid"]] = np.zeros([len(doe), 7])
    
    if not isinstance(forecast_store, np.ndarray):
        forecast_store = prep_forecast(parameters)
    
    for i in range(len(doe)):
        
        start = time.time()
        
        run = doe.iloc[i]
        
        profit, revenue, opex, capex, total_sx, e_to_grid, e_from_grid = run_scenario(forecast_store, parameters, run)
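        # Assign with .loc on the row label; chained indexing such as
        # doe["profit"][i] may write to a temporary copy instead of doe
        row = doe.index[i]
        doe.loc[row, "profit"] = profit
        doe.loc[row, "revenue"] = revenue
        doe.loc[row, "opex"] = opex
        doe.loc[row, "capex"] = capex
        doe.loc[row, "total_sx"] = total_sx
        doe.loc[row, "e_to_grid"] = e_to_grid
        doe.loc[row, "e_from_grid"] = e_from_grid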
        
        end = time.time()
        
        if show_run_status: 
            print("Run: ", i)
            print("Time elapsed: ", end - start)
        
    return doe, forecast_store

The first pass at the DOE is below. All factors have 3 levels except the wind turbine count (wt). Preliminary research suggests 1MW is one of the smallest standard wind turbine sizes, and that a turbine of this size is already expensive and powerful. The DOE is therefore structured as if wt were a binary variable, with a full factorial plus a Box-Behnken design run at both wt levels. In the later analysis, however, wt is treated as continuous so that the model can signal whether a smaller wind turbine would be optimal (quick research suggests that smaller wind turbines do exist, although their availability is unclear).
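As a rough illustration of this structure, the sketch below builds one possible set of coded runs: a 2-level full factorial on the five 3-level factors, a Box-Behnken block, and a single centre point, repeated at each wind turbine level. The function name, the column order (wt, sp, b, c1, c2, c3), the single centre point, and the coded wt levels 0 and 2 are assumptions; the actual contents of DOE.xlsx may differ.

```python
from itertools import combinations, product

def build_doe(n_factors=5, wt_levels=(0, 2)):
    """Coded runs (levels 0-2): 2-level full factorial + Box-Behnken + centre."""
    # Full factorial on the extreme levels 0 and 2
    factorial = [list(run) for run in product((0, 2), repeat=n_factors)]

    # Box-Behnken: each pair of factors at its extremes, the rest at the centre
    box_behnken = []
    for a, b in combinations(range(n_factors), 2):
        for level_a, level_b in product((0, 2), repeat=2):
            run = [1] * n_factors
            run[a], run[b] = level_a, level_b
            box_behnken.append(run)

    centre = [[1] * n_factors]
    block = factorial + box_behnken + centre

    # Repeat the whole block at each (assumed) coded wind turbine level
    return [[wt] + run for wt in wt_levels for run in block]

runs = build_doe()
print(len(runs))  # 2 * (32 + 40 + 1) = 146
```

Under these assumptions the design has 146 runs, or 73 per wind turbine level, which agrees with the observation counts reported in the regression summaries in this notebook.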

In [4]:
parameters = {
    "wt_list" : [0, 1], # number of 1MW wind turbines
    "sp_list" : [5000, 10000, 15000], # area in m2 of solar panels
    "b_list" : [516, 1144, 2288], # battery sizes in kWh
    "c1_list" : [0, 1, 2], # constants for r2_max eqn
    "c2_list" : [0, 1, 2],
    "c3_list" : [-1, 0, 1]
    }

# DOE input is a 2-level full factorial plus a Box-Behnken to capture curvature
doe = pd.read_excel("DOE.xlsx")

# Run DOE
doe_results, forecast_store = run_doe(doe, parameters, show_run_status = False)

## Analyze the DOE

The first function 'fit_results' does a regression on the DOE results for first-order, second-order, and interaction terms.
The second function 'fit_sig_results' removes any insignificant factors (with a p-value above 0.05) and re-runs the regression. If any first-order terms are removed but still present in higher-order terms, these are added back into the model.
The third function 'generate_sig_model' re-runs these functions until a model exists that only contains significant terms and their first-order components.
The model coefficients are returned in the 'coefs' variable.
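The rule that first-order terms are added back whenever a higher-order term survives can be shown in isolation. The helper below is a hypothetical, simplified stand-in that uses string splitting instead of the regular expressions in 'fit_sig_results'; it assumes the same term naming ("sp^2", "wt*c1").

```python
def add_parent_terms(significant_terms):
    """Return the terms plus any first-order parents that are missing."""
    kept = list(significant_terms)
    for term in significant_terms:
        if "^" in term:
            parents = [term.split("^")[0]]   # "b^2" -> ["b"]
        elif "*" in term:
            parents = term.split("*")        # "wt*c1" -> ["wt", "c1"]
        else:
            parents = []
        for parent in parents:
            if parent not in kept:
                kept.append(parent)
    return kept

print(add_parent_terms(["const", "sp", "wt*c1", "b^2"]))
# ['const', 'sp', 'wt*c1', 'b^2', 'wt', 'c1', 'b']
```

Keeping the parents preserves model hierarchy, and the optimization step relies on it because each squared or interaction variable is constrained in terms of its first-order variables.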

In [5]:
def fit_results(doe_results, show_initial_results = True):
    X = doe_results[["wt_level", "sp_level", "b_level", "c1_level", "c2_level", "c3_level"]]
    y = doe_results["profit"]
    
    model = PolynomialFeatures(degree=2)
    fit_tr_model = model.fit_transform(X)
    lr_model = LinearRegression()
    lr_model.fit(fit_tr_model, y)
    
    factors = ["const", "wt", "sp", "b", "c1", "c2", "c3", 
               "wt^2", "wt*sp", "wt*b", "wt*c1", "wt*c2", "wt*c3",
               "sp^2", "sp*b", "sp*c1", "sp*c2", "sp*c3",
               "b^2", "b*c1","b*c2","b*c3",
               "c1^2", "c1*c2", "c1*c3", "c2^2", "c2*c3", "c3^2"]
    
    X_labeled = pd.DataFrame(fit_tr_model, columns=factors)
    
    est = sm.OLS(y, X_labeled)
    est_fit = est.fit()
    
    if show_initial_results:
        print("Initial results:")
        print(est_fit.summary())
        
    return X_labeled, y, est_fit

def fit_sig_results(X_labeled, y, est_fit):
    
    # filter non-significant factors
    sig_factors = list(est_fit.pvalues[est_fit.pvalues <= 0.05].index)
    
    # Ensure all 1st-order terms are present if they're in a higher-level term
    # Start with squared terms
    for factor in sig_factors:
        if "^" in factor:
            first_order_term = re.search("^[a-z0-9]*",factor).group(0)  
            if first_order_term not in sig_factors:
                sig_factors.append(first_order_term)
        
        # Ensure all 1st-order terms from interactions are present
        if "*" in factor:
            first_order_term1 = re.search("^[a-z0-9]*",factor).group(0)
            first_order_term2 = re.search("[a-z0-9]*$",factor).group(0)
            for first_order_term in (first_order_term1, first_order_term2):
                if first_order_term not in sig_factors:
                    sig_factors.append(first_order_term)    
    
    # Create new df of significant factors
    X_sig = X_labeled[sig_factors]
    
    # New model with only significant factors
    est_sig = sm.OLS(y, X_sig)
    est_sig_fit = est_sig.fit()
    
    return X_sig, est_sig_fit

# Generate a pd.Series of coefficients for only the significant factors and 
# their 1st-order terms
def generate_sig_model(doe_results, show_initial_results = True):
    
    # Fit results on all factors
    X_labeled, y, est_fit = fit_results(doe_results, show_initial_results)  
    
    est_fit_old = est_fit
    
    # Re-fit results on only significant factors (and their first order components)
    X_sig, est_sig_fit = fit_sig_results(X_labeled, y, est_fit)
    
    # Keep eliminating factors until there is no change in the set of factors
    break_counter = 0
    while len(est_fit_old.pvalues) != len(est_sig_fit.pvalues) and break_counter < 10:
        break_counter += 1
        est_fit_old = est_sig_fit
        X_sig, est_sig_fit = fit_sig_results(X_sig, y, est_sig_fit)
    
    coefs = est_sig_fit.params
    
    print("\nSignificant results:")
    print(est_sig_fit.summary()) 
    
    return coefs

coefs = generate_sig_model(doe_results)

Initial results:
                            OLS Regression Results                            
Dep. Variable:                 profit   R-squared:                       0.998
Model:                            OLS   Adj. R-squared:                  0.997
Method:                 Least Squares   F-statistic:                     2076.
Date:                Tue, 25 Jul 2023   Prob (F-statistic):          1.02e-145
Time:                        10:09:06   Log-Likelihood:                -1742.1
No. Observations:                 146   AIC:                             3538.
Df Residuals:                     119   BIC:                             3619.
Df Model:                          26                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       1.045e+05    2.1e+04   

## Optimize

The function below optimizes the model using Gurobi. Square and interaction terms are modeled as separate variables, each constrained to equal the product of its first-order components.
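A solver-free cross-check can be useful when validating the optimizer's answer. The sketch below is not the Gurobi formulation: it swaps in a brute-force grid search over the coded box [0, 2], and the coefficient values are made up for illustration. It assumes every squared or interaction term has its first-order parents present, and that coefficients are passed as a dict (or a pandas Series, which also provides `keys()` and `items()`).

```python
from itertools import product
import numpy as np

def evaluate(coefs, point):
    """Evaluate a quadratic surrogate at `point`, a dict of coded factor levels."""
    total = 0.0
    for term, coef in coefs.items():
        if term == "const":
            value = 1.0
        elif "^" in term:
            value = point[term.split("^")[0]] ** 2
        elif "*" in term:
            f1, f2 = term.split("*")
            value = point[f1] * point[f2]
        else:
            value = point[term]
        total += coef * value
    return total

def grid_maximum(coefs, steps=21, lower=0.0, upper=2.0):
    """Brute-force the best point on an evenly spaced grid over the coded box."""
    factors = [t for t in coefs.keys()
               if t != "const" and "^" not in t and "*" not in t]
    levels = np.linspace(lower, upper, steps)
    best_point, best_value = None, -np.inf
    for combo in product(levels, repeat=len(factors)):
        point = dict(zip(factors, combo))
        value = evaluate(coefs, point)
        if value > best_value:
            best_point, best_value = point, value
    return best_point, best_value

# Made-up coefficients, for illustration only
demo_coefs = {"const": 1.0, "sp": 4.0, "b": -1.0, "sp^2": -2.0, "sp*b": 0.5}
point, value = grid_maximum(demo_coefs)
print(point, value)
```

The grid grows as steps to the power of the number of factors, so this is only practical as a coarse check for a handful of factors.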

In [6]:
# Create Gurobi environment and suppress output
env = gp.Env(empty=True)
env.setParam("OutputFlag", 0)
env.start()

# Optimizes model based on the series of coefficients for each factor.
# Returns a Gurobi Model object and prints the optimal factor values
def optimize_model(coefs):
    model = gp.Model(env=env)
    model.setParam('NonConvex', 2) # To allow for quadratic equality constraints
    
    factors_dict = {}
    
    # Create gurobi variables
    for factor in coefs.index:
        if factor == 'wt': # Can modify this to make 'wt' a binary variable
            factors_dict[factor] = model.addVar(vtype=GRB.CONTINUOUS, lb=0, ub=2, name=factor)
        elif '^' not in factor and '*' not in factor:
            factors_dict[factor] = model.addVar(vtype=GRB.CONTINUOUS, lb=0, ub=2, name=factor)
        else:
            factors_dict[factor] = model.addVar(vtype=GRB.CONTINUOUS, name=factor)
    
    # Add equality constraints
    for f in factors_dict:
        if f == 'const': # Must maintain constant term, constrain this to 1
            factor = factors_dict[f]
            model.addConstr(factor == 1)
        if '^' in f: # Squared terms must be equal to the square of the first-order term
            first_order_f = re.search("^[a-z0-9]*", str(f)).group(0) 
            first_order_factor = factors_dict[first_order_f]
            factor = factors_dict[f]
            model.addConstr(factor == first_order_factor**2)
        if '*' in f: # Interaction terms must be the product of the first-order terms
            first_order_f1 = re.search("^[a-z0-9]*", str(f)).group(0)
            first_order_f2 = re.search("[a-z0-9]*$", str(f)).group(0)
            first_order_factor1 = factors_dict[first_order_f1]
            first_order_factor2 = factors_dict[first_order_f2]
            factor = factors_dict[f]
            model.addConstr(factor == first_order_factor1 * first_order_factor2)
            
    # Objective function
    model.setObjective(gp.quicksum(coefs[factor]*factors_dict[factor] for factor in factors_dict),
                        GRB.MAXIMIZE)    
    
    model.optimize()
    
    for var in model.getVars():
        print(var.varName, '=', var.x)
    print('objective value: ', model.objVal)
    
    return model

model = optimize_model(coefs)
            

const = 1.0
wt = 2.0
sp = 0.0
b = 0.0
c2 = 0.0
wt^2 = 4.0
wt*sp = 0.0
wt*b = 0.0
wt*c1 = 4.0
wt*c2 = 0.0
wt*c3 = 4.0
sp^2 = 0.0
sp*b = 0.0
b^2 = 0.0
c1 = 2.0
c3 = 2.0
objective value:  1978569.7018621417


The results show that the optimum lies at a corner point of the design space, so the DOE is re-run below with new parameter values. The wind turbine count is fixed at 1, as the first DOE indicates that 1 wind turbine is better than 0. Because wind turbines are so expensive, a 2nd wind turbine is assumed not to be an option. In practice, a 2nd wind turbine might indeed increase revenue by producing a large excess of energy for the grid, but as the goal of this plant is to produce sulfur and not energy, it is reasonable to limit the model to 1 wind turbine. The DOE can therefore be reduced by eliminating the wt = 0 runs. The new DOE design is saved in DOE2.xlsx.
The other parameters are adjusted to be centered around their previous optimal values.
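The re-centering can be traced by mapping the optimal coded levels from the first optimization back to physical values. This is a minimal sketch: `level_to_value` is a hypothetical helper that only handles optima at integer design levels, and the levels are copied from the printed solution above.

```python
# Factor values used in the first DOE (copied from the `parameters` dictionary)
first_pass = {
    "sp": [5000, 10000, 15000],
    "b": [516, 1144, 2288],
    "c1": [0, 1, 2],
    "c2": [0, 1, 2],
    "c3": [-1, 0, 1],
}

# Optimal coded levels printed by the first optimization
optimal_levels = {"sp": 0.0, "b": 0.0, "c1": 2.0, "c2": 0.0, "c3": 2.0}

def level_to_value(values, level):
    """Look up the physical value at an integer coded level (0, 1 or 2)."""
    index = int(round(level))
    if abs(level - index) > 1e-6:
        raise ValueError("this sketch only handles optima at design levels")
    return values[index]

new_centres = {name: level_to_value(first_pass[name], level)
               for name, level in optimal_levels.items()}
print(new_centres)
# {'sp': 5000, 'b': 516, 'c1': 2, 'c2': 0, 'c3': 1}
```

These values are the middle entries of the lists in the second parameter set; the spacing around each centre is a separate manual choice.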

In [7]:
parameters2 = {
    "wt_list" : [1, 1], # number of 1MW wind turbines is set to 1
    "sp_list" : [2000, 5000, 8000], # area in m2 of solar panels
    "b_list" : [263, 516, 1144], # battery sizes in kWh
    "c1_list" : [1, 2, 3], # constants for battery setpoint eqn
    "c2_list" : [-1, 0, 1],
    "c3_list" : [0, 1, 2]
    }

# New DOE which removes the wind turbine factor (i.e. sets it always equal to 1)
doe2 = pd.read_excel("DOE2.xlsx")

# Run DOE
doe_results2, forecast_store = run_doe(doe2, parameters2, show_run_status = False)

coefs2 = generate_sig_model(doe_results2, show_initial_results = False)

print()

model2 = optimize_model(coefs2)


Significant results:
                            OLS Regression Results                            
Dep. Variable:                 profit   R-squared:                       0.996
Model:                            OLS   Adj. R-squared:                  0.995
Method:                 Least Squares   F-statistic:                     1848.
Date:                Tue, 25 Jul 2023   Prob (F-statistic):           1.30e-72
Time:                        10:13:20   Log-Likelihood:                -759.02
No. Observations:                  73   AIC:                             1536.
Df Residuals:                      64   BIC:                             1557.
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       8.406e+04    166.6

Adjust parameters again and re-run.

In [8]:
parameters3 = {
    "wt_list" : [1, 1], # Only the case with 1 wind turbine is considered
    "sp_list" : [4000, 6500, 9000], # area in m2 of solar panels
    "b_list" : [0, 263, 516], # battery sizes in kWh
    "c1_list" : [1, 2, 3], # constants for battery setpoint eqn
    "c2_list" : [-1, 0, 1],
    "c3_list" : [0, 1, 2]
    }

# DOE form doesn't change, so 'doe2' is still valid

# Run DOE
doe_results3, forecast_store = run_doe(doe2, parameters3, show_run_status = False)

coefs3 = generate_sig_model(doe_results3, show_initial_results = False)

print()

model3 = optimize_model(coefs3)


Significant results:
                            OLS Regression Results                            
Dep. Variable:                 profit   R-squared:                       0.984
Model:                            OLS   Adj. R-squared:                  0.982
Method:                 Least Squares   F-statistic:                     674.0
Date:                Tue, 25 Jul 2023   Prob (F-statistic):           3.54e-57
Time:                        10:17:43   Log-Likelihood:                -725.13
No. Observations:                  73   AIC:                             1464.
Df Residuals:                      66   BIC:                             1480.
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        9.32e+04     93.0

The results suggest it is optimal to have no battery. This is plausible: the cost of the battery is too high to be offset by the savings on grid energy that must otherwise be bought when renewable production is low. As all three constants are parameters for determining the battery set point, they are no longer relevant to the model and are set to 0 for the next calculation.
Below, we calculate the expected results at this optimal level. Note that the optimal profit below deviates slightly from the value calculated above. This is due partly to the decision to hold the solar panel area to a round number, and partly to the error between the actual and predicted values.
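A back-of-the-envelope calculation gives a feel for why the battery struggles to pay for itself. The prices are taken from the capex, opex and revenue expressions in 'run_scenario'; the assumptions that the battery is lossless and that any stored surplus would otherwise have been sold to the grid are simplifications introduced here.

```python
battery_cost_per_kwh = 1000   # EUR per kWh of capacity, from battery_specs["cost"]
capex_life = 10               # years over which capex is spread
grid_buy_price = 0.25         # EUR per kWh bought from the grid (opex term)
grid_sell_price = 0.10        # EUR per kWh sold to the grid (revenue term)

# Yearly capex charged for each kWh of installed battery capacity
annual_cost_per_kwh = battery_cost_per_kwh / capex_life

# Storing a surplus kWh forgoes its sale but avoids a later purchase
saving_per_kwh_shifted = grid_buy_price - grid_sell_price

# Energy each kWh of capacity must shift per year to break even
breakeven_kwh_shifted = annual_cost_per_kwh / saving_per_kwh_shifted
print(annual_cost_per_kwh, round(breakeven_kwh_shifted))
```

Under these assumptions each kWh of capacity would need to shift several hundred kWh per year, which is close to two full charge and discharge cycles per day.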

In [9]:
parameters_final = {
    "wt_list" : [1], # Only the case with 1 wind turbine is considered
    "sp_list" : [6500], # area in m2 of solar panels
    "b_list" : [0], # battery sizes in kWh
    "c1_list" : [0], # constants for battery setpoint eqn
    "c2_list" : [0],
    "c3_list" : [0]
    }

run = pd.Series([0, 0, 0, 0, 0, 0], ["wt_level", "sp_level", "b_level", "c1_level",
                                     "c2_level", "c3_level"])

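# forecast_store from the previous cell was built for parameters3, where index 0
# corresponds to 4000 m2 of solar panels, so rebuild it for the final configuration
forecast_final = prep_forecast(parameters_final)

profit, revenue, opex, capex, total_sx, e_to_grid, e_from_grid \
    = run_scenario(forecast_final, parameters_final, run)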
    
print("Profit (€/yr): ", round(profit))
print("Revenue (€/yr): ", round(revenue))
print("Opex (€/yr): ", round(opex))
print("Capex (€/yr): ", round(capex))
print("Sulfur (kmol/yr): ", round(total_sx/years/1000))
print("Energy sold to grid (MWh/yr): ", round(e_to_grid/years/1000, 1))
print("Energy purchased from grid (MWh/yr): ", round(e_from_grid/years/1000, 1))

Profit (€/yr):  2014687
Revenue (€/yr):  2304598
Opex (€/yr):  21730
Capex (€/yr):  268182
Sulfur (kmol/yr):  261
Energy sold to grid (MWh/yr):  6.0
Energy purchased from grid (MWh/yr):  86.9


This is acceptable. The production goal was 240 kmol of sulfur per year, and this is slightly exceeded. The venture is profitable, at around €2 million per year, provided the assumptions that went into this model are accurate.
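The printed figures can be cross-checked by hand from the cost and price expressions in 'run_scenario'. The sketch below uses the rounded grid energy totals from the output above (6.0 and 86.9, taken as thousands of kWh per year), so revenue, opex and profit only agree with the printed values to within that rounding.

```python
capex_life = 10

# Final configuration: 1 wind turbine, 6500 m2 of solar panels, no battery
capex = (1000 * 0 + 6500 * 200 / 1.1 + 1 * 1.5 * 10**6) / capex_life

# Sulfur output exceeds the target, so sulfur revenue is capped at 240 kmol/yr
e_to_grid_per_year = 6.0 * 1000      # kWh/yr, from the rounded printed value
revenue = 9.6 * 240000 + 0.1 * e_to_grid_per_year

e_from_grid_per_year = 86.9 * 1000   # kWh/yr, from the rounded printed value
opex = 0.25 * e_from_grid_per_year

profit = revenue - opex - capex
print(round(capex), round(revenue), round(opex), round(profit))
```

The capex reproduces the printed value exactly, and the sulfur cap accounts for almost all of the revenue, so grid energy sales and purchases are minor contributions at this configuration.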