# PARETO - Strategic Model Demo
The purpose of this Jupyter notebook is to provide a demonstration of PARETO's strategic model. This demo will show how to use PARETO's Python coding interface. If you prefer a graphical interface, you can download the PARETO GUI [here](https://www.project-pareto.org/software/).

## Introduction
This demo is based on PARETO's strategic toy case study. Relevant links:
- [Strategic model documentation](https://pareto.readthedocs.io/en/latest/model_library/strategic_water_management/index.html)
- [Documentation of PARETO case studies](https://pareto.readthedocs.io/en/latest/case_studies/index.html)

The strategic toy case study features a very small produced water network. This network is smaller than most realistic produced water networks, but the small size of this example makes it useful for testing, debugging, demonstrations, etc. Below is a schematic image of the strategic toy network:

![Strategic toy case study network](../../../docs/img/strategic_toy_network.png)

Please note that the strategic toy case study data is arbitrary, but it is meant to be representative of a real produced water network. We will now walk through all of the steps needed to set up and solve an instance of PARETO's strategic model.

## Step 1: Import needed files and libraries

In [None]:
#####################################################################################################
# PARETO was produced under the DOE Produced Water Application for Beneficial Reuse Environmental
# Impact and Treatment Optimization (PARETO), and is copyright (c) 2021 by the software owners: The
# Regents of the University of California, through Lawrence Berkeley National Laboratory, et al. All
# rights reserved.
#
# NOTICE. This Software was developed under funding from the U.S. Department of Energy and the
# U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted
# for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license
# in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform
# publicly and display publicly, and to permit other to do so.
#####################################################################################################

from pareto.strategic_water_management.strategic_produced_water_optimization import (
    WaterQuality,
    create_model,
    Objectives,
    solve_model,
    PipelineCost,
    PipelineCapacity,
)
from pareto.utilities.get_data import get_data
from pareto.utilities.results import (
    plot_bars,
    plot_sankey,
    generate_report,
    PrintValues,
    OutputUnits,
    is_feasible,
    nostdout,
)
from importlib import resources

## Step 2: Set up the set list and parameter list
The data defining the produced water network is stored in an Excel workbook (often referred to as the "input file"). Some of the tabs in the input file define index sets for the model, whereas other tabs contain the parameter data that will be used to build the model.

A rather large amount of data must be provided by the user, and so we will not attempt to summarize all of the data input requirements here. Instead, we refer the reader to the [strategic model documentation](https://pareto.readthedocs.io/en/latest/model_library/strategic_water_management/index.html). You can also download and peruse several example input files from GitHub [here](https://github.com/project-pareto/project-pareto/tree/main/pareto/case_studies) (`strategic_toy_case_study.xlsx` is the input file which will be loaded and used subsequently in this demo).

In [None]:
# Each entry in set_list corresponds to a tab in the Excel input file that defines an index set.
set_list = [
    "ProductionPads", "CompletionsPads", "SWDSites", "FreshwaterSources", "StorageSites",
    "TreatmentSites", "ReuseOptions", "NetworkNodes", "PipelineDiameters", "StorageCapacities",
    "InjectionCapacities", "TreatmentCapacities", "TreatmentTechnologies",
]
# Each entry in parameter_list also corresponds to a tab in the Excel input file, but these
# tabs have parameter data.
parameter_list = [
    "Units", "PNA", "CNA", "CCA", "NNA", "NCA", "NKA", "NRA", "NSA", "FCA", "RCA", "RNA",
    "RSA", "SCA", "SNA", "PCT", "PKT", "FCT", "CST", "CCT", "CKT", "CompletionsPadOutsideSystem",
    "DesalinationTechnologies", "DesalinationSites", "TruckingTime", "CompletionsDemand",
    "PadRates", "FlowbackRates", "NodeCapacities", "InitialPipelineCapacity",
    "InitialDisposalCapacity", "InitialTreatmentCapacity", "FreshwaterSourcingAvailability",
    "PadOffloadingCapacity", "CompletionsPadStorage", "DisposalOperationalCost",
    "TreatmentOperationalCost", "ReuseOperationalCost", "PipelineOperationalCost",
    "FreshSourcingCost", "TruckingHourlyCost", "PipelineDiameterValues",
    "DisposalCapacityIncrements", "InitialStorageCapacity", "StorageCapacityIncrements",
    "TreatmentCapacityIncrements", "TreatmentEfficiency", "DisposalExpansionCost",
    "StorageExpansionCost", "TreatmentExpansionCost", "PipelineCapexDistanceBased",
    "PipelineCapexCapacityBased", "PipelineCapacityIncrements", "PipelineExpansionDistance",
    "Hydraulics", "Economics", "PadWaterQuality", "StorageInitialWaterQuality",
    "PadStorageInitialWaterQuality", "DisposalOperatingCapacity",
]

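Each name in `set_list` and `parameter_list` corresponds to a tab in the input workbook, so a typo in either list is easy to miss. The following is a minimal sketch of a pre-check that compares the lists with a workbook's sheet names. `find_missing_tabs` is a hypothetical helper (not part of PARETO), and the example names are illustrative; with pandas installed, the real sheet names could be read with `pandas.ExcelFile(path).sheet_names`.

```python
# Hypothetical helper (not part of PARETO): report list entries with no matching workbook tab.
def find_missing_tabs(sheet_names, *name_lists):
    """Return names from name_lists that do not appear in sheet_names, in first-seen order."""
    available = set(sheet_names)
    missing = []
    for names in name_lists:
        for name in names:
            if name not in available and name not in missing:
                missing.append(name)
    return missing


# Illustrative sheet names; in practice these could come from
# pandas.ExcelFile(path).sheet_names for the input workbook.
example_sheets = ["ProductionPads", "CompletionsPads", "Units", "PadRates"]
example_sets = ["ProductionPads", "CompletionsPads", "SWDSites"]
example_params = ["Units", "PadRates", "FlowbackRates"]

print(find_missing_tabs(example_sheets, example_sets, example_params))
# ['SWDSites', 'FlowbackRates']
```
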
## Step 3: Load data into Python
The `get_data` function called below reads the data from the input file into `df_sets` and `df_parameters`. Both are dictionaries whose keys are the strings in `set_list` and `parameter_list`, respectively, and whose values are either dictionaries or pandas DataFrames containing the data read from the corresponding workbook tab.

In [None]:
# Load data from Excel input file into Python
with resources.path(
    "pareto.case_studies",
    "strategic_toy_case_study.xlsx",
) as fpath:
    [df_sets, df_parameters] = get_data(fpath, set_list, parameter_list)

### Step 3.1 (optional): Display input data
A simple way to view the input data is to print members of `df_sets` and `df_parameters`:

In [None]:
print(type(df_sets["ProductionPads"]))
print(df_sets["ProductionPads"])
print()
print(type(df_parameters["CompletionsDemand"]))
print(df_parameters["CompletionsDemand"])
print()
print(type(df_parameters["DesalinationTechnologies"]))
print(df_parameters["DesalinationTechnologies"])

The `plot_bars` function can also be used to create bar charts. If the plotted variable is indexed by time, then a dynamic, animated chart is created.

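For a time-indexed variable, an animated chart can be thought of as one frame of bars per time period. The sketch below groups a dictionary keyed by `(pad, time)` tuples into per-period frames. The key layout, the hypothetical `frames_by_time` helper, and the values are assumptions for illustration; none of it is taken from the `plot_bars` source.

```python
from collections import defaultdict

# Illustrative production forecast keyed by (pad, time); values in bbl/day are made up.
pad_rates = {
    ("PP01", "T01"): 1200.0,
    ("PP02", "T01"): 800.0,
    ("PP01", "T02"): 1100.0,
    ("PP02", "T02"): 950.0,
}


def frames_by_time(data):
    """Group {(item, time): value} into {time: {item: value}}, one frame per period."""
    frames = defaultdict(dict)
    for (item, time), value in data.items():
        frames[time][item] = value
    # Sorting assumes zero-padded period labels such as "T01", "T02", ...
    return {time: frames[time] for time in sorted(frames)}


for time, bars in frames_by_time(pad_rates).items():
    print(time, bars)
# T01 {'PP01': 1200.0, 'PP02': 800.0}
# T02 {'PP01': 1100.0, 'PP02': 950.0}
```
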
In [None]:
input_data = {"pareto_var": df_parameters["PadRates"],
              "labels": [("Production pad", "Time", "Production forecast (bbl/day)")],
             }
args = {"plot_title": "Production forecast",
        "output_file": "demo_bar.html",
        "print_data": False,
        "jupyter_notebook": True,  # setting this option to True causes the bar chart to appear in the Jupyter notebook
       }
plot_bars(input_data, args)

## Step 4: Build the Pyomo model
The `create_model` function called below uses the [Pyomo](http://www.pyomo.org/) modeling language to build a mathematical model of the produced water network. Five settings can be specified when building the model:

| Setting | Possible values (default in *italics*) |
| ------ | ----------------------------------------------- |
| `objective` | *`Objectives.cost`* - minimize the total annualized cost of produced water management over the decision horizon<br>`Objectives.reuse` - maximize the amount of produced water that is reused over the decision horizon |
| `pipeline_cost` | *`PipelineCost.capacity_based`* - use pipeline capacities and rate in [currency/volume] to calculate pipeline CAPEX costs<br>`PipelineCost.distance_based` - use pipeline distances and rate in [currency/(diameter-distance)] to calculate pipeline CAPEX costs |
| `pipeline_capacity` | *`PipelineCapacity.input`* - flow capacity for each pipe diameter is provided by the user<br>`PipelineCapacity.calculate` - flow capacity for each pipe diameter is calculated based on the diameter and provided pipe hydraulics data |
| `node_capacity` | *`True`* - Include upper bound on network node flow capacity <br>`False` - Exclude upper bound on network node flow capacity |
| `water_quality` | `WaterQuality.false` - Exclude any water quality calculations from the model<br>*`WaterQuality.post_process`* - Calculate water quality throughout the network post-optimization<br>`WaterQuality.discrete` - Discretize the water quality variables to include in the optimization model |

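The two `pipeline_cost` options differ in the quantity the CAPEX rate is multiplied by. Reading only the units in the table above, a capacity-based rate in [currency/volume] scales with the capacity added, while a distance-based rate in [currency/(diameter-distance)] scales with pipe diameter times length. The sketch below works through that arithmetic with hypothetical numbers; the helper names are illustrative, and PARETO's actual cost expressions may include additional terms.

```python
# Simplified, illustrative CAPEX arithmetic implied by the rate units in the table above.
# This is not PARETO's exact formulation; all numbers are hypothetical.


def capex_capacity_based(rate_per_volume, capacity_added):
    """Cost = rate [currency/volume] * capacity added [volume]."""
    return rate_per_volume * capacity_added


def capex_distance_based(rate_per_diameter_distance, diameter, distance):
    """Cost = rate [currency/(diameter*distance)] * diameter * distance."""
    return rate_per_diameter_distance * diameter * distance


# 5,000 units of added capacity at 120 currency units per unit of capacity
print(capex_capacity_based(120.0, 5000.0))  # 600000.0
# A 10-inch pipe over 3 miles at 25,000 currency units per (inch * mile)
print(capex_distance_based(25000.0, 10.0, 3.0))  # 750000.0
```
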

In [None]:
# Create Pyomo model
strategic_model = create_model(
    df_sets,
    df_parameters,
    default={
        "objective": Objectives.cost,
        "pipeline_cost": PipelineCost.distance_based,
        "pipeline_capacity": PipelineCapacity.input,
        "node_capacity": True,
        "water_quality": WaterQuality.false,
    },
)

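The settings table lists `WaterQuality.post_process` as calculating water quality throughout the network after optimization (the model built here uses `WaterQuality.false`). A standard way to compute the quality of merged streams is a flow-weighted mass balance, $C_{\text{mix}} = \sum_i Q_i C_i / \sum_i Q_i$. The sketch below assumes perfect mixing at a node and uses hypothetical flows and concentrations (e.g. TDS in mg/L); `mixed_concentration` is an illustrative helper, not PARETO's post-processing code.

```python
def mixed_concentration(inflows):
    """Flow-weighted average concentration of streams that mix perfectly at a node.

    inflows: iterable of (flow, concentration) pairs, e.g. (bbl/day, mg/L).
    """
    inflows = list(inflows)  # materialize so the data can be summed twice
    total_flow = sum(flow for flow, _ in inflows)
    if total_flow == 0:
        raise ValueError("no flow into node; concentration is undefined")
    # Sum of flow * concentration is proportional to the dissolved mass entering the node.
    total_mass = sum(flow * conc for flow, conc in inflows)
    return total_mass / total_flow


# Two hypothetical streams: 1,000 bbl/day at 100,000 mg/L and 3,000 bbl/day at 20,000 mg/L
print(mixed_concentration([(1000.0, 100_000.0), (3000.0, 20_000.0)]))  # 40000.0
```
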
## Step 5: Solve the model
The `solve_model` function called below solves the model with the provided options. Six options can be passed:

| Option | Description | Default value |
| ------ | ----------- | ------------- |
| `solver` | Either a string with solver name or a tuple of strings with several solvers to try and load in order. PARETO currently supports the Gurobi (commercial) and CBC (free) solvers, but it might be possible to use other MILP solvers as well. | `("gurobi_direct", "gurobi", "cbc")` |
| `deactivate_slacks` | `True` to deactivate slack variables, `False` to use slack variables. | `True` |
| `scale_model` | `True` to apply scaling to the model, `False` to not apply scaling. | `False` |
| `scaling_factor` | Scaling factor to apply to the model (only relevant if `scale_model` is `True`). | 1000000 |
| `running_time` | Maximum solver running time in seconds. | 60 |
| `gap` | Optimality gap tolerance passed to the solver. | 0 |

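The `solver` option can be a tuple of names that are tried in order. The sketch below shows that first-available fallback in plain Python; `pick_solver` is a hypothetical helper, not PARETO's code, and the availability check is a stand-in. In a Pyomo setting, one plausible check would be `lambda name: SolverFactory(name).available(exception_flag=False)`.

```python
def pick_solver(candidates, is_available):
    """Return the first solver name for which is_available(name) is True.

    candidates: a single solver name or a tuple of names tried in order.
    is_available: callable taking a solver name and returning a bool.
    """
    if isinstance(candidates, str):
        candidates = (candidates,)
    for name in candidates:
        if is_available(name):
            return name
    raise RuntimeError(f"none of the solvers {candidates} is available")


# Stand-in availability check: pretend only CBC is installed on this machine.
installed = {"cbc"}
print(pick_solver(("gurobi_direct", "gurobi", "cbc"), lambda name: name in installed))  # cbc
```
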
In [None]:
# Solve Pyomo model with specified options
# TODO change solver to CBC when ready
options = {
    "solver": "gurobi",  # "cbc" is a free alternative if Gurobi is not installed
    "deactivate_slacks": True,
    "scale_model": False,
    "scaling_factor": 1000000,
    "running_time": 300,
    "gap": 0,
}
results = solve_model(model=strategic_model, options=options)

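Assuming `gap` is treated as a relative MIP gap, it bounds how far the returned solution may be from the solver's best bound. Gurobi, for example, defines the relative gap as |bound - incumbent| / |incumbent|. With `gap` set to 0, the solver keeps searching until it proves optimality or reaches `running_time`. The sketch below computes a Gurobi-style relative gap for hypothetical objective values; `relative_gap` is an illustrative helper.

```python
def relative_gap(incumbent, best_bound):
    """Relative MIP gap |bound - incumbent| / |incumbent| (Gurobi-style definition)."""
    if incumbent == 0:
        # Convention for a zero incumbent varies by solver; here: zero gap if the bound
        # matches exactly, otherwise an infinite gap.
        return 0.0 if best_bound == 0 else float("inf")
    return abs(best_bound - incumbent) / abs(incumbent)


# Hypothetical minimization: best cost found 1,050,000; proven lower bound 1,000,000.
gap = relative_gap(1_050_000.0, 1_000_000.0)
print(round(gap, 4))  # 0.0476
print(gap <= 0.05)  # True: a gap tolerance of 0.05 would accept this solution
```
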
The `is_feasible` function can be called after solving the model to check whether or not the solution returned by the solver violates any constraints.

In [None]:
# Check feasibility of the solved model.
with nostdout():
    feasibility_status = is_feasible(strategic_model)
if not feasibility_status:
    print("Model results are not feasible and should not be trusted")
else:
    print("Model results validated and found to pass feasibility tests")

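A feasibility check of this kind generally evaluates every constraint at the returned solution and flags those whose value falls outside their bounds by more than a small tolerance. The sketch below shows that idea on plain-Python stand-ins for constraints. The `(lower, body, upper)` representation, the hypothetical `violated_constraints` helper, and the `1e-6` tolerance are assumptions; this is not the `is_feasible` implementation.

```python
def violated_constraints(constraints, values, tol=1e-6):
    """Return names of constraints violated by more than tol.

    constraints: dict name -> (lower, body, upper), where body is a function of the
                 variable values and lower/upper may be None for one-sided constraints.
    values: dict of variable name -> value from the solution.
    """
    violated = []
    for name, (lower, body, upper) in constraints.items():
        level = body(values)
        if lower is not None and level < lower - tol:
            violated.append(name)
        elif upper is not None and level > upper + tol:
            violated.append(name)
    return violated


# Toy example: flow balance x + y == 10 and a capacity limit x <= 4.
constraints = {
    "balance": (10.0, lambda v: v["x"] + v["y"], 10.0),
    "capacity": (None, lambda v: v["x"], 4.0),
}
print(violated_constraints(constraints, {"x": 5.0, "y": 5.0}))  # ['capacity']
print(violated_constraints(constraints, {"x": 4.0, "y": 6.0}))  # []
```
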
## Step 6: Analyze results

After solving the model, we can write the results to an Excel report, which makes them easier to inspect and visualize.

The `generate_report` function takes one required argument (the model) and four optional keyword arguments:

| Option | Description | Default value |
| ------ | ----------- | ------------- |
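| `results_obj` | Results object returned by `solve_model`. | TODO |
| `is_print` | Which values to print to the console (e.g., `PrintValues.essential`). | TODO |
| `output_units` | Units in which results are reported (e.g., `OutputUnits.user_units`). | TODO |
| `fname` | File name of the Excel report to write. | TODO |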

Note that `is_print` only affects what is printed to the console; it does not affect what is written to the Excel report. There is rarely a reason to set it to anything other than an empty value or `PrintValues.essential`.
TODO consider rewriting generate report so that is_print does not have to be a list.

In [None]:
# Generate report with results in Excel
[model, results_dict] = generate_report(
    strategic_model,
    results_obj=results,
    is_print=PrintValues.essential,
    output_units=OutputUnits.user_units,
    fname="strategic_optimization_results.xlsx",
)

In [None]:
# This shows how to read data from PARETO reports
set_list = []
parameter_list = ["v_F_Trucked", "v_C_Trucked"]
fname = "strategic_optimization_results.xlsx"
[sets_reports, parameters_report] = get_data(fname, set_list, parameter_list)

### Step 6.1 (optional): Generate Sankey diagram

In [None]:
args = {"plot_title": "Piped Water",
        "output_file": "demo_sankey.html"
       }

input_data = {"pareto_var": results_dict["v_F_Piped_dict"], 
              "sections": {"Region 1": ["PP01", "PP03", "N01", "N02", "N03", "PP02", "CP01", "N05", "N06", "N07", "N08"],
                             "Region 2": ["PP03", "N03", "N04"]
                          },
             }

plot_sankey(input_data, args)

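Before plotting, it can help to see which piped flows fall entirely inside a section. The sketch below filters a flow dictionary to arcs whose origin and destination both belong to a section's node list and sums each arc over time. The `(origin, destination, time)` key layout, the hypothetical `section_arcs` helper, and the values are assumptions and may not match the format of `results_dict["v_F_Piped_dict"]`.

```python
from collections import defaultdict

# Hypothetical piped flows keyed by (origin, destination, time); values are made up.
piped_flows = {
    ("PP01", "N01", "T01"): 500.0,
    ("PP01", "N01", "T02"): 450.0,
    ("N01", "N02", "T01"): 300.0,
    ("N03", "N04", "T01"): 200.0,
}


def section_arcs(flows, nodes):
    """Sum flows over time for arcs whose origin and destination are both in nodes."""
    members = set(nodes)
    totals = defaultdict(float)
    for (origin, destination, _time), flow in flows.items():
        if origin in members and destination in members:
            totals[(origin, destination)] += flow
    return dict(totals)


region_2 = ["PP03", "N03", "N04"]
print(section_arcs(piped_flows, region_2))  # {('N03', 'N04'): 200.0}
```
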
TODO notes
- Should I include a section on multi-objective optimization? Some notes that were in Miguel's version of the notebook said "4.5 MM increase in cost will allow us to reuse 8.5MM additional barrels of produced water. (doted lines)" and "Further investments do not have a drastic effect on water reuse. (red box)"
- Consider adding an optional segment at the end with a version of sensitivity analysis