# Plot Bars Demo

This demo illustrates the use of PARETO's visualization method `plot_bars()`. Two different formats of source data can be used to create a bar plot:

1. [Data from Excel files](#example_1)

   - Data can be loaded from an Excel workbook with PARETO's `get_data()` utility function. The names of the sheets in the workbook should correspond to variable names. Each data column in a sheet should have a descriptive header. The column headers are important because they are used to detect whether the data is indexed by time and to specify which column the plot should be grouped by. The last column in a data sheet should always contain the values of the data; the remaining columns should contain the corresponding indexes.
   - Refer to [Working With Data Indexed by Time](#section_3.1) for some examples of how you can set up these Excel sheets.<br><br>

2. [Data from PARETO models](#example_2)

   - PARETO's `generate_report()` method returns a dictionary containing variables and the data associated with them. Data in this format can be passed to `plot_bars()` without further preparation.

This notebook shows how to use both types of source data to create bar plots.

<div class="alert alert-block alert-warning">
<b>Warning:</b> PARETO makes use of the Plotly package to create plots. Plotly has a well-known bug where occasionally it hangs indefinitely when trying to save a figure. If you encounter this problem while using <span style="font-family: monospace">plot_bars()</span>, please refer to <a href="https://github.com/project-pareto/project-pareto/wiki/PARETO-visualization-functions:-saving-Plotly-figure-hangs">this wiki page</a> to learn about potential workarounds.
</div>

### `plot_bars()` Arguments

The plotting function is called as `plot_bars(input_data, args)`. The two input arguments are described below:

1. __`input_data`__:
The first argument is a dictionary that contains the data to plot and, depending on the source of the data, the labels for that data. This Jupyter notebook provides two examples showing which `input_data` keys are required for each data source. Refer to [`input_data` Parameters](#section_3.2) for more information.<br><br>

2. __`args`__:
The second argument is a dictionary containing options for customizing the bar plot. Below is a list of keys that can be included in the `args` dictionary and the possible values for each:

  - __`plot_title`__: This value will be used as the title of the plot.
     - Default: "" (blank - no title)

  - __`y_axis`__: This specifies whether the y-axis should use a logarithmic scale, which is helpful if you have data that span several orders of magnitude. If provided, the only possible value for this argument that will not result in an error is `"log"`.
     - Default: If this argument is not provided, then the y-axis remains scaled linearly.

  - __`group_by`__: This specifies what column will be used as the x-axis in the plot; therefore it must be typed (or copied) exactly as it is in the column header of the data. The possible values for this argument are dependent on the data that is provided.
     - For example, if the column headers/labels are ("Origin", "Destination", "Time", "Value"), then the possible values for this argument are "Origin" or "Destination" ("Time" is not a valid choice).
     - Default: If this argument is not provided, the plot will be grouped by the first column in the data.

  - __`output_file`__: This specifies the name and type of the file in which the bar plot generated by this method is saved.
     - The possible file extensions are ".html", ".png", ".jpg", ".jpeg", ".svg", and ".pdf".
        - Warning: Plotly sometimes hangs when using file types other than ".html"; therefore, it is recommended to stick with saving plots in ".html" format. The html versions of plots include a button to download the plot in png format.
     - Set `args["output_file"] = None` to avoid saving the plot in any format.
     - Default: "first_bar.html"

  - __`print_data`__: This specifies if the user would like the dataframe produced by the `plot_bars()` method to be displayed in the console. This is useful for verifying that the data is correct. This argument is a boolean (`True` or `False`).
     - Default: `False`

Note: None of these arguments are required, i.e., `plot_bars(input_data, {})` is a valid call. However, in most cases, the user will want to override some of the default argument values.

<div class="alert alert-block alert-warning">
<b>Warning:</b> Please note that the arguments are case sensitive.
</div>
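
The rules listed in this section can be collected into a small check that runs before plotting. The following is a minimal sketch and is not part of PARETO: the helper name `check_plot_bars_args` and its behavior are assumptions made for illustration, built only from the keys and values documented above. Treating file extensions as case sensitive is also an assumption.

```python
import os

# Keys and file extensions as documented in this notebook
DOCUMENTED_KEYS = {"plot_title", "y_axis", "group_by", "output_file", "print_data"}
DOCUMENTED_EXTENSIONS = {".html", ".png", ".jpg", ".jpeg", ".svg", ".pdf"}


def check_plot_bars_args(args):
    """Hypothetical helper: return a list of problems found in an args dict."""
    problems = []
    for key in args:
        # Keys are case sensitive, so "Plot_Title" is not "plot_title"
        if key not in DOCUMENTED_KEYS:
            problems.append(f"undocumented key: {key!r}")
    if "y_axis" in args and args["y_axis"] != "log":
        problems.append('"y_axis" must be "log" when provided')
    output_file = args.get("output_file")
    if output_file is not None:  # None means "do not save the plot"
        extension = os.path.splitext(output_file)[1]
        if extension not in DOCUMENTED_EXTENSIONS:
            problems.append(f"unsupported file extension: {extension!r}")
    if "print_data" in args and not isinstance(args["print_data"], bool):
        problems.append('"print_data" must be True or False')
    return problems


# A miscapitalized key and an invalid y_axis value are both reported
print(check_plot_bars_args({"Plot_Title": "Trucked Water", "y_axis": "linear"}))
```

An empty list means the dictionary only uses documented keys and values; it does not guarantee that `plot_bars()` will accept the call, since `group_by` also depends on the data.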

<a id="example_1"></a>
# 1. Visualizing Data From Excel Files
Below are some examples of bar plots that were created with the `get_data()` output format:

### 1.1 Animated Plot
The first example shows how to create an animated plot with the following argument options:
 - Setting the specific plot title with: `"plot_title": "Trucked Water"`
 - Taking the logarithm of the y-axis with the argument: `"y_axis": "log"`
 - Grouping the data by the column named "Destination": `"group_by": "Destination"`
 - Specifying the name of the file and type of file with the argument: `"output_file": "demo_bar.html"`
 - Specifying that we would not like to print the data in the console: `"print_data": False`
 
 __Step 1: Importing files and running `get_data()`__

In [None]:
from pareto.utilities.results import plot_bars
from pareto.utilities.get_data import get_data

set_list = []
parameter_list = ["v_F_Trucked_Static", "v_F_Trucked"]
fname = "visualization_demos_data.xlsx"
[df_sets, df_parameters] = get_data(
    fname, set_list, parameter_list, sum_repeated_indexes=True
)

Note: The `sum_repeated_indexes` argument which is passed to `get_data()` above causes the values for any repeated indexes in the data to be added together. The default value for this argument is `False`.
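
To make the effect of summing repeated indexes concrete, here is a minimal sketch in plain Python. It only illustrates the behavior described in the note; it is not PARETO's implementation, and the rows and the function name `sum_repeated` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: an (Origin, Destination, Time) index followed by a value
rows = [
    (("A", "D", "T1"), 100.0),
    (("A", "D", "T1"), 50.0),  # same index as the row above
    (("B", "D", "T1"), 30.0),
]


def sum_repeated(rows):
    """Add together the values of rows that share the same index."""
    totals = defaultdict(float)
    for index, value in rows:
        totals[index] += value
    return dict(totals)


summed = sum_repeated(rows)
print(summed)
```

After summing, each index appears exactly once, and the two rows for `("A", "D", "T1")` are merged into a single entry.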

__Step 2: Setting up the parameters and running `plot_bars()`__

In [None]:
args = {
    "plot_title": "Trucked Water",
    "y_axis": "log",
    "group_by": "Destination",
    "output_file": "demo_bar.html",
    "print_data": False,
}

# We provide "labels" since they are required for this data format (see the "Input Data"
# section near the end of this notebook for a full explanation)
input_data = {
    "pareto_var": df_parameters["v_F_Trucked"],
    "labels": [("Origin", "Destination", "Time", "Trucked Water (bbl/day)")],
}

fig = plot_bars(input_data, args)
fig.update_layout(width=1000, height=500)
fig  # Display figure in Jupyter notebook

This plot shows trucked water volumes grouped by destination over time. Note that the data plotted here is arbitrary and has been created solely for the purpose of demonstrating the capabilities of the `plot_bars()` function.

### 1.2 Static Plot
This example shows how to create a static plot with the following argument options:
 - Using the default plot title value (blank title) by not providing `"plot_title"`
 - Grouping the data by the column named "Origin": `"group_by": "Origin"`
 - Specifying the output file: `"output_file": "demo_bar.html"`
 - Specifying that we would like to print the data in the console: `"print_data": True`

In [None]:
# Using the imports from the above cells
args = {"group_by": "Origin", "output_file": "demo_bar.html", "print_data": True}

# We provide "labels" since they are required for this data format (see the "Input Data"
# section near the end of this notebook for a full explanation)
input_data = {
    "pareto_var": df_parameters["v_F_Trucked_Static"],
    "labels": [("Origin", "Destination", "Trucked Water (bbl)")],
}

fig = plot_bars(input_data, args)
fig.update_layout(width=1000, height=500)
fig  # Display figure in Jupyter notebook

The "Destination" column in the above printed table is an artifact that should be ignored. Since we grouped by "Origin", the first row of the table indicates that a cumulative total of 1850 bbl of trucked water has A as its origin, but the destinations for that water are various other nodes, not simply D.
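
The reason the "Destination" column carries no meaning after grouping can be seen in a small sketch. The rows below are hypothetical and are not the notebook's data, and the loop only illustrates the idea of a cumulative total per origin rather than how `plot_bars()` is implemented.

```python
from collections import defaultdict

# Hypothetical (Origin, Destination, value) rows; not the notebook's data
rows = [
    ("A", "D", 100.0),
    ("A", "E", 250.0),
    ("B", "D", 75.0),
]

totals_by_origin = defaultdict(float)
for origin, _destination, value in rows:
    # The destination is ignored: every row contributes to its origin's bar
    totals_by_origin[origin] += value

print(dict(totals_by_origin))
```

Each origin ends up with one total that combines all of its destinations, so any single destination shown next to that total would be misleading.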

<a id="example_2"></a>
# 2. Visualizing Optimization Model Results
Below are some examples of how to use results from an optimized PARETO model with the `generate_report()` method to create bar plots.

### 2.1 Animated Plot
The first example will show how to create an animated plot using the optimized model with the following argument options:
 - Setting the specific plot title with: `"plot_title": "Trucked Water"`
 - Using the default y-axis setting (linear) by not providing the argument: `"y_axis"`
 - Using the default group_by setting (first column in data) by not providing the argument: `"group_by"`
 - Specifying the name of the file and type of file with the argument: `"output_file": "demo_bar2.html"`
 - Using the default print_data setting (no printing) by not providing the argument: `"print_data"`
 
 __Step 1: Importing files and setting up the parameter and set lists__

In [None]:
from pareto.operational_water_management.operational_produced_water_optimization_model import (
    WaterQuality,
    create_model,
    ProdTank,
)
from pareto.utilities.results import generate_report, PrintValues, plot_bars
from pareto.utilities.solvers import get_solver, set_timeout
from pareto.utilities.get_data import get_data
from importlib import resources

# Tabs in the input Excel spreadsheet
set_list = [
    "ProductionPads",
    "CompletionsPads",
    "ProductionTanks",
    "ExternalWaterSources",
    "WaterQualityComponents",
    "StorageSites",
    "SWDSites",
    "TreatmentSites",
    "ReuseOptions",
    "NetworkNodes",
]
parameter_list = [
    "Units",
    "RCA",
    "FCA",
    "PCT",
    "FCT",
    "CCT",
    "PKT",
    "PRT",
    "CKT",
    "CRT",
    "PAL",
    "CompletionsDemand",
    "PadRates",
    "FlowbackRates",
    "ProductionTankCapacity",
    "DisposalCapacity",
    "CompletionsPadStorage",
    "TreatmentCapacity",
    "ExtWaterSourcingAvailability",
    "PadOffloadingCapacity",
    "TruckingTime",
    "DisposalOperationalCost",
    "TreatmentOperationalCost",
    "ReuseOperationalCost",
    "PadStorageCost",
    "PipelineOperationalCost",
    "TruckingHourlyCost",
    "ExternalSourcingCost",
    "ProductionRates",
    "TreatmentEfficiency",
    "ExternalWaterQuality",
    "PadWaterQuality",
    "StorageInitialWaterQuality",
]

__Step 2: Setting file name, passing file into `get_data()`, and creating the operational model__

In [None]:
# Load data
with resources.path(
    "pareto.case_studies", "operational_generic_case_study.xlsx"
) as fpath:
    [df_sets, df_parameters] = get_data(fpath, set_list, parameter_list)

# Additional input data
df_parameters["MinTruckFlow"] = 75
df_parameters["MaxTruckFlow"] = 37000

# Create mathematical model
operational_model = create_model(
    df_sets,
    df_parameters,
    default={
        "has_pipeline_constraints": True,
        "production_tanks": ProdTank.equalized,
        "water_quality": WaterQuality.false,
    },
)

__Step 3: Solve and optimize the model__

In [None]:
# Initialize Pyomo solver
opt = get_solver("gurobi_direct", "gurobi", "cbc")
set_timeout(opt, timeout_s=60)

# Solve mathematical model
results = opt.solve(operational_model, tee=True)
results.write()

__Step 4: Run `generate_report()`__

In [None]:
[model, results_dict] = generate_report(
    operational_model,
    is_print=PrintValues.essential,
    fname=None,
)

__Step 5: Setting up the parameters and using the returned data from `generate_report()` to run `plot_bars()`__

In [None]:
args = {"plot_title": "Trucked Water", "output_file": "demo_bar2.html"}

# Notice there are no labels provided since they are not required for the generate_report output format
input_data = {"pareto_var": results_dict["v_F_Trucked_dict"]}

fig = plot_bars(input_data, args)
fig.update_layout(width=1000, height=500)
fig  # Display figure in Jupyter notebook

### 2.2 Static Plot
This example shows how to create a static plot from the optimized model without specifying any argument options, so the method uses all of the default settings for the plot. Note that this example saves a file called "first_bar.html", as that is the default when no file name is provided:

In [None]:
# Using the imports from the above cells
args = {}

# Notice there are no labels provided since they are not required for the generate_report output format
input_data = {"pareto_var": results_dict["v_D_Capacity_dict"]}

fig = plot_bars(input_data, args)
fig.update_layout(width=1000, height=500)
fig  # Display figure in Jupyter notebook

# 3. Additional Information
<a id="section_3.1"></a>
## 3.1 Working With Data Indexed by Time
Notice that the data in the image below includes a column with the header "Time". This header is what allows the method to detect that the data is indexed by time and that it should therefore create an animated plot:

![time_indexed_data.png](attachment:time_indexed_data.png)

The Time column can be formatted in a few ways:
- __Full Dates:__
   - These can be formatted with forward slashes or dashes as long as the day, month and year have been provided.
        - __Year__ - The year should always be entered with all 4 digits so that the system is not confused about which value is the year when using any of the various naming conventions. For example, use "2022" instead of "22".
        - __Month__ - The month can be given as its numerical value, its abbreviated name, or its full name. For example, December can be entered as "12", "Dec", or "December". Note that a full month name will be abbreviated when it appears on the plot.
        - __Day__ - The day will always be the numerical value and should be placed logically so as to not confuse the system. For example, "2022/30/12" is not a normal structure for a date. Instead, it should be either "12/30/2022" or "2022/12/30".
   - There are many combinations that can be used, but we suggest using one of the following naming conventions for full dates: 
        - 2022/12/30 or 2022-12-30
        - 12/30/2022 or 12-30-2022
        - December/30/2022 or December-30-2022
        - 2022/December/30 or 2022-December-30
        - Dec/30/2022 or Dec-30-2022
        - 30/Dec/2022 or 30-Dec-2022
   - Please refer to [pandas documentation](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) for more details on how full dates can be formatted for `plot_bars()`.
- __One Letter Time Periods:__
  - These must be formatted with exactly one letter before the number of the time period. Although any letter can be used, we suggest using the following conventions:
      - T1 or T01 (time period), M1 or M01 (month), Y1 or Y01 (year), D1 or D01 (day)
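
The naming conventions above can be checked before data is passed to `plot_bars()`. The following is a minimal sketch that uses only the Python standard library; it is not how `plot_bars()` parses the Time column. The helper name `classify_time_label` is hypothetical, the format list covers only the suggested conventions, and month names are assumed to be in English.

```python
import re
from datetime import datetime

# strptime formats mirroring the suggested naming conventions for full dates
SUGGESTED_DATE_FORMATS = [
    "%Y/%m/%d", "%Y-%m-%d",
    "%m/%d/%Y", "%m-%d-%Y",
    "%B/%d/%Y", "%B-%d-%Y",
    "%Y/%B/%d", "%Y-%B-%d",
    "%b/%d/%Y", "%b-%d-%Y",
    "%d/%b/%Y", "%d-%b-%Y",
]

# Exactly one letter followed by the number of the time period, e.g. T1 or M01
ONE_LETTER_PERIOD = re.compile(r"^[A-Za-z]\d+$")


def classify_time_label(label):
    """Hypothetical helper: return "period", "date", or "unknown"."""
    if ONE_LETTER_PERIOD.match(label):
        return "period"
    for fmt in SUGGESTED_DATE_FORMATS:
        try:
            datetime.strptime(label, fmt)
            return "date"
        except ValueError:
            continue  # try the next suggested format
    return "unknown"


for label in ["2022/12/30", "Dec-30-2022", "T01", "2022/30/12"]:
    print(label, classify_time_label(label))
```

A label classified as "unknown" is not necessarily rejected by `plot_bars()`; it only falls outside the conventions suggested in this section.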

<a id="section_3.2"></a>
## 3.2 `input_data` Parameters
Input data for `plot_bars()` can be loaded in two different ways which were demonstrated above:
1. Importing data from an Excel workbook with `get_data()`.
2. Using data from a solved PARETO model with `generate_report()`.

Below is more info about the `input_data` parameter.

### Input Data
The first parameter passed into `plot_bars()` is `input_data`, which is a dictionary with the following keys:
 - __`"pareto_var"`__ (required): This key contains the data to be plotted. For example:
     - `[df_sets, df_parameters] = get_data(fname, set_list, parameter_list)` returns `df_parameters`, which contains a dictionary of the variable names and their data. So if `v_F_Trucked` is the data we want to plot, we would set `input_data["pareto_var"] = df_parameters["v_F_Trucked"]`.
 - __`"labels"`__ (only required for option 1, loading data from Excel): These labels are used to head the data and to describe the x- and y-axes. These labels should match the headings of the data in the Excel sheets. If using data generated by `generate_report()` (option 2), these labels do not have to be provided. The labels must be provided as a list containing a single tuple, as in the following example:
     - If the data being passed into `plot_bars()` contains rows of an Origin, Destination, Time and the amount of Trucked Water, then the value to be assigned to `"labels"` should be something like: `[("Origin", "Destination", "Time", "Trucked Water")]`. The value column of the data can be given any label desired, as this label will be used on the y-axis of the plot. The labels of all columns preceding the value column are case sensitive and can be used by the `"group_by"` argument (except for the "Time" column).

<div class="alert alert-block alert-warning">
<b>Warning:</b> Please note that the input data options are case sensitive.
</div>

```Python 
# Example 1 (plot Excel data):
input_data = {
    "pareto_var": df_parameters["v_F_Trucked"],
    "labels": [("Origin", "Destination", "Time", "Trucked Water")],
}

# Example 2 (plot solved model data, using the key from Section 2.1):
input_data = {
    "pareto_var": results_dict["v_F_Trucked_dict"],
    # "labels" is not required for this data format
}
```
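
The relationship between `"labels"` and the plot options described in this section can be summarized in a short sketch. The helper `describe_labels` is hypothetical and not part of PARETO; it only restates the rules from this notebook (the last label names the values, "Time" triggers an animated plot and cannot be used for grouping, and the first column is the default grouping).

```python
def describe_labels(labels):
    """Hypothetical helper: summarize what a "labels" value implies."""
    columns = labels[0]  # "labels" is a list holding a single tuple
    index_columns = columns[:-1]  # every column except the value column
    return {
        "animated": "Time" in index_columns,
        "group_by_choices": [c for c in index_columns if c != "Time"],
        "default_group_by": index_columns[0],
        "y_axis_label": columns[-1],
    }


summary = describe_labels([("Origin", "Destination", "Time", "Trucked Water")])
print(summary)
```

Because the comparison with "Time" is an exact string match, a header such as "time" would not be recognized in this sketch, which mirrors the case sensitivity noted in the warning above.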