# Spatially explicit optimization - Adjacency 

#### Category: Integer programming (IP)

#### What is it about?
- Use integer programming to solve a problem with spatial constraints.
- Read input data from Excel and a shapefile.
- Process and display the solver results in various ways. Add the result to an existing shapefile for further processing in GIS software.

## Introduction

The Uetliberg forest is divided into more than 600 stands, which serve as decision units for management actions. All stands in the timber stage (n=266) have been selected for __harvest in the next n upcoming periods__.
The stands should be harvested in order to __maximize harvested timber volume__ under the restriction that the size of openings after harvest is limited. Therefore, the management authority has set the policy that __harvesting of neighboring stands during the same period is not allowed__. The authority would like to know how to schedule harvest of the stands in order to concurrently meet the objective and the harvesting constraint. - Slides SAMO Exercise 7

The following tasks have already been taken care of by a colleague:
- All stands in the timber stage were selected and saved in a shapefile in folder shp.
- A one-sided adjacency list of all stands in the timber stage was generated using GIS software and saved as an Excel document in folder xls.

## Mathematical model

#### Description
$
\begin{equation*}
n_{stands}: \text{Number of stands} \\
n_{periods}: \text{Number of periods}\\
S : \text{Set of stand IDs}\\
P : \text{Set of time periods}\\
X_{p,s} : \text{Binary variable whether stand s is harvested in time period p}\\
vol: \text{Volumes of stands in $m^3$}\\
adja\_list: \text{Adjacency list of stands, one-sided}\\
\end{equation*}
$

#### Index sets
$
\begin{equation*}
P = \{0,1,2, ... , (n_{periods}-1)\}\\
S = \{0,1,2, ... , (n_{stands}-1)\}\\
\end{equation*}
$

#### Decision variables
$
\begin{equation*}
X_{p,s} \qquad p \in P, \: s \in S, \: X \in \{0,1\}\\
\end{equation*}
$

#### Objective
$
\begin{equation*}
MAX \: \sum\limits_{p \in P} \sum\limits_{s \in S}vol_sX_{p,s}\\
\end{equation*}
$

#### Constraint: Harvest each stand only once
$
\begin{equation*}
\sum\limits_{p \in P}X_{p,s} \leq 1 \qquad \forall \: s \in S\\
\end{equation*}
$

#### Constraint: Adjacency restriction
$
\begin{equation*}
X_{p,k} + X_{p,l} \leq 1 \qquad \forall \: p \in P, \: \forall \:k,l \in adja\_list\\
\end{equation*}
$
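
To make the formulation concrete, the objective and both constraints can be evaluated by hand for a candidate schedule. The following is a minimal sketch in plain Python. The four stands, their volumes, the adjacency pairs and the schedule `X` are made-up toy values, not the Uetliberg data.

```python
# Toy instance (hypothetical data, not the Uetliberg stands).
vol = [120.0, 80.0, 150.0, 60.0]      # volume per stand in m^3
adja_list = [(0, 1), (1, 2), (2, 3)]  # one-sided list of adjacent stand pairs
n_periods = 2
n_stands = len(vol)

# X[p][s] == 1 means stand s is harvested in period p.
X = [
    [1, 0, 1, 0],  # period 0: stands 0 and 2
    [0, 1, 0, 1],  # period 1: stands 1 and 3
]

# Objective: total harvested volume over all periods and stands.
objective = sum(vol[s] * X[p][s] for p in range(n_periods) for s in range(n_stands))

# Constraint 1: each stand is harvested at most once.
harvest_once_ok = all(
    sum(X[p][s] for p in range(n_periods)) <= 1 for s in range(n_stands)
)

# Constraint 2: adjacent stands are never harvested in the same period.
adjacency_ok = all(
    X[p][k] + X[p][l] <= 1 for p in range(n_periods) for k, l in adja_list
)

print(objective, harvest_once_ok, adjacency_ok)
```

In this toy case the schedule is feasible and harvests every stand. With the real data the solver has to search for such a schedule, and some stands may remain unharvested.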

## Pyomo implementation

#### Important: Because the following code cells build on each other, you MUST run every code cell from here on! If you get an error, select the cell, click "Cell" -> "Run All Above" in the taskbar above, and then run the cell again.

#### Suggested workflow
1. Load all needed packages and data in your script and transform the data into a suitable structure.
2. Create a model object.
3. Define the index sets.
4. Based on the index sets, define the decision variables.
5. Specify the objective.
6. Specify the constraints.
7. Decide on a suitable solver depending on your problem and solve it.
8. Process the results.

### Step 1: Load all needed packages and data in your script and transform the data into a suitable structure
- Import everything from pyomo.environ to use it without prefix.
- Import numpy for array processing.
- Import pandas to read the Excel file containing the adjacency list into a dataframe.
- Import geopandas to read and write shapefiles from and to dataframes.
- Import display from IPython to nicely display pandas and geopandas dataframes.

In [None]:
from pyomo.environ import *
import numpy as np
import pandas as pd
import geopandas as gpd
from IPython.display import display

Specify the path to the solver executable:

In [None]:
# For windows: r'../_Solvers/Cbc-2.9.9-win32-msvc14/bin/cbc.exe'
# For ubuntu bionic beaver: r'../_Solvers/Ubuntu_Bionic/Cbc-2.9.8/bin/cbc'
solver_path = r'../_Solvers/Cbc-2.9.9-win32-msvc14/bin/cbc.exe'

Specify the number of time periods to consider in the optimization:

In [None]:
n_periods = 3

Use pandas (prefixed pd) to read the adjacency list from an Excel file into a pandas dataframe. Similar to dataframes in R, pandas dataframes are two-dimensional labelled data structures that provide some convenient functionality, such as the to_string() function for nice printing. If you are into data analysis and want to learn more about pandas, you could start <a href="https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python" target="_blank">here</a>.

Note the use of the imported IPython function __display()__ to nicely display the data. Alternatively, you could use the standard print(x) to display a variable x. If x is a pandas or geopandas dataframe, you can also use print(x.to_string()). Feel free to give it a try!

In [None]:
adja_list_df = pd.read_excel('xls/AdjacencyUetliberg.xls', usecols=[1, 2])
display(adja_list_df)

Because pandas dataframes are based on numpy arrays, we can extract the underlying numpy array by accessing the .values attribute.
The adjacency list is 1-based, but numpy's Matlab-like syntax makes it easy to subtract 1 from every element to make it 0-based, so that the stand IDs match the index set $S$. That's it, the adjacency list $adja\_list$ is ready!

In [None]:
adja_list = adja_list_df.values
adja_list = adja_list - 1
print(adja_list)
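
After shifting the IDs, it can be worth checking that every ID is a valid 0-based position, because an off-by-one error here would silently link the wrong stands. The following is a small sketch in plain Python. The pairs and the stand count are hypothetical toy values, and the real `n_stands` is only derived further below.

```python
# Hypothetical 1-based adjacency pairs, as a GIS export might number them.
adja_list_1_based = [(1, 2), (2, 3), (3, 4)]
n_stands = 4

# Shift every stand ID down by one so it matches the 0-based index set S.
adja_list = [(k - 1, l - 1) for k, l in adja_list_1_based]

# Every shifted ID must be a valid position in range(n_stands).
ids_valid = all(0 <= s < n_stands for pair in adja_list for s in pair)
print(adja_list, ids_valid)
```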

Next, geopandas (prefixed gpd) is used to read the shapefile containing the forest stand polygons into a geopandas dataframe. The attribute table of the shapefile contains a column with the stand volumes that we need for our model. Feel free to have a look at the shapefile in your favorite GIS program.

In [None]:
shp = gpd.read_file("shp/StandsUetliberg.shp")
display(shp)

Using typical pandas syntax, square brackets allow us to easily extract a column from the dataframe. Accessing the .values attribute again returns the underlying numpy array. Et voilà, the volume array $vol$ is ready for use in our model!

In [None]:
vol = shp['Vol'].values
print(vol)

There are several ways to get the number of stands. Here the length of the volume vector is used:

In [None]:
n_stands = len(vol)
print(n_stands)

__That is it for the data preparation!__ The following data is now ready in a suitable form to be used in the model:
- $n_{periods}$
- $n_{stands}$
- $adja\_list$
- $vol$

### Step 2: Create a model object
Create a concrete model object and save it as mo:

In [None]:
mo = ConcreteModel()

### Step 3: Define the index sets
$
\begin{equation*}
P = \{0,1,2, ... , (n_{periods}-1)\}\\
S = \{0,1,2, ... , (n_{stands}-1)\}\\
\end{equation*}
$
<br>
We define one index set of integers for the time periods and one for the stand IDs. The range() function is well suited for that purpose. Remember that range(x) yields the consecutive integers from 0 up to but not including x:

In [None]:
mo.P = Set(initialize=range(n_periods))
mo.S = Set(initialize=range(n_stands))

In [None]:
mo.P.pprint()

In [None]:
mo.S.pprint()

### Step 4: Based on the index sets, define the decision variables
$
\begin{equation*}
X_{p,s} \qquad p \in P, \; s \in S, \: X \in \{0,1\}\\
\end{equation*}
$
<br>
This is an example of a decision variable being indexed over two index sets. For each combination of indices, a decision variable is created. Note how closely the code resembles the mathematical notation. For this model there are a total of 798 decision variables (3 time periods * 266 stands). They can be accessed via __X[period, stand]__.

In [None]:
mo.X = Var(mo.P, mo.S, within=Binary, initialize=0)

In [None]:
mo.X.pprint()

### Step 5: Specify the objective
$
\begin{equation*}
MAX \: \sum\limits_{p \in P} \sum\limits_{s \in S}vol_sX_{p,s}\\
\end{equation*}
$
<br>
A generator expression with two for clauses (the equivalent of nested for-loops) sums over both index sets:

In [None]:
mo.obj = Objective(sense=maximize,
                   expr=sum(vol[s] * mo.X[p,s] for p in mo.P for s in mo.S))

In [None]:
mo.obj.pprint()

### Step 6: Specify the constraints

#### Constraint: Harvest each stand only once
$
\begin{equation*}
\sum\limits_{p \in P}X_{p,s} \leq 1 \qquad \forall \: s \in S \\
\end{equation*}
$
<br>
This is a constraint that needs to be specified for each stand, as indicated by the $\forall \: s \in S$. This calls for the use of the component __ConstraintList__, a container for Constraint components. Constraints are added to a ConstraintList by using its __.add()__ function, whereby the constraint to be added is specified with the __expr__ keyword.

First, a ConstraintList object is created and added to the model. A for-loop then iterates over all items in the stand index set and adds a constraint for each.

In [None]:
mo.c_harvest_once = ConstraintList()
for s in mo.S:
    mo.c_harvest_once.add(expr=sum(mo.X[p, s] for p in mo.P) <= 1)

In [None]:
mo.c_harvest_once.pprint()

<hr>

### Rule of thumb:
1. Sum symbols $\sum$ in the mathematical notation can be transferred to pyomo code by using sum() with a generator expression (list-comprehension-like syntax).
2. For-all symbols $\forall$ can be transferred to pyomo code by using a ConstraintList and a for-loop to iterate over the specified set. Multiple $\forall$ symbols require nested for-loops.

<hr>

#### Constraint: Adjacency restriction
$
\begin{equation*}
X_{p,k} + X_{p,l} \leq 1 \qquad \forall \: p \in P, \: \forall \:k,l \in adja\_list\\
\end{equation*}
$
<br>
Following the rule of thumb above, $\forall \: p \in P$ and $\forall \:k,l \in adja\_list$ can be transferred to pyomo using nested for-loops. Note the simultaneous loop variable assignment in the inner loop, as adja_list is basically an array consisting of nested 2-element arrays of adjacent stand IDs.

In [None]:
mo.c_adjacency_restriction = ConstraintList()
for p in mo.P:
    for k, l in adja_list:
        mo.c_adjacency_restriction.add(expr=mo.X[p, k] + mo.X[p, l] <= 1)

In [None]:
mo.c_adjacency_restriction.pprint()
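
The nested loops create one constraint per period and per entry of the adjacency list, which is why a one-sided list is convenient: a two-sided list would state every restriction twice. The sketch below uses a hypothetical two-sided toy list to show how it could be reduced and how the constraint count changes. It is plain Python and independent of the model above.

```python
# Hypothetical two-sided adjacency list: each neighbor pair appears in both directions.
two_sided = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]

# Keep each unordered pair once by sorting the IDs within the pair.
one_sided = sorted({tuple(sorted(pair)) for pair in two_sided})

n_periods = 3
# One constraint is generated per period and per adjacent pair.
n_constraints_two_sided = n_periods * len(two_sided)
n_constraints_one_sided = n_periods * len(one_sided)

print(one_sided)
print(n_constraints_two_sided, n_constraints_one_sided)
```

The duplicated constraints would not change the optimal solution, they would only make the model larger than necessary.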

### Step 7: Decide on a suitable solver depending on your problem and solve it
Save the model structure to a file named opti_model.txt in folder logs. Then create a solver object using the CBC solver, as this is an IP. Save the solver log to solver_log.txt in folder logs. With the keyword tee=True, the log is also printed.

In [None]:
with open('logs/opti_model.txt', 'w') as f:
    mo.pprint(ostream=f)

In [None]:
print('--- start solver ---')
solver = SolverFactory('cbc', executable=solver_path)
solver.solve(mo, tee=True, logfile='logs/solver_log.txt')
print('--- finished ---')

### Step 8: Process the results

The goal of this section is to convert the model solution from its binary form into an aggregated list, where each entry holds the time period in which the corresponding stand is harvested. This list is then added to the geopandas dataframe containing the stand polygons and exported as a shapefile. The newly created attribute can then be used to visualize the optimal harvesting schedule in the GIS application of your choice.

Let's first have a look at the decision variables. We use the items() function, which returns key-value pairs that are assigned to the loop variables k and v respectively. Note that k is a tuple of the form (period, stand), because the decision variables use a 2-dimensional key. Also note that the function value() must be used on v to retrieve the value of the decision variable.

In [None]:
print('P | S | V') # Time period | Stand ID | 1 if stand is harvested, 0 otherwise
print('-' * 9)
for k,v in mo.X.items():
    period = k[0]
    stand = k[1]
    harvested = value(v)
    print(str(period) + ' | ' + str(stand) + ' | ' + str(int(harvested)))

Next, we need to aggregate the results from binary decision variables into a list that contains the harvest period for each stand. Some stands may not get harvested due to the constraints. We want to assign them the value -1 in the aggregated list. To that end, let's first create a numpy array with a default value of -1 for each stand using the numpy function <a href="https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.full.html" target="_blank">np.full()</a>:

In [None]:
aggregated_periods = np.full(n_stands, fill_value=-1)
print(aggregated_periods)

Because of the constraint that every stand can be harvested only once, each stand can have at most one non-zero decision variable value over all time periods. 

That means, we can loop over all decision variables and check their value. If the value is larger than 0 (meaning the stand is harvested), we index into position k[1] of the aggregated_periods array and set its value to k[0]. __Remember:__ k[1] is the stand ID of the current decision variable and k[0] is the time period.

In [None]:
for k, v in mo.X.items():
    if value(v) > 0:
        aggregated_periods[k[1]] = k[0]
print(aggregated_periods)
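
The same aggregation can be traced on a tiny example without a solver. The dictionary below is a hypothetical stand-in for `mo.X.items()`, with made-up values for two periods and three stands. Comparing against 0.5 instead of 0 is an assumption of this sketch, not what the notebook does: it guards against values such as 0.9999999 or 1e-9 that a solver may return within its numerical tolerance.

```python
# Hypothetical solver output: (period, stand) -> value of the decision variable.
solution = {
    (0, 0): 1, (0, 1): 0, (0, 2): 0,
    (1, 0): 0, (1, 1): 1, (1, 2): 0,
}
n_stands = 3

# Start with -1 (not harvested) and overwrite where a variable is switched on.
periods = [-1] * n_stands
for (period, stand), harvested in solution.items():
    # Threshold of 0.5 is an assumption to tolerate near-0 and near-1 values.
    if harvested > 0.5:
        periods[stand] = period

print(periods)
```

Stand 2 is never harvested in this toy solution, so it keeps the default value -1.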

It is more intuitive if the first time period is 1 instead of 0 and the 3rd is 3 instead of 2. However, stands that are not harvested are perfectly fine at -1. Let's therefore increase all elements by 1, except for the ones that are -1. As you see below, numpy arrays support __boolean indexing__ the same way as R and Matlab do.

In [None]:
aggregated_periods[aggregated_periods > -1] += 1
print(aggregated_periods)
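
Boolean indexing happens in two steps, which the one-liner above combines. The following sketch separates them on a hypothetical toy array so that the intermediate mask is visible.

```python
import numpy as np

# Hypothetical aggregated result: -1 marks a stand that is never harvested.
periods = np.array([0, 2, -1, 1, -1])

# The comparison yields a boolean mask with one entry per element.
mask = periods > -1
print(mask)

# Only positions where the mask is True are incremented.
periods[mask] += 1
print(periods)
```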

When processing data, it is worth pausing from time to time to write a quick test that checks whether everything works as expected. This may save a lot of frustration and time down the road. For the first 10 entries, let's check if our aggregation makes sense, i.e. whether the aggregated periods array is consistent with the original binary result. If you are interested in how the join() function works, have a look <a href="https://www.tutorialspoint.com/python/string_join.htm" target="_blank">here</a>.

In [None]:
print('Stand ID: Periods...')
for s in range(10):
    print(str(s) + ': ' + ' | '.join(str(int(value(mo.X[p, s]))) for p in range(n_periods)))
print('-' * 30)
print('Aggregated:')
print(aggregated_periods[:10])
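
A visual comparison covers only a few stands. As a complement, the adjacency restriction itself can be checked for the whole schedule. The helper `find_violations` below is a hypothetical name introduced for this sketch, and the toy data is made up. It returns every adjacent pair that shares a harvest period.

```python
def find_violations(periods, adja_list, not_harvested=-1):
    """Return adjacent pairs that are harvested in the same period."""
    return [
        (k, l)
        for k, l in adja_list
        if periods[k] != not_harvested and periods[k] == periods[l]
    ]

# Hypothetical toy data: stands 0-1-2-3 form a chain of neighbors.
toy_adja_list = [(0, 1), (1, 2), (2, 3)]
valid = find_violations([1, 2, 1, -1], toy_adja_list)    # no clash expected
clashing = find_violations([1, 1, 2, -1], toy_adja_list)  # stands 0 and 1 clash
print(valid, clashing)
```

Applied to the notebook's data, `find_violations(aggregated_periods, adja_list)` should return an empty list if the solution respects the adjacency restriction.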

Looks good! Now let's assign the aggregated period array as a new attribute "period" to the geopandas dataframe shp. Similar to a dictionary, a new column can be added by assigning data to a key that does not yet exist in the dataframe.

In [None]:
shp['period'] = aggregated_periods
display(shp)

The modified dataframe is exported as a shapefile named output.shp to folder result using geopandas' to_file() function.

In [None]:
shp.to_file(filename='result/output.shp', driver='ESRI Shapefile')