# IEOR4575 Project
Instructor: Professor Shipra Agrawal\
Contributors: Yunhao Tang, Abhi Gupta

## State-Action Description

State s is an array with five components:

* s[0]: constraint matrix $A$ of the current LP ($\max -c^Tx \text{ s.t. } Ax \le b$). Its dimension is $m \times n$; you can check it by printing s[0].shape. Here $n$ is the (fixed) number of variables; for instances of size 60 by 60, $n$ stays fixed at 60. $m$ is the current number of constraints. Initially, $m$ equals the number of constraints in the IP instance (for instances generated with --num-c=60, $m$ is 60 at the first step), but $m$ increases by one at every step of the episode, because each action adds one new constraint (cut).
* s[1]: right-hand side $b$ of the current LP ($Ax \le b$). Its dimension is $m$, the number of rows of $A$.
* s[2]: coefficient vector $c$ from the LP objective ($-c^Tx$). Its dimension is the number of variables, i.e., $n$.
* s[3], s[4]: Gomory cuts available in the current round of Gomory's cutting plane algorithm. Each cut $i$ is of the form $D_i x \le d_i$. s[3] gives the matrix $D$ of cuts (of dimension $k \times n$) and s[4] gives the rhs $d$ (of dimension $k$). The number of available cuts $k$ changes from round to round; you can find it by printing the size of the last component of the state, i.e., s[4].size or s[-1].size.
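
As a quick check of the layout described above, the following sketch unpacks a state and reports $m$, $n$ and $k$. The `describe_state` helper and the synthetic state built with NumPy are illustrative assumptions, not part of the provided environment; with the real environment you would pass the `s` returned by `env.reset()` or `env.step()`.

```python
import numpy as np

def describe_state(s):
    """Return (m, n, k) for a state s = (A, b, c, D, d).

    Hypothetical helper for illustration; not part of the project code.
    """
    A, b, c, D, d = s
    m, n = A.shape
    k = d.size
    # Consistency checks implied by the state description.
    assert b.shape == (m,)
    assert c.shape == (n,)
    assert D.shape == (k, n)
    return m, n, k

# Synthetic stand-in for a real state (assumption: random values, shapes only).
rng = np.random.default_rng(0)
m, n, k = 60, 60, 7
fake_state = (rng.random((m, n)), rng.random(m), rng.random(n),
              rng.random((k, n)), rng.random(k))
print(describe_state(fake_state))  # (60, 60, 7)
```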

## Example
You can use the Jupyter notebook example.ipynb on Colab to familiarize yourself with the cutting plane environment that we have built for you.

If you are using an offline environment (not Colab), you can use the example.py file instead.
```
$ python example.py
```

## TASK
Train on two training environments, easy and hard:
* easy: 10 instances
* hard: 100 instances

All instances have size n=60, m=60, and the episode length is 50.

Submit your code and a report of at most 5 pages, including the algorithm, plots, etc.
Additional pages can be used to provide supplementary material which may or may not be reviewed, as necessary.

The two environments can be loaded using the following two configs (see example.py). Each mode is characterized by a set of parameters that define the cutting plane environment.

The easy setup defines the environment as follows:
```
easy_config = {
    "load_dir"        : 'instances/train_10_n60_m60',
    "idx_list"        : list(range(10)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}
```
For your reference, the maximum total sum of rewards achievable in any given episode in the easy mode is 2.947 +- 0.5469.


The hard setup defines the environment as follows:
```
hard_config = {
    "load_dir"        : 'instances/train_100_n60_m60',
    "idx_list"        : list(range(99)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}
```
On average, the maximum total sum of rewards achievable in any given episode in the hard mode is 2.985 +- 0.8427. However, achieving a total reward close to 1 (i.e., closing the integrality gap by 1) is reasonably good performance and can be achieved with what we have learned in this course.

The main difference between the easy and hard modes is the number of training instances: easy contains 10 instances while hard contains 100. Please read the ```example.py``` script for further details about what these environment parameters mean.
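
To compare an agent against the reference values quoted above, one option is to report the mean and standard deviation of its total episode rewards. The sketch below assumes the per-episode totals have already been collected in a list; the `summarize_returns` helper is hypothetical, the numbers are made up for illustration, and the use of the population standard deviation is an assumption (the reference values do not say which convention they use).

```python
import numpy as np

def summarize_returns(episode_returns):
    """Mean and population standard deviation of total episode rewards."""
    returns = np.asarray(episode_returns, dtype=float)
    return returns.mean(), returns.std()

# Made-up totals for illustration only; these are not results from the project.
example_returns = [0.5, 1.0, 1.5]
mean, std = summarize_returns(example_returns)
print(f"{mean:.3f} +- {std:.3f}")
```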

## Generating New Instances (Optional)

To make sure your algorithm generalizes to instances beyond those in the instances folder, you can create new environments with random IP instances and train/test on those. To generate new instances, run the following script. This will create 100 new instances with 60 constraints and 60 variables.

You can show generalization performance on new instances that you didn't train for, for extra credit. You can also show other aspects of your solution like robustness to size of instances.

```
$ python generate_randomip.py --num-v 60 --num-c 60 --num-instances 100
```

The above instances will be saved in a directory named 'instances/randomip_n60_m60'. You can then load these instances into the gym env and train a cutting agent. The following command loads the instance with index 50 and runs an episode with horizon 50:

```
python testgymenv.py --timelimit 50 --instance-idx 50 --instance-name randomip_n60_m60
```

You should see step information printed until the episode ends.

If you do not provide --instance-idx, the environment will load a random instance out of the 100 instances in every episode. It is sometimes easier to start by training on a single instance instead of a pool of instances.
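
When working from Python rather than the command line, the same choice between a single instance and a pool can be expressed through the `idx_list` entry of a config dict in the format used by example.py. The `make_config` helper below is a hypothetical convenience, not part of the project code; the resulting dict is meant to be passed as `make_multiple_env(**config)`, on the assumption that `idx_list` selects which instance indices are loaded.

```python
def make_config(load_dir, idx_list, timelimit=50):
    """Build an environment config dict in the format used by example.py."""
    return {
        "load_dir": load_dir,
        "idx_list": list(idx_list),
        "timelimit": timelimit,
        "reward_type": "obj",  # kept fixed, as in the provided configs
    }

# A single generated instance (index 50) ...
single_config = make_config("instances/randomip_n60_m60", [50])
# ... versus a pool made of the first 20 generated instances.
pool_config = make_config("instances/randomip_n60_m60", range(20))
print(single_config["idx_list"], len(pool_config["idx_list"]))
```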

## Notes

- The env is not exactly equivalent to a standard gym env, where the state and action spaces are fixed. Here, the sizes of the state and action spaces vary over time, so the RL agent needs to handle variable state-action spaces.
- The env uses a Python interface and computes the optimal LP solution using Gurobi. If you are not using Colab, make sure Gurobi is installed and the license is valid. There is a free academic license as well as an online-course limited-use license available. See the installation instructions below. You don't need to do this if you are using the example Jupyter notebook in Colab.
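
One possible way to cope with a varying number of cuts is to score every candidate cut with the same parameters and normalize the scores with a softmax, so that the parameter count does not depend on $k$. The sketch below uses a plain linear score on synthetic data; the function names, the feature choice (cut coefficients plus rhs) and the linear form are all assumptions made for illustration, not a prescribed solution.

```python
import numpy as np

def cut_features(D, d):
    """Stack each cut's coefficients and rhs into one row: shape (k, n + 1)."""
    return np.hstack([D, d.reshape(-1, 1)])

def cut_probabilities(D, d, w):
    """Softmax over per-cut linear scores; works for any number of cuts k."""
    scores = cut_features(D, d) @ w      # shape (k,)
    scores = scores - scores.max()       # shift for numerical stability
    p = np.exp(scores)
    return p / p.sum()

rng = np.random.default_rng(0)
n = 60
w = rng.normal(size=n + 1)               # shared weights, independent of k
for k in (5, 12):                        # the same w handles different k
    D, d = rng.random((k, n)), rng.random(k)
    p = cut_probabilities(D, d, w)
    action = rng.choice(k, p=p)          # sample one cut index
    print(k, p.shape, action)
```

Because the weights act on one cut at a time, the same `w` can be reused when the number of available cuts changes from one step to the next.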

## Installation
```
$ conda install -c gurobi gurobi
```

In addition, you need an academic license from Gurobi. After obtaining the license, go to the license page:

https://www.gurobi.com/downloads/end-user-license-agreement-academic/

To activate the license, run the **grbgetkey** command with the license key shown there. After this step, you can use the `ieor4575` environment that you have used for labs to complete the class project.

## WandB for Visualization
Class labs have made extensive use of wandb to familiarize you with some great machine learning visualization tools. You are encouraged to use wandb in the development of this project. See example notebook for the project name to use. You can move your best runs to the leaderboard.



## See README.md file for further details about the project and the environment.

### State-Action Description

### Actions
There are k=s[4].size actions available in each state $s$, with the $i^{th}$ action corresponding to the $i^{th}$ cut, i.e., the inequality $D_i x\le d_i$ given by $s[3], s[4]$.
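
As a small illustration of this indexing, the sketch below picks a cut index uniformly at random and looks up the corresponding row of s[3] and entry of s[4]. The `random_action` helper and the synthetic state are assumptions for illustration; the one-element list mirrors how the example code in this notebook passes actions to `env.step`.

```python
import numpy as np

def random_action(s, rng):
    """Pick one of the k available cuts uniformly at random.

    Returns a one-element list holding the chosen cut index.
    """
    k = s[-1].size
    return [int(rng.integers(0, k))]

rng = np.random.default_rng(0)
# Synthetic state with k = 4 cuts (assumption: shapes only, arbitrary values).
s = (np.zeros((60, 60)), np.zeros(60), np.zeros(60),
     rng.random((4, 60)), rng.random(4))
a = random_action(s, rng)
D_i, d_i = s[3][a[0]], s[4][a[0]]   # the chosen cut D_i x <= d_i
print(a, D_i.shape)
```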

In [2]:
#Run below after copying the folder "Project_learn2cut" to your google drive

#You will need to allow google drive to mount

# from google.colab import drive
# drive.mount('/content/drive')
# from google.colab import files

#IMPORTANT change below to 
#!cp -av /content/drive/<path>  /content/ 
#where <path> is the path to folder Project_learn2cut in your google drive. You can click on the folder icon on left and navigate to the path of this folder under drive/MyDrive to find the path.

# !cp -av /content/drive/MyDrive/Colab\ Notebooks/ORCS4529\ Spring\ 2022\ public/Project_learn2cut/* /content/

In [None]:
# !pip install -i https://pypi.gurobi.com gurobipy

In [None]:
# !pip install wandb -qqq

In [3]:
import gymenv_v2
from gymenv_v2 import make_multiple_env
import numpy as np


import wandb
wandb.login()
run=wandb.init(project="finalproject", entity="orcs4529", tags=["training-easy"])

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /Users/joaoromeuferraz/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mjrferraz[0m ([33morcs4529[0m). Use [1m`wandb login --relogin`[0m to force relogin


In [4]:
# Setup: You may generate your own instances on which you train the cutting agent.
custom_config = {
    "load_dir"        : 'instances/randomip_n60_m60',   # this is the location of the randomly generated instances (you may specify a different directory)
    "idx_list"        : list(range(20)),                # take the first 20 instances from the directory
    "timelimit"       : 50,                             # the maximum horizon length is 50
    "reward_type"     : 'obj'                           # DO NOT CHANGE reward_type
}

# Easy Setup: Use the following environment settings. We will evaluate your agent with the same easy config below:
easy_config = {
    "load_dir"        : 'instances/train_10_n60_m60',
    "idx_list"        : list(range(10)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}

# Hard Setup: Use the following environment settings. We will evaluate your agent with the same hard config below:
hard_config = {
    "load_dir"        : 'instances/train_100_n60_m60',
    "idx_list"        : list(range(99)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}

In [5]:
env = make_multiple_env(**easy_config) 

loading training instances, dir instances/train_10_n60_m60 idx 0
loading training instances, dir instances/train_10_n60_m60 idx 1
loading training instances, dir instances/train_10_n60_m60 idx 2
loading training instances, dir instances/train_10_n60_m60 idx 3
loading training instances, dir instances/train_10_n60_m60 idx 4
loading training instances, dir instances/train_10_n60_m60 idx 5
loading training instances, dir instances/train_10_n60_m60 idx 6
loading training instances, dir instances/train_10_n60_m60 idx 7
loading training instances, dir instances/train_10_n60_m60 idx 8
loading training instances, dir instances/train_10_n60_m60 idx 9


In [19]:
s = env.reset()

In [20]:
d = False
t = 0 
repisode = 0

In [21]:
print([s_.shape for s_ in s])

[(60, 60), (60,), (60,), (60, 60), (60,)]


In [26]:
a = np.random.randint(0, s[-1].size, 1)

In [27]:
s, r, d, _ = env.step(list(a))

In [28]:
print([s_.shape for s_ in s])

[(62, 60), (62,), (60,), (62, 60), (62,)]


In [None]:


# env was created above with make_multiple_env(**easy_config)

for e in range(20):
    # gym loop
    s = env.reset()   # samples a random instance every time env.reset() is called
    d = False         # done flag
    t = 0             # step counter within the episode
    repisode = 0      # running total of rewards in the episode

    while not d:
        # Take a random action
        a = np.random.randint(0, s[-1].size, 1)  # s[-1].size shows the number of actions, i.e., cuts available at state s

        # Simulate the environment to get the next state
        s, r, d, _ = env.step(list(a))
        print('episode', e, 'step', t, 'reward', r, 'action space size', s[-1].size, 'action', a[0])

        A, b, c0, cuts_a, cuts_b = s
        # print(A.shape, b.shape, c0.shape, cuts_a.shape, cuts_b.shape)

        t += 1
        repisode += r

        # wandb logging
        # make sure to use the correct tag in wandb.init in the initialization on top
        wandb.log({"Training reward (easy config)": repisode})
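
The loop above can also be factored into a function that takes any policy and returns the total reward of one episode, which makes it easier to swap the random policy for a learned one. The sketch below is self-contained: `StubEnv` is a hypothetical stand-in that only mimics the `reset`/`step` interface with a constant reward of 0.1, and `run_episode` is an illustrative helper, neither of which is part of the project code.

```python
import numpy as np

class StubEnv:
    """Tiny stand-in with the same reset/step interface (hypothetical)."""
    def __init__(self, horizon=3, n=4, seed=0):
        self.horizon, self.n = horizon, n
        self.rng = np.random.default_rng(seed)

    def _state(self, m):
        k = 2  # fixed number of candidate cuts in this stub
        return (np.zeros((m, self.n)), np.zeros(m), np.zeros(self.n),
                self.rng.random((k, self.n)), self.rng.random(k))

    def reset(self):
        self.t, self.m = 0, 5
        return self._state(self.m)

    def step(self, action):
        self.t += 1
        self.m += 1  # one cut is added per step
        done = self.t >= self.horizon
        return self._state(self.m), 0.1, done, {}

def run_episode(env, policy):
    """Run one episode and return the total (undiscounted) reward."""
    s, done, total = env.reset(), False, 0.0
    while not done:
        s, r, done, _ = env.step(policy(s))
        total += r
    return total

rng = np.random.default_rng(1)
policy = lambda s: [int(rng.integers(0, s[-1].size))]  # uniform random cut
total = run_episode(StubEnv(), policy)
print(round(total, 6))  # 0.3
```

With the real environment, `StubEnv()` would be replaced by the `env` created with `make_multiple_env`, and the returned total could be logged once per episode.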


     