# First-Year Writing Seminars: Extension

**Objectives:**
* Formulate a new approach to the FWS assignment problem using ideas from min-cost flow.
* See how adding and changing constraints and the objective function affects the solution.
* #TODO: clean this up

**Key Ideas:**
* integrality property
* the min-cost flow problem
* the transportation problem
* the assignment problem

**Reading Assignment:**
* Read Handout 7.5 on the min-cost flow problem.

**Brief description:** #TODO

In [7]:
# imports -- don't forget to run this cell!
import numpy as np
import pandas as pd
import math, itertools
import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms import bipartite
from ortools.linear_solver import pywraplp as OR
from ortools.graph import pywrapgraph as ORMC
from networkx.algorithms.flow import min_cost_flow

# Part 1: Min-Cost Flow Formulation

**Review** 

Recall the min-cost flow problem. It takes as input
* A directed graph $G = (V,A)$,
* costs $c(i,j)$ for shipping one unit of good from node $i$ to node $j$ for each arc $(i,j) \in A$,
* capacities $u(i,j)$ for each arc $(i,j) \in A$,
* supply values $b(i)$ for each node $i \in V$, such that $\sum_{i \in V} b(i) = 0$.

Remember also that at each node $i$, the supply value $b(i)$ is positive if there is supply at node $i$, negative if there is demand at node $i$, and zero if there is neither supply nor demand at node $i$ (i.e., node $i$ is a transit node). In network-flow terminology, supply nodes are "sources," demand nodes are "sinks," and transit nodes are interior nodes.

Our goal is to find a feasible flow, one that satisfies both the flow-capacity constraints and the flow-conservation constraints. That is, we wish to find a flow $f(i,j)$ on every arc such that $0 \leq f(i,j) \leq u(i,j)$ for every arc $(i,j) \in A$, and $\sum_{j:(i,j) \in A} f(i,j) - \sum_{j:(j,i) \in A} f(j,i) = b(i)$ for every node $i \in V$ (flow out of $i$ minus flow into $i$ equals the supply at $i$).

The objective value of a feasible solution is $\sum_{(i,j) \in A} c(i,j) \cdot f(i,j)$. We'd like to minimize this cost; in other words, we want to find a "min-cost" flow.



(The review above was adapted from Handout 7.5 and the min-cost flow lab.)
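As a quick illustration of these definitions, here is a minimal sketch that solves a tiny, made-up instance with networkx's `min_cost_flow` (a swapped-in solver, not the OR-Tools code used later in this notebook) and then checks the capacity and conservation constraints directly. One convention to watch: networkx stores a node attribute `demand` equal to $-b(i)$, so supply nodes get a negative demand. The toy data and the helper `is_feasible` are hypothetical.

```python
import networkx as nx

# Hypothetical toy instance: 's' supplies 2 units, 't' demands 2, 'a' is a transit node.
b = {"s": 2, "a": 0, "t": -2}      # supply values b(i); they sum to 0
arcs = {                            # (i, j): (cost c(i,j), capacity u(i,j))
    ("s", "a"): (1, 2),
    ("a", "t"): (1, 2),
    ("s", "t"): (3, 1),
}

G = nx.DiGraph()
for node, supply in b.items():
    # networkx's "demand" attribute is -b(i): negative demand marks a supply node
    G.add_node(node, demand=-supply)
for (i, j), (cost, cap) in arcs.items():
    G.add_edge(i, j, weight=cost, capacity=cap)

flow = nx.min_cost_flow(G)        # flow[i][j] = f(i, j)
total = nx.cost_of_flow(G, flow)  # sum of c(i,j) * f(i,j) over all arcs


def is_feasible(flow, b, arcs):
    """Check the capacity and flow-conservation constraints for a flow dict."""
    for (i, j), (_, cap) in arcs.items():
        if not 0 <= flow[i][j] <= cap:
            return False
    for i in b:
        out_flow = sum(flow[i].values())                 # flow leaving i
        in_flow = sum(flow[k].get(i, 0) for k in flow)   # flow entering i
        if out_flow - in_flow != b[i]:
            return False
    return True


print(flow["s"]["a"], flow["s"]["t"], total, is_feasible(flow, b, arcs))
```

On this instance the cheapest plan sends both units along s, a, t (cost 2 per unit) for a total cost of 4; routing one unit over the direct arc instead would cost 3 + 2 = 5.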

**Formulating the model**

In Handout 7.5, we learned that the transportation problem is a special case of the min-cost flow problem. Let's use this fact, along with the transportation model we built in the FWS lab, to formulate a min-cost flow model.

Nodes for the students and classes, as well as the special 'dummy' supply node, remain the same as before, as do our arcs and edge costs (which we'll experiment with later). All we need to do is define the capacities and supply values, and we'll be all set!

**Q1:** What should the capacity $u(i,j)$ be on each arc $(i,j)$ from a student node $i$ to a class node $j$?

**A:** <font color='blue'>1</font>

**Q2:** We also need to define the capacity $u(dummy,j)$ on each arc from the 'dummy' node to a class node $j$. We could set it to infinity, since there is an unlimited supply of "dummy students" we could assign to each class. Can you find a better upper bound? (Hint: a class cannot have more than 16 students.)

**A:** <font color='blue'>Yes; set $u(dummy,j) = 16$. More generally, set $u(dummy,j) = $ (max number of 'real' students in a section) $-$ (min number of 'real' students in a section), which here is $16 - 0 = 16$.</font>

Now, let's define our supply values $b(i)$.

**Q3:** For a student node $i$, what should the supply value $b(i)$ be?

**A:** <font color='blue'>1</font>

**Q4:** For a class node $j$, what should the supply value $b(j)$ be? (Remember that if there is demand at a node $k$, then $b(k) < 0$.)

**A:** <font color='blue'>-16</font>

Once again, we must account for our dummy supply node. Recall that for a min-cost flow input to be valid, the "net supply/demand" summed up over all nodes should be equal to 0:  $\sum_{i \in V} b(i) = 0$. 

Suppose we have $m$ students selecting from $n$ classes, each of which can have up to 16 students. 

**Q5:** Using this information, what should the supply value $b(dummy)$ be?

**A:** <font color='blue'>To satisfy our input condition $\sum_{i \in V} b(i) = 0$, we must have $\sum_{i \in \text{students}} b(i) + \sum_{j \in \text{classes}} b(j) + b(dummy) = 0$. Thus $m(1) - n(16) + b(dummy) = 0$, which gives $b(dummy) = 16n - m$.</font>

This should make sense intuitively: after every student has been assigned a class, whatever spots are left over are filled by our "fake students." (Of course, this assumes there are enough spots for every real student, i.e., $16n \geq m$.)
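To make the formulation concrete, the following sketch builds the network from **Q1** through **Q5** for a toy instance and solves it with networkx. It is a substitute for the OR-Tools formulation in fws_lab_ex, whose code is not shown here, so treat it as an illustration rather than that implementation. The helper `build_fws_network`, the tagged node labels, and the toy preferences are all hypothetical; as in the lab, a student's $k$th choice costs $k$.

```python
import networkx as nx


def build_fws_network(prefs, classes, csize, dcost):
    """Hypothetical helper: the FWS min-cost flow network from Q1-Q5.

    prefs maps each student to their ranked list of classes; the k-th choice costs k.
    networkx's node "demand" is -b(i).
    """
    m, n = len(prefs), len(classes)
    G = nx.DiGraph()
    for s, ranking in prefs.items():
        G.add_node(("student", s), demand=-1)                  # b(i) = 1
        for k, cls in enumerate(ranking, start=1):
            # u(i,j) = 1 on every student -> class arc
            G.add_edge(("student", s), ("class", cls), weight=k, capacity=1)
    G.add_node("dummy", demand=-(csize * n - m))               # b(dummy) = csize*n - m
    for cls in classes:
        G.add_node(("class", cls), demand=csize)               # b(j) = -csize
        # u(dummy,j) = csize, with the large dummy cost dcost
        G.add_edge("dummy", ("class", cls), weight=dcost, capacity=csize)
    return G


# Toy instance: 3 students, 2 classes of size 2, so b(dummy) = 2*2 - 3 = 1.
prefs = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["A", "B"]}
G = build_fws_network(prefs, ["A", "B"], csize=2, dcost=100)
flow = nx.min_cost_flow(G)
total = nx.cost_of_flow(G, flow)
print(total)  # 104: two students get A (cost 1 each), one gets B (cost 2), the dummy fills B
```

Node labels are tagged tuples so that a student and a class sharing the same ID cannot collide in the graph.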

Let's take a look at our formulation in Python.

\#TODO: Should I show the code implementation? It's less straightforward than what students have seen from ORTools (not an LP/IP formulation, but rather constructing a graph)

In [2]:
# ORTools formulation
from fws_lab_ex import mincostflow
mincostflow()

#TODO: figure out why optimal solution here differs from FWS lab optimal sol'n
#      Has something to do with dummy edge costs not working properly, not sure why that is

#from fws_lab_ex import mincostflow2
#mincostflow2()

10
Success
Minimum cost: 5432

Student cost: 5343
Preferences received:
1: 1525
2: 668
3: 384
4: 215
5: 94


Success! If everything ran properly, you should now have a working min-cost flow formulation for the FWS assignment problem. As we'll see, thinking in terms of flows can be helpful when working with different constraints.

You may have noticed that our formulation makes fairly simple assumptions. For example, based on your answer to **Q2**, a feasible (though expensive) solution might assign 16 fake 'filler' students to a section! It's also easy to imagine our model assigning just 1 or 2 "real" students to a less popular section that ranks low on students' preference lists.

The Knight Institute wants students to take full advantage of the diversity of FWS classes offered, so it decides to implement a new rule: each section must now have at least 6 students enrolled, but no more than 16 (as before).

**Q6:** How can we account for this "minimum class size constraint" in our model? (Hint: take a look at **Q2**)

**A:** <font color='blue'>(taken from answer to Q2) Set $u(dummy,j) = $ (max class size) $ - $ (min class size), or (in this case) $16 - 6 = 10$.</font>

Our mincostflow function also accepts a parameter called 'minstudents' that specifies the minimum number of "real" students assigned to each class section. (The code essentially does what you did in **Q6**.)

In [7]:
mincostflow(minstudents=0)

Success
Minimum cost: 5432

Student cost: 5343
Preferences received:
1: 1525
2: 668
3: 384
4: 215
5: 94


**Q7:** Play around with different values for the 'minstudents' parameter and see what outputs you get. What do you observe?

**A:** <font color='blue'>Answers may vary. You should see that a feasible flow exists for values of minstudents from 0 to 10, and that the input becomes infeasible for values from 11 to 16.</font>
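The threshold behavior in **Q7** can be reproduced on a toy instance. This sketch uses a hypothetical `solve_fws` helper built on networkx, not the notebook's mincostflow function. It enforces the minimum class size as in **Q6**, by setting $u(dummy,j)$ to (class size) $-$ minstudents (here the class size is 3), and it returns `None` when networkx raises `NetworkXUnfeasible`.

```python
import networkx as nx


def solve_fws(prefs, classes, csize, dcost, minstudents=0):
    """Hypothetical sketch: min cost of the FWS flow, or None if infeasible.

    The minimum class size enters only through the dummy arcs:
    u(dummy, j) = csize - minstudents, as in Q6.
    """
    m, n = len(prefs), len(classes)
    G = nx.DiGraph()
    for s, ranking in prefs.items():
        G.add_node(("student", s), demand=-1)          # b(i) = 1
        for k, cls in enumerate(ranking, start=1):
            G.add_edge(("student", s), ("class", cls), weight=k, capacity=1)
    G.add_node("dummy", demand=-(csize * n - m))       # b(dummy) = csize*n - m
    for cls in classes:
        G.add_node(("class", cls), demand=csize)       # b(j) = -csize
        G.add_edge("dummy", ("class", cls), weight=dcost,
                   capacity=csize - minstudents)
    try:
        flow = nx.min_cost_flow(G)
    except nx.NetworkXUnfeasible:                      # no flow meets every b(i)
        return None
    return nx.cost_of_flow(G, flow)


# Toy instance: only s4 lists class B, so B can never hold more than 1 real student.
prefs = {"s1": ["A"], "s2": ["A"], "s3": ["A"], "s4": ["A", "B"]}
for k in range(4):
    print(k, solve_fws(prefs, ["A", "B"], csize=3, dcost=100, minstudents=k))
# prints 0 205, 1 205, 2 None, 3 None (one pair per line)
```

Because only one toy student is interested in class B, the instance is feasible for minstudents of 0 or 1 and infeasible beyond that, the same kind of cutoff seen in **Q7**.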

Run the following cell, which outputs the least popular class (or classes) among students' preferences. (Define "least popular" as appearing the fewest times on students' preference lists.) If you'd like, read the comments alongside each line of code to understand what the function does.

In [11]:
def leastpopular(dataset='f09_fws_ballots.csv'):
    data = pd.read_csv(dataset) # reads in the dataset
    
    a = data[['1','2','3','4','5']].to_numpy() # 2D array of the class numbers each student ranked (one row per student)
    unique, counts = np.unique(a, return_counts=True) # counts how many times each class number appears (classes on no ballot are not counted)
    classlist = dict(zip(unique, counts)) # creates a dictionary of class number : number of preferences
    
    least_students = min(classlist.values()) # finds the minimum number of preferences in the dictionary
    res = [c for c in classlist if classlist[c] == least_students] # finds class number corresponding to min number of prefs.
    
    print('The class (or classes) with the least students interested is ' + str(res) + '.')
    print('Only ' + str(least_students) + ' students put this class as one of their top 5 preferences.') # prints results 
    
leastpopular()

The class (or classes) with the least students interested is [130].
Only 10 students put this class as one of their top 5 preferences.


**Q8:** Does this output make sense based on what you observed in **Q7**? Explain.

**A:** <font color='blue'>The function output states that the least popular class is listed as a preference by only 10 students. Since each student can only be assigned to one of their top 5 choices, at most 10 "real" students can be assigned to this class. So if we set 'minstudents' higher than 10, there aren't enough interested students to satisfy the minimum class size constraint, and the mincostflow function returns 'Infeasible'.</font>
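One caveat about leastpopular: np.unique only counts class numbers that appear on at least one ballot, so a section that nobody listed would be silently skipped, even though it would make any positive 'minstudents' infeasible. Below is a minimal pandas sketch that counts against a full list of offered sections; the toy ballots and the `offered` list are hypothetical. The resulting minimum is only a necessary condition: a 'minstudents' value above it is certainly infeasible, while values at or below it could still be infeasible for other reasons.

```python
import pandas as pd

# Hypothetical toy ballots: one row per student, columns '1'-'3' are ranked choices.
ballots = pd.DataFrame({"1": [101, 101, 102, 101],
                        "2": [102, 103, 101, 102],
                        "3": [103, 102, 104, 103]})
offered = [101, 102, 103, 104, 105]  # hypothetical section list; 105 is on no ballot

# Count how many students list each offered section, keeping zero counts.
listed = pd.Series(ballots[["1", "2", "3"]].to_numpy().ravel())
counts = listed.value_counts().reindex(offered, fill_value=0)

# A section can receive at most counts[j] real students, so any
# minstudents above counts.min() is guaranteed to be infeasible.
bound = counts.min()
print(counts.to_dict())
print(bound, counts.idxmin())  # 0 105
```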

# Part 2: Improving Our Integer Program

In [None]:
# An FWS assignment model

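# INPUTS:
# students: a list of students (not including the dummy)
# classes: a list of classes
# edges: a dictionary mapping (student, class) pairs to edge costs
# csize: the class capacity
# dcost: the cost on each dummy -> class edge, i.e., the cost of filling one seat with a dummy student
# solver: the OR-Tools solver type, e.g. OR.Solver.CBC_MIXED_INTEGER_PROGRAMMING
# RETURNS: a dictionary mapping each edge cost to the number of real students matched at that cost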
def Assign(students, classes, edges, csize, dcost, solver):
    STUDENT = students + ['dummy']  # create student list and add dummy node
    CLASS = classes                 # create class list
    EDGES = list(edges.keys())      # create edge list
    
    # the dummy node is labeled 0 in the edge list, with one edge (0, j) per class j
    newedges = list(itertools.product([0], CLASS))
    EDGES.extend(newedges)          # add dummy edges
    
    c = edges.copy()                # define c[i,j]
    for edge in newedges:
        c.update({edge : dcost})    # add c[dummy,j] costs
    
    # define model
    m = OR.Solver('assignFWS', solver)
    
    # decision variables
    x = {}    
    for i,j in EDGES:
        # define x(i,j) here
        x[i,j] = m.IntVar(0, m.infinity(), ('(%d, %s)' % (i,j))) 
        
    # define objective function here
    m.Minimize(sum(c[i,j]*x[i,j] for i,j in EDGES))
       
    # add constraint to ensure each student (not including the dummy) is assigned at most one class
    for k in students:
        if k != 'dummy':
            m.Add(sum(x[i,j] for i,j in EDGES if i==k) <= 1)
        
    # add constraint to ensure each class is full
    # HINT: Mimic the constraint code above 
    # ADD YOUR CODE HERE
    for k in classes:
        m.Add(sum(x[i,j] for i,j in EDGES if j==k) == csize)
    
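    status = m.Solve()
    # print(m.Objective().Value())
    if status != OR.Solver.OPTIMAL:
        print("Warning: solver did not report an optimal solution (status %d)" % status)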
    
    unmatched = []
    for k in STUDENT:
        if (round(sum(x[i,j].solution_value() for i,j in EDGES if i==k)) == 0) and (k!='dummy'):
            unmatched.append(k)
    print("Unmatched students:", len(unmatched))
    
    matched = {}
    for i,j in EDGES:
        if round(x[i,j].solution_value()) == 1:
            if c[i,j] in matched:
                matched[c[i,j]] += 1
            else:
                matched.update({c[i,j] : 1})
    if dcost in matched.keys():     # drop the dummy-edge tally so only real students are counted
        del matched[dcost]
    
    return matched

In [None]:
# read in the dataset

# 2886 students, 183 class sections
data = pd.read_csv('f09_fws_ballots.csv')
data.head()    # preview

In [None]:
# solve the instance

from fws_lab import inputData
students, classes, edges = inputData('f09_fws_ballots.csv')
data_sol = Assign(students, classes, edges, 16, 100000, OR.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
print(data_sol)

### Miscellaneous Stuff

Looking back at the FWS lab, recall that we found a solution that matched every student to a class section:

Unmatched students: 0<br>
{1: 1532, 4: 208, 2: 673, 3: 384, 5: 89}

**Q(???):** In formulating the input, we set the dummy edge costs to an arbitrarily large number (100,000). Behind the scenes, we assumed the cost of a student receiving their $k$th preference was $k$; that is, a student receiving their top choice costs 1, their second choice costs 2, and so on. Using this information, what should the objective value of our solution be? (Remember that there are 2886 students and 183 class sections, each with 16 seats.)

**A:** <font color='blue'>This question builds on Q5. We know that all 2886 students have been assigned a class section, and the total class capacity is $183 \times 16 = 2928$. Thus there are $2928 - 2886 = 42$ dummy students. The objective value is then the sum, over each "type" of edge, of the number of such edges used times its edge cost: $1532(1) + 673(2) + 384(3) + 208(4) + 89(5) + 42(100000) = 4,205,307$.</font>
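The arithmetic in this answer can be checked in a few lines of Python; the preference counts are copied from the lab output above, and the variable names are only illustrative.

```python
# Preference counts from the FWS lab solution: rank k -> number of students receiving it
received = {1: 1532, 2: 673, 3: 384, 4: 208, 5: 89}
n_sections, csize, dcost = 183, 16, 100000

students = sum(received.values())            # every student was matched
dummies = n_sections * csize - students      # leftover seats go to dummy students
objective = sum(rank * cnt for rank, cnt in received.items()) + dummies * dcost
print(students, dummies, objective)          # 2886 42 4205307
```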

To verify your answer, uncomment the following line, which appears right after m.Solve() in the Assign function:<br>
print(m.Objective().Value())<br>
Then re-run the notebook to check your answer.