## Chapter 13
# The Linear Program

## 13.13 Lab: Learning through linear programming

**Task 13.13.1:** Write a procedure `main_constraint(i, a_i, d_i, features)` with the following spec:
* _input:_ patient ID `i`, feature vector `a_i`, diagnosis `d_i` (+1 or -1), and the set `features`
* _output:_ the vector $\boldsymbol{v}_i$ that should be row $i$ of $A$.

Try out your procedure on some data. Check that the resulting vector $\boldsymbol{v}_i$ is correct:
* The entry for a feature label should be positive if `d_i` is +1, negative if `d_i` is -1.
* The entry for label $i$ is 1.
* The entry for label `'gamma'` is negative if `d_i` is +1, positive if `d_i` is -1.

In [1]:
import sys
sys.path.append('../')
sys.path.append('../chapter_8') # contains cancer data & train/validate sets

In [2]:
from vec import Vec

def main_constraint(i, a_i, d_i, features):
    v_i = {
        i: 1,  # coefficient of the slack variable for patient i
        'gamma': -d_i,
    }
    for feature_label in features:
        v_i[feature_label] = d_i * a_i[feature_label]

    return Vec(set(v_i.keys()), v_i)

In [3]:
from cancer_data import read_training_data

feature_subset = {'area(worst)', 'smoothness(worst)', 'texture(mean)'}

train_A, train_b = read_training_data('../chapter_8/train.data', feature_subset)
validate_A, validate_b = read_training_data('../chapter_8/validate.data', feature_subset)

In [4]:
from matutil import mat2rowdict

features_by_patient_id = mat2rowdict(train_A)
patient_id = list(features_by_patient_id.keys())[0]
patient_features = features_by_patient_id[patient_id]

main_constraint(patient_id, patient_features, train_b[patient_id], patient_features.D)

Vec({862722, 'area(worst)', 'smoothness(worst)', 'gamma', 'texture(mean)'},{862722: 1, 'gamma': 1, 'smoothness(worst)': -0.1584, 'texture(mean)': -13.43, 'area(worst)': -185.2})
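
The three checks listed in the task can be automated. The following is a minimal sketch that uses a plain dict in place of `Vec`, so it runs without the book's modules. `check_main_constraint` is a hypothetical helper name, and the sketch assumes every feature value is positive (as in the output above).

```python
def check_main_constraint(v, i, d_i, features):
    """Return True if dict v passes the three sign checks for patient i."""
    # Feature entries carry the sign of d_i (assuming positive feature values).
    features_ok = all(v[f] * d_i > 0 for f in features)
    # The slack-variable entry for patient i is exactly 1.
    slack_ok = v[i] == 1
    # The gamma entry has the opposite sign of d_i.
    gamma_ok = v['gamma'] * d_i < 0
    return features_ok and slack_ok and gamma_ok

# Values copied from the output above (patient 862722, diagnosis -1).
v = {862722: 1, 'gamma': 1, 'smoothness(worst)': -0.1584,
     'texture(mean)': -13.43, 'area(worst)': -185.2}
features = {'area(worst)', 'smoothness(worst)', 'texture(mean)'}
print(check_main_constraint(v, 862722, -1, features))
```

Passing `+1` as the diagnosis for the same vector makes the check fail, which is a quick way to confirm the helper is actually testing the signs.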

**Task 13.13.2:** Write a procedure `make_matrix(feature_vectors, diagnoses, features)` with the following spec:
* _input:_ a dictionary `feature_vectors` that maps patient IDs to feature vectors, a vector `diagnoses` that maps patient IDs to +1/-1, and a set `features` of feature labels
* _output:_ the matrix $A$ to be used in the linear program.

The rows of $A$ labeled with positive integers (patient IDs) should be the vectors for the main constraints. Those labeled with negative integers (negatives of patient IDs) should be the vectors for the nonnegativity constraints.

In [5]:
from matutil import rowdict2mat

def make_matrix(feature_vectors, diagnoses, features):
    A = {}
    for patient_id, feature_vector in feature_vectors.items():
        A[patient_id] = main_constraint(patient_id, feature_vector, diagnoses[patient_id], features)
        A[patient_id] = Vec(A[patient_id].D | feature_vectors.keys(), A[patient_id].f)
        A[-patient_id] = Vec(A[patient_id].D, {patient_id: 1})  # nonnegativity constraint on the slack variable
    return rowdict2mat(A)
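
To see the row-labelling scheme concretely, here is a small sketch with two made-up patients (IDs 1 and 2) and one made-up feature `'f'`, using nested dicts instead of `Mat` and `Vec`. The function name `make_rows` and all of the data are illustrative assumptions, not part of the lab.

```python
def make_rows(feature_vectors, diagnoses, features):
    """Build {row_label: {column_label: value}} for the linear program."""
    rows = {}
    for pid, a in feature_vectors.items():
        d = diagnoses[pid]
        # Main constraint row: d*(a . w) - d*gamma + z_pid >= 1
        main = {f: d * a[f] for f in features}
        main['gamma'] = -d
        main[pid] = 1
        rows[pid] = main
        # Nonnegativity row, labeled with the negated ID: z_pid >= 0
        rows[-pid] = {pid: 1}
    return rows

rows = make_rows({1: {'f': 2.0}, 2: {'f': 5.0}}, {1: +1, 2: -1}, {'f'})
for label in sorted(rows):
    print(label, rows[label])
```

Each patient contributes two rows, so $m$ patients give a matrix with $2m$ rows; the columns are the patient IDs, the feature labels, and `'gamma'`.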

**Task 13.13.3:** Write a procedure that, given a set of patient IDs, returns the right-hand side vector $\boldsymbol{b}$.

In [6]:
def make_b(patient_ids):
    b = {}
    for patient_id in patient_ids:
        b[patient_id] = 1
        b[-patient_id] = 0
    return Vec(set(b.keys()), b)

**Task 13.13.4:** Write a procedure that, given a set of patient IDs and feature labels, returns the objective function vector $\boldsymbol{c}$.

In [7]:
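def make_c(patient_ids, feature_labels):
    # feature_labels becomes the domain of c, so it must also contain the
    # patient IDs and 'gamma'; the column-label set A.D[1] satisfies this.
    return Vec(feature_labels, {patient_id: 1 for patient_id in patient_ids})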
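
The objective only counts the slack variables: $\boldsymbol{c}$ has a 1 for each patient ID and an implicit 0 for every feature label and for `'gamma'`, so $\boldsymbol{c} \cdot \boldsymbol{x}$ is the total slack. A dict-based sketch with made-up values (the vector `x` below is invented purely for illustration):

```python
patient_ids = [1, 2]
c = {pid: 1 for pid in patient_ids}  # labels missing from c count as 0

# Hypothetical solution: weight for feature 'f', threshold gamma, two slacks.
x = {'f': 0.5, 'gamma': 1.0, 1: 0.25, 2: 0.0}

# Sparse dot product of c and x.
objective = sum(c.get(label, 0) * value for label, value in x.items())
print(objective)  # total slack: 0.25 + 0.0
```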

**Task 13.13.5:** Using the procedures you defined, construct the matrix $A$ and the vectors $\boldsymbol{b}$ and $\boldsymbol{c}$.

In [10]:
A = make_matrix(features_by_patient_id, train_b, feature_subset)
b = make_b(list(features_by_patient_id.keys()))
c = make_c(list(features_by_patient_id.keys()), A.D[1])

In [11]:
from simplex import find_vertex, optimize

n = len(A.D[1])  # number of columns: one per patient, one per feature, plus 'gamma'
patient_ids = list(features_by_patient_id.keys())
# Initial guess: every main-constraint row plus enough nonnegativity rows to reach n labels.
R_square = set(patient_ids) | {-patient_id for patient_id in patient_ids[:n - len(patient_ids)]}
assert len(R_square) == n
assert find_vertex(A, b, R_square)  # mutates R_square into the set of row labels of A defining a vertex

(value:  299.64438660894876 ) 

AttributeError: 'Vec' object has no attribute 'values'

I have concluded that the provided simplex implementation isn't quite up to snuff for this problem: it takes a long time to run, and it contains the bug shown above (the call should be `min(y.f.values())`, not `min(y.values())`, because a `Vec` stores its entries in its `f` dictionary).
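
The `AttributeError` is easy to reproduce in isolation. The sketch below uses `MiniVec`, a hypothetical stand-in that only mimics the `D` (domain) and `f` (function dict) attributes of the book's `Vec`; it is not the real class.

```python
class MiniVec:
    """Hypothetical stand-in for Vec: a domain D and a sparse dict f."""
    def __init__(self, labels, function):
        self.D = labels
        self.f = function

y = MiniVec({'a', 'b'}, {'a': 3.0, 'b': -2.0})

try:
    min(y.values())        # MiniVec, like Vec, has no .values() method
except AttributeError as err:
    print('fails:', err)

print(min(y.f.values()))   # works: the entries live in y.f
```

One caveat, assuming `y` may be stored sparsely: `y.f.values()` only covers the explicitly stored entries, so a minimum taken this way ignores any implicit zeros.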

This last chapter seems to be something of a "bonus": it is not part of the Brown University or Coursera course material, and there are no published errata and seemingly nothing on GitHub dealing with this lab.