# Linear Optimization using Regex

This problem will test your mastery of basic Python data structures as well as the use of regular expressions (regex). It consists of four (4) exercises, numbered 0 to 3, and it is worth a total of ten (10) points.

## Introduction to Linear Optimization
A <a href="https://www2.isye.gatech.edu/~spyros/LP/node2.html#SECTION00010010000000000000">linear optimization</a> problem has a linear **objective function** and linear equality and/or linear inequality constraints. Here is an example:

$$min \:  3p+q$$

with the **objective function** being of the form:
$$c_{1}x_{1}+c_{2}x_{2}+...+c_{n}x_{n}$$
which you minimize or maximize subject to the **constraints** 

$$\mathrm{ p + 2q >= 2}$$
$$\mathrm{ 2p + 5q >= 3}$$
$$\mathrm{ p >= 0, q >= 0}$$

with **constraints** being either linear equalities or inequalities of the form:

$$a_{i1}x_{1}+a_{i2}x_{2}+a_{i3}x_{3}+...+a_{in}x_{n}=b_{i}$$ 

for the ith equation, where the $=$ sign may be replaced by $<=$ or $>=$, and
    $$x_{i} >= 0$$
for every variable. 
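
As a quick sanity check on the example above, the sketch below evaluates the objective and tests the constraints at a few candidate points. The function names `objective` and `is_feasible` are illustrative only and are not part of the exercises.

```python
def objective(p, q):
    """Objective of the example problem: 3p + q."""
    return 3 * p + q

def is_feasible(p, q):
    """True when (p, q) satisfies every constraint of the example."""
    return (p + 2 * q >= 2) and (2 * p + 5 * q >= 3) and p >= 0 and q >= 0

# Print each candidate point, whether it is feasible, and its objective value.
for point in [(0, 1), (2, 0), (0, 0)]:
    print(point, is_feasible(*point), objective(*point))
```

A point such as $(0, 0)$ has a smaller objective value than $(0, 1)$ but violates the first constraint, so it is not a valid candidate.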

**Exercise 0** (3 points) Complete the `getRegEx()` function, which returns a regular expression string that matches any linear equation or inequation, such as the ones below. There may be any number of terms on the left-hand side or the right-hand side of the equation.

You need to return a regular expression string such that, when compiled, its `.findall()` method parses the entire equation/inequation. For example:

The equation:

$$3x <= 6+2y+7z$$

will parse into the list of tuples:

`[('', '3', 'x', '<='), ('', '6', '', ''), ('+', '2', 'y', ''), ('+', '7', 'z', '')]`

It is worth noting that any of the linear equations/inequations can be parsed into separate groups with the parts:
- the sign,
- the coefficient,
- the variable,
- the equality/inequality symbol

any of which may or may not be present.

The first test cell will test against the above equations while the second test cell will test against some similar randomly generated equations.

In [1]:
import re

In [2]:
test_equation = '3x<=6+2y+7z'

In [3]:
pattern = r"(\+|\-)?(\d+|.)(\w)?(\<=|\>=|\>|\<|\=)?"
re.findall(pattern, test_equation)

[('', '3', 'x', '<='),
 ('', '6', '', ''),
 ('+', '2', 'y', ''),
 ('+', '7', 'z', '')]
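
To make the four groups easier to read, each tuple can be paired with the group names listed earlier. This is only an illustrative sketch: `describe_terms` is a hypothetical helper, and it reuses the pattern from the cell above, written as a raw string.

```python
import re

# Assumption: the same pattern as in the cell above, as a raw string.
TERM = re.compile(r"(\+|\-)?(\d+|.)(\w)?(\<=|\>=|\>|\<|\=)?")

def describe_terms(equation):
    """Return one dict per matched term, naming the four captured groups."""
    names = ("sign", "coefficient", "variable", "symbol")
    return [dict(zip(names, groups)) for groups in TERM.findall(equation)]

terms = describe_terms("3x<=6+2y+7z")
for term in terms:
    print(term)
```

Groups that are absent from a term, such as the sign of the leading `3x`, come back as empty strings rather than `None`, because `findall()` substitutes `''` for groups that did not participate in the match.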

In [4]:
import re
# the function is intended to establish the patterns we're trying to find

def getRegEx():
    regex = r"(\+|\-)?(\d+|.)(\w)?(\<=|\>=|\>|\<|\=)?"
    return regex

In [5]:
# Test cell 1: test against the above equations (1 point)

equation_1= '3x<=6+2y+7z'
pattern= getRegEx()
result = re.findall(pattern, equation_1)
assert result == [('', '3', 'x', '<='), ('', '6', '', ''), ('+', '2', 'y', ''), ('+', '7', 'z', '')], "The equation is not parsed by your regular expression"
print("Passed!")

Passed!


In [6]:
# Test cell 2: test against random equations (2 points)

import random
import string
import os
from random import randint
from collections import defaultdict

def is_vocareum():
    return os.path.exists('.voc')

def generateEquation():
    #Generating the number of variables in the equation:
    num_var=randint(1,20)
    vc_dict=defaultdict(list)
    vc_const = 0
    #Generating the variables and coefficients:
    variables=[]
    coefficients=[]
    variables2=[]
    coefficients2=[]
    equation1=''
    equation2=''
    equation=''
    
    for i in range(0, num_var):
        coefficients.append(randint(1, 20))
        variables.append(random.choice([*string.ascii_lowercase] + ['']))
        
    for i in range(0, num_var):
        coefficients2.append(randint(1, 20))
        variables2.append(random.choice([*string.ascii_lowercase] + ['']))

    #Forming the equations:
    for i in range(0, num_var):
        prob= random.uniform(0, 1)
        sign=''
        if(prob>=0.5):
            sign='+'
        else:
            sign='-'
        equation1=equation1+sign+str(coefficients[i])+variables[i]
        equation2=equation2+sign+str(coefficients2[i])+variables2[i]
        if variables[i]:
            if(variables[i] not in vc_dict.keys()):
                vc_dict[variables[i]] = [int(sign+str(coefficients[i]))]
            else:
                value = int(vc_dict.pop(variables[i])[0])
                value = value + int(sign+str(coefficients[i]))
                vc_dict[variables[i]] = [value]
        else:
            vc_const -= int(sign+str(coefficients[i]))
            
        if variables2[i]:
            if(variables2[i] not in vc_dict.keys()):
                vc_dict[variables2[i]] = [-1 * int(sign+str(coefficients2[i]))]
            else:
                value = vc_dict.pop(variables2[i])[0]
                value = value - int(sign+str(coefficients2[i]))
                vc_dict[variables2[i]] = [value]
        else:
            vc_const += int(sign+str(coefficients2[i]))
            

    #Finding an inequality for the equation
    inequality_prob= random.uniform(0, 1)
    inequality=''
    if(inequality_prob>=0 and inequality_prob<0.33):
        inequality='='
    elif(inequality_prob>=0.33 and inequality_prob<0.66):
        inequality='>='
    else:
        inequality='<='

    equation= equation1+inequality
    if(equation2[0]=='+'):
        equation= equation+equation2[1:]
    else:
        equation= equation+equation2

    if(equation[0]=='+'):    
        return equation[1:], vc_dict, vc_const
    else:
        return equation, vc_dict, vc_const
    
filename=''
if is_vocareum():
    filename = '../resource/asnlib/publicdata/e2.txt'
else:
    filename = 'e2.txt'

with open(filename) as f:
    data = f.readlines()
    motif = data
motif=str(motif)[2:-4]
motif=motif[:3]+motif[4:]
motif=motif[:5]+motif[6:]
motif=motif[:12]+motif[13:]
pattern2 = re.compile(motif)
    
# Test cases
for i in range(0, 20):
    equation_2, vc_dict, vc_const = generateEquation()
    print(equation_2)
    result1 = re.findall(pattern, equation_2)
    result2 = pattern2.findall(equation_2)
    print(result1, "\n")
    assert result1 == result2, "The equation is not parsed by your regular expression"
    
print("Passed!")

FileNotFoundError: [Errno 2] No such file or directory: 'e2.txt'

In [7]:
test_equation2 = '-20j+2v-16a+19l>=-20u+1s-5u+6j'
pattern = r"(\+|\-)?(\d+|.)(\w)?(\<=|\>=|\>|\<|\=)?"
re.findall(pattern, test_equation2)

[('-', '20', 'j', ''),
 ('+', '2', 'v', ''),
 ('-', '16', 'a', ''),
 ('+', '19', 'l', '>='),
 ('-', '20', 'u', ''),
 ('+', '1', 's', ''),
 ('-', '5', 'u', ''),
 ('+', '6', 'j', '')]

**Exercise 1** (3 points)
Convert the terms of a linear equation into a Python dictionary, *equation_dict*. Store the variables of the equation as keys and their corresponding coefficients as values, each wrapped in a one-element list.

For example, if we have the following equation: $x-2y=20 + 6y$, we will have the equation dictionary and the constant as follows:

*equation_dict* = {'x': [1], 'y': [-8]}

*constant* = 20

Return the dictionary as the first return value and the constant of the equation as the second return value.

Before forming the dictionary, make sure that all terms with variables and coefficients are on the left-hand side of the equation and all the constants/intercepts are on the right-hand side. For example, given $6x=7y+4$, convert it to $6x-7y=4$.
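
Applying the same rearrangement to the earlier example $x-2y=20+6y$ gives, step by step:

$$x-2y-6y=20$$
$$x-8y=20$$

which is why the dictionary holds a coefficient of $1$ for $x$ and $-8$ for $y$, while the constant $20$ stays on the right-hand side.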

In [11]:
test_equation3 = 'x-2y=20+6y'

# split on the (in)equality symbol; one regex keeps '<=' and '>=' intact,
# whereas splitting on each symbol in turn would let '=' cut them in two
split_ineq = re.split(r'<=|>=|<|>|=', test_equation3)

print(split_ineq)

['x-2y', '20+6y']


In [55]:
matched = []
pattern = getRegEx()
for i, ineq in enumerate(split_ineq):

    # flip the sign of every term on the right-hand side (odd index)
    if i%2 != 0:
        match = re.findall(pattern, ineq)
        
        match = [list(m) for m in match]
        matched.extend(match)
        
        for m in match:
            if m[0] == '+':
                m[0] = '-'
            elif m[0] == '-':
                m[0] = '+'
            else:
                m[0] = '-'
                    
    else:
        match = re.findall(pattern, ineq)
        matched.extend([list(m) for m in match])
    
matched

[['', 'x', '', ''],
 ['-', '2', 'y', ''],
 ['-', '20', '', ''],
 ['-', '6', 'y', '']]

In [56]:
my_equation_dict = {}

for match in matched:
    for i, m in enumerate(match):
        if m in string.ascii_lowercase and m not in string.whitespace:
            my_equation_dict[m] = []
        if m.isdigit():
            match[i] = int(match[i-1]+m)
            print(match[i])
            
my_equation_dict            


-2
-20
-6


{'x': [], 'y': []}

In [66]:
def getDict(equation: str) -> tuple:
    """Return (equation_dict, constant): variables on the left, constant on the right."""
    equation_dict = {}
    constant = 0
    pattern = getRegEx()

    # split on the (in)equality symbol; one regex keeps '<=' and '>=' intact
    split_ineq = re.split(r'<=|>=|<|>|=', equation.replace(' ', ''))

    matched = []
    for i, side in enumerate(split_ineq):
        match = [list(m) for m in re.findall(pattern, side)]

        # flip the sign of every term on the right-hand side (odd index)
        if i % 2 != 0:
            for m in match:
                m[0] = '+' if m[0] == '-' else '-'

        matched.extend(match)

    # every term now sits on the left-hand side: sum coefficients per variable
    for sign, coeff, var, _ in matched:
        if not coeff.isdigit():
            # a bare variable such as 'x' is captured by the coefficient group
            coeff, var = '1', coeff
        value = int(sign + coeff)
        if var:
            equation_dict[var] = [equation_dict.get(var, [0])[0] + value]
        else:
            constant += value

    # constants were summed on the left, so negate to move them to the right
    return equation_dict, -constant

In [None]:
# Testing cell

for i in range(0, 5):
    equation, vc_dict, vc_const= generateEquation()
    print(vc_dict)
    print(vc_const)
    print("The equation is: ", equation)
    equation_dict, constant= getDict(equation)
    assert len(vc_dict)==len(equation_dict), "The lengths of the dictionaries are unequal"
    assert vc_dict.keys() == equation_dict.keys(), "The keys of the dictionary do not match"
    for key in vc_dict.keys():
        assert set(vc_dict[key]) == set(equation_dict[key]), "The dictionaries do not match"
    assert vc_const == constant, "Constants do not match"
    print("Passed!")

### Standard Normal Form of Linear Optimization Problem
The standard form of a Linear Program can be defined as,

$$\mathrm{min\ {c}^{T} x}$$
subject to the constraints
$$Ax=b$$
$$x>=0$$

We can then define the original example linear program as:

$c=\begin{bmatrix}3\\1\\\end{bmatrix},x=\begin{bmatrix}p\\q\\\end{bmatrix},
A=\begin{bmatrix}
    1 & 2\\
    2 & 5\\
\end{bmatrix},b=\begin{bmatrix}2\\3\\\end{bmatrix}
$

This format can be used to solve optimization problems using packages such as PuLP and software such as EXPRESS. These packages typically use an algorithm called Simplex to solve the system of equations/inequations and obtain a solution to the linear optimization problem. However, for the sake of convenience, the test cells below pass your matrices to `scipy.optimize.linprog`, which accepts NumPy-compatible nested lists. If your matrices are correct, you will get the correct solution for the system of equations.
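
To see how $c$, $A$, and $b$ alone determine the answer, here is a minimal sketch that solves the two-variable example by brute force. It is not Simplex and not what the test cell runs: it checks every intersection of two constraint boundaries and keeps the best feasible one. It assumes every constraint has the form $a_{i1}x_{1}+a_{i2}x_{2} >= b_{i}$, as in the example, and that the minimum is attained at a vertex. `solve_2d_lp` is a hypothetical helper name.

```python
from itertools import combinations

def solve_2d_lp(c, A, b):
    """Minimize c.x subject to A x >= b and x >= 0, for exactly two variables."""
    # Treat x1 >= 0 and x2 >= 0 as two extra constraint rows.
    rows = [(a1, a2, bi) for (a1, a2), bi in zip(A, b)] + [(1, 0, 0), (0, 1, 0)]
    best = None
    for (a1, a2, r1), (b1, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        x = (r1 * b2 - a2 * r2) / det
        y = (a1 * r2 - r1 * b1) / det
        # Keep the point only if it satisfies every constraint (small tolerance).
        if all(p * x + q * y >= r - 1e-9 for p, q, r in rows):
            value = c[0] * x + c[1] * y
            if best is None or value < best[0]:
                best = (value, (x, y))
    return best

best_value, best_point = solve_2d_lp([3, 1], [[1, 2], [2, 5]], [2, 3])
print(best_value, best_point)
```

For the example matrices, the best feasible vertex is $p=0, q=1$ with objective value $1$. Enumeration like this grows quickly with the number of variables and constraints, which is why dedicated solvers are used in practice.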

**Exercise 2** (3 points) In the function below, complete the code to fill the matrix A and the arrays c and b from a list of equations. 

To facilitate the test cell's use of NumPy and SciPy, format your matrix A as nested lists, with a list for each row, like
```
matrix = [[row1_col1, row1_col2, row1_col3],[row2_col1, row2_col2, row2_col3]]
```
and your arrays c and b as a list, like
```
array = [element1, element2]
```


The original example, then, would be provided to `ConvertToStandardNormalForm()` as a list of strings and a separate string for the objective function:
```
equations = ['p+2q>=2', '2p+5q>=3']
objective_function = '3p+q'
```
The $x_{i} >= 0$ constraint is assumed and is included in neither the input arguments nor the returned result.

The results returned from `ConvertToStandardNormalForm()` would then be:
```
c = [3,1]
b = [2,3]
A = [[1,2],[2,5]]
```
for `x = {p, q}`

or 
```
c = [1,3]
b = [2,3]
A = [[2,1],[5,2]]
```
for `x = {q, p}` which is an equivalent representation (can you see why?)
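
One possible way to build these lists is sketched below, under simplifying assumptions that are not stated in the exercise: variables are single lowercase letters, all variable terms are already on the left-hand side, and the right-hand side is a single integer. The names `coefficients` and `to_matrices` are hypothetical, and the sketch uses its own small term pattern rather than `getRegEx()`.

```python
import re

# Hypothetical term pattern: optional sign, optional digits, one-letter variable.
TERM = re.compile(r"([+-]?)(\d*)([a-z])")

def coefficients(expression):
    """Map each variable in an expression like '2p+5q' to its integer coefficient."""
    out = {}
    for sign, digits, var in TERM.findall(expression):
        out[var] = out.get(var, 0) + int(sign + (digits or "1"))
    return out

def to_matrices(equations, objective_function):
    """Return A, b, c and the variable order used for the columns."""
    rows, b = [], []
    for eq in equations:
        lhs, rhs = re.split(r"<=|>=|=", eq)
        rows.append(coefficients(lhs))
        b.append(int(rhs))
    obj = coefficients(objective_function)
    # Any consistent column order works; alphabetical order is used here.
    variables = sorted(set(obj).union(*rows))
    A = [[row.get(v, 0) for v in variables] for row in rows]
    c = [obj.get(v, 0) for v in variables]
    return A, b, c, variables

A, b, c, variables = to_matrices(['p+2q>=2', '2p+5q>=3'], '3p+q')
print(A, b, c, variables)
```

Because the same variable order is used for the columns of `A` and the entries of `c`, reordering the variables only permutes columns consistently and leaves the problem unchanged.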

In [None]:
def ConvertToStandardNormalForm(equations, objective_function):
    ##INPUT: equations: list of strings, objective_function: string
    ##OUTPUT: A, b, and c
    #
    # YOUR CODE HERE
    #

    return A, b, c

In [None]:
#Test cell for the standard normal form

import numpy as np
import scipy as sp
import os
import hashlib
import io
from scipy import optimize

equations= []

def is_vocareum():
    return os.path.exists('.voc')

filename=''
if is_vocareum():
    filename = '../resource/asnlib/publicdata/equations.txt'
else:
    filename = 'equations.txt'

with open(filename) as f:
    data = f.readlines()
    for line in data:
        equations.append(line.rstrip('\n'))  # safe even if the last line has no newline
        print(line)

A, b, c = ConvertToStandardNormalForm(equations[:-1], equations[-1])

result= sp.optimize.linprog(c, A_ub=A, b_ub=b, bounds=((None, None), (-3, None)), options={"disp": False})

solution= result.x
objective_value= result.fun

print("Your solution:", solution)
print("Your objective value", objective_value)

sol=[]

if is_vocareum():
    filename = '../resource/asnlib/publicdata/Solution1.txt'
else:
    filename = 'Solution1.txt'
with open(filename) as f:
    data = f.readlines()
    sol = data[0].split(',')

assert int(sol[-1])==int(result.fun), "Your objective value does not match ours, i.e. recheck your solution"
print("Passed!")

## Use of Optimization: 
### Production planning by Compaq
This problem, which Compaq once faced, illustrates the usefulness of linear optimization. Compaq introduced three new computer systems and two workstations: GP-1, GP-2, and GP-3, as well as WS-1 and WS-2. The following table lists the models, their list prices (in dollars), and their memory usage.

<table>
  <tr>
    <th>System</th>
    <th>Price</th>
    <th>#256K Boards</th>
  </tr>
  <tr>
    <td>GP-1</td>
    <td>60,000</td>
    <td>4</td>
  </tr>
  <tr>
    <td>GP-2</td>
    <td>40,000</td>
    <td>2</td>
  </tr>
  <tr>
    <td>GP-3</td>
    <td>30,000</td>
    <td>2</td>
  </tr>
  <tr>
    <td>WS-1</td>
    <td>30,000</td>
    <td>2</td>
  </tr>
  <tr>
    <td>WS-2</td>
    <td>15,000</td>
    <td>1</td>
  </tr>
</table>

The following difficulties were anticipated:
1. The in-house supplier of CPUs could provide at most 7,000 units, due to debugging problems.
2. The supply of 256K memory boards was limited to be no more than 8,000 units.

On the demand side, the marketing department established the following:
1. The maximum demand for the first quarter would be 1,800 for the GP-1 system, 300 for the GP-3 system, 3,800 for the whole GP family, and 3,200 for the whole WS family.
2. Included in these projections were 500 orders for the GP-2 system, 500 orders for WS-1, and 400 orders for WS-2 that had already been received and had to be fulfilled in the first quarter.

Compaq needed to make a production plan to consider all the above production limitations and demand projections and to maximize the revenue.

### Linear Model for Compaq

The above problem can be reformulated as: 

$${max \: a*60000 + b*40000 + c*30000 + d*30000 + e*15000}$$
subject to the constraints
$$a+b+c+d+e <= 7000$$
$$4a+2b+2c+2d+e <= 8000$$
$$a+b+c <= 3800$$
$$d+e <= 3200$$
$$a <= 1800$$
$$c <= 300$$
$$b >= 500$$
$$d >= 500$$
$$e >= 400$$
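
The last five constraints each involve a single variable, so they can be handed to a solver as per-variable bounds rather than as rows of $A$. The sketch below derives such bounds, assuming the default lower bound of 0 that comes from $x_{i} >= 0$. `bounds_from` is a hypothetical helper; the tuple it produces has the same shape as the `bounds` argument passed to `linprog` in the test cell for this problem.

```python
import re

def bounds_from(constraints, variables):
    """Turn single-variable constraints like 'a<=1800' into (lower, upper) pairs."""
    lower = {v: 0 for v in variables}      # x_i >= 0 is assumed for every variable
    upper = {v: None for v in variables}   # None means "no upper bound"
    for constraint in constraints:
        var, op, value = re.fullmatch(r"([a-z])(<=|>=)(\d+)", constraint).groups()
        if op == "<=":
            upper[var] = int(value)
        else:
            lower[var] = int(value)
    return tuple((lower[v], upper[v]) for v in variables)

bounds = bounds_from(["a<=1800", "c<=300", "b>=500", "d>=500", "e>=400"], "abcde")
print(bounds)
```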

Also, keep in mind that
$${max \: {c}^{T}x} = {-min \: (-{c}^{T}x)}$$
and that minimization is the standard form of a linear program.
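
The identity can be checked numerically on a small set of candidates. In the sketch below, the three production plans are hypothetical values chosen only for illustration; they are not the optimal solution and do not come from the data files.

```python
prices = [60000, 40000, 30000, 30000, 15000]   # coefficients of a, b, c, d, e
negated = [-p for p in prices]                  # c for the equivalent minimization

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical production plans (a, b, c, d, e), for illustration only.
plans = [
    (0, 500, 0, 500, 400),
    (1000, 500, 0, 500, 400),
    (500, 1000, 300, 500, 400),
]

best_by_max = max(dot(prices, plan) for plan in plans)
best_by_min = -min(dot(negated, plan) for plan in plans)
print(best_by_max, best_by_min)
```

Both expressions pick out the same plan and the same value, which is why a solver that only minimizes can still be used for a maximization problem once $c$ is negated.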

**Exercise 3** (1 point) Your task is to solve the above problem and maximize Compaq's profits. 
Complete the `Compaq` function, using the `ConvertToStandardNormalForm` function to help you. 
The matrices formed will be passed into a linear program solver to output the best solution.

*Hint: Keep in mind that since this is a maximization problem, each value of the array c will have to be multiplied by -1 to bring it into standard form (please ignore the minus sign '-' outside the minimization expression; this will be accounted for in the solution).*

In [None]:
#Your function should return the A, b, and c matrices
def Compaq(equations, objective_function):
    ##INPUT: equations: list of strings, objective_function: string
    ##OUTPUT: Matrices A, b, and c
    #
    # YOUR CODE HERE
    #
    return A,b,c

In [None]:
#Test cell for the compaq problem

import numpy as np
import scipy as sp
import hashlib
import io
from scipy import optimize

#Test cases for the standard normal form

equations= []

filename = 'equations2.txt'
if is_vocareum():
    filename = '../resource/asnlib/publicdata/equations2.txt'

with open(filename) as f:
    data = f.readlines()
    for line in data:
        equations.append(line.rstrip('\n'))  # safe even if the last line has no newline

A, b, c = Compaq(equations[:-1], equations[-1])
print(A,b,c)
result= sp.optimize.linprog(c, A_ub=A, b_ub=b, bounds=((0, 1800),(500, None),(0, 300),(500, None),(400, None)), options={"disp": False})

solution= result.x
objective_value= result.fun

print("Your solution:", solution)
print("Your objective value", objective_value)

sol=[]
filename = 'Solution2.txt'
if is_vocareum():
    filename = '../resource/asnlib/publicdata/Solution2.txt'

with open(filename) as f:
    data = f.readlines()
    sol= data[0].split(',')
    
print(int(sol[-1]))

assert int(sol[-1])==int(result.fun), "Your objective value does not match ours, i.e. recheck your solution"
print("So the maximum profit is: $", -1*objective_value)
print("Passed!")

**Fin!** You've reached the end of this problem. Don't forget to restart the kernel and run the entire notebook from top-to-bottom to make sure you did everything correctly. If that is working, try submitting this problem. (Recall that you must submit and pass the autograder to get credit for your work!)