In [17]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))

<font size=18>Project 1: Report</font>

Use this Jupyter notebook to summarize the details of this project organized in the following sections. 

The file *Airfares.xlsx* contains real data that were collected between Q3-1996 and Q2-1997. The first sheet contains variable descriptions while the second sheet contains the data.  A csv file of the data is also provided (called *Airfares.csv*).

# Introduction

Summarize the problem statement, establishing the context and methods used in this project.

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

Industry-wide deregulation has created a competitive environment among carriers in the aviation sector. The goal of this project is to maximize airfare. In particular, the consulting firm seeks to maximize fare as a linear function of the number of coupons, the Herfindahl Index, and the distance between the two cities on each route. Fare maximization is constrained by the number of passengers, the starting city's income, and the ending city's income on each route, all of which are modeled as functions of the same three predictors used in the objective function. Each function is estimated by linear regression, and the resulting problem is solved as a linear program.

# Linear Regression Models

Provide a brief summary of the linear regression models used to estimate coefficients that will be used in the linear programming problem.  Explain why the multiple regression equations had to be fitted through the origin (consider the assumptions of linear programming).

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

In [15]:
import statsmodels.api as sm
import pandas as pd

# named `airfares` rather than `vars` to avoid shadowing the Python built-in vars()
airfares = pd.read_csv("data/Airfares.csv")

# define predictor variables

x = airfares[['COUPON','HI','DISTANCE']]

# define response variables
y_fare = airfares['FARE']
y_pax = airfares['PAX']
y_s_income = airfares['S_INCOME']
y_e_income = airfares['E_INCOME']

# fit each model by OLS through the origin (x has no constant column, so no intercept is estimated)
model_obj = sm.OLS(y_fare, x).fit()
model_pax = sm.OLS(y_pax, x).fit()
model_s_income = sm.OLS(y_s_income, x).fit()
model_e_income = sm.OLS(y_e_income, x).fit()

# coefficients
coefs_obj = model_obj.params
coefs_pax = model_pax.params
coefs_s_income = model_s_income.params
coefs_e_income = model_e_income.params

print(f"\n------------ OBJECTIVE FUNCTION SUMMARY: ------------ \n\n{model_obj.summary()}")
print(f"\n\n ------------ PAX FUNCTION SUMMARY: ------------\n\n{model_pax.summary()}")
print(f"\n\n ------------ S_INCOME FUNCTION SUMMARY: ------------\n\n{model_s_income.summary()}")
print(f"\n\n ------------ E_INCOME FUNCTION SUMMARY: ------------\n\n{model_e_income.summary()}")


------------ OBJECTIVE FUNCTION SUMMARY: ------------ 

                                 OLS Regression Results                                
Dep. Variable:                   FARE   R-squared (uncentered):                   0.911
Model:                            OLS   Adj. R-squared (uncentered):              0.911
Method:                 Least Squares   F-statistic:                              2165.
Date:                Tue, 24 Sep 2019   Prob (F-statistic):                        0.00
Time:                        18:37:20   Log-Likelihood:                         -3439.5
No. Observations:                 638   AIC:                                      6885.
Df Residuals:                     635   BIC:                                      6898.
Df Model:                           3                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

Above are the summary statistics for 4 functions:
* Objective function to maximize airfare
* Constraint function for PAX
* Constraint function for S_INCOME
* Constraint function for E_INCOME

All four functions share the same three predictors: *coupon*, *HI*, and *distance*. The first is the objective function (airfare) that we seek to maximize. The remaining three constrain *coupon*, *HI*, and *distance*, and therefore the maximum attainable airfare. The constraints are PAX <= 20000, S_INCOME <= 30000, and E_INCOME >= 30000, so the right-hand side values ($B_1$, $B_2$, $B_3$) are 20000, 30000, and 30000 respectively.

The tables of coefficients for each function report a number of useful statistics. For example, the objective function summary reports an adjusted R-squared (uncentered) of 0.911. Because the model has no intercept, this figure is measured about zero rather than about the mean of FARE: the fit accounts for about 91% of the uncentered sum of squares, and it is not directly comparable to the R-squared of a model with an intercept. With the exception of the PAX constraint, the other constraint models also have a high adjusted R-squared.
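To illustrate why an uncentered R-squared can look high, here is a minimal sketch in plain Python. The fare values and the constant prediction are hypothetical and are not taken from *Airfares.csv*; the point is only that measuring the total sum of squares about zero instead of about the mean inflates the statistic when the response is far from zero.

```python
def r_squared(y, y_hat, centered=True):
    """Return 1 - SSR/SST, with SST measured about the mean (centered)
    or about zero (uncentered, as reported for no-intercept models)."""
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    baseline = sum(y) / len(y) if centered else 0.0
    sst = sum((yi - baseline) ** 2 for yi in y)
    return 1.0 - ssr / sst

# Hypothetical fares, all far from zero, "predicted" by a constant guess of 150.
y = [140.0, 150.0, 160.0, 145.0, 155.0]
y_hat = [150.0] * len(y)

r2_centered = r_squared(y, y_hat, centered=True)
r2_uncentered = r_squared(y, y_hat, centered=False)

print(f"centered R-squared:   {r2_centered:.3f}")
print(f"uncentered R-squared: {r2_uncentered:.3f}")
```

A constant guess explains none of the variation about the mean, yet its uncentered R-squared is close to 1, which is why the two statistics should not be compared directly.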

Additionally, each coefficient is the amount by which the response variable changes for a one-unit increase in the corresponding predictor, holding the other predictors constant. Other important statistics in the summary tables are:
* the t-statistics and p-values (testing the null hypothesis that a coefficient is zero)
* the 95% confidence intervals for each coefficient
* the skewness and kurtosis of the residuals

Regarding fitting through the origin:
* Multiple regression equations must be fitted through the origin in order to satisfy the proportionality assumption of linear programming. Specifically, this means that "the contribution of each activity $x_j$ to the value of the objective function is proportional to the level of the activity $x_j$" (p.38, Introduction to Operations Research). An intercept would add a constant that does not scale with any decision variable, so the fitted function would no longer be a pure linear combination of *coupon*, *HI*, and *distance*.
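The proportionality argument can be checked numerically. The sketch below uses the rounded fare coefficients from the appendix; the intercept of 50 and the sample route values are hypothetical and serve only to show what breaks when a constant term is present.

```python
def fare_through_origin(coupon, hi, distance):
    # Rounded fare coefficients from the appendix formulation.
    return 22.59 * coupon + 0.0118 * hi + 0.0833 * distance

def fare_with_intercept(coupon, hi, distance, intercept=50.0):
    # The intercept value is hypothetical; the fitted models have none.
    return intercept + fare_through_origin(coupon, hi, distance)

base = (1.0, 5000.0, 800.0)           # hypothetical activity levels
doubled = tuple(2 * v for v in base)  # every activity level doubled

# Proportionality: doubling every activity should double the function value.
origin_is_proportional = (
    abs(fare_through_origin(*doubled) - 2 * fare_through_origin(*base)) < 1e-9
)
intercept_is_proportional = (
    abs(fare_with_intercept(*doubled) - 2 * fare_with_intercept(*base)) < 1e-9
)

print("through origin proportional:", origin_is_proportional)
print("with intercept proportional:", intercept_is_proportional)
```

Only the no-intercept function scales with the activity levels, which is the property the LP formulation relies on.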

# Optimal LP Solution

State the optimal value of the airfare and the values of COUPON, HI, and DISTANCE at which it occurs.

<font color = "blue"> *** 8 points -  answer in cell below *** (don't delete this cell) </font>

In [19]:
# unfold to see Pyomo solution with a vector of decision variables
from pyomo.environ import *

# Concrete Model
model = ConcreteModel(name="max_airfare")

predictors = ['coupon', 'hi', 'distance']

bounds_dict = {'coupon': (0, 1.5), 'hi': (4000, 8000), 'distance': (500, 1000)}

def bounds_rule(model, predictor):
    return (bounds_dict[predictor])

model.x = Var(predictors, domain=Reals, bounds=bounds_rule)

# Objective func
# coefficients are looked up by column label; integer keys on a label-indexed Series are deprecated in pandas
model.airfare = Objective(expr=coefs_obj['COUPON'] * model.x['coupon'] + coefs_obj['HI'] * model.x['hi'] + coefs_obj['DISTANCE'] * model.x['distance'],
                         sense=maximize)
# Constraint funcs
model.pax = Constraint(expr=coefs_pax['COUPON']*model.x['coupon'] + coefs_pax['HI'] * model.x['hi'] + coefs_pax['DISTANCE']*model.x['distance'] <= 20000)
model.s_income = Constraint(expr=coefs_s_income['COUPON']*model.x['coupon'] + coefs_s_income['HI']*model.x['hi'] + coefs_s_income['DISTANCE']*model.x['distance'] <= 30000)
model.e_income = Constraint(expr=coefs_e_income['COUPON']*model.x['coupon'] + coefs_e_income['HI']*model.x['hi'] + coefs_e_income['DISTANCE']*model.x['distance'] >= 30000)

# Solve
solver = SolverFactory('glpk')
solver.solve(model)

# display solution
print("Max airfare = ", model.airfare())
print("coupon = ", model.x['coupon']())
print("hi = ", model.x['hi']())
print("distance = ", model.x['distance']())

Max airfare =  203.55404681213537
coupon =  1.14372280657134
hi =  8000.0
distance =  1000.0


# Sensitivity Report

From the sensitivity report, explain which constraints are binding for the number of passengers on that route (PAX), the starting city’s average personal income (S_INCOME), and the ending city’s average personal income (E_INCOME). If the constraint is binding, interpret the shadow price in the context of the problem.  If the constraint is not binding, interpret the slack in the context of the problem.

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

In [20]:
# write the model to a sensitivity report
model.write('model.lp', io_options={'symbolic_solver_labels': True})
!glpsol -m model.lp --lp --ranges sensit.sen

# widen browser and/or close TOC to see sensitivity report
import numpy as np
np.set_printoptions(linewidth=110)
# the context manager closes the file even if reading fails
with open('sensit.sen', 'r') as f:
    print(f.read())

GLPSOL: GLPK LP/MIP Solver, v4.65
Parameter(s) specified in the command line:
 -m model.lp --lp --ranges sensit.sen
Reading problem data from 'model.lp'...
4 rows, 4 columns, 10 non-zeros
36 lines were read
GLPK Simplex Optimizer, v4.65
4 rows, 4 columns, 10 non-zeros
Preprocessing...
2 rows, 3 columns, 6 non-zeros
Scaling...
 A: min|aij| =  1.020e+00  max|aij| =  2.091e+04  ratio =  2.050e+04
GM: min|aij| =  7.309e-01  max|aij| =  1.368e+00  ratio =  1.872e+00
EQ: min|aij| =  5.342e-01  max|aij| =  1.000e+00  ratio =  1.872e+00
Constructing initial basis...
Size of triangular part is 2
      0: obj =   8.885866366e+01 inf =   2.215e+04 (1)
      3: obj =   1.739717779e+02 inf =   0.000e+00 (0)
*     4: obj =   2.035540468e+02 inf =   0.000e+00 (0)
OPTIMAL LP SOLUTION FOUND
Time used:   0.0 secs
Memory used: 0.0 Mb (40412 bytes)
Write sensitivity analysis report to 'sensit.sen'...
GLPK 4.65 - SENSITIVITY ANALYSIS REPORT                                                                   

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

The only binding constraint is S_INCOME <= 30000. Its shadow price of 0.0018 indicates that the maximum airfare (Z) increases by $0.0018 for each one-dollar increase in the right-hand side of the S_INCOME constraint, all else unchanged and provided the right-hand side stays within its allowable range.

The non-binding constraints are PAX <= 20000 and E_INCOME >= 30000. The slack of roughly 7,938 for PAX implies that the number of passengers on the route could increase by about 7,938 before hitting the upper bound of the constraint (RHS). In the case of E_INCOME, the value of -1,200 is a surplus: the ending city's average personal income could decrease by about $1,200 before hitting the lower bound of the constraint (RHS).
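The slack and surplus figures above can be sanity-checked by evaluating each constraint at the reported optimum. This is a minimal sketch that uses the rounded coefficients from the appendix rather than the full-precision regression output, so the results should land close to, but not exactly on, the values in the sensitivity report.

```python
# Reported optimum from the Pyomo/GLPK run above.
x = {"coupon": 1.14372280657134, "hi": 8000.0, "distance": 1000.0}

# Rounded coefficients (coupon, hi, distance) as listed in the appendix.
rows = {
    "FARE": (22.59, 0.0118, 0.0833),
    "PAX": (10819.32, 0.2481, -2.2980),
    "S_INCOME": (20909.19, 1.1145, -2.8309),
    "E_INCOME": (18330.37, 1.406, -1.019),
}

def evaluate(coefs, point):
    """Evaluate one linear function at the given decision-variable values."""
    c, h, d = coefs
    return c * point["coupon"] + h * point["hi"] + d * point["distance"]

values = {name: evaluate(coefs, x) for name, coefs in rows.items()}

pax_slack = 20000 - values["PAX"]            # room left under the <= bound
s_income_slack = 30000 - values["S_INCOME"]  # near zero if the constraint binds
e_income_surplus = values["E_INCOME"] - 30000  # amount above the >= bound

print(f"fare             = {values['FARE']:.2f}")
print(f"PAX slack        = {pax_slack:.1f}")
print(f"S_INCOME slack   = {s_income_slack:.1f}")
print(f"E_INCOME surplus = {e_income_surplus:.1f}")
```

A slack near zero for S_INCOME is consistent with it being the binding constraint, while PAX and E_INCOME both leave room before their bounds are reached.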

# Activity Ranges

Interpret the activity ranges (allowable ranges) for COUPON, HI, and DISTANCE in the context of the problem.

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

To quote the class text, "for any $C_j$, its allowable range is the range of values for this coefficient over which the current optimal solution remains optimal, assuming no change in the other coefficients" (p.139). The same idea applies to the (optimal) activity values, that is, the values of the decision variables.

In the context of the problem:
* The activity range for COUPON indicates that the number of coupons can vary from 1.07825 to 1.29258 per route (assuming no change in the other variables) while remaining consistent with the current optimal solution (Z).
* The activity range for HI indicates that the Herfindahl Index can vary from ~5207 to ~29456 (assuming no change in the other variables) while remaining consistent with the current optimal solution (Z).
* The activity range for DISTANCE indicates that the distance can vary from ~179 to ~3631 miles (assuming no change in the other variables) while remaining consistent with the current optimal solution (Z).

# Conclusion

Briefly summarize the main conclusion of this project, state what you see as any limitations of the methods used here, and suggest other possible methods of addressing the maximizing of airfare in this problem scenario.

<font color = "blue"> *** 7 points -  answer in cell below *** (don't delete this cell) </font>

Max airfare =  203.55404681213537
coupon =  1.14372280657134
hi =  8000.0
distance =  1000.0

The maximum airfare under the constraints given above is $203.55. At this fare, the average number of coupons is 1.14, the Herfindahl Index is 8000, and the distance between the two cities is 1000 miles.

Two limitations stand out. The first is that only 3 predictor variables, out of a possible 18, are used to model airfare. Although most of the models in this project (with the exception of the PAX model) have a high adjusted R-squared, using more predictors might have improved accuracy further.

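Another limitation is that the LP was written as a Pyomo concrete model, which ties the formulation to this one data set. An abstract model would have separated the formulation from the data and made it easier to reuse.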

# Appendix

Show the mathematical formulation for the linear programming problem used in this project.

You can either use LaTeX and markdown or take a clean, cropped picture of neatly handwritten equations and drag-n-drop it here.

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

<table class="tleft" width="40%">
    <tr>
        <td align="left" width="30%" colspan="2"><p> <b>Description</b></p></td>
        <td align="left"><p><b>Concrete Model</b></p></td>
    </tr>
    <tr>
        <td align="left" colspan="2"> Maximize Airfare = </td>
        <td align="left">
            $22.59 x_c + 0.0118 x_h + 0.0833 x_d$
        </td>
    </tr>
    <tr>
        <td rowspan="3">Constraints</td>
        <td align="left">PAX</td>
        <td>$10819.32 x_c + 0.2481 x_h - 2.2980 x_d \le 20000$</td>
    </tr>
    <tr>
        <td>S_INCOME</td>
        <td>$20909.19 x_c + 1.1145 x_h - 2.8309 x_d \le 30000$</td>
    </tr>
    <tr>
        <td>E_INCOME</td>
        <td>$18330.37 x_c + 1.406 x_h - 1.019 x_d \ge 30000$</td>
    </tr>
</table>
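
For completeness, the same problem can be written as a single LaTeX block. This is a sketch that restates the rounded coefficients from the table and adds the variable bounds taken from `bounds_dict` in the Pyomo cell, with $x_c$, $x_h$, and $x_d$ denoting COUPON, HI, and DISTANCE.

$$
\begin{aligned}
\text{maximize} \quad & Z = 22.59\,x_c + 0.0118\,x_h + 0.0833\,x_d \\
\text{subject to} \quad & 10819.32\,x_c + 0.2481\,x_h - 2.2980\,x_d \le 20000 && \text{(PAX)} \\
& 20909.19\,x_c + 1.1145\,x_h - 2.8309\,x_d \le 30000 && \text{(S\_INCOME)} \\
& 18330.37\,x_c + 1.406\,x_h - 1.019\,x_d \ge 30000 && \text{(E\_INCOME)} \\
& 0 \le x_c \le 1.5, \quad 4000 \le x_h \le 8000, \quad 500 \le x_d \le 1000
\end{aligned}
$$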