# Linear Programming in Python: Deciding Where to Go on Vacation

This IPython Notebook is an example of constructing a linear program in Python with the PuLP module.

----------
**Problem Formulation:**
You want to go on vacation, but you have only a limited number of days. On top of that, you also want to keep the cost to a minimum.
The internet offers plenty of options, so which package or mix of packages should you select?

----------
> - Objective: Minimize Cost of Vacation while selecting Optimal Vacation Package
> - LP Form: Minimization
> - Decision Variables: Binary variables indicating whether or not to purchase each package.
> - Constraints: Limited Number of Vacation Days
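
> Written out, the formulation above is a small 0/1 program. The symbols are introduced here for illustration and do not appear in the notebook: $c_i$ and $d_i$ are the cost and duration of package $i$, $D$ is the number of vacation days available, and $x_i$ is 1 if package $i$ is purchased and 0 otherwise. The equality in the constraint follows the code later in this notebook, which requires the selected packages to fill the available days exactly:

$$
\begin{aligned}
\min_{x} \quad & \sum_{i=1}^{n} c_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{n} d_i x_i = D, \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n.
\end{aligned}
$$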

In [1]:
import pulp
from pulp import LpStatus, value  # names used unqualified in later cells
import numpy as np
import pandas as pd
import re 
import matplotlib.pyplot as plt
from IPython.display import Image
%matplotlib inline

**Getting the Data**
>There are multiple websites that provide full-priced and discounted vacation packages.
The dataset for this problem was scraped from The Clymb Adventures: http://www.theclymb.com/adventures

![TheClymb](https://photos-1.dropbox.com/t/2/AAB8eX8O_-HLLEXt482rsjiDDj-Cy-mvF1DZT6MjP5GKVg/12/49846494/png/32x32/1/_/1/2/thclymb.png/ENCbqCYY-KgMIAEoAQ/S5J_1b9un4Uy-zuWdfU7bKaGgVECPSAFNnMgrfDttqA?size=1024x768&size_mode=2)


**Understanding the Dataset**
>The dataset contains:
> - Final Destination
> - Duration of the trip
> - Total Cost of the trip
> - Short Description of the adventure

In [2]:
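# The default UTF-8 decoding fails on this file:
#   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 231: invalid start byte
# 'latin-1' is an assumption, not a confirmed encoding: it accepts every byte, but some characters may display incorrectly
data = pd.read_csv('clymb_adventures.csv', encoding='latin-1')
data[:5]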
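
> The file is evidently not valid UTF-8: byte 0x89 cannot start a UTF-8 sequence. The actual encoding of the CSV is not known, so the sketch below simply tries a short list of candidates in order. The helper name `decode_with_fallback`, the candidate list, and the sample bytes are all assumptions made for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate encoding that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample bytes containing 0x89, the byte reported in the error.
sample = b"Peru trek, 10\x89 off"
text, used = decode_with_fallback(sample)
print(used, repr(text))
```

> An encoding found this way can be passed to `pd.read_csv` through its `encoding` argument. A successful decode does not prove the guess is right, so it is worth checking a few rows by eye.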

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(14, 5))
# one bar per package, labelled by its destination
data.plot(kind='bar', x='destination', y='cost', title="Destination by Cost", ax=axs[0])
data.plot(kind='bar', x='destination', y='duration', title="Destination by Duration", ax=axs[1])

**Setting Up LP Problem:**

> Define The LP Object

> The *prob* variable is created to contain the formulation, and the usual parameters are passed into LpProblem.

In [None]:
# create the LP object, set up as a minimization problem --> since we want to minimize the costs 
prob = pulp.LpProblem('GoingOnVacation', pulp.LpMinimize)

> Create Decision Variables:

In [None]:
decision_variables = []
for rownum, row in data.iterrows():
	# one variable per package, named x0, x1, ... after its row number
	variable = pulp.LpVariable('x' + str(rownum), lowBound=0, upBound=1, cat='Integer')  # integer in [0, 1], i.e. binary
	decision_variables.append(variable)

print ("Total number of decision_variables: " + str(len(decision_variables)))
print ("Array with Decision Variables:" + str(decision_variables))

> Define Objective Function: (*Minimizing the Cost of the Trip*)

> The variable prob now begins collecting problem data with the += operator. The objective function is entered first. PuLP also accepts a comma followed by a short string naming the objective at the end of the statement; that name is omitted here.

In [None]:
# cost of each package times its 0/1 decision variable, summed over all packages
# (decision_variables was built in row order, so zip pairs each row with its own variable)
total_cost = pulp.lpSum(cost * variable for cost, variable in zip(data['cost'], decision_variables))

prob += total_cost
print("Optimization function: " + str(total_cost))

> Define Constraints: (*Selected packages must add up to exactly the vacation days available; with only an upper bound, the cheapest plan would be to buy nothing*)

In [None]:
aval_vacation_days = 10
# total duration of the selected packages
total_vacation_days = pulp.lpSum(duration * variable for duration, variable in zip(data['duration'], decision_variables))

# the selected packages must fill the available days exactly
prob += (total_vacation_days == aval_vacation_days)
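
> The direction of the constraint matters. As a rough sketch, the brute-force check below enumerates every 0/1 choice on a hypothetical six-package dataset (the costs and durations are made up, and `cheapest_selection` is an illustrative helper, not part of PuLP). With "at most" the available days, the cheapest plan is to buy nothing; with "exactly", a real combination of trips is selected, which is the behavior the equality constraint in the cell above asks for.

```python
from itertools import product

# Hypothetical packages as (cost, duration in days); not taken from the scraped dataset.
packages = [(900, 4), (1500, 6), (700, 3), (2600, 10), (400, 2), (600, 5)]
available_days = 10


def cheapest_selection(packages, days_ok):
    """Enumerate every 0/1 choice vector and return (cost, choice) with the lowest cost."""
    best = None
    for choice in product((0, 1), repeat=len(packages)):
        days = sum(d * x for (_, d), x in zip(packages, choice))
        if not days_ok(days):
            continue  # violates the vacation-day constraint
        cost = sum(c * x for (c, _), x in zip(packages, choice))
        if best is None or cost < best[0]:
            best = (cost, choice)
    return best


at_most = cheapest_selection(packages, lambda days: days <= available_days)
exactly = cheapest_selection(packages, lambda days: days == available_days)
print("at most 10 days:", at_most)
print("exactly 10 days:", exactly)
```

> Enumeration doubles in size with every extra package, so it is only a sanity check for toy inputs; the solver is the practical tool for the full dataset.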

>The Final Format

In [None]:
print(prob)
prob.writeLP("GoingOnVacation.lp")

> The Actual Optimization:

In [None]:

optimization_result = prob.solve()

assert optimization_result == pulp.LpStatusOptimal
print("Status:", LpStatus[prob.status])
print("Optimal Solution to the problem: ", value(prob.objective))
print ("Individual decision_variables: ")
for v in prob.variables():
	print(v.name, "=", v.varValue)

> The results are stored on the variables of the problem object.
> If you did not keep your own mapping from variable names to rows and want to append the decisions back to your data, loop through the variables and parse their names.

> Depending on your initial data format, you might want to parse the results differently. Since this example uses a pandas DataFrame, the number in each variable name serves as the index for appending the results back to the initial dataset.

In [None]:
variable_name = []
variable_value = []

for v in prob.variables():
	variable_name.append(v.name)
	variable_value.append(v.varValue)

df = pd.DataFrame({'variable': variable_name, 'value': variable_value})
# variable names look like 'x12'; recover the row number from the digits
df['variable'] = df['variable'].str.extract(r'(\d+)', expand=False).astype(int)
df = df.sort_values(by='variable')

# append results: align the solver values with the rows of the original data by row number
data['decision'] = df.set_index('variable')['value']

data[:5]

> The Final Decisions and Results of the Optimization in the "User Friendly Way":

In [None]:
data[data['decision'] == 1]

In [None]:
# total cost of the selected packages (a Series has a single axis, so no axis argument)
data[data['decision'] == 1]['cost'].sum()