## Linear Programming in Python: Create a Watch List for TED Videos

This IPython notebook is an example of constructing a linear program in Python with the PuLP module.

----------
**Problem Formulation:**
TED (www.ted.com) is a nonprofit devoted to spreading ideas, usually in the form of short, powerful talks (18 minutes or less). TED began in 1984 as a conference where Technology, Entertainment and Design converged, and today covers almost all topics — from science to business to global issues — in more than 100 languages.

Many of us would like to watch the most popular talks within a given period. However, there are typically limits on how much time we can allocate and how many talks we can absorb. This notebook applies linear optimization techniques in Python to answer the question: "What should my TED viewing list be so that I cover the most popular talks, given constraints on time and number of talks?"

----------
> - Objective: Maximize the total views (a proxy for popularity) of the talks watched
> - LP Form: Maximization (a binary integer program, since each variable is 0 or 1)
> - Decision Variables: Binary variables indicating whether each talk is watched or not
> - Constraints: Limited time available to watch talks in a month, and a limited number of talks
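Written out as a 0-1 integer program, the model built in Step 3 can be sketched as follows, with $v_i$ the view count of talk $i$, $d_i$ its duration in minutes, $n = 2550$ talks, $T = 600$ minutes and $K = 25$ talks. The equality signs mirror the `==` used in the notebook's constraint cells; an "at most" reading of the limits would use $\le$ instead.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=0}^{n-1} v_i\, x_i \\
\text{s.t.}\quad & \sum_{i=0}^{n-1} d_i\, x_i = T && (T = 600 \text{ minutes}) \\
& \sum_{i=0}^{n-1} x_i = K && (K = 25) \\
& x_i \in \{0, 1\}, && i = 0, \dots, n-1
\end{aligned}
```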

**Step 1: Import Relevant Packages**

In [1]:
import pulp
from pulp import *  # also exposes LpStatus and value, used in Step 3.5
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
from IPython.display import Image
%matplotlib inline

**Step 2: Download the TED talks dataset from Kaggle and read it into a pandas DataFrame**

In [2]:
# Download the dataset from: https://www.kaggle.com/rounakbanik/ted-talks

# Read the dataset into a pandas DataFrame and convert duration from seconds to minutes
# (the file is UTF-8; reading it as ISO-8859-1 garbles names such as "Brené")
ted = pd.read_csv('ted_main.csv', encoding='utf-8')
ted['duration'] = ted['duration'] / 60
ted = ted.round({'duration': 1})

# Select a subset of columns & rows (if required)
# data = ted.sample(n=1000)  # 'n' can be changed as required
data = ted
selected_cols = ['name', 'event', 'duration', 'views']
# reset_index() adds an 'index' column holding each talk's original row number
data = data[selected_cols].reset_index()
data.head()

| | index | name | event | duration | views |
|---|---|---|---|---|---|
| 0 | 0 | Ken Robinson: Do schools kill creativity? | TED2006 | 19.4 | 47227110 |
| 1 | 1 | Al Gore: Averting the climate crisis | TED2006 | 16.3 | 3200520 |
| 2 | 2 | David Pogue: Simplicity sells | TED2006 | 21.4 | 1636292 |
| 3 | 3 | Majora Carter: Greening the ghetto | TED2006 | 18.6 | 1697550 |
| 4 | 4 | Hans Rosling: The best stats you've ever seen | TED2006 | 19.8 | 12005869 |


>The resulting dataset contains:
> - Index of the talk
> - Name of the talk
> - TED Event Name
> - Talk duration (in minutes)
> - Number of Views (Proxy for Popularity of the talk)

**Step 3: Setting Up the LP Problem**

> Define the LP object

> The *prob* variable is created to hold the formulation; the problem name and the optimization sense (`pulp.LpMaximize`) are passed to `LpProblem`.

In [3]:
# Create the LP object and
# set it up as a maximization problem --> we want to maximize the total views of the talks watched
prob = pulp.LpProblem('WatchingTEDTalks', pulp.LpMaximize)

> Step 3.1: Create Decision Variables:

In [4]:
# Create the decision variables: yes (1) or no (0) to watching each talk?
decision_variables = []
for rownum, row in data.iterrows():
    # Name each variable after the talk's index, e.g. x0, x1, ...
    variable = pulp.LpVariable('x' + str(row['index']), cat='Binary')  # binary: 0 or 1
    decision_variables.append(variable)

print("Total number of decision_variables: " + str(len(decision_variables)))

Total number of decision_variables: 2550
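Because every talk gets its own 0/1 variable, a brute-force search would have to consider every way of choosing 25 of the 2550 talks. A quick standard-library calculation (a sketch, independent of PuLP) shows why a mixed-integer solver is used rather than enumeration:

```python
import math

n_talks = 2550  # number of decision variables printed above
k_talks = 25    # number of talks the list must contain (Step 3.3)

# Number of distinct watch lists containing exactly k_talks talks
n_lists = math.comb(n_talks, k_talks)
print(f"{n_lists:.3e} possible watch lists")
```

The count is far beyond what can be checked one by one, which is why the notebook hands the model to a solver.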


> Step 3.2: Define Objective Function: (*Maximizing the number of views*)

> The variable `prob` now begins collecting problem data with the `+=` operator. The objective function is entered first. PuLP also accepts an optional comma and short descriptive string after the expression (for example `prob += total_views, "Total views"`); this notebook omits the name.

In [5]:
# Create the objective function: total views of the selected talks
# (zip pairs each row with its decision variable, replacing a nested O(n^2) loop)
total_views = pulp.lpSum(row['views'] * talk
                         for (_, row), talk in zip(data.iterrows(), decision_variables))

prob += total_views
# print("Optimization function: " + str(total_views))
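The objective is a weighted sum: each selected talk contributes its view count. As a worked example using the five talks shown in Step 2 (a hand-picked selection, not the solver's output), watching Ken Robinson (index 0) and Hans Rosling (index 4) scores 47,227,110 + 12,005,869 = 59,232,979 views:

```python
# View counts of the first five talks (from data.head() in Step 2)
views = {0: 47227110, 1: 3200520, 2: 1636292, 3: 1697550, 4: 12005869}

# A hand-picked 0/1 assignment: watch talks 0 and 4 only
x = {0: 1, 1: 0, 2: 0, 3: 0, 4: 1}

# Objective value = sum of views_i * x_i, the same form as total_views
objective = sum(views[i] * x[i] for i in views)
print(objective)  # 59232979
```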

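> Step 3.3: Define Constraints: (*We have a fixed amount of time to view the talks, and only so many talks can be viewed; the cells below require the list to fill exactly the available time and to contain exactly the chosen number of talks*)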

In [6]:
# Constraints
total_time_available_for_talks = 10*60 # Total time available is 10 hours. Converted to minutes
total_talks_can_watch = 25 # Don't want an overload of information
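Taken together, the two limits fix the average length of a selected talk whenever both hold with equality, as the `==` constraints in the next cells require: 600 minutes spread over 25 talks is 24 minutes per talk, longer than the 18-minute format mentioned in the introduction. A small arithmetic sketch:

```python
total_time_available_for_talks = 10 * 60  # minutes, as in the cell above
total_talks_can_watch = 25

# If both constraints hold with equality, the mean duration is fixed
average_minutes_per_talk = total_time_available_for_talks / total_talks_can_watch
print(average_minutes_per_talk)  # 24.0
```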

In [7]:
# Create Constraint 1 - Time for talks: total duration of the selected talks
total_time_talks = pulp.lpSum(row['duration'] * talk
                              for (_, row), talk in zip(data.iterrows(), decision_variables))

prob += (total_time_talks == total_time_available_for_talks)

In [8]:
# Create Constraint 2 - Number of talks: each selected talk counts once
total_talks = pulp.lpSum(decision_variables)

prob += (total_talks == total_talks_can_watch)
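On a tiny instance the same model can be solved by exhaustive search, which makes the role of each constraint concrete. The sketch below uses the five talks from Step 2 with hypothetical limits of K = 2 talks and T = 39.2 minutes (chosen for illustration only). It enumerates every 2-talk combination, keeps those whose total duration matches T within a small tolerance (durations are floats), and returns the one with the most views. This is plain enumeration, not the branch-and-cut method used by CBC, PuLP's default solver.

```python
import math
from itertools import combinations

# (duration in minutes, views) for the five talks shown by data.head()
talks = {
    0: (19.4, 47227110),  # Ken Robinson
    1: (16.3, 3200520),   # Al Gore
    2: (21.4, 1636292),   # David Pogue
    3: (18.6, 1697550),   # Majora Carter
    4: (19.8, 12005869),  # Hans Rosling
}

def best_watch_list(talks, k, total_minutes, tol=1e-6):
    """Exhaustively search lists of exactly k talks whose durations sum to total_minutes."""
    best, best_views = None, -1
    for combo in combinations(sorted(talks), k):      # count constraint: exactly k talks
        minutes = sum(talks[i][0] for i in combo)
        if not math.isclose(minutes, total_minutes, abs_tol=tol):
            continue                                  # time constraint violated
        views = sum(talks[i][1] for i in combo)       # objective value
        if views > best_views:
            best, best_views = combo, views
    return best, best_views

# Hypothetical toy limits: 2 talks filling exactly 39.2 minutes
print(best_watch_list(talks, k=2, total_minutes=39.2))  # ((0, 4), 59232979)
```

If no combination satisfies both constraints, the function returns `(None, -1)`, the toy analogue of an infeasible model.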

> Step 3.4: The Final Format

In [9]:
print(prob)
prob.writeLP("WatchingTEDTalks.lp")

WatchingTEDTalks:
MAXIMIZE
47227110*x0 + 3200520*x1 + 1211416*x10 + 717002*x100 + 1079565*x1000 + 5447236*x1001 + 1055562*x1002 + 1399333*x1003 + 740934*x1004 + 983929*x1005 + 1451656*x1006 + 790122*x1007 + 593099*x1008 + 693722*x1009 + 1451846*x101 + 946354*x1010 + 992224*x1011 + 1264969*x1012 + 872169*x1013 + 852507*x1014 + 647752*x1015 + 293626*x1016 + 1564173*x1017 + 1783040*x1018 + 834926*x1019 + 577502*x102 + 484266*x1020 + 1258574*x1021 + 1390908*x1022 + 667985*x1023 + 399332*x1024 + 2204314*x1025 + 787092*x1026 + 1776828*x1027 + 598693*x1028 + 3729820*x1029 + 1683456*x103 + 8744428*x1030 + 975365*x1031 + 502832*x1032 + 2901853*x1033 + 950387*x1034 + 576592*x1035 + 16861578*x1036 + 1426518*x1037 + 3630894*x1038 + 1048905*x1039 + 779873*x104 + 471545*x1040 + 841471*x1041 + 729857*x1042 + 2487499*x1043 + 648251*x1044 + 1067460*x1045 + 507746*x1046 + 933319*x1047 + 822884*x1048 + 1529057*x1049 + 940913*x105 + 924764*x1050 + 1736183*x1051 + 1042789*x1052 + 148971*x1053 + 291251*x105

> Step 3.5: The Actual Optimization

In [10]:
optimization_result = prob.solve()

assert optimization_result == pulp.LpStatusOptimal
print("Status:", LpStatus[prob.status])
print("Optimal Solution to the problem: ", value(prob.objective))
print("Individual decision_variables: ")
for v in prob.variables():
    print(v.name, "=", v.varValue)

Status: Optimal
Optimal Solution to the problem:  470591400.0
Individual decision_variables: 
x0 = 1.0
x1 = 0.0
x10 = 0.0
x100 = 0.0
x1000 = 0.0
x1001 = 0.0
x1002 = 0.0
x1003 = 0.0
x1004 = 0.0
x1005 = 0.0
x1006 = 0.0
x1007 = 0.0
x1008 = 0.0
x1009 = 0.0
x101 = 0.0
x1010 = 0.0
x1011 = 0.0
x1012 = 0.0
x1013 = 0.0
x1014 = 0.0
x1015 = 0.0
x1016 = 0.0
x1017 = 0.0
x1018 = 0.0
x1019 = 0.0
x102 = 0.0
x1020 = 0.0
x1021 = 0.0
x1022 = 0.0
x1023 = 0.0
x1024 = 0.0
x1025 = 0.0
x1026 = 0.0
x1027 = 0.0
x1028 = 0.0
x1029 = 0.0
x103 = 0.0
x1030 = 0.0
x1031 = 0.0
x1032 = 0.0
x1033 = 0.0
x1034 = 0.0
x1035 = 0.0
x1036 = 1.0
x1037 = 0.0
x1038 = 0.0
x1039 = 0.0
x104 = 0.0
x1040 = 0.0
x1041 = 0.0
x1042 = 0.0
x1043 = 0.0
x1044 = 0.0
x1045 = 0.0
x1046 = 0.0
x1047 = 0.0
x1048 = 0.0
x1049 = 0.0
x105 = 0.0
x1050 = 0.0
x1051 = 0.0
x1052 = 0.0
x1053 = 0.0
x1054 = 0.0
x1055 = 0.0
x1056 = 0.0
x1057 = 0.0
x1058 = 0.0
x1059 = 0.0
x106 = 0.0
x1060 = 0.0
x1061 = 0.0
x1062 = 0.0
x1063 = 0.0
x1064 = 0.0
x1065 = 0.0
x1066 = 0
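Solvers return floating-point values, so a cheap sanity check after `prob.solve()` is to confirm that the reported values are (near-)integral and that both equality constraints hold within a tolerance. The sketch below runs on a hypothetical three-variable solution dictionary rather than on the real `prob.variables()` output; the helper name `check_solution` and the numbers are illustrative.

```python
import math

def check_solution(values, durations, k, total_minutes, tol=1e-6):
    """Return True if values is a 0/1 assignment meeting both equality constraints."""
    # Integrality: every variable should be (numerically) 0 or 1
    if any(min(abs(v), abs(v - 1)) > tol for v in values.values()):
        return False
    chosen = [name for name, v in values.items() if round(v) == 1]
    # Count constraint and time constraint, each checked with a tolerance
    count_ok = len(chosen) == k
    time_ok = math.isclose(sum(durations[n] for n in chosen), total_minutes, abs_tol=tol)
    return count_ok and time_ok

# Hypothetical solver output and durations (minutes), for illustration only
values = {"x0": 1.0, "x1": 0.0, "x4": 0.9999999999}
durations = {"x0": 19.4, "x1": 16.3, "x4": 19.8}
print(check_solution(values, durations, k=2, total_minutes=39.2))  # True
```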

**Step 4: Convert the optimization results into an interpretable decision-making format**

In [11]:
# Collect the name and solved value of every decision variable
variable_name = []
variable_value = []

for v in prob.variables():
    variable_name.append(v.name)
    variable_value.append(v.varValue)

df = pd.DataFrame({'index': variable_name, 'value': variable_value})
# Recover each talk's index from its variable name, e.g. 'x1036' -> 1036
df['index'] = df['index'].str.extract(r'(\d+)', expand=False).astype(int)
df = df.sort_values(by='index')

# Join the solution back onto the talk data and keep the selected talks;
# '> 0.5' rather than '== 1' tolerates solver round-off in varValue
result = pd.merge(data, df, on='index')
result = result[result['value'] > 0.5].sort_values(by='views', ascending=False)
selected_cols_final = ['name', 'event', 'duration', 'views']
final_set_of_talks_to_watch = result[selected_cols_final]
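Step 4 relies on each variable being named `'x' + index`, so the talk index can be recovered from the name with a regular expression. A standard-library sketch of that mapping, using a few (name, value) pairs copied from the solver output above and treating values above 0.5 as selected to absorb floating-point noise (`selected_indices` is a hypothetical helper, not part of the notebook):

```python
import re

# (name, value) pairs copied from the printed solution in Step 3.5
solution = [("x0", 1.0), ("x1", 0.0), ("x10", 0.0), ("x1036", 1.0)]

def selected_indices(pairs):
    """Return the talk indices whose binary variable is set, in ascending order."""
    selected = []
    for name, value in pairs:
        match = re.fullmatch(r"x(\d+)", name)  # 'x1036' -> '1036'
        if match and value > 0.5:              # 0.5 threshold tolerates solver noise
            selected.append(int(match.group(1)))
    return sorted(selected)

print(selected_indices(solution))  # [0, 1036]
```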

## The Final List of Talks to Watch

In [12]:
from IPython.display import display, HTML
display(HTML(final_set_of_talks_to_watch.to_html(index=False)))

| name | event | duration | views |
|---|---|---|---|
| Ken Robinson: Do schools kill creativity? | TED2006 | 19.4 | 47227110 |
| Amy Cuddy: Your body language may shape who yo... | TEDGlobal 2012 | 21.0 | 43155405 |
| Simon Sinek: How great leaders inspire action | TEDxPuget Sound | 18.1 | 34309432 |
| Brené Brown: The power of vulnerability | TEDxHouston | 20.3 | 31168150 |
| Mary Roach: 10 things you didn't know about or... | TED2009 | 16.7 | 22270883 |
| Julian Treasure: How to speak so that people w... | TEDGlobal 2013 | 10.0 | 21594632 |
| Jill Bolte Taylor: My stroke of insight | TED2008 | 18.3 | 21190883 |
| Tony Robbins: Why we do what we do | TED2006 | 21.8 | 20685401 |
| James Veitch: This is what happens when you re... | TEDGlobal>Geneva | 9.8 | 20475972 |
| Cameron Russell: Looks aren't everything. Beli... | TEDxMidAtlantic | 9.6 | 19787465 |
