## Linear Programming with Python, by [AnalyticsVidhya](#) (design by kcEmenike)

### Case Study: Creating a TED watchlist of videos
***

TED publishes hundreds of videos in various languages. With so many videos and so little time, I want to make the best possible use of my viewing time.

The objective is:
- Decide which talks to watch (a watch / don't watch choice for each talk) so that the selected talks have the highest total number of views, which is the quantity the code below maximises

**Constraints are such that:**
- <font color=red>Only 10 hours are available for all videos</font>
- <font color=red>I can only watch 25 videos</font>

How do I decide which videos to watch so that I make the most of this limited time?

*The dataset is available at [rounakbanik/ted-talks](https://www.kaggle.com/rounakbanik/ted-talks) on Kaggle*
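
Written out as a 0-1 integer program, the problem might look like the sketch below. The symbols are assumptions introduced here, not notation from the original notebook: $x_i$ is 1 if talk $i$ is watched and 0 otherwise, $d_i$ is its duration in minutes, $v_i$ is its view count, and $n$ is the number of talks.

$$
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{n} v_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{n} d_i x_i \le 600 \\
& \sum_{i=1}^{n} x_i \le 25 \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n
\end{aligned}
$$

The code cells further down impose both limits with `==` rather than `<=`, which is a stricter variant of this formulation.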

***
### Table of Contents
- [Import items](#import-items)
- [Get data](#get-data)
- [Choose important data for analytics](#choose-data)
- [Create optimisation object from PuLP and define optimisation problem type (minimisation or maximisation)](#create-lp-object)
- [Define constraints](#constraints)
- [Run optimisation and write results to LP file](#run-lp-optimisation)
- [Convert optimisation result to readable decision-making format](#convert-optimisation)
- [Show optimisation result](#show-lp-result)

<a id='import-items'></a>

In [1]:
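import re
import pandas as pd, numpy as np, matplotlib.pyplot as plt
import pulp  # imported by name because the cells below call pulp.LpProblem and pulp.LpVariable
from pulp import LpStatus, value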
from IPython.display import display, HTML
%matplotlib inline

<a id='get-data'></a>

In [2]:
!kaggle datasets files rounakbanik/ted-talks

name             size  creationDate         
---------------  ----  -------------------  
ted_main.csv      2MB  2017-09-25 21:08:55  
transcripts.csv  10MB  2017-09-25 21:09:14  


In [3]:
!kaggle datasets download rounakbanik/ted-talks --unzip
# Read as UTF-8 so that accented names such as "Brené" decode correctly
ted = pd.read_csv('ted_main.csv', encoding='utf-8')

ted-talks.zip: Skipping, found more recently modified local copy (use --force to force download)


In [4]:
ted.head()

| | comments | description | duration | event | film_date | languages | main_speaker | name | num_speaker | published_date | ratings | related_talks | speaker_occupation | tags | title | url | views |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4553 | Sir Ken Robinson makes an entertaining and pro... | 1164 | TED2006 | 1140825600 | 60 | Ken Robinson | Ken Robinson: Do schools kill creativity? | 1 | 1151367060 | [{'id': 7, 'name': 'Funny', 'count': 19645}, {... | [{'id': 865, 'hero': 'https://pe.tedcdn.com/im... | Author/educator | ['children', 'creativity', 'culture', 'dance',... | Do schools kill creativity? | https://www.ted.com/talks/ken_robinson_says_sc... | 47227110 |
| 1 | 265 | With the same humor and humanity he exuded in ... | 977 | TED2006 | 1140825600 | 43 | Al Gore | Al Gore: Averting the climate crisis | 1 | 1151367060 | [{'id': 7, 'name': 'Funny', 'count': 544}, {'i... | [{'id': 243, 'hero': 'https://pe.tedcdn.com/im... | Climate advocate | ['alternative energy', 'cars', 'climate change... | Averting the climate crisis | https://www.ted.com/talks/al_gore_on_averting_... | 3200520 |
| 2 | 124 | New York Times columnist David Pogue takes aim... | 1286 | TED2006 | 1140739200 | 26 | David Pogue | David Pogue: Simplicity sells | 1 | 1151367060 | [{'id': 7, 'name': 'Funny', 'count': 964}, {'i... | [{'id': 1725, 'hero': 'https://pe.tedcdn.com/i... | Technology columnist | ['computers', 'entertainment', 'interface desi... | Simplicity sells | https://www.ted.com/talks/david_pogue_says_sim... | 1636292 |
| 3 | 200 | In an emotionally charged talk, MacArthur-winn... | 1116 | TED2006 | 1140912000 | 35 | Majora Carter | Majora Carter: Greening the ghetto | 1 | 1151367060 | [{'id': 3, 'name': 'Courageous', 'count': 760}... | [{'id': 1041, 'hero': 'https://pe.tedcdn.com/i... | Activist for environmental justice | ['MacArthur grant', 'activism', 'business', 'c... | Greening the ghetto | https://www.ted.com/talks/majora_carter_s_tale... | 1697550 |
| 4 | 593 | You've never seen data presented like this. Wi... | 1190 | TED2006 | 1140566400 | 48 | Hans Rosling | Hans Rosling: The best stats you've ever seen | 1 | 1151440680 | [{'id': 9, 'name': 'Ingenious', 'count': 3202}... | [{'id': 2056, 'hero': 'https://pe.tedcdn.com/i... | Global health expert; data visionary | ['Africa', 'Asia', 'Google', 'demo', 'economic... | The best stats you've ever seen | https://www.ted.com/talks/hans_rosling_shows_t... | 12005869 |


<a id='choose-data'></a>

In [5]:
# What is the target, and what are the labels needed?
data = ted.copy()

In [6]:
# Keep only the name, event, duration and views columns (a judgement call)
# reset_index() adds an 'index' column that is used later to name the decision variables
data = data[['name','event','duration','views']].reset_index()
data['duration'] = data['duration'].div(60).round(1)  # seconds to minutes
data.head()

| | index | name | event | duration | views |
|---|---|---|---|---|---|
| 0 | 0 | Ken Robinson: Do schools kill creativity? | TED2006 | 19.4 | 47227110 |
| 1 | 1 | Al Gore: Averting the climate crisis | TED2006 | 16.3 | 3200520 |
| 2 | 2 | David Pogue: Simplicity sells | TED2006 | 21.4 | 1636292 |
| 3 | 3 | Majora Carter: Greening the ghetto | TED2006 | 18.6 | 1697550 |
| 4 | 4 | Hans Rosling: The best stats you've ever seen | TED2006 | 19.8 | 12005869 |


<a id='create-lp-object'></a>

In [7]:
# Set up the Linear Programming object as a maximisation problem
prob = pulp.LpProblem('WatchingTEDTalks',pulp.LpMaximize)

In [8]:
# Create one binary decision variable per talk: 1 = watch, 0 = skip
# Variables are named x0, x1, ... after the 'index' column
decision_variables = [
    pulp.LpVariable(f"x{idx}", cat='Binary')
    for idx in data['index']
]
    
print(f"Total number of decision variables is {len(decision_variables)}")

Total number of decision variables is 2550
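
As a side note, PuLP can also build a whole family of binary variables in one call. The sketch below is a minimal illustration with hypothetical talk ids, not the notebook's data; `LpVariable.dicts` names its variables `x_0`, `x_1`, ... (with an underscore), which differs from the `x0`, `x1`, ... names used in this notebook.

```python
import pulp

# Hypothetical row labels standing in for the notebook's 'index' column
talk_ids = [0, 1, 2]

# One 0/1 variable per talk; cat="Binary" implies the bounds 0 and 1
x = pulp.LpVariable.dicts("x", talk_ids, cat="Binary")

for i, var in x.items():
    print(i, var.name, var.lowBound, var.upBound)
```

Declaring a variable as `Binary` gives the same bounds as an `Integer` variable with `lowBound=0` and `upBound=1`.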


In [9]:
# Objective: maximise the total view count of the selected talks.
# lpSum builds the linear expression in one pass, so there is no need for
# nested loops or an empty-string accumulator.
total_views = pulp.lpSum(
    views * talk
    for views, talk in zip(data['views'].tolist(), decision_variables)
)

prob += total_views

<a id='constraints'></a>

In [10]:
# Define constraints
total_time_available_for_talks = 10*60 # 10 hours available, in minutes
total_talks_can_watch = 25 # Can't watch more than 25 talks in the time available

# Total duration (in minutes) of the selected talks
total_time_talks = pulp.lpSum(
    minutes * talk
    for minutes, talk in zip(data['duration'].tolist(), decision_variables)
)

# Note: '==' forces the watchlist to fill exactly 600 minutes;
# use '<=' to treat the 10 hours as an upper limit instead
prob += (total_time_talks == total_time_available_for_talks)

In [11]:
# Constraint on the number of talks selected
total_talks = pulp.lpSum(decision_variables)

# Note: '==' selects exactly 25 talks; use '<=' for "at most 25"
prob += (total_talks == total_talks_can_watch)
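
The choice between `==` and `<=` in these constraints can change the answer. The sketch below illustrates this by brute force on four hypothetical talks (the names, durations, views and limits are made up for the example); it does not use PuLP, it simply enumerates every subset.

```python
from itertools import combinations

# Hypothetical toy data: (name, duration in minutes, views)
talks = [("A", 20, 900), ("B", 15, 800), ("C", 25, 300), ("D", 20, 100)]
time_limit, max_talks = 40, 2

def best_selection(exact):
    """Search all subsets; exact=True mimics '==' constraints, False mimics '<='."""
    best_views, best_names = None, None
    for k in range(len(talks) + 1):
        for subset in combinations(talks, k):
            minutes = sum(t[1] for t in subset)
            if exact:
                feasible = minutes == time_limit and k == max_talks
            else:
                feasible = minutes <= time_limit and k <= max_talks
            views = sum(t[2] for t in subset)
            if feasible and (best_views is None or views > best_views):
                best_views, best_names = views, [t[0] for t in subset]
    return best_views, best_names

print(best_selection(exact=False))  # (1700, ['A', 'B'])
print(best_selection(exact=True))   # (1100, ['B', 'C'])
```

With upper limits the two most viewed talks fit in 35 minutes; with equalities the search has to fill exactly 40 minutes and settles for fewer views.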

<a id='run-lp-optimisation'></a>

In [12]:
prob.writeLP("WatchingTEDTalks.lp")

In [13]:
optimisation_result = prob.solve()

In [14]:
LpStatus[prob.status]

'Optimal'

In [15]:
value(prob.objective)

470591400.0
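
To see the whole workflow at a glance, here is a minimal end-to-end sketch on four hypothetical talks. The data, the model name `ToyWatchlist` and the limits (40 minutes, 2 talks) are assumptions for illustration, and the limits are written as upper bounds (`<=`) rather than the equalities used above.

```python
import pulp

# Hypothetical toy data: durations in minutes and view counts for four talks
durations = {"A": 20, "B": 15, "C": 25, "D": 20}
views = {"A": 900, "B": 800, "C": 300, "D": 100}

model = pulp.LpProblem("ToyWatchlist", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", list(durations), cat="Binary")

model += pulp.lpSum(views[t] * x[t] for t in durations)             # objective
model += pulp.lpSum(durations[t] * x[t] for t in durations) <= 40   # time budget
model += pulp.lpSum(x[t] for t in durations) <= 2                   # talk count

model.solve(pulp.PULP_CBC_CMD(msg=False))  # bundled CBC solver, log output off

status = pulp.LpStatus[model.status]
chosen = sorted(t for t in durations if x[t].varValue > 0.5)
total = pulp.value(model.objective)
print(status, chosen, total)
```

The comparison `varValue > 0.5` is a cautious way to read a binary result, since solvers may return values such as 0.9999999 instead of exactly 1.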

In [16]:
# Show value of all optimisation variables
#for v in prob.variables():
    #print(f"{v.name} = {v.varValue}")

<a id='convert-optimisation'></a>

In [17]:
# Convert the optimisation to interpretable decision making format

# Recover each variable's row number from its name (e.g. 'x12' -> 12).
# A list comprehension avoids reusing the name `value`, which would shadow pulp's value().
df = pd.DataFrame({
    'index': [int(re.findall(r'\d+', v.name)[0]) for v in prob.variables()],
    'value': [v.varValue for v in prob.variables()],
})

df = df.sort_values(by='index')
result = pd.merge(data, df, on='index')
# Keep the talks the solver selected (value 1), most viewed first
result = result[result['value']==1].sort_values(by='views', ascending=False)

final_set_of_talks_to_watch = result[['name','event','duration','views']]
final_set_of_talks_to_watch.head()

| | name | event | duration | views |
|---|---|---|---|---|
| 0 | Ken Robinson: Do schools kill creativity? | TED2006 | 19.4 | 47227110 |
| 1346 | Amy Cuddy: Your body language may shape who yo... | TEDGlobal 2012 | 21.0 | 43155405 |
| 677 | Simon Sinek: How great leaders inspire action | TEDxPuget Sound | 18.1 | 34309432 |
| 837 | Brené Brown: The power of vulnerability | TEDxHouston | 20.3 | 31168150 |
| 452 | Mary Roach: 10 things you didn't know about or... | TED2009 | 16.7 | 22270883 |
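
The name-to-row mapping in the cell above can be shown in isolation. The sketch below uses a hypothetical solver output (the `solution` dictionary is made up) and extracts the row numbers of the selected talks. Converting to `int` matters because, as strings, `x10` would sort before `x2`.

```python
import re

# Hypothetical solver output: variable names mapped to their 0/1 values
solution = {"x0": 1.0, "x1": 0.0, "x10": 1.0, "x2": 0.0}

# Pull the digits out of each name and keep the rows whose value rounds to 1
selected_rows = sorted(
    int(re.search(r"\d+", name).group())
    for name, val in solution.items()
    if round(val) == 1
)
print(selected_rows)  # [0, 10]
```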


<a id='show-lp-result'></a>

In [18]:
display(HTML(final_set_of_talks_to_watch.to_html(index=False)))

| name | event | duration | views |
|---|---|---|---|
| Ken Robinson: Do schools kill creativity? | TED2006 | 19.4 | 47227110 |
| Amy Cuddy: Your body language may shape who yo... | TEDGlobal 2012 | 21.0 | 43155405 |
| Simon Sinek: How great leaders inspire action | TEDxPuget Sound | 18.1 | 34309432 |
| Brené Brown: The power of vulnerability | TEDxHouston | 20.3 | 31168150 |
| Mary Roach: 10 things you didn't know about or... | TED2009 | 16.7 | 22270883 |
| Julian Treasure: How to speak so that people w... | TEDGlobal 2013 | 10.0 | 21594632 |
| Jill Bolte Taylor: My stroke of insight | TED2008 | 18.3 | 21190883 |
| Tony Robbins: Why we do what we do | TED2006 | 21.8 | 20685401 |
| James Veitch: This is what happens when you re... | TEDGlobal>Geneva | 9.8 | 20475972 |
| Cameron Russell: Looks aren't everything. Beli... | TEDxMidAtlantic | 9.6 | 19787465 |
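
A quick sanity check on a watchlist is to total its durations and views and compare them with the limits. The sketch below does this for the ten rows displayed above, with the numbers copied from the table; the helper `within_limits` is a hypothetical name introduced here. The displayed rows are only part of the selection, since the model as written selects 25 talks.

```python
# (duration in minutes, views) copied from the ten rows displayed above
shown = [
    (19.4, 47227110), (21.0, 43155405), (18.1, 34309432), (20.3, 31168150),
    (16.7, 22270883), (10.0, 21594632), (18.3, 21190883), (21.8, 20685401),
    (9.8, 20475972), (9.6, 19787465),
]

def within_limits(rows, time_limit=600, max_talks=25):
    """Return True when the rows respect both upper limits."""
    minutes = round(sum(d for d, _ in rows), 1)
    return minutes <= time_limit and len(rows) <= max_talks

total_minutes = round(sum(d for d, _ in shown), 1)
total_views = sum(v for _, v in shown)
print(total_minutes, total_views, within_limits(shown))
```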
