In [1]:
# TODO Problem Descrip -- moet beter; nog meer research based.
# TODO Including Genres -- perhaps as extension? Or own planning of the schedule?

# Tomorrowland Artist Scheduling
*Vlerick Business School, Master in Data Analytics and Artificial Intelligence 2024-2025. Made by Luna Geens. For the course Decision Optimisation given by Mario Vanhoucke, as a final assignment due on Friday 20 Dec 2024.*

<br>

----

<br>

## 1. Introduction

###  <b style="color: #8B0000;">1.1 Problem Description</b>

Tomorrowland, as one of the largest and most prestigious electronic music festivals globally, attracts hundreds of thousands of attendees from over **200 countries**, with tickets often selling out within minutes. Since its founding in **2005**, Tomorrowland has expanded to host over **15 stages** simultaneously, accommodating **800+ artists** across multiple weekends, making efficient scheduling a necessity rather than a choice.

Efficient artist scheduling is not only a logistical challenge, but also a critical factor in enhancing the **festival experience** and **business outcomes**.
- Festivals thrive on providing unforgettable experiences, where attendees expect their favorite artists to perform at ideal times and on prominent stages. Balancing artist popularity with time slots ensures maximum audience engagement and satisfaction.
-  Strategic artist placement directly influences festival success:
     - Ticket Sales: Scheduling popular artists during prime slots drives higher attendance.
     - Reputation: Delivering a seamless experience strengthens Tomorrowland's global standing as a premier festival.
     - Cost Optimization: Efficient scheduling minimizes operational costs related to stage management, logistics, and artist coordination.

<br>

Tomorrowland’s scale introduces significant complexity:

- High Artist Volume: Hundreds of artists perform multiple sets across various stages over several days.
- Time and Stage Constraints: Each stage has a defined availability window per day, while artists have specific limits on the number of performances and required rest time.
- Audience Demand: Aligning artist popularity with desirable time slots adds another layer of optimization.

<Br>

At Tomorrowland's scale, manual scheduling is impractical. By leveraging optimizatoin-driven approaches, this research provides insights into optimizing festival scheduling, delivering tangible benefits to both attendees and organizers. In recent years, studies on festival optimization have increasingly relied on **mathematical models** (e.g., linear programming) to address these challenges.




###  <b style="color: #8B0000;">1.2 Mathematical Problem Description</b>

#### <b style="color: #8B0000;">1.2.1 Input Sets and Parameters</b>

Sets:
- $S$ = Set of **stages**, where $S = { s_1, s_2, \dots, s_n } $.
- $A$ = Set of **artists**, where $ A = { a_1, a_2, \dots, a_m} $.
- $K_a$ = Set of **sets performed** by artist $ a \in A$, where $K_a = {k_1, k_2, \dots , k_l}$ .
- $D$ = Set of **days**, where $D={D=a_1, a_2, \dots, a_p}$.
- $T_s,d$ = Set of **starting points of time slots** available for stage $s$ during day $d$, where $T_{s,d} = {t_1, t_2, \dots, t_q}$.







Parameters:

**Set-Specific** Parameters: For each artist's set $k \in K_a $:
- $b_{k}$: Time point in minutes of the beginning of the set $k$. *Note that this is a decision variable in the problem, not a parameter!*
- $d_{k}$: Day of the set $k$.
- $e_{k} $: Duration of set $ k $ in minutes.

**Stage-Specific** Parameters: For each stage $s \in S $:
- $t^{open}_{s,d}$: Opening time of stage $s$ on day $d$.
- $t^{close}_{s,d}$: Closing time of stage $s$ on day $d$.
- $c$: Clean-up time required after a performance on each stage (e.g., $ c = 45  \text{ minutes} $).

**Artist-Specific** Parameters: For each artist $a \in A$:
- $r$:  Rest time required for each artist between two sets (e.g., $ r = 120  \text{ minutes} $).
- $ \text{max\_sets}$: Maximum number of performances each artist can play per day. (e.g., $ \text{max\_sets} = 2 $).

**Popularity** Parameters:
- $ p_a $: Popularity score for artist $ a $, where $ a \in A $.
- $ p_s $: Popularity score for stage $ s $, where $ s \in S $.
- $ p_t $: Weight indicating the popularity of the starting point of the timeslot $ t $, where $ t \in T $.

<br>


#### <b style="color: #8B0000;">1.2.2 Decision Variables</b>

The decision variable accounts for artists performing multiple sets:

$
    x_{a, k, s, d, t} =
\begin{cases}
1 & \text{if artist } a \text{ performs set } k \text{ on stage } s \text{ starting at time slot } t \text{ on day } d \\
    0 & \text{otherwise.}
\end{cases}
$

Where:
- $ a \in A $: Artist in the set of  Artists.
- $ k \in K_a $: Set in the set of Sets performed by artist $a$.
- $ s \in S $: Stage in the set of Stages.
- $ t \in T_{s,d} $: Start time in the set of Start times for stage $s$ on day $d$.
- $ d \in D$: Day in the set of Days.

<br>

#### <b style="color: #8B0000;">1.2.3 Objective Function</b>


The objective is to **maximize customer satisfaction** by considering artist, stage, and timeslot popularity. These weights are predefined inputs based on audience preferences or prior analysis.

$
\text{Maximize } Z = \sum_{a \in A} \sum_{k \in K_a} \sum_{s \in S} \sum_{d \in D} \sum_{t \in T_{s,d}} (\alpha \cdot p_{a} + \beta \cdot p_{s} + \gamma \cdot p_{t}) \cdot x_{a, k, s, d, t}.
$

Where:
- $ a \in A $: Artist in the set of  Artists.
- $ k \in K_a $: Set in the set of Sets performed by artist $a$.
- $ s \in S $: Stage in the set of Stages.
- $ t \in T_{s,d} $: Start time in the set of Start times for stage $s$ on day $d$.
- $ d \in D$: Day in the set of Days.
- $ p_a $: Popularity score for artist $ a $, where $ a \in A $.
- $ p_s $: Popularity score for stage $ s $, where $ s \in S $.
- $ p_t $: Popularity score of the starting point of the timeslot $ t $, where $ t \in T_{s,d}$.
- $ \alpha, \beta, \gamma$: Weigths indicating the relative importance of artists, stage, and timeslot popularity,
- $x_{a, k, s, d, t}$: Binary variable indicating if artist $a$ performs set $k$ on stage $s$ starting at time slot $t$ on day $d$.

<br>


#### <b style="color: #8B0000;">1.2.4 Constraints</b>



**(1) Each Set is Scheduled Exactly Once.**

Each set $ k $ of artist $ a $ must be scheduled at exactly one stage and time (timeslot and day):

For $\forall a \in A; \ k \in K_{a}$:

$
\sum_{s \in S} \sum_{d \in D} \sum_{t \in T_{s,d}}  x_{a, k, s, d, t} = 1
$.

Where:
- $ a \in A $: Artists.
- $ k \in K_a $: Sets performed by artist $a$.
- $ s \in S $: Stages.
- $ d \in D$: Days.
- $ t \in T_{s,d} $: Start times for stage $s$ on day $d$.
- $x_{a, k, s, d, t}$: Binary variable indicating if artist $a$ performs set $k$ on stage $s$ starting at time slot $t$ on day $d$.

<br>



**(2) Each Set is Scheduled within Stage Availability.**

The start and end times of each set $k$ of artist $a$ played on stage $s$ on a day $d$ must lie within the stage’s open and close times of that day $d$.

For $\forall s \in S; \ d \in D; \ a \in A; \ k \in K_a$: Both conditions (1) and (2) are required to hold.

1. Start time $b_{k}$ of set $k$ on stage $s$ is after the stage's opening time $t^{open}_{s,d}$: $ \ b_{k} \leq t^{open}_{s,d}$.
2. The set finishes ($b_{k} + e_k$) before the stage's closing time $t^{close}_{s,d}$: $ \ b_{k} + e_{k} \leq t^{close}_{s,d}$.

Where:
- $ s \in S $: Stages.
- $ d \in D$: Days.
- $ a \in A $: Artists.
- $ k \in K_a $: Sets performed by artist $a$.
- $b_{k}$: Time point in minutes of the beginning of the set $k$.
- $e_k$: Duration of set $ k $ in minutes.
- $t^{open}_{s,d}$: Opening time of stage $s$ on day $d$.
- $t^{close}_{s,d}$: Closing time of stage $s$ on day $d$.

<br>


**(3) Sets on the Same Stage are Scheduled at a Different Time with a Clean-Up Time inbetween.**

Two sets $k$ and $k′$ on the same stage $s$ do not overlap in time.  Additionally, a clean-up time $c$ is required between the two sets. This ensures that no two performances occur simultaneously on the same stage, and sufficient time is allowed for stage preparation between sets.

For $\forall s \in S$; $d \in D$; $a \in A$;  $k, k' \in K_a$ and $k \neq k' $: At least one of the conditions (1) and (2) are required to hold.

1. Set $k$ must finish before $k′$ starts: $b_k + e_k + c \leq b'_k$.
2. Set $k′$ must finish before $k$ starts: $b'_k + e'_k + c \leq b_k$.

Where:
- $ s \in S $: Stages.
- $ d \in D$: Days.
- $ a \in A $: Artists.
- $ k \in K_a $: Sets performed by artist $a$.
- $b_{k}$: Time point in minutes of the beginning of the set $k$.
- $e_k$: Duration of set $ k $ in minutes.
- $c$: Clean-up Time $(c = 45  \text{ minutes} )$.

<br>


**(4) Sets of the Same Artist are Scheduled at a Different Time with a Rest Time inbetween.**

Two sets $k$ and $k′$ of the same artist $a$ do not overlap in time.  Additionally, a rest time $r$ is required between the two sets. This ensures that no two performances occur simultaneously of the same artist, and sufficient time is allowed for artist preparation between sets.

For $\forall s, s' \in S$; $d \in D$; $a \in A$;  $k, k' \in K_a$ and $k \neq k' $: At least one of the conditions (1) and (2) are required to hold.

1. Set $k$ must finish before $k′$ starts: $b_k + e_k + r \leq b'_k$.
2. Set $k′$ must finish before $k$ starts: $b'_k + e'_k + r \leq b_k$.

Where:
- $ s, s' \in S $: Stages. The sets can be on the same or different stages.
- $ d \in D$: Days.
- $ a \in A $: Artists.
- $ k \in K_a $: Sets performed by artist $a$.
- $b_{k}$: Time point in minutes of the beginning of the set $k$.
- $e_k$: Duration of set $ k $ in minutes.
- $r$: Rest Time $(r = 120  \text{ minutes} )$.

<br>



**(5) No more than 2 Sets of an Artist are Scheduled on the Same Day.**

Each artist is restricted to performing at most two sets on any single day. This ensures a manageable workload and a balanced schedule.

For $d \in D$; $a \in A$:

$\sum_{k \in K_a} \sum_{s \in S} \sum_{t \in T_{s,d}} x_{a, k, s, d, t} \leq 2$

Where:
- $ d \in D$: Days.
- $ a \in A $: Artists.
- $ k \in K_a $: Sets performed by artist $a$.
- $ s \in S $: Stages.
- $ t \in T_{s,d} $: Start times for stage $s$ on day $d$.
- $x_{a, k, s, d, t}$: Binary variable indicating if artist $a$ performs set $k$ on stage $s$ starting at time slot $t$ on day $d$.

<br>


<b style="color: #8B0000;">5. Outputs</b>

The solution will provide:
1. **Artist Schedule**: Table mapping each artist's set to a stage, time, and day.
2. **Stage Utilization**: Ensures clean-up time is respected between sets.
3. **Artist Rest Validation**: Verifies artists have sufficient rest and do not exceed two sets per day.
4. **Stage Availability**: Summarizes the overall availability for each stage.

<br>

<b style="color: #8B0000;">6. Summary Table of Symbols</b>

| **Symbol**                 | **Description**                                                                                                             |
|----------------------------|-----------------------------------------------------------------------------------------------------------------------------|
| $S$                        | Set of stages, where $S = \{s_1, s_2, \dots, s_n\} $.                                                                       |
| $A$                        | Set of artists, where $ A = \{a_1, a_2, \dots, a_m\} $.                                                                     |
| $ K_a $                  | Set of sets performed by artist $ a \in A $, $ K_a = \{k_1, k_2, \dots, k_l\} $.                                            |
| $ D $                    | Set of days, where $ D = \{d_1, d_2, \dots, d_p\} $.                                                                        |
| $ T_{s,d} $              | Set of starting points of time slots available for stage $ s $ during day $ d $.                                            |
| $ b_k $                  | Time point in minutes of the beginning of the set $ k $.                                                                    |
| $ d_k $                  | Day of the set $ k $.                                                                                                       |
| $ e_k $                  | Duration of set $ k $ in minutes.                                                                                           |
| $ t^{open}_{s,d} $       | Opening time of stage $ s $ on day $ d $.                                                                                   |
| $ t^{close}_{s,d} $      | Closing time of stage $ s $ on day $ d $.                                                                                   |
| $ c $                    | Clean-up time required after a performance on each stage.                                                                   |
| $ r $                    | Rest time required for each artist between two sets.                                                                        |
| $ \text{max\_sets} $     | Maximum number of performances each artist can play per day.                                                                |
| $ p_a $                  | Popularity score for artist $ a $.                                                                                          |
| $ p_s $                  | Popularity score for stage $ s $.                                                                                           |
| $ p_t $                  | Popularity of the starting point of the timeslot $ t $.                                                                     |
| $ \alpha, \beta, \gamma $ | Weights indicating the importance of artist, stage, and timeslot popularity.                                                |
| $ x_{a, k, s, d, t} $    | Binary variable: 1 if artist $ a $ performs set $ k $ on stage $ s $ starting at time slot $ t $ on day $ d $, 0 otherwise. |


<br>

<br>

----

<br>


## 2. Data Imports


In [1]:
# * All imports
import pulp
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import re
import time  # For handling delays
import csv

###  <b style="color: #8B0000;">2.1 Getting the data</b>

####  <b style="color: #8B0000;">2.1.1 Web Scraping Line-up Information</b>

First, we need to collect some data on the artists. This process involves gathering and organizing Tomorrowland 2024's timetable directly from the web. The schedule is extracted from [an online page providing information on timetables of different festivals](https://festivalviewer.com/tomorrowland/lineup/2024), structured into a table format, and saved as a CSV file. This makes sure that next year, we can reuse the code.

In [2]:
# * Scrape the web for the timetable of Tomorrowland 2024
#
# # URL of the Tomorrowland Lineup Page
# url = 'https://festivalviewer.com/tomorrowland/lineup/2024'
#
# # Fetch the page content
# response = requests.get(url)
# soup = BeautifulSoup(response.content, 'html.parser')
#
# # Find the table containing the lineup data
# table = soup.find('table', {'id': 'table_id'})
#
# # Extract table headers
# headers = [th.text.strip() for th in table.find('thead').find_all('th')]
#
# # Extract rows directly without 'tbody'
# rows = []
# for tr in table.find_all('tr'):  # Find all rows in the table
#     td_elements = tr.find_all('td')  # Find all data cells
#     if td_elements:  # Only process rows with <td> elements
#         row = [td.text.strip() for td in td_elements]
#         rows.append(row)
#
# # Convert to DataFrame
# lineup_df = pd.DataFrame(rows, columns=headers)
#
# # Save to CSV
# lineup_df.to_csv('doc/tomorrowland_lineup_2024.csv', index=False)
# print("Data saved to tomorrowland_lineup_2024.csv")
#
# lineup_df

Data saved to tomorrowland_lineup_2024.csv


Unnamed: 0,Artist,Stage Name,Host Name,Timeslot,Date,Year,Day,Weekend,Genre(s)
0,DJ Mars,The Gathering,The Gathering hosted by MC Gunner,13:30 - 14:15,18 July,2024,Thursday,Weekend 1,
1,Bosart,The Gathering,The Gathering hosted by MC Gunner,14:15 - 15:00,18 July,2024,Thursday,Weekend 1,
2,Dietro,The Gathering,The Gathering hosted by MC Gunner,15:00 - 15:45,18 July,2024,Thursday,Weekend 1,
3,Voltage,The Gathering,The Gathering hosted by MC Gunner,15:45 - 16:30,18 July,2024,Thursday,Weekend 1,
4,Elfigo,The Gathering,The Gathering hosted by MC Gunner,16:30 - 17:15,18 July,2024,Thursday,Weekend 1,
...,...,...,...,...,...,...,...,...,...
815,Da Tweekaz,Rose Garden,Tweekamania,17:00 - 18:00,28 July,2024,Sunday,Weekend 2,Hardcore Techno
816,MANDY,Rose Garden,Tweekamania,18:00 - 19:00,28 July,2024,Sunday,Weekend 2,Hard Dance
817,Tweekacore,Rose Garden,Tweekamania,19:00 - 20:00,28 July,2024,Sunday,Weekend 2,
818,Mark with a K & Mc Chucky,Rose Garden,Tweekamania,20:00 - 21:00,28 July,2024,Sunday,Weekend 2,"Jumpstyle, Hardstyle"


####  <b style="color: #8B0000;">2.1.2 Web Scraping Artist Information</b>


This code retrieves artist data, including their popularity score, from Spotify Developer Platform. The popularity score is a value between 0 and 100 that reflects an artist's global appeal, based on factors such as streaming activity and audience engagement. This metric provides a quantitative measure of an artist's popularity, which can be used to prioritize performances and optimize scheduling for maximum audience satisfaction.

In [3]:
# Spotify API credentials
# client_id = "8e48f7e04b65411b841111071db3137e"
# client_secret = "511c136e11dd46938a8d8b7581bb6ff9"
# redirect_uri = "http://localhost:8080/callback"
#
# # Define the scope to read artist data
# scope = "user-read-private"
#
# # Authenticate with Spotify
# sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=client_id,
#                                                client_secret=client_secret,
#                                                redirect_uri=redirect_uri,
#                                                scope=scope))
#
# # Function to get artist popularity
# def get_artist_popularity(artist_name):
#     results = sp.search(q=f"artist:{artist_name}", type="artist", limit=1)
#     if results['artists']['items']:
#         artist = results['artists']['items'][0]
#         return {
#             "name": artist['name'],
#             "popularity": artist['popularity'],
#             "genres": artist['genres']
#         }
#     return None
#
# # Example usage
# a = "Amy Winehouse"
# artist_info = get_artist_popularity(a)
#
# if artist_info:
#     print(f"Artist: {artist_info['name']}")
#     print(f"Popularity: {artist_info['popularity']}")
#     print(f"Genres: {', '.join(artist_info['genres'])}")
# else:
#     print("Artist not found.")

Artist: Amy Winehouse
Popularity: 80
Genres: british soul, neo soul


###  <b style="color: #8B0000;">2.2 Constructing Data Variables</b>

####  <b style="color: #8B0000;">2.2.1 From Line-up Information</b>


In [4]:
# * Load Data
file_path = 'doc/tomorrowland_lineup_2024.csv'
lineup_df = pd.read_csv(file_path)

First, we will put the time in a correct format and create the start, end and duration time of each set:

In [5]:
# * Combine Year, Date, and Timeslot for Start and End Times as well as Duration
def parse_and_adjust_day(row):
    try:
        # Parse Start and End Times
        start_time = pd.to_datetime(
            f"{row['Year']} {row['Date']} {row['Timeslot'].split('-')[0].strip()}",
            format='%Y %d %B %H:%M'
        )
        end_time = pd.to_datetime(
            f"{row['Year']} {row['Date']} {row['Timeslot'].split('-')[1].strip()}",
            format='%Y %d %B %H:%M'
        )

        # Adjust for midnight crossover
        if end_time < start_time:
            end_time += pd.Timedelta(days=1)

        # Adjust Day_Combined_ID if set occurs entirely after midnight
        # Assume the festival day ends at 6 AM
        if start_time.hour >= 0 and start_time.hour < 6:
            adjusted_day = row['Day_Combined_ID'] - 1  # Attribute it to the previous day
        else:
            adjusted_day = row['Day_Combined_ID']  # Keep the current day

        return start_time, end_time, adjusted_day
    except Exception as e:
        print(f"Error processing row {row}: {e}")
        return pd.NaT, pd.NaT, row['Day_Combined_ID']


# Apply parsing logic
lineup_df[['Start Time', 'End Time', 'Adjusted Day']] = lineup_df.apply(
    lambda row: pd.Series(parse_and_adjust_day(row)), axis=1
)

# Step 3: Calculate Duration
lineup_df['Duration (minutes)'] = (lineup_df['End Time'] - lineup_df['Start Time']).dt.total_seconds() / 60

In [6]:
# * Create Unique Day IDs
# Combine 'Day' and 'Weekend' to create a unique identifier for each combination
lineup_df['Day_Combined'] = lineup_df['Day'] + " " + lineup_df['Weekend']
# Generate unique IDs for each combination
lineup_df['Day_Combined_ID'] = lineup_df['Day_Combined'].astype('category').cat.codes + 1

In [7]:
# * Create unique set numbers
lineup_df['Set_ID'] = lineup_df.groupby('Artist').cumcount() + 1

In [8]:
# * Display Results
print("Cleaned Lineup Data:")
lineup_df

Cleaned Lineup Data:


Unnamed: 0,Artist,Stage Name,Host Name,Timeslot,Date,Year,Day,Weekend,Genre(s),Start Time,End Time,Duration (minutes),Day_Combined,Day_Combined_ID,Set_ID
0,DJ Mars,The Gathering,The Gathering hosted by MC Gunner,13:30 - 14:15,18 July,2024,Thursday,Weekend 1,,2024-07-18 13:30:00,2024-07-18 14:15:00,45.0,Thursday Weekend 1,7,1
1,Bosart,The Gathering,The Gathering hosted by MC Gunner,14:15 - 15:00,18 July,2024,Thursday,Weekend 1,,2024-07-18 14:15:00,2024-07-18 15:00:00,45.0,Thursday Weekend 1,7,1
2,Dietro,The Gathering,The Gathering hosted by MC Gunner,15:00 - 15:45,18 July,2024,Thursday,Weekend 1,,2024-07-18 15:00:00,2024-07-18 15:45:00,45.0,Thursday Weekend 1,7,1
3,Voltage,The Gathering,The Gathering hosted by MC Gunner,15:45 - 16:30,18 July,2024,Thursday,Weekend 1,,2024-07-18 15:45:00,2024-07-18 16:30:00,45.0,Thursday Weekend 1,7,1
4,Elfigo,The Gathering,The Gathering hosted by MC Gunner,16:30 - 17:15,18 July,2024,Thursday,Weekend 1,,2024-07-18 16:30:00,2024-07-18 17:15:00,45.0,Thursday Weekend 1,7,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
815,Da Tweekaz,Rose Garden,Tweekamania,17:00 - 18:00,28 July,2024,Sunday,Weekend 2,Hardcore Techno,2024-07-28 17:00:00,2024-07-28 18:00:00,60.0,Sunday Weekend 2,6,2
816,MANDY,Rose Garden,Tweekamania,18:00 - 19:00,28 July,2024,Sunday,Weekend 2,Hard Dance,2024-07-28 18:00:00,2024-07-28 19:00:00,60.0,Sunday Weekend 2,6,2
817,Tweekacore,Rose Garden,Tweekamania,19:00 - 20:00,28 July,2024,Sunday,Weekend 2,,2024-07-28 19:00:00,2024-07-28 20:00:00,60.0,Sunday Weekend 2,6,1
818,Mark with a K & Mc Chucky,Rose Garden,Tweekamania,20:00 - 21:00,28 July,2024,Sunday,Weekend 2,"Jumpstyle, Hardstyle",2024-07-28 20:00:00,2024-07-28 21:00:00,60.0,Sunday Weekend 2,6,1


Then we can use this to create the stage availability for each stage on each day:

In [9]:
# Step 5: Prepare Stage Availability Windows
stage_availability = lineup_df.groupby(['Stage Name', 'Day_Combined_ID']).agg({
    'Start Time': 'min',
    'End Time': 'max'
}).reset_index()

print("\nStage Availability:")
stage_availability


Stage Availability:


Unnamed: 0,Stage Name,Day_Combined_ID,Start Time,End Time
0,Atmosphere,1,2024-07-19 12:00:00,2024-07-20 00:55:00
1,Atmosphere,2,2024-07-26 12:00:00,2024-07-27 00:55:00
2,Atmosphere,3,2024-07-20 00:00:00,2024-07-21 00:00:00
3,Atmosphere,4,2024-07-27 00:00:00,2024-07-28 00:00:00
4,Atmosphere,5,2024-07-21 12:00:00,2024-07-21 23:55:00
...,...,...,...,...
87,Rose Garden,4,2024-07-27 13:00:00,2024-07-28 01:00:00
88,Rose Garden,5,2024-07-21 13:00:00,2024-07-21 23:00:00
89,Rose Garden,6,2024-07-28 13:00:00,2024-07-28 23:00:00
90,The Gathering,7,2024-07-18 13:30:00,2024-07-19 00:00:00


In [10]:
# Create list of artist
A = lineup_df['Artist'].unique().tolist()

####  <b style="color: #8B0000;">2.2.2 From Artist Information</b>


Let's now look into the popularity scores.

In [11]:
# * Get popularity scores of artists
# # ! Limitation: Cannot find popularity score for 10% of artists (mostly DJs with no own numbers); assigned a popularity score of 50
# # Initialize a dictionary to store artist popularity
# p_a = {}
# failed_artists = []  # List to track artists for whom all attempts failed

In [12]:
# Helper function to clean up artist names and extract parts
# def extract_artist_parts(artist_name):
#     """
#     Split artist names into parts, e.g., for 'Artist_1 ft Artist_2 & Friends':
#     Return a list: ['Artist_1', 'Artist_2', 'Friends']
#     """
#     # Use regex to split on 'ft', 'b2b', '&', ',', and parentheses
#     parts = re.split(r'ft|b2b|&|,|\(|\)', artist_name)
#     return [part.strip() for part in parts if part.strip()]  # Remove empty parts

In [13]:
# Loop through each artist to get their popularity
# for a in A:
#     try:
#         # First attempt: Full artist name
#         artist_info = get_artist_popularity(a)
#         if artist_info:
#             p_a[a] = artist_info['popularity']
#             continue  # Move to the next artist if found
#
#         # If not found, split the artist name and try each part
#         artist_parts = extract_artist_parts(a)
#
#         # Sequentially try each artist part
#         found = False
#         for part in artist_parts:
#             artist_info_part = get_artist_popularity(part)
#             if artist_info_part:
#                 p_a[a] = artist_info_part['popularity']
#                 found = True
#                 break  # Stop as soon as one part succeeds
#
#             # Optional: Add a short delay to avoid hitting API rate limits
#             time.sleep(0.1)
#
#         if not found:
#             # If all attempts fail, assign default popularity and record the artist
#             p_a[a] = 50
#             failed_artists.append(a)
#
#     except Exception as e:
#         # Handle API errors or other issues
#         p_a[a] = 0
#         failed_artists.append(a)
#         print(f"Error fetching popularity for {a}: {e}")

In [14]:
# # Print only the failed artists
# print("\nArtists for whom all fallback attempts failed (" + str(len(failed_artists)) + "):")
# print(failed_artists)
#
# # Print the resulting popularity dictionary
# print("\nFinal Artist Popularity (" + str(len(p_a)) + " artists, default popularity of 50 if not found):")
# print(p_a)


Artists for whom all fallback attempts failed (69):
['>rthur Lewis ft. Gerben Tuerlinckx', 'Q DANCE End show', 'Daybreak Session: Oliver Heldens', 'Kamal Bankay', 'Thierry VonderWarth', 'Piaggio Disco Club', 'Stasi Sanlin', 'Milan Evens', 'Schuurrallures', 'Deuxfwa', 'Stephine B', 'Symphony of Unity', 'Juul Pence', 'Daybreak Session: Laidback Luke presents Wayback Luke', 'Naomi Cazier', 'DJ Señor Funk', 'Jerrooo', 'Tobi Trekhaak', 'Hollaback Soundystem', 'Mariah Curry', 'Olsan Twins', 'Discobaar A Moeder', 'Gomorris', 'Lucca van Damme', 'Daybreak Session: One World Radio Soundsystem', 'Frank Mellemans', 'Mad Maxx vs Stryker', 'Delafino B2B DJ Gee', 'Niki August', 'Sako Glitch', 'Medaase', 'Nederhand', 'Lovelee Days', 'Noaffection', 'D. Guaetta', 'Surprise Vinyl only set', 'Lilihell', 'Tous Les Deux', 'DJ Chanelle', 'FlavãSoul Soundsystem', 'Flavour Drop', 'Tom Cosyns', 'JUSTANOTHERDJ', 'Bram Delux', 'PartyShakerz', 'Winterclubbing All Stars', 'Ellli Acula', 'So Blonde Sound System', '

In [15]:
 # # Save the dictionary to a CSV file
# with open("doc/artist_popularity.csv", "w", newline="") as file:
#     writer = csv.writer(file)
#     writer.writerow(["Artist", "Popularity"])  # Add a header row
#     for artist, popularity in p_a.items():
#         writer.writerow([artist, popularity])
#
# print("Artist popularity dictionary saved to 'artist_popularity.csv'")

Artist popularity dictionary saved to 'artist_popularity.csv'


In [16]:
# Read the dictionary from a CSV file
with open("doc/artist_popularity.csv", "r") as file:
    reader = csv.reader(file)
    next(reader)  # Skip the header row
    p_a = {rows[0]: int(rows[1]) for rows in reader}

print("Artist popularity dictionary loaded:")
print(p_a)

Artist popularity dictionary loaded:
{'DJ Mars': 50, 'Bosart': 0, 'Dietro': 19, 'Voltage': 49, 'Elfigo': 27, '>rthur Lewis ft. Gerben Tuerlinckx': 50, 'Nina Black': 9, 'Michael Amani': 24, 'The Oddword': 9, 'Andromedik': 54, 'Mandy': 65, 'Surprise': 72, 'Unsyn': 26, 'Rayzen': 43, 'K1 & Aradia': 11, 'Hyperverb': 44, 'Footworxx Militant Crew': 18, 'Sakyra': 30, 'HYSTA': 44, 'Vandal': 39, 'Billx': 51, 'Reflexx': 37, 'Unicorn on K': 48, 'Juwlz': 0, 'Maxim Lany': 41, 'Rose Ringed': 33, 'Agents Of Time (Live)': 53, 'Mau P': 54, 'Mathame': 53, 'Don Diablo': 68, 'Hardwell': 69, 'Hawkeyes': 0, 'Yamo': 3, 'Dave Lambert': 6, 'Makasi': 15, 'DJs From Mars': 50, 'Nicky Romero': 63, 'Seth Hills': 48, 'Chico Rose': 58, 'Henri PFR': 48, 'Kayu': 64, 'Meera': 37, 'Samm & Ajna': 0, 'DBN Gogo': 52, 'Alan Dixon': 52, 'WhoMadeWho': 59, 'Keinemusik (&ME, Adam Port, Rampa)': 77, 'Q DANCE End show': 50, 'Dolores': 50, 'Reygel': 8, 'Whisnu Santika': 62, 'Kris Kross Amsterdam': 63, 'Trinix': 68, 'B Jones': 54, 'R

####  <b style="color: #8B0000;">2.2.3 From Both Line-up and Artist Information</b>

We calculated normalized popularity scores for both days ($p_d$) and stages ($p_ s$) based on the aggregate popularity of artists performing on each day and stage. The normalization rescales the scores to a range of 0 to 100 for consistency and comparability.

Day Popularity Score $p_d$: $ p_d = \sum_{a \in A_{d}} p_a $ with $A_{d}$ the set of artists performing on day $d$.

Stage Popularity Score $p_s$: $ p_s = \sum_{a \in A_{s}} p_a $ with $A_{s}$ the set of artists performing on stage $s$.

In [17]:
# * Popularity score for days
# Calculate day popularity scores
p_d = lineup_df.groupby('Day_Combined_ID').apply(
    lambda group: group['Artist'].map(p_a).sum()
).to_dict()

# Print day popularity scores
print("\nDay Popularity Scores (p_d):")
print(p_d)

# Normalize day popularity scores
# Avoid division by zero if all values are identical
# Normalize day popularity scores to a range of 0-100
min_pd, max_pd = 0, max(p_d.values())

if max_pd - min_pd == 0:
    p_d_normalized = {day: 100 for day in p_d.keys()}
else:
    p_d_normalized = {
        day: ((score - min_pd) / (max_pd - min_pd)) * 100
        for day, score in p_d.items()
    }

# Print day popularity scores
print("\nNormalized Day Popularity Scores (p_d):")
print(p_d_normalized)

p_d = p_d_normalized


Day Popularity Scores (p_d):
{1: 5396, 2: 5363, 3: 5411, 4: 5399, 5: 5811, 6: 5429, 7: 644, 8: 516}

Normalized Day Popularity Scores (p_d):
{1: 92.85837205300292, 2: 92.29048356565134, 3: 93.11650318361728, 4: 92.90999827912579, 5: 100.0, 6: 93.4262605403545, 7: 11.082429874376183, 8: 8.879710893133712}


  p_d = lineup_df.groupby('Day_Combined_ID').apply(


In [18]:
# * Popularity score for stage
# Calculate stage popularity scores
p_s = lineup_df.groupby('Stage Name').apply(
    lambda group: group['Artist'].map(p_a).sum()
).to_dict()

# Print stage popularity scores
print("\nStage Popularity Scores (p_s):")
print(p_s)

# Normalize stage popularity scores
# Avoid division by zero if all values are identical
# Normalize stage popularity scores to a range of 0-100
min_ps, max_ps = 0, max(p_s.values())

if max_ps - min_ps == 0:
    p_s_normalized = {day: 100 for day in p_s.keys()}
else:
    p_s_normalized = {
        day: ((score - min_ps) / (max_ps - min_ps)) * 100
        for day, score in p_s.items()
    }

# Print day popularity scores
print("\nNormalized Stage Popularity Scores (p_s):")
print(p_s_normalized)

p_s = p_s_normalized


Stage Popularity Scores (p_s):
{'Atmosphere': 2015, 'Cage': 1857, 'Casa Corona': 1504, 'Core': 1647, 'Crystal Garden': 2261, 'Elixir': 1935, 'Freedom': 2102, 'House of Fortune': 2252, 'Library': 4154, 'Mainstage': 3698, 'Moosebar': 746, 'Planaxis': 2165, 'Rave Cave': 1672, 'Rise': 1946, 'Rose Garden': 2855, 'The Gathering': 1160}

Normalized Stage Popularity Scores (p_s):
{'Atmosphere': 48.507462686567166, 'Cage': 44.703899855560906, 'Casa Corona': 36.20606644198363, 'Core': 39.64853153586904, 'Crystal Garden': 54.429465575349056, 'Elixir': 46.58160808858931, 'Freedom': 50.60182956186809, 'House of Fortune': 54.21280693307655, 'Library': 100.0, 'Mainstage': 89.02262879152623, 'Moosebar': 17.958594126143478, 'Planaxis': 52.118440057775636, 'Rave Cave': 40.25036109773712, 'Rise': 46.846413095811265, 'Rose Garden': 68.72893596533461, 'The Gathering': 27.924891670678864}


  p_s = lineup_df.groupby('Stage Name').apply(


###  <b style="color: #8B0000;">2.3 Preparing Data Variables for PuLP</b>

####  <b style="color: #8B0000;">2.3.1 Sets</b>



In [19]:
# * Set of Stages S
S = lineup_df['Stage Name'].unique().tolist()
print(f"Stage if of type {type(S[0])}") # Set of strings
print(S)

Stage if of type <class 'str'>
['The Gathering', 'Cage', 'Freedom', 'House of Fortune', 'Crystal Garden', 'Library', 'Mainstage', 'Elixir', 'Moosebar', 'Rose Garden', 'Rave Cave', 'Rise', 'Core', 'Planaxis', 'Atmosphere', 'Casa Corona']


In [20]:
# * Set of Artists
print(f"Artist is of type {type(A[0])}") # Set of strings
print(A)

Artist is of type <class 'str'>
['DJ Mars', 'Bosart', 'Dietro', 'Voltage', 'Elfigo', '>rthur Lewis ft. Gerben Tuerlinckx', 'Nina Black', 'Michael Amani', 'The Oddword', 'Andromedik', 'Mandy', 'Surprise', 'Unsyn', 'Rayzen', 'K1 & Aradia', 'Hyperverb', 'Footworxx Militant Crew', 'Sakyra', 'HYSTA', 'Vandal', 'Billx', 'Reflexx', 'Unicorn on K', 'Juwlz', 'Maxim Lany', 'Rose Ringed', 'Agents Of Time (Live)', 'Mau P', 'Mathame', 'Don Diablo', 'Hardwell', 'Hawkeyes', 'Yamo', 'Dave Lambert', 'Makasi', 'DJs From Mars', 'Nicky Romero', 'Seth Hills', 'Chico Rose', 'Henri PFR', 'Kayu', 'Meera', 'Samm & Ajna', 'DBN Gogo', 'Alan Dixon', 'WhoMadeWho', 'Keinemusik (&ME, Adam Port, Rampa)', 'Q DANCE End show', 'Dolores', 'Reygel', 'Whisnu Santika', 'Kris Kross Amsterdam', 'Trinix', 'B Jones', 'Regi & Mark With a K', 'Maddix', 'Nervo', 'Sound Rush', 'Ran-D', 'Daybreak Session: Oliver Heldens', 'Miss Monique', 'Öwnboss', 'W&W', 'Vintage Culture', 'Da Tweekaz', 'John Newman', 'Netsky', 'Kölsch', 'Swedish H

In [21]:
# * Set of Sets of an Artist
K_a = lineup_df.groupby('Artist')['Set_ID'].apply(list).to_dict()
# Print type information for the first row only
artist, sets = next(iter(K_a.items()))
print(f"Key (Artist) is of type {type(artist)}") # String
print(f"Value (Set of Set_IDs) is of type {type(sets)}") # List
print(f"Part of Value (Set_ID) is of type {type((sets[0]))}") # Int

print(K_a)

Key (Artist) is of type <class 'str'>
Value (Set of Set_IDs) is of type <class 'list'>
Part of Value (Set_ID) is of type <class 'int'>
{'22heures30': [1], '3 Are Legend': [1], '5napback': [1, 2], '8Kays': [1], '999999999': [1], '>rthur Lewis ft. Gerben Tuerlinckx': [1], 'A Little Sound': [1], 'A Local Hero': [1], 'A-trak b2b The Magician': [1], 'ANNA': [1, 2], 'ARKADYAN': [1], 'ARTEN': [1], 'ATLiens': [1], 'Aaron Hibell': [1], 'Abena': [1], 'Acraze': [1], 'Adam Beyer': [1], 'Adam Sellouk': [1], 'AdamK': [1], 'Admess': [1], 'Adriatique': [1, 2], 'Afrojack': [1, 2], 'Afshin Momadi': [1], 'Agents Of Time': [1], 'Agents Of Time (Live)': [1], 'Airglo': [1], 'Alan Dixon': [1], 'Alesso': [1, 2], 'Alex Wann': [1], 'Alexander Merlin': [1], 'Alexander Popov b2b Re-Twin': [1], 'Alfred Beck': [1], 'Alibi': [1], 'Aline Rocha': [1], 'Alok': [1, 2], 'Aly & Fila': [1], 'Amber Broos': [1, 2], 'Amelie Lens': [1, 2], 'Amotik b2b Answer Code Request': [1], 'Amémé': [1], 'Andrei Stan': [1], 'Andromedik': [

In [22]:
# * Set of Days D
D = lineup_df['Day_Combined_ID'].unique().tolist()
print(f"Day is of type {type(D[0])}") #Int
print(D)

for d in D:
    rows = lineup_df[lineup_df['Day_Combined_ID'] == d]
    # Correctly select the specified columns
    print(rows[['Day_Combined_ID', 'Day', 'Weekend']].iloc[0])

Day is of type <class 'int'>
[7, 1, 3, 5, 8, 2, 4, 6]
Day_Combined_ID            7
Day                 Thursday
Weekend            Weekend 1
Name: 0, dtype: object
Day_Combined_ID            1
Day                   Friday
Weekend            Weekend 1
Name: 15, dtype: object
Day_Combined_ID            3
Day                 Saturday
Weekend            Weekend 1
Name: 149, dtype: object
Day_Combined_ID            5
Day                   Sunday
Weekend            Weekend 1
Name: 276, dtype: object
Day_Combined_ID            8
Day                 Thursday
Weekend            Weekend 2
Name: 408, dtype: object
Day_Combined_ID            2
Day                   Friday
Weekend            Weekend 2
Name: 422, dtype: object
Day_Combined_ID            4
Day                 Saturday
Weekend            Weekend 2
Name: 561, dtype: object
Day_Combined_ID            6
Day                   Sunday
Weekend            Weekend 2
Name: 698, dtype: object


In [23]:
# * Set of Starting Times
# Set of Starting Times
time_slot_granularity = 5  # Change this value as needed
T_ds = {}

for _, row in stage_availability.iterrows():
    stage = row['Stage Name']
    day = row['Day_Combined_ID']

    # Use Start Time and End Time directly (already datetime objects)
    start_time = row['Start Time']
    end_time = row['End Time']

    # Generate time slots
    time_slots = pd.date_range(
        start=start_time,
        end=end_time,
        freq=f'{time_slot_granularity}min'
    ).tolist()

    # Assign to T_ds with (stage, day) tuple key
    T_ds[(stage, day)] = time_slots

# Print type information for the first row only
key, value = next(iter(T_ds.items()))
print(f"Key (Stage, Day) is of type {type(key)}; e.g. {key}") # Tuple
print(f"Part of Key (Stage) is of type {type(key[0])}; e.g. {key[0]}") #String
print(f"Part of Key (Day) is of type {type(key[1])}; e.g. {key[1]}") # Int
print(f"Value (Time Slots) is of type {type(value)}; e.g. {value}") # List
print(f"Part of Value (Time Slot) is of type {type(value[0])}; e.g. {value[0]}") # Timestamp

# Print the resulting T_ds
print("T_ds:")
print(T_ds)

Key (Stage, Day) is of type <class 'tuple'>; e.g. ('Atmosphere', 1)
Part of Key (Stage) is of type <class 'str'>; e.g. Atmosphere
Part of Key (Day) is of type <class 'int'>; e.g. 1
Value (Time Slots) is of type <class 'list'>; e.g. [Timestamp('2024-07-19 12:00:00'), Timestamp('2024-07-19 12:05:00'), Timestamp('2024-07-19 12:10:00'), Timestamp('2024-07-19 12:15:00'), Timestamp('2024-07-19 12:20:00'), Timestamp('2024-07-19 12:25:00'), Timestamp('2024-07-19 12:30:00'), Timestamp('2024-07-19 12:35:00'), Timestamp('2024-07-19 12:40:00'), Timestamp('2024-07-19 12:45:00'), Timestamp('2024-07-19 12:50:00'), Timestamp('2024-07-19 12:55:00'), Timestamp('2024-07-19 13:00:00'), Timestamp('2024-07-19 13:05:00'), Timestamp('2024-07-19 13:10:00'), Timestamp('2024-07-19 13:15:00'), Timestamp('2024-07-19 13:20:00'), Timestamp('2024-07-19 13:25:00'), Timestamp('2024-07-19 13:30:00'), Timestamp('2024-07-19 13:35:00'), Timestamp('2024-07-19 13:40:00'), Timestamp('2024-07-19 13:45:00'), Timestamp('2024-07-

In [24]:
# For Pulp, we can onlu use real number and no timestamps.... lets look at a solution here
def to_minutes_since_midnight(timestamp, d):
    # Get total minutes since midnight
    minutes = timestamp.hour * 60 + timestamp.minute
    # Add 1440 minutes (24 hours) for times past midnight
    rows = lineup_df[lineup_df['Day_Combined_ID'] == d].iloc[0]
    day = rows['Start Time'].day
    if timestamp.day > day:  # Adjust "18" based on the logical festival day
        minutes += 1440
    return minutes


In [25]:
T_ds = {
    key: [to_minutes_since_midnight(slot, key[1]) for slot in slots]
    for key, slots in T_ds.items()
}
# Print the resulting T_ds
print("T_ds:")
print(T_ds)

T_ds:
{('Atmosphere', 1): [720, 725, 730, 735, 740, 745, 750, 755, 760, 765, 770, 775, 780, 785, 790, 795, 800, 805, 810, 815, 820, 825, 830, 835, 840, 845, 850, 855, 860, 865, 870, 875, 880, 885, 890, 895, 900, 905, 910, 915, 920, 925, 930, 935, 940, 945, 950, 955, 960, 965, 970, 975, 980, 985, 990, 995, 1000, 1005, 1010, 1015, 1020, 1025, 1030, 1035, 1040, 1045, 1050, 1055, 1060, 1065, 1070, 1075, 1080, 1085, 1090, 1095, 1100, 1105, 1110, 1115, 1120, 1125, 1130, 1135, 1140, 1145, 1150, 1155, 1160, 1165, 1170, 1175, 1180, 1185, 1190, 1195, 1200, 1205, 1210, 1215, 1220, 1225, 1230, 1235, 1240, 1245, 1250, 1255, 1260, 1265, 1270, 1275, 1280, 1285, 1290, 1295, 1300, 1305, 1310, 1315, 1320, 1325, 1330, 1335, 1340, 1345, 1350, 1355, 1360, 1365, 1370, 1375, 1380, 1385, 1390, 1395, 1400, 1405, 1410, 1415, 1420, 1425, 1430, 1435, 1440, 1445, 1450, 1455, 1460, 1465, 1470, 1475, 1480, 1485, 1490, 1495], ('Atmosphere', 2): [720, 725, 730, 735, 740, 745, 750, 755, 760, 765, 770, 775, 780, 785, 79

####  <b style="color: #8B0000;">2.3.2 Set-specific parameters</b>

In [25]:
# * Bk is a decision variable, not a parameter.

In [26]:
# * Dk is a decision variable, not a parameter.

Parameter (d_k): Day of each set (artist-specific)
{'22heures30': [1], '3 Are Legend': [1], '5napback': [1, 2], '8Kays': [1], '999999999': [1], '>rthur Lewis ft. Gerben Tuerlinckx': [1], 'A Little Sound': [1], 'A Local Hero': [1], 'A-trak b2b The Magician': [1], 'ANNA': [1, 2], 'ARKADYAN': [1], 'ARTEN': [1], 'ATLiens': [1], 'Aaron Hibell': [1], 'Abena': [1], 'Acraze': [1], 'Adam Beyer': [1], 'Adam Sellouk': [1], 'AdamK': [1], 'Admess': [1], 'Adriatique': [1, 2], 'Afrojack': [1, 2], 'Afshin Momadi': [1], 'Agents Of Time': [1], 'Agents Of Time (Live)': [1], 'Airglo': [1], 'Alan Dixon': [1], 'Alesso': [1, 2], 'Alex Wann': [1], 'Alexander Merlin': [1], 'Alexander Popov b2b Re-Twin': [1], 'Alfred Beck': [1], 'Alibi': [1], 'Aline Rocha': [1], 'Alok': [1, 2], 'Aly & Fila': [1], 'Amber Broos': [1, 2], 'Amelie Lens': [1, 2], 'Amotik b2b Answer Code Request': [1], 'Amémé': [1], 'Andrei Stan': [1], 'Andromedik': [1, 2, 3], 'Angemi': [1], 'Ann Clue': [1], 'Annabel Stop It': [1], 'Anonymize': [1], 

In [27]:
# * ek Duration of set k of artist a
# Nested dictionary for Duration of set (e_k)
e_k = lineup_df.groupby('Artist').apply(
    lambda group: group.set_index('Set_ID')['Duration (minutes)'].to_dict()
).to_dict()
print("Parameter (e_k): Duration of each set in minutes (artist-specific)")
print(e_k)

Parameter (e_k): Duration of each set in minutes (artist-specific)
{'22heures30': {1: 60.0}, '3 Are Legend': {1: 60.0}, '5napback': {1: 45.0, 2: 60.0}, '8Kays': {1: 90.0}, '999999999': {1: 85.0}, '>rthur Lewis ft. Gerben Tuerlinckx': {1: 45.0}, 'A Little Sound': {1: 60.0}, 'A Local Hero': {1: 120.0}, 'A-trak b2b The Magician': {1: 60.0}, 'ANNA': {1: 60.0, 2: 120.0}, 'ARKADYAN': {1: 90.0}, 'ARTEN': {1: 60.0}, 'ATLiens': {1: 60.0}, 'Aaron Hibell': {1: 90.0}, 'Abena': {1: 60.0}, 'Acraze': {1: 60.0}, 'Adam Beyer': {1: 115.0}, 'Adam Sellouk': {1: 90.0}, 'AdamK': {1: 60.0}, 'Admess': {1: 30.0}, 'Adriatique': {1: 60.0, 2: 120.0}, 'Afrojack': {1: 60.0, 2: 60.0}, 'Afshin Momadi': {1: 90.0}, 'Agents Of Time': {1: 60.0}, 'Agents Of Time (Live)': {1: 90.0}, 'Airglo': {1: 60.0}, 'Alan Dixon': {1: 90.0}, 'Alesso': {1: 60.0, 2: 60.0}, 'Alex Wann': {1: 90.0}, 'Alexander Merlin': {1: 60.0}, 'Alexander Popov b2b Re-Twin': {1: 90.0}, 'Alfred Beck': {1: 60.0}, 'Alibi': {1: 60.0}, 'Aline Rocha': {1: 90.0},

  e_k = lineup_df.groupby('Artist').apply(


####  <b style="color: #8B0000;">2.3.3 Stage-specific parameters</b>

In [28]:
# * t open s, d
# Update t_open to use string days
t_open = {
    (row['Stage Name'], row['Day_Combined_ID']): row['Start Time']
    for _, row in stage_availability.iterrows()
}

# Print results
print("\n Parameter t_open_sd: Opening time of each stage on each day")
print(t_open)


 Parameter t_open_sd: Opening time of each stage on each day
{('Atmosphere', 1): Timestamp('2024-07-19 12:00:00'), ('Atmosphere', 2): Timestamp('2024-07-26 12:00:00'), ('Atmosphere', 3): Timestamp('2024-07-20 00:00:00'), ('Atmosphere', 4): Timestamp('2024-07-27 00:00:00'), ('Atmosphere', 5): Timestamp('2024-07-21 12:00:00'), ('Atmosphere', 6): Timestamp('2024-07-28 12:00:00'), ('Cage', 1): Timestamp('2024-07-19 12:00:00'), ('Cage', 2): Timestamp('2024-07-26 12:00:00'), ('Cage', 3): Timestamp('2024-07-20 12:00:00'), ('Cage', 4): Timestamp('2024-07-27 12:00:00'), ('Cage', 5): Timestamp('2024-07-21 13:00:00'), ('Cage', 6): Timestamp('2024-07-28 13:00:00'), ('Casa Corona', 1): Timestamp('2024-07-19 12:00:00'), ('Casa Corona', 2): Timestamp('2024-07-26 12:00:00'), ('Casa Corona', 3): Timestamp('2024-07-20 12:00:00'), ('Casa Corona', 4): Timestamp('2024-07-27 12:00:00'), ('Casa Corona', 5): Timestamp('2024-07-21 12:30:00'), ('Casa Corona', 6): Timestamp('2024-07-28 12:00:00'), ('Core', 1): 

In [29]:
# Convert to minutes, since pulp doesnt work with timestamps
t_open = {key: to_minutes_since_midnight(value, key[1]) for key, value in t_open.items()}
# Print the resulting T_ds
print("t_open:")
print(t_open)

t_open:
{('Atmosphere', 1): 720, ('Atmosphere', 2): 720, ('Atmosphere', 3): 0, ('Atmosphere', 4): 0, ('Atmosphere', 5): 720, ('Atmosphere', 6): 720, ('Cage', 1): 720, ('Cage', 2): 720, ('Cage', 3): 720, ('Cage', 4): 720, ('Cage', 5): 780, ('Cage', 6): 780, ('Casa Corona', 1): 720, ('Casa Corona', 2): 720, ('Casa Corona', 3): 720, ('Casa Corona', 4): 720, ('Casa Corona', 5): 750, ('Casa Corona', 6): 720, ('Core', 1): 750, ('Core', 2): 720, ('Core', 3): 0, ('Core', 4): 750, ('Core', 5): 780, ('Core', 6): 750, ('Crystal Garden', 1): 720, ('Crystal Garden', 2): 720, ('Crystal Garden', 3): 720, ('Crystal Garden', 4): 720, ('Crystal Garden', 5): 720, ('Crystal Garden', 6): 720, ('Elixir', 1): 780, ('Elixir', 2): 0, ('Elixir', 3): 780, ('Elixir', 4): 0, ('Elixir', 5): 780, ('Elixir', 6): 780, ('Freedom', 1): 720, ('Freedom', 2): 720, ('Freedom', 3): 720, ('Freedom', 4): 720, ('Freedom', 5): 720, ('Freedom', 6): 720, ('House of Fortune', 1): 780, ('House of Fortune', 2): 780, ('House of Fortun

In [30]:
# * t close s, d
# Create t_close as a flat dictionary
t_close = {
    (row['Stage Name'], row['Day_Combined_ID']): row['End Time']
    for _, row in stage_availability.iterrows()
}

# Print results
print("\n Parameter t_close_sd: Closing time of each stage on each day")
print(t_close)


 Parameter t_close_sd: Closing time of each stage on each day
{('Atmosphere', 1): Timestamp('2024-07-20 00:55:00'), ('Atmosphere', 2): Timestamp('2024-07-27 00:55:00'), ('Atmosphere', 3): Timestamp('2024-07-21 00:00:00'), ('Atmosphere', 4): Timestamp('2024-07-28 00:00:00'), ('Atmosphere', 5): Timestamp('2024-07-21 23:55:00'), ('Atmosphere', 6): Timestamp('2024-07-28 23:55:00'), ('Cage', 1): Timestamp('2024-07-19 23:00:00'), ('Cage', 2): Timestamp('2024-07-26 23:00:00'), ('Cage', 3): Timestamp('2024-07-20 23:00:00'), ('Cage', 4): Timestamp('2024-07-27 23:00:00'), ('Cage', 5): Timestamp('2024-07-21 23:00:00'), ('Cage', 6): Timestamp('2024-07-28 23:00:00'), ('Casa Corona', 1): Timestamp('2024-07-20 00:00:00'), ('Casa Corona', 2): Timestamp('2024-07-27 00:00:00'), ('Casa Corona', 3): Timestamp('2024-07-21 00:00:00'), ('Casa Corona', 4): Timestamp('2024-07-28 00:00:00'), ('Casa Corona', 5): Timestamp('2024-07-21 23:00:00'), ('Casa Corona', 6): Timestamp('2024-07-28 23:00:00'), ('Core', 1):

In [31]:
# Convert to minutes, since pulp doesn't work with timestamps
t_close = {key: to_minutes_since_midnight(value, key[1]) for key, value in t_close.items()}
# Print the resulting T_ds
print("t_close:")
print(t_close)

t_close:
{('Atmosphere', 1): 1495, ('Atmosphere', 2): 1495, ('Atmosphere', 3): 1440, ('Atmosphere', 4): 1440, ('Atmosphere', 5): 1435, ('Atmosphere', 6): 1435, ('Cage', 1): 1380, ('Cage', 2): 1380, ('Cage', 3): 1380, ('Cage', 4): 1380, ('Cage', 5): 1380, ('Cage', 6): 1380, ('Casa Corona', 1): 1440, ('Casa Corona', 2): 1440, ('Casa Corona', 3): 1440, ('Casa Corona', 4): 1440, ('Casa Corona', 5): 1380, ('Casa Corona', 6): 1380, ('Core', 1): 1500, ('Core', 2): 1500, ('Core', 3): 1440, ('Core', 4): 1500, ('Core', 5): 1440, ('Core', 6): 1440, ('Crystal Garden', 1): 1470, ('Crystal Garden', 2): 1470, ('Crystal Garden', 3): 1470, ('Crystal Garden', 4): 1470, ('Crystal Garden', 5): 1410, ('Crystal Garden', 6): 1410, ('Elixir', 1): 1500, ('Elixir', 2): 1440, ('Elixir', 3): 1500, ('Elixir', 4): 1440, ('Elixir', 5): 1440, ('Elixir', 6): 1440, ('Freedom', 1): 1470, ('Freedom', 2): 1470, ('Freedom', 3): 1440, ('Freedom', 4): 1470, ('Freedom', 5): 1410, ('Freedom', 6): 1410, ('House of Fortune', 1):

In [32]:
# * c
c = 45  # Clean-up time in minutes. Constant for all stages.
print(f"\nClean-up Time (c): {c} minutes")


Clean-up Time (c): 45 minutes


####  <b style="color: #8B0000;">2.3.4 Artist-specific parameters</b>

In [33]:
# * r
r = 120  # Default rest time for all artists
print(f"\nRest Time (r): {r} minutes")


Rest Time (r): 120 minutes


In [34]:
# * max_sets
max_sets = 2  # Default for all artists
print(f"\nMaximum Sets Per Day (max_sets): {max_sets}")


Maximum Sets Per Day (max_sets): 2


####  <b style="color: #8B0000;">2.3.5 Popularity parameters</b>

In [35]:
# * Popularity score for artists
print("Parameter p_a: Popularity score for each artist")
print(p_a)

Parameter p_a: Popularity score for each artist
{'DJ Mars': 50, 'Bosart': 0, 'Dietro': 19, 'Voltage': 49, 'Elfigo': 27, '>rthur Lewis ft. Gerben Tuerlinckx': 50, 'Nina Black': 9, 'Michael Amani': 24, 'The Oddword': 9, 'Andromedik': 54, 'Mandy': 65, 'Surprise': 72, 'Unsyn': 26, 'Rayzen': 43, 'K1 & Aradia': 11, 'Hyperverb': 44, 'Footworxx Militant Crew': 18, 'Sakyra': 30, 'HYSTA': 44, 'Vandal': 39, 'Billx': 51, 'Reflexx': 37, 'Unicorn on K': 48, 'Juwlz': 0, 'Maxim Lany': 41, 'Rose Ringed': 33, 'Agents Of Time (Live)': 53, 'Mau P': 54, 'Mathame': 53, 'Don Diablo': 68, 'Hardwell': 69, 'Hawkeyes': 0, 'Yamo': 3, 'Dave Lambert': 6, 'Makasi': 15, 'DJs From Mars': 50, 'Nicky Romero': 63, 'Seth Hills': 48, 'Chico Rose': 58, 'Henri PFR': 48, 'Kayu': 64, 'Meera': 37, 'Samm & Ajna': 0, 'DBN Gogo': 52, 'Alan Dixon': 52, 'WhoMadeWho': 59, 'Keinemusik (&ME, Adam Port, Rampa)': 77, 'Q DANCE End show': 50, 'Dolores': 50, 'Reygel': 8, 'Whisnu Santika': 62, 'Kris Kross Amsterdam': 63, 'Trinix': 68, 'B Jon

In [36]:
# * Popularity score for stages
print("Parameter p_s: Popularity score for each stage")
print(p_s)

Parameter p_s: Popularity score for each stage
{'Atmosphere': 48.507462686567166, 'Cage': 44.703899855560906, 'Casa Corona': 36.20606644198363, 'Core': 39.64853153586904, 'Crystal Garden': 54.429465575349056, 'Elixir': 46.58160808858931, 'Freedom': 50.60182956186809, 'House of Fortune': 54.21280693307655, 'Library': 100.0, 'Mainstage': 89.02262879152623, 'Moosebar': 17.958594126143478, 'Planaxis': 52.118440057775636, 'Rave Cave': 40.25036109773712, 'Rise': 46.846413095811265, 'Rose Garden': 68.72893596533461, 'The Gathering': 27.924891670678864}


In [37]:
# * Popularity score for days
print("Parameter p_d: Popularity score for each day")
print(p_d)

Parameter p_d: Popularity score for each day
{1: 92.85837205300292, 2: 92.29048356565134, 3: 93.11650318361728, 4: 92.90999827912579, 5: 100.0, 6: 93.4262605403545, 7: 11.082429874376183, 8: 8.879710893133712}


-------

## 3. Model

Let's first define our **decision variable**. This cell creates binary decision variables for the optimization model, where each variable represents whether a specific artist performs on a particular stage, on a specific day, and at a specific time slot. The `LpVariable.dicts` function generates these variables in the form of a dictionary, indexed by the 5-tuple `(a, k, s, d, t)`. Here:
- `a` belongs to the set of artists, `A`, which lists all artists scheduled to perform.
- `k` is belongs to set of sets of an artist, `K_a` is the set of sets (performances) that an artist a can perform. It accounts for multiple sets that each artist may have.
- `s` belongs to the set of stages, `S`, where performances take place. `d` belongs to the set of days, `D`, representing the festival days. To ensure feasibility, the code iterates over only the valid stage-day pairs `(s, d)` from `T_ds.keys()`. This is because not all stages are open on all days.
- `t` is taken from the set of valid time slots, `T_ds[(s, d)]`, which maps stage-day combinations `(s, d)` to their available starting times for performances.

In [38]:
# * Decision variable
x = pulp.LpVariable.dicts(
    "x",
    (
        (a, k, s, d, t)
        for a in A
        for k in K_a[a]
        for (s, d) in T_ds.keys()
        for t in T_ds[(s, d)]
    ),
    cat="Binary",
)

The resulting dictionary x contains binary decision variables, constrained to take values of 0 or 1. A value of 1 indicates that the artist `a` performs set `k` on stage `s`, on day `d`, starting at time `t`. For example, `x[('Artist1', 1, 'Atmosphere', 2, '12:00')]` represents whether "Artist1" performs his first set on the "Atmosphere" stage on day with id 2 starting at 12:00.

In [39]:
print(len(x)) # There are 11 119 200 scheduling possibilities :o
x

11119200


{('DJ Mars', 1, 'Atmosphere', 1, 720): x_('DJ_Mars',_1,_'Atmosphere',_1,_720),
 ('DJ Mars', 1, 'Atmosphere', 1, 725): x_('DJ_Mars',_1,_'Atmosphere',_1,_725),
 ('DJ Mars', 1, 'Atmosphere', 1, 730): x_('DJ_Mars',_1,_'Atmosphere',_1,_730),
 ('DJ Mars', 1, 'Atmosphere', 1, 735): x_('DJ_Mars',_1,_'Atmosphere',_1,_735),
 ('DJ Mars', 1, 'Atmosphere', 1, 740): x_('DJ_Mars',_1,_'Atmosphere',_1,_740),
 ('DJ Mars', 1, 'Atmosphere', 1, 745): x_('DJ_Mars',_1,_'Atmosphere',_1,_745),
 ('DJ Mars', 1, 'Atmosphere', 1, 750): x_('DJ_Mars',_1,_'Atmosphere',_1,_750),
 ('DJ Mars', 1, 'Atmosphere', 1, 755): x_('DJ_Mars',_1,_'Atmosphere',_1,_755),
 ('DJ Mars', 1, 'Atmosphere', 1, 760): x_('DJ_Mars',_1,_'Atmosphere',_1,_760),
 ('DJ Mars', 1, 'Atmosphere', 1, 765): x_('DJ_Mars',_1,_'Atmosphere',_1,_765),
 ('DJ Mars', 1, 'Atmosphere', 1, 770): x_('DJ_Mars',_1,_'Atmosphere',_1,_770),
 ('DJ Mars', 1, 'Atmosphere', 1, 775): x_('DJ_Mars',_1,_'Atmosphere',_1,_775),
 ('DJ Mars', 1, 'Atmosphere', 1, 780): x_('DJ_Mars',

Secondly, our **decision function** that maximizes statisfaction is defined. This code defines the optimization problem "Tomorrowland_Scheduling" as a linear programming model using the PuLP library, where the goal is to maximize satisfaction.
- The objective function sums the weighted satisfaction scores for all possible combinations of artists, their sets, stages, days, and time slots. Specifically, the satisfaction score for each combination is calculated as the sum of three components: `p_a[a]` (artist-specific popularity), `p_s[s]` (stage-specific popularity), and `p_d[d]` (day-specific popularity). These values are then multiplied by the binary decision variable `x[(a, k, s, d, t)]`, which indicates whether artist a performs set k on stage s, on day d, at time t.
- The use of lpSum ensures that all valid combinations of indices `(a, k, s, d, t)` — as defined by the sets `A` (artists), `K_a[a]` (sets per artist), `T_ds.keys()` (valid stage-day pairs), and `T_ds[(s, d)]` (time slots for each stage-day pair) — are included in the summation.
- Finally, the problem is labeled "Maximize_Satisfaction" to clarify its objective within the model.

In [40]:
# * Decision Function
# Define the problem
problem = pulp.LpProblem("Tomorrowland_Scheduling", pulp.LpMaximize)

# Objective function
# TODO Weights?
# TODO Popularity of timeslots?
problem += pulp.lpSum(
    [
        (p_a[a] + p_s[s] + p_d[d]) * x[(a, k, s, d, t)]
        for a in A
        for k in K_a[a]
        for (s, d) in T_ds.keys()
        for t in T_ds[(s, d)]
    ]
), "Maximize_Satisfaction"
print("Finished defining the decision function.")

Finished defining the decision function.


Lastly, we define the constraints.

In [None]:
# * Define b_k per artist
b = { (a, k): pulp.LpVariable(f"b_{a}_{k}", cat="Continuous") for a in A for k in K_a[a] }
# Ensure b[a,k] aligns with the time slot t where x is 1 for the artist a and their set k.
for a in A:
    for k in K_a[a]:
        problem += (
            b[a, k] == pulp.lpSum(t * x[(a, k, s, d, t)] for s, d in T_ds.keys() for t in T_ds[(s, d)]),
            f"Link_bk_to_x_{a}_{k}"
        )

In [None]:
# * Constraint: Each set is scheduled exactly one
for a in A:
    for k in K_a[a]:
        problem += (
            pulp.lpSum(x[(a, k, s, d, t)] for s, d in T_ds.keys() for t in T_ds[(s, d)]) == 1,
            f"Unique_Assignment_{a}_{k}"
        )

In [None]:
# * Constraint: Each set is scheduled within stage availability
for a in A:
    for k in K_a[a]:
        for (s, d) in T_ds.keys():
            problem += (
                b[a, k] >= t_open[(s, d)],
                f"Stage_Open_Time_{a}_{k}_{s}_{d}"
            )
            problem += (
                b[a, k] + e_k[k] <= t_close[(s, d)],
                f"Stage_Close_Time_{a}_{k}_{s}_{d}"
            )

The big-M approach introduces a binary decision variable $y$ for each pair $(k,k′)$. This binary variable $y$ ensures that one of the two conditions holds:
- If $y=1$, enforce Condition 1 ($k$ finishes before $k'$ starts). Condition 1: $b[a,k] + e[k] + c <= b[a', k'] + M * (1-y)$
    - If $y=1$, the term $M * (1-y)$ becomes 0, and the constraint ensures that $k$ finishes before $k'$ starts.
    - If $y=0$, the first constraint becomes non-binding because the constraint $b[a,k] + e[k] + c <= b[a', k'] + M * (1-y)$ always holds for a sufficiently large M.
- If $y=0$, enforce Condition 2 ($k'$ finishes before $k$ starts). Condition 2: $b[a',k'] + e[k'] + c <= b[a, k] + M * (1-y)$
    - If $y=1$, the second constraint becomes non-binding because the constraint  $b[a',k'] + e[k'] + c <= b[a, k] + M * (1-y)$ always holds for a sufficiently large M.
    - If $y=0$,  the term $M * (1-y)$ becomes 0, and the constraint ensures that $k'$ finishes before $k$ starts.



In [41]:
# * Constraint: No 2 sets on the same stage are scheduled at the same time + clean up time
M = 2000
for (s, d) in T_ds.keys():
    for a in A:
        for k in K_a[a]:
            for a_prime in A:
                for k_prime in K_a[a_prime]:
                    if (a, k) != (a_prime, k_prime):
                        # Binary variable for either-or condition
                        y = pulp.LpVariable(f"y_{a}_{k}_{a_prime}_{k_prime}_{s}_{d}", cat="Binary")

                        # Condition 1: k finishes before k accent
                        problem += (
                            b[a, k] + e_k[k] + c <= b[a_prime, k_prime] + M * (1 - y),
                            f"NonOverlap_Condition1_{a}_{k}_{a_prime}_{k_prime}_{s}_{d}"
                        )

                        # Condition 2: k accent finishes before k
                        problem += (
                            b[a_prime, k_prime] + e_k[k_prime] + c <= b[a, k] + M * y,
                            f"NonOverlap_Condition2_{a}_{k}_{a_prime}_{k_prime}_{s}_{d}"
                        )

Starting defining the constraints.


In [None]:
print("hey")

-------


## 4. Results

In [None]:
# Solve the problem
# solver = pulp.PULP_CBC_CMD()
# problem.solve(solver)

In [None]:
# Output the results
# print("Status:", pulp.LpStatus[problem.status])
# for v in problem.variables():
#     if v.varValue > 0:
#         print(v.name, "=", v.varValue)
#
# print("Objective Value:", pulp.value(problem.objective))