# *Music Scheduling using Evolutionary Algorithms*
This notebook implements a music scheduling algorithm that generates playlists of an exact length (e.g. 60 minutes) while respecting constraints on the category of each playlist element. Radio stations could use it to backtime a playlist so that it ends at a specific point in time.

## General
The algorithm requires two input files:
- playlistelements.csv
- playliststructure.csv

### playlistelements.csv
The playlistelements.csv contains the elements that can be placed into the playlist. For each element it stores the artist, the song title, the length in seconds and the category (genre). The row position of an element serves as its unique identifier.

### playliststructure.csv
The playliststructure.csv describes the desired playlist structure, i.e. which category of element has to be played at each position of the playlist. The category 'ANY' allows any element.
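
To make the two file formats concrete, here is a minimal sketch that parses small in-memory samples. The column names match the ones used later in this notebook; the sample rows themselves are hypothetical and only serve as an illustration.

```python
import csv
import io

# Hypothetical sample rows; the real files are generated later in this notebook.
ELEMENTS_CSV = """artist,song,duration,genre
Artist A,Song One,215,rock
Artist B,Song Two,187,soul
Artist C,Song Three,242,
"""

STRUCTURE_CSV = """element_categorie
ANY
soul
rock
"""

elements = list(csv.DictReader(io.StringIO(ELEMENTS_CSV)))
structure = [row['element_categorie']
             for row in csv.DictReader(io.StringIO(STRUCTURE_CSV))]

# An element is identified by its row index; durations are whole seconds.
durations = [int(row['duration']) for row in elements]
print(durations)
print(structure)
```

The third sample row has an empty genre, so it could only be placed into an 'ANY' slot.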

# _Data Generation_

## Retrieve playlistelements.csv
For now, we retrieve the playlist elements from the [CORGIS Dataset Project](https://think.cs.vt.edu/corgis/csv/music/music.html). We download the original file and create a new CSV file that contains only the necessary data.

### Download File

In [1]:
# Imports
import urllib
import urllib.request

# Download file
urllib.request.urlretrieve('https://think.cs.vt.edu/corgis/csv/music/music.csv?forcedownload=1', 'playlistelementstmp.csv')

('playlistelementstmp.csv', <http.client.HTTPMessage at 0x1c3a9039588>)

### Clean File / Create cleaned File

In [2]:
# Imports
import csv

# Genre list, later used for playliststructure.csv
genres = []

def getgenre(tags):
    # Use the first word of the artist tag as the genre, so a tag
    # such as 'hip hop' would become 'hip'. An empty tag yields ''.
    return tags.split(' ')[0]

fieldnames = ['artist', 'song', 'duration', 'genre']
with open('playlistelementstmp.csv', newline='') as readcsv, open('playlistelements.csv', 'w', newline='') as writecsv:
    csvreader = csv.DictReader(readcsv, delimiter=',')
    csvwriter = csv.DictWriter(writecsv, delimiter=',', fieldnames=fieldnames)
    csvwriter.writeheader()
    # Count rows from 1 so that the final value is the number of rows written
    rowcount = 0
    for rowcount, row in enumerate(csvreader, start=1):
        genre = getgenre(row['artist_mbtags'])
        csvwriter.writerow({'artist': row['artist.name'],
                            'song': row['title'],
                            'duration': round(float(row['duration'])),
                            'genre': genre})
        if len(genre) > 0:
            genres.append(genre)
    print('Transformed and wrote', rowcount, 'rows and found', len(set(genres)), 'genres')

Transformed and wrote 10000 rows and found 241 genres


## Create a playliststructure.csv

In [3]:
# Imports
import random

# Set seed for reproducibility
random.seed(1)

# Number of elements
NO_ELEMENTS = 20
# Probability of 'any' element (== Any Element can be played)
PROB_ANY = 0.45

fieldnames = ['element_categorie']
with open('playliststructure.csv', 'w', newline='') as writecsv:
    csvwriter = csv.DictWriter(writecsv, delimiter=',', fieldnames=fieldnames)
    csvwriter.writeheader()
    for i in range(NO_ELEMENTS):
        if random.random() <= PROB_ANY:
            element_categorie = 'ANY'
        else:
            element_categorie = genres[random.randrange(0, len(genres))]
        csvwriter.writerow({'element_categorie': element_categorie})

# _Playlist Generation_

Now this is where things get interesting: we are going to create a playlist with an evolutionary algorithm. First, we load the playlist elements and the playlist structure.

## Read CSV Files

In [4]:
# Imports

import csv
import random

playlist_structure = []
playlist_elements = []
playlist_categories = dict()

with open('playliststructure.csv', 'r', newline='') as readstructure:
    structurereader = csv.DictReader(readstructure, delimiter=',')
    for r in structurereader:
        playlist_structure.append(r['element_categorie'])
        
with open('playlistelements.csv', 'r', newline='') as readelements:
    elementsreader = csv.DictReader(readelements, delimiter=',')
    for i, r in enumerate(elementsreader):
        if 'genre' in r:
            genrelist = playlist_categories.get(r['genre'], [])
            genrelist.append(i)
            playlist_categories[r['genre']] = genrelist
        playlist_elements.append(r)
print(playlist_structure)

['ANY', 'soul', 'ANY', 'ANY', 'american', 'rock', 'ANY', 'british', 'american', 'uk', 'ANY', 'ANY', 'alternative', 'hip', 'classic', 'ANY', 'soul', 'hip', 'jazz', 'dancehall']
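
The genre index built above maps each genre to the row indices of its elements. As a hedged alternative sketch, the same index can be built with `collections.defaultdict`, followed by a check that every category in the structure has at least one candidate. The rows and the names `index_by_genre` and `missing` are hypothetical and not part of the notebook.

```python
from collections import defaultdict

# Hypothetical rows standing in for the parsed playlistelements.csv
rows = [
    {'artist': 'Artist A', 'song': 'Song One', 'duration': '215', 'genre': 'rock'},
    {'artist': 'Artist B', 'song': 'Song Two', 'duration': '187', 'genre': 'soul'},
    {'artist': 'Artist C', 'song': 'Song Three', 'duration': '242', 'genre': 'rock'},
]

# Map each genre to the row indices of its elements
index_by_genre = defaultdict(list)
for i, row in enumerate(rows):
    index_by_genre[row['genre']].append(i)

# Every category named in the structure needs at least one candidate
structure = ['ANY', 'rock', 'soul']
missing = [c for c in structure if c != 'ANY' and not index_by_genre.get(c)]
print(dict(index_by_genre))
print(missing)
```

A non-empty `missing` list would mean that no playlist can satisfy the structure, which is worth checking before the search starts.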


## Define Evolutionary Algorithms
**General**
This is a subset sum problem: we want to find a subset of the songs whose durations sum to exactly 60 minutes. According to Wikipedia, this problem is [NP-complete](https://en.wikipedia.org/wiki/Subset_sum_problem).
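
To illustrate the underlying problem on a small scale, the following sketch solves plain subset sum exactly with dynamic programming. It is not the approach used in this notebook and it ignores the category constraints; the function name `subset_with_sum` and the durations are hypothetical.

```python
def subset_with_sum(durations, target):
    """Return indices of a subset of `durations` summing to `target`, or None."""
    # reachable[s] holds the indices of one subset that sums to s
    reachable = {0: []}
    for i, d in enumerate(durations):
        # Iterate over a snapshot so each element is used at most once
        for s, subset in list(reachable.items()):
            new_sum = s + d
            if new_sum <= target and new_sum not in reachable:
                reachable[new_sum] = subset + [i]
    return reachable.get(target)

durations = [215, 187, 242, 301, 198]   # hypothetical song lengths in seconds
result = subset_with_sum(durations, 600)
print(result)
```

The number of reachable sums is bounded by the target, so this works well for short lists and small targets. With thousands of songs and per-slot category constraints, a heuristic search such as an evolutionary algorithm is a reasonable alternative.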

### Define Helper Functions


In [5]:
# Mutation replaces each item of the solution, with the given probability,
# by another element that fits the category of its slot
def mutation(solution, probability=0.05):
    for i in range(len(solution)):
        if random.random() <= probability:
            # Draw candidates until one is found that is not yet in the playlist
            while True:
                if playlist_structure[i] == 'ANY':
                    v = random.randrange(0, len(playlist_elements))
                else:
                    glist = playlist_categories[playlist_structure[i]]
                    v = glist[random.randrange(0, len(glist))]
                if v not in solution:
                    break
            solution[i] = v
    # The solution is modified in place and returned for convenience
    return solution

def crossover(solution0, solution1):
    # The child starts with the first half of solution0.
    # The remaining slots are filled with elements of solution1 that are
    # not yet in the child and match the category of the next open slot.
    # If a full pass over solution1 adds nothing, i.e. solution1 has no
    # unused element that satisfies the category restrictions, the
    # remaining slots are copied from solution0 so that the restrictions
    # are not violated.
    cutpoint = len(solution0) // 2
    child = solution0[0:cutpoint]
    changed = True
    while len(child) != len(solution0):
        if not changed:
            child.append(solution0[len(child)])
            continue
        changed = False
        for x in solution1:
            # Stop once the child is complete; otherwise the lookup
            # below would run past the end of playlist_structure
            if len(child) == len(solution0):
                break
            if x in child:
                continue
            # Category required at the next open slot
            req_type = playlist_structure[len(child)]
            if req_type == 'ANY' or x in playlist_categories[req_type]:
                changed = True
                child.append(x)
    return child

def generatechild():
    solution = []
    for i in range(len(playlist_structure)):
        while True:
            if playlist_structure[i] == 'ANY':
                v = random.randrange(0, len(playlist_elements))
            else:
                glist = playlist_categories[playlist_structure[i]]
                v = glist[random.randrange(0, len(glist))]
            if not(v in solution):
                break
        solution.append(v)
    return solution

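def evaluate(solution, maxlength):
    # Fitness: absolute difference in seconds between the target length
    # and the total duration of the playlist (0 means an exact fit)
    playlist_length = sum(int(playlist_elements[p]['duration']) for p in solution)
    return abs(maxlength - playlist_length)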
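
The helper functions above rely on two properties of a solution: every slot holds an element of the required category, and no element appears twice. The following self-contained sketch checks both on toy data and computes the fitness of one candidate. The names `respects_structure`, `fitness`, `toy_structure` and `toy_elements` are hypothetical and not part of the notebook.

```python
# Hypothetical toy data in the same shape as the notebook's globals
toy_structure = ['ANY', 'rock', 'soul']
toy_elements = [
    {'duration': '215', 'genre': 'rock'},
    {'duration': '187', 'genre': 'soul'},
    {'duration': '242', 'genre': 'rock'},
    {'duration': '301', 'genre': 'jazz'},
]

def respects_structure(solution, structure, elements):
    """Check slot categories and uniqueness of a candidate playlist."""
    if len(solution) != len(structure) or len(set(solution)) != len(solution):
        return False
    return all(cat == 'ANY' or elements[idx]['genre'] == cat
               for idx, cat in zip(solution, structure))

def fitness(solution, elements, target):
    """Absolute deviation from the target length in seconds (0 is perfect)."""
    return abs(target - sum(int(elements[idx]['duration']) for idx in solution))

candidate = [3, 2, 1]   # jazz in the ANY slot, then rock, then soul
print(respects_structure(candidate, toy_structure, toy_elements))
print(fitness(candidate, toy_elements, 720))
```

A check like this could be run on the children produced by crossover and mutation as a sanity test during development.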

### Evolutionary Algorithm

In [6]:
MAX_SUM = 60 * 60 # Target length: 60 minutes * 60 seconds = 3600 seconds
GEN_SIZE = 100 # Number of solutions kept per generation
MAX_EVAL = 100 # Maximum number of generations

#imports
import time

t0 = time.time()

solutions = [generatechild() for i in range(GEN_SIZE)]
solution_evaluation = [evaluate(x, MAX_SUM) for x in solutions]
sorted_indices = sorted(range(len(solution_evaluation)),key=lambda x:solution_evaluation[x])

for iteration in range(MAX_EVAL):
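    for i in range(GEN_SIZE):
        # For now: just take parents from the front of the population. --> Maybe change later!
        # solutions is sorted by fitness at the end of each generation (the
        # initial population is still unsorted in generation 0). Because
        # len(solutions) grows as children are appended, the selection
        # window widens from the first 10 to the first 19 solutions.
        # This differs strongly from general EAs, but for this task it is suitable.
        parent1 = solutions[random.randrange(0, int(len(solutions) * .1))]
        parent2 = solutions[random.randrange(0, int(len(solutions) * .1))]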
        co_child = crossover(parent1[:], parent2[:])
        co_child = mutation(co_child, 0.1)
        solutions.append(co_child)

    solution_evaluation = [evaluate(x, MAX_SUM) for x in solutions]
    sorted_indices = sorted(range(len(solution_evaluation)),key=lambda x:solution_evaluation[x])
    #if iteration % 10 == 0:
    print('Generation', iteration, '-', 'Best Value -', solution_evaluation[sorted_indices[0]])
    solutions = [solutions[x] for x in sorted_indices[0:GEN_SIZE]]
    if solution_evaluation[sorted_indices[0]] == 0:
        break
t1 = time.time()
print()
#print('Best Solution:', solutions[0])
print('Best Solution:')
for i,x in enumerate(solutions[0]):
    print((i+1), '--', playlist_elements[x]['artist'], '--', playlist_elements[x]['song'] )
print('Playlist-Length:', sum(int(playlist_elements[x]['duration']) for x in solutions[0]), 'seconds.')
print('Generation-Duration:', t1-t0, 'seconds.')


Generation 0 - Best Value - 213
Generation 1 - Best Value - 9
Generation 2 - Best Value - 9
Generation 3 - Best Value - 1
Generation 4 - Best Value - 0

Best Solution:
1 -- The Game feat. Jim Jones Camron & Bezell -- Certified Gangstas (featuring Jim Jones Camron & Bezell)
2 -- Redman -- Jam 4 U
3 -- Kim English -- Love That Jazz (Basements Boys Album Vers)
4 -- Sal Mineo -- Start Moving
5 -- Marty Robbins -- DEU470956112
6 -- MC5 -- Intro 2/ Kick Out The Jams (LP Version)
7 -- Carl Smith -- Loose Talk
8 -- David Bowie -- Baal's Hymn
9 -- Minnie Riperton -- Completeness
10 -- The Clash -- I'm So Bored With The U.S.A.
11 -- Nana Caymmi -- Palavras
12 -- Boyz II Men -- In My Life
13 -- Phantom Planet -- The Meantime
14 -- Dead Prez -- Be Healthy
15 -- Jackie Wilson -- One Moment With You
16 -- React -- Fabrication
17 -- Ayo -- Watching You
18 -- Toni Braxton -- Let Me Show You The Way (Out)
19 -- Stevie Ray Vaughan And Double Trouble -- Mary Had A Little Lamb
20 -- Capleton -- Raggy Road
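
The parent selection above only draws from the front of the sorted population, and the code comment suggests that this might change later. One common alternative is tournament selection, sketched below under the assumption that lower scores are better, as with `evaluate`. The function `tournament_select` and the toy population are hypothetical and not part of the notebook.

```python
import random

def tournament_select(population, scores, k=3, rng=random):
    """Return the best of k randomly sampled individuals (lower score wins)."""
    contenders = rng.sample(range(len(population)), k)
    winner = min(contenders, key=lambda idx: scores[idx])
    return population[winner]

# Hypothetical population: each individual is a list of element indices,
# each score is its deviation from the target length in seconds.
population = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
scores = [40, 5, 0, 120]

rng = random.Random(1)
parent1 = tournament_select(population, scores, k=2, rng=rng)
parent2 = tournament_select(population, scores, k=2, rng=rng)
print(parent1, parent2)
```

A larger `k` increases selection pressure. With `k` equal to the population size, the best individual is always chosen, while small values keep more diversity in the population.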