# A program to generate statistics on Absolute Radio

The program monitors the weekday show (9am to 6pm) on Absolute Radio and will:

1. generate statistics on the number of songs played
2. (in future) keep a tally of genres so that we can make some interesting plots in Plotly

A few things still need to be implemented:

1. Getting data to Plotly
2. Running program only on specific days of the week
3. Genres
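
For the second item, restricting the program to specific days, a minimal sketch is shown here. It assumes the schedule from the description above (Monday to Friday, 9am to 6pm); the names `SHOW_DAYS`, `SHOW_START`, `SHOW_END` and `is_show_time` are hypothetical and do not appear elsewhere in this notebook.

```python
from datetime import datetime, time as dt_time

# Assumed schedule, taken from the description above: Monday to Friday, 9am to 6pm.
SHOW_DAYS = {0, 1, 2, 3, 4}   # datetime.weekday(): Monday is 0, Sunday is 6
SHOW_START = dt_time(9, 0)
SHOW_END = dt_time(18, 0)


def is_show_time(now):
    """Return True if `now` falls inside the monitored weekday show."""
    return now.weekday() in SHOW_DAYS and SHOW_START <= now.time() < SHOW_END


print(is_show_time(datetime(2017, 5, 15, 10, 30)))  # a Monday morning
print(is_show_time(datetime(2017, 5, 13, 10, 30)))  # a Saturday morning
```

The polling loop could call a check like this before each scrape and skip the work when it returns `False`.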

In [71]:
'''
Created on 15 Jan 2017

@author: afunn
'''
print("Absolute Radio - Did they repeat?")

Absolute Radio - Did they repeat?


### Import the required libraries

In [72]:
from lxml import html  # for scraping web information
import requests
from threading import Timer  # for timing
from collections import Counter  # for counting list
import time
import plotly
from datetime import datetime
import pandas as pd # for dataframes used for plotly
from random import randint # for random numbers for testing
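# The API key is read from an environment variable instead of being hardcoded in the notebook.
# PLOTLY_API_KEY is an assumed variable name; set it before running this cell.
import os
plotly.tools.set_credentials_file(username='benisme', api_key=os.environ['PLOTLY_API_KEY'])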
plotly.__version__

'2.0.8'

### Definitions
#### Create the global variables that are used throughout the program
Using global variables is generally not best practice, but the timer loop needs state that persists between calls, and globals are the easiest way to provide it.

#### Other Definitions
We use a number of lists to keep track of what's happening with the songs and counts

In [73]:
# Global variables (bad practice, but it's easy).
# Names assigned at module level are already global, so no `global` statement is needed here.
CurrentRepeatList = []
CurrentRepeatListLength = 0
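
One possible way around the globals, sketched here only as an alternative and not as what this notebook does, is to keep the state in a small object that is passed to whichever function needs it. `RepeatState` and `record_repeat` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatState:
    """Hypothetical container for the values held in the two globals."""
    repeats: list = field(default_factory=list)

    @property
    def repeat_count(self):
        # Derived from the list, so it cannot drift out of sync with it.
        return len(self.repeats)


def record_repeat(state, song):
    """Add a repeated song to the shared state object and return the new count."""
    state.repeats.append(song)
    return state.repeat_count


state = RepeatState()
record_repeat(state, "Africa")
print(state.repeat_count)
```

Because the count is computed from the list, there is no separate length variable to keep up to date.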

### Here we write the code to scrape the data from the Absolute Radio webpage

This is needed at the start of the program so that its starting state is defined.
We place the data into three lists: *artists*, *songs* and *times*.

In [74]:
# Get the data from absolute radio

page = requests.get('https://absoluteradio.co.uk/absolute-radio/music/')
tree = html.fromstring(page.content)

# This will create a list of artists
artists = tree.xpath('//p[@class="song-artist"]/text()')
# This will create a list of songs
songs = tree.xpath('//p[@class="song-title"]/a/text()')
# This will create a list of the times of the songs
times = tree.xpath('//div[@class="song-inner"]/time/text()')

We then print these to the console to check that the scrape worked

In [75]:
print(artists)
print(songs)
print(times)

['Soundgarden', 'Starsailor', 'Tears For Fears', 'Cast', 'The Killers', 'Smash Mouth', 'Keane', 'The Cars', 'Toto', 'Puddle Of Mudd', 'Shed Seven', 'The Pretenders', 'George Ezra']
['Black Hole Sun', 'Goodsouls', 'Everybody Wants To Rule The World', 'Beat Mama', 'Human', 'All Star', 'This Is The Last Time', 'Just What I Needed', 'Africa', 'Blurry', 'Going For Gold', 'Stop Your Sobbing', 'Budapest']
['9.27pm', '9.23pm', '9.16pm', '9.16pm', '9.12pm', '9.08pm', '9.05pm', '8.59pm', '8.54pm', '8.48pm', '8.43pm', '8.41pm', '8.36pm']
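
The three lists are parallel: entry *i* of each list is assumed to describe the same song. Since each list comes from its own XPath query, a page element that matches one query but not another would shift the lists out of step. A minimal sketch of a length check that also combines the lists into one record per song is given below; `Play` and `to_plays` are hypothetical names.

```python
from collections import namedtuple

Play = namedtuple("Play", ["artist", "song", "time"])  # hypothetical record type


def to_plays(artists, songs, times):
    """Combine the parallel lists into one record per song, newest first."""
    if not (len(artists) == len(songs) == len(times)):
        raise ValueError("scraped lists have different lengths")
    return [Play(a, s, t) for a, s, t in zip(artists, songs, times)]


plays = to_plays(
    ["Soundgarden", "Starsailor"],
    ["Black Hole Sun", "Goodsouls"],
    ["9.27pm", "9.23pm"],
)
print(plays[0].artist, "-", plays[0].song, "at", plays[0].time)
```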


Here we convert the scraped *times* to a standard format that includes today's date

In [76]:
 
print(times) # Troubleshooting code

def timeconversion_and_committ(times_list):
    """Convert times such as '9.27pm' to 'YYYY-MM-DD HH:MM AM/PM' in place, using today's date."""
    # plus_string = '2009-11-29'
    plus_string = datetime.today().strftime("%Y-%m-%d")
    newformat = '%Y-%m-%d %I.%M%p'   # scraped time with the date prepended
    format1 = '%Y-%m-%d %I:%M %p'    # standard format used by the rest of the program
    for x, time_string in enumerate(times_list):
        total_string = plus_string + " " + time_string
        my_date = datetime.strptime(total_string, newformat)
        times_list[x] = my_date.strftime(format1)

    # print(times_list)
    return times_list

times = timeconversion_and_committ(times)
print(times)


['9.27pm', '9.23pm', '9.16pm', '9.16pm', '9.12pm', '9.08pm', '9.05pm', '8.59pm', '8.54pm', '8.48pm', '8.43pm', '8.41pm', '8.36pm']
['2017-05-15 09:27 PM', '2017-05-15 09:23 PM', '2017-05-15 09:16 PM', '2017-05-15 09:16 PM', '2017-05-15 09:12 PM', '2017-05-15 09:08 PM', '2017-05-15 09:05 PM', '2017-05-15 08:59 PM', '2017-05-15 08:54 PM', '2017-05-15 08:48 PM', '2017-05-15 08:43 PM', '2017-05-15 08:41 PM', '2017-05-15 08:36 PM']
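
The conversion above changes the list it is given and always uses today's date. A variant that returns a new list and takes the date as an argument is easier to test; the sketch below is one way to write it. `convert_times`, `SCRAPED_FORMAT` and `STANDARD_FORMAT` are hypothetical names, and the format strings are copied from the cell above.

```python
from datetime import date, datetime

SCRAPED_FORMAT = '%I.%M%p'             # e.g. '9.27pm', as shown in the output above
STANDARD_FORMAT = '%Y-%m-%d %I:%M %p'  # e.g. '2017-05-15 09:27 PM'


def convert_times(times_list, day):
    """Return new strings in the standard format without changing the input."""
    converted = []
    for item in times_list:
        clock = datetime.strptime(item, SCRAPED_FORMAT).time()
        converted.append(datetime.combine(day, clock).strftime(STANDARD_FORMAT))
    return converted


print(convert_times(['9.27pm', '8.36pm'], date(2017, 5, 15)))
```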


We then build the master list *ArtistsSongsTimesCount*. It holds the three scraped lists plus a fourth list, which starts with one zero per song and is then filled with the gaps between songs.

In [92]:
# Main list definitions
ArtistsSongsTimesCount = [artists, songs, times, [0] * len(artists)]

"""# This is just code to assign a value in count - this can be replaced by the duration function ! 
for x in range(0,(len(ArtistsSongsTimesCount[3]))):
    #ArtistsSongsTimesCount[3][x] = x + 1
    if x>8:
        ArtistsSongsTimesCount[3][x] = '0:06:00'
    elif x>5:
        ArtistsSongsTimesCount[3][x] = '0:03:00'
    else:
        ArtistsSongsTimesCount[3][x] = '0:04:00'
    
print(ArtistsSongsTimesCount[3])"""


def calculate_durations(Local_ArtistsSongsTimesCount):
    """Return the gaps in seconds between consecutive start times, newest first.

    Entry i is the gap between song i and the older song i+1, so the result
    has one entry fewer than there are songs.
    """
    format1 = '%Y-%m-%d %I:%M %p'
    OutputFinal = []

    # n songs give n - 1 consecutive pairs, so i runs from 0 to n - 2.
    for i in range(len(Local_ArtistsSongsTimesCount[0]) - 1):
        my_date1 = datetime.strptime(Local_ArtistsSongsTimesCount[2][i], format1)
        my_date2 = datetime.strptime(Local_ArtistsSongsTimesCount[2][i+1], format1)
        OutputFinal.append((my_date1 - my_date2).seconds)

    return OutputFinal

ArtistsSongsTimesCount[3] = calculate_durations(ArtistsSongsTimesCount)
print(ArtistsSongsTimesCount[3])

[240, 420, 0, 240, 240, 180, 360, 300, 360, 300, 120, 300]
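
One detail of `calculate_durations` is worth knowing: `timedelta.seconds` is not the signed length of the gap. Python stores a negative difference as a negative number of days plus a non-negative number of seconds, so `.seconds` always lies between 0 and 86399. This matters here because every scraped time is stamped with today's date, so a song played just before midnight appears to be later than one played just after. A minimal sketch with made-up times:

```python
from datetime import datetime

FORMAT = '%Y-%m-%d %I:%M %p'

# Both stamps carry the same date, although the older song really
# played on the previous day (illustrative values only).
newer = datetime.strptime('2017-05-15 12:02 AM', FORMAT)
older = datetime.strptime('2017-05-15 11:58 PM', FORMAT)

gap = newer - older
print(gap.total_seconds())  # signed length: negative because the stamps are out of order
print(gap.seconds)          # seconds component only: wraps into the range 0..86399
```

In this case the wrapped value happens to equal the real four-minute gap, but relying on that hides the fact that the dates are wrong.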


Here, at the start of the program, we write all of the songs played so far to a file. This becomes the master file that we use to generate statistics.

In [78]:
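OutputFinal = []

# Build one semicolon-separated row per song, oldest first.
# The range stops before index 0, so the most recent song is not written here.
for i in range((len(ArtistsSongsTimesCount[0]) - 1), 0, -1):
    ReadyForOutput = ""
    for column in ArtistsSongsTimesCount:  # renamed from `list`, which shadowed the built-in
        ReadyForOutput = ReadyForOutput + str(column[i]) + ';'
    ReadyForOutput = ReadyForOutput + '\n'
    OutputFinal.append(ReadyForOutput)

# The with-block closes the file even if a write fails.
with open('Today_Songlist', 'w') as f:
    f.write('Artists;Songs;Times;Genres\n')
    for item in OutputFinal:
        f.write(item)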
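
The standard library's `csv` module can build the same kind of semicolon-separated file and will quote any field that itself contains a semicolon. The sketch below is an alternative, not the code this notebook runs: unlike the cell above it writes every song, including the newest, and adds no trailing semicolon. `write_songlist` is a hypothetical name, and an in-memory buffer stands in for the real file.

```python
import csv
import io


def write_songlist(handle, columns):
    """Write parallel lists as semicolon-separated rows, oldest song first."""
    writer = csv.writer(handle, delimiter=';', lineterminator='\n')
    writer.writerow(['Artists', 'Songs', 'Times', 'Genres'])
    # zip(*columns) turns the four parallel lists into one tuple per song.
    for row in reversed(list(zip(*columns))):
        writer.writerow(row)


buffer = io.StringIO()  # stands in for the real file in this sketch
write_songlist(buffer, [
    ['Keane', 'The Cars'],
    ['This Is The Last Time', 'Just What I Needed'],
    ['9.05pm', '8.59pm'],
    [0, 0],
])
print(buffer.getvalue())
```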

We then read the file back to check that it was written correctly

In [79]:
f = open('Today_Songlist', 'r')
hello = f.read()
f.close()

print(hello)


Artists;Songs;Times;Genres
George Ezra;Budapest;2017-05-15 08:36 PM;0:06:00;
The Pretenders;Stop Your Sobbing;2017-05-15 08:41 PM;0:06:00;
Shed Seven;Going For Gold;2017-05-15 08:43 PM;0:06:00;
Puddle Of Mudd;Blurry;2017-05-15 08:48 PM;0:06:00;
Toto;Africa;2017-05-15 08:54 PM;0:03:00;
The Cars;Just What I Needed;2017-05-15 08:59 PM;0:03:00;
Keane;This Is The Last Time;2017-05-15 09:05 PM;0:03:00;
Smash Mouth;All Star;2017-05-15 09:08 PM;0:04:00;
The Killers;Human;2017-05-15 09:12 PM;0:04:00;
Cast;Beat Mama;2017-05-15 09:16 PM;0:04:00;
Tears For Fears;Everybody Wants To Rule The World;2017-05-15 09:16 PM;0:04:00;
Starsailor;Goodsouls;2017-05-15 09:23 PM;0:04:00;



Finally, we print the number of songs and the master list to the console to check that things are working

In [80]:
print(len(artists))
print(ArtistsSongsTimesCount)

13
[['Soundgarden', 'Starsailor', 'Tears For Fears', 'Cast', 'The Killers', 'Smash Mouth', 'Keane', 'The Cars', 'Toto', 'Puddle Of Mudd', 'Shed Seven', 'The Pretenders', 'George Ezra'], ['Black Hole Sun', 'Goodsouls', 'Everybody Wants To Rule The World', 'Beat Mama', 'Human', 'All Star', 'This Is The Last Time', 'Just What I Needed', 'Africa', 'Blurry', 'Going For Gold', 'Stop Your Sobbing', 'Budapest'], ['2017-05-15 09:27 PM', '2017-05-15 09:23 PM', '2017-05-15 09:16 PM', '2017-05-15 09:16 PM', '2017-05-15 09:12 PM', '2017-05-15 09:08 PM', '2017-05-15 09:05 PM', '2017-05-15 08:59 PM', '2017-05-15 08:54 PM', '2017-05-15 08:48 PM', '2017-05-15 08:43 PM', '2017-05-15 08:41 PM', '2017-05-15 08:36 PM'], ['0:04:00', '0:04:00', '0:04:00', '0:04:00', '0:04:00', '0:04:00', '0:03:00', '0:03:00', '0:03:00', '0:06:00', '0:06:00', '0:06:00', '0:06:00']]


## Plotly
Main Dashboard section

In [81]:
def generate_plotly_dashboard():
    import plotly.dashboard_objs as dashboard
    import plotly.plotly as py

    import IPython.display
    from IPython.display import Image

    my_dboard = dashboard.Dashboard()
    
    
    box_1 = {
    'type': 'box',
    'boxType': 'plot',
    'fileId': 'benisme:0',
    'shareKey': None,
    'title': 'Graph test 1'
    }
    
    box_2 =  {
    'type': 'box',
    'boxType': 'text',
    'text': 'Test Text',
    'title': 'Title for text'
    }
    
    box_3 =  {
    'type': 'box',
    'boxType': 'text',
    'text': 'Test Text 2',
    'title': 'Title for text 2'
    }
    
    my_dboard.insert(box_1)
    my_dboard.insert(box_2, 'above', 1)
    my_dboard.insert(box_3, 'left', 2)
    
    py.dashboard_ops.upload(my_dboard, 'My First Dashboard with Python')
    

In [82]:
#my_dboard.get_preview()

### Plotly graph no. 1 for the dashboard

A timeline for the dashboard

In [83]:
def generate_plotly_graph1():
    
    import plotly.plotly as py
    import plotly.graph_objs as go

    import pandas as pd

    headers = ['Artists', 'Songs', 'DateTime', 'Genres']
    dtypes = {'Artists': 'str', 'Songs': 'str', 'DateTime': 'str', 'Genres': 'str'}
    parse_dates = ['DateTime']

    df = pd.read_csv('Today_Songlist', sep=';', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

    #df = pd.read_csv('Today_Songlist', sep=';')
    #df = pd.read_csv('Today_Songlist',sep=';', parse_dates=['Times'])


    df.drop(df.index[[0]], inplace=True)
    print(df)


    trace_high = go.Scatter(
                    x=df.DateTime,
                    y=df['Genres'],
                    mode='markers',
                    text=df['Songs'],
                    name = "Genres",
                    line = dict(color = '#17BECF'),
                    opacity = 0.8)

    data = [trace_high]

    layout = dict(
        title = "First trial at Plotly",
        xaxis = dict(
            range = ['2017-05-12','2017-05-15'])
    )

    fig = dict(data=data, layout=layout)
    py.iplot(fig, filename = "Manually Set Range")



### The master function

Here we define the master function, which is called by the timer. It will:
1. Call the *Retrieve_TimesArtistsSongs()* function
2. If *Retrieve_TimesArtistsSongs()* returns true, indicating that a new song has been played, the master function will
  * Check whether it is a repeat by running *UpdateCount()* (this step is not yet implemented; the code below currently calls *update_songs_file()* and *generate_plotly_graph1()* instead)
  * *UpdateCount()* returns the current list of repeated songs and the number of songs played
  * If the length of the current list of repeated songs has changed since the last loop, actions are generated

3. Return values back to the timer

In [84]:
# start of master function definition
def update_song_list_count_and_email():
    global CurrentRepeatList
    global CurrentRepeatListLength

    NewSongBoolean = False
    RepeatBoolean = False
    TotalSongs = 0

    NewSongBoolean = Retrieve_TimesArtistsSongs()
    print("")
    print("NewSongBoolean = ", NewSongBoolean)
    # change this to a case statement later on
    if NewSongBoolean:
        print("Updating List")
        update_songs_file()
        generate_plotly_graph1()
        

    else:
        print("Not updating this time around")

    print("CurrentRepeatList = ", CurrentRepeatList)
    print("len(CurrentRepeatList) = ", len(CurrentRepeatList))

    CurrentRepeatListLength = len(CurrentRepeatList)

    # If the update count returns a bigger value then do something

    print("")
    print("ArtistsSongsTimesCount = ", ArtistsSongsTimesCount)
    print("")
   
    # Function check to see if count is > EmailNotification list

    return CurrentRepeatListLength, TotalSongs
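
The repeat check described above is not implemented in this function yet. One way it could work, sketched here under the assumption that a repeat means the same artist and song pair appearing more than once in the day's lists, uses the `Counter` class that is already imported. `find_repeats` is a hypothetical name.

```python
from collections import Counter


def find_repeats(artists, songs):
    """Return the (artist, song) pairs that appear more than once."""
    plays = Counter(zip(artists, songs))
    return [pair for pair, count in plays.items() if count > 1]


# Illustrative lists in which one song has been played twice.
example_artists = ['Toto', 'Keane', 'Toto']
example_songs = ['Africa', 'This Is The Last Time', 'Africa']
print(find_repeats(example_artists, example_songs))
```

The length of the returned list could then be compared with *CurrentRepeatListLength* to decide whether a new repeat has occurred.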

## The *Retrieve_TimesArtistsSongs* function

This function:
1. Gets the current lists of artists, songs and times from the Absolute Radio website
2. Compares the newest scraped song to the newest song in the existing list and, if they differ,
  1. inserts the new song at the front of the existing lists
  2. returns *True*

In [85]:
def Retrieve_TimesArtistsSongs():
    page = requests.get('https://absoluteradio.co.uk/absolute-radio/music/')
    tree = html.fromstring(page.content)
    # This will create a list of artists
    artists_B = tree.xpath('//p[@class="song-artist"]/text()')
    # This will create a list of songs
    songs_B = tree.xpath('//p[@class="song-title"]/a/text()')
    # This will create a list of the times of the songs
    times_B = tree.xpath('//div[@class="song-inner"]/time/text()')

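    # Compare the newest scraped song to the newest song already recorded.
    # The emptiness check avoids an IndexError when the scrape returns nothing.
    if songs_B and songs_B[0] != ArtistsSongsTimesCount[1][0]:
        ArtistsSongsTimesCount[0].insert(0, artists_B[0])
        ArtistsSongsTimesCount[1].insert(0, songs_B[0])
        # Convert the raw time (e.g. '9.27pm') so that it matches the stored times
        ArtistsSongsTimesCount[2].insert(0, timeconversion_and_committ([times_B[0]])[0])
        ArtistsSongsTimesCount[3].insert(0, randint(0,9))  # random placeholder value for testing
        return True
    else:
        return False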
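
The comparison at the heart of this function can be separated from the scraping so that it can be tested without a network connection. The sketch below is one way to do that; `is_new_song` is a hypothetical name. Like the function above, it compares titles only, so the same song played twice in a row would not be reported as new.

```python
def is_new_song(latest_songs, known_songs):
    """Return True when the newest scraped title differs from the newest known one."""
    if not latest_songs:
        # Nothing was scraped (for example, the page layout changed): report no change.
        return False
    if not known_songs:
        # Nothing recorded yet, so anything scraped counts as new.
        return True
    return latest_songs[0] != known_songs[0]


print(is_new_song(['Black Hole Sun', 'Goodsouls'], ['Goodsouls']))
print(is_new_song(['Goodsouls'], ['Goodsouls']))
```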

### The *update_songs_file* function

This appends the newest song to the record file whenever a new song is detected.

In [86]:
def update_songs_file():
    # Build one semicolon-separated row from the newest entry (index 0) of each list
    ReadyForOutput = ""
    for column in ArtistsSongsTimesCount:  # renamed from `list`, which shadowed the built-in
        ReadyForOutput = ReadyForOutput + str(column[0]) + ';'
    ReadyForOutput = ReadyForOutput + '\n'

    # Append the row; the with-block closes the file even if the write fails
    with open('Today_Songlist', 'a') as f:
        f.write(ReadyForOutput)

## Start of main program

In [87]:
generate_plotly_graph1()
generate_plotly_dashboard()

            Artists                              Songs             DateTime  \
1       George Ezra                           Budapest  2017-05-15 08:36 PM   
2    The Pretenders                  Stop Your Sobbing  2017-05-15 08:41 PM   
3        Shed Seven                     Going For Gold  2017-05-15 08:43 PM   
4    Puddle Of Mudd                             Blurry  2017-05-15 08:48 PM   
5              Toto                             Africa  2017-05-15 08:54 PM   
6          The Cars                 Just What I Needed  2017-05-15 08:59 PM   
7             Keane              This Is The Last Time  2017-05-15 09:05 PM   
8       Smash Mouth                           All Star  2017-05-15 09:08 PM   
9       The Killers                              Human  2017-05-15 09:12 PM   
10             Cast                          Beat Mama  2017-05-15 09:16 PM   
11  Tears For Fears  Everybody Wants To Rule The World  2017-05-15 09:16 PM   
12       Starsailor                          Goodsou

## Timer Function

In [88]:
# Runs the master function every 30 seconds until interrupted

starttime = time.time()
while True:
    CurrentRepeatList2, TotalSongs2 = update_song_list_count_and_email()
    # Code needed here to
    # pass back whether there has been a repeat and the total number of songs.
    # Also check whether we need to re-tweet at someone.
    # Sleep until the next 30 s boundary measured from starttime, so the loop does not drift.
    time.sleep(30.0 - ((time.time() - starttime) % 30.0))


NewSongBoolean =  False
Not updating this time around
CurrentRepeatList =  []
len(CurrentRepeatList) =  0

ArtistsSongsTimesCount =  [['Soundgarden', 'Starsailor', 'Tears For Fears', 'Cast', 'The Killers', 'Smash Mouth', 'Keane', 'The Cars', 'Toto', 'Puddle Of Mudd', 'Shed Seven', 'The Pretenders', 'George Ezra'], ['Black Hole Sun', 'Goodsouls', 'Everybody Wants To Rule The World', 'Beat Mama', 'Human', 'All Star', 'This Is The Last Time', 'Just What I Needed', 'Africa', 'Blurry', 'Going For Gold', 'Stop Your Sobbing', 'Budapest'], ['2017-05-15 09:27 PM', '2017-05-15 09:23 PM', '2017-05-15 09:16 PM', '2017-05-15 09:16 PM', '2017-05-15 09:12 PM', '2017-05-15 09:08 PM', '2017-05-15 09:05 PM', '2017-05-15 08:59 PM', '2017-05-15 08:54 PM', '2017-05-15 08:48 PM', '2017-05-15 08:43 PM', '2017-05-15 08:41 PM', '2017-05-15 08:36 PM'], ['0:04:00', '0:04:00', '0:04:00', '0:04:00', '0:04:00', '0:04:00', '0:03:00', '0:03:00', '0:03:00', '0:06:00', '0:06:00', '0:06:00', '0:06:00']]



KeyboardInterrupt: 
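
The sleep expression in the loop above keeps the wake-ups on a fixed 30-second grid measured from `starttime`, however long each update takes. The arithmetic is shown below as a standalone function; `seconds_until_next_tick` is a hypothetical name used only for illustration.

```python
def seconds_until_next_tick(start, now, interval=30.0):
    """Return how long to sleep so that wake-ups stay on multiples of `interval` from `start`."""
    return interval - ((now - start) % interval)


# 7.5 s into a 30 s slot: sleep for the remaining 22.5 s.
print(seconds_until_next_tick(start=100.0, now=107.5))
# Exactly on a slot boundary: sleep for a full interval.
print(seconds_until_next_tick(start=100.0, now=130.0))
```

A plain `time.sleep(30)` would instead add the running time of each update to every cycle, so the polling times would slowly drift.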

Ideas for next steps

In [None]:
"""    
At end of day (1) tweet about the number of songs and repeats and (2) write the day's output to file

I can have 2 separate functions: (1) one that tweets to Twitter and (2) one that checks Twitter



Settings file: 

days of the week to run

start time

stop time


Generated file:

How many full days monitored

How many repeats
"""