## Michael Felzan | GIS 8990, University of Minnesota 2021 | "The Geography of Radio"

### ~

# Table of contents: 


   #### A.) Notebook Preparation
   
         1.) Importing packages
         2.) Assigning working directory path / API key inputs
         3.) Establishing path to radio station data .CSV (sourced from Radio.Garden)
         4.) Assigning path to a CSV derived from the 'Master' radio stations .CSV that contains
             only FM stations from the state of Washington
         5.) Assigning path to a text file that stores the callsigns of stations from the above
             .CSV which consistently 'error out' or return no results
         6.) Function / Class Definitions
         
   #### B.) Local Directory Setup
   
         1.) Creating 'list_of_wash_callsigns'; appending all Washington callsigns to the list
         2.) Making folder directories for all Washington callsigns
         3.) Creating .txt files for all Washington callsigns in correct format
         4.) Retrieving station info from Radio-Locator and creating "station info" .txt for 
             each station (that yields a valid request from Radio-Locator)
         5.) Creating a list of callsigns for every station in our sample set
         
   #### C.) Gathering information about each radio station's powered FM tower
    
         1.) Creating all individual station info CSVs (involves parsing/cleaning up returned 
             text from initial Radio-Locator request)
         2.) Creating folder that station info tables will be copied into, so that they can 
             be merged into one file
         3.) Copying files to that folder
         4.) Concatenating all station info CSV's into one CSV
         5.) Viewing what "Station Info" CSV looks like
    
   #### D.) MULTI-PROCESSING
    
   #### E.) Turning track-play information into usable data that can be imported into ArcGIS
    
         1.) Creating a list of station tuples (callsign + stream URL) in preparation for radio sampling
         2.) Iterating over every station URL, sampling audio, routing audio to both ACRCloud and 
             Discogs APIs, logging associated information in each station's respective text file
         3.) Demonstrating what the 'raw text' inside a given station's "song play log" looks like
             (after the above code block has been run enough times)
         4.) Demonstrating the operations involved in parsing this text, in order to put all of the
             info neatly into a spreadsheet
         5.) Iterating over every station's "song play log" .txt file,  parsing/extracting relevant 
             info (saving to temporary variables), and creating clean song play .CSVs for each station
         6.) Demonstrating what one of these song play .CSV's looks like
         7.) Creating a folder to store every station's song play log .CSV file in one place
         8.) Copying all individual song play .CSVs into new folder
         9.) Merging all individual station song play .CSVs into one concatenated 'master' .CSV
         
   #### F.) Creating statistics based on genres played by each station
    
         1.) Creating a list of unique styles/genres that appear in the concatenated song play log .CSV
         2.) Defining, from the above list, which of those genres are "electronic music" genres
         3.) Iterating over 'master' song play log .CSV, and marking tracks which contain one of
             the above 'electronic' genre tags.
         4.) Note the 'elec_or_not' column header added to the below spreadsheet
         5.) Using Pandas 'GroupBy' to summarize statistics by each radio station (callsign)
         6.) Printing summary statistics to .CSV, so this data may be linked with a point-class 
             shapefile in ArcGIS Pro (allowing for spatial analysis)

# A.) Notebook Preparation

## 1.) Importing packages

##### NOTE: in order to import 'acrcloud' properly, the user must open a terminal, navigate to the base of the project repo folder, and type 'sudo python setup.py install'

In [25]:
import json
import io
from io import BytesIO
import os
import pandas as pd
import csv
from csv import reader
import requests
import pycurl
import certifi
import time
from datetime import datetime, date, timedelta
from acrcloud.recognizer import ACRCloudRecognizer
from acrcloud.recognizer import ACRCloudRecognizeType
from bs4 import BeautifulSoup
import shutil
from shutil import copyfile
import pydub
from pydub.playback import play
from pydub import AudioSegment
import ffmpeg
from functools import partial
from itertools import repeat
from itertools import product
import multiprocessing
from multiprocessing import Pool, freeze_support
import glob

## 2.) Assigning working directory path / API key inputs 

In [26]:
# assign working directory path:
workindir = r'/Users/michaelfelzan/Desktop/Geography-of-Radio/Data'
os.chdir(workindir)

# ACRCloud key / secret keys:
acrcloud_accesskey = '4554490a1c83c19ea6745ed6bfbbea7d'
acrcloud_secretkey = 'HcoVKztgUltoNAwlPxpWoIJzt86HoKk9KTiByPTl'

acrcloud_config = {
    'host':'identify-eu-west-1.acrcloud.com',
    'access_key': acrcloud_accesskey,
    'access_secret': acrcloud_secretkey,
    'recognize_type': ACRCloudRecognizeType.ACR_OPT_REC_AUDIO, # could be 'humming audio' as well
    'debug':False,
    'timeout':5 # seconds
}

ACR_recognizer = ACRCloudRecognizer(acrcloud_config)

# Discogs key / secret keys:
client_key = 'gQylLmyvGPfyGEvmFyRx'
client_secret = 'KJnItkPttweOtFSkqwAiTTCWJsOnIobH'

In [27]:
os.getcwd()

'/Users/michaelfelzan/Desktop/Geography-of-Radio/Data'

In [28]:
os.listdir()

['MASTER_CSV.csv',
 'WASH SONG LOGS',
 'PLACES_DescendingCountriesCSV.csv',
 '.DS_Store',
 'WashingtonStations_filled.csv',
 'sample_mp3_1.mp3',
 'sample_mp3_2.mp3',
 'WashingtonStations_OnlyFM.csv',
 'us_4.csv',
 'WASH STATIONS',
 'US_STATIONS_3.csv',
 'fatal_error_wash.txt',
 'US_STATIONS.csv',
 'WashStationInfoTables']

## 3.) Establishing path to radio station data .CSV (sourced from Radio.Garden)

In [30]:
us4_df = pd.read_csv('us_4.csv')
us4_df.head()  

Unnamed: 0,Country,CityState,City,StateAbv,StationName,RG_ID,RG_LC,CS,RG_URL,URL,CityCords
0,United States,Los Angeles CA,Los Angeles,CA,101 Smooth Jazz,zFDSxGwY,/listen/-101-smooth-jazz/zFDSxGwY,xxxx,https://radio.garden/api/ara/content/listen/zF...,https://streaming.live365.com/b22139_128mp3,"[-118.24368, 34.052235]"
1,United States,Los Angeles CA,Los Angeles,CA,113FM,4KJ_uzy0,/listen/radio-113fm/4KJ_uzy0,xxxx,https://radio.garden/api/ara/content/listen/4K...,http://113fm-edge1.cdnstream.com/1730_128,"[-118.24368, 34.052235]"
2,United States,Los Angeles CA,Los Angeles,CA,88.5 FM KCSN,SsUyqJaN,/listen/kcsn/SsUyqJaN,KCSN,https://radio.garden/api/ara/content/listen/Ss...,http://130.166.82.184:8000/;,"[-118.24368, 34.052235]"
3,United States,Los Angeles CA,Los Angeles,CA,9128.live,hacyg6SN,/listen/radio-9128-live/hacyg6SN,xxxx,https://radio.garden/api/ara/content/listen/ha...,https://streams.radio.co/s0aa1e6f4a/listen,"[-118.24368, 34.052235]"
4,United States,Los Angeles CA,Los Angeles,CA,98.5 Flex FM,t4dqOINA,/listen/radio-98-5-flex-fm/t4dqOINA,xxxx,https://radio.garden/api/ara/content/listen/t4...,https://streaming.live365.com/a23768,"[-118.24368, 34.052235]"


## 4.) Assigning path to a CSV derived from the 'Master' radio stations .CSV that contains only FM stations from the state of Washington (these were manually looked up / selected):

In [32]:
WA_FM_CSV = r'/Users/michaelfelzan/Desktop/Geography-of-Radio/Data/WashingtonStations_OnlyFM.csv'

# Both variable names are utilized in notebook
source_CSV = WA_FM_CSV

WA_df = pd.read_csv(WA_FM_CSV)
WA_df.head()

Unnamed: 0,Country,CityState,City,StateAbv,StationName,RG_ID,RG_LC,CS,Type,RG_URL,URL,CityCords
0,United States,Spokane WA,Spokane,WA,92.9 ZZU - KZZU-FM,8VcFeepP,/listen/kzzu/8VcFeepP,KZZU,FM,https://radio.garden/api/ara/content/listen/8V...,https://15373.live.streamtheworld.com/KZZUFMAA...,"[-117.42605, 47.65878]"
1,United States,Spokane WA,Spokane,WA,Coyote Country - KEZE,FUCv80QQ,/listen/coyotecountry969/FUCv80QQ,KEZE,FM,https://radio.garden/api/ara/content/listen/FU...,https://17963.live.streamtheworld.com/KEZEFMAA...,"[-117.42605, 47.65878]"
2,United States,Spokane WA,Spokane,WA,KPBX 91.1 Spokane Public Radio,6OP3QxSa,/listen/spokane-public-radio-kpbx-91-1/6OP3QxSa,KPBX,FM,https://radio.garden/api/ara/content/listen/6O...,https://18733.live.streamtheworld.com/KPBX_FM.mp3,"[-117.42605, 47.65878]"
3,United States,Spokane WA,Spokane,WA,KPBZ 90.3 Spokane Public Radio,WDXAsse3,/listen/spokanepublicradio/WDXAsse3,KPBZ,FM,https://radio.garden/api/ara/content/listen/WD...,https://24883.live.streamtheworld.com/KPBZ_FM.mp3,"[-117.42605, 47.65878]"
4,United States,Spokane WA,Spokane,WA,KYRS FM 88.1,ocrbM9-Q,/listen/kyrs-fm-88-1/ocrbM9-Q,KYRS,FM,https://radio.garden/api/ara/content/listen/oc...,https://www.ophanim.net:8444/s/7170,"[-117.42605, 47.65878]"


## 5.) Assigning path to a text file that stores the callsigns of stations from the above .CSV which consistently 'error out' or return no results:

In [33]:
failed_wash_infoget = r'/Users/michaelfelzan/Desktop/GEO FM/FailedWashStationsInfoGetter.txt'

In [34]:
fails_txt = r'/Users/michaelfelzan/Desktop/GEO FM/FailedWashStationsInfoGetter.txt'
fails_list = []

with open(fails_txt, "r") as a_file:
    for line in a_file:
        stripped_line = line.strip()
        fails_list.append(stripped_line)

In [35]:
fails_list

['KOWA', 'KORE', 'KVSH']

## 6.) Function / Class Definitions:

##### (Run all of the following cells)

In [39]:
def RadioTowerInfoGetter(callsign):
    """Function that requests the HTML of
    the radio-locator.com webpage for whatever 
    radio station callsign is inputted into the function.
    
    The function parses the HTML to return a dictionary
    containing the radio station's ERP, lat/long,
    HAAT, HAGL, and HASL.
    
    Parameters
    ------------
    callsign : str
        Name of station callsign (eg. KEXP)
    """
    print(f"Attempting to gather info from:  {callsign}...")
    
    # Request headers
    radiolocheaders = {
        'sec-ch-ua' : '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
        'sec-ch-ua-mobile' : '?0',
        'Upgrade-Insecure-Requests' : '1',
        'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36'
    }
    
    # There are two different ways to search radio stations using this site/API; 
    # sometimes the 1st URL yields the results for the station but the 2nd does not,
    # and vice versa.
    locatorURL_1 = f'https://radio-locator.com/cgi-bin/url?call={callsign}&service=FM'
    locatorURL_2 = f'https://radio-locator.com/cgi-bin/finder?call={callsign}&x=0&y=0&sr=Y&s=C'
    
    # sending request, decoding request, and assessing whether or not req. yielded 
    # valid results:
    radiolocreq = requests.get(locatorURL_1,
                               headers=radiolocheaders)
    stationHTML = radiolocreq.content.decode('utf-8')
    stationsoup = BeautifulSoup(stationHTML, 'html.parser')
    techvalues = stationsoup.find_all("td",
                                      class_='tech_value')
    # if no results returned:
    if techvalues == []:
        print(f"No radio tower info could be retrieved for station {callsign}")       
        print("Trying other request URL...waiting 20 seconds before retry...")
        # API will block IP address from sending requests if too many are sent
        # within a certain timeframe
        time.sleep(20)
        # trying second URL:
        radiolocreq = requests.get(locatorURL_2,
                                   headers=radiolocheaders)
        stationHTML = radiolocreq.content.decode('utf-8')
        stationsoup = BeautifulSoup(stationHTML, 'html.parser')
        techvalues = stationsoup.find_all("td",
                                          class_='tech_value')
        # if no valid results from second URL:
        if techvalues == []:
            print("Second URL method failed. Station logged in 'fails' .txt")
            with open(failed_wash_infoget, "a+") as file_object:
                file_object.seek(0)
                data = file_object.read(100)
                if len(data) > 0 :
                    file_object.write("\n")
                # Writing station's callsign in 'failed
                # Wash stations info' .txt
                file_object.write(f"{callsign}")
                
            return False
        # if valid results returned for 2nd URL:
        # start parsing relevant info from req,
        # values mapped to dictionary:
        else:
            ERP = []
            coords = []
            heights = []
            # Parsing the HTML:
            for item in techvalues:
                for characters in item:
                    for sub in characters:
                        if '" N' in sub:
                            coords.append(sub)
                if 'Watts' in characters:
                    ERP.append(characters)
                elif 'meters' in characters:
                    heights.append(characters)
                else:
                    pass
    
            try:
                itercoords = coords[0]
            except:
                print("This station didn't have coords data")
                itercoords = 'no_data'
            try:
                iterERP = ERP[0]
            except:
                print("This station didn't have ERP data")
                iterERP = 'no_data'
            try:
                iterHAAT = heights[0]
            except:
                print("This station didn't have HAAT data")
                iterHAAT = 'no_data'
            try:
                iterHAGL = heights[1]
            except:
                print("This station didn't have HAGL data")
                iterHAGL = 'no_data'
            try:
                iterHASL = heights[2]
            except:
                print("This station didn't have HASL data")
                iterHASL = 'no_data'
    
            towerinfo = {
                'ERP' : iterERP,
                'Coords' : itercoords,
                'HAAT' : iterHAAT,
                'Height Above Ground Level' : iterHAGL,
                'Height Above Sea Level' : iterHASL
            }
    
            return towerinfo
    
    # if valid results returned for 1st URL:
    # start parsing relevant info from req,
    # values mapped to dictionary:
    else:
        ERP = []
        coords = []
        heights = []
        # Parsing the HTML:
        for item in techvalues:
            for characters in item:
                for sub in characters:
                    if '" N' in sub:
                        coords.append(sub)
            if 'Watts' in characters:
                ERP.append(characters)
            elif 'meters' in characters:
                heights.append(characters)
            else:
                pass
    
        try:
            itercoords = coords[0]
        except:
            print("This station didn't have coords data")
            itercoords = 'no_data'
        try:
            iterERP = ERP[0]
        except:
            print("This station didn't have ERP data")
            iterERP = 'no_data'
        try:
            iterHAAT = heights[0]
        except:
            print("This station didn't have HAAT data")
            iterHAAT = 'no_data'
        try:
            iterHAGL = heights[1]
        except:
            print("This station didn't have HAGL data")
            iterHAGL = 'no_data'
        try:
            iterHASL = heights[2]
        except:
            print("This station didn't have HASL data")
            iterHASL = 'no_data'
    
        towerinfo = {
            'ERP' : iterERP,
            'Coords' : itercoords,
            'HAAT' : iterHAAT,
            'Height Above Ground Level' : iterHAGL,
            'Height Above Sea Level' : iterHASL
        }
    
        return towerinfo
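
Both branches above repeat the same five `try`/`except` lookups. As a minimal sketch, that fallback logic could be factored into one helper. The names `value_or_default` and `build_towerinfo` are hypothetical and do not appear in the notebook, and the sample lists are illustrative values shaped like the parsed `ERP`, `coords`, and `heights` lists.

```python
def value_or_default(values, index, default='no_data'):
    """Return values[index], or `default` when the list is too short."""
    return values[index] if index < len(values) else default


def build_towerinfo(ERP, coords, heights):
    """Assemble a dictionary with the same keys RadioTowerInfoGetter() returns."""
    return {
        'ERP': value_or_default(ERP, 0),
        'Coords': value_or_default(coords, 0),
        'HAAT': value_or_default(heights, 0),
        'Height Above Ground Level': value_or_default(heights, 1),
        'Height Above Sea Level': value_or_default(heights, 2),
    }


# Illustrative input: a station whose page listed no height values
example = build_towerinfo(['50,000 Watts'],
                          ['47° 52\' 31" N, 122° 04\' 44" W'],
                          [])
print(example)
```

Unlike the original, this sketch does not print a message for each missing value; it only fills in `'no_data'`.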

In [40]:
def StationInfoTxtWriter(stationfolderpath, call_sign):
    """Function that writes the dictionary return
    from RadioTowerInfoGetter() for a station 
    to a .txt file in that station's corresponding
    folder (only if the RadioTowerInfoGetter()
    successfully returns tower info).
    
    Parameters
    ------------
    stationfolderpath : str
        Path to a radio station's folder
    call_sign : str
        Station callsign (eg. KEXP)
    """
    towerinfodict = RadioTowerInfoGetter(call_sign)
    if towerinfodict == False:
        #print("yes, it failed")
        pass
    else:
        infofilename = os.path.join(stationfolderpath,
                                    f"{call_sign}_towerinfo.txt")
        textyfile = open(infofilename,
                    "w+")
        textyfile.write("{\n")
        for k in towerinfodict.keys():
            textyfile.write("'{}':'{}'\n".format(k, towerinfodict[k]))
        textyfile.write("}")
        textyfile.close()
        print(f"Successfully wrote {call_sign}_towerinfo.txt")
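
The writer above stores the dictionary in a hand-rolled `'key':'value'` text format, which later cells have to split apart by hand. One possible alternative, sketched here under the assumption that the downstream parsing would be changed to match, is to store the same dictionary as JSON. `write_towerinfo_json` and `read_towerinfo_json` are hypothetical names, and the `.json` file name is not used anywhere in the notebook.

```python
import json
import os
import tempfile


def write_towerinfo_json(folder, call_sign, towerinfo):
    """Write the tower info dictionary to <call_sign>_towerinfo.json."""
    path = os.path.join(folder, f"{call_sign}_towerinfo.json")
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps the degree symbol readable in the file
        json.dump(towerinfo, f, ensure_ascii=False, indent=2)
    return path


def read_towerinfo_json(path):
    """Load the dictionary back exactly as it was written."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Round trip in a temporary folder so nothing in the project is touched
with tempfile.TemporaryDirectory() as tmp:
    sample = {'ERP': '100 Watts', 'HAAT': '-13.49 meters (-43 feet)'}
    saved = write_towerinfo_json(tmp, 'KMRE', sample)
    roundtrip = read_towerinfo_json(saved)
```

This is not a drop-in replacement: the rest of the notebook reads the `_towerinfo.txt` files line by line.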

In [41]:
def dms_to_dd(d, m, s):
    """Function that converts degrees minutes seconds
    into decimal degrees.
    
    Parameters
    ------------
    d : str or float (converted to float in function)
        degrees
    m : str or float (converted to float in function)
        minutes
    s : str or float (converted to float in function)
        seconds
    """
    dd = float(d) + float(m)/60 + float(s)/3600
    return round(dd, 6)

In [42]:
def DMS_Isolator(dms_string):
    """Function that isolates the degree, minute, and second
    components of a DMS string (as written by StationInfoTxtWriter()),
    so that they may be passed separately to the
    dms_to_dd() function
    """
    degree = dms_string.split("°")[0]
    minutefirstsplit = dms_string.split("'")[0]
    minute = minutefirstsplit.split("° ")[1]
    secondfirstsplit = dms_string.split('"')[0]
    second = secondfirstsplit.split("' ")[1]
    
    return [degree,minute,second]
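
As a worked example of the conversion these two helpers perform together: 48° 44' 51" is 48 + 44/60 + 51/3600 = 48.7475 decimal degrees. The sketch below does the same thing in one step with a regular expression and also applies the sign for the S and W hemispheres. `dms_string_to_dd` is a hypothetical name, and it assumes the input looks like the `Coords` strings returned by RadioTowerInfoGetter().

```python
import re

# degrees ° minutes ' seconds " hemisphere, with optional spaces between parts
DMS_PATTERN = re.compile(r"""(\d+)°\s*(\d+)'\s*(\d+(?:\.\d+)?)"\s*([NSEW])""")


def dms_string_to_dd(dms_string):
    """Convert a string such as 48° 44' 51" N into signed decimal degrees."""
    match = DMS_PATTERN.search(dms_string)
    if match is None:
        raise ValueError(f"Not a DMS string: {dms_string!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    dd = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres are negative in decimal degrees
    if hemisphere in ('S', 'W'):
        dd = -dd
    return round(dd, 6)


lat = dms_string_to_dd('48° 44\' 51" N')
lon = dms_string_to_dd('122° 28\' 45" W')
print(lat, lon)
```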

In [43]:
class Station:
    """This class is for operations involving accessing
    station info from the source CSV and the methods
    associated with recording/sampling radio streams.
    
    Attributes
    ------------
        sourceCSV (str): The source CSV which links station
           callsigns to their stream URL's, + other info
        callsign (str): universal 4-character station signifier
           (ex. 'KEXP')
    """
    
    def __init__(self, sourceCSV, callsign):
        self.sourceCSV = sourceCSV
        self.callsign = callsign
        
        with open(sourceCSV, 'r') as read_obj:
            csv_reader = reader(read_obj)
            header = next(csv_reader)
            if header != None:
                for row in csv_reader:
                    if row[7] == callsign:
                        st_country = row[0]
                        st_city = row[2]
                        st_state = row[3]
                        st_url = row[10]
                        st_citycords = row[11]
                        
        self.country = st_country
        self.city = st_city
        self.state = st_state
        self.url = st_url
        self.citycords = st_citycords
        self.txtpath = os.path.join(r'/Users/michaelfelzan/Desktop/GEO FM/WASH STATIONS',
                                    callsign,
                                    f'{callsign}.txt')
        self.foldpath = os.path.join(r'/Users/michaelfelzan/Desktop/GEO FM/WASH STATIONS',
                                     callsign)
    
    def Sampler(self, url, outpathname):
        """Function for sampling audio from streams.
    
        Parameters
        ------------
        url : str 
            stream URL
        outpathname : str
            name//dir path for outputted .mp3 file 
        """
        audio_input = ffmpeg.input(url)
        audio_output = ffmpeg.output(audio_input,
                                     outpathname,
                                     **{'b:a': '128k'},
                                     ss=35,
                                     t=10)
        #print(audio_output)
        try:
            audio_output.run()
        except:
            print("Error - Sampler could not work on URL stream. Skipping station.")
            return False
    
    def RouteToACRCloud(self, mp3):
        """Function that routes an mp3 file (path) to the
        ACRCloud API, in order to return the song name info.
        
        Parameters
        ------------
        mp3 : str
            Name of mp3 file (path)
            """
        # read the sampled mp3 into memory, closing the file afterwards
        with open(mp3, 'rb') as mp3_file:
            buf = mp3_file.read()
        # start second will be 0; will record sampled mp3 for 10 seconds
        songread_output = ACR_recognizer.recognize_by_filebuffer(buf, 0)
        return songread_output
    
    
    def RollRadio(self, url, callsign):
        """Function for sampling one station's stream URL using the above
        "Sampler" method, routing the audio to APIs (ACRCloud, Discogs),
        and logging song name/genre information in the station's .txt
        file (only if BOTH ACRCloud and Discogs yield valid returns)
    
        Parameters
        ------------
        url : str 
            stream URL
        callsign : str
            station callsign
        """
        timerightnow = datetime.now()
        formattedtime = timerightnow.strftime("%Y_%m_%d_T%H_%M_%S")
        
        input_stream = url
        input_mp3name = f'{callsign}'+f'{formattedtime}'+'.mp3'
        itersongpath = os.path.join(r'/Users/michaelfelzan/Desktop/GEO FM',
                                    input_mp3name)
        
        print(f'~~Now recording... {input_stream}')
        samplerreturn = self.Sampler(url, itersongpath)
        err = True
        if samplerreturn == False:
            pass
        else:
            acrcloud_output = self.RouteToACRCloud(itersongpath)
            songread_dict = json.loads(acrcloud_output)
    
            if songread_dict['status']['msg'] == 'No result':
                print(f'No ACR return from {input_mp3name}')
                pass
            elif songread_dict['status']['msg'] == 'May Be Mute':
                print(f'No ACR return from {input_mp3name}')
                pass
            elif songread_dict['status']['msg'] == 'Decode Audio Error':
                print(f'No ACR return from {input_mp3name}')
                pass
            elif songread_dict['status']['msg'] == 'requests limit exceeded, please upgrade your account':
                print('ACRCLOUD REQUESTS LIMITS EXCEEDED, STOP PROGRAM')
                pass
            else:
                try:
                    songtitle = songread_dict['metadata']['music'][0]['title']
                except:
                    print("Error in 'Title' ACR return metadata...skipping station")
                    err = False
                if err == False:
                    pass
                else:
                    try:
                        songartist = songread_dict['metadata']['music'][0]['artists'][0]['name']
                    except:
                        print("Error in 'Artist' ACR return metadata...skipping station")
                        err = False
                    if err == False:
                        pass
                    else:
                        try:
                            songalbum = songread_dict['metadata']['music'][0]['album']['name']
                        except:
                            print("Error in 'Album' ACR return metadata...skipping station")
                            err = False
                            pass
                if err == False:
                    pass
                else:
                    ACR_return = f'{input_mp3name} ~ song : {songtitle} by {songartist} from album {songalbum}'
    
                    percent20song = songtitle+'%20'+songartist+'%20'+songalbum
                    discogs_input = percent20song.replace(" ", "%20")
                    discogs_req = ('https://api.discogs.com/database/search?q=' + 
                               discogs_input + 
                               '&key=' + 
                               client_key + 
                               '&secret=' +
                               client_secret)
                    discogs_req_obj = requests.get(discogs_req)
                    discogs_songinfo_dict = json.loads(discogs_req_obj.content.decode('utf-8'))
        
                    if discogs_songinfo_dict['results'] == []:
                        print(f'No Discogs return from {input_mp3name}')
                        pass
                    else:
                        try:
                            style_list = discogs_songinfo_dict['results'][0]['style']
                        except:
                            style_list = 'no_data'
                        try:
                            genre_list = discogs_songinfo_dict['results'][0]['genre']
                        except:
                            genre_list = 'no_data'
                        
                        if style_list == 'no_data':
                            styleliststring = 'no_data'
                        else:
                            styleliststring = ','.join(style_list)
                        if genre_list == 'no_data':
                            genreliststring  = 'no_data'
                        else:
                            genreliststring = ','.join(genre_list)
            
                        Discogs_return = f'Discogs returned the genre(s): {styleliststring}'
                
                        parent_dir = r'/Users/michaelfelzan/Desktop/GEO FM/WASH STATIONS'
                
                        itertextpath = os.path.join(parent_dir,
                                                    callsign,
                                                    f'{callsign}.txt')
            
                        with open(itertextpath, "a+") as file_object:
                            file_object.seek(0)
                            data = file_object.read(100)
                            if len(data) > 0 :
                                file_object.write("\n")
                            file_object.write((input_mp3name+
                                               ','+
                                               'song='+songtitle+','+
                                               'artist='+songartist+','+
                                               'album='+songalbum+','+
                                               ####'genres='+genreliststring+','+
                                               'styles='+styleliststring))
                
                        print(f"Returned:{[ACR_return,Discogs_return]}")
                
            os.remove(itersongpath)
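
Two steps inside RollRadio() can be illustrated in isolation: pulling title, artist, and album out of the ACRCloud response, and building the Discogs search URL. The sketch below is a hedged alternative rather than the notebook's method. `extract_track` and `build_discogs_search_url` are hypothetical names, the sample response is made up (it only copies the key layout that RollRadio() indexes into), and `MY_KEY` / `MY_SECRET` are placeholders. It uses `urllib.parse.urlencode` so that characters such as `&` in a song title are escaped, which replacing spaces with `%20` alone does not do.

```python
from urllib.parse import urlencode, quote


def extract_track(songread_dict):
    """Return (title, artist, album), or None if any piece is missing."""
    try:
        first = songread_dict['metadata']['music'][0]
        return (first['title'],
                first['artists'][0]['name'],
                first['album']['name'])
    except (KeyError, IndexError, TypeError):
        return None


def build_discogs_search_url(title, artist, album, key, secret):
    """Build the search URL with every query value percent-encoded."""
    query = urlencode({'q': f'{title} {artist} {album}',
                       'key': key,
                       'secret': secret},
                      quote_via=quote)  # quote() encodes spaces as %20
    return 'https://api.discogs.com/database/search?' + query


# Made-up response that mirrors only the keys RollRadio() reads
sample_response = {
    'metadata': {'music': [{'title': 'Rock & Roll',
                            'artists': [{'name': 'Example Artist'}],
                            'album': {'name': 'Example Album'}}]},
}

track = extract_track(sample_response)
url = build_discogs_search_url(*track, key='MY_KEY', secret='MY_SECRET')
missing = extract_track({'status': {'msg': 'No result'}})
print(url)
```

Returning `None` for an incomplete response replaces the nested `err` flag checks with a single test at the call site.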


##### Example outputs of the RadioTowerInfoGetter function:

In [15]:
kmretower = RadioTowerInfoGetter('KMRE')
kmretower

Attempting to gather info from:  KMRE...
No radio tower info could be retrieved for station KMRE
Trying other request URL...waiting 20 seconds before retry...


{'ERP': '100 Watts',
 'Coords': '48° 44\' 51" N, 122° 28\' 45" W',
 'HAAT': '-13.49 meters (-43 feet)',
 'Height Above Ground Level': '38.1 meters (125 feet)',
 'Height Above Sea Level': '65.1 meters (214 feet)'}

In [16]:
kkxatower = RadioTowerInfoGetter('KKXA')
kkxatower

Attempting to gather info from:  KKXA...
No radio tower info could be retrieved for station KKXA
Trying other request URL...waiting 20 seconds before retry...
This station didn't have HAAT data
This station didn't have HAGL data
This station didn't have HASL data


{'ERP': '50,000 Watts',
 'Coords': '47° 52\' 31" N, 122° 04\' 44" W',
 'HAAT': 'no_data',
 'Height Above Ground Level': 'no_data',
 'Height Above Sea Level': 'no_data'}
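
The values in these dictionaries are display strings, so they need to be turned into numbers before they can be used in calculations. The sketch below shows one way to do that for the two example outputs above, using the same split-on-unit approach the notebook applies in section C. `erp_to_kilowatts` and `height_to_meters` are hypothetical helper names, and returning `None` for `'no_data'` is an assumption about how missing values might be handled.

```python
def erp_to_kilowatts(erp_string):
    """Convert a string like '50,000 Watts' to kilowatts (50.0)."""
    if erp_string == 'no_data':
        return None
    watts = erp_string.split(' Watts')[0].replace(',', '')
    return float(watts) / 1000


def height_to_meters(height_string):
    """Convert a string like '-13.49 meters (-43 feet)' to meters (-13.49)."""
    if height_string == 'no_data':
        return None
    return float(height_string.split(' meters')[0])


# Values copied from the KMRE and KKXA outputs
kmre_kw = erp_to_kilowatts('100 Watts')
kmre_haat = height_to_meters('-13.49 meters (-43 feet)')
kkxa_kw = erp_to_kilowatts('50,000 Watts')
kkxa_haat = height_to_meters('no_data')
print(kmre_kw, kmre_haat, kkxa_kw, kkxa_haat)
```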

# B.) Local Directory Setup

### 1.) Creating 'list_of_wash_callsigns'; appending all Washington callsigns to the list:

In [15]:
list_of_wash_callsigns = []

with open(source_CSV, 'r') as read_obj:
    csv_reader = reader(read_obj)
    header = next(csv_reader)
    if header != None:
        for row in csv_reader:
            iter_callsign = row[7]
            list_of_wash_callsigns.append(iter_callsign)
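
The cell above selects the callsign by position (`row[7]`), which breaks silently if a column is ever added or reordered. A minimal sketch of reading the same column by name with `csv.DictReader` follows. It assumes the file's header names the callsign column `CS`, as the `WA_df` preview suggests. The two sample rows are illustrative, and their `RG_URL` and `URL` values are placeholders rather than real addresses.

```python
import csv
import io

# In-memory stand-in for WashingtonStations_OnlyFM.csv (illustrative rows only)
sample_csv = io.StringIO(
    "Country,CityState,City,StateAbv,StationName,RG_ID,RG_LC,CS,Type,RG_URL,URL,CityCords\n"
    'United States,Spokane WA,Spokane,WA,92.9 ZZU - KZZU-FM,8VcFeepP,'
    '/listen/kzzu/8VcFeepP,KZZU,FM,placeholder,placeholder,"[-117.42605, 47.65878]"\n'
    'United States,Spokane WA,Spokane,WA,Coyote Country - KEZE,FUCv80QQ,'
    '/listen/coyotecountry969/FUCv80QQ,KEZE,FM,placeholder,placeholder,"[-117.42605, 47.65878]"\n'
)

# DictReader uses the header row as keys, so the column is looked up by name
callsigns = [row['CS'] for row in csv.DictReader(sample_csv)]
print(callsigns)
```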

### 2.) Making folder directories for all Washington callsigns:
##### !!!!!!! ONLY NEEDS TO BE RUN ONE TIME !!!!!!!

In [None]:
for washcallsign in list_of_wash_callsigns:
    iter_station = Station(WA_FM_CSV, washcallsign)
    # Directory 
    directory_name = iter_station.callsign
    
    # Parent Directory path
    parent_dir = r'/Users/michaelfelzan/Desktop/GEO FM/WASH STATIONS'
    
    # Path 
    foldpath = os.path.join(parent_dir, directory_name)

    # Create the directory 
    os.mkdir(foldpath)
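
This cell can only be run once because `os.mkdir` raises `FileExistsError` when a folder already exists. A sketch of a rerunnable variant using `os.makedirs(..., exist_ok=True)` is shown below. `make_station_folders` is a hypothetical name, and the example works in a temporary directory instead of the notebook's `WASH STATIONS` folder.

```python
import os
import tempfile


def make_station_folders(parent_dir, callsigns):
    """Create one folder per callsign; existing folders are left alone."""
    created = []
    for callsign in callsigns:
        foldpath = os.path.join(parent_dir, callsign)
        os.makedirs(foldpath, exist_ok=True)
        created.append(foldpath)
    return created


with tempfile.TemporaryDirectory() as tmp_parent:
    make_station_folders(tmp_parent, ['KEXP', 'KUOW'])
    # Running it a second time does not raise, unlike os.mkdir
    second_run = make_station_folders(tmp_parent, ['KEXP', 'KUOW'])
    folders_exist = all(os.path.isdir(p) for p in second_run)
```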

### 3.) Creating .txt files for all Washington callsigns in correct format:
##### !!!!!!! ONLY NEEDS TO BE RUN ONE TIME !!!!!!!

In [None]:
for washcallsign in list_of_wash_callsigns:
    itertextpath = os.path.join(parent_dir,
                               washcallsign,
                               f'{washcallsign}.txt')
    #print(itertextpath)
    
    with open(itertextpath, "w") as f:
        f.write(f'{washcallsign}')
    
    
for washcallsign in list_of_wash_callsigns:
    iter_station = Station(WA_FM_CSV,
                           washcallsign)
    itertextpath = os.path.join(parent_dir,
                               washcallsign,
                               f'{washcallsign}.txt')
    with open(itertextpath, "a+") as file_object:
        file_object.seek(0)
        # If file is not empty then append '\n'
        data = file_object.read(100)
        if len(data) > 0 :
            file_object.write("\n")
        # Append text at the end of file
        file_object.write(f"stream={iter_station.url}")
        file_object.write("\n")
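
The "start a new line only if the file already has content" pattern used here also appears in RadioTowerInfoGetter() and RollRadio(). As a sketch, it could live in one small helper. `append_line` is a hypothetical name, and the example writes to a temporary file rather than a station log.

```python
import os
import tempfile


def append_line(path, text):
    """Append `text`, starting a new line first if the file already has content."""
    has_content = os.path.exists(path) and os.path.getsize(path) > 0
    with open(path, "a", encoding="utf-8") as file_object:
        if has_content:
            file_object.write("\n")
        file_object.write(text)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "fails.txt")
    append_line(log_path, "KOWA")
    append_line(log_path, "KORE")
    with open(log_path, encoding="utf-8") as f:
        log_contents = f.read()
```

Checking the file size avoids opening in `"a+"` mode and seeking back to the start just to test for content.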

### 4.) Retrieving station info from Radio-Locator and creating "station info" .txt for each station (that yields a valid request from Radio-Locator)
##### !!!!!!! ONLY NEEDS TO BE RUN ONE TIME !!!!!!!

In [None]:
parent_dir = r'/Users/michaelfelzan/Desktop/GEO FM/WASH STATIONS'

for cs in fails_list:
    
    iterfoldpathy = os.path.join(parent_dir,
                                cs)
    # StationInfoTxtWriter(stationfolderpath, call_sign):
    StationInfoTxtWriter(iterfoldpathy,
                         cs)
    time.sleep(20)

### 5.) Creating a list of callsigns for every station in our sample set:

In [16]:
parent_dir = r'/Users/michaelfelzan/Desktop/GEO FM/WASH STATIONS'
os.chdir(parent_dir)
wash_callsign_folds = []
for thing in os.listdir():
    if thing == '.DS_Store':
        pass
    else:
        wash_callsign_folds.append(thing)

In [17]:
wash_callsign_folds

['KPBX',
 'KHUH',
 'KACS',
 'KYRS',
 'KXDD',
 'KZAX',
 'KMRE',
 'KFAE',
 'KUKN',
 'KOSW',
 'KTQA',
 'KBCS',
 'KZTM',
 'KEZE',
 'KPLW',
 'KROH',
 'KGHI',
 'KUPS',
 'KIEV',
 'KSQM',
 'KAOS',
 'KMIH',
 'KPBZ',
 'KTAH',
 'KEFA',
 'KSER',
 'KNKX',
 'KUOW',
 'KIRO',
 'KGRG',
 'KEWU',
 'KDDS',
 'KWCW',
 'KNHC',
 'KEXP',
 'KZZU',
 'KZQM',
 'KUGS',
 'KPTZ',
 'KBFG',
 'KXLY',
 'KODX',
 'KGHP',
 'KING',
 'KYYO',
 'KEGX',
 'KCMS']

In [18]:
len(wash_callsign_folds)

47
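
The listing above changes the working directory and skips `.DS_Store` by name. A hedged alternative is sketched below: it keeps only entries that are directories, so any stray file is ignored, and it does not need `os.chdir`. `list_station_folders` is a hypothetical name, and the folders are created in a temporary directory for illustration.

```python
import os
import tempfile


def list_station_folders(parent_dir):
    """Return the names of all sub-folders of parent_dir, sorted alphabetically."""
    with os.scandir(parent_dir) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


with tempfile.TemporaryDirectory() as tmp_parent:
    os.mkdir(os.path.join(tmp_parent, 'KUOW'))
    os.mkdir(os.path.join(tmp_parent, 'KEXP'))
    # A stray file such as .DS_Store is skipped because it is not a directory
    open(os.path.join(tmp_parent, '.DS_Store'), 'w').close()
    station_folders = list_station_folders(tmp_parent)

print(station_folders)
```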

In [19]:
os.chdir(r'/Users/michaelfelzan/Desktop/GEO FM')
os.getcwd()

'/Users/michaelfelzan/Desktop/GEO FM'

# C.) Gathering information about each radio station's powered FM tower

#### Example utilization of "fezCC.js" code (this is the JavaScript pulled from the FCC FM Propagation Curve Calculator website):

In [20]:
stream = os.popen('node fezCC.js 4.7 211 "250" "-" 60 "0"')
output = (stream.read()).split('\n')[0]
output

'37.699469101964276'
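
The same call can be made with `subprocess.run`, which passes arguments as a list (no shell quoting needed) and raises an error if `node` exits with a failure. This is a sketch, not the notebook's method. `build_fezcc_command` and `run_fezcc` are hypothetical names. The first two arguments are taken to be ERP in kilowatts and HAAT in meters, based on how the notebook fills them in later, and the remaining four are copied unchanged from the example call because their meaning is not documented here.

```python
import subprocess


def build_fezcc_command(erp_kilowatts, haat_meters):
    """Build the argument list for the fezCC.js script."""
    # The last four arguments are copied as-is from the notebook's example call
    return ['node', 'fezCC.js', str(erp_kilowatts), str(haat_meters),
            '250', '-', '60', '0']


def run_fezcc(erp_kilowatts, haat_meters):
    """Run fezCC.js and return the first line of its output as a float.

    Requires node and fezCC.js to be available in the working directory.
    """
    result = subprocess.run(build_fezcc_command(erp_kilowatts, haat_meters),
                            capture_output=True, text=True, check=True)
    return float(result.stdout.split('\n')[0])


command = build_fezcc_command(4.7, 211)
print(command)
```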

### 1.) Creating all individual station info CSVs (involves parsing/cleaning up returned text from initial Radio-Locator request) :
#### !!!!! ONLY NEEDS TO BE RUN ONE TIME !!!!!

In [None]:
for cs in wash_callsign_folds:
    pathtostationfold = os.path.join(parent_dir,
                                    cs)
    #itercallsign = station.split("_")[1]
    #stateabv = station.split("_")[0]
    pathtotowertxt = os.path.join(pathtostationfold,
                            f'{cs}_towerinfo.txt')
    pathtotxt = os.path.join(pathtostationfold,
                             f'{cs}.txt')
    newinfocsv = os.path.join(pathtostationfold,
                             f'{cs}_INFO.csv')
    stationfolditems = os.listdir(pathtostationfold)
    fullpathitems = []
    for item in stationfolditems:
        fullpathitems.append(os.path.join(pathtostationfold,
                                         item))
    if pathtotowertxt in fullpathitems:
        with open(pathtotowertxt) as foo:
            txtz = foo.readlines()
            for item in txtz:
                
                if 'ERP' in item:
                    wattsfirstsplit = item.split(":'")[1]
                    watts = wattsfirstsplit.split(" Watts")[0]
                    kilowatts = (float(watts.replace(',','')))/1000
                    
                elif 'Coords' in item:
                    coordsfirstsplit = item.split(":'")[1]
                    lat_dms = coordsfirstsplit.split(",")[0]
                    long_dms = (coordsfirstsplit.split(", ")[1])[:-2]
                    
                    lat_dms_list = DMS_Isolator(lat_dms)
                    long_dms_list = DMS_Isolator(long_dms)
                    
                    lat_dd = dms_to_dd(float(lat_dms_list[0]),
                                       float(lat_dms_list[1]),
                                       float(lat_dms_list[2]))
                    long_dd = dms_to_dd(float(long_dms_list[0]),
                                        float(long_dms_list[1]),
                                        float(long_dms_list[2]))
                    negative_longdd = long_dd*-1
                    
                elif 'HAAT' in item:
                    HAATfirstsplit = item.split(":'")[1]
                    HAAT_clean = HAATfirstsplit.split(" meters")[0]
                    HAAT_float = float(HAAT_clean)

                    # NOTE: this branch reuses `kilowatts` from the 'ERP' branch, so it
                    # assumes the ERP line comes before the HAAT line in the tower info file
                    stream_param = f'node fezCC.js {kilowatts} {HAAT_float} "250" "-" 60 "0"'
                    stream = os.popen(stream_param)
                    BUFF60DIST = (stream.read()).split('\n')[0]
                
        with open(newinfocsv,'w',newline='') as csvfile:
            fieldnames = ['CALLSIGN',
                          'LAT',
                          'LONG',
                          'ERP',
                          'HAAT',
                          '60dBu_DIST']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerow({
                'CALLSIGN': cs,
                'LAT' : lat_dd,
                'LONG' : negative_longdd,
                'ERP' : kilowatts,
                'HAAT' : HAAT_clean,
                '60dBu_DIST' : BUFF60DIST
            })
                
    else:
        print(f'tower info .txt does not exist for {cs}')

print("\n     *~*~~~~~**~~~~~~~~~~~~~~**~*~*\n"+
      "Successfully wrote all other station tower info CSVs."+
      "\nCSVs outputted to respective station folders."
      "\n   *~*~~~~~**~~~~~~~~~~~~~~**~*~*")

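The coordinate step in the cell above relies on `DMS_Isolator` and `dms_to_dd`, which are defined in section A. The conversion presumably follows the standard degrees-minutes-seconds formula, decimal degrees = degrees + minutes/60 + seconds/3600. The sketch below illustrates that formula only: `dms_to_decimal` is a hypothetical name, not the notebook's own function, and the input values are chosen for illustration. The sign flip mirrors the `long_dd*-1` line, which works here because Washington lies in the western hemisphere.

```python
def dms_to_decimal(degrees, minutes, seconds, west_or_south=False):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    dd = degrees + minutes / 60 + seconds / 3600
    return -dd if west_or_south else dd

lat = dms_to_decimal(47, 36, 57)
lon = dms_to_decimal(122, 18, 32, west_or_south=True)
print(round(lat, 6), round(lon, 6))  # 47.615833 -122.308889
```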
### 2.) Creating folder that station info tables will be copied into, so that they can be merged into one file:
#### !!!! ONE TIME RUN !!!!!

In [None]:
stationinfopath = os.path.join(r'/Users/michaelfelzan/Desktop/GEO FM',
                               "WashStationInfoTables")
os.mkdir(stationinfopath)

### 3.) Copying files to that folder:
#### !!!! ONE TIME RUN !!!!!

In [None]:
# COPYING FILES TO StationInfoTables FOLDER

for cs in wash_callsign_folds:
    pathtostationfold = os.path.join(parent_dir,
                                    cs,)
    newinfocsv = os.path.join(pathtostationfold,
                             f'{cs}_INFO.csv')
    stationfolditems = os.listdir(pathtostationfold)
    fullpathitems = []
    for item in stationfolditems:
        fullpathitems.append(os.path.join(pathtostationfold,
                                         item))
    if newinfocsv in fullpathitems:
        copyfile(newinfocsv,
                 os.path.join(r'/Users/michaelfelzan/Desktop/GEO FM',
                              "WashStationInfoTables",
                              f'{cs}_INFO.csv'))

### 4.) Concatenating all station info CSV's into one CSV
#### !!!!!ONE TIME RUN!!!!!

In [None]:
pathtofold = r'/Users/michaelfelzan/Desktop/GEO FM/WashStationInfoTables'
INFOconcatCSV = os.path.join(pathtofold,
                             "WASHSTATIONINFOconcat.csv")
# Exclude the output file itself, so that re-running this cell does not
# fold a previous merge back into the new one
allFiles = [f for f in glob.glob(pathtofold + "/*.csv")
            if os.path.basename(f) != os.path.basename(INFOconcatCSV)]
allFiles.sort()
with open(INFOconcatCSV, 'wb') as outfile:
    for i, fname in enumerate(allFiles):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but first file
            # Block copy rest of file from input to output without parsing
            shutil.copyfileobj(infile, outfile)

print("     *~*~~~~~**~~~~~~~~~~~~~~**~*~*\n"
      "All station tower info CSVs (that exist) have"+
      "\n      been successfully merged together"
      "\n  *~*~~~~~**~~~~~~~~~~~~~~**~*~*")

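The byte-level copy used for the merge assumes that every input file ends with a newline and that all files share the same column order. A parsing-based variant trades some speed for those checks. The following is a minimal sketch using only the standard library; `merge_csvs` is a hypothetical helper, not part of the notebook.

```python
import csv

def merge_csvs(input_paths, output_path):
    """Merge CSVs that share a header into one file; return rows written."""
    rows_written = 0
    fieldnames = None
    with open(output_path, "w", newline="") as out:
        writer = None
        for path in sorted(input_paths):
            with open(path, newline="") as src:
                reader = csv.DictReader(src)
                if reader.fieldnames is None:
                    continue  # skip empty files
                if writer is None:
                    # The first non-empty file defines the header
                    fieldnames = reader.fieldnames
                    writer = csv.DictWriter(out, fieldnames=fieldnames)
                    writer.writeheader()
                elif reader.fieldnames != fieldnames:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Raising on a header mismatch makes a malformed station file visible at merge time rather than later in ArcGIS.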
### 5.) Viewing what "Station Info" CSV looks like:

In [21]:
wash_stn_info_concat = os.path.join(r'/Users/michaelfelzan/Desktop/GEO FM',
                                    "WashStationInfoTables",
                                    "WASHSTATIONINFOconcat.csv")

wash_stn_info_df = pd.read_csv(wash_stn_info_concat)
wash_stn_info_df

Unnamed: 0,CALLSIGN,LAT,LONG,ERP,HAAT,60dBu_DIST
0,KACS,46.730556,-123.025833,6.0,57.0,21.947963
1,KAOS,47.015833,-122.917222,1.25,74.0,16.68653
2,KBCS,47.543889,-122.109167,1.8,389.0,40.247137
3,KBFG,47.6675,-122.355,0.009,95.94,5.546802
4,KCMS,47.544167,-122.108333,54.0,385.0,72.279685
5,KDDS,47.3125,-123.372222,64.0,742.0,92.46147
6,KEFA,47.382778,-120.293333,0.1,-71.0,5.636485
7,KEGX,46.099167,-119.128889,100.0,424.4,81.532897
8,KEWU,47.578611,-117.298333,10.0,429.0,57.879066
9,KEXP,47.615833,-122.308889,4.7,211.0,37.699469


# D.) MULTI-PROCESSING:
##### Note: although some headway was made on parallelization, I have not yet found a way to use Python's starmap() function in tandem with a back-end command-line program like FFmpeg.
#### Feel free to skip all of this content until the '(( END MULTI-PROCESSING ))' heading

In [22]:
wash_url_cs_tuples = []

for cs in wash_callsign_folds:
    globals()[f'{cs}'] = Station(WA_FM_CSV, cs)
    wash_url_cs_tuples.append( ( globals()[f'{cs}'].url,
                                globals()[f'{cs}'].callsign) )

#wash_url_cs_tuples

In [None]:
def multi_run_wrapper(args):
    # pool.map passes one (url, callsign) tuple per task; unpack it here
    return RadioRoller(*args)

def RadioRoller(url, callsign):
    # Look up the Station object created in the previous cell
    station = globals()[f'{callsign}']
    return station.RollRadio(station.url, station.callsign)

if __name__ == "__main__":
    #multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(processes=3, maxtasksperchild=1)
    timecounter = 0
    while timecounter < 899:
        startcycletime = datetime.now()  # reset at the start of every cycle
        pool.map(multi_run_wrapper,
                 wash_url_cs_tuples,
                 chunksize=1)
        rundur_sec = int((datetime.now() - startcycletime).total_seconds())
        if rundur_sec < 300:
            print(f'finished sampling batch of stations...waiting {300-rundur_sec} seconds...')
            time.sleep(300 - rundur_sec)
        # Count the cycle even when it overruns 300 seconds, so the loop always terminates
        timecounter += 300
    pool.close()
    pool.join()

~~Now recording... https://18733.live.streamtheworld.com/KPBX_FM.mp3
~~Now recording... http://centova.rockhost.com:8001/stream
~~Now recording... https://ic2.sslstream.com/kacs-fm
~~Now recording... https://usa17.fastcast4u.com/proxy/kievradio?mp=/1
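A possible direction, not yet tested with FFmpeg or with the `Station` class: because each sampling task spends most of its time waiting on a network stream and an external process, a thread pool may be able to stand in for a process pool. Threads share memory, so nothing has to be pickled and no `globals()` lookup is needed. In the sketch below, `sample_station` is a hypothetical placeholder for the call to `RollRadio`, and the two example tuples are taken from the station list.

```python
from concurrent.futures import ThreadPoolExecutor

def sample_station(url, callsign):
    # Hypothetical placeholder: the real task would call
    # Station(WA_FM_CSV, callsign).RollRadio(url, callsign)
    return f"{callsign} sampled from {url}"

def sample_all(url_cs_tuples, max_workers=3):
    """Run sample_station over (url, callsign) tuples; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda tup: sample_station(*tup), url_cs_tuples))

example_tuples = [
    ("https://18733.live.streamtheworld.com/KPBX_FM.mp3", "KPBX"),
    ("https://ic2.sslstream.com/kacs-fm", "KACS"),
]
results = sample_all(example_tuples)
```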


In [46]:
help(multiprocessing)

Help on package multiprocessing:

NAME
    multiprocessing

MODULE REFERENCE
    https://docs.python.org/3.7/library/multiprocessing
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    # Package analogous to 'threading.py' but using processes
    #
    # multiprocessing/__init__.py
    #
    # This package is intended to duplicate the functionality (and much of
    # the API) of threading.py but uses processes instead of threads.  A
    # subpackage 'multiprocessing.dummy' has the same API but is a simple
    # wrapper for 'threading'.
    #
    # Copyright (c) 2006-2008, R Oudkerk
    # Licensed to PSF under a Contributor Agreement.
    #

PACKAGE CONTENTS
    connection
    context
    dummy (packag

###  (( END MULTI-PROCESSING ))

# E.) Turning track-play information into usable data that can be imported into ArcGIS

### 1.) Creating a list of station tuples (callsign + stream URL) in preparation of radio sampling:

In [23]:
for tup in wash_url_cs_tuples:
    print(tup[1], tup[0])

KPBX https://18733.live.streamtheworld.com/KPBX_FM.mp3
KHUH http://centova.rockhost.com:8001/stream
KACS https://ic2.sslstream.com/kacs-fm
KYRS https://www.ophanim.net:8444/s/7170
KXDD https://ice7.securenetsystems.net/KXDD
KZAX http://199.180.75.2:9395/stream/;
KMRE http://199.180.75.2:9391/;
KFAE http://134.121.234.129:8000/NWPRCLASSICAL
KUKN http://www.streamcontrol.net:11060/;
KOSW http://ophanim.net:7040/;
KTQA https://stream.ktqa.org/ktqa.mp3
KBCS http://www.ophanim.net:7720/stream
KZTM http://bustosradio.com:8039/kztm
KEZE https://17963.live.streamtheworld.com/KEZEFMAAC.aac
KPLW https://ice10.securenetsystems.net/PLR
KROH https://ice10.securenetsystems.net/KROH
KGHI http://173.193.205.96:7341/stream
KUPS https://streamingv2.shoutcast.com/kupsfm
KIEV https://usa17.fastcast4u.com/proxy/kievradio?mp=/1
KSQM https://video1.getstreamhosting.com:8182/stream
KAOS http://205.134.192.90:8930/;
KMIH https://www.streamvortex.com:8444/s/11390
KPBZ https://24883.live.streamtheworld.com/KPBZ_

### 2.) Iterating over every station URL, sampling audio, routing the audio to both the ACRCloud and Discogs APIs, and logging the associated information in each station's respective text file:
#### Note: because this operation could not be parallelized (see section D), code blocks like the one below were run multiple times in order to build a working dataset:

In [24]:
for tup in wash_url_cs_tuples:
    it_staysh = Station(WA_FM_CSV, tup[1])
    it_staysh.RollRadio(it_staysh.url, it_staysh.callsign)

~~Now recording... https://18733.live.streamtheworld.com/KPBX_FM.mp3
No Discogs return from KPBX2021_12_16_T13_22_07.mp3
~~Now recording... http://centova.rockhost.com:8001/stream
Returned:["KHUH2021_12_16_T13_22_44.mp3 ~ song : Just Got Paid by Whitey Morgan And The 78's from album Hard Times and White Lines", 'Discogs returned the genre(s): Country,Honky Tonk']
~~Now recording... https://ic2.sslstream.com/kacs-fm
Returned:['KACS2021_12_16_T13_23_27.mp3 ~ song : You Meet The Nicest People by Andy Williams from album Christmas Treasures', 'Discogs returned the genre(s): Vocal']
~~Now recording... https://www.ophanim.net:8444/s/7170
Error - Sampler could not work on URL stream. Skipping station.
~~Now recording... https://ice7.securenetsystems.net/KXDD
No ACR return from KXDD2021_12_16_T13_23_58.mp3
~~Now recording... http://199.180.75.2:9395/stream/;
No ACR return from KZAX2021_12_16_T13_24_29.mp3
~~Now recording... http://199.180.75.2:9391/;
No ACR return from KMRE2021_12_16_T13_24_57

### 3.) Demonstrating what the 'raw text' inside a given station's "song play log" looks like (after the above code block has been run enough times):

In [24]:
KEXP = Station(WA_FM_CSV, 'KEXP')

with open(KEXP.txtpath, "r") as file_object:
    Lines = file_object.readlines()
    cleanLines = []
    txtheader = [0,1,2]
    for i in range(len(Lines)):
        if i not in txtheader:
            cleanLines.append(Lines[i])
    
for line in cleanLines:
    print(line)

KEXP2021_10_25_T19_59_19.mp3,song=Friends,artist=Big Gigantic,album=Free Your Mind,styles=Hip Hop

KEXP2021_10_26_T11_12_56.mp3,song=holy calamafuck,artist=Run The Jewels,album=holy calamafuck,styles=

KEXP2021_10_26_T12_19_42.mp3,song=The Joke,artist=Brandi Carlile,album=The Joke,styles=Pop Rock,Rock & Roll,Country Rock,Country

KEXP2021_10_26_T12_50_29.mp3,song=The Beachland Ballroom,artist=IDLES,album=The Beachland Ballroom,styles=

KEXP2021_10_27_T10_19_23.mp3,song=Velouria,artist=Pixies,album=Bossanova,styles=Indie Rock

KEXP2021_10_27_T10_37_45.mp3,song=Clean Air,artist=Hand Habits,album=Fun House,styles=

KEXP2021_10_27_T11_31_40.mp3,song=Sunrise,artist=Explosions In The Sky,album=Big Bend (An Original Soundtrack for Public Television),styles=Post Rock,Soundtrack

KEXP2021_10_27_T12_05_38.mp3,song=Marching Bands of Manhattan,artist=Death Cab for Cutie,album=Marching Bands of Manhattan,styles=Indie Rock

KEXP2021_10_27_T12_28_13.mp3,song=Nighttime Drive,artist=Jay Som,album=Anak 

### 4.) Demonstrating the operations involved in parsing this text, in order to put all of the info neatly into a spreadsheet:

In [62]:
genresplit1 = cleanLines[0].split("styles=")[1]
genrecomma = genresplit1.split('\n')[0]
genrelist = genrecomma.split(',')
genrelist

['Pop Rock']

In [63]:
stuffbeforegenre = cleanLines[0].split(",styles")[0]
albumparse = stuffbeforegenre.split('album=')[1]
albumparse

'Let It Be (Remastered)'

In [64]:
stuffbeforealbum = stuffbeforegenre.split(",album=")[0]
artistparse = stuffbeforealbum.split(",artist=")[1]
artistparse

'The Beatles'

In [65]:
stuffbeforeartist = stuffbeforealbum.split(",artist=")[0]
songparse = stuffbeforeartist.split(",song=")[1]
songparse

'Let It Be (Remaster)'

In [66]:
mp3filename = stuffbeforeartist.split(",song=")[0]
mp3filename

'KPBX2021_11_03_T15_32_34.mp3'

In [67]:
mp3_components = mp3filename.split('_')
mp3_components[-1] = mp3_components[-1].split('.mp3')[0]
mp3_components

['KPBX2021', '11', '03', 'T15', '32', '34']

In [69]:
stateabrev

'KPBX2021'

In [71]:
radiocallsgn

'KPBX2021'

In [78]:
#radiocallsgn = mp3_components[0]

stateabrev = 'WA'
captureyear = mp3_components[0].split('KPBX')[1] # ENTER IN ITERSTAYSH.CALLSIGN
capturemonth = mp3_components[1]
captureday = mp3_components[2]
capturehour = mp3_components[3].split('T')[1]
captureminute = mp3_components[4]
capturesecond = mp3_components[5]
                                     
print('STATE :',stateabrev)   # THIS IS GONNA NEED TO BE CODED PROPERLY EVENTUALLY
#print('CALLSIGN :',radiocallsgn)  # JUST ACCESS THRU STATION.CALLSIGN
print('SONG NAME :',songparse)
print('ARTIST NAME :',artistparse)
print('ALBUM NAME :',albumparse)
print('STYLES :',genrecomma)
print('YEAR OF CAPTURE :',captureyear)
print('MONTH OF CAPTURE:',capturemonth)
print('DAY OF CAPTURE :',captureday)
print('HOUR OF CAPTURE :',capturehour)
print('MINUTE OF CAPTURE :',captureminute)
print('SECOND OF CAPTURE :',capturesecond)

STATE : WA
SONG NAME : Let It Be (Remaster)
ARTIST NAME : The Beatles
ALBUM NAME : Let It Be (Remastered)
STYLES : Pop Rock
YEAR OF CAPTURE : 2021
MONTH OF CAPTURE: 11
DAY OF CAPTURE : 03
HOUR OF CAPTURE : 15
MINUTE OF CAPTURE : 32
SECOND OF CAPTURE : 34
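The chain of `split()` calls above can also be collapsed into a single pattern. The following is a sketch under the assumption that every log line follows the `<callsign><YYYY>_<MM>_<DD>_T<HH>_<MM>_<SS>.mp3,song=...,artist=...,album=...,styles=...` layout shown in step 3; `LOG_LINE` and `parse_log_line` are hypothetical names, not part of the notebook.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<mp3>(?P<callsign>[A-Z]+)(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})"
    r"_T(?P<hour>\d{2})_(?P<minute>\d{2})_(?P<second>\d{2})\.mp3)"
    r",song=(?P<song>.*?),artist=(?P<artist>.*?),album=(?P<album>.*?)"
    r",styles=(?P<styles>.*)$"
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # An empty styles field becomes an empty list rather than ['']
    fields["styles"] = [s for s in fields["styles"].split(",") if s]
    return fields

sample = ("KEXP2021_10_26_T12_19_42.mp3,song=The Joke,artist=Brandi Carlile,"
          "album=The Joke,styles=Pop Rock,Rock & Roll,Country Rock,Country\n")
parsed = parse_log_line(sample)
```

Returning `None` for a non-matching line lets the caller skip malformed entries, whereas an indexed `split()` would raise an `IndexError` on them. The callsign is also read from the file name itself, so it does not have to be supplied separately.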


### 5.) Iterating over every station's "song play log" .txt file, parsing/extracting relevant info (saving to temporary variables), and creating clean song play .CSVs for each station:

In [79]:
TrackPlayLogDict = {}

for station in wash_callsign_folds:
    
    iterstaysh = Station(WA_FM_CSV, station)
    timerightnow = datetime.now()
    YMD_rightnow = timerightnow.strftime("%Y_%m_%d")
    
    TrackPlayLogDict[station] = {}
    tracklogCSVpath = os.path.join(iterstaysh.foldpath,
                                   f'{station}_TrackLog_{YMD_rightnow}.csv')
    
    with open(iterstaysh.txtpath, "r") as file_object:
        Lines = file_object.readlines()
        cleanLines = []
        txtheader = [0,1,2]
        for i in range(len(Lines)):
            if i not in txtheader:
                cleanLines.append(Lines[i])
        
    with open(tracklogCSVpath,'w',newline='') as csvfile:
        fieldnames = ['stateabrev',
                      'radiocallsgn',
                      'trackname',
                      'trackartist',
                      'trackalbum',
                      'trackstyles',
                      'trackmp3filename',
                      'captureyear',
                      'capturemonth',
                      'captureday',
                      'capturehour',
                      'captureminute',
                      'capturesecond']
        
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        
        for trackplaystring in cleanLines:
            
            genresplit1 = trackplaystring.split("styles=")[1]
            genrecomma = genresplit1.split('\n')[0]
            genrelist = genrecomma.split(',')
            genrecommastr = ",".join(genrelist)
            
            stuffbeforegenre = trackplaystring.split(",styles")[0]
            albumparse = stuffbeforegenre.split('album=')[1]
            
            stuffbeforealbum = stuffbeforegenre.split(",album=")[0]
            artistparse = stuffbeforealbum.split(",artist=")[1]

            stuffbeforeartist = stuffbeforealbum.split(",artist=")[0]
            songparse = stuffbeforeartist.split(",song=")[1]
            
            mp3filename = stuffbeforeartist.split(",song=")[0]
            
            mp3_components = mp3filename.split('_')
            mp3_components[-1] = mp3_components[-1].split('.mp3')[0]
            
            stateabrev = 'WA' # THIS IS GONNA NEED TO BE CODED PROPERLY EVENTUALLY
            radiocallsgn = station
            captureyear = mp3_components[0].split(station)[1]
            capturemonth = mp3_components[1]
            captureday = mp3_components[2]
            capturehour = mp3_components[3].split('T')[1]
            captureminute = mp3_components[4]
            capturesecond = mp3_components[5]
            
            writer.writerow({
                'stateabrev' : stateabrev,
                'radiocallsgn' : radiocallsgn,
                'trackname' : songparse,
                'trackartist' : artistparse,
                'trackalbum' : albumparse,
                'trackstyles' : genrecommastr,
                'trackmp3filename' : mp3filename,
                'captureyear' : captureyear,
                'capturemonth' : capturemonth,
                'captureday' : captureday,
                'capturehour' : capturehour,
                'captureminute' : captureminute,
                'capturesecond' : capturesecond
            })
print("*~*~~~~~**~~~~~~~~~~~~~~**~*~*\n"
      "Successfully printed track play log CSVs for all stations."+
      "\nAll CSVs outputted to respective station folders."+
      f"\nThis batch of CSVs have the extension: {YMD_rightnow}.csv"+
      "\n*~*~~~~~**~~~~~~~~~~~~~~**~*~*")

*~*~~~~~**~~~~~~~~~~~~~~**~*~*
Successfully printed track play log CSVs for all stations.
All CSVs outputted to respective station folders.
This batch of CSVs have the extension: 2021_11_03.csv
*~*~~~~~**~~~~~~~~~~~~~~**~*~*


### 6.) Demonstrating what one of these song play .CSV's looks like:

In [46]:
KEXP_tracklog = r'/Users/michaelfelzan/Desktop/GEO FM/WASH STATIONS/KEXP/KEXP_TrackLog_2021_11_03.csv'
KEXP_track_df = pd.read_csv(KEXP_tracklog)
KEXP_track_df

Unnamed: 0,stateabrev,radiocallsgn,trackname,trackartist,trackalbum,trackstyles,trackmp3filename,captureyear,capturemonth,captureday,capturehour,captureminute,capturesecond
0,WA,KEXP,Friends,Big Gigantic,Free Your Mind,Hip Hop,KEXP2021_10_25_T19_59_19.mp3,2021,10,25,19,59,19
1,WA,KEXP,holy calamafuck,Run The Jewels,holy calamafuck,,KEXP2021_10_26_T11_12_56.mp3,2021,10,26,11,12,56
2,WA,KEXP,The Joke,Brandi Carlile,The Joke,"Pop Rock,Rock & Roll,Country Rock,Country",KEXP2021_10_26_T12_19_42.mp3,2021,10,26,12,19,42
3,WA,KEXP,The Beachland Ballroom,IDLES,The Beachland Ballroom,,KEXP2021_10_26_T12_50_29.mp3,2021,10,26,12,50,29
4,WA,KEXP,Velouria,Pixies,Bossanova,Indie Rock,KEXP2021_10_27_T10_19_23.mp3,2021,10,27,10,19,23
5,WA,KEXP,Clean Air,Hand Habits,Fun House,,KEXP2021_10_27_T10_37_45.mp3,2021,10,27,10,37,45
6,WA,KEXP,Sunrise,Explosions In The Sky,Big Bend (An Original Soundtrack for Public Te...,"Post Rock,Soundtrack",KEXP2021_10_27_T11_31_40.mp3,2021,10,27,11,31,40
7,WA,KEXP,Marching Bands of Manhattan,Death Cab for Cutie,Marching Bands of Manhattan,Indie Rock,KEXP2021_10_27_T12_05_38.mp3,2021,10,27,12,5,38
8,WA,KEXP,Nighttime Drive,Jay Som,Anak Ko,"Indie Rock,Lo-Fi,Indie Pop",KEXP2021_10_27_T12_28_13.mp3,2021,10,27,12,28,13
9,WA,KEXP,Come Around,Breeze,Only Up,"Post-Punk,Brit Pop,Alternative Rock,Dream Pop",KEXP2021_10_27_T12_31_57.mp3,2021,10,27,12,31,57


### 7.) Creating a folder to store every station's song play log .CSV file in one place:
##### !!! Only needs to be run one time !!!

In [82]:
song_csv_fold = os.path.join(workindir, 'WASH SONG LOGS')
os.mkdir(song_csv_fold)

### 8.) Copying all individual song play .CSVs into new folder:
##### (one-time run)

In [86]:
for station in wash_callsign_folds:
    pathtostationfold = os.path.join(workindir,
                                    "WASH STATIONS",
                                     station)
    newsongcsv = os.path.join(pathtostationfold,
                             f'{station}_TrackLog_2021_11_03.csv')
    stationfolditems = os.listdir(pathtostationfold)
    fullpathitems = []
    for item in stationfolditems:
        fullpathitems.append(os.path.join(pathtostationfold,
                                         item))
    if newsongcsv in fullpathitems:
        copyfile(newsongcsv,
                 os.path.join(workindir,
                              "WASH SONG LOGS",
                              f'{station}_TrackLog_2021_11_03.csv'))

In [85]:
newsongcsv

'/Users/michaelfelzan/Desktop/GEO FM/WASH STATIONS/KCMS/KCMS_SONGLOG.csv'

### 9.) Merging all individual station song play .CSVs into one concatenated 'master' .CSV:

In [88]:
pathtofold = os.path.join(workindir,
                          'WASH SONG LOGS')
SONGconcatCSV = os.path.join(pathtofold,
                            "CONCAT_TRACKLOG_2021_11_03.csv")
# Keep only the per-station track logs (named '<callsign>_TrackLog_<date>.csv'), so that
# merged or derived CSVs stored in the same folder are not folded back in on a re-run
allFiles = [f for f in glob.glob(pathtofold + "/*.csv")
            if '_TrackLog_' in os.path.basename(f)]
allFiles.sort()
with open(SONGconcatCSV, 'wb') as outfile:
    for i, fname in enumerate(allFiles):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but first file
            # Block copy rest of file from input to output without parsing
            shutil.copyfileobj(infile, outfile)

print("     *~*~~~~~**~~~~~~~~~~~~~~**~*~*\n"
      "All SONG PLAY LOG CSVs (that exist) have"+
      "\n      been successfully merged together"
      "\n  *~*~~~~~**~~~~~~~~~~~~~~**~*~*")

     *~*~~~~~**~~~~~~~~~~~~~~**~*~*
All SONG PLAY LOG CSVs (that exist) have
      been successfully merged together
  *~*~~~~~**~~~~~~~~~~~~~~**~*~*


In [47]:
concat_tracklog = r'/Users/michaelfelzan/Desktop/GEO FM/WASH SONG LOGS/CONCAT_TRACKLOG_2021_11_03.csv'

# F.) Creating statistics based on genres played by each station

### 1.) Creating a list of unique styles/genres that appear on the concatenated song play log .CSV:

In [50]:
unique_styles = []

with open(concat_tracklog, 'r') as read_obj:
    csv_reader = reader(read_obj)
    header = next(csv_reader)
    if header != None:
        for row in csv_reader:
            initvalue = row[5]
            splitinit = initvalue.split(',')
            for item in splitinit:
                if item not in unique_styles:
                    unique_styles.append(item)

In [51]:
unique_styles

['Gospel',
 'Country',
 'Country Rock',
 'Religious',
 'Easy Listening',
 '',
 'Blues Rock',
 'Stoner Rock',
 'Psychedelic Rock',
 'Indie Rock',
 'No Wave',
 'Surf',
 'Norteño',
 'Son',
 'Ranchera',
 'Mariachi',
 'Vocal',
 'Corrido',
 'Cumbia',
 'Tejano',
 'Rock & Roll',
 'Southern Rock',
 'Hard Rock',
 'Classic Rock',
 'Prog Rock',
 'Soft Rock',
 'Pop Rock',
 'Heavy Metal',
 'Post Bop',
 'Latin Jazz',
 'Bop',
 'Hard Bop',
 'Contemporary Jazz',
 'Hip Hop',
 'Post Rock',
 'Soundtrack',
 'Lo-Fi',
 'Indie Pop',
 'Post-Punk',
 'Brit Pop',
 'Alternative Rock',
 'Dream Pop',
 'Abstract',
 'Ballad',
 'House',
 'Pop Rap',
 'RnB/Swing',
 'Synth-pop',
 'UK Garage',
 'Funk',
 'Art Rock',
 'Baroque',
 'Classical',
 'Romantic',
 'Neo-Classical',
 'Modern',
 'Folk Rock',
 'Emo',
 'Pop Punk',
 'Post-Hardcore',
 'Metalcore',
 'Score',
 'Downtempo',
 'Ambient',
 'Drone',
 'Experimental',
 'New Age',
 'Power Pop',
 'Acoustic',
 'Dance-pop',
 'Soul',
 'Electro',
 'Europop',
 'Progressive House',
 'Soul-J
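Membership tests on a list rescan the list for every style, and the empty string left by tracks without style tags ends up in the result (see the `''` entry above). A set-based variant avoids both. This is a sketch, not the code that produced the output above; `collect_unique_styles` is a hypothetical helper, and column index 5 is the `trackstyles` position used in the original loop. Note that it returns the styles sorted alphabetically rather than in first-seen order.

```python
import csv
import io

def collect_unique_styles(csv_file, styles_column=5):
    """Return the sorted unique, non-empty style tags from an open CSV file."""
    styles = set()
    rows = csv.reader(csv_file)
    next(rows, None)  # skip the header row if present
    for row in rows:
        for style in row[styles_column].split(","):
            if style:
                styles.add(style)
    return sorted(styles)

# Small in-memory stand-in for the concatenated track log
demo = io.StringIO(
    "stateabrev,radiocallsgn,trackname,trackartist,trackalbum,trackstyles\n"
    'WA,KEXP,The Joke,Brandi Carlile,The Joke,"Pop Rock,Country"\n'
    "WA,KEXP,Clean Air,Hand Habits,Fun House,\n"
    "WA,KEXP,Velouria,Pixies,Bossanova,Indie Rock\n"
)
print(collect_unique_styles(demo))  # ['Country', 'Indie Rock', 'Pop Rock']
```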

### 2.) Defining which of the genres in the above list are "electronic music" genres (this classification is subjective and based on my own knowledge/interpretation of music genres, though my decisions were corroborated by Wikipedia's definitions):

In [52]:
electronic_music_genres = ['House',
                          'RnB/Swing',
                          'Synth-pop',
                          'UK Garage',
                          'Downtempo',
                          'Ambient',
                          'Drone',
                          'New Age',
                          'Dance-pop',
                          'Electro',
                          'Europop',
                          'Progressive House',
                          'Hi NRG',
                          'Happy Hardcore',
                          'Techno',
                          'Euro House',
                          'Italodance',
                          'Broken Beat',
                          'Deep House',
                          'Contemporary R&B',
                          'Tropical House',
                          'Electro House',
                          'Grime']

### 3.) Iterating over the 'master' song play log .CSV and marking tracks that carry at least one of the above 'electronic' genre tags (tracks with no style tags are dropped). This makes it possible to compute, for each radio station, the proportion of electronic songs played out of total songs played:

In [74]:
electronic_11_3_path = r'/Users/michaelfelzan/Desktop/GEO FM/WASH SONG LOGS/ELECTRONIC_11_03.csv'
        
with open(electronic_11_3_path,'w',newline='') as csvfile:
    fieldnames = ['stateabrev',
                  'radiocallsgn',
                  'trackname',
                  'trackartist',
                  'trackalbum',
                  'trackstyles',
                  'trackmp3filename',
                  'captureyear',
                  'capturemonth',
                  'captureday',
                  'capturehour',
                  'captureminute',
                  'capturesecond',
                  'elec_or_not'
                 ]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    
    with open(concat_tracklog, 'r') as read_obj:
        csv_reader = reader(read_obj)
        header = next(csv_reader)
        if header != None:
            for row in csv_reader:
                if row[5] == "":
                    pass
                else:    
                    if any(x in row[5] for x in electronic_music_genres):
                        #print(row[5], "Duplicates found.")
                        elecornot = 'yes'
                    else:
                        #print(row[5], "No duplicates found.")
                        elecornot = 'no'
            
                    writer.writerow({
                        'stateabrev' : row[0],
                        'radiocallsgn' : row[1],
                        'trackname' : row[2],
                        'trackartist' : row[3],
                        'trackalbum' : row[4],
                        'trackstyles' : row[5],
                        'trackmp3filename' : row[6],
                        'captureyear' : row[7],
                        'capturemonth' : row[8],
                        'captureday' : row[9],
                        'capturehour' : row[10],
                        'captureminute' : row[11],
                        'capturesecond' : row[12],
                        'elec_or_not' : elecornot
                    })
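One caveat about the `any(x in row[5] ...)` test above: it is a substring check on the whole styles string, so a tag such as 'Electro' also matches any longer tag that merely contains it. If exact tag matches are wanted, the styles string can be split first. The sketch below uses a shortened genre set and a hypothetical helper name, `is_electronic`; the tag 'Electroacoustic' is used purely to illustrate the difference.

```python
ELECTRONIC = {"House", "Synth-pop", "Dance-pop", "Electro", "Techno"}  # shortened for the example

def is_electronic(styles_field, electronic_tags=ELECTRONIC):
    """Return 'yes' if any comma-separated tag exactly matches an electronic tag."""
    tags = {tag.strip() for tag in styles_field.split(",")}
    return "yes" if tags & set(electronic_tags) else "no"

print(is_electronic("Dance-pop,Synth-pop,Disco,Funk"))  # yes (exact matches present)
print(is_electronic("Electroacoustic"))                 # no (substring only)
print(any(x in "Electroacoustic" for x in ELECTRONIC))  # True (the substring test fires)
```

The trade-off is that exact matching only recognizes sub-genres that are listed explicitly, so a tag like 'Deep House' counts only because it has its own entry in the full list.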

### 4.) Note the 'elec_or_not' column header added to the below spreadsheet:

In [75]:
elec_df_11_3 = pd.read_csv(electronic_11_3_path)
elec_df_11_3 

Unnamed: 0,stateabrev,radiocallsgn,trackname,trackartist,trackalbum,trackstyles,trackmp3filename,captureyear,capturemonth,captureday,capturehour,captureminute,capturesecond,elec_or_not
0,WA,KACS,Less Like Me,Zach Williams,Rescue Story,"Gospel,Country,Country Rock",KACS2021_10_26_T12_00_26.mp3,2021,10,26,12,0,26,no
1,WA,KACS,Heaven Is The Face,Steven Curtis Chapman,Beauty Will Rise,"Gospel,Religious",KACS2021_10_27_T11_28_52.mp3,2021,10,27,11,28,52,no
2,WA,KACS,Bloodwashed Pilgrim,Crystal Lewis,(Hymns) My Life,"Gospel,Easy Listening",KACS2021_10_27_T12_31_59.mp3,2021,10,27,12,31,59,no
3,WA,KACS,Christ In You [Instrumental],Integrity Music,Experience Hope [Instrumental],Religious,KACS2021_11_03_T14_33_11.mp3,2021,11,3,14,33,11,no
4,WA,KACS,My Jesus,Anne Wilson,My Jesus,Gospel,KACS2021_11_03_T15_33_54.mp3,2021,11,3,15,33,54,no
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
103,WA,KZTM,Cómo Te Olvido,La Arrolladora Banda El Limón De René Camacho,Cómo Te Olvido,"Norteño,Corrido,Cumbia,Tejano",KZTM2021_10_26_T12_03_49.mp3,2021,10,26,12,3,49,no
104,WA,KZZU,Love Again,Dua Lipa,Love Again,"Dance-pop,Synth-pop,Disco,Funk",KZZU2021_10_26_T12_11_38.mp3,2021,10,26,12,11,38,yes
105,WA,KZZU,Shivers,Ed Sheeran,Shivers,Dance-pop,KZZU2021_10_27_T12_05_38.mp3,2021,10,27,12,5,38,yes
106,WA,KZZU,One Call Away,Charlie Puth,Nine Track Mind Deluxe,"Ballad,Contemporary R&B,Soul,Tropical House",KZZU2021_10_27_T12_31_57.mp3,2021,10,27,12,31,57,yes


### 5.) Using Pandas 'GroupBy' to summarize statistics by each radio station (callsign):

In [81]:
groupby_elec11_3 = pd.DataFrame({'count' : elec_df_11_3.groupby( [ "radiocallsgn", "elec_or_not"] ).size()}).unstack(fill_value=0)

groupby_elec11_3

Unnamed: 0_level_0,count,count
elec_or_not,no,yes
radiocallsgn,Unnamed: 1_level_2,Unnamed: 2_level_2
KACS,5,0
KAOS,2,0
KBFG,2,0
KCMS,1,0
KDDS,2,0
KEGX,6,0
KEWU,4,0
KEXP,11,1
KEZE,2,0
KFAE,2,0
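The proportion mentioned in step 3 follows directly from these counts: share = yes / (yes + no). As a sketch, the tally can be cross-checked without pandas; `electronic_share` is a hypothetical helper, and the sample rows reproduce only the KEXP and KACS counts from the table above.

```python
from collections import Counter

def electronic_share(rows):
    """rows: iterable of (callsign, 'yes' or 'no'). Returns {callsign: share of 'yes'}."""
    totals, electronic = Counter(), Counter()
    for callsign, flag in rows:
        totals[callsign] += 1
        if flag == "yes":
            electronic[callsign] += 1
    return {cs: electronic[cs] / totals[cs] for cs in totals}

# KEXP: 11 'no' and 1 'yes'; KACS: 5 'no' and 0 'yes'
sample_rows = [("KEXP", "no")] * 11 + [("KEXP", "yes")] + [("KACS", "no")] * 5
shares = electronic_share(sample_rows)
```

With these counts, KEXP comes out at 1/12 and KACS at 0. Stations with only a handful of sampled tracks will produce shares that swing widely with each additional sample.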


### 6.) Writing the summary statistics to a .CSV, so that this data may be linked with a point-class shapefile in ArcGIS Pro (allowing for spatial analysis):

In [83]:
groupby_elec11_3.to_csv(r'/Users/michaelfelzan/Desktop/GEO FM/WASH SONG LOGS/11_3_electronic_yesno.csv')