# DALI-Test4ALT

## PART 1: RETRIEVING AUDIO

  This tutorial shows how to retrieve the audio files for DALI-Test4ALT from their corresponding YouTube links. 
  
  - For this, we exploit the identifiers within the DALI-v1.0 dataset.
  
  - We created a CSV file that contains the DALI identifiers, song titles, artist names, gender info and release years, which are used to retrieve the data from YouTube.

In [132]:
import os, re
import pandas as pd
import DALI as dali_code
from youtubesearchpython import VideosSearch, ChannelsSearch
from youtube_dl import YoutubeDL
import unidecode

Declare the locations of the metadata file and directory to save the audio files.

In [136]:
INPUT_CSV_DIR = './DALI_TestSet4ALT.csv'
SAVE_DIR = './audio'

In [137]:
if not os.path.exists(SAVE_DIR):
    os.mkdir(SAVE_DIR)

In [138]:
data = pd.read_csv(INPUT_CSV_DIR,index_col=0)
data.head(10)

DALI_ID,SONG_TITLE,ARTIST,GENDER,RELEASE_YEAR,LYRICS
44a2455abc0e4fb397a396d2cd1ebeb9,Caught In The Middle,A1,M,2002.0,YOU SAID THAT LOVE WAS JUST A STATE OF MIND A ...
15d6e9e88ced41dfbff38ba2f3e1d885,All For You,Ace Of Base,F,2010.0,TRIED TO WRITE YOU A LOVE SONG THOUGHT I COULD...
a59e44a4c910443a87f068b177200fdc,Life Is A Flower,Ace Of Base,F,1998.0,WE LIVE IN A FREE WORLD I WHISTLE DOWN THE WIN...
ae91bcda73944695b7756ddc066c3e02,Beautiful Life,Ace Of Base,F,1995.0,YOU CAN DO WHAT YOU WANT JUST SEIZE THE DAY WH...
d6e3cf403653490f8366bf77cbc0f186,Unstable,Adema,M,2003.0,I WANTED TO KNOW WHO YOU REALLY ARE I NEEDED T...
7a1642003f574713a6e25e5ee549fce6,When You Walk In The Room,Agnetha Faltskog,F,2004.0,I CAN FEEL A NEW EXPRESSION ON MY FACE I CAN F...
8eb15ad6d17f41b68009fe3848930dee,Love Stoned,Akcent,M,,HOW DEEP IS YOUR LOVE BRING THE SUN OUT BABY H...
b63d71b7ec6b4c5e9f53a87f83fcd73e,Hallelujah,Alexandra Burke,F,2009.0,I HEARD THERE WAS A SECRET CHORD THAT DAVID PL...
ac0279efad294261b4c7dc0b86eaaec5,Destination Calabria,Alex Gaudino,M,2008.0,DESTINATION UNKNOWN KNOWN KNOWN KNOWN KNOWN UN...
1478242633aa4628aa9a88ca6f54130a,No More Mr. Nice Guy,Alice Cooper,M,1973.0,I USED TO BE SUCH A SWEET SWEET THING 'TIL THE...


Define functions for the audio retrieval procedure.

In [139]:
def download_audio_from_youtube(dali_id,link,output_dir):
    
    try:
        audio_downloader = YoutubeDL({
                                'format': 'bestaudio/best',
                                'postprocessors': [{
                                'key': 'FFmpegExtractAudio',
                                'preferredcodec': 'mp3',
                                'preferredquality': '192',
                                                   }],
                                'outtmpl': os.path.join(output_dir,dali_id + '.mp3'),
                                'quiet': False
                                                 })
        audio_downloader.extract_info(link)
        
    except Exception:
        
        print('Download with python failed!')
        print('Trying with API on bash.')
        # FALL BACK TO THE youtube-dl COMMAND LINE TOOL (-x IS SHORT FOR --extract-audio)
        os.system(f'youtube-dl -o "{output_dir}/{dali_id}.mp3" -x --audio-format mp3 "{link}"')

In [140]:
def normalize_text(text): 
    # TRANSLITERATE ACCENTED CHARACTERS TO ASCII (NORMALIZE TEXT),
    # CONVERT TO LOWERCASE TO MATCH WITH SEARCH RESULTS,
    # AND REMOVE ALL SPECIAL CHARACTERS LIKE "!", ",", "." (ONLY LETTERS, DIGITS, SPACES AND ' ( ) / ARE KEPT)
    return re.sub(r"[^a-z0-9'()/ ]",r'',unidecode.unidecode(text).lower()) 
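
To see what this normalization is meant to do to a title, here is a minimal sketch that uses only the standard library. The function name `normalize_text_stdlib` is hypothetical, and `unicodedata` is a stand-in for `unidecode` (an assumption: the two can differ on non-Latin scripts), so treat it as an approximation of the intended behaviour rather than a drop-in replacement.

```python
import re
import unicodedata

def normalize_text_stdlib(text):
    # Hypothetical stand-in for normalize_text: strip accents by decomposing
    # characters (NFKD) and dropping the combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    ascii_text = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then keep only letters, digits, spaces and ' ( ) /
    return re.sub(r"[^a-z0-9'()/ ]", '', ascii_text.lower())

print(normalize_text_stdlib('Agnetha Fältskog'))  # agnetha faltskog
print(normalize_text_stdlib("Chicago - Hard To Say I'm Sorry/Get Away (Official Audio)"))
```

Note that the hyphen in a title such as "Artist - Song" is removed, which leaves a double space; this is harmless here because the later checks only test whether the song title and artist name occur as substrings.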
    

In [147]:
def retrieve_audio_for_dali_test(entry,save_dir,success):
    
    success = False
    
    #missing = []
    #vid_titles = []
    
    # PARSE SONG ID INFO TO BE USED FOR SEARCHING ON YOUTUBE
    
    dali_id = entry[0]
    
 
    song_title = normalize_text(entry[1]['SONG_TITLE'])
    artist_id = normalize_text(entry[1]['ARTIST'])
    year = entry[1]['RELEASE_YEAR']

    if pd.notna(year):
        # RELEASE_YEAR INFO IS AVAILABLE, SO APPEND IT TO THE SEARCH STRING
        search_title = song_title + ' ' + artist_id + ' ' + str(int(year))
    else:
        # IF RELEASE_YEAR = NaN, THEN USE ARTIST AND SONG TITLES ONLY FOR THE SEARCH
        search_title = song_title + ' ' + artist_id
        
    # FIRST LIST OF SEARCH QUERIES, TO CHECK IF ALL EXIST IN THE 
    # YOUTUBE VIDEO TITLE    
    search_queries_1 = [song_title,artist_id]         
    
    # SECOND LIST OF SEARCH QUERIES, TO CHECK IF ANY EXISTS IN THE 
    # YOUTUBE VIDEO TITLE. THESE ARE DETERMINED TO RETRIEVE THE VERSIONS
    # USED IN DALI_TestSet4ALT DATASET
    search_queries_2 = ['official','lyrics']
    
    # THIRD LIST OF SEARCH QUERIES, TO CHECK IF ANY DOES NOT EXIST IN THE 
    # YOUTUBE VIDEO TITLE. THESE ARE DETERMINED TO RETRIEVE THE VERSIONS
    # USED IN DALI_TestSet4ALT DATASET
    search_queries_3 = ['remix']
    
    #THE ABOVE DEFINED LIST OF QUERIES CAN BE EXTENDED DEPENDING ON THE GOAL OF THE USER
    videosSearch = VideosSearch(search_title, limit = 2)
    for vid in videosSearch.result()['result']:
        vid_lower = normalize_text(vid['title'])
        print(vid['title'])
        if success == False:
            if all(item in vid_lower for item in search_queries_1):
                if any(item in vid_lower for item in search_queries_2):
                    #vid_titles.append([vid['title'],vid['link']])
                    download_audio_from_youtube(dali_id,vid['link'],save_dir)
                    success=True
                    break
                elif all(item not in vid_lower for item in search_queries_3):   
                   # vid_titles.append([vid['title'],vid['link']])
                    download_audio_from_youtube(dali_id,vid['link'],save_dir)
                    success=True  
                    break
    
    return success
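
The acceptance rule inside the loop can be isolated as a small pure function, which makes it easy to try out on titles without touching the network. This is only a sketch of the same three-list logic; the name `title_matches` and its parameter names are hypothetical.

```python
def title_matches(title, required, preferred, excluded):
    """Return True if a normalized video title passes the three filters."""
    # Every required term (song title, artist name) must appear in the title.
    if not all(term in title for term in required):
        return False
    # A preferred term ('official', 'lyrics') accepts the title outright.
    if any(term in title for term in preferred):
        return True
    # Otherwise accept only if no excluded term ('remix') appears.
    return all(term not in title for term in excluded)

required = ['all for you', 'ace of base']
preferred = ['official', 'lyrics']
excluded = ['remix']

print(title_matches('ace of base  all for you (official music video)', required, preferred, excluded))
print(title_matches('ace of base  all for you (remix)', required, preferred, excluded))
```

One consequence of this ordering is that a title containing both a preferred and an excluded term, for example an "official remix", is still accepted, so it may be worth extending the lists for your own use case.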

  - Main download loop below:

In [149]:
retrieved =[]
failed = []

for row in data.iterrows():
    
    success = False
    print(row[0])
    # PARSE SONG ID INFO TO BE USED FOR SEARCHING ON YOUTUBE
    
    success = retrieve_audio_for_dali_test(row,SAVE_DIR,success)
    
    if success == True:
        
        retrieved.append(row[0])
        print('Retrieval of ' + row[1]['SONG_TITLE'] + ' - ' + row[1]['ARTIST'] + ' is successful.')

    if success == False:
        print('Could not retrieve the audio for ' + row[0] + ' because the search queries are not present in the video title!')
        failed.append(row[0])
        #print('\n')

44a2455abc0e4fb397a396d2cd1ebeb9
A1 - Caught in the Middle
[youtube] qz57Ucb02yo: Downloading webpage
[download] ./audio/44a2455abc0e4fb397a396d2cd1ebeb9.mp3 has already been downloaded
[download] 100% of 3.20MiB
[ffmpeg] Post-process file ./audio/44a2455abc0e4fb397a396d2cd1ebeb9.mp3 exists, skipping
Retrieval of Caught In The Middle - A1 is successful.
15d6e9e88ced41dfbff38ba2f3e1d885
Ace of Base - All for You (Official Music Video)
[youtube] Vg0zSGVNjfo: Downloading webpage
[download] ./audio/15d6e9e88ced41dfbff38ba2f3e1d885.mp3 has already been downloaded
[download] 100% of 3.37MiB
[ffmpeg] Post-process file ./audio/15d6e9e88ced41dfbff38ba2f3e1d885.mp3 exists, skipping
Retrieval of All For You - Ace Of Base is successful.
a59e44a4c910443a87f068b177200fdc
Ace of Base - Life Is a Flower (Official Music Video)
[youtube] sdc_6YvJfFo: Downloading webpage
[download] ./audio/a59e44a4c910443a87f068b177200fdc.mp3 has already been downloaded
[download] 100% of 3.32MiB
[ffmpeg] Post-process fi

Asaf Avidan, The Mojos - One Day / Reckoning Song (Videoclip Day Version)
[youtube] YRom6y1E8p8: Downloading webpage
[download] ./audio/5e774785ea79424d96974f5662332b31.mp3 has already been downloaded
[download] 100% of 3.78MiB
[ffmpeg] Post-process file ./audio/5e774785ea79424d96974f5662332b31.mp3 exists, skipping
Retrieval of Reckoning Song - Asaf Avidan The Mojos is successful.
7a14ac26f9624922859b33caede07b07
Band Of Skulls - I Know What I Am
[youtube] kfwvpyrAW60: Downloading webpage
[download] ./audio/7a14ac26f9624922859b33caede07b07.mp3 has already been downloaded
[download] 100% of 3.08MiB
[ffmpeg] Post-process file ./audio/7a14ac26f9624922859b33caede07b07.mp3 exists, skipping
Retrieval of I Know What I Am - Band Of Skulls is successful.
e61d7f0bf9ad4947885ca6e0b664b23d
Barry McGuire - Eve Of Destruction
[youtube] qfZVu0alU0I: Downloading webpage
[download] ./audio/e61d7f0bf9ad4947885ca6e0b664b23d.mp3 has already been downloaded
[download] 100% of 3.51MiB
[ffmpeg] Post-process 

[download] ./audio/c69d9213f91a40349ce570c6c8d9fdc6.mp3 has already been downloaded
[download] 100% of 3.53MiB
[ffmpeg] Correcting container in "./audio/c69d9213f91a40349ce570c6c8d9fdc6.mp3"
[ffmpeg] Post-process file ./audio/c69d9213f91a40349ce570c6c8d9fdc6.mp3 exists, skipping
Retrieval of With Ur Love - Cher Lloyd is successful.
f45833d95ffc48a092874ed9c01bcad8
Cher Lloyd - Swagger Jagger
[youtube] sdbyG2MrBHk: Downloading webpage
[download] ./audio/f45833d95ffc48a092874ed9c01bcad8.mp3 has already been downloaded
[download] 100% of 3.23MiB
[ffmpeg] Post-process file ./audio/f45833d95ffc48a092874ed9c01bcad8.mp3 exists, skipping
Retrieval of Swagger Jagger - Cher Lloyd is successful.
9eb55eeb476e48a79b41876fdcab6712
Chicago - Hard To Say I'm Sorry/Get Away (Official Audio)
[youtube] EORSLz0_BRU: Downloading webpage
[download] ./audio/9eb55eeb476e48a79b41876fdcab6712.mp3 has already been downloaded
[download] 100% of 4.87MiB
[ffmpeg] Post-process file ./audio/9eb55eeb476e48a79b41876fdc

[download] Destination: ./audio/155c1427eb074347b7fc4dc89ccd2a27.mp3
[download] 100% of 8.03MiB in 00:0034MiB/s ETA 00:000
[ffmpeg] Post-process file ./audio/155c1427eb074347b7fc4dc89ccd2a27.mp3 exists, skipping
Retrieval of American Pie - Don McLean is successful.
2710e4d6a4e3478fbd0ed1706840ff21
Don Williams - Some Broken Hearts Never Mend
[youtube] RqjDvEC1uVw: Downloading webpage


ERROR: unable to download video data: HTTP Error 403: Forbidden


Download with python failed!
Trying with API on bash.
Retrieval of Some Broken Hearts Never Mend - Don Williams is successful.
9994eba1573e467994bea7445554d2e9
East 17 - Around The World (Official Video)
[youtube] hMBJ50LDfjI: Downloading webpage
[download] Destination: ./audio/9994eba1573e467994bea7445554d2e9.mp3
[download] 100% of 4.27MiB in 00:0050MiB/s ETA 00:004
[ffmpeg] Correcting container in "./audio/9994eba1573e467994bea7445554d2e9.mp3"
[ffmpeg] Post-process file ./audio/9994eba1573e467994bea7445554d2e9.mp3 exists, skipping
Retrieval of Around The World - East 17 is successful.
72df894f38944b96bee3a5041fbb34f7
Ed Sheeran - I See Fire (Music Video)
[youtube] 2fngvQS_PmQ: Downloading webpage
[download] Destination: ./audio/72df894f38944b96bee3a5041fbb34f7.mp3
[download] 100% of 4.55MiB in 00:0025MiB/s ETA 00:005
[ffmpeg] Correcting container in "./audio/72df894f38944b96bee3a5041fbb34f7.mp3"
[ffmpeg] Post-process file ./audio/72df894f38944b96bee3a5041fbb34f7.mp3 exists, skipping


Gary Go - Wonderful
[youtube] a28s_wyqkyc: Downloading webpage
[download] Destination: ./audio/ff348b1ede1043d8a1ab033b326f8056.mp3
[download] 100% of 3.56MiB in 00:0009MiB/s ETA 00:005
[ffmpeg] Post-process file ./audio/ff348b1ede1043d8a1ab033b326f8056.mp3 exists, skipping
Retrieval of Wonderful - Gary Go is successful.
607bb216fd9c49f88604be23bb3bc87c
Gary Moore - Still Got The Blues (Live)
[youtube] 4O_YMLDvvnw: Downloading webpage
[download] Destination: ./audio/607bb216fd9c49f88604be23bb3bc87c.mp3
[download] 100% of 6.59MiB in 00:0024MiB/s ETA 00:007
[ffmpeg] Post-process file ./audio/607bb216fd9c49f88604be23bb3bc87c.mp3 exists, skipping
Retrieval of Still Got The Blues - Gary Moore is successful.
ff33ea7be69e46e4bf154c1d6f3ae45b
Gary Moore - Always Gonna Love You [HD]
[youtube] c0gg4zY0oZ0: Downloading webpage
[download] Destination: ./audio/ff33ea7be69e46e4bf154c1d6f3ae45b.mp3
[download] 100% of 3.69MiB in 00:4483KiB/s ETA 00:002
[ffmpeg] Post-process file ./audio/ff33ea7be69e46

[download] 100% of 3.72MiB in 00:0002MiB/s ETA 00:005
[ffmpeg] Post-process file ./audio/cfbd301d613b4041aff2f3ac0d246e56.mp3 exists, skipping
Retrieval of Who You Are - Jessie J is successful.
289a42d903ca4b8fb5e6087a47a031a3
Jewel - Intuition (Official Music Video)
[youtube] 8Ilh1ewceco: Downloading webpage
[download] Destination: ./audio/289a42d903ca4b8fb5e6087a47a031a3.mp3
[download] 100% of 3.70MiB in 00:0088MiB/s ETA 00:005
[ffmpeg] Post-process file ./audio/289a42d903ca4b8fb5e6087a47a031a3.mp3 exists, skipping
Retrieval of Intuition - Jewel is successful.
40e95f7040dc406f9387e3ba6352d630
Jewel - Foolish Games (Official Music Video)
[youtube] UNoouLa7uxA: Downloading webpage
[download] Destination: ./audio/40e95f7040dc406f9387e3ba6352d630.mp3
[download] 100% of 4.01MiB in 00:0077MiB/s ETA 00:004
[ffmpeg] Post-process file ./audio/40e95f7040dc406f9387e3ba6352d630.mp3 exists, skipping
Retrieval of Foolish Games - Jewel is successful.
6167718ba7be41fc8cfc4f2e37c53ea4
Jewel - Stronge

ERROR: unable to download video data: HTTP Error 403: Forbidden


Download with python failed!
Trying with API on bash.
Retrieval of Unconditionally - Katy Perry is successful.
b32c14c66b2545d2b087db7b36aa1d0e
Kenny Loggins - Footloose (Official Video)
[youtube] ltrMfT4Qz5Y: Downloading webpage
[download] Destination: ./audio/b32c14c66b2545d2b087db7b36aa1d0e.mp3
[download] 100% of 2.71MiB in 00:0097MiB/s ETA 00:003
[ffmpeg] Correcting container in "./audio/b32c14c66b2545d2b087db7b36aa1d0e.mp3"
[ffmpeg] Post-process file ./audio/b32c14c66b2545d2b087db7b36aa1d0e.mp3 exists, skipping
Retrieval of Footloose - Kenny Loggins is successful.
8d7f9342cf644e8797f4d7d0402a75bd
Klaxons - It's Not Over Yet
[youtube] dGm1B9784nk: Downloading webpage
[download] Destination: ./audio/8d7f9342cf644e8797f4d7d0402a75bd.mp3
[download] 100% of 3.47MiB in 00:0034MiB/s ETA 00:003
[ffmpeg] Correcting container in "./audio/8d7f9342cf644e8797f4d7d0402a75bd.mp3"
[ffmpeg] Post-process file ./audio/8d7f9342cf644e8797f4d7d0402a75bd.mp3 exists, skipping
Retrieval of It's Not Over Y

[download] 100% of 4.81MiB in 00:0035MiB/s ETA 00:006
[ffmpeg] Post-process file ./audio/fb8234c517c246ba90726510a55071f4.mp3 exists, skipping
Retrieval of The Heart Never Lies - McFly is successful.
8e8159784ab14916a65e60cb1b0b485a
Melanie C - Northern Star
[youtube] i6OLkIF3Up4: Downloading webpage
[download] Destination: ./audio/8e8159784ab14916a65e60cb1b0b485a.mp3
[download] 100% of 3.77MiB in 00:0096MiB/s ETA 00:005
[ffmpeg] Post-process file ./audio/8e8159784ab14916a65e60cb1b0b485a.mp3 exists, skipping
Retrieval of Northern Star - Melanie C is successful.
b3639d60a49e45578a201d910f36c44c
Melanie C - First Day Of My Life (Music Video) (HQ)
[youtube] W9n5AkJU-so: Downloading webpage
[download] Destination: ./audio/b3639d60a49e45578a201d910f36c44c.mp3
[download] 100% of 3.79MiB in 00:5767KiB/s ETA 00:005
[ffmpeg] Post-process file ./audio/b3639d60a49e45578a201d910f36c44c.mp3 exists, skipping
Retrieval of First Day Of My Life - Melanie C is successful.
c34b9f8f9de7432d8f0f0b42e45c826

[download] 100% of 2.96MiB in 00:0065MiB/s ETA 00:0003
[ffmpeg] Correcting container in "./audio/86fd9ce08a084c3f8016043904788f6f.mp3"
[ffmpeg] Post-process file ./audio/86fd9ce08a084c3f8016043904788f6f.mp3 exists, skipping
Retrieval of Absolutely (Story Of A Girl) - Nine Days is successful.
aa6d4f4138d340b3a286bd9b837d71de
no mercy - when i die
[youtube] GJq36IRz3Ds: Downloading webpage
[download] Destination: ./audio/aa6d4f4138d340b3a286bd9b837d71de.mp3
[download] 100% of 4.56MiB in 00:0004MiB/s ETA 00:007
[ffmpeg] Post-process file ./audio/aa6d4f4138d340b3a286bd9b837d71de.mp3 exists, skipping
Retrieval of When I Die - No Mercy is successful.
f3bec6c19cf84f9295d5b1b112446727
Spirit In The Sky Norman Greenbaum
[youtube] AZQxH_8raCI: Downloading webpage
[download] Destination: ./audio/f3bec6c19cf84f9295d5b1b112446727.mp3
[download] 100% of 3.78MiB in 00:0036MiB/s ETA 00:004
[ffmpeg] Post-process file ./audio/f3bec6c19cf84f9295d5b1b112446727.mp3 exists, skipping
Retrieval of Spirit In T

The Pretenders - Back On The Chain Gang HQ Music
[youtube] CK3uf5V0pDA: Downloading webpage
[download] Destination: ./audio/cb7e2ddbc47d4cdd9cc57fd858578ed1.mp3
[download] 100% of 3.55MiB in 00:0034MiB/s ETA 00:005
[ffmpeg] Post-process file ./audio/cb7e2ddbc47d4cdd9cc57fd858578ed1.mp3 exists, skipping
Retrieval of Back On The Chain Gang - Pretenders is successful.
1916eff187b84c6c86a3bcdb27ee8da4
Rainbow - I Surrender
[youtube] iMmMqfQZkxA: Downloading webpage
[download] Destination: ./audio/1916eff187b84c6c86a3bcdb27ee8da4.mp3
[download] 100% of 3.07MiB in 00:0009MiB/s ETA 00:003
[ffmpeg] Correcting container in "./audio/1916eff187b84c6c86a3bcdb27ee8da4.mp3"
[ffmpeg] Post-process file ./audio/1916eff187b84c6c86a3bcdb27ee8da4.mp3 exists, skipping
Retrieval of I Surrender - Rainbow is successful.
45e0ccbdf76f4060af50f95d93492755
Rancid - "Time Bomb"
[youtube] DhKHAopx7D0: Downloading webpage
[download] Destination: ./audio/45e0ccbdf76f4060af50f95d93492755.mp3
[download] 100% of 2.35MiB

[download] 100% of 2.83MiB in 00:0031MiB/s ETA 00:003
[ffmpeg] Correcting container in "./audio/ae62daf4c9b34bdba090bc5a1906fd77.mp3"
[ffmpeg] Post-process file ./audio/ae62daf4c9b34bdba090bc5a1906fd77.mp3 exists, skipping
Retrieval of All Around My Hat - Steeleye Span is successful.
6e937fd74e4542cfb62064eaaa247869
Sugarland - Tonight
[youtube] pNDtN4XM8KA: Downloading webpage
[download] Destination: ./audio/6e937fd74e4542cfb62064eaaa247869.mp3
[download] 100% of 4.44MiB in 01:0611KiB/s ETA 00:00
[ffmpeg] Post-process file ./audio/6e937fd74e4542cfb62064eaaa247869.mp3 exists, skipping
Retrieval of Tonight - Sugarland is successful.
9942d02f1525441a8114f1d28ffb3d30
Sugarland - Stay (Official Video)
[youtube] zPG1n1B0Ydw: Downloading webpage
[download] Destination: ./audio/9942d02f1525441a8114f1d28ffb3d30.mp3
[download] 100% of 4.85MiB in 00:0038MiB/s ETA 00:005
[ffmpeg] Post-process file ./audio/9942d02f1525441a8114f1d28ffb3d30.mp3 exists, skipping
Retrieval of Stay - Sugarland is succe

[download] 100% of 6.00MiB in 00:0055MiB/s ETA 00:007
[ffmpeg] Post-process file ./audio/6241722644184050bb1a04a840701e54.mp3 exists, skipping
Retrieval of Rescue Me - The Gathering is successful.
a1f95acf37684dbda26a0fd9d7e99f29
The Gathering - You Learn About It (Video)
[youtube] yq9GpAWF6RA: Downloading webpage
[download] Destination: ./audio/a1f95acf37684dbda26a0fd9d7e99f29.mp3
[download] 100% of 3.58MiB in 00:0023MiB/s ETA 00:004
[ffmpeg] Post-process file ./audio/a1f95acf37684dbda26a0fd9d7e99f29.mp3 exists, skipping
Retrieval of You Learn About It - The Gathering is successful.
dc0cd0d4ec64447e894540fe4a0a01f5
Pale Traces
The Gathering - Pale Traces (Subtitulada)
[youtube] aZuhVO7gwlE: Downloading webpage
[download] Destination: ./audio/dc0cd0d4ec64447e894540fe4a0a01f5.mp3
[download] 100% of 7.58MiB in 00:0164MiB/s ETA 00:007
[ffmpeg] Post-process file ./audio/dc0cd0d4ec64447e894540fe4a0a01f5.mp3 exists, skipping
Retrieval of Pale Traces - The Gathering is successful.
d1a18dd66ef

Retrieval of Aquarius - Within Temptation is successful.
73972858ae3d464f9f8bf8dcd562aafd
Within Temptation - Faster (Videoclip)
[youtube] iQVei5C2N4E: Downloading webpage
[download] Destination: ./audio/73972858ae3d464f9f8bf8dcd562aafd.mp3
[download] 100% of 4.07MiB in 00:0095MiB/s ETA 00:004
[ffmpeg] Post-process file ./audio/73972858ae3d464f9f8bf8dcd562aafd.mp3 exists, skipping
Retrieval of Faster - Within Temptation is successful.
b0c1c41a5a024f47ae1eca7a8b5ca59b
Towards the End
Within Temptation - Towards The End
[youtube] RGmCCqAKpIE: Downloading webpage
[download] Destination: ./audio/b0c1c41a5a024f47ae1eca7a8b5ca59b.mp3
[download] 100% of 3.51MiB in 00:0025MiB/s ETA 00:0003
[ffmpeg] Post-process file ./audio/b0c1c41a5a024f47ae1eca7a8b5ca59b.mp3 exists, skipping
Retrieval of Towards The End - Within Temptation is successful.
b1b6bc336f78441b8b31da555ccf59d8
Within Temptation - Forgiven (Lyrics)
[youtube] a9QSoxoMpfo: Downloading webpage
[download] Destination: ./audio/b1b6bc336f

The number of recordings retrieved from YouTube should be 238 out of 240, which means that 2 songs are missing.

In [151]:
len(retrieved)

238

Our search could not retrieve the two missing recordings because their YouTube video titles do not contain the artist names. The DALI identifiers of these songs are listed below:

In [152]:
failed

['144393f2a4a94a3f81907378f4edc095', '42c7fa8a79df4a19932c2fef2ede7b58']

The search returns its results ordered by relevance. For these two recordings, we simply retrieve the very first result of the search. Note that these recordings were manually verified to be the same versions as those used in DALI_TestSet4ALT. 

  - Please check the durations to be extra sure that you downloaded the correct version.

In [153]:
for item in failed:
    search_title = data.loc[item]['SONG_TITLE'] + ' ' + data.loc[item]['ARTIST']
    videosSearch = VideosSearch(search_title, limit = 1)
    download_audio_from_youtube(item,videosSearch.result()['result'][0]['link'],SAVE_DIR)

[youtube] 65inlvN8Pq4: Downloading webpage
[download] ./audio/144393f2a4a94a3f81907378f4edc095.mp3 has already been downloaded
[download] 100% of 974.13KiB
[ffmpeg] Post-process file ./audio/144393f2a4a94a3f81907378f4edc095.mp3 exists, skipping
[youtube] hAm0H5A85gI: Downloading webpage
[download] ./audio/42c7fa8a79df4a19932c2fef2ede7b58.mp3 has already been downloaded
[download] 100% of 3.66MiB
[ffmpeg] Post-process file ./audio/42c7fa8a79df4a19932c2fef2ede7b58.mp3 exists, skipping
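
The duration check suggested above can be scripted. The sketch below assumes that `ffprobe` (shipped with ffmpeg) is on the PATH and that you have a reference duration you trust for each song; the function names, the `expected` value and the 2-second tolerance are all hypothetical choices, not part of the dataset.

```python
import subprocess

def probe_duration(path):
    """Return the duration of an audio file in seconds, as reported by ffprobe."""
    result = subprocess.run(
        ['ffprobe', '-v', 'error',
         '-show_entries', 'format=duration',
         '-of', 'default=noprint_wrappers=1:nokey=1',
         path],
        capture_output=True, text=True, check=True)
    return float(result.stdout.strip())

def duration_matches(actual, expected, tolerance=2.0):
    """True if two durations (in seconds) differ by at most `tolerance` seconds."""
    return abs(actual - expected) <= tolerance
```

A call such as `duration_matches(probe_duration(path), expected)` on each downloaded file would then flag recordings whose length differs noticeably from the reference, which usually indicates a different version (live, extended, radio edit).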


If the number of recordings is equal to 240, you have successfully retrieved the audio files for DALI_TestSet4ALT. 

In [164]:
len(os.listdir(SAVE_DIR))

240

Next, we format the data to be utilized within the Kaldi framework. This step is optional.

## PART 2: DATA FORMATTING FOR KALDI

The script below generates the data files required by Kaldi. Once they are generated, you can pass the output directory `DATA_DIR_KALDI` directly to your Kaldi pipeline for testing.

In [165]:
DATA_DIR_KALDI = 'DALI_Test'
if not os.path.exists(DATA_DIR_KALDI):
    os.mkdir(DATA_DIR_KALDI)

In [170]:
with open(DATA_DIR_KALDI + '/text','w') as wt, open(DATA_DIR_KALDI + '/utt2spk','w') as wu, open(DATA_DIR_KALDI + '/wav.scp','w') as ww:
    for row in data.iterrows():
        # KALDI IDS MUST NOT CONTAIN WHITESPACE, SO JOIN MULTI-WORD ARTIST NAMES WITH '_'
        artist = '_'.join(row[1]['ARTIST'].split())
        data_id = row[1]['GENDER'] + '-' + artist + '-' + row[0]
        wt.write(data_id + ' ' + row[1]['LYRICS'] + '\n')     # <utt-id> <transcript>
        wu.write(data_id + ' ' + artist + '\n')               # <utt-id> <speaker-id>
        # <utt-id> <command that decodes the audio to 16 kHz mono WAV on stdout> |
        ww.write(data_id + ' ffmpeg -y -i '
              + os.path.abspath(os.path.join('.',SAVE_DIR)) + '/' + row[0] 
              + '.mp3  -ar 16000 -ac 1 -f wav - | sox -t wav - -t wav - |\n' )
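
Kaldi reads these files as whitespace-delimited records in which the first token is the utterance ID, so an ID or speaker name that contains a space (for example an artist such as "Ace Of Base" written verbatim) would be split into extra fields. As a quick sanity check, the sketch below lists `utt2spk` lines that do not have exactly two fields; the function name `invalid_utt2spk_lines` is hypothetical.

```python
def invalid_utt2spk_lines(lines):
    """Return the lines that are not exactly '<utt-id> <speaker-id>'."""
    return [line for line in lines if len(line.split()) != 2]

sample = [
    'M-A1-44a2455abc0e4fb397a396d2cd1ebeb9 A1',
    'F-Ace Of Base-15d6e9e88ced41dfbff38ba2f3e1d885 Ace Of Base',
]

# Only the second line is reported, because the spaces split it into six fields.
print(invalid_utt2spk_lines(sample))
```

To check a generated file, pass it the lines read from `DATA_DIR_KALDI + '/utt2spk'`; an empty list means every line has the expected two fields.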