# Audio Data Pre-Processing [SoundCloud]

## Part 1: Data Mining
You are given a `soundscloud_urls.csv` file containing URLs to SoundCloud.com. You are tasked with extracting the audio file, the author name, and the track name from each link, saving this data in a `.csv` file, and displaying the dataframe in your notebook. All audio files must be saved as `.wav` in the `./data` directory.



### Importing Libraries

In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import requests
import re
import os
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)

### Loading SoundCloud URLs

In [2]:
# When loading the csv file into pandas, it uses the top row (which was a URL) as the header by default 
# To solve this problem, I set the 'header' parameter to None and created a new column name called 'urls'
urls = pd.read_csv("soundscloud_urls.csv",header=None,names=['urls'])
urls

Unnamed: 0,urls
0,https://soundcloud.com/hyenrg/take-it-back
1,https://soundcloud.com/hyenrg/friction
2,https://soundcloud.com/djdssoulfulgeneration/s...
3,https://soundcloud.com/hyenrg/back-2-me
4,https://soundcloud.com/josephine-schmidt/rise-...
5,https://soundcloud.com/andre-roider/audiospektrum
6,https://soundcloud.com/hyenrg/feel-good-renegade
7,https://soundcloud.com/andre-roider/play
8,https://soundcloud.com/storyofthelie/free-hall...
9,https://soundcloud.com/djdssoulfulgeneration/s...


### Downloading Songs

In [3]:
columns = ['URL','API Endpoint','Redirect URI','Song Id',
           'Title','Author Name','Track Name']

# collecting one dict per song; DataFrame.append was removed in pandas 2.0,
# so the dataframe is built once after the loop
records = []

# creating the data directory once (no error if it already exists)
os.makedirs("data", exist_ok=True)

# by going to any track and looking at the payload of any endpoint call you can find your client id
client_id = 'nzlp05ChzxSyVpcOCKvTIZdwDLZfWM0z'

# using index as song number for logging purposes (starting at 1)
for index, url in enumerate(urls['urls'], start=1):

    # fetching html script
    html = requests.get(url, timeout=30).text
    logging.info(f"Song #{index} SoundCloud URL: {url}")

    # searching for title tag; the pipe is escaped because an unescaped '|'
    # is regex alternation, not a literal character
    title_raw = re.search(r'<title>([^|]+) \|', html)
    title_raw = title_raw.group(1)
    logging.info(f"Song #{index} Title: {title_raw}")
    title = title_raw.split("Stream ")[1].split(" by ")

    # parsing title to get author name
    author_name = title[1].strip()
    logging.info(f"Song #{index} Author Name: {author_name}")

    # parsing title to get track name
    track_name = title[0].strip()
    logging.info(f"Song #{index} Track Name: {track_name}")

    # extracting song id from html script
    song_id = re.search(r'soundcloud://sounds:(\d+)', html)
    song_id = song_id.group(1)
    logging.info(f"Song #{index} Song Id: {song_id}")

    # fetch downloadable link from redirectUri of API (v2) endpoint response
    endpoint_url = f"https://api-v2.soundcloud.com/tracks/{song_id}/download?client_id={client_id}"
    logging.info(f"Song #{index} API Endpoint URL: {endpoint_url}")

    # requesting the downloadable URL; retrying a bounded number of times
    # instead of looping forever and swallowing every exception
    for attempt in range(1, 6):
        try:
            response = requests.get(endpoint_url, headers={'content-type': "application/json"}, timeout=30)
            redirect_uri = response.json()['redirectUri']
            break
        except (requests.RequestException, ValueError, KeyError) as error:
            logging.warning(f"Song #{index} attempt {attempt} failed: {error}")
    else:
        raise RuntimeError(f"Song #{index}: no redirectUri returned for song id {song_id}")
    logging.info(f"Song #{index} Redirect URL: {redirect_uri}")

    # downloading song and saving it into data directory
    song = requests.get(redirect_uri, timeout=60)
    with open(f'data/{song_id}.wav', 'wb') as audio_file:
        audio_file.write(song.content)

    records.append({
        'URL': url,
        'API Endpoint': endpoint_url,
        'Redirect URI': redirect_uri,
        'Song Id': song_id,
        'Title': title_raw,
        'Author Name': author_name,
        'Track Name': track_name,
    })

    # stopping after the first two songs
    if index == 2:
        break

df = pd.DataFrame(records, columns=columns)

2021-12-06 18:06:27,943 - Song #1 SoundCloud URL: https://soundcloud.com/hyenrg/take-it-back
2021-12-06 18:06:27,945 - Song #1 Title: Stream Take It Back by HyeNRG
2021-12-06 18:06:27,945 - Song #1 Author Name: HyeNRG
2021-12-06 18:06:27,946 - Song #1 Track Name: Take It Back
2021-12-06 18:06:27,947 - Song #1 Song Id: 1171193422
2021-12-06 18:06:27,948 - Song #1 API Endpoint URL: https://api-v2.soundcloud.com/tracks/1171193422/download?client_id=nzlp05ChzxSyVpcOCKvTIZdwDLZfWM0z
2021-12-06 18:06:28,317 - Song #1 Redirect URL: https://cf-media.sndcdn.com/dhvqqJGqdA9b?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vZGh2cXFKR3FkQTliKiIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTYzODgwNjkwOH19fV19&Signature=XKrdKbu4Tdsbn78nKSCCfE30CQyEFozv9g2wiIU1bXf173FBx~47UvqTHAqQn6uXU3KnzLAEdrxJY71IDXS7wDM0S2IK4DczYiu6JJbKisiPl6Yw4prqruQklO~MGpt5hU0V5S9HSK7MrM5suDGL4aydTXfiMUl8TQtQeQypa8ScnXgVTKKjEUar94l0Xmrn5C3d8g6rudpBw~oXri3gVi-E5pT8KNOc9QoaHxBAcaKgR3FZg2
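
The title parsing used in the download cell can be isolated into a small function that is easy to check on its own. The sketch below is a minimal version under stated assumptions: `parse_title` and `TITLE_RE` are hypothetical names, and the `Stream <track> by <author> | ...` layout is inferred only from the log output above, so it may not hold for every page. It splits on the last ` by `, which is a design choice that keeps track names containing ` by ` intact.

```python
import re

# Hypothetical helper; the "Stream <track> by <author> | ..." title layout is
# an assumption taken from the log output, not a documented SoundCloud format.
TITLE_RE = re.compile(r"<title>Stream (.+?) \| [^<]*</title>")


def parse_title(html):
    """Return (track_name, author_name) parsed from a page's <title> tag."""
    match = TITLE_RE.search(html)
    if match is None:
        raise ValueError("no SoundCloud-style <title> tag found")
    # rpartition splits on the LAST " by ", so a track name that itself
    # contains " by " survives (an author name containing " by " would not).
    track, sep, author = match.group(1).rpartition(" by ")
    if not sep:
        raise ValueError("title does not contain ' by '")
    return track.strip(), author.strip()


page = "<title>Stream Take It Back by HyeNRG | Free Listening on SoundCloud</title>"
print(parse_title(page))  # ('Take It Back', 'HyeNRG')
```

Raising `ValueError` on an unexpected title makes a changed page layout visible immediately instead of failing later with an `AttributeError` on `None`.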

In [4]:
df

Unnamed: 0,URL,API Endpoint,Redirect URI,Song Id,Title,Author Name,Track Name
0,https://soundcloud.com/hyenrg/take-it-back,https://api-v2.soundcloud.com/tracks/117119342...,https://cf-media.sndcdn.com/dhvqqJGqdA9b?Polic...,1171193422,Stream Take It Back by HyeNRG,HyeNRG,Take It Back
1,https://soundcloud.com/hyenrg/friction,https://api-v2.soundcloud.com/tracks/117119277...,https://cf-media.sndcdn.com/nDniq7ikftY9?Polic...,1171192777,Stream Friction by HyeNRG,HyeNRG,Friction
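
The download cell retries the API call until a `redirectUri` comes back. A bounded retry with a pause between attempts can be sketched as a generic helper. `retry` and its parameters are hypothetical names introduced for illustration, and the linear backoff is just one reasonable choice.

```python
import time


def retry(fetch, attempts=5, delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns without raising, at most `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:  # narrow this to the errors you expect
            last_error = error
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff between tries
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

In the notebook, `fetch` could be a small function that requests `endpoint_url` and returns `response.json()['redirectUri']`. The `sleep` parameter exists only so the waiting can be swapped out when testing.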


## Part 2: Data Processing and Visualization
Once you have downloaded your audio files, you are tasked with pre-processing your data by converting the raw waveform to a spectrogram using the Fast Fourier Transform (FFT). You can find more information about the FFT [HERE](http://www.dspguide.com/ch12.htm).
Finally, you must convert your spectrogram from the Hz range to the mel scale. More information about mel-scaling can be found [HERE](http://pdf-s3.xuebalib.com:1262/249gn34RBxh1.pdf).
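
Because Part 2 rules out built-in FFT routines, one possible starting point is a hand-written radix-2 Cooley-Tukey FFT applied to overlapping, windowed frames. The sketch below is illustrative rather than a reference solution: the function names, the Hann window, and the default `frame_len` and `hop` values are all assumptions, and the pure-Python loops favor clarity over speed.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # FFT of the even-indexed samples
    odd = fft(x[1::2])   # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(signal, frame_len=1024, hop=512):
    """Power spectrogram: one row per frame, frame_len // 2 + 1 bins per row."""
    # Periodic Hann window to reduce spectral leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = fft(frame)
        # Keep only the non-negative frequencies of the real-valued input
        frames.append([abs(c) ** 2 for c in spectrum[: frame_len // 2 + 1]])
    return frames
```

Bin `k` of each row corresponds to the frequency `k * sample_rate / frame_len` in Hz, which is the axis the mel conversion then works on.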

**Note:** You are not permitted to use built-in functions to perform the FFT and mel-scale operations mentioned in Part 2.
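
For the mel-scale step, a common approach is to project each power-spectrum frame onto a bank of triangular filters whose centers are equally spaced on the mel axis. The sketch below assumes the widely used formula `mel = 2595 * log10(1 + hz / 700)`; the task description does not name a specific formula, so treat that choice, the function names, and the parameters as assumptions.

```python
import math


def hz_to_mel(hz):
    """Convert Hz to mel (assumed convention: 2595 * log10(1 + hz / 700))."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge points, equally spaced in mel from 0 Hz to Nyquist
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # Center frequency in Hz of every FFT bin
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)    # slope up from lo to mid
            falling = (hi - f) / (hi - mid)   # slope down from mid to hi
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def apply_filterbank(bank, power_frames):
    """Project each power-spectrum frame onto the mel filters."""
    return [[sum(w * p for w, p in zip(row, frame)) for row in bank]
            for frame in power_frames]
```

Here `n_fft` is the frame length used for the spectrogram, so each filter has one weight per frequency bin. Taking the logarithm of the result is a common follow-up step when plotting.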