# Visual Acoustic Fingerprinting
## Context
### Analysis
After cleaning the data, in this notebook we test the framework for developing a visual acoustic fingerprint for each track given a set of its audio features. We categorize features into two broad buckets, `Rhythm & Dynamics` and `Articulation & Texture`. The features that comprise these categories are transformed into polar curves and visualized in a 3-dimensional cartesian space, creating a visual version of an acoustic fingerprint, or `Acoustic Print`. Two planar curves in a print are identical only when their audio features measure at the same value (e.g., danceability = valence = 0.5). Considering one dimension of the acoustic print, it should be rare that two songs match; when considering both `Rhythm & Dynamics` and `Articulation & Texture`, it is rather unlikely that two songs have identical tempo, danceability, valence, energy, speechiness, instrumentalness, and acousticness. As such, the acoustic prints of any two songs may be similar, but each should be effectively unique (especially within the scope of this dataset, given the number of songs considered, ~13k).
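
As a minimal sketch of the claim above, the snippet below builds the planar curve for a single feature using the same form as the `Rhythm & Dynamics` curves later in this notebook, `rho = amplitude * cos(5 * theta * value) + 1`. The helper name `feature_curve` and its default `amplitude` are assumptions made for illustration only; in the notebook the amplitude is the track's tempo, so two curves match only if both the feature value and the tempo match.

```python
import numpy as np

def feature_curve(value, amplitude=1.0, points=1500):
    """Hypothetical helper: planar (x, y) curve for one feature value."""
    theta = np.linspace(0, 48 * np.pi, points)
    # same functional form as the Rhythm & Dynamics curves in this notebook
    rho = amplitude * np.cos(5 * theta * value) + 1
    return rho * np.cos(theta), rho * np.sin(theta)

# two features with the same value (e.g. danceability = valence = 0.5)
x_a, y_a = feature_curve(0.5)
x_b, y_b = feature_curve(0.5)
# a feature with a different value
x_c, y_c = feature_curve(0.3)

same = np.allclose(x_a, x_b) and np.allclose(y_a, y_b)
different = not (np.allclose(x_a, x_c) and np.allclose(y_a, y_c))
print(same, different)
```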

### Background
Descriptions of characteristics used here were taken from the [Spotify Web-API](https://developer.spotify.com/documentation/web-api/reference/get-several-audio-features). The data used in this project is from the Free Music Archive (FMA); see *Data_Cleaning.ipynb* for further detail on downloading the data. Note that the audio features for each track were provided by The Echo Nest, which has since been acquired by Spotify.

Other datasets of interest can be found on [Spotify R&D](https://research.atspotify.com/datasets/). The "WSDM Cup: The Music Streaming Sessions Dataset" holds similar information to that shown below and used in this notebook, but prevents users from linking a given song and its features back to its metadata (i.e., song name, artist, album, release year, genre). As such, it was not used for this work.

### Audio Features
Below is a list and description of the audio features available for a subset of tracks in the FMA dataset.
* **Danceability**: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
* **Valence**: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
* **Energy**: A measure from 0.0 to 1.0 representing perceived intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
* **Tempo**: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
* **Speechiness**: Detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
* **Instrumentalness**: Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
* **Acousticness**: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
* **Liveness**: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
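
The thresholds quoted in the descriptions above can be sketched as small helper functions. The function names are hypothetical, and the handling of values that fall exactly on a threshold is an assumption, since the descriptions only say "above", "between", and "below".

```python
def describe_speechiness(value):
    """Hypothetical helper mapping speechiness to the bands described above."""
    if value > 0.66:
        return "probably entirely spoken words"
    if value >= 0.33:  # assumption: the boundaries belong to the middle band
        return "may contain both music and speech"
    return "most likely music or other non-speech audio"

def is_probably_instrumental(value):
    # values above 0.5 are intended to represent instrumental tracks
    return value > 0.5

def is_probably_live(value):
    # a value above 0.8 provides strong likelihood that the track is live
    return value > 0.8

# example values matching the sample song used later in this notebook
print(describe_speechiness(0.3))
print(is_probably_instrumental(0.5))
print(is_probably_live(0.1))
```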


## Data Preparation

In [68]:
# import relevant libraries
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots

# define helper functions
def cart2pol(x, y):
    """
    Transform cartesian coordinates to polar

    Parameters
    ----------
    x : int, float, np.array
        a numerical type that represents x coordinates in a cartesian plane
    y : int, float, np.array
        a numerical type that represents y coordinates in a cartesian plane
    """
    rho = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)
    return (rho, theta)


def pol2cart(rho, theta):
    """
    Transform polar coordinates to cartesian

    Parameters
    ----------
    rho : int, float, np.array
        a numerical type that represents rho in polar coordinates
    theta : int, float, np.array
        a numerical type that represents theta in polar coordinates
    """
    x = rho * np.cos(theta)
    y = rho * np.sin(theta)
    return (x, y)
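
A quick sanity check of the two helpers, as a sketch: converting cartesian points to polar and back should return the original points. The helpers are restated in compact form so the snippet runs on its own. Note that the reverse round trip (polar to cartesian to polar) is not an identity when `rho` is negative, which can happen for the curves built below; the same point comes back with a positive `rho` and an angle shifted by pi.

```python
import numpy as np

def cart2pol(x, y):
    return np.sqrt(x**2 + y**2), np.arctan2(y, x)

def pol2cart(rho, theta):
    return rho * np.cos(theta), rho * np.sin(theta)

# cartesian -> polar -> cartesian recovers the original points
x = np.array([1.0, 0.0, -2.0, 3.0])
y = np.array([0.0, 1.0, 2.0, -4.0])
rho, theta = cart2pol(x, y)
x_back, y_back = pol2cart(rho, theta)
round_trip_ok = np.allclose(x, x_back) and np.allclose(y, y_back)

# polar -> cartesian -> polar with a negative rho: same point, different (rho, theta)
rho2, theta2 = cart2pol(*pol2cart(-1.0, 0.5))
print(round_trip_ok, rho2, theta2)
```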

In [69]:
# normalize tempo from [12.753, 251.072] --> (0, 10]
tempo = 12.753
normalize = lambda x: ((x - 12.75) / (251.072 - 12.75)) * 10

# set the song parameters
valence, energy, danceability = 0.5, 0.5, 0.3
acousticness, instrumentalness, speechiness = 0.9, 0.5, 0.3
liveness = 0.1

# define number of points to evaluate and where to evaluate them in polar coords
points = 1500
theta = np.linspace(0, 48*np.pi, points)
plane = np.zeros(points)

# compute rho in polar coords for each song parameter
valence_rho = tempo * np.cos(5 * theta * valence) + 1
energy_rho = tempo * np.cos(5 * theta * energy) + 1
danceability_rho = tempo * np.cos(5 * theta * danceability) + 1

acoustic_rho = normalize(tempo) * (np.sin(2 * theta * speechiness) + np.cos(3 * theta * acousticness)) 
instrumental_rho = normalize(tempo) * (np.sin(2 * theta * speechiness) + np.cos(3 * theta * instrumentalness))
speech_rho = normalize(tempo) * (np.sin(2 * theta * speechiness) + np.cos(3 * theta * speechiness))

# convert (rho, theta) in polar coords to (x, y) in cartesian
x_valence, y_valence = pol2cart(valence_rho, theta)
x_energy, y_energy = pol2cart(energy_rho, theta)
x_danceability, y_danceability = pol2cart(danceability_rho, theta)

x_acoustic, y_acoustic = pol2cart(acoustic_rho, theta)
x_instrumental, y_instrumental = pol2cart(instrumental_rho, theta)
x_speech, y_speech = pol2cart(speech_rho, theta)
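
To make the tempo scaling concrete, the sketch below restates the normalization as a named function (`normalize_tempo` is a hypothetical name; the constants are the ones used in the cell above) and evaluates it at the two ends of the tempo range and at an illustrative 120 BPM. Because the lower bound in the formula is 12.75 rather than 12.753, the slowest track maps to a value slightly above zero. This also means that with the sample `tempo = 12.753`, the `Articulation & Texture` curves are scaled by a very small amplitude.

```python
def normalize_tempo(bpm, low=12.75, high=251.072, scale=10):
    """Hypothetical named version of the `normalize` lambda above."""
    return (bpm - low) / (high - low) * scale

slowest = normalize_tempo(12.753)   # lower end of the tempo range
fastest = normalize_tempo(251.072)  # upper end of the tempo range
typical = normalize_tempo(120.0)    # illustrative mid-range tempo

print(slowest, typical, fastest)
```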

In [70]:
# define dataframes for Rhythm & Dynamics and Articulation & Texture
RD_df = pd.DataFrame(
    {
        "Attribute": (["Valence"] * points) + (["Energy"] * points) + (["Danceability"] * points),
        "X": np.concatenate((x_valence, plane, y_danceability)),
        "Y": np.concatenate((y_valence, x_energy, plane)),
        "Z": np.concatenate((plane, y_energy, x_danceability))
    }
)
AT_df = pd.DataFrame(
    {
        "Attribute": (["Acousticness"] * points) + (["Instrumentalness"] * points) + (["Speechiness"] * points),
        "X": np.concatenate((x_acoustic, plane, y_speech)),
        "Y": np.concatenate((y_acoustic, x_instrumental, plane)),
        "Z": np.concatenate((plane, y_instrumental, x_speech))
    }
)
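
Both dataframes above follow the same layout: the first curve is placed in the XY plane, the second in the YZ plane, and the third in the ZX plane, with zeros filling the remaining axis. As a sketch, that pattern could be factored into one helper. The name `stack_planes` and its signature are assumptions for illustration, and it assumes all three curves have the same number of points.

```python
import numpy as np
import pandas as pd

def stack_planes(curves):
    """Hypothetical helper: embed three planar (x, y) curves in the XY, YZ, and ZX planes.

    curves : dict mapping attribute name -> (x, y) arrays, in plane order XY, YZ, ZX
    """
    (n1, (x1, y1)), (n2, (x2, y2)), (n3, (x3, y3)) = curves.items()
    zeros = np.zeros(len(x1))  # assumption: all curves share the same length
    return pd.DataFrame({
        "Attribute": [n1] * len(x1) + [n2] * len(x2) + [n3] * len(x3),
        "X": np.concatenate((x1, zeros, y3)),
        "Y": np.concatenate((y1, x2, zeros)),
        "Z": np.concatenate((zeros, y2, x3)),
    })

# tiny illustrative curves (not real track data)
t = np.linspace(0, 2 * np.pi, 4)
demo = stack_planes({
    "Valence": (np.cos(t), np.sin(t)),
    "Energy": (2 * np.cos(t), 2 * np.sin(t)),
    "Danceability": (3 * np.cos(t), 3 * np.sin(t)),
})
print(demo.shape)
```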

## Acoustic Print

In [71]:
### two independent plots rather than 1 figure with 2 subplots
# fig = px.line_3d(data_frame=RD_df, x="X",y="Y", z="Z", color="Attribute", title="Rhythm & Dynamics")
# fig.update_layout(
#     scene = dict(
#         bgcolor = "rgba(0,0,0,0)",
#     ),
#     scene_aspectmode='cube'
# )
# fig.update_scenes(xaxis_visible=False, yaxis_visible=False,zaxis_visible=False )
# fig.show()

# fig = px.line_3d(data_frame=AT_df, x="X",y="Y", z="Z", color="Attribute")
# fig.update_layout(
#    scene = dict(
#        bgcolor = "rgba(0,0,0,0)",
#    ),
#    scene_aspectmode='cube'
# )
# fig.update_scenes(xaxis_visible=False, yaxis_visible=False,zaxis_visible=False )
# fig.show()

In [72]:
# create figure object with subplots
fig = make_subplots(
    rows=1, 
    cols=2, 
    specs=[[{"type":"scatter3d"}, {"type":"scatter3d"}]],
    horizontal_spacing=.01,
    subplot_titles=["Rhythm & Dynamics", "Articulation & Texture"]
)

# define the first trace (subplot)
t1 = px.line_3d(data_frame=RD_df, x="X",y="Y", z="Z", color="Attribute", width=1600, height=800)
t1.update_traces(legendgroup=1, legendgrouptitle=dict(text="Rhythm & Dynamics"), legend="legend")
fig.add_traces(
    t1.data,
    rows=1,
    cols=1
)

# define the second trace (subplot)
t2 = px.line_3d(data_frame=AT_df, x="X",y="Y", z="Z", color="Attribute", width=1600, height=800)
t2.update_traces(legendgroup=2, legendgrouptitle=dict(text="Articulation & Texture"), legend="legend2")
fig.add_traces(
    t2.data,
    rows=1,
    cols=2
)


fig.update_layout(
    scene = dict(
        bgcolor = "rgba(0,0,0,0)",
    ),
    scene_aspectmode='cube',
    margin=dict(l=20, r=20, t=10, b=0),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=0.1,
        xanchor="center",
        x=0.5,
        tracegroupgap=10
    ),
    
)
fig.layout.annotations[0].update(y=0.85)
fig.layout.annotations[1].update(y=0.85)

fig.update_scenes(xaxis_visible=False, yaxis_visible=False, zaxis_visible=False)

fig.show()

## Acoustic Radar

In [119]:
# create radar plot for song
radar = pd.DataFrame({
    # song values, followed by placeholder "Catalogue" values (random perturbations of the song's values)
    "Rho": [valence, energy, danceability, acousticness, instrumentalness, speechiness, liveness]
    + [
        valence + np.random.uniform(low=-.5, high=.2), energy + np.random.uniform(low=-.5, high=.5),
        danceability + np.random.uniform(low=-.2, high=.2), acousticness,
        instrumentalness + np.random.uniform(low=-.2, high=.2), speechiness, liveness,
    ],
    # labels are listed in the same order as the values in "Rho"
    "Theta": ["Valence", "Energy", "Danceability", "Acousticness", "Instrumentalness", "Speechiness", "Liveness"] * 2,
    "Group": ["Song"] * 7 + ["Catalogue"] * 7
})

fig = px.line_polar(
    data_frame=radar,
    r="Rho",
    theta="Theta",
    #text="rho",
    line_close=True,
    markers=True,
    color="Group",
    line_shape="spline",

)
fig.update_traces(fill='toself', textposition="top center")
fig.update_layout(
    scene = dict(
        bgcolor="rgba(0,0,0,0)",
    ),
    margin=dict(l=20, r=20, t=20, b=20),
    legend=dict(
        #orientation="h",
        yanchor="middle",
        y=0.5,
        #xanchor="center",
        x=0.8,
        #tracegroupgap=10
    ),
    
)

fig.show()