## Convert JSON Objects into Data Frames

The data was collected by sending `curl` requests to the [Spotify Web API](https://developer.spotify.com/documentation/web-api/reference/) from my own terminal and saving the JSON responses.
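For reference, here is a rough sketch of how the request URLs for the top items endpoint can be built. The helper name `top_items_url` and the `limit=50` default are my own assumptions for illustration; the actual `curl` commands are not recorded in this notebook, and each request also needs an `Authorization: Bearer <token>` header.

```python
from urllib.parse import urlencode

API_BASE = "https://api.spotify.com/v1"


def top_items_url(item_type, time_range, limit=50):
    """Build the URL for the current user's top artists or tracks.

    Hypothetical helper: the name and the default limit are assumptions.
    """
    if item_type not in ("artists", "tracks"):
        raise ValueError("item_type must be 'artists' or 'tracks'")
    query = urlencode({"time_range": time_range, "limit": limit})
    return f"{API_BASE}/me/top/{item_type}?{query}"


# One URL per combination of item type and time range.
urls = {
    (item_type, term): top_items_url(item_type, term)
    for item_type in ("artists", "tracks")
    for term in ("short_term", "medium_term", "long_term")
}
```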

### Read JSON Files

In [27]:
import json

import pandas as pd  # used below to build the data frames

In [40]:
time_range = ['short_term', 'medium_term', 'long_term']

##### Tracks

In [42]:
track_data_json = {}
for term in time_range:
    with open(f"data/top_tracks_{term}.json") as json_file:
        data = json.load(json_file)
    track_data_json[term] = data

##### Artists

In [48]:
artists_data_json = {}
for term in time_range:
    with open(f"data/top_artists_{term}.json") as json_file:
        data = json.load(json_file)
    artists_data_json[term] = data
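The two loops above differ only in the file name, so they could be folded into one helper. This is just a sketch: `load_top_items` and its `data_dir` parameter are names I made up, and it assumes the same `top_<kind>_<time_range>.json` naming used above.

```python
import json
from pathlib import Path

TIME_RANGES = ["short_term", "medium_term", "long_term"]


def load_top_items(kind, data_dir="data", time_ranges=TIME_RANGES):
    """Return {time_range: parsed JSON} for files named top_<kind>_<time_range>.json."""
    loaded = {}
    for term in time_ranges:
        path = Path(data_dir) / f"top_{kind}_{term}.json"
        with path.open() as json_file:
            loaded[term] = json.load(json_file)
    return loaded
```

With this in place, `load_top_items("tracks")` and `load_top_items("artists")` would produce the same dictionaries as the two cells above.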

### Organize Artists and Tracks Data

Convert the `items` list in each dictionary into a pandas DataFrame and save it as a CSV file.

In [53]:
for term in time_range:
    data = artists_data_json[term]
    df = pd.DataFrame(data['items'])
    df.to_csv(f'data/top_artists_{term}.csv')

In [57]:
for term in time_range:
    data = track_data_json[term]
    df = pd.DataFrame(data['items'])
    df.to_csv(f'data/top_tracks_{term}.csv')

In [58]:
top_artists_short_term_df = pd.read_csv('data/top_artists_short_term.csv')
top_artists_medium_term_df = pd.read_csv('data/top_artists_medium_term.csv')
top_artists_long_term_df = pd.read_csv('data/top_artists_long_term.csv')

In [59]:
top_tracks_short_term_df = pd.read_csv('data/top_tracks_short_term.csv')
top_tracks_medium_term_df = pd.read_csv('data/top_tracks_medium_term.csv')
top_tracks_long_term_df = pd.read_csv('data/top_tracks_long_term.csv')

### Keep Popularity and Genres for Artists

In [134]:
artist_features = ['genres', 'name', 'popularity', 'id']

In [135]:
top_artists_short_term_reduced = top_artists_short_term_df[artist_features]
top_artists_medium_term_reduced = top_artists_medium_term_df[artist_features]
top_artists_long_term_reduced = top_artists_long_term_df[artist_features]

In [138]:
top_artists_short_term_reduced.to_csv('data/artists_short_term_reduced.csv')
top_artists_medium_term_reduced.to_csv('data/artists_medium_term_reduced.csv')
top_artists_long_term_reduced.to_csv('data/artists_long_term_reduced.csv')
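One thing to watch for: because these frames were reloaded from CSV, each `genres` cell is a string that looks like a list, not an actual list. A minimal sketch of parsing it back with `ast.literal_eval`, using a made-up two-row frame in place of the real artist data:

```python
import ast
import io

import pandas as pd

# Toy stand-in for the artists frame; the rows are made up for illustration.
original = pd.DataFrame({
    "name": ["Artist A", "Artist B"],
    "genres": [["pop", "indie pop"], []],
})

# Round-trip through CSV in memory, mirroring to_csv followed by read_csv.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# After the round trip every genres cell is a string such as "['pop', 'indie pop']".
genres_are_strings = all(isinstance(g, str) for g in reloaded["genres"])

# ast.literal_eval parses the string back into a Python list without running code.
reloaded["genres"] = reloaded["genres"].apply(ast.literal_eval)
```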

### Get Musical Features from Track IDs

We'll use the spotipy library to do this.

In [64]:
import spotipy
from config import get_spotipy_client

In [75]:
sp = get_spotipy_client()

In [82]:
short_term_audio_features = sp.audio_features(tracks=list(top_tracks_short_term_df['id']))
medium_term_audio_features = sp.audio_features(tracks=list(top_tracks_medium_term_df['id']))
long_term_audio_features = sp.audio_features(tracks=list(top_tracks_long_term_df['id']))

In [84]:
short_term_audio_features_df = pd.DataFrame(short_term_audio_features)
medium_term_audio_features_df = pd.DataFrame(medium_term_audio_features)
long_term_audio_features_df = pd.DataFrame(long_term_audio_features)

#### Add name column for audio features

In [88]:
short_term_audio_features_df['name'] = top_tracks_short_term_df['name']
medium_term_audio_features_df['name'] = top_tracks_medium_term_df['name']
long_term_audio_features_df['name'] = top_tracks_long_term_df['name']
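The assignment above matches names to features by row position, which works as long as both frames are in the same order. If that ever stops being true, joining on the track `id` is a safer option. A small sketch with made-up ids and values:

```python
import pandas as pd

# Toy frames standing in for the top tracks and their audio features.
tracks = pd.DataFrame({
    "id": ["a1", "b2", "c3"],
    "name": ["Song A", "Song B", "Song C"],
})
features = pd.DataFrame({
    "id": ["c3", "a1", "b2"],  # deliberately in a different order
    "energy": [0.9, 0.4, 0.7],
})

# A left merge keeps every feature row and looks up its name by id.
features_named = features.merge(tracks[["id", "name"]], on="id", how="left")
```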

In [132]:
short_term_audio_features_df.to_csv('data/audio_features_short.csv')
medium_term_audio_features_df.to_csv('data/audio_features_medium.csv')
long_term_audio_features_df.to_csv('data/audio_features_long.csv')

#### All Musical Features

Combine the audio features from all three time ranges and remove duplicate tracks.

In [90]:
all_audio_features_df = pd.concat([short_term_audio_features_df, medium_term_audio_features_df, long_term_audio_features_df])

In [93]:
all_audio_features_df = all_audio_features_df.drop_duplicates(subset='name')
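Note that dropping duplicates by `name` also merges different tracks that happen to share a title. Whether that matters depends on the data; if it does, deduplicating on `id` keeps them apart. A toy comparison (the ids and names are made up):

```python
import pandas as pd

short = pd.DataFrame({"id": ["a1", "b2"], "name": ["Intro", "Song B"]})
long = pd.DataFrame({"id": ["a1", "c3"], "name": ["Intro", "Intro"]})

combined = pd.concat([short, long], ignore_index=True)

# By name: the second "Intro" (id c3) is treated as a duplicate and dropped.
by_name = combined.drop_duplicates(subset="name")

# By id: only the repeated a1 row is dropped, so c3 survives.
by_id = combined.drop_duplicates(subset="id")
```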

#### Get Averages of Features

Do this for each time range and for all tracks combined.

In [102]:
audio_features = ['danceability', 'energy', 'loudness', 'speechiness', 'acousticness', 'instrumentalness', 'tempo', 'liveness', 'valence']

In [103]:
audio_features_averages_short = short_term_audio_features_df[audio_features].mean()
audio_features_averages_medium = medium_term_audio_features_df[audio_features].mean()
audio_features_averages_long = long_term_audio_features_df[audio_features].mean()
audio_features_averages_all = all_audio_features_df[audio_features].mean()

In [129]:
audio_features_averages_df = pd.DataFrame([dict(audio_features_averages_short), dict(audio_features_averages_medium), dict(audio_features_averages_long), dict(audio_features_averages_all)])
audio_features_averages_df['time_frame'] = ['short_term', 'medium_term', 'long_term', 'all']
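The same table can be built in one step from a dictionary of frames, which keeps each label attached to its row. This is only a sketch with two toy frames and two features; `frames` and `feature_columns` are illustrative names, not variables from this notebook.

```python
import pandas as pd

feature_columns = ["energy", "tempo"]

# Toy frames standing in for the per-time-range audio feature frames.
frames = {
    "short_term": pd.DataFrame({"energy": [0.25, 0.75], "tempo": [100.0, 120.0]}),
    "long_term": pd.DataFrame({"energy": [0.5, 0.5], "tempo": [90.0, 110.0]}),
}

# Each mean() is a Series indexed by feature; transposing gives one row per time frame.
averages = pd.DataFrame(
    {label: frame[feature_columns].mean() for label, frame in frames.items()}
).T
averages.index.name = "time_frame"
averages = averages.reset_index()
```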

In [131]:
audio_features_averages_df.to_csv('data/audio_features_averages.csv')

### Data So Far...

So far we have the audio features for each track, the averages of those features for each time range and overall, my top artists, and each artist's popularity index and genres.