## Questions I'm curious about

- am I listening to full albums?
- what decade do I listen to most?
- what are my most listened to genres?
- what are my most listened to albums?
- are songs I'm listening to on my playlists or in my library?
- can I tell someone's mood that day based on what they are listening to?

## Info I want about every song

- Track ID
- Track Name
- Artist ID
- Artist Name
- Length Of Track
- Genres List of Track
- End Time of Stream
- Time of Stream
- % of Song Listened To
- Is Explicit
- Disc Number
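
"% of Song Listened To" does not appear as a field in either data source, so it probably has to be derived. A minimal sketch, assuming it can be computed by dividing `msPlayed` from the streaming history by `duration_ms` from the track lookup. The helper name `percent_listened` and the cap at 100 are my own assumptions, and the numbers are made up:

```python
def percent_listened(ms_played, duration_ms):
    """Return the share of a track that was played, as a percentage from 0 to 100."""
    if duration_ms <= 0:
        # Guard against missing or zero track lengths.
        return 0.0
    # Assumption: cap at 100 in case the logged play time exceeds the track length.
    return min(100.0, 100.0 * ms_played / duration_ms)

# Made-up numbers: 30 seconds played of a 2 minute track.
print(percent_listened(30000, 120000))
```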

## Things I've done so far

- Spotipy authorization and scope.
    - Scope defines what data you're allowed to pull with that API request
    - I set environment variables for my client ID and secret, which are needed for at least some authorization
    - Does all authorization require scope?
- Spotify Data Request
    - They only retain 1 year's worth of streaming data
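
On the scope question: as far as I understand Spotify's flows, the Client Credentials flow takes no scope and only reaches endpoints that aren't tied to a user (search, artist, track, album), while user data such as recently played needs the OAuth flow with a scope. A rough sketch, assuming the `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET`, and `SPOTIPY_REDIRECT_URI` environment variables that Spotipy reads by default. `missing_credentials` and `make_client` are hypothetical helper names:

```python
import os

# Environment variables Spotipy reads by default. The redirect URI is only
# needed for the user-authorized (OAuth) flow.
APP_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
USER_VARS = APP_VARS + ("SPOTIPY_REDIRECT_URI",)

def missing_credentials(scope=None, env=None):
    """List the environment variables still unset for the chosen flow (hypothetical helper)."""
    env = os.environ if env is None else env
    required = USER_VARS if scope else APP_VARS
    return [name for name in required if not env.get(name)]

def make_client(scope=None):
    """Build a client: no scope -> app-only auth, a scope -> user-authorized auth."""
    # Imported here so the credential check above runs without spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth

    if scope:
        auth_manager = SpotifyOAuth(scope=scope)
    else:
        auth_manager = SpotifyClientCredentials()
    return spotipy.Spotify(auth_manager=auth_manager)
```

Calling `missing_credentials(scope="user-read-recently-played")` before `make_client` gives a readable list of what is unset instead of an authorization error partway through a run.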

# Initialize

In [1]:
import json
import os
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials,SpotifyOAuth
import pandas as pd
from datetime import date, datetime
import time
from dateutil import tz

## Pick your auth version

In [3]:
user_id = 'brian.cross741'
auth_manager = SpotifyClientCredentials()
sp = spotipy.Spotify(auth_manager=auth_manager)
#sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))

# Basic Functions

In [4]:
# opens JSON files provided by the Spotify data retrieval
def get_json_from_file(spotify_file):
    # the context manager closes the file even if json.load raises
    with open('MyData/{}.json'.format(spotify_file), encoding="utf8") as f:
        return json.load(f)

# searches tracks for any string and returns the artist_id of the first artist on the top result
def search_for_artist_id(name):
    result = sp.search(name)
    artist_id = result['tracks']['items'][0]['artists'][0]['id']
    return artist_id

# searches tracks for any string and returns the track_id of the top result
def search_for_track_id(name):
    result = sp.search(name)
    track_id = result['tracks']['items'][0]['id']
    return track_id

In [5]:
print(search_for_artist_id('Dopapod'))
print(search_for_track_id('Statesboro Blues - Live At Fillmore East, March 13, 1971'))

6ur6SxSBdRLBgehOIT2iwX
1BnfAYVesByA18sA9EAGiI
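
Both helpers run a track search, so the artist ID comes from the first artist on the top track hit, and a search with no hits raises an `IndexError`. A hedged alternative, assuming the response shape returned by `sp.search(q=name, type='artist')`, where the hits sit under `artists` then `items`. `first_artist_id` is a made-up name, and the sample dict is hand-built with a placeholder ID rather than real API output:

```python
def first_artist_id(search_result):
    """Return the ID of the top artist hit, or None when the search found nothing."""
    items = search_result.get('artists', {}).get('items', [])
    return items[0]['id'] if items else None

# Hand-built stand-in for sp.search(q='Dopapod', type='artist', limit=1);
# the ID is a placeholder, not a real Spotify ID.
sample = {'artists': {'items': [{'id': 'example_artist_id', 'name': 'Dopapod'}]}}
print(first_artist_id(sample))
print(first_artist_id({'artists': {'items': []}}))
```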


# Examine streaming data

In [6]:
def get_artist_data(id):
    result = sp.artist(id)
    subset = dict((k, result[k]) for k in ('id', 'name', 'genres', 'popularity'))
    subset['followers'] = result['followers']['total']
    return subset
print('BASIC ARTIST DATA')    
print(get_artist_data('5Pb27ujIyYb33zBqVysBkj'))

print('-----------------------------')

def get_track_data(id):
    result = sp.track(id)
    subset = dict((k, result[k]) for k in ('id', 'name', 'duration_ms', 'popularity', 'disc_number', 'track_number', 'explicit', 'type'))
    subset['album_id'] = result['album']['id']
    subset['artist_id'] = result['artists'][0]['id']
    subset['artist_count'] = len(result['artists'])
    return subset
print('BASIC TRACK DATA')
print(get_track_data('7D5gkUVhkLbe5e8qG1NqcZ'))

print('-----------------------------')

def get_album_data(id):
    result = sp.album(id)
    subset = dict((k, result[k]) for k in ('id', 'name', 'album_type', 'release_date', 'release_date_precision', 'popularity', 'total_tracks', 'genres', 'label', 'type'))
    return subset
print('BASIC ALBUM DATA')
print(get_album_data('4EAehCii5lZgeewct1LA5p'))

BASIC ARTIST DATA
{'id': '5Pb27ujIyYb33zBqVysBkj', 'name': 'RÜFÜS DU SOL', 'genres': ['australian electropop', 'indietronica'], 'popularity': 70, 'followers': 1448315}
-----------------------------
BASIC TRACK DATA
{'id': '7D5gkUVhkLbe5e8qG1NqcZ', 'name': 'Brighter', 'duration_ms': 281084, 'popularity': 60, 'disc_number': 1, 'track_number': 1, 'explicit': False, 'type': 'track', 'album_id': '4EAehCii5lZgeewct1LA5p', 'artist_id': '5Pb27ujIyYb33zBqVysBkj', 'artist_count': 1}
-----------------------------
BASIC ALBUM DATA
{'id': '4EAehCii5lZgeewct1LA5p', 'name': 'Bloom', 'album_type': 'album', 'release_date': '2016-01-22', 'release_date_precision': 'day', 'popularity': 74, 'total_tracks': 11, 'genres': [], 'label': 'Sweat It Out', 'type': 'album'}
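
For the decade question, the `release_date` field in the album data looks like enough. A small sketch, assuming `release_date` always starts with a four-digit year whatever `release_date_precision` says (`'year'`, `'month'`, or `'day'`). `release_decade` is a hypothetical helper:

```python
def release_decade(release_date):
    """Map a release date string such as '2016-01-22' or '1971' to its decade."""
    # Assumption: the first four characters are always the year.
    year = int(release_date[:4])
    return year - year % 10

print(release_decade('2016-01-22'))  # 2010
```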


# Parse recently listened for saving

In [64]:

###
# opens all the recently played files and finds the most recent timestamp, converting it to local time
# returns unix timestamp and a datetime for the most recently played track in the logs
###
def get_most_recent_timestamp():
    # folder containing saved files of recently played
    path = 'recently_played/'
    recently_played_files = os.listdir(path)

    total_max_played = ''
    total_max_played_unix = 0
    for i in recently_played_files:
        current_max_played = pd.read_csv(path+i)['played_at'].max()
        utc = datetime.strptime(current_max_played, "%Y-%m-%dT%H:%M:%S.%fZ")
        current_max_played_unix = int(time.mktime(utc.timetuple())*1000)
    
        if current_max_played_unix > total_max_played_unix:
            total_max_played_unix = current_max_played_unix
            total_max_played = utc
    
    from_zone = tz.tzutc()
    to_zone = tz.tzlocal()
    total_max_played = total_max_played.replace(tzinfo=from_zone)
    local_total_max_played = total_max_played.astimezone(to_zone)
    local_total_max_played_unix = int(time.mktime(local_total_max_played.timetuple())*1000)
    return local_total_max_played_unix, local_total_max_played

###
# runs through the results of recently played and flattens the dictionary
# returns a cleaned dictionary of recently played
###
def parse_recently_played(recently_played_item):
    result = recently_played_item['track']
    subset = dict((k, result[k]) for k in ('id', 'name', 'duration_ms', 'popularity', 'disc_number', 'track_number', 'explicit', 'type'))
    subset['album_id'] = result['album']['id']
    subset['artist_id'] = result['artists'][0]['id']
    subset['artist_count'] = len(result['artists'])
    subset['played_at'] = recently_played_item['played_at']
    return subset

###
# intakes either None or a unix timestamp which will call all records occurring after it
# returns a cleaned dictionary of recently played
###
def call_recently_played(after_timestamp_unix):
    # set permissions, the scope is needed for the API to be allowed to grab this data.
    scope = "user-read-recently-played"
    sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))
    
    # call spotify
    recently_played = sp.current_user_recently_played(limit=50, after=after_timestamp_unix)

    # parse data and return a shorter, flattened dictionary with track data
    recently_played_short = []
    for i in recently_played['items']:
        subset = parse_recently_played(i)
        recently_played_short.append(subset)
    return recently_played_short

###
# runs through the recently played directory, finds the most recent timestamp, and calls recently played for the new, unsaved records
# returns a cleaned dataframe of non-recorded recently played
###
def get_new_recently_played():
    after_timestamp = get_most_recent_timestamp()[0]
    recently_played = call_recently_played(after_timestamp+1000)
    df = pd.DataFrame(data=recently_played)
    return df

###
# intakes a dataframe and saves it as a log in the recently played folder
# tells you the timestamp used to save
###
def save_new_recently_played(df):
    dt_string = datetime.now().strftime("%Y%m%d_%H_%M_%S")
    if len(df) > 0:
        df.to_csv(f'recently_played/recently_played_{dt_string}.csv', index=False)
        print('saved file: {}'.format(dt_string))

In [65]:
get_most_recent_timestamp()

(1697382750000,
 datetime.datetime(2023, 10, 15, 11, 12, 30, 572000, tzinfo=tzlocal()))
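
`get_most_recent_timestamp` goes through `time.mktime` twice, which depends on the machine's local timezone and drops the milliseconds. A simpler sketch, assuming `played_at` always has the `%Y-%m-%dT%H:%M:%S.%fZ` shape seen in the logs. `played_at_to_unix_ms` is a hypothetical helper, not part of Spotipy:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def played_at_to_unix_ms(played_at):
    """Convert a UTC played_at string to Unix milliseconds, independent of local timezone."""
    # Assumption: the string always ends in fractional seconds followed by 'Z'.
    dt = datetime.strptime(played_at, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding on the milliseconds.
    return (dt - EPOCH) // timedelta(milliseconds=1)

print(played_at_to_unix_ms('2023-10-15T15:12:30.572Z'))  # 1697382750572
```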

In [66]:
new_recently_played = get_new_recently_played()

In [67]:
len(new_recently_played)

0

In [68]:
new_recently_played.head(5)

In [20]:
#save_new_recently_played(new_recently_played)

saved file: 20231015_11_14_03


# Issues

Recently played isn't returning records after a millisecond timestamp; I think it works on more of a 12-hour basis. It's probably best to call this a couple of times per day.
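
If the endpoint gets polled a couple of times per day without a reliable `after` filter, consecutive pulls can overlap. A minimal sketch for merging pulls without double counting, assuming a play is uniquely identified by its track `id` plus `played_at`. `merge_plays` and the sample rows are illustrative only:

```python
def merge_plays(*pulls):
    """Combine several lists of play dicts, dropping repeats and sorting by played_at."""
    seen = {}
    for pull in pulls:
        for play in pull:
            # Assumption: (id, played_at) is unique per play; keep the first copy seen.
            seen.setdefault((play['id'], play['played_at']), play)
    # ISO 8601 UTC strings of the same shape sort chronologically as plain strings.
    return sorted(seen.values(), key=lambda play: play['played_at'])

# Made-up pulls that overlap on track 'b'.
morning = [{'id': 'a', 'played_at': '2023-10-12T13:00:00.000Z'},
           {'id': 'b', 'played_at': '2023-10-12T13:04:00.000Z'}]
evening = [{'id': 'b', 'played_at': '2023-10-12T13:04:00.000Z'},
           {'id': 'c', 'played_at': '2023-10-12T22:30:00.000Z'}]
print([play['id'] for play in merge_plays(morning, evening)])  # ['a', 'b', 'c']
```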

### Save Recently Played

In [142]:
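# call_recently_played is the function defined above; datetime.now() replaces the undefined `now`
recently_played = call_recently_played(None)
df = pd.DataFrame(data=recently_played)
dt_string = datetime.now().strftime("%Y%m%d_%H_%M_%S")
df.to_csv(f'recently_played/recently_played_{dt_string}.csv', index=False)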

### Get last played timestamp

In [265]:
get_most_recent_timestamp()

1697033184000

In [231]:
max_played_time = pd.read_csv('recently_played/recently_played_20231012_09_30_45.csv')['played_at'].max()
unix_timestamp = int(time.mktime(datetime.strptime(max_played_time, "%Y-%m-%dT%H:%M:%S.%fZ").timetuple())*1000)

In [236]:
recently_played_short

[{'id': '4HIKcEKSijQLW5YNLsdLzt',
  'name': 'X',
  'duration_ms': 118160,
  'popularity': 60,
  'disc_number': 1,
  'track_number': 5,
  'explicit': False,
  'type': 'track',
  'album_id': '6jWde94ln40epKIQCd8XUh',
  'artist_id': '5eAWCfyUhZtHHtBdNk56l1',
  'artist_count': 1,
  'played_at': '2023-10-12T15:52:03.105Z'}]

In [None]:
recently_played_short

In [182]:
max_played_time

'2023-10-11T10:06:24.654Z'

In [175]:
type(unix_timestamp)

float

In [177]:
int(unix_timestamp)

1697033184

In [229]:
recently_played = call_recently_played(unix_timestamp)

In [230]:
recently_played

[]

In [148]:
df

'2023-10-11T10:06:24.654Z'

In [86]:
for i in result['items']:
    track = i['track']
    print(list(track))
    break

['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri']


In [79]:
subset = dict((k, result[k]) for k in ('id', 'name', 'duration_ms', 'popularity', 'disc_number', 'track_number', 'explicit', 'type'))
# subset['album_id'] = result['album']['id']
# subset['artist_id'] = result['artists'][0]['id']
# subset['artist_count'] = len(result['artists'])

KeyError: 'id'
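
The `KeyError: 'id'` is what I'd expect if `result` was still the whole response rather than a single track: the track fields sit under `result['items'][n]['track']`, as the key listing from the loop shows. A sketch with a trimmed, hand-built response (not real API output). `flatten_item` is a hypothetical name:

```python
TRACK_KEYS = ('id', 'name', 'duration_ms', 'explicit')

def flatten_item(item):
    """Pull a few track fields out of one entry of result['items']."""
    track = item['track']
    # .get returns None instead of raising when a key is absent.
    flat = {key: track.get(key) for key in TRACK_KEYS}
    flat['played_at'] = item.get('played_at')
    return flat

# Hand-built stand-in for an API response, with placeholder values.
result = {'items': [{'track': {'id': 'track_1', 'name': 'Example',
                               'duration_ms': 1000, 'explicit': False},
                     'played_at': '2023-10-12T15:52:03.105Z'}]}

print('id' in result)                           # False: the top level only has 'items'
print(flatten_item(result['items'][0])['id'])   # track_1
```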

In [12]:
for i in results['items']:
    track = i['track']
    track_info = {} 
    track_name = track['name']
    track_id = track['id']
    artist_name = track

[{'track': {'album': {'album_type': 'album',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/5Pb27ujIyYb33zBqVysBkj'},
      'href': 'https://api.spotify.com/v1/artists/5Pb27ujIyYb33zBqVysBkj',
      'id': '5Pb27ujIyYb33zBqVysBkj',
      'name': 'RÜFÜS DU SOL',
      'type': 'artist',
      'uri': 'spotify:artist:5Pb27ujIyYb33zBqVysBkj'}],
    'available_markets': ['AR',
     'AU',
     'AT',
     'BE',
     'BO',
     'BR',
     'BG',
     'CA',
     'CL',
     'CO',
     'CR',
     'CY',
     'CZ',
     'DK',
     'DO',
     'DE',
     'EC',
     'EE',
     'SV',
     'FI',
     'FR',
     'GR',
     'GT',
     'HN',
     'HK',
     'HU',
     'IS',
     'IE',
     'IT',
     'LV',
     'LT',
     'LU',
     'MY',
     'MT',
     'MX',
     'NL',
     'NZ',
     'NI',
     'NO',
     'PA',
     'PY',
     'PE',
     'PH',
     'PL',
     'PT',
     'SG',
     'SK',
     'ES',
     'SE',
     'CH',
     'TW',
     'TR',
     'UY',
     'US',
     'AD',
 

In [None]:
for idx, item in enumerate(results['items']):
    track = item['track']
    print(idx, track['artists'][0]['name'], " – ", track['name'])

### Streaming Data

In [16]:
streaming_data = get_json_from_file('StreamingHistory0')

In [184]:
len(streaming_data)

7886

In [19]:
df_test = pd.DataFrame(data=streaming_data)

In [20]:
df_test

| | endTime | artistName | trackName | msPlayed |
|---|---|---|---|---|
| 0 | 2022-08-27 22:36 | Dopapod | Present Ghosts | 35060 |
| 1 | 2022-08-28 22:09 | My Brother, My Brother And Me | MBMBaM 624: Reince Your Own Priebus | 812280 |
| 2 | 2022-08-28 22:09 | Jaimie Branch | Jump Off | 2089 |
| 3 | 2022-08-28 22:09 | Dopapod | Present Ghosts | 13631 |
| 4 | 2022-08-28 22:15 | Jaimie Branch | Leaves of Glass | 39979 |
| ... | ... | ... | ... | ... |
| 7881 | 2023-08-28 13:42 | The Replacements | Beer for Breakfast | 2060 |
| 7882 | 2023-08-28 13:42 | Living Colour | Cult of Personality | 1350 |
| 7883 | 2023-08-28 13:42 | Intronaut | Killing Birds With Stones | 7500 |
| 7884 | 2023-08-28 13:42 | Electric Wizard | Funeralopolis | 267255 |
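
For the "most listened to" questions, total `msPlayed` per artist is one possible measure. A sketch using only the standard library and the first five rows of the streaming history shown here. `top_artists` is a hypothetical helper, and ranking by play time rather than play count is an assumption:

```python
from collections import Counter

def top_artists(rows, n=3):
    """Rank artists by total milliseconds played across streaming history rows."""
    totals = Counter()
    for row in rows:
        totals[row['artistName']] += row['msPlayed']
    return totals.most_common(n)

# First five rows of the streaming history, copied by hand.
rows = [
    {'artistName': 'Dopapod', 'trackName': 'Present Ghosts', 'msPlayed': 35060},
    {'artistName': 'My Brother, My Brother And Me',
     'trackName': 'MBMBaM 624: Reince Your Own Priebus', 'msPlayed': 812280},
    {'artistName': 'Jaimie Branch', 'trackName': 'Jump Off', 'msPlayed': 2089},
    {'artistName': 'Dopapod', 'trackName': 'Present Ghosts', 'msPlayed': 13631},
    {'artistName': 'Jaimie Branch', 'trackName': 'Leaves of Glass', 'msPlayed': 39979},
]
print(top_artists(rows, n=1))  # [('My Brother, My Brother And Me', 812280)]
```

Podcast episodes land in the same file, so a real run would likely need to filter them out first.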


In [None]:
list_index = 0
for i in playlist_data:
    print(i['name'],'index:',list_index)
    list_index += 1

# Archive

### Datetime handling

In [26]:
utc = datetime.strptime(current_max_played, "%Y-%m-%dT%H:%M:%S.%fZ")
from_zone = tz.tzutc()
to_zone = tz.tzlocal()
utc = utc.replace(tzinfo=from_zone)
local = utc.astimezone(to_zone)
local_unix = int(time.mktime(local.timetuple())*1000)

### List Json File Names

In [16]:
path = "C://Users//Brian//Documents//python//spotify//MyData//"
dir_list = os.listdir(path)
print(dir_list)

['DuoNewFamily.json', 'Follow.json', 'Identifiers.json', 'Inferences.json', 'Marquee.json', 'Payments.json', 'Playlist1.json', 'Read_Me_First.pdf', 'SearchQueries.json', 'StreamingHistory0.json', 'UserAddress.json', 'Userdata.json', 'YourLibrary.json']


In [129]:
# Get the list of all files and directories
path = "C://Users//Brian//Documents//python//spotify//MyData//"
spotify_file_list = os.listdir(path)

# prints all files
for i in spotify_file_list:
    print(i.replace('.json', ''))

DuoNewFamily
Follow
Identifiers
Inferences
Marquee
Payments
Playlist1
Read_Me_First.pdf
SearchQueries
StreamingHistory0
UserAddress
Userdata
YourLibrary


### Simple Query From Spotipy Docs

In [6]:
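scope = "user-library-read"
# saved tracks are user data, so the client needs the OAuth flow with this scope
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))

results = sp.current_user_saved_tracks()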
for idx, item in enumerate(results['items']):
    track = item['track']
    print(idx, track['artists'][0]['name'], " – ", track['name'])

0 Slothrust  –  Pony
1 The Steeldrivers  –  East Kentucky Home
2 Norman Greenbaum  –  Spirit in the Sky
3 Redbone  –  Come and Get Your Love (Rerecorded Version)
4 Blue Swede  –  Hooked On A Feeling
5 Rupert Holmes  –  Escape (The Pina Colada Song)
6 Plague Vendor  –  Ox Blood
7 Art Tatum  –  Blue Skies
8 The Dillards  –  Dooley
9 Matroda  –  Gimme Some Keys
10 FJAAK  –  Plan of Escape
11 Chris Lorenzo  –  MAMI
12 Dreamville  –  Ma Boy
13 Ethereal  –  Beef
14 Freddie Gibbs  –  Couldn’t Be Done (feat. Kelly Price)
15 Quindon Tarver  –  Everybody's Free (To Wear Sunscreen)
16 Freddie Gibbs  –  Something to Rap About (feat. Tyler, The Creator)
17 Freddie Gibbs  –  Dark Hearted
18 Freddie Gibbs  –  PYS (feat. DJ Paul)
19 Freddie Gibbs  –  Feel No Pain (feat. Anderson .Paak & Raekwon)


### Playlist Info

In [None]:
# print information about playlist names and IDs
# print(print_playlist_names(user_id))

def print_playlist_names(user_id):
    playlists = sp.user_playlists(user_id)
    while playlists:
        for i, playlist in enumerate(playlists['items']):
            print("%4d %s %s" % (i + 1 + playlists['offset'], playlist['uri'],  playlist['name']))
        if playlists['next']:
            playlists = sp.next(playlists)
        else:
            playlists = None
        


# Read JSON File Version

def get_playlist_data_old():
    playlist_data = get_json_from_file('Playlist1')['playlists']
    print('Number of playlists: {}'.format(len(playlist_data)))
    print('')
    print('List Playlist Names:')
    list_index = 0
    for i in playlist_data:
        print(i['name'],'index:',list_index)
        list_index += 1
    return playlist_data
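
For the question of whether streamed songs are on a playlist, the exported playlist file can be turned into a lookup set. A sketch, assuming each playlist has an `items` list whose `track` entry carries `trackName` and `artistName` and may be `None` for episodes or local files. It also assumes that matching on artist plus track name is good enough, since the streaming history has no track IDs. `playlist_track_keys` is a hypothetical helper, and the sample data is hand-built:

```python
def playlist_track_keys(playlists):
    """Collect (artistName, trackName) pairs for every track on any playlist."""
    keys = set()
    for playlist in playlists:
        for item in playlist.get('items', []):
            track = item.get('track')
            if track:  # assumption: non-track items have track set to None
                keys.add((track['artistName'], track['trackName']))
    return keys

# Hand-built stand-in for get_json_from_file('Playlist1')['playlists'].
playlists = [{'name': 'Example',
              'items': [{'track': {'trackName': 'Song A', 'artistName': 'Artist A'},
                         'episode': None},
                        {'track': None, 'episode': 'placeholder'}]}]

keys = playlist_track_keys(playlists)
print(('Artist A', 'Song A') in keys)  # True
print(('Artist B', 'Song B') in keys)  # False
```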

In [111]:
playlists_data = get_playlist_data_old()

playlists_data[2]

Number of playlists: 42

List Playlist Names:
My recommendation playlist index: 0
Happy Birthday Mama! index: 1
Liked em index: 2
Listen List index: 3
Love songs index: 4
Karaoke Mix index: 5
Wedding index: 6
Dancing index: 7
Bacci Boys index: 8
Chilll index: 9
Summah index: 10
Funky index: 11
Random New index: 12
eclectric index: 13
Persian index: 14
Easy Listening index: 15
For My Love index: 16
nothiing index: 17
My Shazam Tracks index: 18
Pity Party index: 19
nate index: 20
you no index: 21
scRAP it index: 22
"Psychadelic" "Rock" index: 23
Pure Reggae Starbucks CD index: 24
Afro Cuban index: 25
FUN index: 26
Nold Scholl index: 27
House jammz index: 28
Glitch Hop index: 29
Yum Yum index: 30
Instrumentals index: 31
Ween index: 32
Folk index: 33
Jazz New index: 34
Rock Hard index: 35
Random discovered liked index: 36
Rock "Classics" index: 37
Lean on me index: 38
Rippitty Rap index: 39
Stoopid index: 40
Jazz Classic index: 41


{'name': 'Liked em',
 'lastModifiedDate': '2023-03-08',
 'items': [{'track': {'trackName': '151 Rum',
    'artistName': 'JID',
    'albumName': 'DiCaprio 2',
    'trackUri': 'spotify:track:22WV03i2lBbwNVCE1g671p'},
   'episode': None,
   'localTrack': None,
   'addedDate': '2023-02-07'},
  {'track': {'trackName': 'Two Tens (feat. Anderson .Paak)',
    'artistName': 'Cordae',
    'albumName': 'Two Tens (feat. Anderson .Paak)',
    'trackUri': 'spotify:track:6clDsO8HwhHEgJDDp88VdL'},
   'episode': None,
   'localTrack': None,
   'addedDate': '2023-02-07'},
  {'track': {'trackName': 'Ma Boy',
    'artistName': 'Dreamville',
    'albumName': 'Ma Boy',
    'trackUri': 'spotify:track:1kn9JsBWIw6qREhDWw0lKb'},
   'episode': None,
   'localTrack': None,
   'addedDate': '2023-02-07'},
  {'track': {'trackName': 'Eyes in the Back of My Head',
    'artistName': 'Katori Walker',
    'albumName': 'Eyes in the Back of My Head',
    'trackUri': 'spotify:track:5MrtQggewKtfSzs4akoaD2'},
   'episode': No