In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import pickle as pk
import json

## Streaming Data

In [2]:
df1 = pd.read_json(r"C:\Users\91897\Downloads\my spotify data\last year data\Spotify Account Data\StreamingHistory_music_0.json")
df2 = pd.read_json(r"C:\Users\91897\Downloads\my spotify data\last year data\Spotify Account Data\StreamingHistory_music_1.json")
df3 = pd.read_json(r"C:\Users\91897\Downloads\my spotify data\last year data\Spotify Account Data\StreamingHistory_music_2.json")

#### Stacking all the dataframes on top of each other

In [3]:
df = pd.concat([df1, df2, df3], ignore_index= True)
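
If the export holds a different number of history files, the same stacking can be done without hard-coding each path. This is only a sketch, assuming the files keep the `StreamingHistory_music_*.json` naming seen above; the function name `load_streaming_history` and its `folder` argument are made up for illustration.

```python
from pathlib import Path

import pandas as pd


def load_streaming_history(folder):
    """Read every StreamingHistory_music_*.json in `folder` and stack them."""
    # sorted() puts the files in lexicographic name order (_0, _1, _2)
    files = sorted(Path(folder).glob("StreamingHistory_music_*.json"))
    if not files:
        raise FileNotFoundError(f"No streaming history files found in {folder}")
    frames = [pd.read_json(path) for path in files]
    # ignore_index=True renumbers the rows 0..n-1, as in the cell above
    return pd.concat(frames, ignore_index=True)
```

Calling `load_streaming_history(...)` with the export folder would then stand in for `df1`, `df2`, `df3` and the `pd.concat` call.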

#### Understanding The Data

In [4]:
df.shape

(21349, 4)

In [5]:
df.head()

Unnamed: 0,endTime,artistName,trackName,msPlayed
0,2023-11-18 03:53,Gajendra Verma,Mann Mera,199970
1,2023-11-18 03:57,Gravero,Waalian (Lofi Mix),187849
2,2023-11-18 04:00,Arijit Singh,Main Dhoondne Ko Zamaane Mein,193654
3,2023-11-18 04:04,French Montana,Unforgettable,233824
4,2023-11-18 04:05,French Montana,Too Much,99195


- As of now, we have a total of four fields:<br>
1. endTime = The date and time at which a track ended.
2. artistName
3. trackName
4. msPlayed = The duration for which a track was played, in milliseconds.

In [6]:
df.dtypes

endTime       object
artistName    object
trackName     object
msPlayed       int64
dtype: object

endTime is the only column whose data type we need to fix: it should be a datetime column rather than an object (string) column.

We can also split the endTime column into separate date and time columns.<br>
One way is to convert endTime into a datetime column and then extract the date and the time from it separately.<br>
Alternatively, we can split the endTime string into its date and time parts and then convert each part to its own data type.
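
The first option (convert once, then extract) could look roughly like the sketch below. It uses two sample values in the same `YYYY-MM-DD HH:MM` layout as endTime; the `sample` dataframe exists only for this illustration.

```python
import pandas as pd

# Two sample rows in the same "YYYY-MM-DD HH:MM" format as endTime
sample = pd.DataFrame({"endTime": ["2023-11-18 03:53", "2023-11-18 03:57"]})

# Convert once, then derive both parts from the datetime column
sample["endTime"] = pd.to_datetime(sample["endTime"], format="%Y-%m-%d %H:%M")
sample["date"] = sample["endTime"].dt.normalize()  # midnight timestamps, stays datetime64
sample["time"] = sample["endTime"].dt.time         # datetime.time objects, dtype object

print(sample.dtypes)
```

Either route ends with the same column types; the string split used in the next cells simply avoids touching endTime until the parts are separated.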

In [7]:
df[['date', 'time']] = df['endTime'].str.split(expand= True)

In [8]:
df['endTime'] = pd.to_datetime(df['endTime'])
df['date'] = pd.to_datetime(df['date'])

In [9]:
df['time'] = pd.to_datetime(df['time'], format= '%H:%M').dt.time
# converting time into date time's time object

In [10]:
df.dtypes

endTime       datetime64[ns]
artistName            object
trackName             object
msPlayed               int64
date          datetime64[ns]
time                  object
dtype: object

Now it is fixed as per our requirements.

#### Understanding the Range

In [11]:
len(df)

21349

In [12]:
df.sort_values(by= 'endTime', ascending= True).head()

Unnamed: 0,endTime,artistName,trackName,msPlayed,date,time
0,2023-11-18 03:53:00,Gajendra Verma,Mann Mera,199970,2023-11-18,03:53:00
1,2023-11-18 03:57:00,Gravero,Waalian (Lofi Mix),187849,2023-11-18,03:57:00
2,2023-11-18 04:00:00,Arijit Singh,Main Dhoondne Ko Zamaane Mein,193654,2023-11-18,04:00:00
3,2023-11-18 04:04:00,French Montana,Unforgettable,233824,2023-11-18,04:04:00
4,2023-11-18 04:05:00,French Montana,Too Much,99195,2023-11-18,04:05:00


In [13]:
df.sort_values(by= 'endTime', ascending= True).tail()

Unnamed: 0,endTime,artistName,trackName,msPlayed,date,time
21344,2024-11-18 14:46:00,Young Stunners,Why,3096,2024-11-18,14:46:00
21345,2024-11-18 14:46:00,Brave Wrld,SARPHIRE AASHIQUE,157593,2024-11-18,14:46:00
21347,2024-11-18 14:49:00,Panther,Jaani,166000,2024-11-18,14:49:00
21346,2024-11-18 14:49:00,Badshah,Naraaz,10027,2024-11-18,14:49:00
21348,2024-11-18 14:50:00,Talha Anjum,Lost in Time,16954,2024-11-18,14:50:00


Now we can see that our data starts on 18 Nov 2023 and ends on the same date in 2024.<br>
For this project we only need data for the year 2024.<br>
Let's filter the streaming data down to 2024.

In [14]:
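# .copy() gives an independent dataframe, so adding columns later won't raise SettingWithCopyWarning
df = df.loc[df['endTime'].dt.year == 2024].copy()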

In [15]:
df.shape

(17646, 6)

In [16]:
# verifying filter
df.loc[df['endTime'].dt.year != 2024]

Unnamed: 0,endTime,artistName,trackName,msPlayed,date,time


#### Checking NaN's and Duplicates

In [17]:
df[df.duplicated()]

Unnamed: 0,endTime,artistName,trackName,msPlayed,date,time


In [18]:
df.isna().sum()

endTime       0
artistName    0
trackName     0
msPlayed      0
date          0
time          0
dtype: int64

## Understanding Marquee

In [19]:
artSeg = pd.read_json(r"C:\Users\91897\Downloads\my spotify data\last year data\Spotify Account Data\Marquee.json")

In [20]:
artSeg.head()

Unnamed: 0,artistName,segment
0,Prashant Pandey,Previously Active Listeners
1,Devil J,Previously Active Listeners
2,Gym Class Heroes,Previously Active Listeners
3,Aniket Raturi,Super Listeners
4,Anu Malik,Previously Active Listeners


Segment is the category a user falls into for a specific artist. It is calculated from the user's interaction with that artist.

#### Checking for NaNs and Duplicates

In [21]:
artSeg[artSeg['artistName'].duplicated()]

Unnamed: 0,artistName,segment
117,IShowSpeed,Previously Active Listeners
524,Chhotu Shikari,Previously Active Listeners
596,AUR,Previously Active Listeners
608,Savage,Previously Active Listeners
701,Hasan Raheem,Moderate listeners
704,UZIII,Previously Active Listeners
809,Talha Anjum,Previously Active Listeners
1009,Talhah Yunus,Super Listeners
1052,Boby Raja,Previously Active Listeners
1054,Sreeram,Previously Active Listeners


These duplicates exist because the file is built over several time intervals and refreshed each time, so a user's interaction with an artist can change between two intervals.<br>
We therefore need to decide which entry to keep for each duplicated artist.

In [22]:
artSeg.loc[artSeg['artistName'] == 'Hasan Raheem']

Unnamed: 0,artistName,segment
421,Hasan Raheem,Previously Active Listeners
701,Hasan Raheem,Moderate listeners


Since this data comes from my own Spotify account, I can verify that the latest segment is the correct one, so we'll drop the duplicates and keep only the entry at the earliest index for each artist.

In [23]:
artSeg.drop_duplicates('artistName', keep= 'first', inplace= True)
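
Which row survives depends entirely on the `keep` argument, so it is worth seeing both outcomes side by side. The sketch below rebuilds only the two Hasan Raheem rows shown above; the `dupes` dataframe is just for illustration.

```python
import pandas as pd

# The two Hasan Raheem rows shown above, with their original index labels
dupes = pd.DataFrame(
    {
        "artistName": ["Hasan Raheem", "Hasan Raheem"],
        "segment": ["Previously Active Listeners", "Moderate listeners"],
    },
    index=[421, 701],
)

kept_first = dupes.drop_duplicates("artistName", keep="first")
kept_last = dupes.drop_duplicates("artistName", keep="last")

print(kept_first["segment"].tolist())  # segment from index 421
print(kept_last["segment"].tolist())   # segment from index 701
```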

In [24]:
artSeg.isna().sum()

artistName    0
segment       0
dtype: int64

Now we can add this to our artist data dataframe.

## Understanding Playlist1.json

We can't convert this JSON object directly into a dataframe. First we need to understand how the file is structured; only then can we work with it.

In [25]:
with open(r"C:\Users\91897\Downloads\my spotify data\last year data\Spotify Account Data\Playlist1.json", 'r', encoding= 'utf8') as f:
    raw = json.load(f)

So far, the JSON object appears to be structured like this:<br>
playlists<br>
____playlist_name<br>
________allTracks in that particular playlist<br>
____playlist_name<br>
________allTracks in that particular playlist<br>
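
To make that nesting concrete, here is a tiny stand-in with the same keys the later cells rely on (`playlists`, `name`, `items`, `track`, `trackName`, `addedDate`). The playlist and song names are hypothetical placeholders, not values from the real file.

```python
# A tiny stand-in for the parsed JSON (hypothetical values, same nesting)
sample_raw = {
    "playlists": [
        {
            "name": "playlist A",
            "items": [
                {"track": {"trackName": "Song 1"}, "addedDate": "2023-07-13"},
                {"track": {"trackName": "Song 2"}, "addedDate": "2023-07-14"},
            ],
        },
        {
            "name": "playlist B",
            "items": [
                {"track": {"trackName": "Song 3"}, "addedDate": "2024-01-26"},
            ],
        },
    ]
}

# One entry per playlist: its name and how many items it holds
summary = {p["name"]: len(p["items"]) for p in sample_raw["playlists"]}
print(summary)
```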

I only need data for the 4 playlists that I use most. Since I know their names, I can pull the data for just those playlists.

Names of the playlists I'm interested in:<br>
'gauraV 🛌'<br>
'gauraV :)'<br>
'gauraV⛔'<br>
'gauraV⚡'

In [26]:
l = raw['playlists']
c = 1
for i in l:
    if i['name'] == 'gauraV⛔':
        print("Printing tracks for English playlist")
        tracks = i['items']
        for k in tracks:
            print(k['track']['trackName'])
            c += 1
            if c == 10:
                break

Printing tracks for English playlist
Better
I Don’t Wanna Live Forever (Fifty Shades Darker)
Memories
Girls Like You (feat. Cardi B) - Cardi B Version
Nobody's Love (feat. Popcaan) - Remix
Señorita
It's You
Beautiful People (feat. Khalid)
Trampoline (with ZAYN)


As we can see, our logic works and returns tracks for the playlist we asked for. Next, let's find out where each of our playlists is stored.

In [27]:
myPlaylistIndex = []
for i in range(len(l)):
    if l[i]['name'] == 'gauraV :)':
        myPlaylistIndex.append(i)
    if l[i]['name'] == 'gauraV⛔':
        myPlaylistIndex.append(i)
    if l[i]['name'] == 'gauraV⚡':
        myPlaylistIndex.append(i)
    if l[i]['name'] == 'gauraV 🛌':
        myPlaylistIndex.append(i)

In [28]:
myPlaylistIndex

[7, 10, 13, 15]
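
The four `if` checks can also be collapsed into a single membership test. A minimal sketch follows, using a stand-in list instead of `raw['playlists']`; "Liked from Radio" and "Road trip" are invented names for playlists we don't want.

```python
wanted = {"gauraV 🛌", "gauraV :)", "gauraV⛔", "gauraV⚡"}

# Stand-in for raw['playlists']; only the 'name' key matters here
sample_playlists = [
    {"name": "Liked from Radio"},
    {"name": "gauraV :)"},
    {"name": "Road trip"},
    {"name": "gauraV⚡"},
]

# enumerate() yields (position, playlist) pairs, so no range(len(...)) is needed
my_playlist_index = [
    i for i, playlist in enumerate(sample_playlists) if playlist["name"] in wanted
]
print(my_playlist_index)
```

Adding or removing a playlist then only means editing the `wanted` set.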

Extracting the track name and addedDate for each track

In [29]:
k = l[7]['items']
len(k)

182

In [30]:
c = 1
for i in k:
    print(i['track']['trackName'])
    print(i['addedDate'])
    c += 1
    if c == 10:
        break

12:00 AM
2023-07-13
Downers At Dusk
2023-07-13
Gumaan
2023-07-13
No Other Place
2023-07-13
Phir Milenge
2023-07-13
Samjho Na
2023-07-13
Iraaday
2023-07-13
Haaray
2023-07-13
Bikhra
2023-07-13


With this logic let's create a dataframe quickly using a nested loop

In [31]:
for i in myPlaylistIndex:
    playlist = l[i]
    print(f"Total Tracks in {playlist['name']} are {len(playlist['items'])}")

Total Tracks in gauraV 🛌 are 182
Total Tracks in gauraV⚡ are 379
Total Tracks in gauraV :) are 266
Total Tracks in gauraV⛔ are 288


Now we're ready to work with this. We'll collect one row per track in a list and then build the dataframe from it in a single step.

In [32]:
columns = ['trackName', 'playlistName', 'dateAdded']
rows = []
for i in myPlaylistIndex:
    playlistName = l[i]['name']
    # one row per track in this playlist
    for item in l[i]['items']:
        rows.append([item['track']['trackName'], playlistName, item['addedDate']])
# building the dataframe once avoids calling pd.concat on every iteration
plTracks = pd.DataFrame(rows, columns= columns)

In [33]:
plTracks.sample(5)

Unnamed: 0,trackName,playlistName,dateAdded
338,RA TA TA,gauraV⚡,2024-01-26
223,WuShang Clan,gauraV⚡,2023-03-06
943,Copines,gauraV⛔,2022-02-11
350,Naseeb,gauraV⚡,2024-02-01
822,Tumhare Aane Se,gauraV :),2024-05-16


#### Checking for Duplicates

In [34]:
plTracks.loc[plTracks['trackName'].duplicated()]

Unnamed: 0,trackName,playlistName,dateAdded
108,Teri Yaad,gauraV 🛌,2024-02-03
115,Hausla,gauraV 🛌,2024-02-03
116,Raabta,gauraV 🛌,2024-02-03
153,Baat Bangayi,gauraV 🛌,2024-04-13
171,Awaara,gauraV 🛌,2024-05-28
...,...,...,...
1095,Be Friends,gauraV⛔,2024-02-23
1101,Soulmate,gauraV⛔,2024-03-18
1103,Devotion,gauraV⛔,2024-04-05
1106,Skate,gauraV⛔,2024-04-29


There is a high probability that some of my tracks are saved in multiple playlists, so we'll keep only the first occurrence of each track.

In [35]:
plTracks.drop_duplicates(subset= 'trackName', keep= 'first', inplace= True)
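
Before dropping anything, it can help to see which playlists actually share a track. The sketch below uses invented rows ("Song A" sits in two playlists); only the column names match `plTracks`.

```python
import pandas as pd

# Hypothetical rows: "Song A" sits in two playlists, "Song B" in one
sample = pd.DataFrame({
    "trackName": ["Song A", "Song B", "Song A"],
    "playlistName": ["gauraV :)", "gauraV⚡", "gauraV 🛌"],
    "dateAdded": ["2023-01-10", "2024-02-01", "2024-02-03"],
})

# keep=False flags every member of a duplicate group, not just the later ones
shared = sample[sample["trackName"].duplicated(keep=False)]
playlists_per_track = shared.groupby("trackName")["playlistName"].agg(list)
print(playlists_per_track)

# Same call as the cell above: the first row per trackName survives
deduped = sample.drop_duplicates(subset="trackName", keep="first")
```

Note that matching on trackName alone would also merge two different songs that happen to share a name, so this view doubles as a sanity check.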

## Loading data related to artists

Image

In [36]:
with open('imageArtist.pkl', 'rb') as f:
    imgArt = pk.load(f)

Genre

In [37]:
with open('genreArtist.pkl', 'rb') as f:
    genArt = pk.load(f)

Popularity

In [38]:
with open('popularityArtist.pkl', 'rb') as f:
    popArt = pk.load(f)

In [39]:
print(len(imgArt), len(genArt), len(popArt))

788 677 676


#### Converting all of this data into dataframes

Artist Image

In [40]:
data = list(imgArt.items())
artImg = pd.DataFrame(data, columns= ['artistName', 'imgLink'])

In [43]:
artImg.to_csv('artImg.csv')

Artist Genre

In [41]:
data = list(genArt.items())
artGen = pd.DataFrame(data, columns= ['artistName', 'genre'])

In [44]:
artGen.to_csv('artGen.csv')

Artist Popularity

In [42]:
data = list(popArt.items())
artPop = pd.DataFrame(data, columns= ['artistName', 'popularity'])

In [46]:
artPop.to_csv('artPop.csv')

## Loading Data Related to Tracks

Image

In [43]:
with open('trackImages.pkl', 'rb') as f:
    trImg = pk.load(f)

ExplicitFlag

In [44]:
with open('trackExplicit.pkl', 'rb') as f:
    trExp = pk.load(f)

Popularity

In [45]:
with open('trackPopularity.pkl', 'rb') as f:
    trPop = pk.load(f)

Track Duration

In [46]:
with open('trackDuration.pkl', 'rb') as f:
    trDur = pk.load(f)

Track Album

In [47]:
with open('trackAlbum.pkl', 'rb') as f:
    trAlb = pk.load(f)

Feature Artists

In [48]:
with open('featureArtists.pkl', 'rb') as f:
    trFeat = pk.load(f)

#### Converting all of these into DataFrames

Track Image

In [49]:
data = list(trImg.items())
imgTr = pd.DataFrame(data, columns= ['trackName', 'imgLink'])

In [36]:
imgTr.to_csv('imgTr.csv')

Track Explicit

In [50]:
data = list(trExp.items())
expTr = pd.DataFrame(data, columns= ['trackName', 'explicitFlag'])

In [37]:
expTr.to_csv('expTr.csv')

Track Popularity

In [51]:
data = list(trPop.items())
popTr = pd.DataFrame(data, columns= ['trackName', 'popularity'])

In [38]:
popTr.to_csv('popTr.csv')

Track Album

In [52]:
data = list(trAlb.items())
albTr = pd.DataFrame(data, columns= ['trackName', 'albumName'])

In [39]:
albTr.to_csv('albTr.csv')

Track Duration

In [53]:
data = list(trDur.items())
durTr = pd.DataFrame(data, columns= ['trackName', 'duration_ms'])

In [40]:
durTr.to_csv('durTr.csv')

Track Feature Artist

In [54]:
data = list(trFeat.items())
featTr = pd.DataFrame(data, columns = ['trackName', 'ftArtists'])

In [41]:
featTr.to_csv('featTr.csv')
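
All of the conversions above follow one pattern: a `{name: value}` dictionary becomes a two-column dataframe. A small helper could remove the repetition. This is a sketch; `dict_to_frame` and the two sample dictionaries are hypothetical, not part of the original notebook.

```python
import pandas as pd


def dict_to_frame(mapping, key_col, value_col):
    """Turn a {key: value} dict into a two-column DataFrame."""
    return pd.DataFrame(list(mapping.items()), columns=[key_col, value_col])


# Hypothetical stand-ins for two of the pickled dictionaries
sample_pop = {"Track A": 46, "Track B": 64}
sample_alb = {"Track A": "Album X"}

frames = {
    "popTr": dict_to_frame(sample_pop, "trackName", "popularity"),
    "albTr": dict_to_frame(sample_alb, "trackName", "albumName"),
}
print(frames["popTr"])
```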

## A Better Approach

To keep our data model light and fast, instead of saving all these dataframes as separate CSVs we can merge them together.<br>
So far we have the following dataframes:<br>
StreamingData (df)<br>
Marquee (artSeg)<br>
Playlist (plTracks)<br>
and the additional dataframes related to artists and tracks.

With all of this information the best thing we can do is merge them into 3 different dataframes:<br>
1. MainStreamingData: This will contain only main data from df.
2. MainArtistData: This will contain all the data related to artist.
3. MainTrackData: This will contain all the data related to tracks.

#### MainArtistData

In [55]:
print(len(artPop), len(artImg), len(artGen), len(artSeg))

676 788 677 1299


From here, we'll use artImg as the base table and join the others onto it. artSeg has more rows because it also includes data from last year.

In [56]:
mainArtist = artImg.merge(
    artPop,
    how= 'left',
    on= 'artistName'
).merge(
    artGen,
    how= 'left',
    on= 'artistName'
).merge(
    artSeg,
    how= 'left',
    on= 'artistName'
)

In [57]:
len(mainArtist) == len(artImg)

True
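
The row-count check above can also be enforced during the merge itself. A rough sketch with miniature, made-up versions of the artist tables: `functools.reduce` chains the left joins, and `validate="many_to_one"` makes pandas raise an error if artistName repeats in any right-hand table.

```python
from functools import reduce

import pandas as pd

# Hypothetical miniature versions of artImg, artPop and artGen
art_img = pd.DataFrame({"artistName": ["A", "B", "C"], "imgLink": ["u1", "u2", "u3"]})
art_pop = pd.DataFrame({"artistName": ["A", "B"], "popularity": [50, 66]})
art_gen = pd.DataFrame({"artistName": ["A"], "genre": ["filmi"]})


def left_join(left, right):
    # validate raises MergeError if artistName repeats in the right table
    return left.merge(right, how="left", on="artistName", validate="many_to_one")


main_artist = reduce(left_join, [art_img, art_pop, art_gen])
print(main_artist)
```

Artists missing from a right-hand table simply get NaN in that table's columns, which matches the NaN values in the sample output below.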

In [58]:
mainArtist.sample(5)

Unnamed: 0,artistName,imgLink,popularity,genre,segment
698,Bir,https://i.scdn.co/image/ab6761610000e5eb673b11...,,,Previously Active Listeners
211,Gunda,https://i.scdn.co/image/ab6761610000e5ebc8351c...,,,Moderate listeners
113,Harjas Harjaayi,https://i.scdn.co/image/ab6761610000e5ebd79f27...,50.0,desi hip hop,Moderate listeners
427,Faheem Abdullah,https://i.scdn.co/image/ab6761610000e5ebc0b07a...,66.0,kashmiri pop,Previously Active Listeners
431,Kailash Kher,https://i.scdn.co/image/ab6761610000e5ebc6d179...,71.0,filmi,Previously Active Listeners


#### MainTracksData

In [59]:
print(len(imgTr), len(expTr), len(popTr), len(albTr), len(durTr), len(featTr))

2358 872 2337 1465 2358 1434


imgTr has the most rows (2358, tied with durTr), so we'll use it as the base table and left-join all the other tables onto it.

In [60]:
mainTracks = imgTr.merge(
    expTr,
    how= 'left',
    on= 'trackName'
).merge(
    popTr,
    how= 'left', 
    on= 'trackName'
).merge(
    albTr,
    how= 'left',
    on= 'trackName'
).merge(
    durTr,
    how= 'left',
    on= 'trackName'
).merge(
    featTr,
    how= 'left',
    on= 'trackName'
).merge(
    plTracks,
    how= 'left',
    on= 'trackName'
)

In [61]:
len(mainTracks) == len(imgTr)

True
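
Because the lookup tables are smaller than imgTr (expTr has 872 rows against 2358), many tracks end up with NaN after the left join. One way to count them is the `indicator` option of `merge`, sketched here on made-up miniature tables.

```python
import pandas as pd

# Hypothetical miniature versions of imgTr and expTr
img_tr = pd.DataFrame({"trackName": ["T1", "T2", "T3"], "imgLink": ["u1", "u2", "u3"]})
exp_tr = pd.DataFrame({"trackName": ["T1"], "explicitFlag": [True]})

# indicator=True adds a _merge column: 'both' or 'left_only' for a left join
checked = img_tr.merge(exp_tr, how="left", on="trackName", indicator=True)
unmatched = (checked["_merge"] == "left_only").sum()
print(f"{unmatched} of {len(checked)} tracks have no explicitFlag entry")
```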

In [62]:
mainTracks.sample(5)

Unnamed: 0,trackName,imgLink,explicitFlag,popularity,albumName,duration_ms,ftArtists,playlistName,dateAdded
1621,Qanoon,https://i.scdn.co/image/ab67616d0000b2737c936a...,True,10.0,Confessions,211200,Rithmetic,gauraV⚡,2023-09-29
555,"Dil Chori (From ""Sonu Ke Titu Ki Sweety"")",https://i.scdn.co/image/ab67616d0000b2734ba246...,,65.0,Yo Yo Honey Singh Is Back,226696,"Simar Kaur, Ishers",,
20,28 (with Dean Lewis),https://i.scdn.co/image/ab67616d0000b2733ec018...,,64.0,,208293,Dean Lewis,,
145,Anjaan,https://i.scdn.co/image/ab67616d0000b27382e678...,True,46.0,,303203,"superdupersultan, Nabeel Akbar, Talhah Yunus",gauraV 🛌,2023-10-26
1146,Koi Nahi Puchta,https://i.scdn.co/image/ab67616d0000b273637e2e...,,36.0,Robin Hood,225124,"Sammohit, Umair",,


Now we've got our 3 main tables:<br>
df: Containing streaming data for year 2024<br>
mainArtist: Containing all the data related to our artists.<br>
mainTracks: Containing all the data related to our tracks.<br>

Now let's save them as CSVs, load them into Power BI, and create the relationships we want:<br>
##### UPDATED: The relationship that Power BI created was Many to Many, but we wanted a One to Many relationship.<br>
##### UPDATED: The Power BI error message revealed that the cause was duplicate records in the MainTracks and MainArtist data.<br>

## Demonstrating the reason behind error in Relationships in Power BI

In [63]:
mainTracks.loc[mainTracks['trackName'].duplicated()]

Unnamed: 0,trackName,imgLink,explicitFlag,popularity,albumName,duration_ms,ftArtists,playlistName,dateAdded


Oops. We've checked, and there are no duplicates, so why did we receive an error message stating that there are duplicates in this column?<br>
Let's check for duplicates again, but in a different manner.

In [64]:
mainTracks.loc[mainTracks['trackName'].str.lower().duplicated()]

Unnamed: 0,trackName,imgLink,explicitFlag,popularity,albumName,duration_ms,ftArtists,playlistName,dateAdded
54,Aadat,https://i.scdn.co/image/ab67616d0000b2732938de...,,46.0,,212912,Jokhay,gauraV 🛌,2023-12-21
606,Downers at Dusk,https://i.scdn.co/image/ab67616d0000b273860f20...,True,66.0,Open Letter,256000,Umair,,
1026,Joint in the Booth,https://i.scdn.co/image/ab67616d0000b273b7b544...,True,43.0,Lunch Break,166285,,,
1174,Lagta Ni Mann,https://i.scdn.co/image/ab67616d0000b27344cfb9...,,1.0,Chokde Kahin Ke,193846,"Kumauni 17, Dumpling, Bug's",,
1223,Lost in Time,https://i.scdn.co/image/ab67616d0000b273860f20...,True,50.0,Open Letter,214589,Umair,,
1564,Peace of Mind,https://i.scdn.co/image/ab67616d0000b273b7b544...,True,38.0,Lunch Break,236769,"Lil Bhavi, Bhaskar, Ab 17",gauraV⚡,2024-04-17
1867,Still The Same,https://i.scdn.co/image/ab67616d0000b273e0b460...,,52.0,,194928,,,
2053,Trouble,https://i.scdn.co/image/ab67616d0000b2731633c4...,True,60.0,The Death of Slim Shady (Coup De Grâce),41487,,,
2295,dangerous,https://i.scdn.co/image/ab67616d0000b273bbdceb...,True,64.0,american dream,265305,"Lil Durk, Metro Boomin",,
2299,dooriyan,https://i.scdn.co/image/ab67616d0000b273b773ac...,,47.0,Genesis 1:1,161624,,gauraV :),2022-12-10


Wow, now there are duplicates. This means we have tracks whose names differ only in letter case. Let's look at them more closely.

The same issue affects all the datasets that we created.
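
To list both members of each case-only clash (not just the later one), `duplicated(keep=False)` can be applied to the folded names. A small sketch on three sample track names; `casefold()` is used here as a slightly stricter stand-in for `lower()`.

```python
import pandas as pd

# The two spellings found above plus one unrelated track
sample = pd.DataFrame({"trackName": ["Dangerous", "Trouble", "dangerous"]})

# casefold() is a more aggressive lower(); keep=False marks both members of a pair
folded = sample["trackName"].str.casefold()
case_clashes = sample[folded.duplicated(keep=False)]
print(case_clashes)
```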

In [65]:
mainTracks.loc[mainTracks['trackName'].str.lower() == 'dangerous']

Unnamed: 0,trackName,imgLink,explicitFlag,popularity,albumName,duration_ms,ftArtists,playlistName,dateAdded
495,Dangerous,https://i.scdn.co/image/ab67616d0000b27391bd45...,True,3.0,O,194894,,,
2295,dangerous,https://i.scdn.co/image/ab67616d0000b273bbdceb...,True,64.0,american dream,265305,"Lil Durk, Metro Boomin",,


This confirms that even though both tracks have the same name (apart from letter case), they are different tracks.<br>
Because they are different, we can't simply drop these duplicates.<br>
So what should we do next to keep this information and still fix our relationships?

We'll create Unique Identifiers for our Tracks and Artists, and use them for connections instead.

#### Generating Primary Keys 

Since we're creating primary keys for our artists and tracks, we should also create a streamId.

In [66]:
df.head()

Unnamed: 0,endTime,artistName,trackName,msPlayed,date,time
3703,2024-01-01 06:27:00,The Local Train,Choo Lo,233630,2024-01-01,06:27:00
3704,2024-01-01 06:33:00,The Local Train,Aaoge Tum Kabhi,313163,2024-01-01,06:33:00
3705,2024-01-01 06:37:00,Jokhay,Me & You,297019,2024-01-01,06:37:00
3706,2024-01-01 06:41:00,Talwiinder,Tera Saath,214595,2024-01-01,06:41:00
3707,2024-01-01 06:44:00,Bella,Murdaghar,161250,2024-01-01,06:44:00


We'll create these primary keys:<br>
1. StreamID
2. ArtistID
3. TrackID

1. Stream ID

We'll generate streamId as a simple incrementing counter.

In [67]:
df['streamId'] = range(1, len(df)+1)

2. ArtistID

For artists and tracks, we'll first collect all the unique names in one place.<br>
Then we'll give each of them a unique value, using the same incrementing logic.<br>
After that we'll store the names and their values in a dictionary.<br>
The final step is to map these keys onto the artists and tracks respectively.

In [68]:
artists = list(set(df['artistName']))
artistKeys = {}
for i in range(len(artists)):
    artistKeys[artists[i]] = i
len(artistKeys)

790

In [69]:
df['artistId'] = df['artistName'].map(artistKeys)

3. TrackID

In [70]:
tracks = list(set(df['trackName']))
trackKeys = {}
for i in range(len(tracks)):
    trackKeys[tracks[i]] = i
len(trackKeys)

2371

In [71]:
df['trackId'] = df['trackName'].map(trackKeys)
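
One caveat with `list(set(...))`: the iteration order of a set of strings can differ between Python sessions, so rerunning the notebook may hand out different IDs. If stable IDs matter, `pd.factorize` is one alternative, since it numbers values in order of first appearance. This is a sketch on sample artist names taken from `df.head()` above, not a replacement for the cells in this notebook.

```python
import pandas as pd

# Sample artist names in streaming order (taken from df.head() above)
sample = pd.DataFrame(
    {"artistName": ["The Local Train", "The Local Train", "Jokhay", "Talwiinder"]}
)

# factorize() numbers values by first appearance, so reruns give the same IDs
codes, uniques = pd.factorize(sample["artistName"])
sample["artistId"] = codes + 1  # start at 1, like streamId

# Lookup dict for mapping the same IDs onto other tables
artist_keys = {name: i + 1 for i, name in enumerate(uniques)}
print(artist_keys)
```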

In [72]:
df.head()

Unnamed: 0,endTime,artistName,trackName,msPlayed,date,time,streamId,artistId,trackId
3703,2024-01-01 06:27:00,The Local Train,Choo Lo,233630,2024-01-01,06:27:00,1,484,2024
3704,2024-01-01 06:33:00,The Local Train,Aaoge Tum Kabhi,313163,2024-01-01,06:33:00,2,484,2269
3705,2024-01-01 06:37:00,Jokhay,Me & You,297019,2024-01-01,06:37:00,3,729,427
3706,2024-01-01 06:41:00,Talwiinder,Tera Saath,214595,2024-01-01,06:41:00,4,201,1464
3707,2024-01-01 06:44:00,Bella,Murdaghar,161250,2024-01-01,06:44:00,5,23,2035


Now that our streaming data is updated let's also update our other two DataFrames.

In [73]:
mainArtist['artistId'] = mainArtist['artistName'].map(artistKeys)

In [74]:
mainTracks['trackId'] = mainTracks['trackName'].map(trackKeys)
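
The key dictionaries were built from the 2024 streaming data only, so any artist or track that is missing from `df` gets NaN here. A quick way to spot such rows is sketched below on a made-up table; "Artist C" is a hypothetical name with no key.

```python
import pandas as pd

# Hypothetical key dictionary and artist table; "Artist C" has no key
artist_keys = {"Artist A": 0, "Artist B": 1}
main_artist = pd.DataFrame({"artistName": ["Artist A", "Artist B", "Artist C"]})

# map() returns NaN for names missing from the dict, which makes the column float
mapped = main_artist["artistName"].map(artist_keys)
unmapped = main_artist.loc[mapped.isna(), "artistName"].tolist()
print("No key for:", unmapped)

# The nullable Int64 dtype keeps whole-number IDs and shows gaps as <NA>
main_artist["artistId"] = mapped.astype("Int64")
```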

Now our dataframes are ready to move to Power BI, and this time there should be no problem with our relationships.<br>
Let's save them as CSVs again.

In [95]:
df.to_csv('main_df.csv')

In [96]:
mainArtist.to_csv('main_artist.csv')

In [97]:
mainTracks.to_csv('main_tracks.csv')
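
By default `to_csv` also writes the dataframe index as an unnamed first column. Since streamId, artistId and trackId now serve as keys, that column may be redundant in Power BI. A small sketch of writing without it; the sample dataframe and the file name are hypothetical.

```python
import os
import tempfile

import pandas as pd

sample = pd.DataFrame({"streamId": [1, 2], "trackId": [2024, 2269]}, index=[3703, 3704])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "main_df_sample.csv")  # hypothetical file name
    # index=False leaves the old row labels (3703, 3704) out of the file
    sample.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

print(round_trip.columns.tolist())
```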