# DS3000 Foundations of Data Science
## Dialogue of Civilizations
May 13, 2024

Admin
- More new modules:
    - `pip install requests plotly matplotlib`

Content
- introduce APIs

Planned Time: ~1.5 hours

Next Thing: Web Scraping

# Basic tools in preparation for APIs

The `requests` module comes into play soon. You may have installed it in the terminal earlier, but `pip` is also available as a [magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-pip) command, which means packages can be installed directly from Jupyter (should you ever need to install `requests` again, or if you had difficulty installing it earlier).

In [1]:
pip install requests

Note: you may need to restart the kernel to use updated packages.


## Building a DataFrame row by row

We often get data in chunks (web scraping / API calls).  We'll need to store our data incrementally:

In [2]:
import pandas as pd

dict_list = [{'a': 1, 'b': 2, 'c': 3},
             {'a': 4, 'b': 3874, 'c': 398}]

df = pd.DataFrame()

for d in dict_list:
    df = pd.concat([df, pd.Series(d).to_frame().T], ignore_index=True)
    
df

Unnamed: 0,a,b,c
0,1,2,3
1,4,3874,398


In [3]:
# to include index names
list_dict = [{'a': 1, 'b': 2, 'c': 3},
            {'a': 4, 'b': 3874, 'c': 398}]

name_list = ['first', 'second']

df = pd.DataFrame()
for idx in range(2):
    # extract dictionary & name
    d = list_dict[idx]
    name = name_list[idx]
    
    # build series and name it
    series = pd.Series(d, name=name)
    
    df = pd.concat([df, series.to_frame().T])
    
df

Unnamed: 0,a,b,c
first,1,2,3
second,4,3874,398
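
When all the rows can be held in memory as plain dictionaries, an alternative to concatenating inside the loop is to collect the dictionaries in a list and build the DataFrame once at the end. The following is a minimal sketch of that idea, reusing the `list_dict` and `name_list` values from above; the names `rows` and `df_from_rows` are illustrative only.

```python
import pandas as pd

list_dict = [{'a': 1, 'b': 2, 'c': 3},
             {'a': 4, 'b': 3874, 'c': 398}]
name_list = ['first', 'second']

# accumulate plain dictionaries first (cheap list appends) ...
rows = []
for d in list_dict:
    rows.append(d)

# ... then build the DataFrame once, naming the rows via `index`
df_from_rows = pd.DataFrame(rows, index=name_list)
print(df_from_rows)
```

Each call to `pd.concat` copies the frame built so far, so building once at the end tends to scale better when there are many chunks.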


# API
###  Definitions
**API** Application Programming Interface
 - a server which gives out data (often over the internet)
 - note: 'API', in general, refers to the barrier between two pieces of software:
     - in this case, the server which hosts data & our own software which requests it
 
 
 **JSON** JavaScript Object Notation
  - a method of storing objects as text
  - much like the nested dictionaries ... JSON and similar formats are often trees
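
To make the "objects as text" idea concrete, here is a small sketch using Python's built-in `json` module. The JSON string is made up for illustration and is not real API output.

```python
import json

# a small JSON document, written as text (made-up values)
json_text = '{"city": "Leuven", "current": {"temp": 21.85, "weather": [{"main": "Clouds"}]}}'

# text -> nested Python objects (dicts, lists, floats, strings)
parsed = json.loads(json_text)
temp = parsed['current']['temp']
sky = parsed['current']['weather'][0]['main']

# nested Python objects -> text again
round_trip = json.dumps(parsed)
print(temp, sky)
```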

## OpenWeather API
What information does this offer?

[https://openweathermap.org/api](https://openweathermap.org/api)

How do I get ready to use it?
- sign up for an account
    - [https://home.openweathermap.org/users/sign_up](https://home.openweathermap.org/users/sign_up)
- get an api key (my key was emailed to me with my confirmation of account)
    - [https://home.openweathermap.org/api_keys](https://home.openweathermap.org/api_keys)
        
Think of an API as a hybrid of a website and a function. It's a website where your query is stored in the address:
    
    https://api.openweathermap.org/data/3.0/onecall?lat=50.8823&lon=4.7138&appid=YOUR-API-KEY-HERE-THIS-WONT-WORK&units=imperial
    
The result is a JSON object, which we can quickly convert to a nested dictionary (our dictionary-of-dictionaries tree format).
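
The address is just a base URL followed by `key=value` pairs joined with `&`, much like keyword arguments to a function. As a hedged sketch, the same address can be assembled from a dictionary with the standard library's `urlencode`; the key below is a placeholder and will not work.

```python
from urllib.parse import urlencode

base_url = 'https://api.openweathermap.org/data/3.0/onecall'

# the "arguments" of the query, as an ordinary dictionary
params = {'lat': 50.8823,
          'lon': 4.7138,
          'appid': 'YOUR-API-KEY',   # placeholder, not a working key
          'units': 'metric'}

# urlencode joins the pairs with '&' and escapes special characters
url = f'{base_url}?{urlencode(params)}'
print(url)
```

`requests.get(base_url, params=params)` performs the same encoding internally, so either style can be used.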

In [4]:
# todo: swap this out
api_key = 'd36fa352ac73226b30772f64675f41bb'

# north = positive, south = negative
lat = 50.8823
# east = positive, west = negative
lon = 4.7138

# metric here, since we're in Europe; could use 'imperial' instead
units = 'metric'
url = f'https://api.openweathermap.org/data/3.0/onecall?lat={lat}&lon={lon}&appid={api_key}&units={units}'
print(url)

https://api.openweathermap.org/data/3.0/onecall?lat=50.8823&lon=4.7138&appid=d36fa352ac73226b30772f64675f41bb&units=metric


In [5]:
import requests

# get url as a string
url_text = requests.get(url).text    
url_text

'{"lat":50.8823,"lon":4.7138,"timezone":"Europe/Brussels","timezone_offset":7200,"current":{"dt":1747216887,"sunrise":1747194793,"sunset":1747250515,"temp":21.85,"feels_like":21.24,"pressure":1018,"humidity":44,"dew_point":9.06,"uvi":4.93,"clouds":31,"visibility":10000,"wind_speed":3.18,"wind_deg":40,"wind_gust":3.47,"weather":[{"id":802,"main":"Clouds","description":"scattered clouds","icon":"03d"}]},"minutely":[{"dt":1747216920,"precipitation":0},{"dt":1747216980,"precipitation":0},{"dt":1747217040,"precipitation":0},{"dt":1747217100,"precipitation":0},{"dt":1747217160,"precipitation":0},{"dt":1747217220,"precipitation":0},{"dt":1747217280,"precipitation":0},{"dt":1747217340,"precipitation":0},{"dt":1747217400,"precipitation":0},{"dt":1747217460,"precipitation":0},{"dt":1747217520,"precipitation":0},{"dt":1747217580,"precipitation":0},{"dt":1747217640,"precipitation":0},{"dt":1747217700,"precipitation":0},{"dt":1747217760,"precipitation":0},{"dt":1747217820,"precipitation":0},{"dt":1

In [6]:
# should not have to install the below
import json

# convert json to a nested dict
weather_dict = json.loads(url_text)

weather_dict.keys()

dict_keys(['lat', 'lon', 'timezone', 'timezone_offset', 'current', 'minutely', 'hourly', 'daily'])

In [7]:
weather_dict['hourly'][-1]

{'dt': 1747386000,
 'temp': 15.03,
 'feels_like': 14.08,
 'pressure': 1023,
 'humidity': 57,
 'dew_point': 6.43,
 'uvi': 3.88,
 'clouds': 4,
 'visibility': 10000,
 'wind_speed': 3.01,
 'wind_deg': 343,
 'wind_gust': 3.24,
 'weather': [{'id': 800,
   'main': 'Clear',
   'description': 'clear sky',
   'icon': '01d'}],
 'pop': 0}
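
Reaching a value deep in this structure means alternating between dictionary keys and list positions. A small sketch, using a trimmed copy of the entry printed above (the missing key `'rain'` is only an example of an absent key):

```python
# a trimmed copy of one entry of weather_dict['hourly'] (values from the output above)
hour = {'dt': 1747386000,
        'temp': 15.03,
        'weather': [{'id': 800,
                     'main': 'Clear',
                     'description': 'clear sky',
                     'icon': '01d'}],
        'pop': 0}

# 'weather' is a list holding one dictionary, so index the list first
description = hour['weather'][0]['description']

# .get() returns a default instead of raising KeyError when a key is absent
rain = hour.get('rain', None)
print(description, rain)
```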

## Cleaning up data from one hour

In [8]:
from datetime import datetime
import pandas as pd

hour_dict = weather_dict['hourly'][0]
hour_dict

# let's convert from unix time to a datetime (easier to use)
hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

pd.Series(hour_dict)

dt                                                   1747216800
temp                                                      21.85
feels_like                                                21.24
pressure                                                   1018
humidity                                                     44
dew_point                                                  9.06
uvi                                                        4.93
clouds                                                       31
visibility                                                10000
wind_speed                                                 3.18
wind_deg                                                     40
wind_gust                                                  3.47
weather       [{'id': 802, 'main': 'Clouds', 'description': ...
pop                                                           0
datetime                                    2025-05-14 12:00:00
dtype: object

In [9]:
def my_func(weather_dict):
    """Build a DataFrame with one row per entry of weather_dict['hourly']."""

    df_hourly = pd.DataFrame()
    for hour_dict in weather_dict['hourly']:

        # let's convert from unix time to a datetime (easier to use)
        hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

        s_hour = pd.Series(hour_dict)
    
        df_hourly = pd.concat([df_hourly, s_hour.to_frame().T], ignore_index=True)
    
    return df_hourly
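
Because the hourly data is already a list of dictionaries, the same table can also be sketched without a loop. This is an alternative under stated assumptions, not the approach used in the cell above: the two entries below are trimmed stand-ins for `weather_dict['hourly']`, and the second entry's values are made up.

```python
import pandas as pd

# two trimmed hourly entries, standing in for weather_dict['hourly']
hourly = [{'dt': 1747216800, 'temp': 21.85, 'humidity': 44},
          {'dt': 1747220400, 'temp': 22.10, 'humidity': 43}]

# one row per dictionary, one column per key
df_hourly = pd.DataFrame(hourly)

# convert the whole 'dt' column from unix seconds to timezone-aware UTC timestamps
df_hourly['datetime'] = pd.to_datetime(df_hourly['dt'], unit='s', utc=True)
print(df_hourly)
```

Note the difference from `datetime.fromtimestamp(...)` without a timezone, which reports the local time of the machine running the notebook; here the timestamps are explicitly in UTC.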

## Lecture Break/Practice 3

Pick a city in Europe you really want to visit and find its latitude and longitude. For example, La Chaux-de-Fonds, Switzerland (where Dr. Gerber's grandmother was born and raised) is located at:

    47.101333° N, 6.825° E
    
1. Create a dataframe of the next 48 hours of its weather, as was done above
2. (++) Make a function `get_forecast` which accepts:
    - `lat`
    - `lon`
    - `api_key`
    - `units` (default = 'imperial')
    
    and returns a dataframe of the next 48 hours of the location's weather.

In [10]:
# Put your function here
def get_forecast(lat, lon, api_key, units='imperial'):

    url = f'https://api.openweathermap.org/data/3.0/onecall?lat={lat}&lon={lon}&appid={api_key}&units={units}'
    url_text = requests.get(url).text
    weather_dict = json.loads(url_text)

    df_hourly = pd.DataFrame()

    for hour_dict in weather_dict['hourly']:
        hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

        s_hour = pd.Series(hour_dict)
        df_hourly = pd.concat([df_hourly, s_hour.to_frame().T], ignore_index=True)

    return df_hourly


In [None]:
get_forecast(51.52, 0.30, 'd36fa352ac73226b30772f64675f41bb', units='metric')


Unnamed: 0,dt,temp,feels_like,pressure,humidity,dew_point,uvi,clouds,visibility,wind_speed,wind_deg,wind_gust,weather,pop,datetime
0,1747216800,18.3,17.7,1021,58,9.9,4.58,10,10000,4.69,59,5.88,"[{'id': 800, 'main': 'Clear', 'description': '...",0,2025-05-14 12:00:00
1,1747220400,18.61,17.99,1021,56,9.67,5.51,10,10000,4.69,66,5.42,"[{'id': 800, 'main': 'Clear', 'description': '...",0,2025-05-14 13:00:00
2,1747224000,19.2,18.59,1021,54,9.68,5.85,12,10000,4.57,74,5.05,"[{'id': 801, 'main': 'Clouds', 'description': ...",0,2025-05-14 14:00:00
3,1747227600,19.73,19.09,1021,51,9.32,5.53,17,10000,4.89,80,5.19,"[{'id': 801, 'main': 'Clouds', 'description': ...",0,2025-05-14 15:00:00
4,1747231200,20.32,19.69,1021,49,9.27,4.55,32,10000,4.79,85,5.27,"[{'id': 802, 'main': 'Clouds', 'description': ...",0,2025-05-14 16:00:00
5,1747234800,21.02,20.4,1021,47,6.85,3.26,36,10000,4.77,89,5.51,"[{'id': 802, 'main': 'Clouds', 'description': ...",0,2025-05-14 17:00:00
6,1747238400,20.51,20.0,1021,53,7.46,1.94,53,10000,4.49,94,5.61,"[{'id': 803, 'main': 'Clouds', 'description': ...",0,2025-05-14 18:00:00
7,1747242000,19.3,18.88,1021,61,7.95,0.96,63,10000,4.11,89,5.39,"[{'id': 803, 'main': 'Clouds', 'description': ...",0,2025-05-14 19:00:00
8,1747245600,17.7,17.33,1022,69,8.75,0.35,61,10000,3.52,85,5.4,"[{'id': 803, 'main': 'Clouds', 'description': ...",0,2025-05-14 20:00:00
9,1747249200,15.41,15.07,1022,79,9.17,0.1,91,10000,2.89,78,5.72,"[{'id': 804, 'main': 'Clouds', 'description': ...",0,2025-05-14 21:00:00


# Storing your API key in a local file

There exists a file `open_weather_access.py` in the same directory as this jupyter notebook which contains:
    
    my_api_key = 'hello!'

In [12]:
from open_weather_access import my_api_key

print(my_api_key)

from open_weather_access import my_real_api_key
my_real_api_key

hello!


'd36fa352ac73226b30772f64675f41bb'
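
Another common way to keep a key out of the notebook is an environment variable. The sketch below is an alternative to the local-file approach, not a replacement for it; the function `load_api_key` and the variable name `OPENWEATHER_API_KEY` are arbitrary choices made for this example.

```python
import os

def load_api_key(env_var='OPENWEATHER_API_KEY'):
    """Return the key stored in the environment variable `env_var`.

    The variable name is an arbitrary choice for this sketch.
    """
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f'set the {env_var} environment variable first')
    return key

# for demonstration only: normally the variable is set outside the notebook
os.environ['OPENWEATHER_API_KEY'] = 'hello!'
print(load_api_key())
```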

# `datetime`, `date`, `time` and UTC
## Unix Time (UTC)
- [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) Coordinated Universal Time
    - time zone at 0 deg longitude (i.e. Greenwich Mean Time, which is British time in winter; some interesting [history about this choice](https://en.wikipedia.org/wiki/International_Meridian_Conference))
- [Unix Time](https://en.wikipedia.org/wiki/Unix_time) is the number of seconds which have passed since 00:00:00 UTC on 1 Jan 1970 (ignoring leap seconds)
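
As a quick check of the definition, the sketch below converts between Unix seconds and timezone-aware `datetime` objects using only the standard library. The value `1747216800` is the `dt` of the first hourly entry in the weather output earlier.

```python
from datetime import datetime, timezone

# unix time 0 is the epoch: midnight UTC on 1 Jan 1970
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# the 'dt' value of the first hourly weather entry
first_hour = datetime.fromtimestamp(1747216800, tz=timezone.utc)

# and back again: datetime -> unix seconds
seconds = first_hour.timestamp()
print(epoch, first_hour, seconds)
```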

In [30]:
from datetime import date, time, datetime

# building just a date (no time)
some_date = date(year=2022, month=11, day=11)

In [14]:
# building just a time (no date)
time(hour=15, minute=23)

datetime.time(15, 23)

In [47]:
# getting just a date from a datetime
datetime.now().date()

datetime.date(2025, 5, 14)

In [46]:
datetime.now()

datetime.datetime(2025, 5, 14, 12, 4, 48, 331429)

In [36]:
# getting just a time from a datetime
datetime.now().time()

datetime.time(12, 4, 36, 232869)

## datetimes to and from strings
Using [the strptime/strftime code](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior), we can convert between string and `datetime` representations:
- building datetimes with tzinfo explicitly passed
- strptime (from str to `datetime`)
- strftime (from `datetime` to str)
- use a date when swapping timezones switch (add space)

In [50]:
# get current time (as datetime)
now_datetime = datetime.now()

# convert datetime to str
format_str = 'It\'s now %A %B %d at %I:%M %p; is that not great!?'
now_str = now_datetime.strftime(format_str)
now_str

"It's now Wednesday May 14 at 12:07 PM; is that not great!?"

In [51]:
# convert str to datetime
# notice anything **strange**?
then_datetime = datetime.strptime(now_str, format_str)
then_datetime

datetime.datetime(1900, 5, 14, 12, 7)
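
The year 1900 appears because the format string never mentions a year, so `strptime` falls back to its default. A small sketch of the difference (the strings below are made up for illustration):

```python
from datetime import datetime

# no year in the format: strptime falls back to its default year, 1900
no_year = datetime.strptime('May 14 12:07', '%B %d %H:%M')

# with %Y in both the string and the format, the year is kept
with_year = datetime.strptime('2025 May 14 12:07', '%Y %B %d %H:%M')
print(no_year, with_year)
```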

# Timezones

[pytz](http://pytz.sourceforge.net/) will do all the heavy lifting for managing timezones for us

In [52]:
import pytz

# this is a lot of standards whose quirks are handled by pytz ...
pytz.all_timezones

['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau',
 'Africa/Blantyre',
 'Africa/Brazzaville',
 'Africa/Bujumbura',
 'Africa/Cairo',
 'Africa/Casablanca',
 'Africa/Ceuta',
 'Africa/Conakry',
 'Africa/Dakar',
 'Africa/Dar_es_Salaam',
 'Africa/Djibouti',
 'Africa/Douala',
 'Africa/El_Aaiun',
 'Africa/Freetown',
 'Africa/Gaborone',
 'Africa/Harare',
 'Africa/Johannesburg',
 'Africa/Juba',
 'Africa/Kampala',
 'Africa/Khartoum',
 'Africa/Kigali',
 'Africa/Kinshasa',
 'Africa/Lagos',
 'Africa/Libreville',
 'Africa/Lome',
 'Africa/Luanda',
 'Africa/Lubumbashi',
 'Africa/Lusaka',
 'Africa/Malabo',
 'Africa/Maputo',
 'Africa/Maseru',
 'Africa/Mbabane',
 'Africa/Mogadishu',
 'Africa/Monrovia',
 'Africa/Nairobi',
 'Africa/Ndjamena',
 'Africa/Niamey',
 'Africa/Nouakchott',
 'Africa/Ouagadougou',
 'Africa/Porto-Novo',
 'Africa/Sao_Tome',
 'Africa/Timbuktu',
 'Africa/

## Specifying timezone info with datetime
- use the `.localize()` method of a pytz timezone object
    - takes a `datetime` without any current timezone as input
- don't pass the pytz timezone object to the `tzinfo` keyword of `datetime` objects ...
    - for many zones pytz then applies the wrong UTC offset, because it does not look up the offset (including daylight saving time) in force on that date
    - these are "silent" errors: the code will run, but times will be off by some amount

In [53]:
# build a datetime
ball_drop2025 = datetime(year=2025, month=1, day=1)

# load the timezone
time_zone_gmt = pytz.timezone('GMT')

# add the timezone to the datetime
ball_drop2025_gmt = time_zone_gmt.localize(ball_drop2025)
ball_drop2025_gmt

datetime.datetime(2025, 1, 1, 0, 0, tzinfo=<StaticTzInfo 'GMT'>)

In [54]:
# load the timezone
time_zone_BRU = pytz.timezone('Europe/Brussels')

# add the timezone to the datetime
ball_drop2025_BRU = time_zone_BRU.localize(ball_drop2025)
ball_drop2025_BRU

datetime.datetime(2025, 1, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Brussels' CET+1:00:00 STD>)

In [23]:
# Leuven is living one hour in the future ...
ball_drop2025_gmt - ball_drop2025_BRU

datetime.timedelta(seconds=3600)

In [24]:
# See if you can use datetime() to determine exactly how many days, seconds, and microseconds old you are:


In [25]:
# WARNING: don't specify a timezone at the construction of a datetime
time_zone_BRU = pytz.timezone('Europe/Brussels')
ball_drop2025_BRU_bug = datetime(year=2025, month=1, day=1, tzinfo=time_zone_BRU)

# not quite right ...
ball_drop2025_gmt - ball_drop2025_BRU_bug

datetime.timedelta(seconds=1080)

In [26]:
# notice, once a datetime has a timezone, you can no longer `.localize()` it
#time_zone_gmt.localize(ball_drop2025_BRU)
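
To round off the timezone cells, here is a short sketch (assuming `pytz` is installed) showing that `.localize()` picks the offset appropriate to each date, and that `.astimezone()` converts an already-aware datetime to another zone. The variable names are illustrative.

```python
from datetime import datetime
import pytz

brussels = pytz.timezone('Europe/Brussels')

# .localize() picks the offset in force on that particular date
winter = brussels.localize(datetime(2025, 1, 1))   # standard time, UTC+1
summer = brussels.localize(datetime(2025, 7, 1))   # daylight saving time, UTC+2

# .astimezone() converts an aware datetime to another timezone
winter_utc = winter.astimezone(pytz.utc)

print(winter.utcoffset(), summer.utcoffset(), winter_utc)
```

Midnight on New Year's Day in Brussels is still 23:00 on 31 December in UTC, which is the one-hour difference seen earlier.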

# More APIs

There are several APIs relevant to the program which you should explore on your own and determine if they are appropriate for use with your data project. You should have already looked at them a little bit, as they were shared with you via the [Notion Page](https://markefontenot.notion.site/List-of-Data-Sources-14637719edac81d0b591c5c61cdc8ed4?pvs=4) a while ago. Since you might be using any of them for your project, and one of the assessments there is your ability to work with an API without as much scaffolding/guidance as in class, we won't cover any of those specifically (but Dr. Gerber, Dr. Fontenot and Sydney will be happy to help as needed throughout the project). Instead, if there is class time for it:


## (if time) Another API Example: Spotipy

The Spotify API gives us access to any song/artist in its libraries, including some simple information about each; it also serves as a convenient example for learning how to work with APIs for data collection. There is a module, Spotipy, that has been created to access the API within Python. Open up a terminal (or do it in a Jupyter notebook, since `pip` is a magic command) and run:

`pip install spotipy`

In [1]:
pip install spotipy --upgrade

Note: you may need to restart the kernel to use updated packages.


In [56]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

Just like with OpenWeather, we need to make an account [here](https://developer.spotify.com/) (this is essentially the same as making a regular Spotify account) and then get an API key (Spotify requires two things, actually, a Client ID and a secret key). At the above website, go to:

- Dashboard
- Log into your Spotify account (make one if you don't have one)
- Accept the terms of using the API
- Create an app (you can call it anything, I called mine `DS3000_Spotify`)
- Get a client ID (mine is `592acf2d2dc84d94bbc652f2f1d72375`, though it is usually good practice to **not** share this) and a client secret (**never share this with anyone**: save it in a separate file like we did with our OpenWeather API key earlier)

There exists a file `spotify_secret.py` in the same directory as this jupyter notebook which contains:
    
    secret = 'professorgerberssecretspotify'

In [29]:
from spotify_secret import secret

ModuleNotFoundError: No module named 'spotify_secret'

In [None]:
# Authentication
# Make sure you use your OWN client ID (DO NOT leave mine in there!!)
cid = '592acf2d2dc84d94bbc652f2f1d72375'

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)

### Uniform Resource Identifiers (URI)

An important component of using the Spotify API is the use of the uniform resource identifiers, pointing at each object in the API. We need a URI to perform any function with the API referring to an object in Spotify. The URI of any Spotify object is contained in its shareable link. For example, the link to the "Best International Songs of All Time" playlist, when found from the Spotify desktop application, is:

In [None]:
playlist_link = "https://open.spotify.com/playlist/13nMupvZ2oJPhI4wetleMJ"
playlist_URI = playlist_link.split("/")[-1].split("?")[0]
track_uris0 = [x["track"]["uri"] for x in sp.playlist_tracks(playlist_URI, offset=0)["items"] if x.get("track") and x["track"].get("uri")]
# If there are more than 100 songs in the playlist, you would add additional lists like below:
track_uris1 = [x["track"]["uri"] for x in sp.playlist_tracks(playlist_URI, offset=100)["items"] if x.get("track") and x["track"].get("uri")]
#track_uris2 = [x["track"]["uri"] for x in sp.playlist_tracks(playlist_URI, offset=200)["items"] if x.get("track") and x["track"].get("uri")]

In [None]:
# put them into a list
track_uris = track_uris0 + track_uris1# + track_uris2
len(track_uris)

200
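
Writing one list per offset works for a known playlist size, but the same idea can be expressed as a loop that keeps requesting pages until a short page comes back. The sketch below illustrates the pattern with a fake data source so that it runs without credentials; `collect_all_items`, `fake_playlist`, and `fake_fetch_page` are hypothetical names, not part of Spotipy.

```python
def collect_all_items(fetch_page, page_size=100):
    """Call fetch_page(offset) repeatedly until a short (or empty) page comes back."""
    items = []
    offset = 0
    while True:
        page = fetch_page(offset)
        items.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return items


# stand-in for the API: a "playlist" of 250 fake track URIs
fake_playlist = [f'spotify:track:{i}' for i in range(250)]

def fake_fetch_page(offset):
    # returns at most 100 items starting at `offset`, like one API page
    return fake_playlist[offset:offset + 100]

all_uris = collect_all_items(fake_fetch_page)
print(len(all_uris))
```

With the real API, `fetch_page` could plausibly be something like `lambda offset: sp.playlist_tracks(playlist_URI, offset=offset)["items"]`, assuming pages of up to 100 items as in the cells above.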

### Once you have the track URIs

Each object in the playlist carries data that we want to collect for our data set. Take a look at the [Spotipy](https://spotipy.readthedocs.io/en/2.25.1/#module-spotipy.client) documentation and all the functions it offers. We just used the `.playlist_tracks()` function; check out what the docs say about that, and perhaps investigate its output before continuing to call the API and construct the data frame we ultimately want:

In [None]:
# the first track in the playlist
sp.playlist_tracks(playlist_URI, offset=0)["items"][0]

{'added_at': '2023-04-18T10:12:03Z',
 'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/redmusiccompany'},
  'href': 'https://api.spotify.com/v1/users/redmusiccompany',
  'id': 'redmusiccompany',
  'type': 'user',
  'uri': 'spotify:user:redmusiccompany'},
 'is_local': False,
 'primary_color': None,
 'track': {'preview_url': None,
  'available_markets': ['AR',
   'AT',
   'BE',
   'BO',
   'BR',
   'BG',
   'CA',
   'CL',
   'CO',
   'CR',
   'CY',
   'CZ',
   'DK',
   'DO',
   'DE',
   'EC',
   'EE',
   'SV',
   'FI',
   'FR',
   'GR',
   'GT',
   'HN',
   'HK',
   'HU',
   'IS',
   'IE',
   'IT',
   'LV',
   'LT',
   'LU',
   'MY',
   'MT',
   'MX',
   'NL',
   'NI',
   'NO',
   'PA',
   'PY',
   'PE',
   'PH',
   'PL',
   'PT',
   'SG',
   'SK',
   'ES',
   'SE',
   'CH',
   'TW',
   'TR',
   'UY',
   'US',
   'GB',
   'AD',
   'LI',
   'MC',
   'ID',
   'JP',
   'TH',
   'VN',
   'RO',
   'IL',
   'ZA',
   'SA',
   'AE',
   'BH',
   'QA',
   'OM',
   'KW',
   

In [None]:
# Initialize an empty dictionary to store all the data
playlist_dict = {'track_uri': list(),
                'track_name': list(),
                'artist_uri': list(),
                'artist_name': list(),
                'artist_pop': list(),
                'artist_genres': list(),
                'album': list(),
                'track_pop': list()}

In [None]:
track_idx = 0
ofst = [0, 100, 200] # in case you have more than 100 songs

# Spotipy (should) have a built-in timer that will wait an appropriate amount of time if you hit the request limit

for of in ofst:

    for track in sp.playlist_tracks(playlist_URI, offset=of)["items"]:
    
        if track["track"] is not None:
        
            #URI
            playlist_dict['track_uri'].append(track["track"]["uri"])
        
            #Track name
            playlist_dict['track_name'].append(track["track"]["name"])
        
            #Main Artist (store the URI so the 'artist_uri' list is not left empty)
            artist_uri = track["track"]["artists"][0]["uri"]
            artist_info = sp.artist(artist_uri)
            playlist_dict['artist_uri'].append(artist_uri)
        
            #Name, popularity, genre
            playlist_dict['artist_name'].append(track["track"]["artists"][0]["name"])
            playlist_dict['artist_pop'].append(artist_info["popularity"])
            playlist_dict['artist_genres'].append(artist_info["genres"])
        
            #Album
            playlist_dict['album'].append(track["track"]["album"]["name"])
        
            #Popularity of the track
            playlist_dict['track_pop'].append(track["track"]["popularity"])
            
            #Update track index
            track_idx += 1



In [None]:
# Did we get all the songs?
len(playlist_dict['track_uri'])

200

In [None]:
playlist_dict.keys()

dict_keys(['track_uri', 'track_name', 'artist_uri', 'artist_name', 'artist_pop', 'artist_genres', 'album', 'track_pop'])

In [None]:
# grabs track specific data
track_data = sp.track(playlist_dict['track_uri'][0])

In [None]:
from collections import defaultdict
import pandas as pd

song_dict = defaultdict(list)

# adds some more track specific data (release date, duration, if it's explicit or not)
for track in playlist_dict['track_uri']:
    song_temp = sp.track(track)
    song_dict['release_date'].append(song_temp['album']['release_date'])
    song_dict['duration_ms'].append(song_temp['duration_ms'])
    song_dict['explicit'].append(song_temp['explicit'])

song_df = pd.DataFrame(song_dict)



In [None]:
# we got the data, but we need to append the information to the first data frame
# currently, there's no song information here:
song_df.head()

Unnamed: 0,release_date,duration_ms,explicit
0,2021-04-23,229525,False
1,2017-03-03,233712,False
2,2024-12-27,199032,False
3,2021-11-30,219493,False
4,2015-01-12,269666,False


In [None]:
song_df['song_title'] = playlist_dict['track_name']
song_df['artist_name'] = playlist_dict['artist_name']
song_df['artist_pop'] = playlist_dict['artist_pop']
song_df['artist_genres'] = playlist_dict['artist_genres']
song_df['track_pop'] = playlist_dict['track_pop']
song_df.head()

Unnamed: 0,release_date,duration_ms,explicit,song_title,artist_name,artist_pop,artist_genres,track_pop
0,2021-04-23,229525,False,See You Again,Wiz Khalifa,84,[rap],44
1,2017-03-03,233712,False,Shape of You,Ed Sheeran,90,[soft pop],90
2,2024-12-27,199032,False,back to friends,sombr,83,[],94
3,2021-11-30,219493,False,Gangnam Style,PSY,66,[k-pop],0
4,2015-01-12,269666,False,Uptown Funk (feat. Bruno Mars),Mark Ronson,78,[],88


In [None]:
# ALWAYS a good strategy to save it somewhere (with a different name each time, for version control)
# This way, you don't have to continuously call the API just to get the data; just call it once!
# (This is also very important when you have a limited number of API calls, or are paying per call)
song_df.to_csv('international_hits.csv', index=False)
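
The "call it once" advice can be sketched as a small helper that reads the saved CSV when it exists and only falls back to the (slow, rate-limited) API loop when it does not. The names `load_or_fetch` and `fake_fetch` are hypothetical, and the fake data stands in for the real API calls.

```python
import os
import tempfile

import pandas as pd

def load_or_fetch(csv_path, fetch_func):
    """Return the DataFrame stored at csv_path, calling fetch_func() only if the file is missing."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    df = fetch_func()
    df.to_csv(csv_path, index=False)
    return df


def fake_fetch():
    # stand-in for the slow API loop above
    return pd.DataFrame({'song_title': ['a', 'b'], 'track_pop': [44, 90]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'hits.csv')
    first = load_or_fetch(path, fake_fetch)    # calls fake_fetch and writes the file
    second = load_or_fetch(path, fake_fetch)   # reads the file, no "API call"

print(second)
```

One caveat: columns holding lists (such as `artist_genres`) come back from a CSV as strings, so they may need to be parsed again after loading.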