# DS3000 Day 2

Sep 13, 2024 (Friday the 13th)

![13th](https://s.hdnux.com/photos/66/55/44/14342434/3/rawImage.jpg)

Admin
- New modules for today and next week:
    - `pip install requests plotly matplotlib`
- Homework 1 due next Tues, Sep 17 by end of the day
    - submit by Sunday night to get 5\% extra credit

Push-Up Tracker
- Section 04: 1
- Section 08: 1

Content
- introduce APIs

# Basic tools in preparation for APIs

The `requests` module comes into play soon. You may have installed it from the terminal earlier, but `pip` is also available as a [magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-pip) command, which means you can install modules directly from jupyter (should you ever need to install `requests` again, or if you had difficulty installing it earlier).

In [1]:
#pip install requests

## Building a DataFrame row by row

We often get data in chunks (web scraping / API calls).  We'll need to store our data incrementally:

In [2]:
import pandas as pd

dict_list = [{'a': 1, 'b': 2, 'c': 3},
             {'a': 4, 'b': 3874, 'c': 398}]

df = pd.DataFrame()

for d in dict_list:
    df = pd.concat([df, pd.Series(d).to_frame().T])
    
df

Unnamed: 0,a,b,c
0,1,2,3
0,4,3874,398


In [3]:
# to include index names
list_dict = [{'a': 1, 'b': 2, 'c': 3},
            {'a': 4, 'b': 3874, 'c': 398}]

name_list = ['first', 'second']

df = pd.DataFrame()
for idx in range(2):
    # extract dictionary & name
    d = list_dict[idx]
    name = name_list[idx]
    
    # build series and name it
    series = pd.Series(d, name=name)
    
    df = pd.concat([df, series.to_frame().T])
    
df

Unnamed: 0,a,b,c
first,1,2,3
second,4,3874,398
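
Calling `pd.concat` inside a loop copies the growing DataFrame on every pass. When the rows are already available as dictionaries, a common alternative is to collect them in a list and build the DataFrame once. A minimal sketch, reusing the toy data and row names from the cells above (the variable names `rows`, `names` and `df_once` are made up for this example):

```python
import pandas as pd

# same toy rows and row names as in the cells above
rows = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 4, 'b': 3874, 'c': 398}]
names = ['first', 'second']

# each dictionary becomes one row; the keys become the column names
df_once = pd.DataFrame(rows, index=names)
df_once
```

The loop version is still useful when rows arrive one at a time (for example, one API call per row), but appending dictionaries to a list and converting at the end is usually simpler.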


# Timestamps

Many datasets include a timestamp, or include a date/time as a feature in the dataset. Understanding how to deal with these can be important! We actually already used the pandas `to_datetime()` function with the Korean Demographics data to cast strings to `datetime` objects. We'll look at a few time highlights which will come in handy on Homework 2.

## Unix Time

- [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) Coordinated Universal Time
    - time zone at 0 deg longitude
        - how is 0 deg longitude defined?  
            - A successfully warring empire (United Kingdom) chose it 
                - (It would be convenient if a metric system loving empire had been more successful at war ...)
- [Unix Time](https://en.wikipedia.org/wiki/Unix_time) is the number of seconds which have passed since 00:00:00 UTC on 1 Jan 1970 (ignoring leap seconds)
![unix](https://i.redd.it/o1li4ktbyf871.png)
- Unix time is time zone agnostic 
    - (more on this next lesson...)
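
As a small sketch of the definition above: a Unix timestamp can be turned into a UTC `datetime` by passing `tz=timezone.utc`, so the result does not depend on the machine running the code. The value is the same one used in the `datetime` example in these notes; the variable names are just for illustration.

```python
from datetime import datetime, timezone

unix_seconds = 1613286000

# passing tz=timezone.utc gives the same answer on every machine
dt_utc = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
print(dt_utc.isoformat())  # 2021-02-14T07:00:00+00:00

# .timestamp() goes back the other way (seconds since 1 Jan 1970 UTC)
back = int(dt_utc.timestamp())
print(back)  # 1613286000
```

07:00 UTC is 2 am in US Eastern Standard Time, which is 5 hours behind UTC.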

## Python's `datetime`, `timedelta`, and `pytz`
- helpful for all those pesky unit conversions

In [1]:
from datetime import datetime, timedelta

# would you believe that the below is exactly 2 am on Valentine's Day 2021?
utc_example = 1613286000

# assumes the time zone of the machine it's running on!
dt0 = datetime.fromtimestamp(utc_example)
dt0

datetime.datetime(2021, 2, 14, 2, 0)

In [3]:
date1 = datetime.strptime('Today is the 25th', 'Today is the %dth')

In [4]:
date1 + timedelta(days = 90)

datetime.datetime(1900, 4, 25, 0, 0)

In [8]:
str1 = "September 25"
str2 = "10 am"

In [9]:
newdate = datetime.strptime(str1 + ' ' + str2, '%B %d %I %p')
newdate

datetime.datetime(1900, 9, 25, 10, 0)

In [10]:
newnewdate = datetime(year = 2023, month = newdate.month, day = newdate.day, hour = newdate.hour)
newnewdate

datetime.datetime(2023, 9, 25, 10, 0)

In [11]:
import pytz
tz_mali = pytz.timezone("Africa/Timbuktu")
inmali = tz_mali.localize(newnewdate)

In [12]:
tz_est = pytz.timezone("EST")
inmali.astimezone(tz_est)

datetime.datetime(2023, 9, 25, 5, 0, tzinfo=<StaticTzInfo 'EST'>)

In [13]:
# what about right.... now?
dt1 = datetime.now()
dt1

datetime.datetime(2024, 9, 9, 18, 44, 17, 349442)

In [14]:
# we can set future dates as well
dt2 = datetime(year=2031, month=4, day=15, hour=9, minute=26, second=53)
dt2

datetime.datetime(2031, 4, 15, 9, 26, 53)

In [15]:
# we can access meaningful date attributes of a datetime object
# year, month, day, hour, minute, second
dt2.month, dt2.day

(4, 15)

In [18]:
# we can add / subtract timedelta objects
offset = timedelta(days=5, seconds=8979)

print(dt2)
print(dt2 + offset)

2031-04-15 09:26:53
2031-04-20 11:56:32


In [19]:
# use strptime to take the time from strings that contain other words
# (%I is the 12-hour clock code; %H would ignore the PM)
datetime.strptime('the time is now: September-30-2022 3:20 PM', 'the time is now: %B-%d-%Y %I:%M %p')

datetime.datetime(2022, 9, 30, 15, 20)
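
The hour code matters when a string carries an AM/PM marker. A quick sketch of the difference between `%H` (24-hour clock) and `%I` (12-hour clock) when each is paired with `%p`; the variable names are only for illustration:

```python
from datetime import datetime

text = '3:20 PM'

# %H reads a 24-hour clock value, so the PM marker does not shift the hour
as_24h = datetime.strptime(text, '%H:%M %p')

# %I reads a 12-hour clock value and combines it with %p
as_12h = datetime.strptime(text, '%I:%M %p')

print(as_24h.hour, as_12h.hour)  # 3 15
```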

In [20]:
# use strftime to cast a time to a string that contains other words
# https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
s = datetime.now().strftime('the time is now: %B-%d-%Y %I:%M %p')
s

'the time is now: September-09-2024 06:46 PM'

In [21]:
# you can save useful time info in a dictionary (which could then become a series -> data frame)
dt = datetime.now()
{'hour': dt.hour,
'minute': dt.minute}

{'hour': 18, 'minute': 46}

In [23]:
# you can figure out how old you are in seconds
eric_age = (datetime.now() - datetime(year=1990, month=12, day=20, hour=22, minute=42)).total_seconds()
print(eric_age)
# put it in billions (it wasn't too long ago that I turned 1 billion!)
eric_age/ 1e09

1064174716.674757


1.064174716674757

# API
## Definitions
**API** Application Programming Interface
 - a server which gives out data (often over the internet)
 - note: 'API', in general, refers to the barrier between two pieces of software:
     - in this case, the server which hosts data & our own software which requests it
 
![apii](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65472b4e-8deb-474d-82e7-6ca5632b3556_505x529.png)


 **JSON** JavaScript Object Notation
  - a method of storing objects as text
  - much like the nested dictionaries ... JSON and similar formats are often trees
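
To make the "objects as text" idea concrete, here is a minimal sketch using Python's built-in `json` module. The JSON string is made up for illustration and is not real API output.

```python
import json

# a small made-up JSON string (not real API output)
text = '{"city": "Boston", "current": {"temp": 69.67, "weather": [{"main": "Clouds"}]}}'

# json.loads turns JSON text into nested dicts and lists
data = json.loads(text)

# walk down the tree one key (or list index) at a time
sky = data['current']['weather'][0]['main']
print(sky)  # Clouds

# json.dumps goes the other way: Python objects back to text
round_trip = json.dumps(data)
```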

![json](https://project-static-assets.s3.amazonaws.com/APISpreadsheets/APIMemes/WhoIsJason.jpeg)


## OpenWeather API
What information does this offer?

[https://openweathermap.org/api](https://openweathermap.org/api)

How do I get ready to use it?
- sign up for an account
    - [https://home.openweathermap.org/users/sign_up](https://home.openweathermap.org/users/sign_up)
- get an api key (my key was emailed to me with my confirmation of account)
    - [https://home.openweathermap.org/api_keys](https://home.openweathermap.org/api_keys)
 
![ap](https://www.memecreator.org/static/images/memes/4593014.jpg)
        
Think of APIs as a hybrid of a website and a function.  It's a website where your query is stored in the address:
    
    https://api.openweathermap.org/data/3.0/onecall?lat=42.3601&lon=-71.0589&appid=YOUR-API-KEY-HERE-THIS-WONT-WORK&units=imperial
    
The result is a JSON object, which we can quickly convert to our dictionary-of-dictionaries tree format.

In [7]:
api_key = 'cf758020c3c57082bbfd8b62d88ca683'

# north = positive, south = negative
lat = 42.3601
# east = positive, west = negative
lon = -71.0589

units = 'imperial'
url = f'https://api.openweathermap.org/data/3.0/onecall?lat={lat}&lon={lon}&appid={api_key}&units={units}'
print(url)

https://api.openweathermap.org/data/3.0/onecall?lat=42.3601&lon=-71.0589&appid=cf758020c3c57082bbfd8b62d88ca683&units=imperial
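
The f-string above works well for a handful of parameters. As an alternative sketch, the standard library's `urlencode` can assemble the query string from a dictionary, which also escapes any special characters. The key below is a placeholder and the names `base_url`, `params` and `query_url` are chosen for this example.

```python
from urllib.parse import urlencode

base_url = 'https://api.openweathermap.org/data/3.0/onecall'

# placeholder key: swap in your own
params = {'lat': 42.3601,
          'lon': -71.0589,
          'appid': 'YOUR-API-KEY-HERE',
          'units': 'imperial'}

# urlencode joins the key=value pairs with & and escapes special characters
query_url = f'{base_url}?{urlencode(params)}'
print(query_url)
```

`requests.get` also accepts a `params=` dictionary and builds the query string for you, which may be convenient once the number of parameters grows.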


In [8]:
import requests

# get the response body as a string
url_text = requests.get(url).text    
url_text

'{"lat":42.3601,"lon":-71.0589,"timezone":"America/New_York","timezone_offset":-14400,"current":{"dt":1726184212,"sunrise":1726136495,"sunset":1726181964,"temp":69.67,"feels_like":69.53,"pressure":1023,"humidity":68,"dew_point":58.66,"uvi":0,"clouds":19,"visibility":10000,"wind_speed":7.96,"wind_deg":197,"wind_gust":21.81,"weather":[{"id":801,"main":"Clouds","description":"few clouds","icon":"02n"}]},"minutely":[{"dt":1726184220,"precipitation":0},{"dt":1726184280,"precipitation":0},{"dt":1726184340,"precipitation":0},{"dt":1726184400,"precipitation":0},{"dt":1726184460,"precipitation":0},{"dt":1726184520,"precipitation":0},{"dt":1726184580,"precipitation":0},{"dt":1726184640,"precipitation":0},{"dt":1726184700,"precipitation":0},{"dt":1726184760,"precipitation":0},{"dt":1726184820,"precipitation":0},{"dt":1726184880,"precipitation":0},{"dt":1726184940,"precipitation":0},{"dt":1726185000,"precipitation":0},{"dt":1726185060,"precipitation":0},{"dt":1726185120,"precipitation":0},{"dt":17

In [9]:
# json is part of the standard library, so there is nothing to install
import json

# convert json to a nested dict
weather_dict = json.loads(url_text)

weather_dict.keys()

dict_keys(['lat', 'lon', 'timezone', 'timezone_offset', 'current', 'minutely', 'hourly', 'daily'])

In [10]:
#what does one hour of weather look like
weather_dict['hourly'][2]

{'dt': 1726189200,
 'temp': 69.01,
 'feels_like': 68.67,
 'pressure': 1023,
 'humidity': 65,
 'dew_point': 56.77,
 'uvi': 0,
 'clouds': 15,
 'visibility': 10000,
 'wind_speed': 6.98,
 'wind_deg': 206,
 'wind_gust': 19.3,
 'weather': [{'id': 801,
   'main': 'Clouds',
   'description': 'few clouds',
   'icon': '02n'}],
 'pop': 0}

## Cleaning up data from one hour

In [11]:
from datetime import datetime
import pandas as pd

hour_dict = weather_dict['hourly'][0]
hour_dict

# let's convert from unix time to a datetime (easier to use)
hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

pd.Series(hour_dict)

dt                                                   1726182000
temp                                                      70.02
feels_like                                                69.67
pressure                                                   1023
humidity                                                     63
dew_point                                                 56.86
uvi                                                           0
clouds                                                       20
visibility                                                10000
wind_speed                                                 8.16
wind_deg                                                    192
wind_gust                                                 19.08
weather       [{'id': 801, 'main': 'Clouds', 'description': ...
pop                                                           0
datetime                                    2024-09-12 19:00:00
dtype: object

In [12]:
df_hourly = pd.DataFrame()
for hour_dict in weather_dict['hourly']:

    # let's convert from unix time to a datetime (easier to use)
    hour_dict['datetime'] = datetime.fromtimestamp(hour_dict['dt'])

    s_hour = pd.Series(hour_dict)
    
    df_hourly = pd.concat([df_hourly, s_hour.to_frame().T], ignore_index=True)
    
df_hourly.head()

Unnamed: 0,dt,temp,feels_like,pressure,humidity,dew_point,uvi,clouds,visibility,wind_speed,wind_deg,wind_gust,weather,pop,datetime,rain
0,1726182000,70.02,69.67,1023,63,56.86,0,20,10000,8.16,192,19.08,"[{'id': 801, 'main': 'Clouds', 'description': ...",0,2024-09-12 19:00:00,
1,1726185600,69.67,69.53,1023,68,58.66,0,19,10000,7.96,197,21.81,"[{'id': 801, 'main': 'Clouds', 'description': ...",0,2024-09-12 20:00:00,
2,1726189200,69.01,68.67,1023,65,56.77,0,15,10000,6.98,206,19.3,"[{'id': 801, 'main': 'Clouds', 'description': ...",0,2024-09-12 21:00:00,
3,1726192800,67.8,67.24,1023,63,54.77,0,11,10000,6.71,218,18.57,"[{'id': 801, 'main': 'Clouds', 'description': ...",0,2024-09-12 22:00:00,
4,1726196400,66.07,65.34,1024,63,53.13,0,8,10000,6.24,223,17.76,"[{'id': 800, 'main': 'Clear', 'description': '...",0,2024-09-12 23:00:00,
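
Since `weather_dict['hourly']` is already a list of dictionaries, the same table can likely be built without a loop. A hedged sketch on two trimmed-down records (only the `dt` and `temp` values, taken from the output above; the names `hourly` and `df_sketch` are made up for this example):

```python
import pandas as pd

# two trimmed-down records shaped like entries of weather_dict['hourly']
hourly = [{'dt': 1726182000, 'temp': 70.02},
          {'dt': 1726185600, 'temp': 69.67}]

# build the frame in one call instead of concatenating row by row
df_sketch = pd.DataFrame(hourly)

# unit='s' tells pandas the integers are seconds since the Unix epoch
df_sketch['datetime'] = pd.to_datetime(df_sketch['dt'], unit='s', utc=True)
df_sketch
```

Note that this version labels the times in UTC (23:00 for the first row), whereas `datetime.fromtimestamp` in the loop above uses the local time zone of the machine (19:00 in Boston).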


## Lecture Break/Practice 3

La Chaux-de-Fonds, Switzerland is located at:

    47.101333° N, 6.825° E
    
1. Create a dataframe of the next 48 hours of their weather as was done above
2. (++) Make a function `get_forecast` which accepts:
    - `lat`
    - `lon`
    - `api_key`
    - `units` (default = 'imperial')
    
    and returns a dataframe of the next 48 hours of the location's weather. Test it on a location of your choice.

In [24]:
# get_forecast(47.101333, 6.825, api_key)

# Storing your API key in a local file

There exists a file `open_weather_access.py` in the same directory as this jupyter notebook which contains:
    
    my_api_key = 'hello!'

In [32]:
from open_weather_access import my_api_key

print(my_api_key)

# from open_weather_access import my_real_api_key
# print(my_real_api_key)

hello!
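
Another common way to keep a key out of the notebook is an environment variable. This is only a sketch of that alternative: the variable name `OPENWEATHER_API_KEY` and the helper `load_api_key` are hypothetical, not something OpenWeather or this course requires.

```python
import os

# OPENWEATHER_API_KEY is a hypothetical variable name chosen for this sketch
def load_api_key(var_name='OPENWEATHER_API_KEY'):
    """Return the key stored in the environment, or raise a clear error."""
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(f'set the {var_name} environment variable first')
    return key
```

Whichever approach you use, the point is the same: the key lives somewhere that does not get shared along with the notebook.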


## Looking Ahead; Spotify (for use on Homework 2)

![spot](https://static1.srcdn.com/wordpress/wp-content/uploads/2021/12/Spotify-Wrapped-Memes-Featured.jpeg)

The Spotify API is quite powerful and gives us access to any song/artist in its libraries, plus even more information that you might not have thought of. There is also a module that has been created to access the API within python. Open up a terminal (or do it in jupyter notebook, since `pip` is a magic command) and run:

`pip install spotipy`

In [33]:
#pip install spotipy

In [34]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

Just like with OpenWeather, we need to make an account [here](https://developer.spotify.com/) (this is essentially the same as making a regular Spotify account) and then get an API key (Spotify requires two things, actually, a Client ID and a secret key). At the above website, go to:

- Dashboard
- Log into your Spotify account (make one if you don't have one)
- Accept the terms of using the API
- Create an app (you can call it anything, I called mine `DS3000_Spotify`)
- Get a client ID (mine is `952bb78187fd483b9a9e1edc7ab78100`, though it is usually good practice to **not** share this) and a client secret (**never share this with anyone**: save it in a separate file like we did with our OpenWeather API key earlier)

There exists a file `spotify_secret.py` in the same directory as this jupyter notebook which contains:
    
    secret = 'professormohitssecretspotify'

In [35]:
from spotify_secret import secret

In [36]:
# Authentication
cid = '952bb78187fd483b9a9e1edc7ab78100'

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)

You will learn more about how to use Spotipy, including the tricky bits that are unique to its usage, on Homework 2. **START THIS EARLY ONCE IT IS RELEASED NEXT WEEK!!**