# DS3000 Lecture 6

### Admin:
- HW 2 due on Friday
- HW 3 posted today and due on Sunday
- Quiz 1 will be posted on Friday; you will have 2 hours to complete it by Sunday night (covers all material up to and including today)

### Content:
- New Skill
    - representing trees as nested dictionaries
- Obtaining data via an API
    - OpenWeather API
        - More discussion of time (timezones)
    - Spotify API (if time)

In [2]:
import numpy

numpy.zeros()

TypeError: zeros() missing required argument 'shape' (pos 0)

## Installation

- modules needed today 
    - `requests` and `spotipy`
- the first can be installed directly from jupyter
    - it's a [magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-pip) command


    pip install requests

In [2]:
#pip install requests

## Representing Trees as Lists & Dictionaries
- useful for representing a tree of data
- (our API calls will return nested dictionaries)

<img src="https://i.ibb.co/Pmxqpb3/tree-ex.png" alt="Drawing" style="width: 400px;"/>

<img src="https://i.ibb.co/4SSH4mm/tree-ex2.png" alt="Drawing" style="width: 600px;"/>
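As a minimal sketch of the idea, the tree below is built from dictionaries (named branches) and lists (repeated branches). The course, sections and student names are made up for illustration.

```python
# a hypothetical tree: one course, two sections, a few students per section
course = {
    'name': 'DS3000',
    'sections': [
        {'instructor': 'A', 'students': ['Ana', 'Ben']},
        {'instructor': 'B', 'students': ['Cal']},
    ],
}

# walk down the tree one level at a time:
# dict key -> list index -> dict key -> list index
first_student = course['sections'][0]['students'][0]

# loop over a branch to visit every leaf under it
all_students = [student
                for section in course['sections']
                for student in section['students']]

print(first_student)
print(all_students)
```

Each pair of square brackets moves one level deeper into the tree, which is the same pattern we will use on API responses.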

### Exercise 0: 

Express the height and weight of every penguin in the following group as a list of dictionaries:
<img src="https://i.ibb.co/XXzX4Wk/penguin-tree.png" alt="Drawing" style="width: 700px;"/>

# API
###  Definitions
**API** Application Programming Interface
 - a server which gives out data (often over the internet)
 - note: 'API', in general, refers to the boundary between two pieces of software:
     - in this case, the server which hosts data & our own software which requests it
 
 
 **JSON** JavaScript Object Notation
  - a method of storing objects as text
  - much like our nested dictionaries: JSON and similar formats usually describe trees
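A small sketch of the JSON-to-dictionary round trip using python's built-in `json` module. The JSON text here is invented for illustration and is not a real API response.

```python
import json

# JSON is just text ...
json_text = '{"city": {"name": "Boston", "coord": {"lat": 42.3601, "lon": -71.0589}}, "temps": [61.2, 58.9]}'

# ... which json.loads() turns into nested dictionaries and lists
data = json.loads(json_text)
lat = data['city']['coord']['lat']

# json.dumps() goes the other way (indent makes the tree shape visible)
text_again = json.dumps(data, indent=2)

print(lat)
print(text_again)
```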

## OpenWeather API
What information does this offer?

[https://openweathermap.org/api](https://openweathermap.org/api)

How do I get ready to use it?
- sign up for an account
    - [https://home.openweathermap.org/users/sign_up](https://home.openweathermap.org/users/sign_up)
- get an API key (my key was emailed to me with my account confirmation)
    - [https://home.openweathermap.org/api_keys](https://home.openweathermap.org/api_keys)
        
Think of APIs as a hybrid of a website and a function.  It's a website where your query is stored in the address:
    
    https://api.openweathermap.org/data/2.5/onecall?lat=42.3601&lon=-71.0589&appid=YOUR-API-KEY-HERE-THIS-WONT-WORK&units=imperial
    
The result is a JSON object, which we can quickly convert to our nested-dictionary tree format.

In [2]:
# todo: swap this out
api_key = '2afdede234eabfa52612efba55bcc8ac'

# north = positive, south = negative
lat = 42.3601
# east = positive, west = negative
lon = -71.0589

# options: 'standard', 'metric' or 'imperial'
units = 'imperial' 

#url = f'https://api.openweathermap.org/data/2.5/weather?lat={lat}&lon={lon}&APPID={api_key}'
url = f'https://api.openweathermap.org/data/2.5/forecast?lat={lat}&lon={lon}&APPID={api_key}&units={units}'

print(url)

https://api.openweathermap.org/data/2.5/forecast?lat=42.3601&lon=-71.0589&APPID=2afdede234eabfa52612efba55bcc8ac&units=imperial


In [3]:
import requests

In [4]:
import json
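With `requests` and `json` imported, the remaining step is to request the URL and convert the returned text. The following is a sketch rather than the definitive approach: `build_url` and `fetch_forecast` are hypothetical helper names, and the key shown is a placeholder that will not work.

```python
import json
from urllib.parse import urlencode

# endpoint taken from the URL built in the earlier cell
BASE_URL = 'https://api.openweathermap.org/data/2.5/forecast'


def build_url(lat, lon, api_key, units='imperial'):
    """ builds the query URL (hypothetical helper) """
    query = urlencode({'lat': lat, 'lon': lon, 'APPID': api_key, 'units': units})
    return f'{BASE_URL}?{query}'


def fetch_forecast(url):
    """ requests the URL and converts the JSON text to nested dictionaries """
    # imported here so build_url() can be tried without `requests` installed
    import requests

    response = requests.get(url, timeout=10)
    # raise an error on a bad status (e.g. 401 for an invalid key)
    response.raise_for_status()
    return json.loads(response.text)


url = build_url(lat=42.3601, lon=-71.0589, api_key='YOUR-API-KEY-HERE')
print(url)
# weather_dict = fetch_forecast(url)   # needs a valid key and a network connection
```

`response.json()` would do the same conversion in one step; `json.loads` is used here only to make the text-to-dictionary step explicit.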

## Cleaning up data


In [6]:
from datetime import datetime
import pandas as pd
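One way to clean the forecast up is to flatten each 3-hour entry into a single row. The sketch below assumes the response has a `'list'` of entries, each with a unix time `'dt'`, a `'main'` dictionary and a `'weather'` list; the sample values are invented and heavily trimmed, and `flatten_forecast` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# hypothetical, trimmed stand-in for the dictionary returned by the forecast call
forecast = {
    'list': [
        {'dt': 1664820000, 'main': {'temp': 55.4, 'humidity': 71},
         'weather': [{'description': 'light rain'}]},
        {'dt': 1664830800, 'main': {'temp': 53.1, 'humidity': 76},
         'weather': [{'description': 'overcast clouds'}]},
    ]
}


def flatten_forecast(forecast):
    """ turns each forecast entry into one flat dictionary (one future row) """
    rows = []
    for entry in forecast['list']:
        row = {
            # unix time -> timezone-aware datetime in UTC
            'datetime_utc': datetime.fromtimestamp(entry['dt'], tz=timezone.utc),
            'description': entry['weather'][0]['description'],
        }
        # copy the numeric features (temp, humidity, ...) into the same row
        row.update(entry['main'])
        rows.append(row)
    return rows


rows = flatten_forecast(forecast)
print(rows[0])
```

Passing the list of flat dictionaries to `pd.DataFrame(rows)` then gives one dataframe row per 3-hour window.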


## Exercise 1

La Chaux-de-Fonds, Switzerland is located at:

    47.101333° N, 6.825° E
    
1. Create a dataframe of the 5-day, 3-hour forecast for this location, as was done above
2. (++) Make a function `get_forecast` which accepts:
    - `lat`
    - `lon`
    - `api_key`
    - `units` (default = 'imperial')
    
    and returns a dataframe of the next 48 hours of the location's weather.

In [64]:
import requests
import json
import pandas as pd
from datetime import datetime

def get_forecast(lat, lon, units='imperial', api_key=''):
    """ gets the forecast, in 3-hour steps for the next two days, for a given lat and lon
    
    Args:
        lat (float): latitude
        lon (float): longitude
        units (str): units ('standard', 'metric' or 'imperial')
        api_key (str): key for accessing API
        
    Returns:
        df_hourly (pd.DataFrame): one row per 3-hour window, one column per weather feature
    
    """
  

In [7]:
#get_forecast(47.101333, 6.825)

# Storing your API key in a local file

There exists a file `open_weather_access.py` in the same directory as this jupyter notebook which contains:
    
    my_api_key = 'hello!'
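The key can then be imported like any other python variable, so it never has to appear in the notebook itself. The sketch below writes a stand-in copy of the file to a temporary directory purely so that it runs anywhere; in the notebook only the `from ... import ...` line is needed.

```python
import sys
import tempfile
from pathlib import Path

# stand-in for the notebook's directory, so this sketch is self-contained
key_dir = Path(tempfile.mkdtemp())
(key_dir / 'open_weather_access.py').write_text("my_api_key = 'hello!'\n")

# the notebook's own directory is already on the import path; a temp dir is not
sys.path.insert(0, str(key_dir))

# this is the only line needed in the notebook
from open_weather_access import my_api_key

print(my_api_key)
```

Keeping the key in its own file also makes it easy to leave that file out of anything you share or submit.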

# `datetime`, `date`, `time` and UTC refresher
## Unix Time (UTC)
- [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) Coordinated Universal Time
    - time zone at 0 deg longitude
- [Unix Time](https://en.wikipedia.org/wiki/Unix_time) is the number of seconds which have passed since 00:00:00 UTC on 1 Jan 1970 (ignoring leap seconds)

In [8]:
from datetime import date, time, datetime
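A short sketch of the definition, using only the standard library (`timezone.utc` is python's built-in UTC timezone; the second date is an arbitrary example):

```python
from datetime import datetime, timezone

# unix time 0 is the start of 1 Jan 1970 in UTC
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# going the other way: a timezone-aware datetime -> unix time
moment = datetime(2022, 10, 3, 18, 0, 0, tzinfo=timezone.utc)
unix_time = moment.timestamp()

print(epoch)
print(unix_time)
```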


## datetimes to and from strings
Using [the strptime/strftime format codes](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior), we can convert between string and `datetime` representations:
- building datetimes with tzinfo explicitly passed
- strptime (from str to `datetime`)
- strftime (from `datetime` to str)
- include a date (not just a time) when switching timezones

- %d: the day of the month as a zero-padded decimal number such as 28.
- %a: a day's abbreviated name, such as Sun.
- %A: a day's full name, such as Sunday.
- %m: the month as a zero-padded decimal number, such as 01.
- %b: month's abbreviated name, such as Jan.
- %B: the month's full name, such as January.
- %y: the year without century, such as 23.
- %Y: the year with century, such as 2023.
- %H: the hours of the day in a 24-hour format, such as 08.
- %I: the hours of the day in a 12-hour format.
- %p: AM or PM
- %M: the minutes in an hour, such as 20.
- %S: the seconds in a minute, such as 00.
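A minimal sketch combining several of the codes above. The input string and the output layout are arbitrary choices for illustration.

```python
from datetime import datetime

# strptime: parse a string into a datetime (the format must match the string)
dt = datetime.strptime('2023-01-01 00:00:00', '%Y-%m-%d %H:%M:%S')

# strftime: format a datetime as a string, in whatever layout we like
label = dt.strftime('%a %b %d, %Y at %I:%M %p')

print(dt)
print(label)
```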

# Timezones

[pytz](http://pytz.sourceforge.net/) will do all the heavy lifting for managing timezones for us

(This is super helpful for hw3!)

In [9]:
import pytz



## Specifying timezone info with datetime
- use the `.localize()` method of a pytz timezone object
    - takes a naive `datetime` (one without any timezone) as input
- don't pass the pytz timezone object to the `tzinfo` keyword of `datetime` objects ... 
    - pytz can attach the wrong UTC offset this way, including around daylight saving time
    - these are "silent" errors: the code will run, but times will be off by some amount

In [10]:
#time_zone_gmt.localize(ball_drop2023_est)
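The difference between `.localize()` and the `tzinfo` keyword can be seen by comparing UTC offsets. A minimal sketch, assuming `pytz` is installed; the date (New Year 2023) and variable names are illustrative.

```python
from datetime import datetime, timedelta

import pytz

eastern = pytz.timezone('US/Eastern')
naive = datetime(2023, 1, 1, 0, 0, 0)

# recommended: let pytz pick the offset that applies on this date
good = eastern.localize(naive)

# not recommended: pytz attaches a default offset, not the one for this date
bad = datetime(2023, 1, 1, 0, 0, 0, tzinfo=eastern)

print(good.utcoffset())
print(bad.utcoffset())
print(good.utcoffset() == timedelta(hours=-5))
```

Both lines run without complaint, which is what makes the second one dangerous: the only sign of trouble is an offset that is not the expected five hours behind UTC.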

# Changing Time Zones

To change the timezone of a datetime, use `datetime.astimezone()`.

Best practice: only use `datetime.astimezone()` after you've explicitly set the timezone of the datetime object; on a naive datetime, it assumes the system's local timezone.
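A sketch of that best practice, assuming `pytz` is installed: the timezone is set first with `.localize()`, and only then changed with `.astimezone()`. The example moment (midnight on New Year 2023 in US/Eastern) is illustrative.

```python
from datetime import datetime

import pytz

# step 1: explicitly set the timezone of a naive datetime
eastern = pytz.timezone('US/Eastern')
ball_drop = eastern.localize(datetime(2023, 1, 1, 0, 0, 0))

# step 2: express the same moment in other timezones
ball_drop_pacific = ball_drop.astimezone(pytz.timezone('US/Pacific'))
ball_drop_utc = ball_drop.astimezone(pytz.utc)

print(ball_drop_pacific)
print(ball_drop_utc)
```

All three objects describe the same instant, so they compare as equal even though their clock readings differ.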

# A subtlety:

`datetime.utcfromtimestamp()`
   - converts unix time to a datetime in UTC (though the result is naive: no timezone is attached)
   - useful for timezone conversion
   
`datetime.fromtimestamp()`
   - converts unix time to a datetime in the local timezone (again naive: no timezone is attached)
   - useful if we want everything in the local timezone
        - (we run into trouble once other timezones come into play)
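The sketch below contrasts the two calls using only the standard library. Note that `datetime.utcfromtimestamp()` is deprecated as of Python 3.12, so its warning is silenced here; passing `tz=` to `fromtimestamp()` is shown as an alternative that returns a timezone-aware result.

```python
import warnings
from datetime import datetime, timezone

unix_time = 1664820000

# naive datetime holding the UTC clock reading (deprecated in Python 3.12+)
with warnings.catch_warnings():
    warnings.simplefilter('ignore', DeprecationWarning)
    naive_utc = datetime.utcfromtimestamp(unix_time)

# naive datetime holding the clock reading of whatever machine runs this
naive_local = datetime.fromtimestamp(unix_time)

# aware datetime: same UTC clock reading, but the timezone is attached
aware_utc = datetime.fromtimestamp(unix_time, tz=timezone.utc)

print(naive_utc, naive_utc.tzinfo)
print(naive_local, naive_local.tzinfo)
print(aware_utc, aware_utc.tzinfo)
```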

## Exercise 2: 
1. Write a function `from_unix_to_datetime()` which:
- accepts:
    - unix_time (float): seconds since jan 1 1970 UTC
    - timezone_to (str): timezone of output datetime object
- returns:
    - datetime_tz (datetime): datetime object in given timezone
    
Be sure to properly document your function.

2.  Write a few `assert` test cases to validate your function.
    - Rather than checking if values are proper yourself, it may be best to rely on [another source](https://www.epochconverter.com/)

In [8]:
def from_unix_to_datetime(unix_time, timezone_to='US/Eastern'):
    """ converts unix time to a datetime object
    
    Args:
        unix_time (float): unix time (sec since jan 1 1970 UTC)
        timezone_to (str): timezone of output datetime
        
    Returns:
        datetime_tz (datetime): datetime object corresponding to 
            unix_time
    """

In [9]:
unix_time=1130385662
timezone_to='US/Central'

# compute (per my function)
datetime_computed = from_unix_to_datetime(unix_time=unix_time, 
                                          timezone_to=timezone_to)

# construct (per known answer from website)
time_zone = pytz.timezone(timezone_to)
datetime_expected = time_zone.localize(datetime(2005, 10, 26, 23, 1, 2))

# check that they're equal
assert datetime_expected == datetime_computed

In [10]:
unix_time=1664820000
timezone_to='GMT'

# compute (per my function)
datetime_computed = from_unix_to_datetime(unix_time=unix_time, 
                                          timezone_to=timezone_to)

# construct (per known answer from website)
time_zone = pytz.timezone(timezone_to)
datetime_expected = time_zone.localize(datetime(2022, 10, 3, 18, 0, 0))

# check that they're equal
assert datetime_expected == datetime_computed
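For reference, here is one possible way to fill in `from_unix_to_datetime()` so that both assertions above pass. It is a sketch under the assumption that `pytz` is installed, not the only valid solution.

```python
from datetime import datetime

import pytz


def from_unix_to_datetime(unix_time, timezone_to='US/Eastern'):
    """ converts unix time to a timezone-aware datetime (one possible version) """
    # unix time is defined relative to UTC, so attach UTC first ...
    datetime_utc = datetime.fromtimestamp(unix_time, tz=pytz.utc)
    # ... then convert the (now aware) datetime to the requested timezone
    return datetime_utc.astimezone(pytz.timezone(timezone_to))


print(from_unix_to_datetime(1130385662, timezone_to='US/Central'))
```

Attaching UTC before converting follows the best practice from earlier: `astimezone()` is only called on a datetime whose timezone has been set explicitly.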

## Spotify API (If time)

The Spotify API is quite powerful and gives us access to any song/artist in its libraries, plus even more information that you might not have thought of. There is also a module, `spotipy`, for accessing the API from within python. Open up a terminal (or do it in the jupyter notebook; `pip` is a magic command) and run:

`pip install spotipy`

In [11]:
#pip install spotipy





In [28]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

Just like with OpenWeather, we need to make an account [here](https://developer.spotify.com/) (this is essentially the same as making a regular Spotify account) and then get credentials. Spotify actually requires two things: a Client ID and a client secret. At the above website:

- Go to the Dashboard
- Log into your Spotify account (make one if you don't have one)
- Accept the terms of using the API
- Create an app (you can call it anything, I called mine `DS3000_Spotify`)
- Get a client ID (it is usually good practice to **not** share this) and a client secret (**never share this with anyone**: save it in a separate file like we did with our OpenWeather API key earlier)

There exists a file `spotify.py` in the same directory as this jupyter notebook which contains:
    
    Client_ID = 'professorsidspotify'
    Client_secret = 'professorssecretspotify'

In [27]:
from spotify import Client_ID, Client_secret

In [29]:
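# Authentication: build the client object that later cells refer to as `sp`
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=Client_ID,
                                                           client_secret=Client_secret))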

## Uniform Resource Identifiers (URI)

An important component of using the Spotify API is the uniform resource identifier (URI), which points at an object in the API. Any API call that refers to a Spotify object (a track, artist, album or playlist) needs to identify that object. The ID portion of an object's URI is contained in its shareable link, and `spotipy` accepts IDs, URIs and links interchangeably. For example, the link to the Global top songs playlist, when found from the Spotify desktop application, is:



In [30]:
playlist_link = "https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M"
playlist_URI = playlist_link.split("/")[-1].split("?")[0]
track_uris = [x["track"]["uri"] for x in
              sp.playlist_tracks(playlist_URI)["items"]]

In [11]:
playlist_dict = {'track_uri': list(),
                'track_name': list(),
                'artist_uri': list(),
                'artist_name': list(),
                'artist_pop': list(),
                'artist_genres': list(),
                'album': list(),
                'track_pop': list()}

track_idx = 0
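One way to fill `playlist_dict` is to append one value per key for each track. In the sketch below, `add_track` is a hypothetical helper and the `track` and `artist` dictionaries are invented, trimmed stand-ins for what `sp.track(...)` and `sp.artist(...)` return, so the example runs without credentials.

```python
keys = ['track_uri', 'track_name', 'artist_uri', 'artist_name',
        'artist_pop', 'artist_genres', 'album', 'track_pop']
playlist_dict = {key: list() for key in keys}


def add_track(playlist_dict, track, artist):
    """ appends one track (and its main artist) to every list in playlist_dict """
    playlist_dict['track_uri'].append(track['uri'])
    playlist_dict['track_name'].append(track['name'])
    playlist_dict['artist_uri'].append(artist['uri'])
    playlist_dict['artist_name'].append(artist['name'])
    playlist_dict['artist_pop'].append(artist['popularity'])
    playlist_dict['artist_genres'].append(artist['genres'])
    playlist_dict['album'].append(track['album']['name'])
    playlist_dict['track_pop'].append(track['popularity'])


# invented stand-ins; with a live client these would come from, e.g.,
#   track = sp.track(track_uri)
#   artist = sp.artist(track['artists'][0]['uri'])
track = {'uri': 'spotify:track:aaa', 'name': 'Example Song', 'popularity': 90,
         'album': {'name': 'Example Album'},
         'artists': [{'uri': 'spotify:artist:zzz', 'name': 'Example Artist'}]}
artist = {'uri': 'spotify:artist:zzz', 'name': 'Example Artist',
          'popularity': 85, 'genres': ['pop']}

add_track(playlist_dict, track, artist)
print(playlist_dict)
```

Because every list grows by exactly one item per track, the dictionary can be handed to `pd.DataFrame(playlist_dict)` once the loop over tracks is done.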



### Extracting Features from Tracks

Now that we have a list of track URIs, we can extract features from these tracks. Spotify computes a set of features for each track from an analysis of its audio. We can access these with a single method of the spotify object, `audio_features(uri)`, which returns mostly numerical features that we can use for analysis.

In [12]:
from collections import defaultdict
import pandas as pd
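A sketch of how `defaultdict` can gather the per-track feature dictionaries into columns. The `feature_list` below is an invented, trimmed stand-in for the list returned by `sp.audio_features(track_uris)`, so the example runs without credentials.

```python
from collections import defaultdict

# invented stand-ins: one dictionary of features per track
feature_list = [
    {'uri': 'spotify:track:aaa', 'danceability': 0.71, 'energy': 0.62, 'tempo': 120.0},
    {'uri': 'spotify:track:bbb', 'danceability': 0.45, 'energy': 0.88, 'tempo': 98.5},
]

# defaultdict(list) creates an empty list the first time a key is seen,
# so the column names never have to be typed out by hand
feature_dict = defaultdict(list)
for features in feature_list:
    for key, value in features.items():
        feature_dict[key].append(value)

print(dict(feature_dict))
```

From here, `pd.DataFrame(feature_dict)` gives one row per track and one column per feature.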



In [13]:
# make a plot of energy vs. danceability with song title hover data
