# DSCI 511: Data acquisition and pre-processing<br>Chapter 3: Acquiring Data from the Internet
## Exercises
Note: section numbers refer to the main notes.

#### 3.1.2.3 Exercise: processing a JSON response
Make a request to the SEPTA Arrivals API to get data on the next 10 trains to arrive at Suburban Station. Store this JSON-format data in a dictionary and inspect the dictionary's structure. Then write code to create a list containing 10 dictionaries, one for each train, each following the format of the example `train_dict` in the solution below.

#### Discussion: figuring out what we got
Adapting the example URL from Section 3.1.2.2 was relatively straightforward, especially since the docs show Suburban Station's station ID directly. The harder part is figuring out how to extract the information from the JSON response. As it turns out, we got 20 trains: 10 Northbound and 10 Southbound! (The `10` in the URL evidently applies to each direction.)
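When a response is deeply nested, printing the whole thing is hard to read. A minimal sketch of an alternative is a small recursive helper that lists dictionary keys, list lengths, and leaf types. `outline` is a hypothetical helper, and `sample` is a hand-made miniature shaped the way the solution's loops assume the SEPTA response is nested (its top-level key is a placeholder, not the real timestamp string).

```python
def outline(obj, indent=0, max_items=2):
    """Return lines sketching a parsed JSON value: dict keys, list lengths, leaf types."""
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key!r}: {type(value).__name__}")
            lines.extend(outline(value, indent + 1, max_items))
    elif isinstance(obj, list):
        lines.append(f"{pad}[list of {len(obj)}]")
        # Only descend into the first few items; list elements usually share a shape.
        for item in obj[:max_items]:
            lines.extend(outline(item, indent + 1, max_items))
    return lines

# Hypothetical miniature response: {timestamp: [{direction: [train, ...]}, ...]}
sample = {
    "<timestamp>": [
        {"Northbound": [{"direction": "N", "line": "Warminster", "track": "1"}]},
        {"Southbound": [{"direction": "S", "line": "Media/Elwyn", "track": "6"}]},
    ]
}

print("\n".join(outline(sample)))
```

On the live data, `print("\n".join(outline(response.json())))` would give a similar outline of the real response.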

In [5]:
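import requests
from pprint import pprint

# next 10 arrivals at Suburban Station
response = requests.get("http://www3.septa.org/hackathon/Arrivals/Suburban Station/10")
response.raise_for_status()  # stop early if the request failed

# print(response.json())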
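The URL above contains a literal space in the station name. `requests` normally percent-encodes it before sending, but building the URL from parts with an explicitly encoded station name keeps it valid on its own and makes it easy to swap in another station. This is a sketch; the `base`, `station`, and `n_results` variable names are illustrative.

```python
from urllib.parse import quote

base = "http://www3.septa.org/hackathon/Arrivals"
station = "Suburban Station"
n_results = 10

# quote() turns the space into %20; '/' is left alone by default
url = f"{base}/{quote(station)}/{n_results}"
print(url)
```

The resulting `url` can then be passed to `requests.get` exactly as in the cell above.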

In [7]:
# example of the target train dictionary format
train_dict = {
    'direction': 'S',
    'line': 'Media/Elwyn',
    'sched_time': '2018-08-22 17:31:01.000',
    'status': 'On Time',
    'track': '6'
}

pprint(train_dict)

{'direction': 'S',
 'line': 'Media/Elwyn',
 'sched_time': '2018-08-22 17:31:01.000',
 'status': 'On Time',
 'track': '6'}


In [17]:
data = response.json()
top_keys = list(data.keys())
# pprint(data[top_keys[0]][0]["Northbound"])  # peek at the first direction's trains

trains = []
for timestamp in data:  ## the timestamp is the sole key at the top level of the response
    for direction_group in data[timestamp]:  ## a list with one dictionary per direction
        for direction in direction_group:  ## e.g. 'Northbound' or 'Southbound'
            for train in direction_group[direction]:
                trains.append({
                    'direction': train['direction'],
                    'line': train['line'],
                    'sched_time': train['sched_time'],
                    'status': train['status'],
                    'track': train['track']
                })

pprint(trains)

[{'direction': 'N',
  'line': 'Warminster',
  'sched_time': '2018-10-16 19:04:00.000',
  'status': '1 min',
  'track': '1'},
 {'direction': 'N',
  'line': 'Paoli/Thorndale',
  'sched_time': '2018-10-16 19:19:00.000',
  'status': '3 min',
  'track': '1'},
 {'direction': 'N',
  'line': 'Media/Elwyn',
  'sched_time': '2018-10-16 19:21:00.000',
  'status': '1 min',
  'track': '2'},
 {'direction': 'N',
  'line': 'Fox Chase',
  'sched_time': '2018-10-16 19:22:00.000',
  'status': 'On Time',
  'track': '1'},
 {'direction': 'N',
  'line': 'Trenton',
  'sched_time': '2018-10-16 19:22:00.000',
  'status': '8 min',
  'track': '1'},
 {'direction': 'N',
  'line': 'Cynwyd',
  'sched_time': '2018-10-16 19:28:00.000',
  'status': 'On Time',
  'track': '6'},
 {'direction': 'N',
  'line': 'Airport',
  'sched_time': '2018-10-16 19:34:00.000',
  'status': 'On Time',
  'track': '2'},
 {'direction': 'N',
  'line': 'Chestnut Hill West',
  'sched_time': '2018-10-16 19:36:00.000',
  'status': 'On Time',
  ...
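The exercise asks for the next 10 trains, while the response holds 10 per direction. One way to reconcile the two, sketched here on made-up sample data in the same format as the `trains` list above, is to merge both directions and keep the trains with the earliest `sched_time`. `earliest_trains` is a hypothetical helper, and it sorts by scheduled time only, ignoring any delay reported in `status`.

```python
from datetime import datetime

def earliest_trains(trains, n=10):
    """Return the n trains with the earliest scheduled times, across all directions."""
    def scheduled(train):
        # sched_time strings look like '2018-10-16 19:04:00.000'
        return datetime.strptime(train["sched_time"], "%Y-%m-%d %H:%M:%S.%f")
    return sorted(trains, key=scheduled)[:n]

# Made-up sample in the same format as the flattened train dictionaries
sample_trains = [
    {"direction": "N", "line": "Paoli/Thorndale", "sched_time": "2018-10-16 19:19:00.000"},
    {"direction": "S", "line": "Media/Elwyn", "sched_time": "2018-10-16 19:10:00.000"},
    {"direction": "N", "line": "Warminster", "sched_time": "2018-10-16 19:04:00.000"},
]

for train in earliest_trains(sample_trains, n=2):
    print(train["sched_time"], train["direction"], train["line"])
```

With the real data, `earliest_trains(trains)` would return 10 of the 20 dictionaries.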

In [20]:
data = response.json()
top_keys = list(data.keys())
# pprint(data[top_keys[0]][0]["Northbound"])

train_keys = ['direction', 'line', 'sched_time', 'status', 'track']

trains = []
for timestamp in data:  ## the timestamp is the sole key at the top level of the response
    for direction_group in data[timestamp]:  ## a list with one dictionary per direction
        for direction in direction_group:  ## e.g. 'Northbound' or 'Southbound'
            for train in direction_group[direction]:
                # keep only the fields listed in train_keys
                trains.append({
                    train_key: train[train_key]
                    for train_key in train_keys
                })
print(len(trains))
pprint(trains)

20
[{'direction': 'N',
  'line': 'Warminster',
  'sched_time': '2018-10-16 19:04:00.000',
  'status': '1 min',
  'track': '1'},
 {'direction': 'N',
  'line': 'Paoli/Thorndale',
  'sched_time': '2018-10-16 19:19:00.000',
  'status': '3 min',
  'track': '1'},
 {'direction': 'N',
  'line': 'Media/Elwyn',
  'sched_time': '2018-10-16 19:21:00.000',
  'status': '1 min',
  'track': '2'},
 {'direction': 'N',
  'line': 'Fox Chase',
  'sched_time': '2018-10-16 19:22:00.000',
  'status': 'On Time',
  'track': '1'},
 {'direction': 'N',
  'line': 'Trenton',
  'sched_time': '2018-10-16 19:22:00.000',
  'status': '8 min',
  'track': '1'},
 {'direction': 'N',
  'line': 'Cynwyd',
  'sched_time': '2018-10-16 19:28:00.000',
  'status': 'On Time',
  'track': '6'},
 {'direction': 'N',
  'line': 'Airport',
  'sched_time': '2018-10-16 19:34:00.000',
  'status': 'On Time',
  'track': '2'},
 {'direction': 'N',
  'line': 'Chestnut Hill West',
  'sched_time': '2018-10-16 19:36:00.000',
  'status': 'On Time',
  ...
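The four nested loops can also be written as a single list comprehension whose `for` clauses appear in the same order as the loops. This sketch runs on a hypothetical miniature response (placeholder top-level key, one train per direction, and a made-up `extra_field` to show that fields not named in `train_keys` are dropped); `flatten_trains` is a hypothetical helper name.

```python
train_keys = ["direction", "line", "sched_time", "status", "track"]

def flatten_trains(data, keys):
    """Flatten {timestamp: [{direction: [train, ...]}, ...]} into one list of trains."""
    return [
        {key: train[key] for key in keys}       # keep only the requested fields
        for direction_groups in data.values()   # one list per timestamp key
        for group in direction_groups           # one dictionary per direction
        for trains_in_direction in group.values()
        for train in trains_in_direction
    ]

# Hypothetical miniature response with the same nesting as the SEPTA data
sample_data = {
    "<timestamp>": [
        {"Northbound": [{"direction": "N", "line": "Warminster",
                         "sched_time": "2018-10-16 19:04:00.000",
                         "status": "1 min", "track": "1", "extra_field": "ignored"}]},
        {"Southbound": [{"direction": "S", "line": "Media/Elwyn",
                         "sched_time": "2018-08-22 17:31:01.000",
                         "status": "On Time", "track": "6"}]},
    ]
}

for train in flatten_trains(sample_data, train_keys):
    print(train)
```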

#### 3.2.1.1 Exercise: accessing a soccer schedule

Make a request to the Sportradar Soccer schedule API to obtain the match schedule for Liverpool FC (team_id = sr:competitor:44). Then, from the obtained schedule, make a simple list of fixtures. Your output should be a list with strings as elements. The strings should be of the format "HOME_TEAM vs AWAY_TEAM".

#### Discussion: sometimes it's easier to work with support data than logic
This solution is a great example of data simplifying code. We could have used `if`/`else` branches to make sure the home and away teams always end up in the right order as we construct each fixture. Instead, creating the `fixture` object as a dictionary with two keys, `'home'` and `'away'`, was strategic: since each `competitor`'s `'qualifier'` field holds exactly one of those values, we can use it as a key and simply route each team to its position in the fixture _associatively_.
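To make the contrast concrete, here is a small sketch of both approaches side by side, run on a hypothetical `competitors` list with the same `'name'` and `'qualifier'` fields used in the solution. Both helpers are illustrative, not part of the Sportradar API. The dictionary version needs no branching, and neither version depends on the order in which the competitors are listed.

```python
def fixture_string(competitors):
    # Route each team name to the slot named by its 'qualifier' ('home' or 'away').
    slots = {"home": "", "away": ""}
    for competitor in competitors:
        slots[competitor["qualifier"]] = competitor["name"]
    return f"{slots['home']} vs {slots['away']}"

def fixture_string_branching(competitors):
    # The same result with explicit if/else branches, for comparison.
    home = away = ""
    for competitor in competitors:
        if competitor["qualifier"] == "home":
            home = competitor["name"]
        else:
            away = competitor["name"]
    return f"{home} vs {away}"

# Hypothetical competitors list, deliberately listing the away team first
match = [
    {"name": "Liverpool FC", "qualifier": "away"},
    {"name": "Huddersfield Town", "qualifier": "home"},
]

print(fixture_string(match))
print(fixture_string_branching(match))
```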

In [25]:
import requests
from pprint import pprint

soccer_key = "5a7p6zwdqwu5pkvfhhm3bd4a"
address = "https://api.sportradar.us/soccer-xt3/eu/en/teams/sr:competitor:44/schedule.json?api_key=" + soccer_key

response = requests.get(address)
data = response.json()

fixtures = []
for match in data['schedule']:
    # start with empty slots, then fill each one using the competitor's qualifier
    fixture = {
        "home": "",
        "away": ""
    }
    for competitor in match['competitors']:
        # competitor['qualifier'] is 'home' or 'away', so it names the slot directly
        fixture[competitor['qualifier']] = competitor['name']

    fixtures.append(fixture['home'] + " vs " + fixture['away'])
pprint(fixtures)

['Huddersfield Town vs Liverpool FC',
 'Liverpool FC vs FK Red Star Belgrade',
 'Liverpool FC vs Cardiff City',
 'Arsenal FC vs Liverpool FC',
 'FK Red Star Belgrade vs Liverpool FC',
 'Liverpool FC vs Fulham FC',
 'Watford FC vs Liverpool FC',
 'Paris Saint-Germain vs Liverpool FC',
 'Liverpool FC vs Everton FC',
 'Burnley FC vs Liverpool FC',
 'AFC Bournemouth vs Liverpool FC',
 'Liverpool FC vs SSC Napoli',
 'Liverpool FC vs Manchester United',
 'Wolverhampton Wanderers vs Liverpool FC',
 'Liverpool FC vs Newcastle United',
 'Liverpool FC vs Arsenal FC',
 'Manchester City vs Liverpool FC',
 'Brighton & Hove Albion FC vs Liverpool FC',
 'Liverpool FC vs Crystal Palace',
 'Liverpool FC vs Leicester City',
 'West Ham United vs Liverpool FC',
 'Liverpool FC vs AFC Bournemouth',
 'Manchester United vs Liverpool FC',
 'Liverpool FC vs Watford FC',
 'Everton FC vs Liverpool FC',
 'Liverpool FC vs Burnley FC',
 'Fulham FC vs Liverpool FC',
 'Liverpool FC vs Tottenham Hotspur',
 ...]
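The request URL above is assembled by string concatenation. With `requests`, an alternative is `requests.get(address, params={"api_key": soccer_key})`, which appends and escapes the query string for you. The sketch below shows the equivalent construction with the standard library's `urlencode`, parameterized by team so other teams' schedules can be fetched the same way. `schedule_url` is a hypothetical helper and `"YOUR_KEY"` is a placeholder.

```python
from urllib.parse import urlencode

def schedule_url(team_id, api_key):
    base = "https://api.sportradar.us/soccer-xt3/eu/en/teams"
    # urlencode builds 'api_key=...' and escapes any special characters in the key
    return f"{base}/{team_id}/schedule.json?{urlencode({'api_key': api_key})}"

url = schedule_url("sr:competitor:44", "YOUR_KEY")
print(url)
```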

#### 3.3.3.1 Exercise: access some accidental haikus from Twitter's REST API
Create your Twitter API keys and download the last 15 tweets by @accidental575 (the hilarious Accidental Haiku Bot).

#### Discussion: just drop your keys in, and start accessing tweets
Working with a client is _very_ convenient, but clients like this exist precisely because API access is so valued and so tightly controlled. If you haven't already, sign up for a developer account and create an app to start working with Twitter's API.
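Rather than pasting keys directly into a notebook (where they are easy to share by accident), one option is to read them from environment variables. A minimal sketch follows; the variable names in `CREDENTIAL_VARS` and the `load_credentials` helper are hypothetical choices, and the demo uses a stand-in mapping instead of the real environment.

```python
import os

# Hypothetical environment-variable names; pick whatever names you like.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_credentials(env=None):
    """Return the four credentials as a tuple, failing loudly if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return tuple(env[name] for name in CREDENTIAL_VARS)

# Demo with a stand-in mapping instead of os.environ
demo_env = {name: "xxx" for name in CREDENTIAL_VARS}
print(load_credentials(demo_env))
```

In the notebook you could then write `twitter = Twython(*load_credentials())`, since Twython takes the consumer key, consumer secret, access token, and access token secret as its first four positional arguments.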

In [27]:
from twython import Twython

access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''

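# OAuth 1 user authentication: the consumer (app) keys plus the access token pair
twitter = Twython(consumer_key, consumer_secret, access_token, access_token_secret)

# request the 15 most recent tweets from the bot's timeline
haiku_twitter = twitter.get_user_timeline(screen_name="accidental575", count=15)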

In [32]:
for tweet in haiku_twitter[:15]:
    print(tweet['text']+"\n")

Just writing to let /
everyone know I have a /
new profile picture /
#accidentalhaiku by @BHump_12 
https://t.co/T2usPc5C0S

freelance is great cause /
sometimes you don’t wear pants for /
an entire day /
#accidentalhaiku by @mattgee 
https://t.co/y48pVrBc8D

Tell me you love me /
started playing at Starbucks /
and I gasped out loud /
#accidentalhaiku by @ashmj21 
https://t.co/Rfml47ypDL

My dad is singing /
Disney hits with me in the /
car! I’m so happy ❤️ /
#accidentalhaiku by @MakaylaBickhart 
https://t.co/FmLlroSBVv

Wow that’s a lot of /
instructions on how to use /
a public restroom! /
#accidentalhaiku by @kbakies 
https://t.co/cE3OUt0Exh

there are squirrels in /
the Grand Canyon that carry /
the bubonic plague /
#accidentalhaiku by @luckyenoughlin1 
https://t.co/bW8X9tBRDp

apparently he /
donated it to his own /
foundation - #taxdodge /
#accidentalhaiku by @woolkebb 
https://t.co/YNnXKnzul3

Really Ain't Tryna /
Go To Buffalo Wild Wings /
But Imma Have To /
#accidentalhaiku by