### Access data using an API
* No need to construct the full dataset yourself.
* Ready-prepared .csv files are not always available.
* Use the requests library to ask a remote server for data, usually returned as JSON (JavaScript Object Notation).

#### Current position of the International Space Station through the Open Notify API

In [3]:
import requests
# Make a get request to get the latest position of the ISS from the OpenNotify API.
response = requests.get("http://api.open-notify.org/iss-now.json")

status_code=response.status_code

#### Common status codes
* 200 - Everything went okay, and the server returned a result (if any).
* 301 - The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint's name has changed.
* 401 - The server thinks you're not authenticated. This happens when you don't send the right credentials to access an API (we'll talk about this in a later mission).
* 400 - The server thinks you made a bad request. This can happen when you don't send the information the API requires to process your request, among other things.
* 403 - The resource you're trying to access is forbidden; you don't have the right permissions to see it.
* 404 - The server didn't find the resource you tried to access.
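
The codes listed above can also be looked up programmatically. The sketch below uses the standard library's `http.HTTPStatus` enum to turn a numeric code into its standard reason phrase. The helper name `describe_status` is made up for these notes and is not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase for an HTTP status code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # The code is not defined in the standard library's enum.
        return "Unknown status code"


# Print the phrase for each code discussed in the list above.
for code in (200, 301, 400, 401, 403, 404):
    print(code, "-", describe_status(code))
```

In a notebook, `describe_status(response.status_code)` gives a quick human-readable hint when a request does not return 200.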

In [5]:
response = requests.get("http://api.open-notify.org/iss-pass")
status_code=response.status_code
status_code

404

In [6]:
# Set up the parameters we want to pass to the API.
# This is the latitude and longitude of San Francisco.
parameters = {"lat": 37.78, "lon": -122.41}

# Make a get request with the parameters.
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)

# Print the content of the response (the data the server returned)
print(response.content)
content=response.content

# This gets the same data as the command above
response = requests.get("http://api.open-notify.org/iss-pass.json?lat=37.78&lon=-122.41")
print(response.content)

b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1533099994, \n    "latitude": 37.78, \n    "longitude": -122.41, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 570, \n      "risetime": 1533101652\n    }, \n    {\n      "duration": 643, \n      "risetime": 1533107429\n    }, \n    {\n      "duration": 465, \n      "risetime": 1533113276\n    }, \n    {\n      "duration": 437, \n      "risetime": 1533161693\n    }, \n    {\n      "duration": 639, \n      "risetime": 1533167348\n    }\n  ]\n}\n'
b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1533099994, \n    "latitude": 37.78, \n    "longitude": -122.41, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 570, \n      "risetime": 1533101652\n    }, \n    {\n      "duration": 643, \n      "risetime": 1533107429\n    }, \n    {\n      "duration": 465, \n      "risetime": 1533113276\n    }, \n    {\n      "duration": 437, \n    
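
Both requests above return the same data because the `params` dictionary ends up as a query string appended to the URL. As a rough illustration (requests does its own encoding internally, so this is only an approximation using the standard library), the same URL can be built by hand:

```python
from urllib.parse import urlencode

base_url = "http://api.open-notify.org/iss-pass.json"
parameters = {"lat": 37.78, "lon": -122.41}

# urlencode turns the dictionary into "key=value" pairs joined by "&".
query_string = urlencode(parameters)
full_url = base_url + "?" + query_string
print(full_url)
```

Passing `params=` is usually preferable to concatenating strings, since special characters in values are escaped automatically.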

### Everything returned by .get is raw bytes (essentially a JSON string)
We need the json library to dump Python objects to a string, or to load a string back into usable Python forms (list, dictionary).

In [7]:
import json

In [8]:
# Make a list of fast food chains.
best_food_chains = ["Taco Bell", "Shake Shack", "Chipotle"]
print(type(best_food_chains))


# Use json.dumps to convert best_food_chains to a string.
best_food_chains_string = json.dumps(best_food_chains)
print(type(best_food_chains_string))

# Convert best_food_chains_string back to a list.
print(type(json.loads(best_food_chains_string)))

# Make a dictionary
fast_food_franchise = {
    "Subway": 24722,
    "McDonalds": 14098,
    "Starbucks": 10821,
    "Pizza Hut": 7600
}

# We can also dump a dictionary to a string and load it.
fast_food_franchise_string = json.dumps(fast_food_franchise)
print(type(fast_food_franchise_string))

fast_food_franchise_2=json.loads(fast_food_franchise_string)

<class 'list'>
<class 'str'>
<class 'list'>
<class 'str'>
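
The same `json.loads` call works on the bytes an API returns. The sketch below uses a shortened copy of the ISS pass output printed earlier (only the first pass is kept) and converts `risetime`, a Unix timestamp, into a readable UTC datetime. The variable names are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Shortened copy of the bytes printed by response.content earlier.
raw = b'{"message": "success", "response": [{"duration": 570, "risetime": 1533101652}]}'

# json.loads accepts bytes as well as str (Python 3.6 and later).
data = json.loads(raw)
first_pass = data["response"][0]

# risetime counts seconds since 1970-01-01 UTC.
rise = datetime.fromtimestamp(first_pass["risetime"], tz=timezone.utc)
print(type(data), rise.isoformat(), first_pass["duration"])
```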


In [9]:
response = requests.get("http://api.open-notify.org/astros.json")

in_space_count=response.content
print(in_space_count)

b'{"message": "success", "people": [{"craft": "ISS", "name": "Oleg Artemyev"}, {"craft": "ISS", "name": "Andrew Feustel"}, {"craft": "ISS", "name": "Richard Arnold"}, {"craft": "ISS", "name": "Sergey Prokopyev"}, {"craft": "ISS", "name": "Alexander Gerst"}, {"craft": "ISS", "name": "Serena Aunon-Chancellor"}], "number": 6}'


In [11]:
import json
response = requests.get("http://api.open-notify.org/astros.json")
cont=response.json()
in_space_count=cont['number']
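
Once the response is a dictionary, ordinary Python tools apply. As a small sketch, the snippet below parses a copy of the astros.json output shown above (pasted as a string so it runs offline) and counts people per craft. `people_per_craft` and `names` are names chosen for this example.

```python
import json
from collections import Counter

# Copy of the astros.json payload printed above.
sample = (
    '{"message": "success", "people": ['
    '{"craft": "ISS", "name": "Oleg Artemyev"}, '
    '{"craft": "ISS", "name": "Andrew Feustel"}, '
    '{"craft": "ISS", "name": "Richard Arnold"}, '
    '{"craft": "ISS", "name": "Sergey Prokopyev"}, '
    '{"craft": "ISS", "name": "Alexander Gerst"}, '
    '{"craft": "ISS", "name": "Serena Aunon-Chancellor"}], '
    '"number": 6}'
)

cont = json.loads(sample)

# Count how many people are aboard each craft and collect their names.
people_per_craft = Counter(person["craft"] for person in cont["people"])
names = [person["name"] for person in cont["people"]]
print(cont["number"], dict(people_per_craft))
```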


### Use token and Github API

In [None]:
# Create a dictionary of headers containing our Authorization header.
headers = {"Authorization": "token 1f36137fbbe1602f779300dad26e4c1b7fbab631"}

# Make a GET request to the GitHub API with our headers.
# This API endpoint lists the organizations Vik Paruchuri belongs to.
response = requests.get("https://api.github.com/users/VikParuchuri/orgs", headers=headers)

orgs=response.json()

# Print the content of the response.  As you can see, this token corresponds to the account of Vik Paruchuri.
print(response.json())
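
A token pasted into a notebook is easy to leak when the notebook is shared. One possible alternative, sketched below, reads the token from an environment variable. The variable name `GITHUB_TOKEN` and the helper `build_auth_headers` are assumptions made for this example, not something the GitHub API or requests requires.

```python
import os


def build_auth_headers(env_var="GITHUB_TOKEN"):
    """Build an Authorization header from a token stored in the environment."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Same "token <value>" format as the headers dictionary used above.
    return {"Authorization": f"token {token}"}


# Usage, once the variable is set in the shell that started the notebook:
# headers = build_auth_headers()
# response = requests.get("https://api.github.com/users/VikParuchuri/orgs", headers=headers)
```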

In [None]:
# We've loaded headers in.
headers = {"Authorization": "token 1f36137fbbe1602f779300dad26e4c1b7fbab631"}

response=requests.get("https://api.github.com/users/torvalds",headers=headers)
torvalds=response.json()

In [None]:
# Enter your answer here.
response=requests.get("https://api.github.com/repos/octocat/Hello-World",headers=headers)
hello_world=response.json()

#### Pagination
Returning a very large number of records in one response isn't a great user experience, so it's typical for API providers to implement pagination. This means that the API provider will only return a certain number of records per page. You can specify the page number that you want to access. To access all of the pages, you'll need to write a loop.

In [None]:
params = {"per_page": 50, "page": 2}
response = requests.get("https://api.github.com/users/VikParuchuri/starred", headers=headers, params=params)
#page1_repos = response.json()


page2_repos= response.json()
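
The cell above fetches a single page. A loop over all pages might look like the sketch below. To keep it runnable offline, the page fetcher is passed in as a function and a fake in-memory fetcher stands in for the API; `fetch_all_pages` and `fake_fetch_page` are hypothetical names. The stopping rule (a page shorter than `per_page` is the last one) is an assumption that fits simple page-numbered APIs.

```python
def fetch_all_pages(fetch_page, per_page=50):
    """Collect records from every page until a page comes back short or empty."""
    records = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        records.extend(batch)
        if len(batch) < per_page:
            break
        page += 1
    return records


# Stand-in for a real API call so the loop can run without a network.
# With requests, fetch_page would wrap something like:
# requests.get(url, headers=headers, params={"per_page": per_page, "page": page}).json()
fake_records = list(range(120))


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return fake_records[start:start + per_page]


all_records = fetch_all_pages(fake_fetch_page, per_page=50)
print(len(all_records))
```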

In [None]:
# Enter your code here.
response=requests.get("https://api.github.com/user",headers=headers)
user=response.json()

In [None]:
# Create the data we'll pass into the API endpoint.  While this endpoint only requires the "name" key, there are other optional keys.
payload = {"name": "learning-about-apis"}

# We need to pass in our authentication headers!
response = requests.post("https://api.github.com/user/repos", json=payload, headers=headers)
status=response.status_code
print(status)

In [None]:
# Patch or Put
payload = {"description": "Learning about requests!", "name": "learning-about-apis"}
response = requests.patch("https://api.github.com/repos/VikParuchuri/learning-about-apis", json=payload, headers=headers)
status=response.status_code
print(status)

In [None]:
# Delete
response = requests.delete("https://api.github.com/repos/VikParuchuri/learning-about-apis", headers=headers)
status=response.status_code
print(status)

### Notes:
* Not all APIs accept every request method that requests offers.
* Most common: requests.get, to retrieve data.
* The GitHub API accepts requests.post (to create a new repo at the https://api.github.com/user/repos endpoint); returns 201 on success.
* requests.patch (change some attributes) and requests.put (replace the whole object) update existing items; return 200 on success.
* requests.delete removes an item; returns 204 on success.
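
The success codes in these notes can be kept in a small lookup table so that each cell checks its own result. This is only a sketch: the table holds the codes recorded above for the GitHub API, other APIs may answer differently, and `succeeded` is a made-up helper name.

```python
# Success codes as recorded in the notes above; other APIs may differ.
EXPECTED_STATUS = {
    "GET": 200,
    "POST": 201,
    "PATCH": 200,
    "PUT": 200,
    "DELETE": 204,
}


def succeeded(method, status_code):
    """Return True if status_code is the success code expected for method."""
    return EXPECTED_STATUS.get(method.upper()) == status_code


print(succeeded("POST", 201), succeeded("DELETE", 200))
```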

### Reddit API challenge

In [None]:
headers={"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}
params = {"t": "day"}
response=requests.get("https://oauth.reddit.com/r/python/top",headers=headers,params=params)
python_top=response.json()

In [None]:
# find the most upvoted post in the past day:
python_top_articles=python_top["data"]["children"]
most_upvoted = ""
most_upvotes = 0

for article in python_top_articles:
    ar = article["data"]
    if ar["ups"] >= most_upvotes:
        most_upvoted = ar["id"]
        most_upvotes = ar["ups"]
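
The same search can be written with the built-in `max` and a key function. The sketch below runs on a hypothetical, trimmed-down stand-in for `python_top["data"]["children"]`; the ids and upvote counts are placeholders, not real Reddit data.

```python
# Hypothetical stand-in for python_top["data"]["children"].
python_top_articles = [
    {"data": {"id": "aaa111", "ups": 12}},
    {"data": {"id": "bbb222", "ups": 87}},
    {"data": {"id": "ccc333", "ups": 40}},
]

# max() with a key function finds the entry with the most upvotes in one pass.
top_article = max(python_top_articles, key=lambda article: article["data"]["ups"])
most_upvoted = top_article["data"]["id"]
most_upvotes = top_article["data"]["ups"]
print(most_upvoted, most_upvotes)
```

Two differences from the loop are worth keeping in mind: `max` raises a ValueError on an empty list, and on a tie it returns the first maximal entry, whereas the loop's `>=` comparison keeps the last one.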

In [None]:
headers = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}

response=requests.get("https://oauth.reddit.com/r/python/comments/4b7w9u",headers=headers)
comments=response.json()

In [None]:
comments_list=comments[1]['data']['children']

most_upvoted_comment = ""
most_upvotes = 0

for article in comments_list:
    ar = article["data"]
    if ar["ups"] >= most_upvotes:
        most_upvoted_comment = ar["id"]
        most_upvotes = ar["ups"]

In [None]:
headers = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}

payload={'dir':1,'id':'d16y4ry'}
response=requests.post('https://oauth.reddit.com/api/vote',headers=headers,json=payload)

status=response.status_code

### Web scraping

In [3]:
# Request to the html file
import requests
response=requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
content=response.content

# Use BeautifulSoup to parse the html 
from bs4 import BeautifulSoup
# Initialize the parser, and pass in the content we grabbed earlier.
parser = BeautifulSoup(content, 'html.parser')

# Get the body tag from the document.
# Since we passed in the top level of the document to the parser, we need to pick a branch off of the root.
# With BeautifulSoup, we can access branches by using tag types as attributes.
body = parser.body
header=parser.head
# Get the p tag from the body.
p = body.p
t = header.title
# Print the text inside the p tag.
# Text is a property that gets the inside text of a tag.
title_text=t.text
print(t.text)

A simple example page


In [4]:
parser = BeautifulSoup(content, 'html.parser')

# Get a list of all occurrences of the body tag in the element.
body = parser.find_all("body")
head= parser.find_all("head")

# Get the paragraph tag.
p = body[0].find_all("p")
t = head[0].find_all("title")
# Get the text.
title_text=t[0].text
print(p[0].text)

Here is some simple content for this page.


In [5]:
# Get the page content and set up a new parser.
response = requests.get("http://dataquestio.github.io/web-scraping-pages/simple_ids.html")
content = response.content
parser = BeautifulSoup(content, 'html.parser')

# Pass in the ID attribute to only get the element with that specific ID.
first_paragraph = parser.find_all("p", id="first")[0]
print(first_paragraph.text)

second_paragraph = parser.find_all("p", id="second")[0]
second_paragraph_text = second_paragraph.text


                First paragraph.
            


In [6]:
# Get the website that contains classes.
response = requests.get("http://dataquestio.github.io/web-scraping-pages/simple_classes.html")
content = response.content
parser = BeautifulSoup(content, 'html.parser')

# Get the first inner paragraph.
# Find all the paragraph tags with the class inner-text.
# Then, take the first element in that list.
first_inner_paragraph = parser.find_all("p", class_="inner-text")[0]
print(first_inner_paragraph.text)


second_inner_paragraph_text=parser.find_all("p", class_="inner-text")[1].text

first_outer_paragraph_text=parser.find_all("p", class_="outer-text")[0].text


                First paragraph.
            


In [9]:
# We can use BeautifulSoup's .select method to work with CSS selectors. 
# Here's the HTML we'll be working with on this screen:
# Get the website that contains classes and IDs.
response = requests.get("http://dataquestio.github.io/web-scraping-pages/ids_and_classes.html")
content = response.content
parser = BeautifulSoup(content, 'html.parser')

# Select all of the elements that have the first-item class.
first_items = parser.select(".first-item")

first_outer_text = parser.select(".outer-text")[0].text
second_text = parser.select("#second")[0].text


# Print the text of the first paragraph (the first element with the first-item class).
print(first_items[0].text)


                First paragraph.
            


In [None]:
# Get the Super Bowl box score data.
response = requests.get("http://dataquestio.github.io/web-scraping-pages/2014_super_bowl.html")
content = response.content
parser = BeautifulSoup(content, 'html.parser')

# Find the number of turnovers the Seahawks committed.
turnovers = parser.select("#turnovers")[0]
seahawks_turnovers = turnovers.select("td")[1]
seahawks_turnovers_count = seahawks_turnovers.text
print(seahawks_turnovers_count)

tplays = parser.select("#total-plays")[0]
patriots_tplays=tplays.select("td")[2]
tyards = parser.select("#total-yards")[0]
seahawks_tyards=tyards.select("td")[1]

patriots_total_plays_count=patriots_tplays.text

seahawks_total_yards_count=seahawks_tyards.text
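
The cell above repeats the same two steps for each statistic: select the row by its id, then pick a `td` by position. A small helper can wrap that pattern. The sketch below runs on made-up HTML that only mimics the layout the cell relies on (one row per statistic, the label in the first cell). The numbers are placeholders, and `cell_text` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up HTML: the row id names the statistic, with one <td> per column.
# The values are placeholders, not real box score data.
html = """
<table>
  <tr id="turnovers"><td>Turnovers</td><td>1</td><td>2</td></tr>
  <tr id="total-plays"><td>Total Plays</td><td>53</td><td>72</td></tr>
</table>
"""

parser = BeautifulSoup(html, "html.parser")


def cell_text(parser, row_id, column):
    """Return the stripped text of one cell in the row with the given id."""
    row = parser.select("#" + row_id)[0]
    return row.select("td")[column].get_text(strip=True)


print(cell_text(parser, "turnovers", 1))
print(cell_text(parser, "total-plays", 2))
```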

#### Notes
* Combine requests and BeautifulSoup (parser.select accepts CSS-style selectors).
* Scrapy is a dedicated web scraping framework.
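
As a compact recap of the notes above, the sketch below reaches the same element with `find_all` and with a CSS selector, and shows how `get_text(strip=True)` removes the surrounding whitespace visible in the printed outputs earlier. The HTML fragment is made up in the style of the example pages.

```python
from bs4 import BeautifulSoup

# Made-up page fragment in the style of the example pages.
html = """
<div>
  <p class="inner-text first-item" id="first">
      First paragraph.
  </p>
  <p class="inner-text" id="second">Second paragraph.</p>
</div>
"""

parser = BeautifulSoup(html, "html.parser")

# The same element reached two ways: keyword filters and a CSS selector.
by_find_all = parser.find_all("p", id="first")[0]
by_select = parser.select("p#first")[0]

# .text keeps the surrounding whitespace; get_text(strip=True) removes it.
print(repr(by_find_all.text))
print(by_select.get_text(strip=True))

# A class selector returns every matching element, in document order.
inner_texts = [p.get_text(strip=True) for p in parser.select(".inner-text")]
print(inner_texts)
```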
