<h1>APIs and Web Scraping</h1>

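## GET Requests
A GET request retrieves data from a server. Every response carries a status code; the most common ones are: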
- 200 - Everything went okay, and the server returned a result (if any).
- 301 - The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint's name has changed.
- 400 - The server thinks you made a bad request. This can happen when you don't send the information the API requires to process your request, among other things.
- 401 - The server thinks you're not authenticated. This happens when you don't send the right credentials to access an API (see the API Authentication section below).
- 403 - The resource you're trying to access is forbidden; you don't have the right permissions to see it.
- 404 - The server didn't find the resource you tried to access.
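
To look up what a status code means without memorizing the list, the standard library's http.HTTPStatus enum can help. The sketch below is an illustration only; describe_status is a hypothetical helper name, not something the requests library provides.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '404 Not Found' for a numeric status code."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    return f"{status.value} {status.phrase}"


# Print a label for each status code in the list above.
for code in (200, 301, 400, 401, 403, 404):
    print(describe_status(code))
```

With a real response, the same helper could be called as describe_status(response.status_code).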


In [None]:
import requests

# Request the current position of the International Space Station.
response = requests.get("http://api.open-notify.org/iss-now.json")
status_code = response.status_code

response = requests.get("http://api.open-notify.org/iss-pass")
status_code = response.status_code

## Adding Query Parameters ##
Pass query parameters as a dictionary through the params argument. The requests library then takes care of formatting and encoding them into the URL.

In [None]:
# Set up the parameters we want to pass to the API. These are the latitude and longitude of San Francisco.
parameters = {"lat": 37.78, "lon": -122.41}

response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)
content = response.content

print(response.content)
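
To see roughly what the params dictionary turns into, the sketch below builds the query string with the standard library's urllib.parse.urlencode. It illustrates the encoding step only: requests does its own equivalent encoding when you pass params, and build_url is a hypothetical helper written for this example.

```python
from urllib.parse import urlencode


def build_url(base_url, parameters):
    """Append URL-encoded query parameters to base_url."""
    return base_url + "?" + urlencode(parameters)


parameters = {"lat": 37.78, "lon": -122.41}
url = build_url("http://api.open-notify.org/iss-pass.json", parameters)
print(url)
```

Letting a library do the encoding matters most when values contain spaces or characters such as &, which must be escaped to keep the URL valid.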


## JSON Format ##
JSON is the primary format for sending and receiving data through APIs. Python's json module has two main functions:
- dumps -- Takes in a Python object, and converts it to a JSON string
- loads -- Takes a JSON string, and converts it to a Python object

In [None]:
# Make a list of fast food chains.
best_food_chains = ["Taco Bell", "Shake Shack", "Chipotle"]
print(type(best_food_chains))

import json

# Use json.dumps to convert best_food_chains to a string.
best_food_chains_string = json.dumps(best_food_chains)
print(type(best_food_chains_string))

# Convert best_food_chains_string back to a list.
print(type(json.loads(best_food_chains_string)))

# Make a dictionary
fast_food_franchise = {
    "Subway": 24722,
    "McDonalds": 14098,
    "Starbucks": 10821,
    "Pizza Hut": 7600
}

# We can also dump a dictionary to a string and load it.
fast_food_franchise_string = json.dumps(fast_food_franchise)
print(type(fast_food_franchise_string))

fast_food_franchise_2 = json.loads(fast_food_franchise_string)
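
One detail worth checking when round-tripping through JSON: lists and string-keyed dictionaries come back equal to the original, but JSON object keys are always strings, so non-string keys change type. A small sketch, where store_counts_by_year is made-up sample data:

```python
import json

# A list of strings survives the round trip unchanged.
chains = ["Taco Bell", "Shake Shack", "Chipotle"]
round_tripped = json.loads(json.dumps(chains))
print(round_tripped == chains)

# Integer keys are written as JSON strings and come back as strings.
store_counts_by_year = {2015: 100, 2016: 120}  # made-up sample data
restored = json.loads(json.dumps(store_counts_by_year))
print(restored)
print(restored == store_counts_by_year)
```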

## Getting JSON From a Request ##

In [None]:
# Make the same request we did in the Adding Query Parameters section.
parameters = {"lat": 37.78, "lon": -122.41}
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)

# Get the response data as a Python object.  Verify that it's a dictionary.
json_data = response.json()
print(type(json_data))
print(json_data)
first_pass_duration = json_data["response"][0]["duration"]


## Content Type ##

In [None]:
# response.headers is a dictionary-like object whose keys are case-insensitive.
print(response.headers)
content_type = response.headers["content-type"]

## API Authentication ##

APIs that require authentication commonly identify you with an access token, a string that the API can read and associate with your account.
Using a token is preferable to a username and password for a few reasons:
* Typically, you'll be accessing an API from a script. If you put your username and password in the script and someone manages to get their hands on it, they can take over your account. In contrast, you can revoke an access token to cancel an unauthorized person's access if there's a security breach.
* Access tokens can have scopes and specific permissions. For instance, you can make a token that has permission to write to your GitHub repositories and make new ones. Or, you can make a token that can only read from your repositories. Using read-access-only tokens in potentially insecure or shared scripts gives you more control over security.
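
Because a leaked token grants access to your account, one common precaution is to keep it out of the script and read it from the environment. The sketch below assumes an environment variable named GITHUB_TOKEN (an arbitrary name chosen for this example) and a hypothetical helper, build_auth_headers, that produces the "token ..." Authorization header used in these notes.

```python
import os


def build_auth_headers(token):
    """Return request headers carrying the token in the 'token <value>' form."""
    if not token:
        raise ValueError("No API token provided")
    return {"Authorization": "token " + token}


# GITHUB_TOKEN is an assumed variable name; set it in your shell before running.
token = os.environ.get("GITHUB_TOKEN", "")
if token:
    headers = build_auth_headers(token)
    print("Authorization header ready")
else:
    print("Set GITHUB_TOKEN to authenticate")
```

The resulting headers dictionary can be passed to requests.get through the headers argument, exactly like a hard-coded one.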



In [None]:
# Create a dictionary of headers containing our Authorization header.
headers = {"Authorization": "token 1f36137fbbe1602f779300dad26e4c1b7fbab631"}

# Make a GET request to the GitHub API with our headers. This API endpoint will give us details about Vik Paruchuri.
response = requests.get("https://api.github.com/users/VikParuchuri", headers=headers)

# this token corresponds to the account of Vik Paruchuri.
print(response.json())

response = requests.get("https://api.github.com/users/VikParuchuri/orgs", headers=headers)
orgs = response.json()

## Rate Limiting ##

In order to ensure that it remains available and responsive for all users, an API will prevent you from making too many requests in too short a time. We call this restriction rate limiting.
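
Many APIs report the current limit in response headers. GitHub, for example, sends X-RateLimit-Remaining (requests left in the window) and X-RateLimit-Reset (when the window resets, in seconds since the Unix epoch); other APIs may use different names. The sketch below is one possible way to decide how long to pause; seconds_to_wait is a hypothetical helper that works on a plain dictionary of headers.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how many seconds to pause before the next request.

    Returns 0.0 while requests remain, otherwise the time left until the reset.
    """
    now = time.time() if now is None else now
    # If the header is missing, assume we are not rate limited.
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


# Example with made-up header values and a fixed clock.
example_headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1100"}
print(seconds_to_wait(example_headers, now=1000))
```

In a script, the result could be passed to time.sleep before the next request.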

## Pagination ##

In [None]:
params = {"per_page": 50, "page": 1}
response = requests.get("https://api.github.com/users/VikParuchuri/starred", headers=headers, params=params)
page1_repos = response.json()

params["page"] = 2
response = requests.get("https://api.github.com/users/VikParuchuri/starred", headers=headers, params=params)
page2_repos = response.json()
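
Fetching pages one at a time by hand stops scaling quickly, so the page counter is usually driven by a loop. The sketch below assumes that a page with fewer than per_page items is the last one, which is a common convention but not guaranteed by every API. fetch_all_pages and its fetch_page argument are hypothetical names; fetch_page stands for any function that takes a params dictionary and returns that page's list of items.

```python
def fetch_all_pages(fetch_page, per_page=50, max_pages=100):
    """Collect items from consecutive pages until a short page is returned."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page({"per_page": per_page, "page": page})
        items.extend(batch)
        # Assumption: fewer items than requested means there are no more pages.
        if len(batch) < per_page:
            break
    return items


# Stand-in for an API call: serves slices of a made-up list of 120 items.
fake_items = list(range(120))


def fake_fetch(params):
    start = (params["page"] - 1) * params["per_page"]
    return fake_items[start:start + params["per_page"]]


print(len(fetch_all_pages(fake_fetch, per_page=50)))
```

For the starred-repositories endpoint above, fetch_page could wrap the requests.get call with headers and params and return response.json(). The max_pages cap is there so a misbehaving API cannot keep the loop running forever.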

## POST Requests ##
A POST request creates an object at the endpoint. A successful POST request will usually return a 201 status code, indicating that the object was created on the server. Sometimes, the API will return the JSON representation of the new object as the content of the response.


In [None]:
# Create the data we'll pass into the API endpoint.  While this endpoint only requires the "name" key, there are other optional keys.
payload = {"name": "test"}

# We need to pass in our authentication headers!
response = requests.post("https://api.github.com/user/repos", json=payload, headers=headers)
print(response.status_code)

payload["name"] = "learning-about-apis"

response = requests.post("https://api.github.com/user/repos", json=payload, headers=headers)
status = response.status_code

## PUT/PATCH Requests ##
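PUT and PATCH requests update an existing object. A PATCH request changes only the attributes you send, while a PUT request replaces the whole object. A successful PATCH request will usually return a 200 status code.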

In [None]:
payload = {"description": "The best repository ever!", "name": "test"}
response = requests.patch("https://api.github.com/repos/VikParuchuri/test", json=payload, headers=headers)
print(response.status_code)

payload["name"] = "learning-about-apis"
payload["description"] = "Learning about requests!"
response = requests.patch("https://api.github.com/repos/VikParuchuri/learning-about-apis", json=payload, headers=headers)
status = response.status_code

## DELETE Requests ##
A DELETE request removes an existing object from the server. A successful DELETE request will usually return a 204 status code, indicating that the object was deleted.

In [None]:
response = requests.delete("https://api.github.com/repos/VikParuchuri/test", headers=headers)
print(response.status_code)

response = requests.delete("https://api.github.com/repos/VikParuchuri/learning-about-apis", headers=headers)
status = response.status_code
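
The usual success codes from the POST, PATCH, and DELETE sections can be gathered into a small lookup, which makes it easy to check a response against what the method normally returns. This is a sketch for illustration: EXPECTED_SUCCESS and succeeded are hypothetical names, and individual APIs may return other success codes.

```python
# Usual success codes named in the sections above; individual APIs may differ.
EXPECTED_SUCCESS = {"POST": 201, "PATCH": 200, "DELETE": 204}


def succeeded(method, status_code):
    """Return True if status_code is the usual success code for the HTTP method."""
    return EXPECTED_SUCCESS.get(method.upper()) == status_code


print(succeeded("post", 201))
print(succeeded("delete", 404))
```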

## Web Scraping 

In [None]:
response = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
content = response.content

## Retrieving Elements from a Page ##

In [None]:
from bs4 import BeautifulSoup

# Initialize the parser, and pass in the content we grabbed earlier.
parser = BeautifulSoup(content, 'html.parser')

# Get the body tag from the document.
# Since we passed in the top level of the document to the parser, we need to pick a branch off of the root.
# With BeautifulSoup, we can access branches by using tag types as attributes.
body = parser.body

# Get the p tag from the body.
p = body.p

# Print the text inside the p tag. Text is a property that gets the inside text of a tag.
print(p.text)

# Get the text inside the title tag, and assign the result to title_text.
header = parser.head
title_text = header.title.text

## Finding Tags with find_all ##

In [None]:
parser = BeautifulSoup(content, 'html.parser')

# Get a list of all occurrences of the body tag in the document.
body = parser.find_all("body")

# Get the paragraph tag.
p = body[0].find_all("p")

# Get the text.
print(p[0].text)

title = parser.find_all("title")
title_text = title[0].text


## Using Element IDs

In [None]:
# Get the page content and set up a new parser.
response = requests.get("http://dataquestio.github.io/web-scraping-pages/simple_ids.html")
content = response.content
parser = BeautifulSoup(content, 'html.parser')

# Pass in the ID attribute to only get the element with that specific ID.
first_paragraph = parser.find_all("p", id="first")[0]
print(first_paragraph.text)

second_paragraph_text = parser.find_all("p", id="second")[0].text

## Using Element Classes

In [None]:
# Get the website that contains classes.
response = requests.get("http://dataquestio.github.io/web-scraping-pages/simple_classes.html")
content = response.content
parser = BeautifulSoup(content, 'html.parser')

# Get the first inner paragraph.
# Find all the paragraph tags with the class inner-text.
# Then, take the first element in that list.
first_inner_paragraph = parser.find_all("p", class_="inner-text")[0]
print(first_inner_paragraph.text)
second_inner_paragraph_text = parser.find_all("p", class_="inner-text")[1].text
first_outer_paragraph_text = parser.find_all("p", class_="outer-text")[0].text



## Using CSS Selectors ##

In [None]:
# Get the website that contains classes and IDs.
response = requests.get("http://dataquestio.github.io/web-scraping-pages/ids_and_classes.html")
content = response.content
parser = BeautifulSoup(content, 'html.parser')

# Select all of the elements that have the first-item class.
first_items = parser.select(".first-item")

# Print the text of the first paragraph (the first element with the first-item class).
print(first_items[0].text)

first_outer_text = parser.select(".outer-text")[0].text
second_text = parser.select("#second")[0].text


## Using Nested CSS Selectors ##

In [None]:
# Get the Super Bowl box score data.
response = requests.get("http://dataquestio.github.io/web-scraping-pages/2014_super_bowl.html")
content = response.content
parser = BeautifulSoup(content, 'html.parser')

# Find the number of turnovers the Seahawks committed.
turnovers = parser.select("#turnovers")[0]
seahawks_turnovers = turnovers.select("td")[1]
seahawks_turnovers_count = seahawks_turnovers.text
print(seahawks_turnovers_count)


# Find the Total Plays for the New England Patriots
total_plays = parser.select("#total-plays")[0]
patriots_total_plays_count = total_plays.select("td")[2].text

# Find the Total Yards for the Seahawks
total_yards = parser.select("#total-yards")[0]
seahawks_total_yards_count = total_yards.select("td")[1].text

## CSS

This CSS will make all of the text inside all paragraphs red:

    p{
        color: red
    }

This CSS will change the text color to red for any paragraphs that have the class inner-text. We select classes with the period or dot symbol (.):

    p.inner-text{
        color: red
    }

This CSS will change the text color to red for any paragraphs that have the ID first. We select IDs with the pound or hash symbol (#):

    p#first{
        color: red
    }

You can also style IDs and classes without using any specific tags. For example, this CSS will make the element with the ID first red (not just paragraphs):

    #first{
        color: red
    }

This CSS will make any element with the class inner-text red:

    .inner-text{
        color: red
    }

Working with CSS selectors:

    first_outer_text = parser.select(".outer-text")[0].text
    second_text = parser.select("#second")[0].text

Nested Selectors:

This selector will target any paragraph inside a div tag:

    div p

This selector will target any item inside a div tag that has the class first-item:

    div .first-item

This one is even more specific. It selects any item that's inside a div tag inside a body tag, but only if it also has the ID first:

    body div #first

This selector zeroes in on any items with the ID first that are inside any items with the class first-item:

    .first-item #first
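
The nested selectors above can be tried out on a small inline document, with no network request needed. The HTML below is made up for illustration (it is not one of the pages scraped earlier), and texts is a hypothetical helper. The sketch assumes BeautifulSoup is installed, as in the rest of these notes.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only.
html = """
<html><body>
  <div>
    <p class="first-item" id="first">Inside div, first item</p>
    <p class="second-item">Inside div, second item</p>
  </div>
  <p class="first-item">Outside div</p>
</body></html>
"""

parser = BeautifulSoup(html, "html.parser")


def texts(selector):
    """Return the text of every element matching the CSS selector."""
    return [element.text for element in parser.select(selector)]


print(texts("p"))                # every paragraph in the document
print(texts("div p"))            # only paragraphs inside a div
print(texts("div .first-item"))  # first-item elements inside a div
print(texts("body div #first"))  # the element with ID first, inside a div inside body
```

Note how "div .first-item" skips the paragraph outside the div even though it has the same class: the space between the two parts of a selector means "somewhere inside".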

<h3>Summary Challenge</h3>

In [None]:
## 2. Authenticating with the API ##

headers = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}
params = {"t": "day"}

response = requests.get("https://oauth.reddit.com/r/python/top", headers=headers, params=params)
python_top = response.json()

## 3. Getting the Most Upvoted Post ##

# get a list of dictionaries (articles)
python_top_articles = python_top["data"]["children"]

most_upvoted = ""
most_upvotes = 0

for article in python_top_articles:
    
    # the data is in a dictionary within each (article) dictionary!
    art = article["data"]
    
    if art["ups"] >= most_upvotes:
        most_upvoted = art["id"]
        most_upvotes = art["ups"]

## 4. Getting Post Comments ##

# Get all of the comments on the /r/python subreddit's top post from the past day

URL = "https://oauth.reddit.com/r/python/comments/4b7w9u"

response = requests.get(URL, headers=headers)
comments = response.json()


## 5. Getting the Most Upvoted Comment ##

comments_list = comments[1]["data"]["children"]

most_upvoted_comment = ""
most_upvotes = 0

for comment in comments_list:
    comm = comment["data"]
    
    if comm["ups"] >= most_upvotes:
        most_upvoted_comment = comm["id"]
        most_upvotes = comm["ups"]
        


## 6. Upvoting a Comment ##
payload = {"dir": 1, "id": "d16y4ry"}
headers = {"Authorization": "bearer 13426216-4U1ckno9J5AiK72VRbpEeBaMSKk", "User-Agent": "Dataquest/1.0"}

response = requests.post("https://oauth.reddit.com/api/vote", json=payload, headers=headers)
status = response.status_code
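
The two loops in steps 3 and 5 follow the same pattern: walk a list of {"data": {...}} dictionaries and keep the entry with the most upvotes. As an alternative sketch, Python's built-in max with a key function expresses this in one step. most_upvoted_id is a hypothetical helper, and sample_children is made-up data in the same nested shape as python_top["data"]["children"].

```python
def most_upvoted_id(children):
    """Return the id of the entry with the highest 'ups' count, or None if empty."""
    if not children:
        return None
    best = max(children, key=lambda child: child["data"]["ups"])
    return best["data"]["id"]


# Made-up sample in the same nested shape as the Reddit listing.
sample_children = [
    {"data": {"id": "aaa", "ups": 12}},
    {"data": {"id": "bbb", "ups": 40}},
    {"data": {"id": "ccc", "ups": 7}},
]
print(most_upvoted_id(sample_children))
```

One behavioral difference: when several entries tie, max returns the first of them, while the loops above use >= and therefore keep the last.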