Today we'll learn how to do some basic web interactions. By the end,
you will have an introductory knowledge of how to request data from
external websites in your Python scripts.

This is based on Automate the Boring Stuff chapter 11, with some
supplementary material from chapter 14 for saving that data.

Reference:
https://automatetheboringstuff.com/chapter11/
https://automatetheboringstuff.com/chapter14/

In [1]:
'''
Start by importing the webbrowser module and doing a basic check.
Note that since you are already in a Jupyter notebook, this
will just open a new tab.
'''
import webbrowser
webbrowser.open('https://www.google.com')

True

In [2]:
'''
You can take this to the next step by opening a more specific URL path
'''
address = "Washington DC"
webbrowser.open('https://www.google.com/maps/place/' + address)

True

In [3]:
'''
Often you can get what you are looking for by formatting a search with
the correct terms. Adjust this to do a plain search on google.com below.
This is essentially entry level URL dorking and you can use this in the
future to format more advanced search strings.

Before you change anything, take note of the two comment styles. Triple
quotes can enclose large chunks of text, while the hash (#) you see
below comments out only a single line.
'''
def lmgtfy(search):
    # Open a plain Google search for the given terms in a new tab
    webbrowser.open('https://www.google.com/search?q=' + search)

lmgtfy("python learning")
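
Search terms with spaces or special characters are safer to send once they are URL-encoded. Here is a minimal sketch using the standard library's urllib.parse.urlencode. The helper name build_search_url and its optional site argument are illustrative assumptions, not part of the original exercise.

```python
from urllib.parse import urlencode

def build_search_url(terms, site=None):
    """Return a Google search URL with the query safely URL-encoded."""
    query = terms
    if site:
        # 'site:' is a common search operator that limits results to one domain
        query += ' site:' + site
    return 'https://www.google.com/search?' + urlencode({'q': query})

print(build_search_url('python learning'))
print(build_search_url('requests tutorial', site='docs.python-requests.org'))
```

The resulting string can be passed straight to webbrowser.open().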

Our next learning objective is basic usage of the requests module.
Opening a web page in a browser usually isn't of much value to your scripts.
Requests lets you programmatically download a page so you can take
further actions on it in code.

Let's start by installing it.

In [None]:
!pip install requests

Import all the modules you will need for the next section. You can read
more about them at the links below.

https://docs.python.org/3/library/csv.html

http://docs.python-requests.org/en/master/user/quickstart/#json-response-content

In [4]:
import csv
import os
import requests

Fetch the information you will need. Note that you only need to do this
once, because the response is stored in the 'r' variable and all
further work will use it.

In [5]:
# GET /events
r = requests.get('https://api.github.com/events')
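
Before working with a response, it is worth confirming that the request actually succeeded. The sketch below is one way to do that. fetch_json is a hypothetical helper that accepts any object with a requests-style get() method (for example requests.Session()), so it can be tried out without network access. The timeout argument and raise_for_status() are standard parts of the requests API.

```python
def fetch_json(session, url, timeout=10):
    """Fetch url with a requests-style session and return the decoded JSON.

    session is assumed to expose get(url, timeout=...) like requests.Session.
    """
    response = session.get(url, timeout=timeout)
    # With requests, this raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.json()

# Typical use in this notebook (requires network access):
#   events = fetch_json(requests.Session(), 'https://api.github.com/events')
```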

One of the key aspects of data science often comes down to understanding the data you are looking at. Let's take a moment to inspect three different syntaxes.

In [6]:
'''
Inspect full .json() method

Look very closely at the results. Note the first character is
a [, meaning the result is a list. Note the second character
is a {, meaning each entry in the list is a dictionary.
'''
r.json()

[{'id': '9505016235',
  'type': 'PushEvent',
  'actor': {'id': 15008142,
   'login': 'wenxuans',
   'display_login': 'wenxuans',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/wenxuans',
   'avatar_url': 'https://avatars.githubusercontent.com/u/15008142?'},
  'repo': {'id': 178074109,
   'name': 'miaschoening1/ReconnectApp',
   'url': 'https://api.github.com/repos/miaschoening1/ReconnectApp'},
  'payload': {'push_id': 3542382603,
   'size': 12,
   'distinct_size': 1,
   'ref': 'refs/heads/master',
   'head': '973de68ecb3315edd0cbd04b876228df7ddd228f',
   'before': 'db41bd712d69aebe082ecd76fe287ed43115d9ba',
   'commits': [{'sha': 'd7392cea71239bb960cecd17e9607ad938653039',
     'author': {'email': 'wshi0810@gmail.com', 'name': 'Wendy'},
     'message': 'Set up Summary activity',
     'distinct': False,
     'url': 'https://api.github.com/repos/miaschoening1/ReconnectApp/commits/d7392cea71239bb960cecd17e9607ad938653039'},
    {'sha': 'f3528b6a0dc72343d7863e81ecc8361a961e1
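
Reading brackets by eye works, but you can also ask Python what it is looking at. Here is a small sketch of that idea. The describe helper is a made-up name, and the sample is a trimmed copy of the output above so the cell runs without a network call.

```python
def describe(data):
    """Return a short, one-line summary of a decoded JSON value."""
    if isinstance(data, list):
        inner = type(data[0]).__name__ if data else 'nothing'
        return 'list of {} item(s), first item is {}'.format(len(data), inner)
    if isinstance(data, dict):
        return 'dict with keys: ' + ', '.join(sorted(data))
    return type(data).__name__

# Trimmed sample shaped like the output above
sample = [{'id': '9505016235', 'type': 'PushEvent',
           'actor': {'login': 'wenxuans'}}]

print(describe(sample))      # list of 1 item(s), first item is dict
print(describe(sample[0]))   # dict with keys: actor, id, type
```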

In [7]:
'''
Knowing the data is a list, we can access index 0 for the first
element of the list.
'''
r.json()[0]

{'id': '9505016235',
 'type': 'PushEvent',
 'actor': {'id': 15008142,
  'login': 'wenxuans',
  'display_login': 'wenxuans',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/wenxuans',
  'avatar_url': 'https://avatars.githubusercontent.com/u/15008142?'},
 'repo': {'id': 178074109,
  'name': 'miaschoening1/ReconnectApp',
  'url': 'https://api.github.com/repos/miaschoening1/ReconnectApp'},
 'payload': {'push_id': 3542382603,
  'size': 12,
  'distinct_size': 1,
  'ref': 'refs/heads/master',
  'head': '973de68ecb3315edd0cbd04b876228df7ddd228f',
  'before': 'db41bd712d69aebe082ecd76fe287ed43115d9ba',
  'commits': [{'sha': 'd7392cea71239bb960cecd17e9607ad938653039',
    'author': {'email': 'wshi0810@gmail.com', 'name': 'Wendy'},
    'message': 'Set up Summary activity',
    'distinct': False,
    'url': 'https://api.github.com/repos/miaschoening1/ReconnectApp/commits/d7392cea71239bb960cecd17e9607ad938653039'},
   {'sha': 'f3528b6a0dc72343d7863e81ecc8361a961e18ce',
    'author': {'e

In [8]:
'''
Knowing that each element in the list is a dictionary, we can access
its values by dictionary key.

Retrieve the login name of the user (actor) shown above.
'''
r.json()[0]['actor']['login']

'wenxuans'
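
The same key lookup can be applied to every entry in the list, not just index 0. Here is a minimal sketch using a trimmed, partly made-up sample in place of r.json(). The actor_login helper is a hypothetical name. It uses dict.get() so that an entry missing a key returns None instead of raising KeyError.

```python
# Trimmed sample shaped like r.json(); the second login is made up
events = [
    {'type': 'PushEvent', 'actor': {'login': 'wenxuans'}},
    {'type': 'WatchEvent', 'actor': {'login': 'example-user'}},
    {'type': 'ForkEvent'},  # hypothetical entry with no 'actor' key
]

def actor_login(event):
    """Return the actor's login, or None if the keys are missing."""
    return event.get('actor', {}).get('login')

logins = [actor_login(event) for event in events]
print(logins)  # ['wenxuans', 'example-user', None]
```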

In [10]:
'''
You now have enough pieces to query multiple aspects of the data.
I'd like you to get three fields out of each element in the json()
list from our initial GET request.

The username: This is the 'login' key/value pair under the 'actor' key
The repo URL: This is the 'name' key/value pair under the 'repo' key
The entry time: This is the 'created_at' key

Hint: The repo URL can be built by concatenating the GitHub URL and the
name, as shown below:

'https://github.com/' + entry['repo']['name']

'''
cwd = os.path.abspath('.')
# newline='' lets the csv module control the line endings itself
with open(os.path.join(cwd, "api_output.csv"), 'w', newline='') as csvfile:
    # Get a csv writer object to write to (see ATBS chapter 14)
    csvwriter = csv.writer(csvfile, lineterminator='\n')

    # Write out a header line
    header = ['USERNAME', 'REPO_URL', 'ENTRY_TIME']
    csvwriter.writerow(header)

    # Write out each entry with the portions we care about into the CSV
    for entry in r.json():
        csvwriter.writerow([entry['actor']['login'],
                            'https://github.com/' + entry['repo']['name'],
                            entry['created_at']])
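
A quick way to check a CSV you just wrote is to read it back. The sketch below uses csv.DictReader, which turns each row into a dictionary keyed by the header line. The row values are made up, and the file goes to a temporary directory so your real api_output.csv is left alone.

```python
import csv
import os
import tempfile

# Made-up rows in the same three-column layout as api_output.csv
rows = [
    ['USERNAME', 'REPO_URL', 'ENTRY_TIME'],
    ['example-user', 'https://github.com/example-user/example-repo',
     '2019-01-01T00:00:00Z'],
]

path = os.path.join(tempfile.mkdtemp(), 'api_output.csv')
with open(path, 'w', newline='') as csvfile:
    csv.writer(csvfile).writerows(rows)

# DictReader uses the header line as the keys for every following row
with open(path, newline='') as csvfile:
    records = list(csv.DictReader(csvfile))

print(records[0]['USERNAME'])   # example-user
```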

Alright, you've got the gist of things now. Your next step is to identify
an API that you would like to interact with. Write some code to make a
query, inspect that result, and selectively pull out some information
you find interesting from it.

Many sites require an API key to retrieve data. Even something as
innocuous as a request to get the weather can be abused by someone
looking to get something for free. You can find a list of sites below
that offer a free unauthenticated API.

https://shkspr.mobi/blog/2016/05/easy-apis-without-authentication/
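
When an API does require a key, the key is usually sent along with each request, often as a header or a query parameter. The exact mechanism varies by service, so treat the sketch below as an assumption only. The EXAMPLE_API_KEY environment variable and the Bearer header scheme are hypothetical placeholders, so check your API's documentation for the real ones.

```python
import os

def build_auth_headers(env_var='EXAMPLE_API_KEY'):
    """Build request headers from an API key stored in an environment variable.

    Both the variable name and the 'Bearer' scheme are hypothetical.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError('Set the {} environment variable first'.format(env_var))
    return {'Authorization': 'Bearer ' + api_key}

# With requests, the headers would be passed along like this:
#   requests.get(url, headers=build_auth_headers(), timeout=10)
```

Keeping the key in an environment variable rather than in the notebook means you can share the notebook without sharing the key.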

If you can't think of anything interesting, here are two ideas: an easy
one and a harder one where you'll have to do a bit of research:

1. Query all Star Wars universe planets and inspect the raw data. Note
   that the planets list different climates, so iterate through the
   list of planets and only show the names of planets with 'temperate'
   in the 'climate' key. Use a requests GET on the following URL to
   get you started:
   
   https://swapi.co/api/planets/
   
2. Query the Google Books API documented at the link below and show
   a summary of your most recent book.

   https://developers.google.com/books/

In [15]:
'''
Idea 1: request the first page of Star Wars planets.
'''

r = requests.get('https://swapi.co/api/planets/')

In [16]:
r.json()

{'count': 61,
 'next': 'https://swapi.co/api/planets/?page=2',
 'previous': None,
 'results': [{'name': 'Alderaan',
   'rotation_period': '24',
   'orbital_period': '364',
   'diameter': '12500',
   'climate': 'temperate',
   'gravity': '1 standard',
   'terrain': 'grasslands, mountains',
   'surface_water': '40',
   'population': '2000000000',
   'residents': ['https://swapi.co/api/people/5/',
    'https://swapi.co/api/people/68/',
    'https://swapi.co/api/people/81/'],
   'films': ['https://swapi.co/api/films/6/', 'https://swapi.co/api/films/1/'],
   'created': '2014-12-10T11:35:48.479000Z',
   'edited': '2014-12-20T20:58:18.420000Z',
   'url': 'https://swapi.co/api/planets/2/'},
  {'name': 'Yavin IV',
   'rotation_period': '24',
   'orbital_period': '4818',
   'diameter': '10200',
   'climate': 'temperate, tropical',
   'gravity': '1 standard',
   'terrain': 'jungle, rainforests',
   'surface_water': '8',
   'population': '1000',
   'residents': [],
   'films': ['https://swapi.co/a
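
The output above reports 'count': 61 and a 'next' URL, so this first response holds only one page of planets. Here is a sketch of how you might collect every page. It assumes each page has the same 'results' and 'next' layout shown above, and that 'next' is None on the last page, which the output does not show. fetch_all_results is a hypothetical helper that takes any object with a requests-style get() method, such as requests.Session().

```python
def fetch_all_results(session, url, timeout=10):
    """Follow 'next' links until they run out and return every result."""
    results = []
    while url:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        page = response.json()
        results.extend(page['results'])
        url = page['next']  # assumed to be None on the last page, ending the loop
    return results

# Typical use in this notebook (requires network access):
#   planets = fetch_all_results(requests.Session(), 'https://swapi.co/api/planets/')
```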

In [22]:
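for entry in r.json()['results']:
    # Exact match: planets with mixed climates such as 'temperate, tropical'
    # are reported as not temperate
    if entry['climate'] == 'temperate':
        print(entry['name'])
    else:
        print(entry['name'] + ' does not have a temperate climate.')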

Alderaan
Yavin IV does not have a temperate climate.
Hoth does not have a temperate climate.
Dagobah does not have a temperate climate.
Bespin
Endor
Naboo
Coruscant
Kamino
Geonosis does not have a temperate climate.
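
Some planets list more than one climate, for example Yavin IV's 'temperate, tropical' in the raw data above. It therefore matters whether you want planets that are only temperate or planets that include temperate among other climates. Here is a small sketch of both checks. The climates helper is a made-up name, and the two sample entries are copied from the output above with the other keys omitted.

```python
def climates(planet):
    """Split a climate string such as 'temperate, tropical' into a set."""
    return {part.strip() for part in planet['climate'].split(',')}

# Two entries copied from the raw data above (other keys omitted)
planets = [
    {'name': 'Alderaan', 'climate': 'temperate'},
    {'name': 'Yavin IV', 'climate': 'temperate, tropical'},
]

only_temperate = [p['name'] for p in planets if climates(p) == {'temperate'}]
any_temperate = [p['name'] for p in planets if 'temperate' in climates(p)]

print(only_temperate)  # ['Alderaan']
print(any_temperate)   # ['Alderaan', 'Yavin IV']
```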


In [21]:
for entry in r.json()['results']:
    try:
        if int(entry['population']) >= 1000:
            print(entry['name'] + ' has a population greater than or equal to 1000')
        else:
            print(entry['name'] + ' has a population less than 1000.')
    except ValueError:
        print(entry['name'] + ' has an unknown population')

Alderaan has a population greater than or equal to 1000
Yavin IV has a population greater than or equal to 1000
Hoth has an unknown population
Dagobah has an unknown population
Bespin has a population greater than or equal to 1000
Endor has a population greater than or equal to 1000
Naboo has a population greater than or equal to 1000
Coruscant has a population greater than or equal to 1000
Kamino has a population greater than or equal to 1000
Geonosis has a population greater than or equal to 1000
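
The try/except can be moved into a small helper so the loop body stays readable. This is a sketch, with parse_population as a made-up name. The string 'unknown' in the example is an assumption about what the API returns for planets such as Hoth, since the output above only shows that the conversion to int fails.

```python
def parse_population(value):
    """Convert a population string to an int, or None if it is not a number."""
    try:
        return int(value)
    except ValueError:
        return None

print(parse_population('2000000000'))  # 2000000000
print(parse_population('unknown'))     # None
```

Returning None lets the caller decide how to report a missing value instead of hard-coding the message inside the loop.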


In [14]:
cwd

'C:\\Users\\estyw\\Desktop\\Python Instruction\\python-instruction-master'