# Exercises due by EOD 2018.11.02

## goal

in this homework assignment we will practice communicating with web services using `requests`, extracting data from `html` with `xpath` and `css` selectors, and `merge`-ing a `branch` using `git`

## method of delivery

as mentioned in our first lecture, the method of delivery may change from assignment to assignment. we will include this section in every assignment to provide an overview of how we expect homework results to be submitted, and to provide background notes or explanations for "new" delivery concepts or methods.

this week you will be submitting the results of your homework via an email to **BOTH** Zach (rzl5@georgetown.edu) and Carlos (chb49@georgetown.edu) titled "2018.11.02 answers", as well as commits to your `gu511_git_hw` repository on `github`

summary:

| exercise | deliverable | method of delivery |
|----------|-------------|--------------------|
| 1 | a `python` file `christopher_walkin.py` | attached to your submission email |
| 2 | a file `xpath_and_css.csv` | attached to your submission email |
| 3 | a file `hacker_news_selectors.csv` | attached to your submission email |
| 4 | a file `I_POST_the_gist.py` | attached to your submission email |
| 5 | a `merge` commit | `commit` is `push`ed to `github` |
| 6 | a `google` survey | fill it out online |

# exercise 1: google maps `json api`

let's use [the google maps directions `api`](https://developers.google.com/maps/documentation/directions/start) (itself one of [many, many open `api`s from google](https://developers.google.com/maps/documentation/)) to calculate the travel time from an arbitrary location to the Washington Monument



## 1.1: not spending a billion dollars

before using any `api`, you should double-check the usage limits and pricing model. `google` is usually pretty friendly to small hacking projects, so we should be okay -- but let's make sure.

head over to [the directions `api` usage limits documentation](https://developers.google.com/maps/documentation/directions/usage-limits) and determine how `google` charges users for `api` calls. try to answer the following questions:

1. is any amount of access free? or does it all cost some amount of money?
1. who is charged, the user or someone else?
1. how does `google` know who to charge?


**actually think about it!**

thought about it? here are the answers:

1. `google` charges users 0.005 USD per `api` call
1. I have created an account and am using the 300 USD promotional credit that comes with it -- this means we collectively have 60,000 calls left before I start paying. please don't do this homework question 1000s of times.
1. `google` associates requests to an individual's `google` account by using an **api key** -- a unique string
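the 60,000 figure above is just the promotional credit divided by the per-call price. here's a quick sketch of that arithmetic. it uses `decimal` so that a price like 0.005 is represented exactly instead of as a binary float. `calls_covered` is a made-up name for illustration:

```python
from decimal import Decimal


def calls_covered(credit_usd: str, price_per_call_usd: str) -> int:
    """return how many whole api calls a credit balance pays for"""
    # Decimal parses these strings exactly, so "0.005" really is 0.005
    credit = Decimal(credit_usd)
    price = Decimal(price_per_call_usd)
    # floor division: you can't make a fraction of a call
    return int(credit // price)


print(calls_covered("300", "0.005"))  # 60000
```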



## 1.2: getting an `api` key

many applications require users to create `api` keys as a way of authenticating requests. this serves several purposes:

1. security: if you notice malicious actors, you have a way of shutting them down immediately
2. audit: you have a record of exactly who made every request, which is useful for retrospective analysis of usage, malicious action, etc.
3. throttling and resource management: you can identify major consumers of your resource and quantify how much they are taxing your resources
4. the big one: **billing**. in this instance, the key tells `google` who to charge
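to make those four purposes concrete, here's a toy, in-memory sketch of the bookkeeping a provider *might* do for each key. this is **not** how `google` implements it. the function names, the per-key limit, and the status strings (borrowed from the style of the directions `api`'s `status` field) are all assumptions for illustration:

```python
from collections import Counter
from decimal import Decimal

PRICE_PER_CALL_USD = Decimal("0.005")  # the directions api price quoted in 1.1
calls_by_key = Counter()               # audit trail, reduced to a count per key
revoked_keys = set()                   # security: keys that have been shut off


def handle_request(api_key, limit=1000):
    """hypothetical gatekeeper that runs before serving each request"""
    if api_key in revoked_keys:
        return "REQUEST_DENIED"        # security: a revoked key gets nothing
    if calls_by_key[api_key] >= limit:
        return "OVER_QUERY_LIMIT"      # throttling: this key has used its share
    calls_by_key[api_key] += 1         # audit + billing: record who asked
    return "OK"


def bill(api_key):
    """billing: the owner of the key pays for every call made with it"""
    return calls_by_key[api_key] * PRICE_PER_CALL_USD
```

for example, three successful calls with one key would be billed `Decimal("0.015")`, and adding a key to `revoked_keys` makes every later request with it fail immediately.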

let's head to the [google api key page](https://developers.google.com/maps/documentation/directions/get-api-key) (**don't** press any buttons yet, folks) and read up a bit on it. if you wanted to use the api, you'd need to *activate* it for your account and get an *api key* from `google`.

in the email announcing this assignment, I sent you my `api` key. I considered posting it as a `gist` but decided against it in order to keep it at least semi-private. *this is a password!!* -- it really shouldn't be shared, let alone made public. I'm violating a best practice here to save you all a few bucks!

if you are interested in spending a fraction of a cent to set up *your own* api usage, send me an email at rzl15@georgetown.edu



## 1.3: read the flipping manual

really, go read it: https://developers.google.com/maps/documentation/directions/intro


## 1.4: making a simple `GET` request: browser

the example directions request in the documentation is

```
https://maps.googleapis.com/maps/api/directions/json?origin=Toronto&destination=Montreal&key=YOUR_API_KEY
```

replace `YOUR_API_KEY` above with the `api` key value I emailed to you and try launching it in your web browser. please don't refresh this 1000 times.

you should receive a `json` representation of the path that `google` recommends you use to travel (`driving`, by default) from Toronto to Montreal
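everything after the `?` in that url is a set of `key=value` query parameters joined by `&`. the sketch below rebuilds the example url from a plain `dict` using the standard library's `urlencode`. passing a `dict` like this as the `params=` argument of `requests.get` should produce the same kind of query string (values containing spaces get encoded, e.g. as `+`). `YOUR_API_KEY` is still a placeholder:

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/directions/json"

params = {
    "origin": "Toronto",
    "destination": "Montreal",
    "key": "YOUR_API_KEY",  # placeholder: swap in the key from the email
}

# urlencode joins the pairs with "&" and escapes any unsafe characters
url = BASE_URL + "?" + urlencode(params)
print(url)
```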


## 1.5: making a simple `GET` request: `python`

translate the above url into a `GET` request made with the `python` `requests` library. verify that the output of the response object's `json()` method is the same as the `json` returned by the browser. that is, write code like

```python
import requests

response = ...  # FILL THIS PART IN: a requests.get(...) call

response.json()  # look at this
```

the thing you get from `response.json()` should be a python object (`list`s and `dict`s). figure out how to take the compound `dict` and `list` object the `requests` library returns and extract the numerical `value` of the `duration` of the trip (the duration of the trip in seconds). depending on the route `google` recommends, this could change slightly. when I ran it just now, I saw

```json
...
"duration": {
    "text": "5 hours 25 mins",
    "value": 19510
},
...
```

you should be able to get the numeric value associated with that `value` key, and it should be roughly 19,000 to 20,000.
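here's a sketch of walking down a nested structure like that, one `[...]` at a time. it uses a hand-trimmed stand-in for `response.json()` rather than a live request. the assumption is that the duration sits under `routes` → `legs`, as described in the directions documentation. check that against the `json` you actually get back:

```python
import json

# hand-trimmed stand-in for response.json(); a real response has many more keys
sample = json.loads("""
{
  "routes": [
    {
      "legs": [
        {"duration": {"text": "5 hours 25 mins", "value": 19510}}
      ]
    }
  ],
  "status": "OK"
}
""")

# dict -> list -> dict -> list -> dict -> dict -> int
first_route = sample["routes"][0]   # "routes" is a list even if it holds one route
first_leg = first_route["legs"][0]  # a single leg when there are no waypoints
seconds = first_leg["duration"]["value"]
print(seconds)  # 19510
```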



## 1.6: making a simple function

suppose we love throwing away all that beautiful data, and the only thing we care about is the trip duration in seconds. furthermore, let's assume we

1. always want to walk (not drive)
2. are always heading to "The Washington Monument" (whatever and wherever google thinks that is)

fill in the body of the below function, and save the contents in a file `christopher_walkin.py`:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Module: christopher_walkin.py
"""

import requests


def walk_to_washington_monument(origin, apikey):
    """take an origin string (an address, place id, or "lat,lon" pair -- even
    lat,lon is passed as a string) and an api key, and return the time, in
    seconds, it would take to *walk* from there. the destination parameter
    has value "The Washington Monument"
    
    """
    # make the request (destination is "The Washington Monument", mode is walking)
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # extract the entire json dictionary from the response object we received
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    # extract the duration from the json response dictionary
    #---------------#
    # FILL ME IN!!! #
    #---------------#
    
    return duration
```
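two pieces that may help with the skeleton above. this is a sketch under assumptions, not a full solution. `walking_params` and `check_status` are hypothetical helper names. `mode` is the directions `api` parameter for choosing a travel mode (the default is driving). a top-level `status` other than `"OK"` (for example `"ZERO_RESULTS"`) means there is no route to extract:

```python
def walking_params(origin, apikey):
    """hypothetical helper: query parameters for a walking trip to the monument"""
    return {
        "origin": origin,
        "destination": "The Washington Monument",
        "mode": "walking",  # without this, google assumes driving
        "key": apikey,
    }


def check_status(payload):
    """fail loudly if google could not answer, instead of hitting a KeyError later"""
    status = payload.get("status")
    if status != "OK":
        raise RuntimeError(f"directions api returned status {status!r}")
    return payload
```

inside `walk_to_washington_monument` you might then call something like `requests.get(url, params=walking_params(origin, apikey))` and pass `response.json()` through `check_status` before digging out the duration.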

once you have this general function working, try it out. for example, open a `python` session and try

```python
from christopher_walkin import walk_to_washington_monument

walk_to_washington_monument('US Capitol Building', apikey=YOUR_API_KEY)
walk_to_washington_monument('New York, NY', apikey=YOUR_API_KEY)
```

do the answers you get make sense? could you actually walk from NY, NY to DC in that amount of time (remember, time here is reported in seconds)? if not, you may want to double-check your function definition.
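raw seconds are hard to eyeball, so converting them to hours and days makes the sanity check easier. a tiny sketch (`describe_seconds` is a made-up helper). for example, the 19510 seconds from 1.5 come out to about 5.4 hours:

```python
def describe_seconds(seconds):
    """rough human-readable version of a duration given in seconds"""
    hours = seconds / 3600
    days = hours / 24
    return f"{seconds} s = {hours:.1f} hours = {days:.1f} days"


print(describe_seconds(19510))  # 19510 s = 5.4 hours = 0.2 days
```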


### 1.6.1: a static unit test

try running

```python
walk_to_washington_monument("St Mary's Hall Georgetown", apikey=YOUR_API_KEY)
```

you should get something close to (but not necessarily exactly) 3679.

do you? if not, you may need to investigate your function further.

I considered making the above test (distance from St. Mary's to the monument) into a "unit test" -- that is, I was going to demand that any function you wrote returned a certain value as a requirement for being correctly defined.

this is a best practice for developing software -- develop specifications which enumerate a concrete, permanent behavior, and make sure that your functions deterministically reproduce that expected behavior every time.
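for reference, here's a minimal sketch of a deterministic unit test with `unittest`. the function under test, `format_duration`, is a made-up helper that loosely mimics the `text` field from 1.5. `google`'s real formatting rules almost certainly differ (singular units, rounding, days). the key property is that its output depends only on its input, so the expected values can be pinned down permanently:

```python
import unittest


def format_duration(seconds):
    """hypothetical helper: render whole hours and minutes from a count of seconds"""
    hours, remainder = divmod(seconds, 3600)
    minutes = remainder // 60  # leftover seconds are dropped, not rounded
    return f"{hours} hours {minutes} mins"


class FormatDurationTest(unittest.TestCase):
    # fixed inputs, fixed expected outputs: this passes or fails the same way every run
    def test_toronto_to_montreal_value(self):
        self.assertEqual(format_duration(19510), "5 hours 25 mins")

    def test_whole_hours(self):
        self.assertEqual(format_duration(7200), "2 hours 0 mins")


if __name__ == "__main__":
    unittest.main()
```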

after thinking about it, though, I realized this was actually not a good unit test. 

why might this be a bad unit test?


##### attach `christopher_walkin.py` to your submission email

# exercise 2: `xpath` and `css` selectors in a controlled environment

take the following `html` document (also [available via `s3`](https://s3.amazonaws.com/shared.rzl.gu511.com/example.html) if you want to use chrome or firefox Inspect mode):

```html
<html>
    <head></head>
    <body>
        <div id="tablediv">
            <table id="important_table" class="very_pretty">
                <thead>
                    <tr>
                        <th>column a</th>
                        <th>column b</th>
                        <th>column c</th>
                    </tr>
                </thead>
                <tbody>
                    <tr class="oddrow">
                        <td>1</td>
                        <td>4</td>
                        <td>5</td>
                    </tr>
                    <tr class="evenrow">
                        <td>0</td>
                        <td>2</td>
                        <td>4</td>
                    </tr>
                </tbody>
            </table>
            <ul>
                <li>just to be tricky</li>
            </ul>
        </div>
        <div>
            <ul class="very_pretty">
                <li>hello</li>
                <li class="active">world</li>
            </ul>
            <ol class="kinda_ugly">
                <li>howya</li>
                <li class="inactive">doin</li>
            </ol>
        </div>
    </body>
</html>
```

you can create an `lxml` object of this webpage with the code

```python
import lxml.html
import requests

response = requests.get('https://s3.amazonaws.com/shared.rzl.gu511.com/example.html')
root = lxml.html.fromstring(response.text)
```

in the following, there are no trick questions. every expression in 2.1 and 2.2 selects at least one element, and every element described in 2.3 and 2.4 has at least one valid path. also, remember that you can enter these `xpath` (chrome only) and `css selector` (chrome and firefox (firefox has tab complete!)) expressions directly in the developer tools (highlight the html elements window and press `Ctrl + F` or `Command + F`) to check the number of matches and to cycle through them.


## 2.1: selecting with `xpath`

for each of the below `xpath` expressions, identify the number of elements matched by that expression:

1. `/html/body/div/ul`
2. `/html/body/div/ul/li`
3. `/html/body/div/*/li`
4. `/html/body/div/*/li[@class]`
5. `/html/body/div/*/li[@class="active"]`
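with the `lxml` object from above, `root.xpath(...)` returns a list of matching elements, so `len()` of it gives the match count. so that the answers stay yours to find, the sketch below runs the same *kind* of check on a made-up snippet. it uses the standard library's `xml.etree.ElementTree`, which supports only a small subset of `xpath`. in particular, paths are relative to the element you call `findall` on, so `./body/...` from the `<html>` element plays the role of `/html/body/...`:

```python
import xml.etree.ElementTree as ET

# made-up snippet, deliberately different from the exercise document
snippet = """
<html>
  <body>
    <div>
      <ul><li class="a">one</li><li>two</li></ul>
      <ol><li class="b">three</li></ol>
    </div>
  </body>
</html>
"""

root = ET.fromstring(snippet)  # root is the <html> element

paths = [
    "./body/div/ul/li",             # li children of a ul
    "./body/div/*/li",              # li children of any element: ul or ol
    "./body/div/*/li[@class]",      # ...that have a class attribute at all
    "./body/div/*/li[@class='b']",  # ...whose class is exactly "b"
]
for path in paths:
    print(path, len(root.findall(path)))
```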


## 2.2: selecting with `css` selectors

for each of the below `css` selectors, identify the number of elements matched by that expression:

1. `tr`
2. `tr.evenrow`
3. `#important_table`
4. `.very_pretty`
5. `div > ul`


## 2.3: developing `xpath` expressions

for each of the below, develop the appropriate `xpath` expression

1. use an *absolute* path with an `attr=val` check to select the element `<li class="active">world</li>`
2. use a non-absolute path to select that same element `<li class="active">world</li>` which uses the `class` attribute
3. select all `td` elements
4. select all `td` elements in a row with `class="evenrow"`
5. select the `<table id="important_table" class="very_pretty">` element using its `class` attribute
6. select the `<table id="important_table" class="very_pretty">` element using its `id` attribute


## 2.4: developing `css` selectors

now for each of the below, develop the appropriate `css` selector

1. use a *direct descendant* and a class indicator to select the element `<ul class="very_pretty">`
2. use an *any descendant* and an id indicator to select the four `<li>` elements in the *second* `div` block
3. select all `td` elements
4. select all `td` elements in a row with `class="evenrow"`
5. select the `<table id="important_table" class="very_pretty">` element using its `class` attribute
6. select the `<table id="important_table" class="very_pretty">` element using its `id` attribute


## 2.5: deliverable

fill in the following table and save it as a `csv` named `xpath_and_css.csv`

| exercise | answer |
|----------|--------|
| 2.1.1    | ? |
| 2.1.2    | ? |
| 2.1.3    | ? |
| 2.1.4    | ? |
| 2.1.5    | ? |
| 2.2.1    | ? |
| 2.2.2    | ? |
| 2.2.3    | ? |
| 2.2.4    | ? |
| 2.2.5    | ? |
| 2.3.1    | ? |
| 2.3.2    | ? |
| 2.3.3    | ? |
| 2.3.4    | ? |
| 2.3.5    | ? |
| 2.3.6    | ? |
| 2.4.1    | ? |
| 2.4.2    | ? |
| 2.4.3    | ? |
| 2.4.4    | ? |
| 2.4.5    | ? |
| 2.4.6    | ? |

##### attach the file `xpath_and_css.csv` to your submission email

# exercise 3: `xpath` and `css` selectors in the wild

let's construct several `xpath` expressions and `css` selectors to isolate elements on [the hacker news homepage](https://news.ycombinator.com/).

this diagram

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=0ByQ4VmO-MwEEd3hZN0xNSHV2WE0" width="700px"></div><br>

([link here](https://drive.google.com/file/d/0ByQ4VmO-MwEEd3hZN0xNSHV2WE0/view?usp=sharing)) highlights the four elements of each news article entry that we are looking to obtain.

in this exercise our goal is to write `xpath` and `css` that would help us parse this text, so I am looking for the `html` element which is "closest" to the text, i.e. the one that directly contains the text. for example, if I wanted to find "has text in here" in the following:

```html
<div>                        <!-- NOT this -->
    <p>has text in here</p>  <!-- YES this -->
</div>
```
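one way to think about "closest": an element's *own* text (the text before its first child) contains the string, not just the text of one of its descendants. in `xpath` terms that is roughly `//*[contains(text(), 'has text in here')]`. here's a standard-library sketch of the same idea on the snippet above (`closest_elements` is a made-up helper name):

```python
import xml.etree.ElementTree as ET

snippet = """
<div>
    <p>has text in here</p>
</div>
"""


def closest_elements(root, needle):
    """return tags of elements whose *own* text (not their descendants') contains needle"""
    # el.text is only the text before the element's first child, so a parent whose
    # direct content is whitespace plus a child element does not match
    return [el.tag for el in root.iter() if el.text and needle in el.text]


print(closest_elements(ET.fromstring(snippet), "has text in here"))  # ['p']
```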

in the table below, I have listed the number and color on that diagram, a description of what that piece of information is, and the text I am talking about from that image.

of these, every article should have 1, 2, and 4, and almost all articles will have 3. this means that a successful `xpath` expression or `css` selector will have no more than 30 hits if you do a `Ctrl + f` search in inspect mode, or run an `xpath` or `cssselect` query in `python` (possibly one or two fewer than that for the score).

fill in the `xpath` and `css` selector columns of the table below and save it as a `csv` named `hacker_news_selectors.csv`


| number | color  | description      | example                   | `xpath` | `css` selector |
|--------|--------|------------------|---------------------------|---------|----------------|
| 1      | red    | article title    | "If macOS High Sierra..." | ?       | ?              |
| 2      | blue   | article source   | "apple.com"               | ?       | ?              |
| 3      | orange | number of points | "167 points"              | ?       | ?              |
| 4      | green  | age of the post  | "55 minutes ago"          | ?       | ?              |


##### attach your `csv` to your submission email

# exercise 4: `POST` a `github gist`

we are going to use the [`github` api](https://developer.github.com/v3/gists/#create-a-gist) to `POST` a gist to our `github` accounts. use this `api` and the `python` `requests` library to create a `gist` with the following properties

1. it is public
1. it contains a file called `I_GET_the_gist.txt`
1. it has a description `look at this one, carlos`

beyond that, the `gist` contents can be anything you want (e.g. `hello world` or `hey carlos python >>> R`, he'll love that).
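the gist itself is described by a `json` request body. below is a sketch of what that body could look like for the three properties above. the field names follow the "create a gist" section of the `github` v3 docs linked above, and the content string is just an example. with `requests` you would send it with something like `requests.post('https://api.github.com/gists', json=payload, auth=(username, pw))`. note that `github` has since stopped accepting account passwords for `api` authentication, so today a personal access token goes where `pw` is:

```python
import json

# request body for "create a gist"; each key under "files" is a filename
payload = {
    "description": "look at this one, carlos",
    "public": True,
    "files": {
        "I_GET_the_gist.txt": {"content": "hey carlos python >>> R"},
    },
}

# requests' json= argument serializes the dict much like this
body = json.dumps(payload)
print(body)
```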

save the following as a `python` file named `I_POST_the_gist.py` and fill in the `FILL ME IN` blocks

```python
import getpass

import requests


def main():
    username = input('github username: ')
    pw = getpass.getpass('password: ')

    # -------------- #
    # FILL ME IN !!! #
    # -------------- #

    assert (
        (resp.status_code == 201)
        or (
            resp.status_code == 401
            and resp.json()['message'] == 'Must specify two-factor authentication OTP code.'
        )
    )

    # don't need to return anything, after you've posted just exit


if __name__ == '__main__':
    main()
```

##### attach your version of `I_POST_the_gist.py` to your submission email

# exercise 5: `merge` your `pipeline` branch into `master`

use [`git merge`](https://git-scm.com/docs/git-merge) to `merge` the changes that you have been tracking on the `pipeline` branch into `master`. use `bringing pipeline development into master branch` as your `merge` `commit` message. then `push` the updated `master` `branch` to `github`

*hint: if you're not sure, read the docs above to figure out which branch you should have checked out and which branch name you should include in your `git merge` call*

# exercise 6: fill out a mid-year course survey

I'd like your feedback on the course so far -- please fill out the form at https://goo.gl/forms/ZjmIXhwVN5EUMb1n1. this is 100% anonymous and not mandatory