# Section 1: Talking to the Internet

The goal of this section is to get hands-on exposure to what data collection can look like in Python. We will use
the `requests` library to communicate with servers on the internet and extract data from public data sources.

## What Happens When We Go to a Website?

We will first briefly explore what happens when we make requests over the internet. This foundational knowledge will let us make more complicated requests later on.

To start, let's use the `requests` library to make a request to Google.

```python
import requests

response = requests.get('https://www.google.com')
```

Let's break down what is happening in the code above. It uses the `get` function from the `requests` library, and "get" actually has a special meaning in this context.

The first piece to understand is that the web as we know it is built on top of a protocol called HTTP (Hypertext Transfer Protocol). HTTP is an agreed-upon way for computers to communicate. The protocol was designed around a "client-server" architecture: one computer (the client) opens a connection to another computer (the server) and sends it requests, and the server sends back responses.

HTTP defines several "methods", each representing a different kind of operation. A few of interest are listed below.

| Method | Purpose                                   |
|--------|-------------------------------------------|
| GET    | Request information from the server       |
| POST   | Send new information to the server        |
| PUT    | Update existing information on the server |
| DELETE | Delete some information on the server     |

There are others that you can find [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods), but we will mostly work with `GET` and `POST`.
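In `requests`, each of these methods has a matching function (`requests.get`, `requests.post`, `requests.put`, `requests.delete`). The sketch below builds a `GET` and a `POST` request without sending them, using `requests.Request(...).prepare()`, so we can inspect what would be sent. The `https://httpbin.org/...` URLs and the payload are illustrative assumptions; nothing actually goes over the network.

```python
import requests

# Build (but do not send) a GET request; query parameters end up in the URL.
get_req = requests.Request("GET", "https://httpbin.org/get", params={"q": "weather"}).prepare()

# Build (but do not send) a POST request that carries a JSON body.
post_req = requests.Request("POST", "https://httpbin.org/post", json={"name": "Collin"}).prepare()

for req in (get_req, post_req):
    # method is the HTTP verb; url is the final URL after params are encoded
    print(req.method, req.url)

# The POST sends its data in the request body and labels it as JSON;
# the GET built here has no body at all.
print(post_req.headers["Content-Type"])
print(post_req.body)
print(get_req.body)
```

The split is the usual one: `GET` asks for something and puts any parameters in the URL, while `POST` sends new data in the body of the request.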

So now we know the code is making an HTTP `GET` request, which means we are requesting information from the server. To identify which server we want to "get" from, we provide a URL. URLs follow the basic structure:

```
<protocol>://<domain>/<path>
```

The protocol in our case is either `http` or `https` (secure). The domain determines which server the request is sent to, and the path identifies the specific information on that server we want to access. The protocol and domain follow fixed, standard rules, but the path is flexible, and how it is interpreted is up to each server.
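Python's standard library can split a URL into these parts with `urllib.parse.urlparse`. Here it is applied to the weather-service URL used later in this section. Note that `urlparse` calls the protocol the `scheme` and the domain the `netloc`.

```python
from urllib.parse import urlparse

url = "https://api.weather.gov/stations/KBOS/observations/latest"
parts = urlparse(url)

print(parts.scheme)  # protocol: https
print(parts.netloc)  # domain: api.weather.gov
print(parts.path)    # path: /stations/KBOS/observations/latest
```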

In summary, the code above is:

1. Making an HTTP `GET` request
2. Sending it over HTTPS (HTTP with encryption)
3. Sending it to the domain `www.google.com`

Now let's look at what we get back:

```python
response
```

```
<Response [200]>
```

After all that, what we get back seems to be just the number 200. That is actually a good sign.

The HTTP protocol defines a list of status codes and what each one means. They tell us whether the request was successful and, if it wasn't, why it failed.

Codes in the range 200-299 represent a successful response, so "200" means the server received our request and was able to respond successfully.

The full list of codes and their meanings can be found [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status).
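The first digit of a status code tells you its class. The sketch below maps a code to the five classes listed on that page, and uses the standard-library `http.HTTPStatus` enum to look up each code's reason phrase. The helper name `status_class` is hypothetical.

```python
from http import HTTPStatus

def status_class(code: int) -> str:
    """Map an HTTP status code to its class based on its hundreds digit."""
    if 100 <= code < 200:
        return "Informational"
    if 200 <= code < 300:
        return "Successful"
    if 300 <= code < 400:
        return "Redirection"
    if 400 <= code < 500:
        return "Client error"
    if 500 <= code < 600:
        return "Server error"
    raise ValueError(f"{code} is not a valid HTTP status code")

for code in (200, 301, 404, 503):
    # HTTPStatus(code).phrase is the standard reason phrase, e.g. "Not Found"
    print(code, status_class(code), HTTPStatus(code).phrase)
```

With `requests`, the number is available as `response.status_code`, and `response.raise_for_status()` raises an exception for 4xx and 5xx codes. Be aware that `response.ok` is `True` for any code below 400, so it does not check for 2xx specifically.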

Let's take a deeper look at what we got back.

```python
response.content
```

So that looks ugly, but why?

What we got back is HTML (HyperText Markup Language), the language used to describe web pages and the backbone of web development; `response.content` holds the raw bytes of that HTML. Our web browsers make these same HTTP `GET` requests, receive the HTML from the server, and render it as the websites we interact with every day.

For this seminar, we are going to look at servers that don't return HTML, but return something known as JSON.
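How can a program tell which kind of response it got? Servers usually say so in the `Content-Type` response header, which `requests` exposes through `response.headers`. The helper below, `body_kind`, is a hypothetical sketch that classifies a header value. The `+json` check assumes that some JSON APIs use specialized media types such as `application/geo+json`.

```python
def body_kind(content_type: str) -> str:
    """Classify a Content-Type header value as 'html', 'json', or 'other'."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type == "application/json" or media_type.endswith("+json"):
        return "json"
    return "other"

print(body_kind("text/html; charset=ISO-8859-1"))  # html
print(body_kind("application/geo+json"))           # json
```

For example, `body_kind(response.headers.get("Content-Type", ""))` on the Google response from earlier should report `'html'`.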

JSON (JavaScript Object Notation) is a way of representing data as text. Below is a basic object:

```
{
    "name": "Collin Bolles",
    "age": 24,
    "occupation": "Software Engineer",
    "education": [
        {
            "type": "undergrad",
            "school": "RIT"
        },
        {
            "type": "masters",
            "school": "BU"
        }
    ]
}
```

Note the curly braces, the quotes, and the use of `:` to separate keys from values.
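In Python, the standard-library `json` module turns JSON text into dictionaries and lists: JSON objects become `dict` and JSON arrays become `list`. Below, the example object is parsed and a nested value is read out, which is the same kind of navigation we will do with real API responses.

```python
import json

# The example object from above, as a JSON string.
text = """
{
    "name": "Collin Bolles",
    "age": 24,
    "occupation": "Software Engineer",
    "education": [
        {"type": "undergrad", "school": "RIT"},
        {"type": "masters", "school": "BU"}
    ]
}
"""

person = json.loads(text)  # JSON object -> dict, JSON array -> list

print(person["name"])                    # Collin Bolles
print(person["age"] + 1)                 # JSON numbers become ints: 25
print(person["education"][1]["school"])  # index into the list, then key into the dict: BU

# Going the other way: json.dumps turns Python data back into JSON text.
print(json.dumps(person["education"][0]))  # {"type": "undergrad", "school": "RIT"}
```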

Let's now take a look at an example website that returns JSON.

The National Weather Service provides a RESTful API that returns observations collected from its weather stations as JSON. The code snippet below requests the latest observation from station KBOS (Boston Logan International Airport).

```python
base_url = 'https://api.weather.gov/stations/KBOS/observations/latest'
# .json() parses the JSON body of the response into Python dictionaries and lists
response = requests.get(base_url).json()
response
```

Let's walk through what is going on in this code snippet and see what we can do with it.
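The parsed response is a set of nested dictionaries. Below is a minimal sketch of pulling the wind speed out of it. It assumes the observation sits under a `properties` key and that `windSpeed` is itself an object with `value` and `unitCode` fields; print the response and confirm those keys before relying on this. The helper names `fetch_latest_observation` and `get_wind_speed` are hypothetical, and the `User-Agent` string is a placeholder (the weather.gov API asks clients to identify themselves with one).

```python
import requests

STATION_URL = "https://api.weather.gov/stations/KBOS/observations/latest"

def fetch_latest_observation(url: str = STATION_URL) -> dict:
    """Download the latest observation and parse its JSON body (makes a network call)."""
    # Placeholder User-Agent: replace with your own app name and contact info.
    headers = {"User-Agent": "data-collection-seminar (you@example.com)"}
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()  # raise an error for 4xx/5xx instead of parsing an error page
    return resp.json()

def get_wind_speed(observation: dict):
    """Return (value, unitCode) for the wind speed.

    Assumes observation["properties"]["windSpeed"] looks like
    {"value": ..., "unitCode": ...}; value may be None if nothing was reported.
    """
    wind = observation["properties"]["windSpeed"]
    return wind["value"], wind["unitCode"]

# Usage (requires internet access):
# speed, unit = get_wind_speed(fetch_latest_observation())
# print(speed, unit)
```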


Now try this out yourself. Based on how we got the wind speed, let's get the current temperature and use it to determine which article of clothing is most appropriate. This problem will take more time, so break it into pieces:

1. Make the request.
2. Get the temperature.
3. Use it to determine the correct clothing.
4. Print out that article of clothing.

The table uses degrees Fahrenheit, so check which unit the API reports the temperature in (its `unitCode` field) and convert if needed.

| Temperature Range (°F) | Clothing     |
|------------------------|--------------|
| < 40                   | Winter Coat  |
| 40-50                  | Sweat Shirt  |
| 50-65                  | Light Jacket |
| > 65                   | T-shirt      |
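Once you have tried the exercise, here is one possible sketch of the last two steps. It assumes the API reports the temperature in degrees Celsius and converts it to Fahrenheit to match the table. Because the table's ranges share endpoints, it also assumes the boundaries split as below 40, 40 up to (not including) 50, 50 through 65, and above 65. The function names and the sample reading are hypothetical.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def pick_clothing(temp_f: float) -> str:
    """Choose clothing from the table (temperature in degrees Fahrenheit).

    Assumed boundaries: < 40, 40 to < 50, 50 to 65, > 65.
    """
    if temp_f < 40:
        return "Winter Coat"
    if temp_f < 50:
        return "Sweat Shirt"
    if temp_f <= 65:
        return "Light Jacket"
    return "T-shirt"

# Hypothetical reading shaped like an observation's "temperature" field (value in Celsius).
temperature = {"unitCode": "wmoUnit:degC", "value": 12.0}
temp_f = celsius_to_fahrenheit(temperature["value"])
print(f"{temp_f:.1f} degrees F -> {pick_clothing(temp_f)}")  # 53.6 degrees F -> Light Jacket
```

A real observation can report `None` for the temperature value, so it is worth checking for that before converting.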

There are a ton of APIs out there with all kinds of data and functionality. Many of them are collected in the GitHub repository below.

https://github.com/public-apis/public-apis