## Hypertext Transfer Protocol (HTTP)
---

### Introduction
---

HTTP is a request-response protocol that enables communication between clients (e.g. web browsers, bots, scripts) and servers (e.g. an application running on a computer that hosts a website).

The client **requests**

The server **responds**

This is the foundation of data communication for the World Wide Web.

In [66]:
# In Python, HTTP requests can be made with the third-party "requests" library.
import socket
import requests

### HTTP Methods
---

- `GET`: used to request data from a specified resource; essentially the only method of interest for web scraping. Because any parameters travel in the URL itself, `GET` requests are subject to URL length restrictions.
- `POST` and `PUT`: used to send data to a server to create/update a resource.
- `DELETE`: used to delete the specified resource.

For example, the client requests:

```
GET /index.html HTTP/1.1
Host: georgetown.edu
```

And the server responds with:

```
HTTP/1.1 200 OK
Date: Wed, 23 May 2018 22:38:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 138
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
ETag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Connection: close
```

```html
<html>
    <head>
      <title>An Example Page</title>
    </head>
    <body>
      Hello World, this is a very simple HTML document.
    </body>
</html>
```
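To make the exchange above concrete, here is a minimal sketch, using only string handling, of how such a request could be assembled and how a raw response could be split into its status line, headers, and body. The helper names `build_get_request` and `parse_response` are hypothetical, and the sketch assumes a well-formed message with `\r\n` line endings; a real client such as `requests` handles many more cases.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "", ""]
    # Lines are separated by CRLF; the empty line marks the end of the headers.
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into (status code, reason, headers, body).

    Assumes a well-formed response; this is an illustration, not a full parser.
    """
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    print(repr(build_get_request("georgetown.edu", "/index.html")))
    sample = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=UTF-8\r\n"
        "Connection: close\r\n"
        "\r\n"
        "<html></html>"
    )
    print(parse_response(sample))
```

The status code and reason returned by `parse_response` correspond to what `requests` exposes as `res.status_code` and `res.reason`.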

### HTTP status codes
---

In [74]:
url = 'https://georgetown.edu'
res = requests.get(url)
print('\nThe status code for "%s" is: %d' % (url, res.status_code))
print('The reason is: %s' % res.reason)


The status code for "https://georgetown.edu" is: 200
The reason is: OK


- `2xx`: Successful
    - e.g. `"200: OK"` means that the request succeeded (this is the standard response for successful HTTP requests)
    


In [75]:
url = 'http://getstatuscode.com/301'
res = requests.get(url)
print('\nThe status code for "%s" is: %d' % (url, res.status_code))
print('The reason is: %s' % res.reason)


The status code for "http://getstatuscode.com/301" is: 301
The reason is: Moved Permanently


- `3xx`: Redirection
    - e.g. `"301: Moved Permanently"` means that the requested page has been permanently moved to a new URL
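A redirect normally carries a `Location` header that tells the client where to go next. By default `requests.get` follows such redirects and reports the final response; passing `allow_redirects=False` returns the `3xx` response itself. The sketch below shows the mechanism with the standard library only, so that it runs without network access. The local server, its `/old` and `/new` paths, and the `follow_redirects` helper are all hypothetical and exist only for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old redirects permanently to /new."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def follow_redirects(port, path, max_redirects=5):
    """Follow redirects by hand and return the list of (status, path) hops."""
    hops = []
    for _ in range(max_redirects + 1):
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        res = conn.getresponse()
        res.read()
        location = res.getheader("Location")
        conn.close()
        hops.append((res.status, path))
        if res.status // 100 != 3 or location is None:
            break  # not a redirect, or nowhere to go
        path = location
    return hops


def demo_redirect():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return follow_redirects(port, "/old")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo_redirect())
```

With `requests`, the intermediate responses of a followed redirect are available afterwards in `res.history`.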



In [76]:
url = 'https://georgetown.edu/yanni-or-laurel'
res = requests.get(url)
print('\nThe status code for "%s" is: %d' % (url, res.status_code))
print('The reason is: %s' % res.reason)


The status code for "https://georgetown.edu/yanni-or-laurel" is: 404
The reason is: Not Found


- `4xx`: Client Error
    - e.g. `"400: Bad Request"` means that the request cannot be fulfilled due to bad syntax
    - e.g. `"401: Unauthorized"` means that the request requires authentication, which has either failed or not yet been provided
    - e.g. `"403: Forbidden"` means that the request was a legal request, but the server is refusing to fulfil it
    - e.g. `"404: Not Found"` means that the requested page could not be found but may be available again in the future
    - e.g. `"407: Proxy Authentication Required"` means that the client must first authenticate itself with the proxy
    - e.g. `"408: Request Timeout"` means that the server timed out waiting for the request
    - e.g. `"410: Gone"` means that the requested page is no longer available
    - e.g. `"414: Request-URI Too Long"` means that the server will not accept the request because the URL is too long. This can occur when a `POST` request is converted to a `GET` request with a long query string
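The `414` case follows from how `GET` parameters are sent: they are encoded into the URL itself. The sketch below builds such a URL with the standard library and checks its length against a limit. The `/search` path, the helper names, and the 2000-character limit are assumptions for illustration only; actual limits vary between servers and browsers.

```python
from urllib.parse import urlencode

# Hypothetical limit for illustration only; real limits depend on the server.
ASSUMED_MAX_URL_LENGTH = 2000


def build_get_url(base_url, params):
    """Append params to base_url as a percent-encoded query string."""
    return f"{base_url}?{urlencode(params)}"


def fits_in_get(base_url, params, limit=ASSUMED_MAX_URL_LENGTH):
    """Return True if the resulting URL stays within the assumed limit."""
    return len(build_get_url(base_url, params)) <= limit


if __name__ == "__main__":
    base = "https://georgetown.edu/search"  # hypothetical endpoint
    print(build_get_url(base, {"q": "http status"}))
    print(fits_in_get(base, {"q": "x" * 5000}))
```

When the data does not fit, sending it in the body of a `POST` request avoids the URL length restriction.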

In [77]:
url = 'https://httpstat.us/502'
res = requests.get(url)
print('\nThe status code for "%s" is: %d' % (url, res.status_code))
print('The reason is: %s' % res.reason)


The status code for "https://httpstat.us/502" is: 502
The reason is: Bad Gateway


- `5xx`: Server Error  
    - e.g. `"500: Internal Server Error"` is a generic error message, given when no more specific message is suitable
    - e.g. `"502: Bad Gateway"` means that the server was acting as a gateway or proxy and received an invalid response from the upstream server
    - e.g. `"503: Service Unavailable"` means that the server is currently unavailable (overloaded or down)
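Since the first digit of a status code determines its class, the categories above can be derived programmatically. The following is a small sketch using the standard library's `http.HTTPStatus`; the `describe_status` helper and the `STATUS_CLASSES` table are hypothetical names, and the table covers only the classes listed in this section.

```python
from http import HTTPStatus

# Classes as listed in this section, keyed by the first digit of the code.
STATUS_CLASSES = {
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code):
    """Return a string such as '404 Not Found (Client Error)'."""
    category = STATUS_CLASSES.get(code // 100, "Other")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code not defined in http.HTTPStatus
    return f"{code} {phrase} ({category})"


if __name__ == "__main__":
    for code in (200, 301, 404, 502):
        print(describe_status(code))
```

The same check can be applied to `res.status_code` from a `requests` response, for example to retry only on `5xx` codes.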