# Perfect Request

[requests](https://requests.readthedocs.io/en/latest/) is the *de facto* standard library for making HTTP requests in Python.

In [1]:
import requests

Making a basic HTTP request doesn't require much code.

In [2]:
url = 'https://httpbin.org/get'

In [3]:
response = requests.get(url)

In [4]:
response.text

'{\n  "args": {}, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.27.1", \n    "X-Amzn-Trace-Id": "Root=1-637dfb31-4a47338b0c955df259a9beae"\n  }, \n  "origin": "85.128.82.242", \n  "url": "https://httpbin.org/get"\n}\n'

There are many more options that are useful for more advanced use cases.

### Passing Parameters

One way to pass parameters is to modify the URL directly.

In [5]:
response = requests.get(url + '?key1=val1&key2=val2')

In [6]:
response.url

'https://httpbin.org/get?key1=val1&key2=val2'

Usually, though, it is more convenient to pass the parameters as a dictionary and let the library build the query string.

In [7]:
response = requests.get(url, params={'key1': 'val1', 'key2': 'val2'})

In [8]:
response.url

'https://httpbin.org/get?key1=val1&key2=val2'
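Letting `requests` build the query string also takes care of URL-encoding, which naive string concatenation does not (an unescaped `&` inside a value would split it into a separate parameter). The sketch below prepares a request without sending it, so no network access is needed, and shows how a value containing a space and `&`, a list value, and a `None` value are encoded. The parameter names are made up for illustration.

```python
import requests

url = 'https://httpbin.org/get'

# Build the request locally; .prepare() encodes the URL but sends nothing
prepared = requests.Request(
    'GET',
    url,
    params={
        'q': 'fish & chips',  # space and '&' are escaped
        'tags': ['a', 'b'],   # a list becomes a repeated key
        'page': None,         # keys with a None value are dropped
    },
).prepare()

print(prepared.url)  # https://httpbin.org/get?q=fish+%26+chips&tags=a&tags=b
```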

### Timeouts

By default, `requests` applies no timeout, so a call to an unresponsive server can hang indefinitely. Set a timeout in all production code.

In [9]:
response = requests.get(url, timeout=0.001)

ConnectTimeout: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /get (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x1037112b0>, 'Connection to httpbin.org timed out. (connect timeout=0.001)'))

There are 2 timeouts we can set:  
  - **connect timeout** - the number of seconds the client will wait to establish a connection to the remote server  
  - **read timeout** - the number of seconds the client will wait between bytes sent from the server (in practice, how long it waits for the server to send a response)
  
When a single value is passed as the `timeout` argument, it is used for both timeouts; a `(connect, read)` tuple sets them separately.

Timeouts are application-specific, so it's hard to make general recommendations.
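For the connect timeout, it's good practice to use a value slightly larger than a multiple of 3 (for example 3.05), which is the default [TCP packet retransmission window](https://www.hjp.at/doc/rfc/rfc2988.txt). Note that 3 seconds is the initial retransmission timeout from RFC 2988; the later RFC 6298 lowered it to 1 second.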

In [10]:
response = requests.get(url, timeout=(3.05, 10))

Check [requests documentation](https://requests.readthedocs.io/en/latest/user/advanced/#timeouts) for more information about timeouts.
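When a timeout expires, `requests` raises `requests.exceptions.ConnectTimeout` or `requests.exceptions.ReadTimeout`; both are subclasses of `requests.exceptions.Timeout`, so a single `except` clause can catch either. A minimal sketch that handles them separately; the helper name `get_or_none` and the choice to return `None` on a timeout are illustrative assumptions, not part of the library.

```python
from typing import Optional, Tuple, Union

import requests

# a single number, or a (connect, read) tuple, in seconds
TimeoutValue = Union[float, Tuple[float, float]]


def get_or_none(url: str, timeout: TimeoutValue = (3.05, 10)) -> Optional[requests.Response]:
    """Hypothetical helper: GET `url`, returning None instead of raising on a timeout."""
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.ConnectTimeout:
        # no connection was established within the connect timeout
        print(f'connect timeout: {url}')
    except requests.exceptions.ReadTimeout:
        # connected, but the server sent nothing for longer than the read timeout
        print(f'read timeout: {url}')
    return None
```

For example, `get_or_none('https://httpbin.org/delay/5', timeout=(3.05, 1))` should take the `ReadTimeout` branch, since httpbin's `/delay/5` endpoint is meant to wait about five seconds before responding.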

### Proxies

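Proxies can be specified for each individual request through the `proxies` argument, a dictionary mapping each URL scheme to the proxy that should handle it.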

```python
# placeholder proxy addresses, for illustration only

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}

requests.get(url, proxies=proxies)
```
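The same mapping can also be stored on a session, so that requests made through it use these proxies by default. A sketch with the same placeholder addresses; note that `requests` also reads the standard `HTTP_PROXY` / `HTTPS_PROXY` environment variables by default, and a `proxies` argument passed to an individual call takes precedence over the session's mapping.

```python
import requests

# placeholder proxy addresses, for illustration only
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

session = requests.Session()
# requests made through this session use these proxies by default
session.proxies.update(proxies)

# session.get('https://httpbin.org/get')  # would be routed via the 'https' proxy
```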

### Change User-Agent

Some websites serve different content to a browser than to a script. This is sometimes decided by the `User-Agent` header, which we can change if needed.

In [11]:
# default headers
response = requests.get(url)

In [12]:
response.request.headers

{'User-Agent': 'python-requests/2.27.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

In [13]:
# custom headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}
 
response = requests.get(url, headers=headers)

In [14]:
response.request.headers

{'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
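To send the custom `User-Agent` with every request, it can be set once in a session's default headers. The sketch below inspects the merged headers with `Session.prepare_request`, which builds the request without sending it; `requests.utils.default_user_agent()` returns the `python-requests/<version>` string seen in the default headers above.

```python
import requests

browser_ua = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
)

# 'python-requests/' followed by the installed version
print(requests.utils.default_user_agent())

session = requests.Session()
# replaces only the User-Agent; the other default headers are kept
session.headers.update({'User-Agent': browser_ua})

# Merge session defaults into a request without sending it
prepared = session.prepare_request(requests.Request('GET', 'https://httpbin.org/get'))
print(prepared.headers['User-Agent'])
```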

### Retries

Networks are unreliable, so requests often need to be retried. We could write the retry logic ourselves, but `requests` also supports it through urllib3's `Retry`. This is a bit more involved and requires a session.
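For comparison, a minimal sketch of hand-rolled retry logic with exponential backoff. The helper name `call_with_retries`, its parameters, and the delay schedule (`backoff_factor * 2 ** attempt`) are illustrative choices; this is not the exact schedule used by urllib3's `Retry`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    backoff_factor: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `func`, retrying on `retry_on` with exponential backoff."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            # with the default factor: wait 0.3 s, then 0.6 s, 1.2 s, ...
            sleep(backoff_factor * 2 ** attempt)
    raise AssertionError('unreachable')
```

With `requests`, this could be used as `call_with_retries(lambda: requests.get(url, timeout=(3.05, 10)), retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout))`. The `sleep` parameter exists only so the delays can be observed or skipped in tests.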

In [15]:
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

In [16]:
session = requests.Session()

In [17]:
# uncomment to get more information about Retry object
# Retry?

In [18]:
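retry = Retry(
    total=3,             # overall cap on the number of retries
    read=3,              # retries on read errors (a count, not a timeout)
    connect=3,           # retries on connection errors (a count, not a timeout)
    backoff_factor=0.3,  # scales the exponential sleep between retries
    status_forcelist=(500, 502, 504),  # also retry on these status codes
)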

The `Retry` object specifies the logic for retrying requests.
To use it, we pass it to an `HTTPAdapter` and mount that adapter on a session for the URL prefixes it should handle.

In [19]:
adapter = HTTPAdapter(max_retries=retry)

session.mount("http://", adapter)
session.mount("https://", adapter)

In [20]:
response = session.get(url)
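One detail worth knowing: by default, `Retry` only retries idempotent HTTP methods, so a failing `POST` is not retried even when its status code is in `status_forcelist`. The sketch below checks this with `Retry.is_retry` and `Retry.DEFAULT_ALLOWED_METHODS` (urllib3 1.26 and later; older releases called the parameter `method_whitelist`) and shows `allowed_methods=None` to retry on any method. Retrying non-idempotent requests can repeat their side effects, so enable it deliberately.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(total=3, backoff_factor=0.3, status_forcelist=(500, 502, 504))

# GET is in the default allowed methods, so a listed status is retried...
print(retry.is_retry('GET', 502))   # True
# ...but POST is not
print(retry.is_retry('POST', 502))  # False
print(sorted(Retry.DEFAULT_ALLOWED_METHODS))

# allowed_methods=None retries on any method, including POST
retry_any = Retry(
    total=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    allowed_methods=None,
)
print(retry_any.is_retry('POST', 502))  # True

session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retry_any))

# The adapter that will handle a URL can be inspected without sending anything
print(session.get_adapter('https://httpbin.org/get').max_retries.total)  # 3
```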

## Sessions

There are many useful options beyond basic requests.
Specifying all of them separately for every request would be tedious and error-prone, so instead we can configure them once as part of a `Session`.

Sessions are especially useful for configuring timeouts and retries, but they also persist other settings across requests, such as authentication, headers, and cookies, and they reuse the underlying connections.
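As a small illustration, settings assigned to a session are merged into every request it prepares. The credentials and the `X-Example` header below are placeholders; `Session.prepare_request` is used only to inspect the result without sending anything. Using the session as a context manager closes its pooled connections when the block exits.

```python
import requests

with requests.Session() as session:
    # placeholder credentials: a (user, password) tuple means HTTP Basic auth
    session.auth = ('user', 'pass')
    # hypothetical custom header sent with every request from this session
    session.headers.update({'X-Example': 'demo'})

    prepared = session.prepare_request(
        requests.Request('GET', 'https://httpbin.org/get')
    )

print(prepared.headers['Authorization'])  # Basic dXNlcjpwYXNz
print(prepared.headers['X-Example'])      # demo
```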

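### Timeout Adapter

`requests.Session` has no setting for a default timeout, so here we subclass `HTTPAdapter` to fill in `DEFAULT_TIMEOUT` whenever a request is sent without an explicit `timeout`.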

In [21]:
DEFAULT_TIMEOUT = (3.05, 10)

In [22]:
class TimeoutHTTPAdapter(HTTPAdapter):
    def __init__(self, *args, **kwargs):
        # take our own `timeout` kwarg out before HTTPAdapter sees it
        self.timeout = kwargs.pop("timeout", DEFAULT_TIMEOUT)
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # fall back to the adapter's default when no timeout was given
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)
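The adapter above is one way to get a default timeout. An alternative sketch, not the approach used in this notebook, is to subclass `Session` and fill in `timeout` in `request()`, which every verb method such as `get()` goes through. The class name `TimeoutSession` is hypothetical. As with the adapter, an explicit `timeout` passed to a single call still wins.

```python
from typing import Any, Tuple, Union

import requests

DEFAULT_TIMEOUT = (3.05, 10)  # (connect, read) in seconds, as above


class TimeoutSession(requests.Session):
    """Hypothetical Session subclass that applies a default timeout."""

    def __init__(self, timeout: Union[float, Tuple[float, float]] = DEFAULT_TIMEOUT) -> None:
        super().__init__()
        self.timeout = timeout

    def request(self, method: str, url: str, **kwargs: Any) -> requests.Response:
        # only fill in the default when the caller did not pass a timeout
        if kwargs.get('timeout') is None:
            kwargs['timeout'] = self.timeout
        return super().request(method, url, **kwargs)
```

Usage is the same as a plain session: `TimeoutSession().get(url)` uses `(3.05, 10)`, while `TimeoutSession().get(url, timeout=30)` uses 30 seconds.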

### Perfect Session

Putting all the options together.

In [23]:
def my_session():
    retry = Retry(
        total=3,
        read=3,     # retry counts, not timeouts
        connect=3,
        backoff_factor=0.3,
        status_forcelist=(500, 502, 504),
    )

    session = requests.Session()
    # DEFAULT_TIMEOUT is (3.05, 10): connect and read timeouts in seconds
    adapter = TimeoutHTTPAdapter(timeout=DEFAULT_TIMEOUT, max_retries=retry)

    session.mount("http://", adapter)
    session.mount("https://", adapter)

    return session

In [24]:
perfect_session = my_session()

In [25]:
response = perfect_session.get(url)