# Web 4: More Flask (review)

- The `@` syntax placed above a function definition (e.g., `@app.route("/")` in Flask) is called a "decorator"
- `flask.Response`: enables us to create a response object instance
    - Arguments: the response body as a `str`, a `headers` dict of metadata, and `status`, the HTTP status code.
    - Examples:
    ```python
    # 429 Too Many Requests: ask the client to wait 3 seconds before retrying
    flask.Response("<b>go away</b>",
                   status=429,
                   headers={"Retry-After": "3"})
    ```

    ```python
    # serve robots.txt as plain text, disallowing /never for every crawler
    flask.Response("""User-Agent: *
    Disallow: /never
    """, headers={"Content-Type": "text/plain"})
    ```

- `flask.request.remote_addr`: the IP address of the client that sent the request, so we can take action based on who is asking
- `flask.request.args`: the query-string arguments passed at the end of the URL, as a dict-like object
    - How do we pass arguments?
        - at the end of the URL, add a "?"
        - write each argument-value pair as `name=value`
        - use "&" as the delimiter between argument-value pairs
    - examples: 
        - http://34.69.204.31:5000/add?x=10&y=20
        - http://34.69.204.31:5000/survey?major=CS
        - http://34.69.204.31:5000/survey?major=Mechanical%20Engineering (the space in "Mechanical Engineering" must be URL-encoded, as `%20` or `+`; browsers typically do this automatically)
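To see how the decorator and `request.args` ideas fit together without running a server, here is a toy, stdlib-only sketch. It is not how Flask is implemented: `route`, `ROUTES`, and `dispatch` are hypothetical names, and `urllib.parse` stands in for the query-string parsing Flask does for us. Writing `@route("/add")` above `def add` is shorthand for `add = route("/add")(add)`.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical route table: maps a URL path to the handler function for it
ROUTES = {}

def route(path):
    """Toy stand-in for Flask's @app.route: registers the decorated function."""
    def decorator(func):
        ROUTES[path] = func
        return func  # hand the original function back unchanged
    return decorator

@route("/add")
def add(args):
    # args plays the role of flask.request.args: every value arrives as a str
    return str(int(args.get("x", 0)) + int(args.get("y", 0)))

@route("/survey")
def survey(args):
    return "major: " + args.get("major", "unknown")

def dispatch(url):
    """Split a URL into path and query string, then call the matching handler."""
    parts = urlsplit(url)
    # "x=10&y=20" -> {"x": "10", "y": "20"}; "%20" is decoded back to a space
    args = dict(parse_qsl(parts.query))
    return ROUTES[parts.path](args)

print(dispatch("http://34.69.204.31:5000/add?x=10&y=20"))  # 30
print(dispatch("http://34.69.204.31:5000/survey?major=Mechanical%20Engineering"))  # major: Mechanical Engineering
```

As with `flask.request.args`, the values are strings, which is why `add` converts `x` and `y` with `int` before adding them.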

In [1]:
```python
import requests
import time
import urllib.robotparser

import pandas as pd
# new import statement: requires pip3 install scipy
from scipy import stats
```

### Rate-limited webpage parsing

- `requests` module:
    - `resp = requests.get(<URL>)`: sends an HTTP GET request and returns a response object
    - `resp.status_code`: status code of the response
    - `resp.text`: `str` text content of the response
    - `resp.headers`: dict-like (case-insensitive) mapping of the response headers
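`requests` can also build the query string from a dict via the `params` keyword, so we do not have to assemble `?`, `=`, and `&` by hand. A minimal sketch, reusing the course server address only as an example: it prepares `requests.Request` objects instead of sending them, so the resulting URLs can be inspected without contacting the server. `requests.get(url, params=...)` encodes `params` the same way.

```python
import requests

base_url = "http://34.69.204.31:5000/"

# Build (but do not send) GET requests; only the URL is inspected here
add_req = requests.Request("GET", base_url + "add",
                           params={"x": 10, "y": 20}).prepare()
survey_req = requests.Request("GET", base_url + "survey",
                              params={"major": "Mechanical Engineering"}).prepare()

print(add_req.url)     # http://34.69.204.31:5000/add?x=10&y=20
print(survey_req.url)  # http://34.69.204.31:5000/survey?major=Mechanical+Engineering
```

Note that the space is encoded as `+` here, which servers decode back to a space just like `%20`.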

In [2]:
```python
base_url = "http://34.69.204.31:5000/"
```

In [3]:
```python
def friendly_get(url):
    while True:
        resp = requests.get(url)
        if resp.status_code == 429:  # 429 Too Many Requests
            # wait as long as the server asks (default: 1 second), then retry
            seconds = int(resp.headers.get("Retry-After", 1))
            print(f"sleep {seconds}")
            time.sleep(seconds)
            continue
        resp.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
        return resp

friendly_get(base_url + "slow").text
```

'welcome!'
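The `while True` loop above retries forever if the server keeps answering 429, and it can only be tried against the live server. Below is a minimal sketch of the same retry logic with two assumptions made explicit: the HTTP call is passed in as a parameter (hypothetical `get` and `sleep` parameters, so a fake server can stand in for the real one), and a hypothetical `max_retries` cap stops the loop. `FakeResponse` and `make_fake_get` are hypothetical helpers that only imitate the parts of `requests.Response` the function uses. Like the original, it assumes `Retry-After` holds a number of seconds; the HTTP spec also allows a date there, which `int()` would reject.

```python
import time

def friendly_get(url, get, sleep=time.sleep, max_retries=5):
    """Fetch url, waiting and retrying whenever the server answers 429.

    `get` is any callable that takes a URL and returns a response-like object
    with .status_code, .headers, and .raise_for_status() (e.g. requests.get).
    """
    for _ in range(max_retries):
        resp = get(url)
        if resp.status_code != 429:
            resp.raise_for_status()  # raise for any other error status
            return resp
        # Assumes Retry-After holds a number of seconds (it may also be a date)
        sleep(int(resp.headers.get("Retry-After", 1)))
    raise RuntimeError(f"still rate-limited after {max_retries} tries: {url}")


class FakeResponse:
    """Hypothetical stand-in for requests.Response, for offline testing."""
    def __init__(self, status_code, text="", headers=None):
        self.status_code = status_code
        self.text = text
        self.headers = headers or {}

    def raise_for_status(self):
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")


def make_fake_get(n_limited):
    """Hypothetical server: answers 429 n_limited times, then 200."""
    calls = []
    def fake_get(url):
        calls.append(url)
        if len(calls) <= n_limited:
            return FakeResponse(429, headers={"Retry-After": "3"})
        return FakeResponse(200, text="welcome!")
    return fake_get, calls


slept = []  # record the requested waits instead of actually sleeping
fake_get, calls = make_fake_get(2)
resp = friendly_get("http://34.69.204.31:5000/slow", fake_get, sleep=slept.append)
print(resp.text, slept, len(calls))  # welcome! [3, 3] 3
```

Passing `requests.get` as `get` gives back the original behavior, apart from the retry cap.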

### `urllib.robotparser`

- Documentation: https://docs.python.org/3/library/urllib.robotparser.html

In [4]:
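```python
rp = urllib.robotparser.RobotFileParser()
rp.set_url(base_url + "robots.txt")  # base_url already ends with "/"
rp.read()  # download and parse robots.txt
# Incorrect version: base_url + "/slow" yields ".../5000//slow" (extra /)
rp.can_fetch("cs320bot", base_url + "/slow")
```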

True

In [5]:
```python
# Incorrect version: the extra / makes the path "//never",
# so a "Disallow: /never" rule no longer matches
rp.can_fetch("cs320bot", base_url + "/never")
```

True

In [6]:
```python
# Correct versions: no extra / because base_url already ends with "/"
print(rp.can_fetch("cs320bot", base_url + "slow"))
print(rp.can_fetch("cs320bot", base_url + "never"))
```

True
False
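`RobotFileParser.parse()` accepts the lines of a robots.txt directly, so the slash issue can be reproduced offline. This sketch assumes the server's robots.txt matches the `flask.Response` example at the top of these notes (`User-Agent: *` followed by `Disallow: /never`); the real file may contain more rules.

```python
import urllib.robotparser

# Assumed robots.txt contents (same text as the flask.Response example)
robots_txt = """User-Agent: *
Disallow: /never
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())  # parse the lines directly: no network request

base_url = "http://34.69.204.31:5000/"  # already ends with "/"
for path in ["slow", "never", "/never"]:
    url = base_url + path
    # "/never" produces ".../5000//never"; the path "//never" does not start
    # with "/never", so the Disallow rule does not apply to it
    print(url, rp.can_fetch("cs320bot", url))
```

The three lines should report `True`, `False`, and `True`, matching the notebook: only the correctly built `never` URL is blocked.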


# Web 5: A/B testing

In [7]:
```python
df = pd.DataFrame({
    "click":    {"A": 50, "B": 55},
    "no-click": {"A": 50, "B": 45}
})
df
# Which has the higher click-through rate (CTR): A or B?
```

|   | click | no-click |
|---|-------|----------|
| A | 50    | 50       |
| B | 55    | 45       |
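To answer the CTR question directly, divide clicks by impressions, where every impression ends in either a click or a no-click. A minimal sketch using the same table as above:

```python
import pandas as pd

df = pd.DataFrame({
    "click":    {"A": 50, "B": 55},
    "no-click": {"A": 50, "B": 45}
})
# CTR = clicks / impressions, where impressions = click + no-click
ctr = df["click"] / (df["click"] + df["no-click"])
print(ctr)
print("higher CTR:", ctr.idxmax())
```

In this sample B's CTR (0.55) is higher than A's (0.50); whether that gap is larger than chance alone would produce is what the Fisher exact test in the next cell checks.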


In [8]:
```python
_, pvalue = stats.fisher_exact(df)
pvalue
# pvalue is not below 0.05 (5%), so there is no evidence that A and B differ
```

0.5712421394829712
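What does `stats.fisher_exact` compute? Below is a stdlib-only sketch of the two-sided test, assuming the standard definition: hold all row and column totals fixed, so the top-left count follows a hypergeometric distribution, then add up the probabilities of every table that is no more likely than the observed one. `fisher_exact_two_sided` is a hypothetical helper, not SciPy's code; SciPy works in floating point, so the last few digits may differ from the values in these cells.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(k):
        # number of ways to get k in the top-left cell with the totals fixed;
        # dividing by comb(n, col1) turns this count into a probability
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # compare exact integers so equally likely tables are never missed
    total = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return total / comb(n, col1)


# Rows are A and B; columns are click and no-click, as in the DataFrames here
for table in [(50, 50, 55, 45), (50, 50, 75, 25), (500, 500, 550, 450)]:
    print(table, fisher_exact_two_sided(*table))
```

The three tables are the ones passed to `stats.fisher_exact` in this section, so the printed p-values should agree with the notebook outputs up to floating-point rounding.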

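### Two situations in which the p-value falls below the significance threshold

1. Sample size is the same, but the skew is very heavy: a difference that large is unlikely to occur by chance
2. Sample size is large, even though the skew is small: with more data, even a small difference becomes unlikely to occur by chance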

In [9]:
```python
# Scenario 1:
# Sample size is the same, but the skew is very heavy,
# which is unlikely to happen by chance

df = pd.DataFrame({
    "click":    {"A": 50, "B": 75},
    "no-click": {"A": 50, "B": 25}
})
_, pvalue = stats.fisher_exact(df)
pvalue
```

0.00042033045869994034

In [10]:
```python
# Scenario 2:
# Sample size is large, but the skew is small

df = pd.DataFrame({
    "click":    {"A": 500, "B": 550},
    "no-click": {"A": 500, "B": 450}
})
_, pvalue = stats.fisher_exact(df)
pvalue
```

0.02820356890423392