# Python and the Web

- https://code.tutsplus.com/es/tutorials/using-the-requests-module-in-python--cms-28204
- https://realpython.com/python-requests/

## Get HTML page content
This section shows how to fetch an HTTP response with two different libraries:
* <a href="https://docs.python.org/3.4/library/urllib.html?highlight=urllib">urllib</a> (part of the Python 3 standard library)
* <a href="http://docs.python-requests.org/en/master/">Requests</a> (installable through pip)

This tutorial mainly uses the Requests library, which is the preferred option.



### urllib library
The following example fetches the static content of a web page with urllib:

In [76]:
from urllib.request import urlopen

r = urlopen('http://www.python.org/')
data = r.read()

print("Status code:", r.getcode())

Status code: 200


The variable `data` contains the returned HTML (the full page) as a `bytes` object; call `data.decode()` with the page's encoding to get a string. You can process it, save it, or do anything else you need.
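Since `read()` returns bytes, a decoding step is usually needed before working with the HTML as text. The sketch below is one possible way to do it with the standard library only: the helper names `charset_from_content_type` and `decode_body` are hypothetical, and the sample bytes and header value are stand-ins for `r.read()` and `r.headers['Content-Type']`.

```python
from email.message import Message


def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset named in a Content-Type header value, or `default`."""
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset(default)


def decode_body(data, content_type):
    """Decode raw response bytes using the charset from the header."""
    charset = charset_from_content_type(content_type)
    # errors='replace' keeps going if a few bytes are invalid in that charset
    return data.decode(charset, errors='replace')


# Stand-in values (assumptions); with urllib they would come from
# r.read() and r.headers['Content-Type'].
raw = 'café'.encode('utf-8')
text = decode_body(raw, 'text/html; charset=UTF-8')
print(text)
```

Falling back to UTF-8 when the header names no charset is an assumption that suits most modern pages, not a rule.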

In [None]:
#!pip install requests

In [1]:
import requests

## Display HTML

In [2]:
from IPython.display import display, HTML
display(HTML('<h1>Hello, world!</h1>'))

## Get and display a web page

In [3]:
response = requests.get('http://www.google.com')
display(HTML(response.text))  # render the downloaded HTML in the notebook

## Explore response

In [74]:
req = requests.get('http://www.ironhack.com/')
 
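print(req.encoding)      # character encoding used to decode req.text
print(req.status_code)   # HTTP status code of the final response
print(req.elapsed)       # time from sending the request until the response headers were parsed
print(req.url)           # final URL after any redirects
print(req.history)       # redirect responses that led to the final one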

utf-8
200
0:00:00.095300
https://www.ironhack.com/en
[<Response [301]>, <Response [301]>]


## Headers

In [20]:
req.headers

{'Date': 'Sun, 19 Jul 2020 10:37:59 GMT', 'Server': 'mw1326.eqiad.wmnet', 'X-Content-Type-Options': 'nosniff', 'P3p': 'CP="See https://en.wikipedia.org/wiki/Special:CentralAutoLogin/P3P for more info."', 'Content-Language': 'en', 'Vary': 'Accept-Encoding,Cookie,Authorization', 'X-Request-Id': '7f4f2bea-c075-44a3-88e1-85a0974bd85c', 'Last-Modified': 'Fri, 17 Jul 2020 10:11:44 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Encoding': 'gzip', 'Age': '41564', 'X-Cache': 'cp3050 miss, cp3062 hit/11', 'X-Cache-Status': 'hit-front', 'Server-Timing': 'cache;desc="hit-front"', 'Strict-Transport-Security': 'max-age=106384710; includeSubDomains; preload', 'X-Client-IP': '79.155.46.113', 'Cache-Control': 'private, s-maxage=0, max-age=0, must-revalidate', 'Accept-Ranges': 'bytes', 'Content-Length': '75815', 'Connection': 'keep-alive'}


In [None]:
print(req.headers['Content-Type'])
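Header lookups in Requests are case-insensitive because `req.headers` is a `CaseInsensitiveDict`. The sketch below shows that behaviour on a hand-built dictionary, so it runs without a network call; the two header values are copied from the output above, and `X-Missing` is a made-up header name.

```python
from requests.structures import CaseInsensitiveDict

# Hand-built stand-in for req.headers (illustration only)
headers = CaseInsensitiveDict({
    'Content-Type': 'text/html; charset=UTF-8',
    'Content-Length': '75815',
})

# Any capitalisation of the header name finds the same entry
print(headers['content-type'])
print(headers.get('CONTENT-LENGTH'))

# .get() with a default avoids a KeyError for headers the server did not send
print(headers.get('X-Missing', 'n/a'))

# Header values are strings, so convert before doing arithmetic
size_kb = int(headers['content-length']) / 1024
print(round(size_kb, 1), 'KiB')
```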

## Errors

In [None]:
import requests
from requests.exceptions import HTTPError

for url in ['https://api.github.com', 'https://api.github.com/invalid']:
    try:
        response = requests.get(url)

        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')  # f-strings require Python 3.6+
    except Exception as err:
        print(f'Other error occurred: {err}')
    else:
        print('Success!')
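`raise_for_status()` raises `HTTPError` only for 4xx and 5xx status codes; 2xx and 3xx responses pass through silently. To see this without contacting a server, the sketch below builds `Response` objects by hand. That is an illustration trick rather than normal usage, and the helper names `fake_response` and `outcome` are hypothetical.

```python
import requests
from requests.exceptions import HTTPError


def fake_response(status_code, reason, url='https://api.github.com/invalid'):
    """Build a Response by hand (illustration only, no network involved)."""
    resp = requests.Response()
    resp.status_code = status_code
    resp.reason = reason
    resp.url = url
    return resp


def outcome(resp):
    """Return 'ok', or the message of the HTTPError that was raised."""
    try:
        resp.raise_for_status()
    except HTTPError as err:
        return str(err)
    return 'ok'


for code, reason in [(200, 'OK'), (301, 'Moved Permanently'),
                     (404, 'Not Found'), (500, 'Internal Server Error')]:
    print(code, '->', outcome(fake_response(code, reason)))
```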


## Chunks

In [6]:
import requests
req = requests.get('https://cms-assets.tutsplus.com/uploads/users/1251/posts/28204/image/Forest_Background_Optimized.jpg', stream=True)
req.raise_for_status()
with open('Forest.jpg', 'wb') as fd:
    for chunk in req.iter_content(chunk_size=50000):
        print('Received a Chunk')
        fd.write(chunk)

Received a Chunk
Received a Chunk


![forest](Forest.jpg)
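The chunked-download loop can be wrapped in a small helper that accepts any iterable of byte chunks, such as `req.iter_content(chunk_size=50000)`. The sketch below is one possible shape: `save_chunks` is a hypothetical name, and a short list of byte strings stands in for the network stream so the example runs offline.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to `path`; return the bytes written."""
    total = 0
    with open(path, 'wb') as fd:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fd.write(chunk)
                total += len(chunk)
    return total


# Stand-in for req.iter_content(chunk_size=...): three small chunks
fake_chunks = [b'abc', b'', b'defgh']
target = os.path.join(tempfile.mkdtemp(), 'download.bin')
written = save_chunks(fake_chunks, target)
print(written, 'bytes written')
```

Only one chunk is held in memory at a time, which is why `stream=True` suits large files.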

## URL params

In [7]:
import requests
 
query = {'q': 'Forest', 'order': 'popular', 'min_width': '800', 'min_height': '600'}
req = requests.get('https://pixabay.com/en/photos/', params=query)
 
req.url
# returns 'https://pixabay.com/en/photos/?q=Forest&order=popular&min_width=800&min_height=600'

'https://pixabay.com/en/photos/?q=Forest&order=popular&min_width=800&min_height=600'
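To check how a `params` dictionary is encoded without sending anything, a request can be prepared instead of dispatched. The sketch below uses `requests.Request(...).prepare()` for that; `build_url` is a hypothetical helper, and the `https://example.com/search` URL is a placeholder that is never contacted.

```python
import requests


def build_url(base, params):
    """Return the URL Requests would use, without sending the request."""
    return requests.Request('GET', base, params=params).prepare().url


query = {'q': 'Forest', 'order': 'popular', 'min_width': '800', 'min_height': '600'}
photos_url = build_url('https://pixabay.com/en/photos/', query)
print(photos_url)

# A list value repeats the key once per item
tags_url = build_url('https://example.com/search', {'tag': ['python', 'web']})
print(tags_url)

# Characters such as spaces are URL-encoded automatically
spaced_url = build_url('https://example.com/search', {'q': 'forest path'})
print(spaced_url)
```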

## POST

In [51]:
req = requests.post('https://en.wikipedia.org/w/index.php', data={'search': 'Data analyst'})
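The `data=` argument sends the dictionary as a URL-encoded form, while `json=` serialises it as a JSON body. The sketch below compares the two by preparing the requests without sending them; it points at the `https://httpbin.org/post` test endpoint, which is not contacted here.

```python
import json

import requests

payload = {'search': 'Data analyst'}
url = 'https://httpbin.org/post'  # never contacted: the requests are only prepared

form = requests.Request('POST', url, data=payload).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()

# Form encoding: key=value pairs, as an HTML form would submit them
print(form.headers['Content-Type'])
print(form.body)

# JSON encoding: the body is a JSON document
print(as_json.headers['Content-Type'])
print(json.loads(as_json.body))
```

Which one to use depends on what the server expects; HTML search forms generally take form-encoded data.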

## Search for related topics (links to other wiki pages)

In [67]:
# Regular expressions?
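One possible way to pull links to other wiki pages out of the returned HTML is a regular expression over `href` attributes. The sketch below is a minimal version under stated assumptions: internal article links are assumed to look like `href="/wiki/Page_name"`, the sample markup is made up, and `wiki_links` is a hypothetical helper. A real HTML parser would be more robust than a regex on arbitrary pages.

```python
import re

# Matches href="/wiki/Page_name"; ':' and '#' are excluded so namespaced
# pages (File:, Special:, ...) and section anchors are skipped.
WIKI_LINK = re.compile(r'href="(/wiki/[^":#]+)"')


def wiki_links(html):
    """Return the unique internal article links found in `html`, sorted."""
    return sorted(set(WIKI_LINK.findall(html)))


# Made-up sample markup; in the notebook this would be req.text
sample_html = '''
<a href="/wiki/Data_analysis">Data analysis</a>
<a href="/wiki/Statistics">Statistics</a>
<a href="/wiki/File:Chart.png">image</a>
<a href="https://example.org/">external</a>
<a href="/wiki/Statistics">Statistics again</a>
'''

links = wiki_links(sample_html)
print(links)
```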

## Cookies

In [27]:
import requests
 
jar = requests.cookies.RequestsCookieJar()
jar.set('first_cookie', 'first', domain='httpbin.org', path='/cookies')
jar.set('second_cookie', 'second', domain='httpbin.org', path='/extra')
jar.set('third_cookie', 'third', domain='httpbin.org', path='/cookies')
 
url = 'http://httpbin.org/cookies'
req = requests.get(url, cookies=jar)
 
req.text

'{\n  "cookies": {\n    "first_cookie": "first", \n    "third_cookie": "third"\n  }\n}\n'
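Only `first_cookie` and `third_cookie` come back because a cookie is sent only when its `path` matches the request path. The sketch below checks that rule offline by preparing requests and reading the `Cookie` header that Requests would send; `cookies_sent` is a hypothetical helper name.

```python
import requests


def cookies_sent(jar, url):
    """Return the Cookie header Requests would build for `url` (no network)."""
    prepared = requests.Request('GET', url, cookies=jar).prepare()
    return prepared.headers.get('Cookie', '')


jar = requests.cookies.RequestsCookieJar()
jar.set('first_cookie', 'first', domain='httpbin.org', path='/cookies')
jar.set('second_cookie', 'second', domain='httpbin.org', path='/extra')
jar.set('third_cookie', 'third', domain='httpbin.org', path='/cookies')

# Each URL only receives the cookies whose path matches it
print(cookies_sent(jar, 'http://httpbin.org/cookies'))
print(cookies_sent(jar, 'http://httpbin.org/extra'))
print(repr(cookies_sent(jar, 'http://httpbin.org/')))
```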

## Sessions

In [36]:
import requests
 
reqOne = requests.get('https://tutsplus.com/')

In [42]:
reqTwo = requests.get('https://code.tutsplus.com/tutorials')
reqTwo.cookies.get('__cfduid')

'dfadf55e274ddd377e73149e9cbf2ba7a1595281174'

In [43]:
ssnOne = requests.Session()
ssnOne.get('https://tutsplus.com/')
print(ssnOne.cookies.get('__cfduid'))

 
reqThree = ssnOne.get('https://code.tutsplus.com/tutorials')
print(ssnOne.cookies.get('__cfduid'))
 


d713adb5ca0d29867ff7982b6cd6e88211595281248
d713adb5ca0d29867ff7982b6cd6e88211595281248


In [44]:
import requests
 
ssn = requests.Session()
ssn.cookies.update({'visit-month': 'February'})
 
reqOne = ssn.get('http://httpbin.org/cookies')
print(reqOne.text)
# prints information about "visit-month" cookie
 
reqTwo = ssn.get('http://httpbin.org/cookies', cookies={'visit-year': '2017'})
print(reqTwo.text)
# prints information about "visit-month" and "visit-year" cookie
 
reqThree = ssn.get('http://httpbin.org/cookies')
print(reqThree.text)
# prints information about "visit-month" cookie

{
  "cookies": {
    "visit-month": "February"
  }
}

{
  "cookies": {
    "visit-month": "February", 
    "visit-year": "2017"
  }
}

{
  "cookies": {
    "visit-month": "February"
  }
}
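The output shows that cookies passed to a single call are merged with the session cookies for that request only and are not stored on the session. The sketch below reproduces this without a network call by using `Session.prepare_request()` and inspecting the resulting `Cookie` header; the variable names are illustrative.

```python
import requests

ssn = requests.Session()
ssn.cookies.update({'visit-month': 'February'})

url = 'http://httpbin.org/cookies'

# prepare_request() merges session state into the request without sending it
with_extra = ssn.prepare_request(
    requests.Request('GET', url, cookies={'visit-year': '2017'}))
plain = ssn.prepare_request(requests.Request('GET', url))

print(with_extra.headers['Cookie'])  # session cookie plus the per-request one
print(plain.headers['Cookie'])       # session cookie only
print(sorted(ssn.cookies.keys()))    # the session jar itself is unchanged
```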



## Other HTTP Methods

Aside from GET, other popular HTTP methods include POST, PUT, DELETE, HEAD, PATCH, and OPTIONS. Requests provides a function with a signature similar to `get()` for each of these HTTP methods:


In [None]:
requests.post('https://httpbin.org/post', data={'key':'value'})


In [None]:
requests.put('https://httpbin.org/put', data={'key':'value'})


In [None]:
requests.delete('https://httpbin.org/delete')


In [None]:
requests.head('https://httpbin.org/get')


In [None]:
requests.patch('https://httpbin.org/patch', data={'key':'value'})


In [None]:
requests.options('https://httpbin.org/get')


## JSON data

In [77]:
import requests

r = requests.get("http://api.open-notify.org/iss-now.json")
obj = r.json()

print(obj)

{'timestamp': 1595285509, 'message': 'success', 'iss_position': {'latitude': '27.7056', 'longitude': '-76.2032'}}
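The decoded object is a plain Python dictionary, but the coordinates arrive as strings and the timestamp as an integer. The sketch below converts them into more convenient types, using a JSON string that mirrors the output above so that it runs offline. Treating `timestamp` as Unix seconds in UTC is an assumption.

```python
import json
from datetime import datetime, timezone

# Same payload as the output above, as the JSON text the server would send
raw = ('{"timestamp": 1595285509, "message": "success", '
       '"iss_position": {"latitude": "27.7056", "longitude": "-76.2032"}}')
obj = json.loads(raw)

# Coordinates are strings in the payload, so convert them to floats
latitude = float(obj['iss_position']['latitude'])
longitude = float(obj['iss_position']['longitude'])

# Assumption: the timestamp is expressed in Unix seconds (UTC)
when = datetime.fromtimestamp(obj['timestamp'], tz=timezone.utc)

print(f'{when.isoformat()} lat={latitude} lon={longitude}')
```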
