## Using the requests library is highly recommended over urllib  

# Requests  

A simple example:

In [None]:
import requests

response = requests.get('https://coreyms.com/')
print(type(response))
print(response.status_code)
print(type(response.text))
# print(response.text)
print(response.cookies)

You can send any type of request.  

You may wonder why we spent time learning urllib first.  

It is because moving from the harder tool to the easier one is simple, while moving from the easier one to the harder one is not.

In [None]:
import requests

a = requests.post('http://httpbin.org/post')
b = requests.get('http://httpbin.org/get')
c = requests.put('http://httpbin.org/put')
print(a,b,c,sep='\n')

All three status codes should be 200.  

## get request

In [None]:
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)

### get with parameters

In [None]:
import requests

response = requests.get('http://httpbin.org/get?name=canada&age=21')
print(response.text)

In the example above, we append the parameters directly to the URL.  

Alternatively, as in the following cell, we can pass the parameters as a dict through params, which is easier to read and implement.

In [None]:
import requests

data = {
    'name':'Kanye West',
    'age':'30'
}
response = requests.get('http://httpbin.org/get',params = data)
print(response.text)
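
To see how the dict is turned into a query string without sending anything, one option is to build the request and inspect the prepared URL. This is a minimal sketch; it uses `requests.Request(...).prepare()` purely for inspection, which the cells above do not do.

```python
import requests

data = {
    'name': 'Kanye West',
    'age': '30'
}

# Build the request without sending it, then look at the final URL
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(prepared.url)
```

The space in the value is URL-encoded for you, which is one reason the dict form is less error-prone than writing the query string by hand.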

### Parsing JSON

In [None]:
import requests
import json

response = requests.get('http://httpbin.org/get')
print(type(response.text))
print(type(response.json()))
print(type(json.loads(response.text)))
print(response.json())
print(json.loads(response.text))

The type returned by response.json() is the same as the type returned by json.loads(response.text), and the outputs are identical.

### Fetching and storing binary data

In [None]:
import requests

response = requests.get('https://github.com/favicon.ico')
print(type(response.text),type(response.content),sep='\n')

In this step we fetched the image, which is the GitHub icon. How do we store it?

In [None]:
import requests

response = requests.get('https://github.com/favicon.ico')
with open('favicon.ico','wb') as f:
    # The with block closes the file automatically, so no explicit f.close() is needed
    f.write(response.content)

### get with headers

In [None]:
import requests

response = requests.get('https://www.zhihu.com/explore')
print(response.text)

In [None]:
import requests

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0'
}
response = requests.get('https://www.zhihu.com/explore',headers=headers)
print(response.text)

This website checks your browser information, which is sent in the User-Agent header.  
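
As an illustration of why the header matters, the sketch below prints the User-Agent that requests uses when none is given, and then shows a browser-like value being attached to a prepared request. Nothing is sent over the network; `default_user_agent` comes from `requests.utils`, and using a Session here is my own choice for the demonstration.

```python
import requests
from requests.utils import default_user_agent

# Without a custom header, requests identifies itself as 'python-requests/<version>'
default_ua = default_user_agent()
print(default_ua)

browser_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0'

session = requests.Session()
session.headers.update({'User-Agent': browser_ua})

# Prepare (but do not send) a request to inspect the headers that would go out
prepared = session.prepare_request(requests.Request('GET', 'https://www.zhihu.com/explore'))
print(prepared.headers['User-Agent'])
```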

## Post request

In [None]:
import requests

data = {'name':'canada','ages':'21'}
response = requests.post('http://httpbin.org/post',data=data)
print(response.text)

Remember to pass your form data through the data argument when you send a POST request.  
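
A rough sketch of what happens to the form data: passing a dict through `data` produces a form-encoded body, while passing it through `json` (a separate keyword argument that the cells here do not use) produces a JSON body. The requests are only prepared, not sent.

```python
import json
import requests

data = {'name': 'canada', 'ages': '21'}
url = 'http://httpbin.org/post'

# Same dict, two different encodings of the request body
form_req = requests.Request('POST', url, data=data).prepare()
json_req = requests.Request('POST', url, json=data).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json_req.body)
```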

### post request with headers

In [None]:
import requests

data = {'name':'mathgenius','ages':'21'}
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0'}
response = requests.post('http://httpbin.org/post',data=data,headers=headers)
print(response.json())

## Response  

### Properties of the response

In [None]:
import requests

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0'}
response = requests.get('http://www.jianshu.com',headers=headers)
print(type(response.status_code),response.status_code,'\n')
print(type(response.headers),'\n')
print(type(response.cookies),response.cookies,'\n')
print(type(response.url),response.url,'\n')
print(type(response.history),response.history)

This should give you a better understanding of these response attributes.  

### Checking the status_code

In [None]:
import requests

response = requests.get('https://www.jianshu.com')
if response.status_code != 200:
    print('Fail',response.status_code)
else:
    print('Request Successfully')

This fails because I forgot to add headers.

In [None]:
import requests

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0'}
response = requests.get('https://www.jianshu.com',headers=headers)
if response.status_code != 200:
    exit()
else:
    print('Request Successfully')

Request successful!  
For the meaning of each status code, please look it up yourself (for example, on Wikipedia).  
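
If you prefer to look up the meaning in code, the standard library's `http.HTTPStatus` carries the reason phrase for each code, and requests exposes named constants through `requests.codes`. The helper name `describe_status` below is hypothetical, and the Response object is built by hand only so that the example runs offline.

```python
from http import HTTPStatus

import requests
from requests.exceptions import HTTPError


def describe_status(status_code):
    """Return the standard reason phrase for a status code."""
    return HTTPStatus(status_code).phrase


print(describe_status(200), describe_status(404))
print(requests.codes.ok, requests.codes.not_found)

# raise_for_status() turns 4xx/5xx responses into an exception
fake_response = requests.Response()  # hand-built, for demonstration only
fake_response.status_code = 404
try:
    fake_response.raise_for_status()
    raised = False
except HTTPError:
    raised = True
print(raised)
```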

## Advanced operations with Requests
### Uploading a file

In [9]:
import requests

# Open the file in a with block so it is closed after the upload
with open('favicon.ico','rb') as f:
    files = {'file':f}
    response = requests.post('http://httpbin.org/post',files=files)
# print(response.text)
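
The value in `files` can also be a tuple that sets the filename and content type explicitly. The sketch below prepares such an upload without sending it, using made-up bytes in place of the real icon, to show that the body is encoded as multipart/form-data.

```python
import requests

# Placeholder bytes standing in for the contents of favicon.ico (not a real icon)
fake_icon = b'\x00\x00\x01\x00placeholder'

# (filename, content, content type)
files = {'file': ('favicon.ico', fake_icon, 'image/x-icon')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

print(prepared.headers['Content-Type'])
print(b'filename="favicon.ico"' in prepared.body)
```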

### Getting cookies

In [None]:
import requests

response = requests.get('https://www.baidu.com')
print(response.cookies)
for key,value in response.cookies.items():
    print(key+'='+value)

This fetches and prints the cookies.  

### Maintaining a session (simulating a login)

In [15]:
import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/7777777')
response = s.get('http://httpbin.org/cookies')
print(response.text)

{
  "cookies": {
    "number": "7777777"
  }
}



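Use requests.Session() when you need to maintain a session: cookies set by one request are sent with later requests made through the same Session object, whereas two separate requests.get() calls would not share them.  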
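
To make the cookie persistence visible without a network call, here is a small sketch that puts a cookie into the session's jar by hand and inspects the request the session would send. In real use the cookie would come from a server response, as in the cell above.

```python
import requests

s = requests.Session()
# Set by hand here; normally a Set-Cookie response header fills the jar
s.cookies.set('number', '7777777')

# A request prepared through the session picks up the stored cookie
prepared = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepared.headers.get('Cookie'))

# A plain request that does not go through the session carries no cookie
plain = requests.Request('GET', 'http://httpbin.org/cookies').prepare()
print(plain.headers.get('Cookie'))
```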

### Certificate Validation

In [None]:
import requests

response = requests.get('https://www.12306.cn')
print(response.status_code)

In the past, 12306 returned an SSL error because its certificate failed verification, but this has since been fixed.  

What I want to show is that you can add verify=False to scrape a website whose certificate fails verification.

In [None]:
import requests

response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)

When you run this cell, you get a warning recommending that you enable certificate verification.  

How do we suppress it?

In [None]:
import requests
import urllib3

urllib3.disable_warnings()
response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)
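
Calling `disable_warnings()` with no argument silences every urllib3 warning. A narrower variant, sketched below, passes the specific category so that only the unverified-HTTPS warning is suppressed. The warning is raised by hand here to keep the example offline.

```python
import warnings

import urllib3
from urllib3.exceptions import InsecureRequestWarning

# Suppress only the warning triggered by verify=False
urllib3.disable_warnings(InsecureRequestWarning)

with warnings.catch_warnings(record=True) as caught:
    # Simulate the warning that an unverified request would emit
    warnings.warn('Unverified HTTPS request (simulated)', InsecureRequestWarning)

# Number of warnings that got through the filter
print(len(caught))
```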

### Proxies

In [None]:
import requests

proxies = {
    'http':'http://127.0.0.1:1234',
    'https':'https://127.0.0.1:1234'
}

response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)

You will get a connection refused error because these are my own proxy settings, not yours.
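
Which proxy entry gets used depends on the scheme of the URL being requested. The sketch below uses `select_proxy` from `requests.utils` to look this up without connecting; the second dict, with only an 'http' entry, is a hypothetical configuration added for contrast.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://127.0.0.1:1234',
    'https': 'https://127.0.0.1:1234'
}
http_only = {'http': 'http://127.0.0.1:1234'}  # hypothetical, for contrast

url = 'https://www.taobao.com'
chosen = select_proxy(url, proxies)     # matches the 'https' entry
missing = select_proxy(url, http_only)  # no entry for the https scheme
print(chosen)
print(missing)
```

With only an 'http' entry, an https URL finds no matching proxy, so the request would go out directly.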

In [None]:
import requests

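proxies = {
    'http':'http://user:passwd@127.0.0.1:1234/',
    # The target URL is https, so an 'https' entry is needed for the proxy to be used
    'https':'http://user:passwd@127.0.0.1:1234/'
}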
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)

Use the form above if your proxy requires a username and password.  

You can also use a SOCKS proxy. To do so, first install the extra dependency:

pip install 'requests[socks]' in cmd, a Linux shell, or the Anaconda prompt

In [None]:
import requests

proxies = {
    'http':'socks5://127.0.0.1:1234',
    'https':'socks5://127.0.0.1:1234'
}
response = requests.get('https://www.amazon.com',proxies=proxies)
print(response.status_code)

### Timeout

In [None]:
import requests

response = requests.get('http://httpbin.org/get',timeout=0.001)
print(response.status_code)

You will get a connection timeout error.

In [32]:
import requests

try:
    response = requests.get('http://httpbin.org/get',timeout=0.01)
    print(response.status_code)
except requests.exceptions.Timeout:  # catch only timeouts instead of using a bare except
    print('Timeout')

Timeout
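
The timeout can also be given as a (connect, read) tuple. The sketch below shows the read part in isolation, assuming a local socket that accepts connections but never replies; the socket setup and the `trust_env = False` line exist only to keep the demonstration self-contained on your own machine.

```python
import socket

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

# A local socket that accepts connections but never answers
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

try:
    # 1 second to connect, 0.2 seconds to wait for the response
    session.get(f'http://127.0.0.1:{port}/', timeout=(1, 0.2))
    outcome = 'OK'
except ConnectTimeout:
    outcome = 'Connect timeout'
except ReadTimeout:
    outcome = 'Read timeout'
finally:
    server.close()

print(outcome)
```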


### Authentication, for scraping websites that require a username and password

In [None]:
import requests
from requests.auth import HTTPBasicAuth

response = requests.get('https://m.weibo.cn/',auth=HTTPBasicAuth('your username','your passwd'))
print(response.status_code)

In [None]:
import requests

response = requests.get('https://m.weibo.cn/',auth=('your username','your passwd'))
print(response.status_code)
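
Both forms above send the same thing: an Authorization header containing the base64-encoded `username:password` pair. A minimal sketch that prepares the request without sending it, with placeholder credentials:

```python
import base64

import requests

username, password = 'user', 'passwd'  # placeholders, not real credentials

prepared = requests.Request('GET', 'https://m.weibo.cn/', auth=(username, password)).prepare()

# Build the header value by hand and compare it with what requests produced
expected = 'Basic ' + base64.b64encode(f'{username}:{password}'.encode()).decode()
print(prepared.headers['Authorization'] == expected)
```

Because base64 is an encoding and not encryption, basic auth should only be used over https.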

### Error handling

In [None]:
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get('http://httpbin.org/get',timeout=0.005)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection Error')
except RequestException:
    print('Error')
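
The order of the except clauses matters because the exception classes form a hierarchy: `RequestException` is the base class, so it has to come last. The sketch below checks those relationships directly; `describe_error` is a hypothetical helper that mirrors the try/except above, except that it tests the broader `Timeout` class rather than `ReadTimeout`.

```python
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    MissingSchema,
    ReadTimeout,
    RequestException,
    Timeout,
)

# ConnectTimeout is both a ConnectionError and a Timeout; ReadTimeout is only a Timeout
print(issubclass(ConnectTimeout, ConnectionError), issubclass(ConnectTimeout, Timeout))
print(issubclass(ReadTimeout, Timeout), issubclass(ReadTimeout, ConnectionError))
print(issubclass(Timeout, RequestException), issubclass(ConnectionError, RequestException))


def describe_error(exc):
    """Classify an exception from most specific to most general."""
    if isinstance(exc, Timeout):
        return 'Timeout'
    if isinstance(exc, ConnectionError):
        return 'Connection Error'
    if isinstance(exc, RequestException):
        return 'Error'
    return 'Not a requests error'


print(describe_error(ConnectTimeout()))
print(describe_error(ConnectionError()))
print(describe_error(MissingSchema()))
```

A very short timeout often fails while connecting, which raises `ConnectTimeout`; in the cell above that would be caught by the `ConnectionError` clause and not by the `ReadTimeout` one.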