# Requests
作用：发送网络请求，获得响应数据

官方文档：https://requests.readthedocs.io/zh_CN/latest/index.html

Requests是用python语言基于urllib编写的，采用的是Apache2 Licensed开源协议的HTTP库

它比urllib更加方便，可以节约大量的工作，完全满足HTTP测试需求的库

⼀句话——Requests是一个Python代码编写的HTTP请求库，方便在代码中模拟浏览器发送http请求


###  安装命令：pip install requests


## 课堂目标：
###   1，Requests请求
###   2，Response响应
###   3，高级操作

# 一，Requests请求 

## 1，实例引入

In [17]:
# https://www.baidu.com/

import requests

response = requests.get('https://www.baidu.com/')
print(response) # 直接打印变量得到的是一个响应体对象

print(response.text)  # 查看响应体内容

print(type(response.text))  # 查看响应内容的数据类型

print(response.status_code)  # 查看响应状态码

<Response [200]>
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>ç¾åº¦ä¸ä¸ï¼ä½ å°±ç¥é</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus=au

##  2，各种请求方式

In [2]:
requests.get('http://httpbin.org/get')   # GET请求
requests.post('http://httpbin.org/post')  # POST请求
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')

<Response [200]>

## 3.1，基于get请求

1.基本写法

注意：notebook的每个代码块之前是独立的，但是从上往下共享Python运行环境

In [7]:
# 测试网站：http://httpbin.org/get

url = 'http://httpbin.org/get'  # 目标站点
r = requests.get(url)
print(r.status_code)
print(r.text)
print(type(r.text)) 

200
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.28.1", 
    "X-Amzn-Trace-Id": "Root=1-6332ee0f-68fade4738ea7ef14dac52b9"
  }, 
  "origin": "175.8.48.78", 
  "url": "http://httpbin.org/get"
}

<class 'str'>


2.带参数的get请求

In [11]:
# 测试网站：http://httpbin.org/get
# 第一种写法
url= 'http://httpbin.org/get?name=lisi&age=18'
r = requests.get(url)
print(r.status_code)
print(r.text)

200
{
  "args": {
    "age": "18", 
    "name": "lisi"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.28.1", 
    "X-Amzn-Trace-Id": "Root=1-6332ef12-155ac0d66896433c147f280a"
  }, 
  "origin": "175.8.48.78", 
  "url": "http://httpbin.org/get?name=lisi&age=18"
}



In [15]:
# 推荐写法
# 把参数构建在字典里
data = {
    'name':"lisi",
    'age':'12'
}
url = 'http://httpbin.org/get'
r = requests.get(url,params=data)  # params参数的作用是携带get请求的相关参数的
print(r.text)

{
  "args": {
    "age": "12", 
    "name": "lisi"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.28.1", 
    "X-Amzn-Trace-Id": "Root=1-6332eff0-428014c1295041c234445a7c"
  }, 
  "origin": "175.8.48.78", 
  "url": "http://httpbin.org/get?name=lisi&age=12"
}



## 3.2，基于post请求

In [16]:
# http://httpbin.org/post
d = {
    'name':'lisi',
}
url ='http://httpbin.org/post'  
r = requests.post(url,data =d )  # data参数作用：携带post请求参数用的
print(r.text)

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "name": "lisi"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "9", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.28.1", 
    "X-Amzn-Trace-Id": "Root=1-6332f0e4-2969998206626bbd39ca5a01"
  }, 
  "json": null, 
  "origin": "175.8.48.78", 
  "url": "http://httpbin.org/post"
}



## 4，json解析 （对于json数据的获取）

In [24]:
import requests
import json

url = 'http://httpbin.org/get'
r = requests.get(url)
print(r.status_code)  # 查看响应状态码
a = r.text
print(a)
print(type(a))  # 我们将长得像字典的字符串数据 -- josn数据

data_dict = json.loads(a)
print(data_dict)
print(type(data_dict))

print(data_dict['url'])
print(data_dict['headers']['Host'])


data_json = r.json()  # .json()方法作用是获取Json形式的源码 数据类型dict
print(data_json)
print(type(data_json))

200
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.28.1", 
    "X-Amzn-Trace-Id": "Root=1-6332f30e-5a63a81256a7560b0c7fe629"
  }, 
  "origin": "175.8.48.78", 
  "url": "http://httpbin.org/get"
}

<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.28.1', 'X-Amzn-Trace-Id': 'Root=1-6332f30e-5a63a81256a7560b0c7fe629'}, 'origin': '175.8.48.78', 'url': 'http://httpbin.org/get'}
<class 'dict'>
http://httpbin.org/get
httpbin.org
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.28.1', 'X-Amzn-Trace-Id': 'Root=1-6332f30e-5a63a81256a7560b0c7fe629'}, 'origin': '175.8.48.78', 'url': 'http://httpbin.org/get'}
<class 'dict'>


## 5，.content 获取二进制数据

In [25]:
#目标站点 -- 百度logo图片：https://www.baidu.com/img/baidu_jgylogo3.gif
url = 'https://www.baidu.com/img/baidu_jgylogo3.gif'
r = requests.get(url)
print(r.text)
print(type(r.text))

GIF89au & �  �2/���Y`虝��vt)2�����!�     ,    u &  �x���0� J0ɻ�`�UV!L���l��P���V�|����4���H�(�ɨ����t{���,w�|
�B�Z�aK�7|M�Ph
�%����n8FN&:@F��|V1~w�y��r� �9�khlO�j�!�s�\�m�&�\���AZ�PQ�~��yX��Rż���  � �WEz85�'���
������.�D�a����������,��L
�� &P��<�T�H���gt��gj��4 �.�O1 >*HF%ٽ$���i2@� L��\ N㼏$(�'&3g�9(�r���9�D�,i�q+l�;)4� 0�06`Z�fW"U�M���Ni᭨jC��X��x� m��.��ғ��eK��܊�؅����n��BC[�Р `�.�����_�:&`S��	����͚/m��Y��Ȗ� �a���~ִ��븱�0�����p�!i��6��f��y\<�{�f�[t�ȨO'�S�A� �\L����`� ��m�T52D]P��U�a�}��H�=��~�Uxm�d���e� Z$� #r0!~ *�W+ �vٱ#�U�a��mf=��*L���<03��]��x���\y��2���)�J�h��iHt��HK&���D�K��  ;
<class 'str'>


##### 以下演示获取百度图片保存至电脑本地 

In [27]:
# url = 'https://www.baidu.com/img/baidu_jgylogo3.gif'
# r = requests.get(url)
# print(r.content)
# print(type(r.content))


"""
bytes类型是指一堆字节的集合，在python中以b开头的字符串都是bytes类型

Bytes类型的作用:
    1, 在python中， 数据转成2进制后不是直接以0101010的形式表示的，而是用一种叫bytes(字节)的类型来表示
    2,计算机只能存储2进制， 我们的字符、图片、视频、音乐等想存到硬盘上，也必须以正确的方式编码成2进制后再存。
      记住一句话：在python中，字符串必须编码成bytes后才能存到硬盘上
"""
url = 'https://www.baidu.com/img/baidu_jgylogo3.gif'
r = requests.get(url)
data = r.content
print(data)
with open('bbfp.gif','wb')as f:
    f.write(data)
    


## 6，初步伪装小爬虫  -- 添加headers 
    
    浏览器用户身份的标识，缺少的话服务器会认为你不是一个正常的浏览器用户，而是一个爬虫程序
    user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36


In [28]:
# 目标站点 -- 知乎 ：https://www.zhihu.com/explore
url ='https://www.zhihu.com/explore'
r = requests.get(url)  
print(r.status_code)
print(r.text)

200
<!doctype html>
<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-rh="true">发现 - 知乎</title><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1"/><meta name="renderer" content="webkit"/><meta name="force-rendering" content="webkit"/><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/><meta name="google-site-verification" content="FTeR0c8arOPKh8c5DYh_9uu98_zJbaWw53J-Sch9MTg"/><meta name="description" property="og:description" content="知乎，中文互联网高质量的问答社区和创作者聚集的原创内容平台，于 2011 年 1 月正式上线，以「让人们更好的分享知识、经验和见解，找到自己的解答」为品牌使命。知乎凭借认真、专业、友善的社区氛围、独特的产品机制以及结构化和易获得的优质内容，聚集了中文互联网科技、商业、影视、时尚、文化等领域最具创造力的人群，已成为综合性、全品类、在诸多领域具有关键影响力的知识分享社区和创作者聚集的原创内容平台，建立起了以社区驱动的内容变现商业模式。"/><link data-rh="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.a53ae37b.png"/><link data-rh="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.a53ae37b.png" 

In [29]:
# 以下举例伪装版：
# 构建我的身份信息
h = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
url ='https://www.zhihu.com/explore'
r = requests.get(url,headers =h )   # headers参数：携带身份信息
print(r.status_code)
print(r.text)

200
<!doctype html>
<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-rh="true">发现 - 知乎</title><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1"/><meta name="renderer" content="webkit"/><meta name="force-rendering" content="webkit"/><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/><meta name="google-site-verification" content="FTeR0c8arOPKh8c5DYh_9uu98_zJbaWw53J-Sch9MTg"/><meta name="description" property="og:description" content="知乎，中文互联网高质量的问答社区和创作者聚集的原创内容平台，于 2011 年 1 月正式上线，以「让人们更好的分享知识、经验和见解，找到自己的解答」为品牌使命。知乎凭借认真、专业、友善的社区氛围、独特的产品机制以及结构化和易获得的优质内容，聚集了中文互联网科技、商业、影视、时尚、文化等领域最具创造力的人群，已成为综合性、全品类、在诸多领域具有关键影响力的知识分享社区和创作者聚集的原创内容平台，建立起了以社区驱动的内容变现商业模式。"/><link data-rh="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.a53ae37b.png"/><link data-rh="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.a53ae37b.png" 

# 二，Response响应

## 1，response属性

In [33]:
# 目标网站 --  ：http://www.jianshu.com    
import requests
url = 'https://www.jianshu.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'
}
r = requests.get(url,headers = headers)
print(r.status_code)  

# 查看响应头信息
print(r.headers)

# 查看url
print(r.url)

200
{'Date': 'Tue, 27 Sep 2022 13:20:14 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': 'acw_tc=0b62602216642848142244354e015f3bb49d02e5e430511432b73a338f4d87;path=/;HttpOnly;Max-Age=1800, locale=zh-CN; path=/', 'Server': 'Tengine', 'Vary': 'Accept-Encoding', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'ETag': 'W/"5eb754bcbde9429f09d7f1a7466f5fb8"', 'Cache-Control': 'max-age=0, private, must-revalidate', 'X-Request-Id': 'e7465daf-05a4-4f9a-a4f6-fe9725994c5e', 'X-Runtime': '0.003770', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip'}
https://www.jianshu.com/


## 2，状态码判断 

# 三，高级操作

## 会话维持 

http/https  协议  是一种无状态的协议，对事物处理无记忆功能

所以每次请求都是一个独立状态

#### (1).通过cookie维持会话

In [34]:
# 通过cookie    用户信息（账号+密码）   只是维持登录状态
'''
好处： 就是可以爬取需要登录才能获取的网站
坏处：大大提高你被反爬的几率  （多账号手段）
'''

import requests
# 身份信息可以放入多条 
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
    'Cookie': ' '
}
response = requests.get("https://www.jianshu.com/",headers = headers)
print(response.text)


<!DOCTYPE html>
<script src="//aeu.alicdn.com/waf/antidomxss_v640.js"></script><script src="//aeu.alicdn.com/waf/interfaceacting220628.js"></script><!--[if IE 6]><html class="ie lt-ie8"><![endif]-->
<!--[if IE 7]><html class="ie lt-ie8"><![endif]-->
<!--[if IE 8]><html class="ie ie8"><![endif]-->
<!--[if IE 9]><html class="ie ie9"><![endif]-->
<!--[if !IE]><!--> <html> <!--<![endif]-->

<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=Edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0,user-scalable=no">

  <!-- Start of Baidu Transcode -->
  <meta http-equiv="Cache-Control" content="no-siteapp" />
  <meta http-equiv="Cache-Control" content="no-transform" />
  <meta name="applicable-device" content="pc,mobile">
  <meta name="MobileOptimized" content="width"/>
  <meta name="HandheldFriendly" content="true"/>
  <meta name="mobile-agent" content="format=html5;url=https://www.jianshu.com/">
  <!-- End of Baidu Transcode -->

    <me

#### (2).通过session维持会话

In [36]:
# 通过session    让服务器知道你是上一次的你
s = requests.session()  # 创建session对象 用来存储你的session信息
s.get('https://www.baidu.com')
res = s.get('https://www.baidu.com/s?wd=python&ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&fenlei=256&rsv_pq=0xe38dcd6a0009dc3d&rsv_t=6f478XyZarOYA%2FhDWS%2Fl9J3dF68Fu5S0wy48J%2BgivklUtP9lwbgVBC6lPlkk&rqlang=en&rsv_enter=1&rsv_dl=tb&rsv_sug3=7&rsv_sug1=4&rsv_sug7=101&rsv_sug2=0&rsv_btype=i&inputT=1739&rsv_sug4=2464&rsv_sug=2')
print(res.text)

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="utf-8">
    <title>ç¾åº¦å®å¨éªè¯</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="apple-mobile-web-app-capable" content="yes">
    <meta name="apple-mobile-web-app-status-bar-style" content="black">
    <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
    <meta name="format-detection" content="telephone=no, email=no">
    <link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">
    <link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
    <link rel="stylesheet" href="https://ppui-static-wap.cdn.bcebos.com/static/touch/css/api/mkdjump_0635445.css" />
</head>
<body>
    <div class="timeout hide">
        <

## 代理设置 

In [37]:
# 目标站点：https://www.baidu.com
url= 'https://www.baidu.com'
r = requests.get(url)
print(r.status_code)

200


In [39]:
url= 'https://www.baidu.com'
# 构造ip信息
ip_data = {
    'http':'122.9.101.6:8888',
     'https':'122.9.101.6:8888',
}

# 构造身份信息
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
}
r = requests.get(url,headers = headers,proxies = ip_data)  #proxies参数作用：挂代理
print(r.text)

ProxyError: HTTPSConnectionPool(host='www.baidu.com', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))

## 超时设置

In [41]:
# 目标站点 ： http://baidu.com
url= 'https://www.baidu.com'
r = requests.get(url,timeout =0.00000001)  # 秒
print(r.status_code)

ConnectTimeout: HTTPSConnectionPool(host='www.baidu.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001A1EF9EDFD0>, 'Connection to www.baidu.com timed out. (connect timeout=1e-08)'))

## 异常处理

In [1]:
try:  # 处理异常，可能出现异常的代码都写在try方法里面
    url= 'https://www.baidu.com'
    r = requests.get(url,timeout = 0.00000000001)  # 秒
    print(r.status_code)
except: # 出现异常怎么处理
    print('timeout!')

timeout!
