---
layout: post
title: "[Web Scraping] Scraping Douban Movie Posters (Part 1)"
subtitle: 'Python Web Scraping'
author: "Hufe"
header-img: "img/post-bg-python.jpg"
header-mask: 0.3
mathjax: true
tags:
  - Python
  - 爬虫
---

> Adapted from: [Geek Time (极客时间), "数据分析实战45讲", Lesson 10: "Python爬虫：如何自动化下载王祖贤海报?"](https://time.geekbang.org/column/article/76001)

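A crawler's workflow has three stages:
- Open the web page
- Extract the data
- Save the data

Python has a tool for each of these stages.

For "opening the page", `Requests` fetches the page and gives us the data the server returns, which may be an HTML page or JSON data.

For "extracting the data", two tools do most of the work: for HTML pages, XPath locates elements and extracts their content; for JSON data, the `json` module parses it.

For the final "saving the data" step, `Pandas` can hold the data and export it to a CSV file.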
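The three stages can be sketched end to end without touching the network. This is only an illustration under stated assumptions: a canned JSON string stands in for the reply that `Requests` would fetch, the field names are made up, and the standard-library `csv` module is used in place of `Pandas` so the sketch has no third-party dependencies.

```python
import csv
import io
import json

# Stage 1 (open the page): a canned string stands in for the server's reply.
raw = '{"images": [{"id": "1", "title": "poster A"}, {"id": "2", "title": "poster B"}]}'

# Stage 2 (extract the data): parse the JSON and keep the fields of interest.
rows = [{"id": img["id"], "title": img["title"]} for img in json.loads(raw)["images"]]

# Stage 3 (save the data): write CSV; StringIO keeps the example off the disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "title"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

In a real crawler, stage 1 would be a `requests.get(...)` call and stage 3 would write to a file instead of an in-memory buffer.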


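### Accessing pages with Requests

`Requests` is an HTTP client library for Python that most crawlers end up using, and it is simple to write. The two request methods used most often are `GET` and `POST`.  
The most visible difference between them: GET carries its parameters in the URL, while POST passes them in the request body.

Suppose we want to visit Douban. With GET, the code looks like this:

`r = requests.get('http://www.douban.com')`

Here `r` is the response to the GET request. `r.text` returns the body decoded as a string, and `r.content` returns the raw bytes.

To submit a form with POST, the code looks like this:

`r = requests.post('http://xxx.com', data = {'key':'value'})`

`data` holds the form parameters. It is a dictionary that stores them as key/value pairs.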
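To make the GET/POST difference concrete, here is a minimal sketch that builds both kinds of request without sending them. It uses the standard library's `urllib` rather than `Requests` so that it runs offline; the endpoint and parameter names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only to build (not send) the requests.
BASE_URL = "http://example.com/search"
params = {"q": "poster", "start": 0}

# GET: the parameters are encoded into the query string of the URL.
get_request = Request(BASE_URL + "?" + urlencode(params), method="GET")

# POST: the same parameters travel in the request body as form data.
post_request = Request(BASE_URL, data=urlencode(params).encode("utf-8"), method="POST")

print(get_request.get_method(), get_request.full_url, get_request.data)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

The GET request has an empty body and a longer URL, while the POST request keeps the bare URL and carries `q=poster&start=0` as bytes in its body.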

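### Locating elements with XPath

XPath is the XML Path Language. It navigates a document by elements and attributes to locate nodes. A few path expressions come up constantly.
![](https://static001.geekbang.org/resource/image/3b/ea/3bcb311361c76bfbeb90d360b21195ea.jpg)


A few simple examples:

- `xpath('node')` selects the child elements named `node` of the current node;

- `xpath('/div')` selects the `div` element at the root;

- `xpath('//div')` selects all `div` elements in the document;

- `xpath('./div')` selects the `div` children of the current node;

- `xpath('..')` moves up to the parent node;

- `xpath('//@id')` selects all `id` attributes;

- `xpath('//book[@id]')` selects all `book` elements that have an `id` attribute;

- `xpath('//book[@id="abc"]')` selects all `book` elements whose `id` attribute equals "abc";

- `xpath('//book/title | //book/price')` selects all `title` and `price` children of `book` elements.

These are only some of the uses of `XPath`. Its selection features are powerful: the XPath 2.0 function library defines more than 100 built-in functions for matching (lxml implements XPath 1.0, which has a smaller core set). Almost any node we want to locate can be selected with XPath.

To use XPath from Python you will use the parsing library lxml. It parses efficiently and is easy to use: parse the HTML, then call `xpath()` on the result.

For example, to locate every list item in an HTML document: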

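```
from lxml import etree
# Sample markup; in a crawler this would be the text of a fetched page.
page = '<ul><li>first</li><li>second</li></ul>'
html = etree.HTML(page)
result = html.xpath('//li')  # list of the two <li> elements
```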
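Several of the path expressions listed above can be tried on a tiny document. The sketch below uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra packages. ElementTree supports only a subset of XPath (for example, no `|` union and no leading `//` without a `.`), and the sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample document with three book elements.
doc = ET.fromstring("""
<store>
  <book id="abc"><title>A</title><price>10</price></book>
  <book id="xyz"><title>B</title><price>20</price></book>
  <book><title>C</title><price>30</price></book>
</store>
""")

all_books = doc.findall(".//book")            # every book element
books_with_id = doc.findall(".//book[@id]")   # books that have an id attribute
abc_titles = [b.findtext("title") for b in doc.findall(".//book[@id='abc']")]
prices = [p.text for p in doc.findall("./book/price")]  # price children of books

print(len(all_books), len(books_with_id), abc_titles, prices)
```

With lxml, the same selections would be written as `xpath('//book')`, `xpath('//book[@id]')` and so on, as in the list above.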

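### JSON objects

JSON is a lightweight data-interchange format. Python's `json` module converts between Python objects and JSON text. Why convert? Once JSON has been turned into Python objects, the data is much easier to work with.

![](https://static001.geekbang.org/resource/image/9a/43/9a6d6564a64cf2b1c256265eea78c543.png)


The following code converts a JSON string into a Python object. Run it yourself to see the result.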

```
import json

json_data = '{"a":1,"b":2,"c":3,"d":4,"e":5}'
data = json.loads(json_data)  # parse JSON text into a dict
print(data)
```

```
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
```
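The conversion works in both directions and also handles nested data. As a small sketch, assuming a made-up payload with a nested array, `json.loads` maps JSON objects to dicts, arrays to lists and `false` to `False`, and `json.dumps` goes back the other way.

```python
import json

# Made-up payload for illustration: an object that contains an array of objects.
raw = '{"images": [{"id": "1", "src": "https://example.com/1.jpg"}], "total": 1, "more": false}'

response = json.loads(raw)             # JSON text -> Python dict
first = response["images"][0]          # JSON array -> list, JSON object -> dict
print(first["src"], response["more"])  # JSON false -> Python False

# dict -> JSON text; ensure_ascii=False keeps non-ASCII characters readable.
text = json.dumps(response, ensure_ascii=False, sort_keys=True)
print(text)
```

Parsing `text` again yields a dict equal to `response`, so nothing is lost in the round trip.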


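## Hands-on crawling
This exercise scrapes posters with Python in two ways: Part 1 (this post) works from JSON data, and Part 2 locates elements with XPath.

## Basic crawl

The page we crawl is `https://www.douban.com/j/search_photo?q=%E7%8E%8B%E7%A5%96%E8%B4%A4&limit=20&start=0`  
that is, `https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=0`

![title](https://raw.githubusercontent.com/hufe09/GitNote-Images/master/gitnote/2019/05/13/1557761380378-1557761380386.png)
The percent-escaped sequence in the URL is simply the URL encoding of the Chinese query. Opening the URL returns a clean JSON object that looks like this: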
```
{"images":
       [{"src": …, "author": …, "url":…, "id": …, "title": …, "width":…, "height":…},
    …
	 {"src": …, "author": …, "url":…, "id": …, "title": …, "width":…, "height":…}],
"total": 23499,
"limit": 20,
"more": true
}
 ```
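This JSON object tells us that there are 23499 images of 王祖贤 (Joey Wong) in total, that a single request returns only 20 of them, and that more data can be requested. The data lives in `images`, an array whose elements are dictionaries with the fields src, author, url, id, title, width and height: the image address, the author, the page where it was posted, the image ID, the title, and the image width and height.

With this JSON in hand, downloading the images is straightforward. You still need to work out the URL pattern of the XHR requests: here, each page advances `start` by the page size of 20.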
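The URL pattern and the response shape can be captured in two small helpers. This is a sketch rather than the code used later in the post: `page_urls` and `image_records` are names invented here, and the sample response is hypothetical apart from its shape. `urlencode` takes care of percent-encoding the Chinese query.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.douban.com/j/search_photo"


def page_urls(query, max_images, limit=20):
    """Build one search URL per page; `start` advances by `limit`."""
    return [
        SEARCH_URL + "?" + urlencode({"q": query, "limit": limit, "start": start})
        for start in range(0, max_images, limit)
    ]


def image_records(response):
    """Pull (id, src) pairs out of a parsed response dict."""
    return [(image["id"], image["src"]) for image in response["images"]]


urls = page_urls("王祖贤", max_images=60)
print(urls[0])

# Hypothetical response with the same shape as the JSON shown above.
sample = {"images": [{"id": "637714342", "src": "https://example.com/p637714342.jpg"}],
          "total": 23499, "limit": 20, "more": True}
print(image_records(sample))
```

The first URL printed matches the percent-encoded address given at the start of this section.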

```
import json
import os

import requests

query = '王祖贤'


def download(src, img_id):
    """Download one image into a folder named after the query."""
    folder_name = query
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)
    path = './' + query + '/' + str(img_id) + '.jpg'
    try:
        pic = requests.get(src, timeout=10)
        with open(path, 'wb') as fp:
            fp.write(pic.content)
    except requests.exceptions.ConnectionError:
        print(str(img_id) + ' could not be downloaded')


for i in range(0, 500, 20):
    print(f'---------------Crawling page {i // 20 + 1}---------------')
    url = 'https://www.douban.com/j/search_photo?q=' + query + '&limit=20&start=' + str(i)
    html = requests.get(url).text
    # json.loads no longer accepts an `encoding` argument (removed in Python 3.9)
    response = json.loads(html)
    # Number the images on this page starting from 1
    for j, image in enumerate(response['images'], start=1):
        print(image['src'])
        download(image['src'], image['id'])
        print(f'Image {i + j}')
```

```
---------------Crawling page 1---------------
https://img3.doubanio.com/view/photo/thumb/public/p637714342.jpg
Image 1
https://img1.doubanio.com/view/photo/photo/public/p637081759.jpg
Image 2
https://img3.doubanio.com/view/photo/thumb/public/p642504596.jpg
Image 3
https://img3.doubanio.com/view/photo/thumb/public/p642504623.jpg
Image 4
https://img1.doubanio.com/view/photo/thumb/public/p642504659.jpg
Image 5
https://img3.doubanio.com/view/photo/thumb/public/p642504673.jpg
Image 6
https://img3.doubanio.com/view/photo/thumb/public/p642504722.jpg
Image 7
https://img3.doubanio.com/view/photo/thumb/public/p642507941.jpg
Image 8
https://img3.doubanio.com/view/photo/thumb/public/p642507983.jpg
Image 9
https://img3.doubanio.com/view/photo/thumb/public/p642508003.jpg
Image 10
https://img3.doubanio.com/view/photo/thumb/public/p642508066.jpg
Image 11
https://img1.doubanio.com/view/photo/thumb/public/p642508108.jpg
Image 12
https://img3.doubanio.com/view/photo/thumb/public/p642510954.jpg
Image 13
https://img1.doubanio.com/view/phot

---------------Crawling page 25---------------
```


## Adding some details

```
import json
import os
import time

import requests


def download(src, img_name, query):
    """Download one image and report the outcome."""
    path = './' + query + '/' + str(img_name) + '.jpg'
    try:
        pic = requests.get(src, timeout=10)
        with open(path, 'wb') as fp:
            fp.write(pic.content)
        return 'Downloaded:' + str(img_name), src
    except requests.exceptions.ConnectionError:
        return 'ConnectionError:' + str(img_name) + ' could not be downloaded!'
    except OSError:
        return 'OSError:' + str(img_name) + ' could not be downloaded!'


def crawling_poster(url, query):
    """Fetch one result page and download every image it lists."""
    html = requests.get(url).text
    response = json.loads(html)
    srcs = []
    ids = []
    for image in response['images']:
        srcs.append(image['src'])
        ids.append(image['id'])
    print('Starting download process, PID [%d].' % os.getpid())
    start = time.time()

    for src, img_id in zip(srcs, ids):
        download(src, img_id, query)
    end = time.time()
    print(f'This batch took {end - start} s')
    return url


def main():
    query = '王祖贤'

    # Create the output folder
    folder_name = query
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)

    urls = []
    for i in range(0, 500, 20):
        urls.append('https://www.douban.com/j/search_photo?q=' + query + '&limit=20&start=' + str(i))

    start = time.time()
    # Single process
    for url in urls:
        print(crawling_poster(url, query))

    end = time.time()
    print(f'Total time: {end - start} s')


if __name__ == '__main__':
    main()
```

```
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=0
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=20
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=40
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=60
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=80
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=100
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=120
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=140
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=160
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=180
Starting download process, PID [17124].
This batch took 0.0 s
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&star
```

## Multithreading

```
from concurrent.futures import ThreadPoolExecutor, as_completed, wait, ALL_COMPLETED


def multi_threads_pool(num, func, lists):
    """Run func in a thread pool, once per argument list in lists."""
    threads_pool = ThreadPoolExecutor(num)

    # If the function takes a single argument:
    # threads = [threads_pool.submit(func, src) for src in srcs]

    # If the function takes several arguments stored in a list,
    # one option is a lambda that unpacks them:
    # list_var1 = ['https://img3.doubanio.com/view/celebrity/s_ratio_celebrity/public/p751.webp', 'hello']
    # list_var2 = ['https://img3.doubanio.com/view/celebrity/s_ratio_celebrity/public/p2414157745.webp', 'world']
    # threads_pool.submit(lambda p: download(*p), list_var1)
    # threads_pool.submit(lambda p: download(*p), list_var2)

    threads = [threads_pool.submit(lambda p: func(*p), args) for args in lists]
    # for t in threads:
    #     print(t.done())

    for future in as_completed(threads):
        # as_completed yields each future as soon as its task finishes
        data = future.result()
        print(f"{data}")
    # wait blocks the calling thread until the given condition is met
    wait(threads, return_when=ALL_COMPLETED)
```

## A short introduction to multithreading

### When the function takes a single argument

```
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait, ALL_COMPLETED


def test(num):
    items = []
    for i in range(0, num):
        items.append(i)
    return sum(i for i in range(0, num))


def multi_threads_pool(num, func):
    """Run func once in a thread pool."""
    threads_pool = ThreadPoolExecutor(num)
    threads = [threads_pool.submit(func, 10000)]
    for future in as_completed(threads):
        # as_completed yields each future as soon as its task finishes
        data = future.result()
        print(f"{data}")
    # wait blocks the calling thread until the given condition is met
    wait(threads, return_when=ALL_COMPLETED)


start = time.time()
multi_threads_pool(4, test)
end = time.time()
print(f'Elapsed: {end - start} s')
```

```
49995000
Elapsed: 0.01598644256591797 s
```


### When the function takes several arguments

```
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait, ALL_COMPLETED


def test(num1, num2):
    items = []
    for i in range(0, num1):
        for j in range(0, num2):
            items.append(i + j)
    sum1 = sum(y for y in range(0, num1))
    sum2 = sum(n for n in range(0, num2))
    return sum1 + sum2


def multi_threads_pool(num, func):
    """Run func once in a thread pool with two arguments."""
    threads_pool = ThreadPoolExecutor(num)
    # submit forwards extra positional arguments to func
    threads = [threads_pool.submit(func, 1000, 1000)]
    for future in as_completed(threads):
        # as_completed yields each future as soon as its task finishes
        data = future.result()
        print(f"{data}")
    # wait blocks the calling thread until the given condition is met
    wait(threads, return_when=ALL_COMPLETED)


start = time.time()
multi_threads_pool(4, test)
end = time.time()
print(f'Elapsed: {end - start} s')
```

```
999000
Elapsed: 0.6536271572113037 s
```


### Running several sums

- Method 1: pass the arguments directly

```
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait, ALL_COMPLETED


def test(num1, num2):
    items = []
    for i in range(0, num1):
        for j in range(0, num2):
            items.append(i + j)
    sum1 = sum(y for y in range(0, num1))
    sum2 = sum(n for n in range(0, num2))
    return sum1 + sum2


def multi_threads_pool(num, func, list1, list2):
    """Run func in a thread pool, once per (num1, num2) pair."""
    threads_pool = ThreadPoolExecutor(num)
    threads = [threads_pool.submit(func, num1, num2) for num1, num2 in zip(list1, list2)]
    for future in as_completed(threads):
        # as_completed yields each future as soon as its task finishes,
        # so results arrive in completion order, not submission order
        data = future.result()
        print(f"{data}")
    # wait blocks the calling thread until the given condition is met
    wait(threads, return_when=ALL_COMPLETED)


start = time.time()
list1 = [i for i in range(10, 100, 10)]
list2 = [i for i in range(20, 200, 20)]
print(list1, list2)

multi_threads_pool(4, test, list1, list2)
end = time.time()
print(f'Elapsed: {end - start} s')
```

```
[10, 20, 30, 40, 50, 60, 70, 80, 90] [20, 40, 60, 80, 100, 120, 140, 160, 180]
8910
2205
6175
3940
235
970
12145
15880
20115
Elapsed: 0.06396269798278809 s
```


```
list1 = [i for i in range(10, 100, 10)]
list2 = [i for i in range(20, 200, 20)]
print(list1, list2)

mylist = []
for a, b in zip(list1, list2):
    mylist.append([a, b])
print(mylist)
```

```
[10, 20, 30, 40, 50, 60, 70, 80, 90] [20, 40, 60, 80, 100, 120, 140, 160, 180]
[[10, 20], [20, 40], [30, 60], [40, 80], [50, 100], [60, 120], [70, 140], [80, 160], [90, 180]]
```


- Method 2: put the arguments in a `list` and unpack them with a `lambda`

```
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait, ALL_COMPLETED


def test(num1, num2):
    items = []
    for i in range(0, num1):
        for j in range(0, num2):
            items.append(i + j)
    sum1 = sum(y for y in range(0, num1))
    sum2 = sum(n for n in range(0, num2))
    return sum1 + sum2


def multi_threads_pool(num, func, lists):
    """Run func in a thread pool, once per argument list in lists."""
    threads_pool = ThreadPoolExecutor(num)
    # The lambda unpacks each [num1, num2] list into positional arguments
    threads = [threads_pool.submit(lambda p: func(*p), args) for args in lists]
    for future in as_completed(threads):
        # as_completed yields each future as soon as its task finishes
        data = future.result()
        print(f"{data}")
    # wait blocks the calling thread until the given condition is met
    wait(threads, return_when=ALL_COMPLETED)


list1 = [i for i in range(10, 100, 10)]
list2 = [i for i in range(20, 200, 20)]

mylist = []
for a, b in zip(list1, list2):
    mylist.append([a, b])
print(mylist)

start = time.time()
multi_threads_pool(4, test, mylist)
end = time.time()
print(f'Elapsed: {end - start} s')
```

```
[[10, 20], [20, 40], [30, 60], [40, 80], [50, 100], [60, 120], [70, 140], [80, 160], [90, 180]]
970
2205
3940
235
6175
8910
20115
12145
15880
Elapsed: 0.41576218605041504 s
```
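Both methods print results in completion order, which is why the sums come out shuffled. When results are wanted in input order, `Executor.map` is a possible alternative. The following is a sketch, not the approach used in the crawler in this post; `pair_sum` is a name invented here that returns the same value as `test()` without building the intermediate list.

```python
from concurrent.futures import ThreadPoolExecutor


def pair_sum(num1, num2):
    """Same result as test() above: sum(range(num1)) + sum(range(num2))."""
    return sum(range(num1)) + sum(range(num2))


list1 = list(range(10, 100, 10))
list2 = list(range(20, 200, 20))

# The with-block shuts the pool down once all tasks have finished.
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() takes one iterable per argument and yields results in input order.
    results = list(pool.map(pair_sum, list1, list2))

print(results)
```

Because `map` accepts one iterable per parameter, neither the `zip` step nor the `lambda` is needed here.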


## Multiprocessing

```
import os
import time
from concurrent.futures import ProcessPoolExecutor, as_completed, wait, ALL_COMPLETED


def test(num1, num2):
    items = []
    for i in range(0, num1):
        for j in range(0, num2):
            items.append(i + j)
    sum1 = sum(y for y in range(0, num1))
    sum2 = sum(n for n in range(0, num2))
    print('Starting download process, PID [%d].' % os.getpid())
    return ((sum1 + sum2), 'PID ' + str(os.getpid()))


def multi_process_pool(num, func, list1, list2):
    """Run func in a process pool, once per (num1, num2) pair."""
    processes_pool = ProcessPoolExecutor(max_workers=num)
    # processes = [processes_pool.submit(lambda p: func(*p), list) for list in lists]
    processes = [processes_pool.submit(func, num1, num2) for num1, num2 in zip(list1, list2)]
    for process in as_completed(processes):
        # as_completed yields each future as soon as its task finishes
        data = process.result()
        print(f"{data}")

    # wait blocks the calling process until the given condition is met
    wait(processes, return_when=ALL_COMPLETED)


# The guard keeps worker processes from re-running this block on import
if __name__ == '__main__':
    list1 = [i for i in range(10, 100, 10)]
    list2 = [i for i in range(20, 200, 20)]

    mylist = []
    for a, b in zip(list1, list2):
        mylist.append([a, b])
    print(mylist)

    start = time.time()
    multi_process_pool(4, test, list1, list2)
    end = time.time()
    print(f'Elapsed: {end - start} s')
```

```
[[10, 20], [20, 40], [30, 60], [40, 80], [50, 100], [60, 120], [70, 140], [80, 160], [90, 180]]
Starting download process, PID [4036].
Starting download process, PID [4037].
Starting download process, PID [4035].
Starting download process, PID [4038].
Starting download process, PID [4036].
Starting download process, PID [4035].
Starting download process, PID [4037].
Starting download process, PID [4036].
Starting download process, PID [4035].
(2205, 'PID 4037')
(970, 'PID 4036')
(235, 'PID 4035')
(3940, 'PID 4038')
(6175, 'PID 4036')
(12145, 'PID 4035')
(8910, 'PID 4037')
(15880, 'PID 4036')
(20115, 'PID 4035')
Elapsed: 0.12399792671203613 s
```

## Bugs

-  bug1

    ```
    concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending
    ```

    **Cause:**
    - The error appears only with the process pool; a single process does not trigger it.
    - It is environment-related: it occurs in Jupyter Notebook on Windows and does not occur on Ubuntu.

    **Fix:**
    Run the code from `main`:

    ```
    def main():
        # your code
        pass

    if __name__ == '__main__':
        main()
    ```

-  bug2

    ```
    _pickle.PicklingError: Can't pickle <function parse_poster at 0x0B2BFA98>: it's not the same object as __main__.parse_poster
    ```
    **Fix:**
    I use PyCharm as my development environment. Re-running the `poster_parse.py` file there worked without errors.  
    In Jupyter Notebook on Windows, bug1 appears instead.
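Both errors involve handing work to another process. A process pool has to pickle the submitted callable and its arguments, and pickle stores a function as a reference to its importable name. That is a likely reason why the `lambda p: func(*p)` variant is commented out in the process-pool code above, although the post does not say so explicitly. The sketch below checks picklability directly without starting any processes; `can_pickle` is a helper invented for this illustration.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# A function importable by name is pickled as a reference to that name.
print(can_pickle(math.sqrt))

# A lambda has no importable name, so it cannot be sent to a worker process.
print(can_pickle(lambda p: p))
```

A thread pool shares memory with the caller and never pickles anything, which is why the lambda approach is fine with `ThreadPoolExecutor`.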

## Multiprocess & multithreaded crawling
The complete code:

```
# poster_parse.py
import json
import os
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed, wait, ALL_COMPLETED

import requests


def download(src, img_name, query):
    """Download one image and report the outcome."""
    path = './' + query + '/' + str(img_name) + '.jpg'
    try:
        pic = requests.get(src, timeout=10)
        with open(path, 'wb') as fp:
            fp.write(pic.content)
        return 'Downloaded:' + str(img_name), src
    except requests.exceptions.ConnectionError:
        return 'ConnectionError:' + str(img_name) + ' could not be downloaded!'
    except OSError:
        return 'OSError:' + str(img_name) + ' could not be downloaded!'


def multi_threads_pool(num, func, lists):
    """Run func in a thread pool, once per argument list in lists."""
    threads_pool = ThreadPoolExecutor(num)
    threads = [threads_pool.submit(lambda p: func(*p), args) for args in lists]

    for future in as_completed(threads):
        # as_completed yields each future as soon as its task finishes
        data = future.result()
        print(f"{data}")

    # wait blocks the calling thread until the given condition is met
    wait(threads, return_when=ALL_COMPLETED)


def multi_process_pool(num, func, urls, query):
    """Run func in a process pool, once per URL."""
    processes_pool = ProcessPoolExecutor(max_workers=num)
    # processes = [processes_pool.submit(lambda p: func(*p), list) for list in lists]
    processes = [processes_pool.submit(func, url, query) for url in urls]
    for process in as_completed(processes):
        # as_completed yields each future as soon as its task finishes
        data = process.result()
        print(f"{data}")

    # wait blocks the calling process until the given condition is met
    wait(processes, return_when=ALL_COMPLETED)


def crawling_poster(url, query):
    """Fetch one result page and download every image it lists."""
    html = requests.get(url).text
    response = json.loads(html)
    srcs = []
    ids = []
    for image in response['images']:
        srcs.append(image['src'])
        ids.append(image['id'])
    lists = []
    for src, img_id in zip(srcs, ids):
        lists.append([src, img_id, query])
    print('Starting download process, PID [%d].' % os.getpid())
    start = time.time()
    # Download the images with a thread pool
    multi_threads_pool(8, download, lists)

    # Alternative: download the images in a single thread
    # for src, img_id in zip(srcs, ids):
    #     download(src, img_id, query)
    end = time.time()
    print(f"This page's images took {end - start} s to download")
    return url


def main():
    query = '王祖贤'

    # Create the output folder
    folder_name = query
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)

    urls = []
    for i in range(0, 500, 20):
        urls.append('https://www.douban.com/j/search_photo?q=' + query + '&limit=20&start=' + str(i))

    start = time.time()

    # Process pool
    # multi_process_pool(2, crawling_poster, urls, query)

    # Single process
    for url in urls:
        print(crawling_poster(url, query))

    end = time.time()
    print(f'\nDownload Finished!\nElapsed: {end - start} s')


if __name__ == '__main__':
    main()
```

```
Starting download process, PID [17124].
('Downloaded:637081759', 'https://img1.doubanio.com/view/photo/photo/public/p637081759.jpg')
('Downloaded:642504659', 'https://img1.doubanio.com/view/photo/thumb/public/p642504659.jpg')
('Downloaded:642504623', 'https://img3.doubanio.com/view/photo/thumb/public/p642504623.jpg')
('Downloaded:642504673', 'https://img3.doubanio.com/view/photo/thumb/public/p642504673.jpg')
('Downloaded:642504596', 'https://img3.doubanio.com/view/photo/thumb/public/p642504596.jpg')
('Downloaded:642504722', 'https://img3.doubanio.com/view/photo/thumb/public/p642504722.jpg')
('Downloaded:637714342', 'https://img3.doubanio.com/view/photo/thumb/public/p637714342.jpg')
('Downloaded:642507941', 'https://img3.doubanio.com/view/photo/thumb/public/p642507941.jpg')
('Downloaded:642508003', 'https://img3.doubanio.com/view/photo/thumb/public/p642508003.jpg')
('Downloaded:642507983', 'https://img3.doubanio.com/view/photo/thumb/public/p642507983.jpg')
('Downloaded:642508108', 'https://img1.doubanio.com/view/photo/thumb/public/p642508108.jpg')
('Downloaded:642508066', 'http

('Downloaded:398416089', 'https://img1.doubanio.com/view/photo/thumb/public/p398416089.jpg')
('Downloaded:398416110', 'https://img3.doubanio.com/view/photo/thumb/public/p398416110.jpg')
('Downloaded:398416159', 'https://img1.doubanio.com/view/photo/thumb/public/p398416159.jpg')
('Downloaded:398416294', 'https://img3.doubanio.com/view/photo/thumb/public/p398416294.jpg')
This page's images took 7.3727827072143555 s to download
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=80
Starting download process, PID [17124].
('Downloaded:398416404', 'https://img3.doubanio.com/view/photo/thumb/public/p398416404.jpg')
('Downloaded:398416434', 'https://img3.doubanio.com/view/photo/thumb/public/p398416434.jpg')
('Downloaded:398416464', 'https://img3.doubanio.com/view/photo/thumb/public/p398416464.jpg')
('Downloaded:398416424', 'https://img3.doubanio.com/view/photo/thumb/public/p398416424.jpg')
('Downloaded:398416604', 'https://img3.doubanio.com/view/photo/thumb/public/p398416604.jpg')
('Downloaded:398416648', 'https://img1.doubanio.com/view/photo/thumb/public/p398416648.jpg')
('Downloaded:398416772', 'ht

('Downloaded:392878097', 'https://img1.doubanio.com/view/photo/thumb/public/p392878097.jpg')
('Downloaded:392880946', 'https://img3.doubanio.com/view/photo/thumb/public/p392880946.jpg')
('Downloaded:392879147', 'https://img1.doubanio.com/view/photo/thumb/public/p392879147.jpg')
('Downloaded:392881257', 'https://img1.doubanio.com/view/photo/thumb/public/p392881257.jpg')
('Downloaded:392882457', 'https://img1.doubanio.com/view/photo/photo/public/p392882457.jpg')
('Downloaded:392884420', 'https://img3.doubanio.com/view/photo/thumb/public/p392884420.jpg')
('Downloaded:392884282', 'https://img3.doubanio.com/view/photo/thumb/public/p392884282.jpg')
('Downloaded:392884564', 'https://img3.doubanio.com/view/photo/thumb/public/p392884564.jpg')
('Downloaded:392885494', 'https://img3.doubanio.com/view/photo/thumb/public/p392885494.jpg')
('Downloaded:392889149', 'https://img1.doubanio.com/view/photo/thumb/public/p392889149.jpg')
('Downloaded:392887558', 'https://img1.doubanio.com/view/photo/photo/public/p392887558.jpg')
('Downloaded:392887283', 'https://img3.doubanio.c

('Downloaded:743480644', 'https://img3.doubanio.com/view/photo/thumb/public/p743480644.jpg')
This page's images took 5.560817718505859 s to download
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=260
Starting download process, PID [17124].
('Downloaded:743480739', 'https://img1.doubanio.com/view/photo/thumb/public/p743480739.jpg')
('Downloaded:743480912', 'https://img3.doubanio.com/view/photo/thumb/public/p743480912.jpg')
('Downloaded:743481049', 'https://img1.doubanio.com/view/photo/photo/public/p743481049.jpg')
('Downloaded:743480865', 'https://img3.doubanio.com/view/photo/thumb/public/p743480865.jpg')
('Downloaded:743480668', 'https://img1.doubanio.com/view/photo/thumb/public/p743480668.jpg')
('Downloaded:743481062', 'https://img3.doubanio.com/view/photo/thumb/public/p743481062.jpg')
('Downloaded:743480983', 'https://img3.doubanio.com/view/photo/thumb/public/p743480983.jpg')
('Downloaded:743480763', 'https://img3.doubanio.com/view/photo/thumb/public/p743480763.jpg')
('Downloaded:743481071', 'https://img3.doubanio.com/view/photo/thumb/public/p743481071.jpg')
('Downloaded:743481112', 'ht

('Downloaded:743477850', 'https://img3.doubanio.com/view/photo/thumb/public/p743477850.jpg')
('Downloaded:743477873', 'https://img3.doubanio.com/view/photo/thumb/public/p743477873.jpg')
('Downloaded:743477954', 'https://img3.doubanio.com/view/photo/thumb/public/p743477954.jpg')
('Downloaded:743478005', 'https://img3.doubanio.com/view/photo/thumb/public/p743478005.jpg')
('Downloaded:743477984', 'https://img3.doubanio.com/view/photo/thumb/public/p743477984.jpg')
('Downloaded:743478024', 'https://img3.doubanio.com/view/photo/thumb/public/p743478024.jpg')
('Downloaded:743471054', 'https://img3.doubanio.com/view/photo/thumb/public/p743471054.jpg')
('Downloaded:743478048', 'https://img1.doubanio.com/view/photo/thumb/public/p743478048.jpg')
('Downloaded:743471116', 'https://img3.doubanio.com/view/photo/thumb/public/p743471116.jpg')
('Downloaded:743471141', 'https://img3.doubanio.com/view/photo/thumb/public/p743471141.jpg')
('Downloaded:743471309', 'https://img1.doubanio.com/view/photo/thumb/public/p743471309.jpg')
('Downloaded:743471151', 'https://img3.doubanio.c

This page's images took 8.402345895767212 s to download
https://www.douban.com/j/search_photo?q=王祖贤&limit=20&start=440
Starting download process, PID [17124].
('Downloaded:742958868', 'https://img1.doubanio.com/view/photo/thumb/public/p742958868.jpg')
('Downloaded:742958720', 'https://img3.doubanio.com/view/photo/photo/public/p742958720.jpg')
('Downloaded:742958146', 'https://img3.doubanio.com/view/photo/photo/public/p742958146.jpg')
('Downloaded:742958786', 'https://img3.doubanio.com/view/photo/thumb/public/p742958786.jpg')
('Downloaded:742958920', 'https://img3.doubanio.com/view/photo/thumb/public/p742958920.jpg')
('Downloaded:742959441', 'https://img3.doubanio.com/view/photo/thumb/public/p742959441.jpg')
('Downloaded:742959545', 'https://img3.doubanio.com/view/photo/photo/public/p742959545.jpg')
('Downloaded:742959560', 'https://img3.doubanio.com/view/photo/photo/public/p742959560.jpg')
('Downloaded:742959697', 'https://img1.doubanio.com/view/photo/thumb/public/p742959697.jpg')
('Downloaded:742959607', 'https://img1.doubanio.com/view/photo/photo/public/p742959607.jpg')
('Downloaded:742961013', 'ht
```

![title](https://raw.githubusercontent.com/hufe09/GitNote-Images/master/gitnote/2019/05/14/1557765183198-1557765183221.png)