
Setting an IP proxy for a Python crawler (urllib/requests modules) #1

lznism opened this issue Jan 12, 2018 · 0 comments

Setting a proxy with the urllib module

If we crawl the same site frequently from a single IP, the site is very likely to ban that IP. A common countermeasure is to set a proxy IP:

from urllib import request

# Address of the proxy server (placeholder from the original post)
proxy = 'http://39.134.93.12:80'
# ProxyHandler maps a URL scheme to the proxy that traffic should go through
proxy_support = request.ProxyHandler({'http': proxy})
# Build an opener that uses the handler, then install it globally
opener = request.build_opener(proxy_support)
request.install_opener(opener)
# All subsequent urlopen() calls now go through the proxy
result = request.urlopen('http://baidu.com')

First we construct a ProxyHandler, then use that handler to build the opener that will open pages, and finally install the opener into request.
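Note that install_opener changes every subsequent urlopen() call in the process. If you only want the proxy for some requests, you can call the opener directly instead of installing it. A minimal sketch, using the same placeholder proxy address as above:

from urllib import request

proxy_support = request.ProxyHandler({'http': 'http://39.134.93.12:80'})
opener = request.build_opener(proxy_support)

# Use the opener directly; urlopen() calls elsewhere stay un-proxied
result = opener.open('http://baidu.com')
print(result.getcode())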

Using a proxy with the requests module

Setting a proxy with this module is very easy:

import requests

# Map each URL scheme to the proxy it should use
# (addresses are placeholders from the original post)
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080'
}
# icanhazip.com echoes back the IP the request arrived from,
# so the response should show the proxy's IP, not ours
r = requests.get('http://icanhazip.com', proxies=proxies)
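Free proxies in particular die often, so in practice it helps to set a timeout and catch proxy failures. A minimal sketch with the same placeholder addresses:

import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080'
}

try:
    r = requests.get('http://icanhazip.com', proxies=proxies, timeout=5)
    # Print the IP the target site saw, to confirm the proxy is in use
    print(r.text.strip())
except (requests.exceptions.ProxyError, requests.exceptions.Timeout) as e:
    print('proxy failed:', e)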