torbrowser-headers

A collection of the HTTP headers (including User-Agent) sent by different versions of Tor Browser, plus a Scrapy middleware that applies randomly chosen headers to requests.

The headers were taken from freshly installed copies of different Tor Browser versions, available from the Tor package archive: https://archive.torproject.org/tor-package-archive/torbrowser/

Usage

import random
from tor_browser_headers.headers import tor_browser_headers_list

# Pick the headers of one randomly chosen Tor Browser version.
browser_headers = random.choice(tor_browser_headers_list)
print('Version of Tor Browser: {}'.format(browser_headers.tor_browser_version))

# Each entry in .headers has a name and a value.
for header in browser_headers.headers:
    print('\t{}: {}'.format(header.name, header.value))
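The chosen entry can also be turned into a plain dictionary, for example to pass to an HTTP client. The sketch below relies only on the attribute names used in the example above (`tor_browser_version`, `headers`, `name`, `value`); the `HTTPHeader` and `TorBrowserHeaders` namedtuples, `sample_headers_list` and `headers_as_dict` are hypothetical stand-ins, and the values are placeholders rather than real captured headers.

```python
import random
from collections import namedtuple

# Hypothetical stand-ins mirroring the attributes used in the Usage example;
# the real package defines its own classes and data.
HTTPHeader = namedtuple("HTTPHeader", ["name", "value"])
TorBrowserHeaders = namedtuple("TorBrowserHeaders", ["tor_browser_version", "headers"])

# Placeholder entries, not real Tor Browser headers or version numbers.
sample_headers_list = [
    TorBrowserHeaders("version-A", [HTTPHeader("User-Agent", "placeholder-agent-A"),
                                    HTTPHeader("Accept-Language", "en-US,en;q=0.5")]),
    TorBrowserHeaders("version-B", [HTTPHeader("User-Agent", "placeholder-agent-B"),
                                    HTTPHeader("Accept-Language", "en-US,en;q=0.5")]),
]


def headers_as_dict(tor_browser_headers):
    """Convert one entry's list of headers into a {name: value} dict."""
    return {header.name: header.value for header in tor_browser_headers.headers}


chosen = random.choice(sample_headers_list)
request_headers = headers_as_dict(chosen)
```

With the real package, the same helper could be applied to `random.choice(tor_browser_headers_list)`, and the resulting dict passed to a client such as `requests.get(url, headers=request_headers)`.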

Scrapy configuration

Enable middleware

Disable Scrapy's built-in DefaultHeadersMiddleware and UserAgentMiddleware and enable TorBrowserHeadersMiddleware in your settings.py.

In Scrapy >=1.0:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': None,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'tor_browser_headers.middleware.TorBrowserHeadersMiddleware': 500,
}

In Scrapy <1.0:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.defaultheaders.DefaultHeadersMiddleware': None,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'tor_browser_headers.middleware.TorBrowserHeadersMiddleware': 400,
}
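For readers new to Scrapy: a downloader middleware is a plain class whose `process_request(request, spider)` method is called for every outgoing request; returning `None` lets processing continue. The following is a minimal hypothetical sketch of a random-headers middleware, not TorBrowserHeadersMiddleware's actual code; `RandomHeadersMiddleware`, `HEADER_SETS` and the placeholder values are invented for illustration. It does not import Scrapy so that it runs standalone; in a real project `request` would be a `scrapy.Request`, whose `headers` attribute supports item assignment.

```python
import random
from types import SimpleNamespace

# Hypothetical header sets (placeholder values, not real Tor Browser headers).
HEADER_SETS = [
    {"User-Agent": "placeholder-agent-A", "Accept-Language": "en-US,en;q=0.5"},
    {"User-Agent": "placeholder-agent-B", "Accept-Language": "en-US,en;q=0.5"},
]


class RandomHeadersMiddleware:
    """Hypothetical downloader middleware applying one random header set."""

    def __init__(self, header_sets=None, rng=None):
        # No required arguments, so Scrapy could instantiate it directly.
        self.header_sets = header_sets if header_sets is not None else HEADER_SETS
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        chosen = self.rng.choice(self.header_sets)
        for name, value in chosen.items():
            request.headers[name] = value
        # None tells Scrapy to keep processing the request as usual.
        return None


# Standalone usage with a minimal stand-in for scrapy.Request.
fake_request = SimpleNamespace(headers={})
RandomHeadersMiddleware().process_request(fake_request, spider=None)
```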

Configuring headers type (Optional)

By default, TorBrowserHeadersMiddleware sets every HTTP header to the value sent by a freshly installed Tor Browser, so each request looks as if it came from that browser. You can restrict which headers are modified by adding the following to your settings.py:

from tor_browser_headers.model import HTTPHeaderType

TOR_BROWSER_HEADERS__TYPES = [HTTPHeaderType.UserAgent,
                              HTTPHeaderType.Accept,
                              HTTPHeaderType.AcceptLanguage,
                              HTTPHeaderType.AcceptEncoding,
                              HTTPHeaderType.Connection]

If you only need the User-Agent header, do not forget to re-enable DefaultHeadersMiddleware.
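Limiting the modified headers amounts to filtering the chosen header set by type before applying it. A minimal sketch of that idea, assuming a hypothetical `HeaderType` enum and `filter_headers` helper that only mimic `HTTPHeaderType`; the package's real enum and internals may differ, and the header values are placeholders.

```python
from enum import Enum


class HeaderType(Enum):
    """Hypothetical stand-in for the package's HTTPHeaderType."""
    UserAgent = "User-Agent"
    Accept = "Accept"
    AcceptLanguage = "Accept-Language"
    AcceptEncoding = "Accept-Encoding"
    Connection = "Connection"


def filter_headers(headers, allowed_types):
    """Keep only headers whose name (case-insensitive) matches an allowed type."""
    allowed_names = {header_type.value.lower() for header_type in allowed_types}
    return {name: value for name, value in headers.items()
            if name.lower() in allowed_names}


# Placeholder header set, not real captured values.
full_set = {
    "User-Agent": "placeholder-agent",
    "Accept": "text/html",
    "Accept-Language": "en-US,en;q=0.5",
    "Upgrade-Insecure-Requests": "1",
}

# Only User-Agent is kept; every other header is left untouched by this filter.
user_agent_only = filter_headers(full_set, [HeaderType.UserAgent])
```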

Configuring for use with scrapy-proxies (Optional)

To use this middleware together with a random-proxy middleware such as scrapy-proxies, you need to:

  1. set TOR_BROWSER_HEADERS__CONSTANT_HEADERS_FOR_PROXY to True, so that each proxy is always used with the headers of a single Tor Browser version;
  2. give TorBrowserHeadersMiddleware a greater priority number than the scrapy-proxies middleware, so that the proxy is set before the headers are handled.
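Both requirements can be illustrated with a small standalone sketch. `HeadersPerProxy` is hypothetical: it remembers the first header set chosen for each proxy, which is one plausible way to get the behaviour that TOR_BROWSER_HEADERS__CONSTANT_HEADERS_FOR_PROXY describes, not the package's actual code. It assumes the proxy is stored in `request.meta["proxy"]`, the key read by Scrapy's HttpProxyMiddleware. The ordering part reflects Scrapy calling `process_request` in ascending order of the `DOWNLOADER_MIDDLEWARES` numbers; the paths and numbers are illustrative.

```python
import random
from types import SimpleNamespace

# Placeholder header sets, not real Tor Browser headers.
HEADER_SETS = [
    {"User-Agent": "placeholder-agent-A"},
    {"User-Agent": "placeholder-agent-B"},
    {"User-Agent": "placeholder-agent-C"},
]


class HeadersPerProxy:
    """Hypothetical sketch: pin one header set to each proxy."""

    def __init__(self, header_sets=None, rng=None):
        self.header_sets = header_sets if header_sets is not None else HEADER_SETS
        self.rng = rng or random.Random()
        self.by_proxy = {}  # proxy URL -> header set chosen for it

    def process_request(self, request, spider):
        proxy = request.meta.get("proxy")
        if proxy is None:
            # No proxy set yet: fall back to a fresh random choice.
            chosen = self.rng.choice(self.header_sets)
        else:
            if proxy not in self.by_proxy:
                self.by_proxy[proxy] = self.rng.choice(self.header_sets)
            chosen = self.by_proxy[proxy]
        for name, value in chosen.items():
            request.headers[name] = value
        return None


# Lower numbers run process_request first, so the proxy is assigned before
# the headers middleware looks it up. Paths and numbers are illustrative.
middlewares = {
    "tor_browser_headers.middleware.TorBrowserHeadersMiddleware": 500,
    "scrapy_proxies.RandomProxy": 100,
}
request_order = sorted(middlewares, key=middlewares.get)

# Two requests through the same proxy receive identical headers.
pinned = HeadersPerProxy(rng=random.Random(0))
first = SimpleNamespace(meta={"proxy": "http://proxy-1:8080"}, headers={})
second = SimpleNamespace(meta={"proxy": "http://proxy-1:8080"}, headers={})
pinned.process_request(first, spider=None)
pinned.process_request(second, spider=None)
```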