Skip to content

svetlyak40wt/scrapy-useragents

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commits
 
 
 
 
 
 
 
 

Repository files navigation

scrapy-useragents

This is a middleware for Scrapy framework. It allows to set random User-Agent for each request.

Configuration

Create a text file with one user agent per line.

This way you can grep user agents from Nginx logs:

egrep -h -o '"[^"]+" "[^"]+"$' * | cut -d '"' -f 2 | sort -u > user-agents.txt

Call this file user-agents.txt or use another name and write it's path in the USER_AGENTS_LIST_FILE setting.

Then add it into the middleware list, and remove default middleware:

DOWNLOADER_MIDDLEWARES = {
    # we'll turn off standart user agent middleware
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'scrapy_useragents.middleware.UserAgentsMiddleware': 400,
}

Contribution

Author of this project is Alexander Artemenko. You are welcomed to send patches and pull requests to the GitHub.

About

A middleware to use random user agent in Scrapy crawler.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages