Initial Commit
Jeffrey Nappi committed May 1, 2012
1 parent 4a4a2b8 commit ec2cd67
Showing 1 changed file (README.md) with 24 additions and 1 deletion.
gearnado
========


Experimental Distributed Web Crawling with Python + Gearman


Setup Instructions for Ubuntu:

$ sudo apt-get install git gearman libgearman-dev python-setuptools build-essential libxml2-dev libxslt-dev python-dev

$ sudo easy_install pyquery gearman tornado

If you need more than 1024 simultaneous connections on a single machine, edit /etc/security/limits.conf and raise the soft and hard nofile limits. This is the per-process cap on open file descriptors, and every open socket uses one.
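Entries in /etc/security/limits.conf are applied by pam_limits when a session starts, so log out and back in after editing the file. Within the hard limit, a process can also raise its own soft limit at startup. Below is a minimal sketch of that step using Python's standard `resource` module (Unix only, tested mentally against Linux semantics). `raise_nofile_soft_limit` is a hypothetical helper and is not part of gearnado:

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    Hypothetical helper (not part of gearnado): an unprivileged process may
    raise its soft limit up to the hard limit, but not the hard limit itself.
    This function never lowers the soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # cannot go past the hard limit
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For example, calling `raise_nofile_soft_limit(65536)` near the top of a worker script would lift the soft limit to 65536 or to the hard limit, whichever is lower.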

Clone the Git Repo:

$ git clone https://github.com/iAcquire/gearnado
$ cd gearnado

Launch 30 TweetScout workers in one terminal:

$ for i in `seq 1 30`; do ./TweetScout.py & done
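The loop starts 30 background copies of TweetScout.py and leaves them running. From the shell you can stop them later with, for example, `pkill -f TweetScout.py`. If you would rather manage the pool from Python, here is a minimal sketch using the standard `subprocess` module. `launch_workers` and `stop_workers` are hypothetical helpers and are not part of gearnado:

```python
import subprocess


def launch_workers(argv, count):
    """Start `count` background copies of the command `argv`.

    Hypothetical helper: this mirrors the README's shell loop, but it returns
    the Popen handles so the caller can wait on or stop the workers later.
    """
    return [subprocess.Popen(argv) for _ in range(count)]


def stop_workers(procs):
    """Send SIGTERM to every worker, then reap each one so no zombies remain."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        proc.wait()
```

For example, `workers = launch_workers(["./TweetScout.py"], 30)` starts the same 30 workers, and `stop_workers(workers)` shuts them down. Like the shell loop, this assumes TweetScout.py is executable and has a valid shebang line.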

And run the TweetHandler in another:

$ time ./TweetHandler.py --url_file=python_crawler_urls.txt
