
Initial Commit

1 parent 4a4a2b8, commit ec2cd6786c559bf15579fff39748fa87bf06023d. jeffnappi committed May 1, 2012
Showing with 24 additions and 1 deletion.
  1. +24 −1 README.md
@@ -1,4 +1,27 @@
gearnado
========
-Experimental Distributed Web Crawling with Python + Gearman
+Experimental Distributed Web Crawling with Python + Gearman
+
+
+Setup Instructions for Ubuntu:
+
+ $ sudo apt-get install git gearman libgearman-dev python-setuptools build-essential libxml2-dev libxslt-dev python-dev
+
+ $ sudo easy_install pyquery gearman tornado
+
+If you plan to open more than 1024 simultaneous connections on a single machine, edit /etc/security/limits.conf and raise the soft and hard nofile limits.
+
+Clone the Git Repo:
+
+ $ git clone https://github.com/iAcquire/gearnado
+ $ cd gearnado
+
+Launch 30 TweetScout workers in one terminal:
+
+ $ for i in `seq 1 30`; do ./TweetScout.py & done
+
+And run the TweetHandler in another:
+
+ $ time ./TweetHandler.py --url_file=python_crawler_urls.txt
+
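
The README's note about 1024 simultaneous connections refers to the per-process open-file (nofile) limit. As a rough sketch that is not part of this commit, a process could inspect its own limits at startup with Python's standard `resource` module. The helper names `nofile_limits` and `can_hold` and the `headroom` default are illustrative assumptions, and the snippet assumes Python 3 on a Unix-like system.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def can_hold(connections, headroom=64):
    """Report whether the soft limit leaves room for `connections` sockets.

    `headroom` is an assumed allowance for stdin/stdout, log files and
    other descriptors the process keeps open besides its sockets.
    """
    soft, _hard = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= connections + headroom


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft nofile limit:", soft)
    print("hard nofile limit:", hard)
    print("room for 1024 connections:", can_hold(1024))
```

If the last line reports `False`, the limits in /etc/security/limits.conf likely need raising before a single process attempts that many connections.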
