-
Merge commit 'claudehohl/master'
kurokikaze committed Mar 10, 2011 -
saving couchdb documents working again
claudehohl committed Mar 10, 2011 -
claudehohl committed Mar 10, 2011
-
Merge commit 'claudehohl/master'
kurokikaze committed Mar 4, 2011 -
claudehohl committed Mar 4, 2011
-
Somehow updated to 0.1.93, HTTP module still throwing errors sometimes
kurokikaze committed May 6, 2010 -
Added try-catch block around JSON parsing, need to investigate on broken JSON
kurokikaze committed May 6, 2010
-
Added version info to Readme, added indexer info
kurokikaze committed Apr 22, 2010 -
kurokikaze committed Apr 22, 2010 -
Added site to perform searches
kurokikaze committed Apr 22, 2010 -
Fixed numeration of saved pages, removed `published` from XML
kurokikaze committed Apr 22, 2010 -
Fixed numeration of saved pages
kurokikaze committed Apr 22, 2010
-
Fixed error with libxmljs find() after links retrieval
kurokikaze committed Apr 20, 2010 -
Added crawl timeout to settings
kurokikaze committed Apr 20, 2010 -
Preserving querystring with link to parse GET-query pages
kurokikaze committed Apr 20, 2010 -
Added crawl timeout to settings
kurokikaze committed Apr 20, 2010 -
kurokikaze committed Apr 20, 2010
-
Added numeric ID to documents for indexing with Sphinx
kurokikaze committed Apr 19, 2010 -
Refactored parsing/info retrieval part a bit
kurokikaze committed Apr 19, 2010
-
Escaped html body before posting it in Sphinx
kurokikaze committed Apr 16, 2010 -
Added URL field to document, cleaned code a bit
kurokikaze committed Apr 16, 2010
-
Re-worked pipe to stream documents instead of buffering them
kurokikaze committed Apr 15, 2010 -
Removed unintentionally committed archive
kurokikaze committed Apr 15, 2010 -
First try at Sphinx pipe script
kurokikaze committed Apr 15, 2010 -
Made each pseudo-thread use its own http.Client
kurokikaze committed Apr 15, 2010 -
Added ability to crawl in several streams
kurokikaze committed Apr 15, 2010 -
kurokikaze committed Apr 15, 2010 -
kurokikaze committed Apr 15, 2010 -
kurokikaze committed Apr 15, 2010 -
Made spider save crawled pages in CouchDB
kurokikaze committed Apr 15, 2010
-
Changed default site to wikipedia.org, added checks to stay on one domain
kurokikaze committed Mar 25, 2010
-
Removed habra-page files from git
kurokikaze committed Mar 24, 2010 -
Parsing Habrahabr homepage with Parser
kurokikaze committed Mar 24, 2010 -
Spider ready, parsing google homepage
kurokikaze committed Mar 24, 2010