DSPBooksSpider
Scrapy spider to download all DSP-related books hosted at http://serv.yanchick.org/Books/
Note: You need to have Python 2.7, pip, and git pre-installed.
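A quick way to confirm the prerequisites is to ask each tool for its version. The sketch below is not part of the repository; `tool_available` is a hypothetical helper, and it assumes the tools are on your PATH under the names `python`, `pip`, and `git` (some systems use names such as `python2` or `pip2` instead).

```python
import os
import subprocess


def tool_available(command):
    """Return True if `command --version` can be launched and exits with status 0.

    Hypothetical helper for illustration; not part of the spider's repository.
    """
    try:
        with open(os.devnull, "w") as devnull:
            status = subprocess.call(
                [command, "--version"], stdout=devnull, stderr=devnull
            )
        return status == 0
    except OSError:
        # Raised when the executable cannot be found or started.
        return False


if __name__ == "__main__":
    # Tool names are assumptions; adjust them to match your system.
    for name in ("python", "pip", "git"):
        print("%s: %s" % (name, "found" if tool_available(name) else "missing"))
```

This only checks that each tool starts; it does not verify that the Python interpreter is version 2.7.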
- Clone the GitHub repo to get the spider.
$ git clone https://github.com/gpalsingh/dspbooksspider.git
- Install the dependencies.
$ pip install scrapy appdirs colorama
- Move to the directory that contains the README.md file.
- Use the getbooks shell script.
$ ./getbooks
or run the spider manually.
$ scrapy crawl dspbooks
When the spider runs, it creates a link named saved_books that points to the
location of the downloaded data. The exact location is system dependent.
To get the absolute path to the files, run the wherefiles script.
$ ./wherefiles
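If you prefer to look up the location from Python, the link can be resolved with the standard library. This is only a sketch of the idea, not the contents of the wherefiles script; `resolve_link` is a hypothetical helper, and it assumes the saved_books link sits in the current working directory.

```python
import os


def resolve_link(link_path="saved_books"):
    """Return the absolute path that link_path points to, or None if it does not exist.

    Hypothetical helper for illustration; the default path is an assumption.
    """
    # lexists is True for a symlink even when its target is missing.
    if not os.path.lexists(link_path):
        return None
    # realpath follows symbolic links and returns an absolute path.
    return os.path.realpath(link_path)


if __name__ == "__main__":
    target = resolve_link()
    if target is None:
        print("saved_books not found; run the spider first")
    else:
        print(target)
```

Run it from the directory where the spider was started so the relative name saved_books can be found.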