gpalsingh/dspbooksspider

DSPBooksSpider

Scrapy spider to download all DSP-related books from http://serv.yanchick.org/Books/


Getting the spider

Note: You need Python 2.7, pip, and git installed.

  1. Clone the GitHub repo to get the spider.

    $ git clone https://github.com/gpalsingh/dspbooksspider.git
  2. Install the dependencies.

    $ pip install scrapy appdirs colorama
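
To confirm the install step worked before running the spider, a quick check like the one below can help. This is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and it assumes each package's import name matches the name passed to pip.

```python
import importlib

# Packages named in the pip command above. Assumption: the import names
# are the same as the pip package names.
REQUIRED = ("scrapy", "appdirs", "colorama")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be imported by the current interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Run it with the same interpreter that pip installed into; otherwise it may report packages as missing that are installed elsewhere.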

Running the spider

  1. Change to the directory that contains the README.md file.

  2. Use the getbooks shell script.

    $ ./getbooks

    Alternatively, run the spider manually.

    $ scrapy crawl dspbooks

Getting the saved data

When the spider runs, it creates a link named saved_books that points to the location where the downloaded data is stored. The exact location is system-dependent. To get the absolute path to the files, run the wherefiles script.

$ ./wherefiles
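
If you would rather resolve the link from Python, something like the following may work. It is only a sketch and does not show how wherefiles is implemented: it assumes saved_books is a symbolic link in the current directory, and the function name resolve_saved_books is hypothetical.

```python
import os


def resolve_saved_books(link_path="saved_books"):
    """Return the absolute path the link points to, or None if it is absent.

    Assumption: saved_books is a symbolic link; the default path is
    relative to the current working directory.
    """
    if not os.path.lexists(link_path):
        return None
    # realpath follows symbolic links and returns an absolute path.
    return os.path.realpath(link_path)


if __name__ == "__main__":
    target = resolve_saved_books()
    if target is None:
        print("saved_books not found; run the spider first.")
    else:
        print(target)
```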
