Chan scraper

This program downloads attachments from threads on 2ch and 4chan. You can choose what to download: images, videos, or all files.

Requirements

Python with pip, plus the packages listed in requirements.txt (installation is covered below).

Installation

Clone the repo using this command:

git clone https://github.com/m3tro1d/chan-scraper

Or just download the zip.

Install the dependencies:

python -m pip install -r requirements.txt
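
If you prefer to keep the project's dependencies isolated, the usual virtual environment workflow applies here as well (this is standard Python tooling, not something the project requires):

python -m venv venv
source venv/bin/activate    # on Windows: venv\Scripts\activate
python -m pip install -r requirements.txt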

The master branch is usually stable, so cloning it directly should work without issues.

Usage

Usage: chan-scraper.py [OPTIONS] URL [URL]...

URL:
  Thread's URL

Options:
  -h,  --help     show help
  -m,  --mode     specify content for downloading:
                  all, images, videos (def: all)
  -p,  --pause    make a pause after each download
                  useful if the server throttles (def: False)
  -o,  --output   output directory (def: current)

For more information visit:
https://github.com/m3tro1d/chan-scraper

For example:

python chan-scraper.py -o img -m images https://2ch.hk/s/res/2127464.html

This will download all images from thread 2127464 on /s/ into the img folder.

Another one:

python chan-scraper.py -o threads https://boards.4channel.org/g/thread/77369090 https://boards.4channel.org/g/thread/77368911

This will download all files from both threads and place them into separate folders, named after their thread numbers, inside the threads folder.
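
If a server starts throttling you, the -p option described above can be combined with any of these examples. For instance, a run that downloads only the videos from one of the threads above, pausing after each file (the vids output directory is just an example name):

python chan-scraper.py -p -m videos -o vids https://boards.4channel.org/g/thread/77369090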

Attention: by default, if the directory selected with the -o option already exists and contains a file with a conflicting name, the existing file will not be replaced.

Extending

If you want to add support for another imageboard, there is a simple scheme for an 'extractor'. It is a class containing the following properties:

  • name - string representing the imageboard's name. For example: self.name = "fourchan". This is used for naming the directories when downloading multiple threads;
  • match() - a static method that returns a re.match object. It determines which URLs the extractor supports;
  • thread_number - int with thread's number according to the URL;
  • get_files_urls_names() - a method that returns a tuple (or list) of tuples, each containing a file's URL and name.

The constructor (i.e. __init__) must raise an exception if a network error is encountered; all error handling is done in the Scraper class.
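
To make this concrete, here is a minimal sketch of what such an extractor could look like. The class name, URL pattern, API endpoint and JSON layout are invented for illustration; only the attribute and method names (name, match(), thread_number, get_files_urls_names()) and the "let network errors propagate" rule come from the description above.

```python
import re

import requests


class ExampleExtractor:
    """Sketch of a hypothetical extractor; the URL pattern, API endpoint
    and JSON layout below are made up for illustration."""

    # Thread URLs this extractor claims to support (illustrative pattern)
    URL_PATTERN = re.compile(r"https?://example-chan\.org/(\w+)/thread/(\d+)")

    def __init__(self, url):
        # Used for naming directories when downloading multiple threads
        self.name = "examplechan"

        match = self.match(url)
        self.board = match.group(1)
        # Thread's number according to the URL
        self.thread_number = int(match.group(2))

        # Any network error is allowed to propagate: the Scraper class
        # is responsible for handling it
        response = requests.get(
            f"https://example-chan.org/api/{self.board}/{self.thread_number}.json"
        )
        response.raise_for_status()
        self._thread = response.json()

    @staticmethod
    def match(url):
        """Return a re.match object if the URL belongs to this imageboard,
        otherwise None."""
        return ExampleExtractor.URL_PATTERN.match(url)

    def get_files_urls_names(self):
        """Return a list of (file_url, file_name) tuples for every
        attachment in the thread."""
        files = []
        for post in self._thread.get("posts", []):
            for attachment in post.get("files", []):
                files.append((attachment["url"], attachment["name"]))
        return files
```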

Also make sure to modify the Scraper constructor: import your extractor and add it to the list self.extractors.
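
A rough sketch of that registration step (the module path is hypothetical, and whether the list holds classes or instances depends on the existing code, so mirror what the other entries do):

```python
# Inside Scraper.__init__ - illustrative only
from example_extractor import ExampleExtractor  # hypothetical module

self.extractors = [
    # ...the extractors that already ship with the project...
    ExampleExtractor,
]
```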

The underlying implementation is up to you, but I suggest reading the documentation on the imageboard's API and using it if possible. Also refer to the existing extractors for more practical details.

TODO

  • Skip a thread if it yields an HTTP error and continue with the other threads (and files)
  • Option to pause after each download to prevent server throttling
  • Rewrite the script to make it more modular and easier to maintain and extend
  • Print the full information (summary) at the end of the downloading (make it an option?)
  • Add usercode_auth optional cookie code for dvach restricted boards (as an input argument)
  • Use thread's header text for naming the output folders
  • Add option for saving with poster's filename
