This package scrapes posts from DCInside and saves them as a TSV file.
pip install dcinside_scraper
or
pip install git+https://github.com/kes09021/dcinside_scraper.git
dcinside-scraper 'gallery_url' start_page end_page output.tsv --sleep 1
- gallery_url: The URL of the DCInside gallery
- start_page: The starting page number
- end_page: The ending page number
- output_file: The output TSV file
- --sleep: (Optional) Time to sleep between page requests (in seconds)
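If you would rather drive the command from a Python script, the arguments above can be assembled in the documented order and passed to the installed dcinside-scraper executable. This is only a sketch: build_command and run_scraper are hypothetical helper names, not part of the package, and the page-range check is an assumption about sensible input rather than documented behavior.

```python
import shlex
import subprocess
from typing import List, Optional


def build_command(gallery_url: str, start_page: int, end_page: int,
                  output_file: str, sleep: Optional[float] = None) -> List[str]:
    """Assemble the dcinside-scraper argument list in the documented order.

    Hypothetical helper; it only mirrors the command line shown above.
    """
    # Assumption: pages are numbered from 1 and the range is ascending.
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    cmd = ["dcinside-scraper", gallery_url,
           str(start_page), str(end_page), output_file]
    if sleep is not None:
        # --sleep is optional, so it is appended only when requested.
        cmd += ["--sleep", str(sleep)]
    return cmd


def run_scraper(cmd: List[str]) -> None:
    """Run the command; requires the dcinside-scraper executable on PATH."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command(
        "https://gall.dcinside.com/mgallery/board/lists/?id=thesingularity",
        1, 5, "output.tsv", sleep=1,
    )
    # Print the shell-quoted command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the arguments as a list avoids shell quoting problems with the ? and = characters in the gallery URL.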
To scrape posts from the DCInside gallery with the ID "thesingularity" from page 1 to page 5 and save the results to output.tsv
with a 1-second delay between requests, use the following command:
dcinside-scraper 'https://gall.dcinside.com/mgallery/board/lists/?id=thesingularity' 1 5 output.tsv --sleep 1
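Once the command finishes, output.tsv can be read back with the standard library. The sketch below assumes the file is UTF-8 encoded and begins with a header row; the column names are not listed in this README, so the code takes them from the file instead of hard-coding any. load_posts is a hypothetical helper name.

```python
import csv
from typing import Dict, List


def load_posts(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated file into a list of row dictionaries.

    Assumes a header row and UTF-8 encoding; neither is confirmed
    by the README.
    """
    with open(path, newline="", encoding="utf-8") as f:
        # DictReader takes the field names from the first line of the file.
        return list(csv.DictReader(f, delimiter="\t"))
```

Calling load_posts("output.tsv") returns one dictionary per scraped row, keyed by whatever header names the file contains.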
- Python 3.6+
- requests
- beautifulsoup4
- pandas
These dependencies will be installed automatically when you install the package via pip.
If you want to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes.
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Create a new Pull Request.
This project is licensed under the MIT License; see the LICENSE file for details.
For any questions or suggestions, please open an issue or contact the repository owner.