scraping-utils

tl;dr A set of tools for getting the tags of images from various sites.

Compatibility

wallhaven.cc
konachan.net/com*
yande.re

*The konachan script sends its requests to the .com version, since that one does not restrict explicit content and someone may need it.

Tools

wallhaven/auto_sidecar-wh.py

Requires: Python (tested with 3.13.3) and the requests library.
Optionally requires an API key, if you wish to access NSFW content.
Uses the wallhaven.cc API to get tags and purity, then creates a sidecar file for every file in the specified directory.
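A rough sketch of that flow, for illustration only. Assumptions: the files keep wallhaven's default download name (wallhaven-<id>.jpg), and the helper names (wallpaper_id, fetch_info, sidecar_lines, create_sidecars) are hypothetical, not the script's. It calls wallhaven's wallpaper-info endpoint (/api/v1/w/<id>, with the optional apikey query parameter) and uses the standard library's urllib instead of requests so it runs without extra installs.

```python
import json
import re
import time
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Optional

# wallhaven's wallpaper-info endpoint; returns JSON with "tags" and "purity" under "data".
API_URL = "https://wallhaven.cc/api/v1/w/{}"

# Assumption: files keep wallhaven's default download name, e.g. "wallhaven-abc123.jpg".
FILENAME_ID = re.compile(r"^wallhaven-([0-9a-z]+)\.\w+$")


def wallpaper_id(path: Path) -> Optional[str]:
    """Return the wallhaven ID encoded in a file name, or None if it doesn't match."""
    match = FILENAME_ID.match(path.name)
    return match.group(1) if match else None


def fetch_info(wall_id: str, api_key: Optional[str] = None) -> dict:
    """Fetch one wallpaper's info; the API key is only needed for NSFW posts."""
    url = API_URL.format(wall_id)
    if api_key:
        url += "?" + urllib.parse.urlencode({"apikey": api_key})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]


def sidecar_lines(data: dict) -> list[str]:
    """Tag names from the API response, plus a purity:<value> tag."""
    tags = [tag["name"] for tag in data.get("tags", [])]
    tags.append(f"purity:{data['purity']}")
    return tags


def create_sidecars(directory: Path, api_key: Optional[str] = None, delay: float = 1.4) -> None:
    """Write <file>.txt next to every wallhaven download in directory."""
    for media in sorted(directory.iterdir()):
        wall_id = wallpaper_id(media)
        if wall_id is None:
            continue  # not a wallhaven download (this also skips existing .txt sidecars)
        time.sleep(delay)  # 1.4 s between calls keeps us under 45 requests per minute
        lines = sidecar_lines(fetch_info(wall_id, api_key))
        media.with_name(media.name + ".txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Note that the purity value is written exactly as the API returns it (sfw, sketchy or nsfw) in this sketch.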

konachan/auto_rename-kc.py

Requires: Python (tested with 3.13.3).
Renames files downloaded from konachan for use with auto_sidecar-kc.py.
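What the renamer produces isn't spelled out here, so this is only a guess at the idea: pull the post ID out of the download name so the sidecar script can look the post up. It assumes konachan's default download name looks like "Konachan.com - 123456 tag_a tag_b.jpg" and renames to a hypothetical "<id>.<ext>" format; the function names are made up for illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: downloads are named "Konachan.com - <post id> <tags...>.<ext>".
DOWNLOAD_NAME = re.compile(r"^Konachan\.(?:com|net) - (\d+)\b")


def post_id(filename: str) -> Optional[str]:
    """Return the post ID from a konachan download name, or None."""
    match = DOWNLOAD_NAME.match(filename)
    return match.group(1) if match else None


def rename_downloads(directory: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename matching files to '<id><ext>' (hypothetical target format).

    Returns the (old, new) name pairs; with dry_run=True nothing is renamed.
    """
    renames = []
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        pid = post_id(path.name)
        if pid is None:
            continue
        target = path.with_name(pid + path.suffix)
        if target.exists():
            continue  # never overwrite an existing file
        renames.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return renames
```

Running with dry_run=True first lets you check the planned names before anything is touched.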

konachan/auto_sidecar-kc.py

Requires: Python (tested with 3.13.3) and the beautifulsoup4 library.
Uses BeautifulSoup to scrape konachan.com for tags and purity, then creates a sidecar file for every file in the specified directory.
Unfortunately, I could not find a way to get data for a single post through konachan's API. Sorry if there is one and I'm just dumb.
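A minimal scraping sketch, for illustration. The script itself uses BeautifulSoup; this uses the standard library's html.parser so it runs without extra installs. The post URL pattern and the markup it looks for (a ul with id "tag-sidebar" holding /post?tags= links, and a "Rating: Safe" list item) are assumptions about moebooru-style pages, not taken from the script, and the class and function names are hypothetical.

```python
import urllib.request
from html.parser import HTMLParser

# Assumption: moebooru-style post page URL.
POST_URL = "https://konachan.com/post/show/{}"


class TagSidebarParser(HTMLParser):
    """Collect tag names and the rating from an (assumed) moebooru post page."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.rating = None
        self._in_sidebar = False   # inside <ul id="tag-sidebar">
        self._in_tag_link = False  # inside <a href="/post?tags=...">
        self._li_text = None       # text of an <li> outside the sidebar

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and attrs.get("id") == "tag-sidebar":
            self._in_sidebar = True
        elif tag == "a" and self._in_sidebar and (attrs.get("href") or "").startswith("/post?tags="):
            self._in_tag_link = True
        elif tag == "li" and not self._in_sidebar:
            self._li_text = ""

    def handle_endtag(self, tag):
        if tag == "ul" and self._in_sidebar:
            self._in_sidebar = False
        elif tag == "a":
            self._in_tag_link = False
        elif tag == "li" and self._li_text is not None:
            text = self._li_text.strip()
            if text.startswith("Rating:"):
                self.rating = text.split(":", 1)[1].strip().lower()
            self._li_text = None

    def handle_data(self, data):
        if self._in_tag_link and data.strip():
            self.tags.append(data.strip())
        if self._li_text is not None:
            self._li_text += data


def sidecar_lines(html: str) -> list[str]:
    """Tags from the sidebar plus a purity:<rating> tag, if a rating was found."""
    parser = TagSidebarParser()
    parser.feed(html)
    lines = list(parser.tags)
    if parser.rating:
        lines.append(f"purity:{parser.rating}")
    return lines


def fetch_post_html(post_id: str) -> str:
    """Download one post page (network access needed)."""
    req = urllib.request.Request(POST_URL.format(post_id), headers={"User-Agent": "scraping-utils sketch"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Because everything hinges on the page markup, a site redesign silently yields an empty tag list, which is the fragility the Notes below mention.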

The yandere/* scripts are almost identical to the konachan ones.

Timeout and Rate Limiting

The sidecar creators have three parameters that space out requests so you don't get rate limited:

time_between_requests: how long to sleep before every request.
timeout_time: how long to sleep during a longer pause (a "timeout").
timeout_request_count: how many requests to make before each pause; every timeout_request_count requests, the script waits timeout_time seconds.

wallhaven.cc states that you can make 45 requests per minute, so set the parameters to stay within that limit.
I could not find such info for konachan and yande.re, so I set them to 12 requests spaced 1 s apart, followed by a 15 s pause.
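A minimal sketch of how the three parameters described above can combine into a pacing loop, plus the arithmetic for the resulting rate. The parameter names and the 1 s / 15 s / 12 defaults come from this README; the functions paced and max_requests_per_minute are hypothetical, not the scripts' code. The sleep function is injectable so the timing can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced(items: Iterable[T],
          time_between_requests: float = 1.0,
          timeout_time: float = 15.0,
          timeout_request_count: int = 12,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items one at a time; the caller makes one request per item.

    Sleeps time_between_requests before every request, and an extra
    timeout_time before each new batch of timeout_request_count requests.
    """
    for index, item in enumerate(items):
        if index and index % timeout_request_count == 0:
            sleep(timeout_time)  # long pause after every full batch
        sleep(time_between_requests)  # short gap before every request
        yield item


def max_requests_per_minute(time_between_requests: float,
                            timeout_time: float,
                            timeout_request_count: int) -> float:
    """Upper bound on the rate, ignoring how long each request itself takes."""
    cycle = timeout_request_count * time_between_requests + timeout_time
    return 60 * timeout_request_count / cycle
```

With the konachan/yande.re settings, one batch takes 12 × 1 s + 15 s = 27 s, so max_requests_per_minute(1, 15, 12) is 720 / 27 ≈ 26.7 requests per minute. For wallhaven's 45 per minute without any pauses, time_between_requests would need to be at least 60 / 45 ≈ 1.33 s.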

Background (boringg)

I love collecting data: sorting it, tagging it, measuring sizes, etc.
I also love hoarding. (Okay, that might be the same thing, hehe.)

What's more, wallpapers and pfps mean something to me; they are tied to certain periods of my life.
So I like to keep a sh*tton of them and swap them out now and then.

But there was a problem, a big one: I have too many :< That pushed me into organizing them by tags instead of by folder structure. There is a project called hydrus network that helps you organize all your media in one big database, which you can search by tags. (And I need a database for my screenshots too.)

Hydrus can import a set of tags along with files, using something called sidecar files. (iirc, it's late.) For a media file such as waterfall.jpg, you create a file named waterfall.jpg.txt that contains its tags, for example waterfall, scenery and landscape, one per line.
When setting up the import, you can tell hydrus to read the sidecars, and it will add those tags to the media.
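A small sketch of the sidecar convention described above: the sidecar's name is the full media file name plus .txt, and the tags go one per line. The helper names are hypothetical, and stripping whitespace and dropping duplicates are extra choices made here, not something hydrus requires.

```python
from pathlib import Path


def sidecar_path(media: Path) -> Path:
    """waterfall.jpg -> waterfall.jpg.txt (the full file name plus .txt)."""
    return media.with_name(media.name + ".txt")


def format_sidecar(tags: list[str]) -> str:
    """One tag per line; blanks and duplicates are dropped, order is kept."""
    unique = dict.fromkeys(t.strip() for t in tags if t.strip())
    return "\n".join(unique) + "\n"


def write_sidecar(media: Path, tags: list[str]) -> Path:
    """Write the sidecar next to the media file and return its path."""
    path = sidecar_path(media)
    path.write_text(format_sidecar(tags), encoding="utf-8")
    return path
```

For example, write_sidecar(Path("waterfall.jpg"), ["waterfall", "scenery", "landscape"]) creates waterfall.jpg.txt containing those three tags on separate lines.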

Notes

The purity tag is formatted like this: purity:safe
konachan's scraper may not always work, since it relies on HTML scraping rather than an API.
konachan's scraper might also work on other similar sites.
yande.re's scraper was derived from konachan's, with small changes to fix tag scraping.

itzreesa/scraping-utils
