Questions, Feedback, and Suggestions #4 #5262
For most sites I'm able to sort files into year/month folders like this:
However for redgifs it doesn't look like there's a date keyword available for |
There's a typo in
|
There's also another typo in |
Can you grab all the media from quoted tweets? Example. |
#5262 (comment) It's implemented as a search for 'quoted_tweet_id:…' on Twitter.
#5262 (comment) This one was on the same line as the previous one ... (9fd851c)
Regarding typos, thanks for pointing them out. @biggestsonicfan |
EDIT: Actually, I think there's just something wrong with that URL. I had it saved for a long time and searching that tag normally gives a different URL ( |
You could use |
Is there support to remove metadata like this?
Post-processor: "filter-metadata":
{
    "name": "metadata",
    "mode": "delete",
    "event": "prepare",
    "fields": ["preview[images][0][resolutions]"]
}
I've tried a few variations but no dice.
"fields": ["preview[images][][resolutions]"]
"fields": ["preview[images][N][resolutions]"]
"fields": ["preview['images'][0]['resolutions']"] |
Hello, I left a comment in #4168 . Does the |
@taskhawk
def remove_resolutions(metadata):
    for image in metadata["preview"]["images"]:
        del image["resolutions"]
(untested, might need some check whether @YuanGYao |
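A more defensive variant of the function above, as a sketch only: it assumes the metadata layout implied by the field path in the question (`preview`, then `images`, then a list of dicts that may carry `resolutions`) and silently skips posts where any of those pieces is missing. It is just as untested against real metadata as the original.

```python
def remove_resolutions(metadata):
    """Drop the 'resolutions' entry from every preview image, if present."""
    preview = metadata.get("preview")
    if not isinstance(preview, dict):
        # no preview data for this post, nothing to do
        return
    for image in preview.get("images") or ():
        if isinstance(image, dict):
            # pop() with a default does not raise when the key is absent
            image.pop("resolutions", None)
```

Using `pop()` instead of `del` means a post without `resolutions` is left alone rather than raising `KeyError`.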
@mikf |
Not sure if I'm missing something, but are directory specific configurations exclusive to running gallery-dl via the executable? Basically, I have a directory for regular tags, and a directory for artist tags. For regular tags I use So right now the only way I know to get this per-directory configuration to work, is to copy the gallery-dl executable everywhere I want to use a master configuration override. Am I missing something? It feels like there should be a better way. |
Huh? No, the configuration works always in the same way. You're simply using different configuration files? |
From the readme:
I want to override my master configuration |
You can load additional configuration files from the console with:
You just need to specify the path to the file and any options there will overwrite your main configuration file. Edit: From my understanding, yeah, automatic loading of local config files in each directory is only possible having the standalone executable in each directory. Are different directory options the only thing you need? |
Thanks, that's exactly what I was looking for! Guess I didn't read the documentation thoroughly enough. For now the only thing I'd want to override is the directory structure for artist tags. I don't think it's possible to determine from the metadata alone if a given tag is the name of an artist or not, so I thought the best way to go about it is to just have a separate directory for artists, and use a configuration override. So yeah, loading that override with the -c flag works great for that purpose, thanks again! |
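For anyone who wants to script the `-c` override discussed above, a minimal sketch of a wrapper follows. The file name `artists.conf` and the function names are made up for illustration; the only thing taken from this thread is that `-c` loads an additional configuration file whose options overwrite the main one.

```python
import subprocess


def build_command(url, override=None, executable="gallery-dl"):
    """Return the argument list for one gallery-dl run."""
    cmd = [executable]
    if override:
        # -c loads an extra config file on top of the main configuration
        cmd += ["-c", override]
    cmd.append(url)
    return cmd


def run(url, override=None):
    """Run gallery-dl and return its exit code (assumes it is on PATH)."""
    return subprocess.run(build_command(url, override)).returncode
```

Calling `run(url, override="artists.conf")` from the artist directory would then apply the override without copying the executable around.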
You kinda can, but you need to enable
"gelbooru": {
    "directory": {
        "search_tags in tags_artists": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        "" : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
    },
    "tags": true
},
Set
Of course, this depends on the artists being correctly tagged. Not sure if it happens on Gelbooru, but at least in other boorus and booru-like sites I've come across posts with the artist tagged as a general tag instead of an artist tag.
Another limitation is that your search tag can only include one artist at a time, doing more will require a more complex expression to check all tags are present in
What I do instead is that I inject a keyword to influence where it will be saved, like this:
And in my config I have
"gelbooru": {
    "directory": ["boorus", "{search_tags_type}", "{search_tags}"]
},
You can have:
"gelbooru": {
    "directory": {
        "search_tags_type == 'artists'": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        "" : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
    }
},
You can do this for other tag types, like general, copyright, characters, etc.
Because it's a chore to type that option every time I made a wrapper script, so I just call it like this because artists is my default:
For other tag types I can do:
|
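To make the conditional `directory` idea above easier to reason about, here is a rough model of it in plain Python. This is not gallery-dl's implementation; it only assumes that each key is a Python expression evaluated against the metadata, that the first matching entry wins, and that the empty-string key is the fallback. `pick_directory` is a made-up name.

```python
def pick_directory(rules, metadata):
    """Return the path template of the first rule whose condition is true."""
    for condition, template in rules.items():
        if not condition:
            continue  # "" is the fallback, used only when nothing matches
        try:
            # metadata fields are visible to the expression as plain names;
            # only evaluate expressions from a config you wrote yourself
            if eval(condition, {}, metadata):
                return template
        except Exception:
            # a field missing from the metadata counts as "no match"
            pass
    return rules.get("")


rules = {
    "search_tags_type == 'artists'": ["{category}", "{search_tags[0]!u}", "{search_tags}"],
    "": ["{category}", "{search_tags}"],
}
```

With these rules, metadata containing `search_tags_type` set to `artists` selects the first template, and anything else falls through to the `""` entry.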
Thanks for pointing out there's a tags option available for the gelbooru extractor. I already used it in the kemono extractor to get the name of the artist, but it didn't occur to me that gelbooru might also have such an option (and just accepted that the tags aren't categorized). For artists I store all the url's in their respective gelbooru.txt, rule34.txt, etc files like so:
And then just run |
When I'm making an extractor, what do I do if the site doesn't have different URL patterns for different page types? Every single page is just a numerical ID that could be a forum post, image, blog post, or something completely different. |
@Wiiplay123 You handle everything with a single extractor and decide what type of result to return on the fly. The |
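A rough, library-independent sketch of the "single extractor, decide on the fly" approach described above. Everything here is hypothetical: the HTML markers, the handler names and the result tuples are placeholders for whatever the real site and extractor would use, and none of it is gallery-dl API.

```python
import re


def detect_type(html):
    """Guess the page type from a marker in the page (made-up markers)."""
    if 'class="image-view"' in html:
        return "image"
    if 'class="forum-post"' in html:
        return "forum"
    return "unknown"


def handle_image(page_id, html):
    # return one ("url", ...) result per image found on the page
    return [("url", src) for src in re.findall(r'<img src="([^"]+)"', html)]


def handle_forum(page_id, html):
    # return the post paragraphs as text results
    return [("text", text) for text in re.findall(r"<p>(.*?)</p>", html)]


HANDLERS = {"image": handle_image, "forum": handle_forum}


def extract(page_id, html):
    """One entry point for every numeric ID; dispatch after fetching the page."""
    handler = HANDLERS.get(detect_type(html))
    return handler(page_id, html) if handler else []
```

The URL pattern only has to match the numeric ID; the decision about what kind of page it is happens after the page has been fetched.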
Hi, what options should I use in my config file to change the format of dates in metadata files? I would like to use And would it also be possible to do this for json files that ytdl creates? I downloaded some videos with gallery-dl but the dates got saved as |
hi, how do I download files from oldest to newest? I'm using this:
and I need to start downloading from the oldest posts first, how do I do that? |
Hi! Is it possible to download posts from Pixiv from a specified bookmark page? For example I want to download not all bookmarks but only from page 2. I tried this URL |
@mikf
be possible? |
@throwaway26425 @JailSeed @Hrxn This could be more reliably done in an f-string:
|
@mikf Thanks, that helps. |
@mikf The scenario: Submission on reddit, hosted on redgifs, but it's actually an image (yes, I know.. edge case. But I've seen it at least once) I believe it should be possible to solve this with a conditional directory setting using what we already got in gallery-dl, but I'm not sure. Accessing metadata coming from reddit can be done with Example from
but at the same time
and
which.. totally makes sense.. The easiest way would probably be something like this
"directory": {
    "'_reddit_' in locals() and extension in ('mp4', 'webm')" : ["Video"],
    "'_reddit_' in locals() and extension in ('gif', 'apng')" : ["Gif"],
    "'_reddit_' in locals() and extension in ('jpg', 'png')" : ["Picture"],
in using |
Wouldn't a It wouldn't really be complicated to make |
config excerpt (a bit simplified), giving me the output paths I've been using for a while now and would like to keep:
"redgifs":
{
    "image":
    {
        "directory": {
            "'_reddit_' in locals()": ["+Clips"],
            "locals().get('bkey')" : ["Redgifs", "Clips", "{bkey}"],
            "" : ["Redgifs", "Clips", "Unsorted"]
        }
    }
}
I'm using
[1] = <filename with metadata from redgifs and from This is an example with a direct submission link from reddit, but it works the same with different categories from reddit (with a different "prefix" name instead of
Ah, okay. I thought this would be just one more metadata field, basically, without breaking anything. |
Wouldn't it be possible to use
"reddit>redgifs":
{
    "image":
    {
        "directory": ["+Clips"],
        "postprocessors": ["classify"]
    }
},
"redgifs":
{
    "image":
    {
        "directory": {
            "locals().get('bkey')" : ["Redgifs", "Clips", "{bkey}"],
            "" : ["Redgifs", "Clips", "Unsorted"]
        }
    }
} |
Good idea. Almost forgot that this option exists. Seems like it should be the right fit for such a task. But does this change anything with regard to how the |
Yep, this is just an additional step. It will still load options from |
@mikf I would like to use
"pixiv": {
    "postprocessors": [
        {
            "name": "python",
            "event": "prepare",
            "function": "{base-directory}/utils.py:pixiv_tags"
        }
    ]
}
is it possible to do so? |
@mikf Congrats for making it into the GitHub 10k stars club! 🔥 |
@AyHa1810 Also, @Hrxn |
Correct me if I'm wrong, but it looks like when using things like Is there a way to make it only count existing items in "archive" as skipped, but not the ones that have existing files (but preferably still not redownload these)? Basically, what I want to accomplish is to periodically download all posts until the last downloaded record is reached (so abort:1). But between two download sessions, I may have already downloaded some of these posts manually and put the files into the folder. I don't want these to terminate my download session prematurely. |
@fireattack Suggestion: Use different |
hello, i don't really understand what the difference is between |
I mean before it gets converted to a path, it's just a string, right? So it should be possible imo. Also yeah, I do want it to get overridden with the |
@fireattack @docholllidae It is usually the latter that gets restricted by some sort of rate limit, as is the case for Twitter. |
@mikf the env var method works, thanks for the suggestion! |
I think I haven't come across any ugoira using PNGs for its images. Does anyone have an example they could share? |
how do I prevent myself from getting banned on Instagram? I'm currently using:
should I increase those numbers?? how much? (are there any other parameters that I can use to prevent myself from being banned on IG?) |
* save cookies to tempfile, then rename
  avoids wiping the cookies file if the disk is full
* [deviantart:stash] fix 'index' metadata (mikf#5335)
* [deviantart:stash] recognize 'deviantart.com/stash/…' URLs
* [gofile] fix extraction
* [kemonoparty] add 'revision_count' metadata field (mikf#5334)
* [kemonoparty] add 'order-revisions' option (mikf#5334)
* Fix imagefap extrcator
* [twitter] add 'birdwatch' metadata field (mikf#5317)
  should probably get a better name, but this is what it's called internally by Twitter
* [hiperdex] update URL patterns & fix 'manga' metadata (mikf#5340)
* [flickr] add 'contexts' option (mikf#5324)
* [tests] show full path for nested values
  'user.name' instead of just 'name' when testing for "user": { … , "name": "…", … }
* [bluesky] add 'instance' metadata field (mikf#4438)
* [vipergirls] add 'like' option (mikf#4166)
* [vipergirls] add 'domain' option (mikf#4166)
* [gelbooru] detect returned favorites order (mikf#5220)
* [gelbooru] add 'date_favorited' metadata field
* Update fapello.py
  get fullsize image instead resized
* fapello.py Fullsize image
  by remove ".md" and ".th" in image url, it will download fullsize of images
* [formatter] fix local DST datetime offsets for ':O'
  'O' would get the *current* local UTC offset and apply it to all 'datetime' objects it gets applied to. This would result in a wrong offset if the current offset includes DST and the target 'datetime' does not or vice-versa. 'O' now determines the correct local UTC offset while respecting DST for each individual 'datetime'.
* [subscribestar] fix 'date' metadata
* [idolcomplex] support new pool URLs
* [idolcomplex] fix metadata extraction
  - replace legacy 'id' vales with alphanumeric ones, since the former are no longer available
  - approximate 'vote_average', since the real value is no longer available
  - fix 'vote_count'
* [bunkr] remove 'description' metadata
  album descriptions are no longer available on album pages and the previous code erroneously returned just '0'
* [deviantart] improve 'index' extraction for stash files (mikf#5335)
* [kemonoparty] fix exception for '/revision/' URLs
  caused by 03a9ce9
* [steamgriddb] raise proper exception for deleted assets
* [tests] update extractor results
* [pornhub:gif] extract 'viewkey' and 'timestamp' metadata (mikf#4463)
  mikf#4463 (comment)
* [tests] use 'datetime.timezone.utc' instead of 'datetime.UTC'
  'datetime.UTC' was added in Python 3.11 and is not defined in older versions.
* [gelbooru] add 'order-posts' option for favorites (mikf#5220)
* [deviantart] handle CloudFront blocks in general (mikf#5363)
  This was already done for non-OAuth requests (mikf#655) but CF is now blocking OAuth API requests as well.
* release version 1.26.9
* [kemonoparty] fix KeyError for empty files (mikf#5368)
* [twitter] fix pattern for single tweet (mikf#5371)
  - Add optional slash
  - Update tests to include some non-standard tweet URLs
* [kemonoparty:favorite] support 'sort' and 'order' query params (mikf#5375)
* [kemonoparty] add 'announcements' option (mikf#5262)
  mikf#5262 (comment)
* [wikimedia] suppress exception for entries without 'imageinfo' (mikf#5384)
* [docs] update defaults of 'sleep-request', 'browser', 'tls12'
* [docs] complete Authentication info in supportedsites.md
* [twitter] prevent crash when extracting 'birdwatch' metadata (mikf#5403)
* [workflows] build complete docs Pages only on gdl-org/docs
  deploy only docs/oauth-redirect.html on mikf.github.io/gallery-dl
* [docs] document 'actions' (mikf#4543)
  or at least attempt to
* store 'match' and 'groups' in Extractor objects
* [foolfuuka] improve 'board' pattern & support pages (mikf#5408)
* [reddit] support comment embeds (mikf#5366)
* [build] add minimal pyproject.toml
* [build] generate sdist and wheel packages using 'build' module
* [build] include only the latest CHANGELOG entries
  The CHANGELOG is now at a size where it takes up roughly 50kB or 10% of an sdist or wheel package.
* [oauth] use Extractor.request() for HTTP requests (mikf#5433)
  Enables using proxies and general network options.
* [kemonoparty] fix crash on posts with missing datetime info (mikf#5422)
* restore LD_LIBRARY_PATH for PyInstaller builds (mikf#5421)
* remove 'contextlib' imports
* [pp:ugoira] log errors for general exceptions
* [twitter] match '/photo/' Tweet URLs (mikf#5443)
  fixes regression introduced in 40c0553
* [pp:mtime] do not overwrite '_mtime' for None values (mikf#5439)
* [wikimedia] fix exception for files with empty 'metadata'
* [wikimedia] support wiki.gg wikis
* [pixiv:novel] add 'covers' option (mikf#5373)
* [tapas] add 'creator' extractor (mikf#5306)
* [twitter] implement 'relogin' option (mikf#5445)
* [docs] update docs/configuration links (mikf#5059, mikf#5369, mikf#5423)
* [docs] replace AnchorJS with custom script
  use it in rendered .rst documents as well as in .md ones
* [text] catch general Exceptions
* compute tempfile path only once
* Add warnings flag
  This commit adds a warnings flag
  It can be combined with -q / --quiet to display warnings. The intent is to provide a silent option that still surfaces warning and error messages so that they are visible in logs.
* re-order verbose and warning options
* [gelbooru] improve pagination logic for meta tags (mikf#5478)
  similar to 494acab
* [common] add Extractor.input() method
* [twitter] improve username & password login procedure (mikf#5445)
  - handle more subtasks
  - support 2FA
  - support email verification codes
* [common] update Extractor.wait() message format
* [common] simplify 'status_code' check in Extractor.request()
* [common] add 'sleep-429' option (mikf#5160)
* [common] fix NameError in Extractor.request()
  … when accessing 'code' after an requests exception was raised.
  Caused by the changes in 566472f
* [common] show full URL in Extractor.request() error messages
* [hotleak] download files with 404 status code (mikf#5395)
* [pixiv] change 'sanity_level' debug message to a warning (mikf#5180)
* [twitter] handle missing 'expanded_url' fields (mikf#5463, mikf#5490)
* [tests] allow filtering extractor result tests by URL or comment
  python test_results.py twitter:+/i/web/
  python test_results.py twitter:~twitpic
* [exhentai] detect CAPTCHAs during login (mikf#5492)
* [output] extend 'output.colors' (mikf#2566)
  allow specifying ANSI colors for all loglevels (debug, info, warning, error)
* [output] enable colors by default
* add '--no-colors' command-line option
---------
Co-authored-by: Luc Ritchie <luc.ritchie@gmail.com>
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
Co-authored-by: Herp <asdf@qwer.com>
Co-authored-by: wankio <31354933+wankio@users.noreply.github.com>
Co-authored-by: fireattack <human.peng@gmail.com>
Co-authored-by: Aidan Harris <me@aidanharr.is>
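The first entry in the commit list above ("save cookies to tempfile, then rename") describes a general pattern that can be sketched independently of gallery-dl. The function below is an illustration of that pattern, not the project's actual code; `write_atomically` is a made-up name.

```python
import os
import tempfile


def write_atomically(path, text):
    """Write text to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # create the temp file in the same directory so the rename stays
    # on one filesystem
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fp:
            fp.write(text)
        # only reached if the write succeeded; the old file is intact until now
        os.replace(tmp, path)
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

If the disk fills up during the write, the exception is raised before `os.replace()` runs, so the existing file keeps its previous contents.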
How to put artist name in file path for e-hentai? because the |
@taskhawk @throwaway26425 @Immueggpain |
I'm using or, do I need to use both?
I don't understand this, can you please explain it better? :( |
The instagram code sends specific HTTP headers when making API requests, which might now be out-of-date, meaning I should update them again. The last time I did this was October 2023 (969be65). |
I'm pretty sure this has been asked before but I can't find it. My goal is to run gallery-dl as a module to download, while also getting a record of processed posts (URLs, post ids) so I can use that info for some custom functions. I've read #642, but I still don't quite get it. It looks like you have to use My current code is pretty simple, just
def load_config():
    ....
def set_config(user_id):
    ....
def update(user_id):
    load_config()
    profile_url = set_config(user_id)
    job.DownloadJob(profile_url).run()
I tried to patch So I'm curious if there is an easier way to just do it other than re-write a whole new
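One possible workaround that avoids patching anything, as a sketch under assumptions: run a separate `gallery-dl -j` (dump JSON) pass and read the file entries from its output. This does not download; it only lists what the extractor yields. The assumptions are that each dumped entry is a list whose first element is a numeric message type, that type `3` marks a file URL followed by its metadata, and that the post id lives under an `id` key, which depends on the extractor. `dump_json` and `file_entries` are made-up names.

```python
import json
import subprocess

URL_MESSAGE = 3  # assumed type code for file entries in the -j output


def dump_json(url):
    """Return the raw JSON text printed by 'gallery-dl -j URL'."""
    result = subprocess.run(
        ["gallery-dl", "-j", url], capture_output=True, text=True)
    return result.stdout


def file_entries(dump):
    """Extract (file URL, post id) pairs from a JSON dump."""
    entries = []
    for message in json.loads(dump):
        if message[0] == URL_MESSAGE:
            _, file_url, metadata = message
            # 'id' is an assumption; use whatever key the extractor provides
            entries.append((file_url, metadata.get("id")))
    return entries
```

`file_entries(dump_json(profile_url))` could then feed the custom functions before or after the regular `DownloadJob` run.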
does |
I have a suggestion, though I'm not sure how feasible or practical it would be. Current behavior:
Could the behavior for indices be made consistent across all sites? |
Continuation of the previous issue as a central place for any sort of question or suggestion not deserving their own separate issue.
Links to older issues: #11, #74, #146.