Questions, Feedback and Suggestions #2 #74
A bit of feedback: This also led me to another idea/suggestion: Not sure, just an idea... |
Sounded like a useful feature, so I tried to put something together: 97f4f15. It should more or less behave like one would expect, but there are at least two things that might be better handled otherwise: Edit: Never mind the points below, I thought about it and decided to change its behavior. It is no longer persistent and stores exactly the same log messages as shown on screen (c9a9664).
|
One small addition to logging behaviour: Not entirely sure without a test case at hand right now, but how's the current output for images on Tumblr relying on the fallback mechanism? I don't know if I remember that correctly, but it appeared that trying to download some specific image (e.g. some old images, old URL scheme etc.) with Edit: Had the opportunity to think about this again, and I'm not sure it's actually worth the bother. Sure, it may be less than ideal (think of your average end-user©® and the reaction: "OMG, it says there's some error"), but this can be solved by simply, uh, explaining the stuff. And if this really warrants dealing with errors that are actually not so much of an error, by having to handle different error types, error classes, error codes and whatever, just for the sake of, what, consistency(?), I'm not really sure anymore. |
I think it is actually worth the bother. With the way things were, it was impossible to tell whether all files had been downloaded successfully just by looking at a log file, and error messages from an image with a fallback URL were kind of misleading as well, since the image download in question did succeed in the end. I added two more logging messages to hopefully remedy this (db7f04d):
Maybe it would be better to categorize all HTTP errors as warnings and only show the |
Yeah, sounds good. |
|
|
Actually, I got the wrong impression from here. This is what I can come up with: https://gist.github.com/rachmadaniHaryono/e7d40fcc5b9cd6ecc1f9151c4f0f5d84 Full code: https://github.com/rachmadaniHaryono/RedditImageGrab/blob/master/redditdownload/api.py This module will not download a file; it will only extract from a URL. |
You should use these functions instead of manually manipulating |
Which version of gallery-dl is that? Can you run |
Can we have percent-encoding conversions for saved files? I.e. replacing |
@Hrxn, I'm a bit embarrassed ... I found a strange thing. I have 2 folders, gallery_dl and gallery_dln. The first is the old version 1.1.2, the second is 1.2.1. Both are in the same directory. When I run any command using the bat file from the folder with the new version, the modules are taken from the old one. When I run --version from the 1.2.1 folder, 1.1.2 is displayed. I do not think that this is a problem with the program, rather with Windows or Python. I apologize for the disturbance. |
@ChiChi32 the
@Bfgeshka Sure, I think I'll add another conversion option for format strings to let users unquote the "offending" parts of a filename. |
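For reference, the percent-decoding itself is standard-library functionality; a minimal sketch of what such a conversion would do (the `clean_filename` helper is hypothetical, not part of gallery-dl):

```python
from urllib.parse import unquote

def clean_filename(name):
    # Hypothetical helper: decode %XX escapes back to real characters,
    # e.g. "%20" -> " " and "%21" -> "!"
    return unquote(name)

print(clean_filename("Some%20Title%21.jpg"))  # Some Title!.jpg
```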
@mikf I encountered it in direct link download. |
Some small thing I've noticed. Not a real issue deserving of a ticket, I presume.
The download seems to work, just as is apparent above. But the output is a bit different, not what I'm used to seeing; just observe this output path: The extra backslash, as if some directory is missing in between. I'll post the configuration used here:
{
"base-directory": "E:\\Transfer\\INPUT\\GLDL",
"netrc": false,
"downloader":
{
"part": true,
"part-directory": null,
"http":
{
"rate": null,
"retries": 5,
"timeout": 30,
"verify": true
}
},
"extractor":
{
"keywords": {"bkey": "", "ckey": "", "tkey": "", "skey": "", "mkey": ""},
"keywords-default": "",
"archive": "E:\\Transfer\\INPUT\\GLDL\\_Archives\\gldl-archive-global.db",
"skip": true,
"sleep": 0,
[...]
"gfycat":
{
"directory": ["Anims", "{bkey}", "{ckey}", "{tkey}", "{skey}", "{mkey}"],
"filename": "{title:?/ /}{gfyName}.{extension}",
"format": "webm"
},

But it does not happen here, for example:
"imgur":
{
"image":
{
"directory": ["{bkey}", "{ckey}", "{tkey}", "{skey}", "{mkey}", "Images"],
"filename": "{title:?/ /}{hash}.{extension}"
},
"album":
{
"directory": ["{bkey}", "{ckey}", "{tkey}", "{skey}", "{mkey}", "Albums", "{album[title]:?/ /}{album[hash]}"],
"filename": "{album[hash]}_{num:>03}_{hash}.{extension}"
},
"archive": "E:\\Transfer\\INPUT\\GLDL\\_Archives\\gldl-archive-imgur.db",
"mp4": true
},

Single Image:
Album:
Will add some more tests eventually, to see if I can get any different results with various input file options. But so far, it seems to have something to do with |
This happens because of os.path.join():

>>> from os.path import join
>>> join("", "d1", "", "d2")
'd1/d2'
>>> join("", "d1", "", "d2", "")
'd1/d2/'

It adds a slash (or back-slash on Windows) to the end if the last argument is an empty string. I've been using |
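One simple way to sidestep that trailing separator is to drop empty segments before joining — a sketch of the idea, not necessarily how gallery-dl actually fixed it:

```python
import os.path

def join_nonempty(*parts):
    # Ignore empty segments, so a trailing "" cannot append a separator
    filtered = [p for p in parts if p]
    return os.path.join(*filtered) if filtered else ""

print(join_nonempty("", "d1", "", "d2", ""))  # d1/d2 (d1\d2 on Windows)
```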
Ah, thanks. Makes sense. This behaviour of Edit: |
BTW, everything works with the latest commit, example URL above is correct and did not encounter it anywhere else! |
Just something quick I noticed - gallery-dl appears to be unable to handle certain emoji appearing in captions on tumblr (and maybe elsewhere??).

[gallery-dl][debug] Version 1.3.2

This is using the executable download, btw. |
You are running this via CMD.exe I presume? What happens if you do this first in CMD: Edit: Or try to use Powershell. I've moved completely to Powershell by now as well.. |
This is a more general problem with the interaction between Windows, the Python interpreter, Unicode, code pages and so on. As @Hrxn mentioned, you should be able to work around this yourself by changing the default code page to UTF-8 via
Python 3.6 and above also doesn't have this problem (it implements PEP 528), so using this instead of the standalone exe might be another option. I tried to implement a simple workaround in 0381ae5 by setting the default error handler for stdout and co. to replace all non-encodable characters with a question mark. Tested this on my Windows 7 VM with Python3.3 to 3.6 and it seems to work. |
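The workaround described above can be sketched in plain Python. This is an illustration of the error-handler idea, not gallery-dl's exact code, and `reconfigure` requires Python 3.7+:

```python
import sys

# Make stdout degrade gracefully: characters the console encoding cannot
# represent become '?' instead of raising UnicodeEncodeError.
if hasattr(sys.stdout, "reconfigure"):  # Python 3.7+
    sys.stdout.reconfigure(errors="replace")

# What the replacement looks like on a legacy Windows code page:
text = "caption with emoji \U0001f499"
print(text.encode("cp437", errors="replace").decode("cp437"))  # caption with emoji ?
```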
Thanks. I got Python 3.6, installed it with PIP as per instructions and it doesn't crash anymore. I put Python on my PATH and it's the same user experience as using the EXE anyways. Now, I might be missing something, but is there any way to extract "gallery order" for use in the filenames from Tumblr? You can test this yourself by creating a photo post, uploading say four photos one at a time, then saving the post. In addition to this, I don't see a way to extract the 'core name' of a file for use in the extractor.*.filename parameter. |
Yes, I know what you mean. This was never a problem for me so far, because I've only downloaded picture sets that are just, well, a set of pictures, apparently, so the order was actually not relevant. But I agree, it's entirely different for something like a comic strip. The filenames in a set post do not reflect the displayed order of the elements, as you already said. You also stated the reason for this, if you make a picture post and upload some files, they get generated names in this order. But this can be rearranged now, changing the order of the displayed items. What happens is that the structure you see in the end (in HTML) has the rearranged order as done by the creator of the post, but the filenames keep being the same as they were at the upload.
I assume what you mean with end result here is the order of the actual downloaded files. Yes, that is how they are sorted by the filesystem, in "natural" order.
Now this bit is really interesting. Because gallery-dl just takes what it gets from the API, and this seems to indicate that the API returns the single elements in the correct order, i.e. as rendered in a browser.
Not exactly sure what you mean here.

See gallery_dl/extractor/tumblr.py, line 54, at ffc0c67.
Compared to your example at the end, the number in the filename is {id}, and that oX part is {offset}. |
Small side note: This is a bit confusing within the source code, because |
Ah, yes, I did mean the final order as sorted by the filesystem - since there's no way right now to get 'gallery order' from gallery-dl, the only ordering the filesystem has to go off of is the filename with the

Just as a note, I haven't done thorough testing on all cases of the reordered gallery, so I haven't proven the ordering comes out like that in all cases.

Ah, sorry, I was using several different posts for testing and got confused about the outputs. I didn't notice the |
Try converting images with an odd number for width or height and you will get this error without any flags. But yeah, maybe you're right that this should remain optional, since if the user wanted to change the output video scale, this could result in a conflict and would need additional handling. For example, with this command
And if using gallery-dl it just generates an empty mp4 file. |
I did, and it works fine. Maybe it depends on the FFmpeg version?
|
Yeah, it works fine with
it doesn't |
Can the pixiv extractor support illust_id? |
That has already been supported since 2a97296. |
Oh thanks, I didn't even know it was already supported. But if people don't ask, how would they know to use "..." when trying to download a single pixiv post? Are we missing a guide section? Like filters to ignore something, using --range to download specific pics, ... |
Not sure what you mean? Using quotation marks for URLs is standard practice for any shell, basically. One should always use it. And what about a single post? gallery-dl always works in the same fashion, if you browse Pixiv, or any other site for that matter, you just click on that posting you want and the URL to use for gallery-dl is what is displayed in the browser's address bar (You can also right-click on that link element while surfing and copy the link from there, of course). |
Hello again... Maybe I'll ask a stupid question now, but does the blacklist in "extract.recursive.blacklist" work only for this extractor, or can it be used for deviantart, for example? And if it does not work, is a blacklist of keywords planned for extractors? |
This is not about blacklisting keywords; this is only there to prevent recursive usage of the extractors specified. |
@mikf I know about this option (although I admit I forgot about it), but ...
1. The command line is inconvenient if you need to filter out a few words.
2. All sites will be filtered, which is not very suitable. |
On deviantart, how can I disable downloading ZIP files when I have "original": true enabled? Thanks :) ..I only saw a filter for images |
Not really possible, I think.. If the actual linked "original" is indeed a ZIP archive file, this is what you're supposed to get. I'm not surprised to see this; I've seen all kinds of different file formats uploaded on DA. But if you could provide an example link to such a Deviation entry, it sure would help 😄 |
https://www.deviantart.com/oofiloo/art/GF-PROMETHEUS-NO-ARMOR-ORIGINAL-BONES-472067552 I need to run it twice, with and without original, to download the full sample/image. It's not a problem, but I think there should be a filter: if the original is a ZIP or another format, download the preview image instead. |
Pretty nice program, though I have come across a few minor issues. In the JSON output
Another issue I've had is: is there a way to save the metadata alongside the downloaded images, and if not, perhaps a |
output.num-to-str (48a8717), but it converts all numeric values to strings. Would it be preferable to only convert integers > 2**52 and < -2**52, i.e. anything a
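For context on the 2**52 threshold: IEEE-754 doubles (what JSON-consuming JavaScript uses for every number) have a 53-bit significand, so integers beyond roughly ±2**53 can no longer be represented exactly and silently lose precision:

```python
# Integers up to 2**53 survive a round-trip through a double exactly;
# beyond that, distinct integers collapse onto the same float value.
print(float(2**52) == float(2**52 + 1))  # False: still exactly representable
print(float(2**53) == float(2**53 + 1))  # True: precision is lost
```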
That's by design, because in most cases the extra file-specific metadata is only its filename and extension from the download URL and maybe its width and height, so nothing you would want/need to create an extra directory for. Instead of re-evaluating the entire format string and calling makedirs() for each file, I wanted to have it only done once and then put every file into that one target directory.
Not yet, but I'm working on something. |
Should work well; one can always convert strings back to numbers if there's ever a need for doing operations on the values.
In my case, for pixiv files, I use the illust_id and title for subdirectories, which is useful for multi-image galleries. For the

diff --git a/gallery_dl/extractor/pixiv.py b/gallery_dl/extractor/pixiv.py
index 0005f92..8491bf7 100644
--- a/gallery_dl/extractor/pixiv.py
+++ b/gallery_dl/extractor/pixiv.py
@@ -214,7 +214,7 @@ class PixivWorkExtractor(PixivExtractor):
     def get_metadata(self, user=None):
         self.work = self.api.illust_detail(self.illust_id)
-        return PixivExtractor.get_metadata(self, self.work["user"])
+        return self.work

Also, pixiv sending me an email every time I use this is a bit annoying, |
Hi, I have a question about the config file / filters. I want to exclude tags on pixiv and maybe other sites. After running with the gallery-dl -K option, I get the tags[] array with \uXXXX char-codes, and when I try to run e.g.: I tried other tags[]-related queries but none worked. Can you please explain what I'm missing, or is it a bug? Anyway, thanks in advance. Edit: I added one tag to the Mute-list on pixiv and the is_muted attribute changed to true, and --filter "is_muted == False" ignored the muted files, but in the config file the "image-filter": "is_muted == False" statement is still not working. |
@AKluge If you want to test if a string is inside a list, you can use the
The command-line option does exactly the same as the I've been using this configuration to test and it works just fine:
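The mechanics can be illustrated in plain Python. This is a sketch of the general idea (a filter string evaluated as a Python expression against each file's metadata), not gallery-dl's internals; the variable names are illustrative:

```python
# A filter expression is compiled once, then evaluated with the metadata
# dict as its namespace, so `in` works for membership tests on tags.
metadata = {"tags": ["landscape", "風景"], "is_muted": False}

expr = compile("'landscape' in tags and not is_muted", "<image-filter>", "eval")
print(eval(expr, {}, metadata))  # True
```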
I would also recommend (*) it effectively sets a global value for
Shouldn't happen anymore: 8faf03e

Concerning directory metadata, you can put a

diff --git a/gallery_dl/extractor/pixiv.py b/gallery_dl/extractor/pixiv.py
index 115b1fb..8716a2d 100644
--- a/gallery_dl/extractor/pixiv.py
+++ b/gallery_dl/extractor/pixiv.py
@@ -31,7 +31,6 @@ class PixivExtractor(Extractor):
metadata = self.get_metadata()
yield Message.Version, 1
- yield Message.Directory, metadata
@@ -55,11 +54,13 @@ class PixivExtractor(Extractor):
"_ugoira600x600", "_ugoira1920x1080")
work["frames"] = ugoira["frames"]
work["extension"] = "zip"
+ yield Message.Directory, work
yield Message.Url, url, work
elif work["page_count"] == 1:
url = meta_single_page["original_image_url"]
work["extension"] = url.rpartition(".")[2]
+ yield Message.Directory, work
yield Message.Url, url, work
@@ -67,6 +68,7 @@ class PixivExtractor(Extractor):
url = img["image_urls"]["original"]
work["num"] = "_p{:02}".format(num)
work["extension"] = url.rpartition(".")[2]
+ yield Message.Directory, work
+        yield Message.Url, url, work

But keep in mind that, as of right now, gallery-dl isn't optimized for this kind of thing. Usually general metadata for directories and specialized metadata for filenames is more than enough. @wankio |
Why does this happen? I thought it must retry the download 10 times before timing out?? ty |
Seems like Sankaku unexpectedly aborted your connection to its servers, and the underlying requests and urllib3 libraries reported it as a weird/unrecognized exception, so gallery-dl stopped its extraction. It does retry an HTTP request (up to 10 times for Sankaku) if the error is reported as |
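The retry behaviour described here can be sketched generically. This is an illustrative helper, not gallery-dl's implementation: only errors reported as connection problems are retried, while anything else propagates immediately and aborts:

```python
import time

def retry(func, retries=10, backoff=1.0,
          exceptions=(ConnectionError, TimeoutError)):
    # Retry only on recognized connection-level errors; any other
    # exception type propagates and stops extraction.
    for attempt in range(retries):
        try:
            return func()
        except exceptions:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * attempt)  # simple linear backoff for the sketch

# Example: a flaky callable that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset by peer")
    return "ok"

print(retry(flaky, backoff=0))  # ok (after 3 attempts)
```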
I have ffmpeg on my path, but the pixiv downloader still just downloads ugoira as ZIP and leaves them like that. I've looked through the options but can't figure out how to make them convert. |
You need to enable the

One way to see if it works is to use

$ gallery-dl --ugoira-conv "https://www.pixiv.net/member_illust.php?mode=medium&illust_id=71828129"

If everything works as it should, you may want to set a permanent option with better encoding settings in your config file. A working example looks something like this:

{
"extractor":
{
"postprocessors": [{
"name": "ugoira",
"whitelist": ["pixiv", "danbooru"],
"extension": "webm",
"ffmpeg-twopass": true,
"ffmpeg-output": true,
"ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"]
}]
}
} See also |
How can I download specific months from tumblr.com/archive? Because when I use gallery-dl on a big tumblr (3-10 years old), it will 100% reach the API limit, and if I change the API key, I must rerun it from the start or from --range, and it will hit the limit too. TY |
@wankio I don't think it is really possible to request posts from a specific month or time-frame using Tumblr's API (correct me if I'm wrong). |
Yes, my own API key. Now I'm using tumbex to get post links automatically. And can gallery-dl do the same as wget's "--mirror"? |
No, it can't. There is |
Continuation of the old issue as a central place for any sort of question or suggestion not deserving their own separate issue.
#11 had gotten too big, took several seconds to load, and was closed as a result.
There is also https://gitter.im/gallery-dl/main if that seems more appropriate.