Using gallery-dl with curl to download .htm files: failing for DeviantArt deviations with / in the title? #3369
OK, so I'm continuing with a new problem that I ran into weeks ago while getting help with my earlier questions. I didn't get an answer to it then, and for various reasons I haven't been able to ask about it again until now. Anyway, currently, for DeviantArt I have an
While trying to figure out where the problem might lie, and in the process of writing this out, I think I realized it must be specifically in the filename-writing part of the
Is there anything I can put in there to make it replace problematic characters like slashes when writing the filename, so that it doesn't fail on works that have them in the title? Also, once this is figured out, I'd like help working out what I'd need to modify in these postprocessors to get basically the same setup (not just the image and metadata .json, but also a description.txt postprocessor and an exec curl .htm postprocessor) for other extractors like pixiv, furaffinity, and boorus. I think those are all I'd need in the near future.
Normally I'd manually replace the offending characters like this (reference):
But for some reason it doesn't like slashes and throws an error. It would be nice to have a conversion specifier that sanitizes the variable, like yt-dlp has, but for the time being I think you'd have to use a Python expression in place of the argument for "-o":
"\fE '{}{}-{}'.format(_directory, title.replace('/', '_'), index)"
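To see what that `\fE` expression evaluates to, here is a plain-Python sketch that runs the same `'{}{}-{}'.format(...)` call outside gallery-dl. The sample values for `_directory`, `title`, and `index` are made up, and the helper names are hypothetical. The second variant also replaces the other characters Windows forbids in filenames; that wider character set is an assumption on my part, not something gallery-dl defines.

```python
# Sketch: evaluate the same expression the "\fE" format string would run,
# using made-up metadata values. Helper names are hypothetical.

# Assumption: besides "/", these characters are also invalid in Windows filenames.
WINDOWS_FORBIDDEN = '\\/:*?"<>|'
REPLACE_ALL = str.maketrans({ch: "_" for ch in WINDOWS_FORBIDDEN})


def slash_only(_directory: str, title: str, index: int) -> str:
    """Mirror of the expression from the reply: only '/' is replaced."""
    return '{}{}-{}'.format(_directory, title.replace('/', '_'), index)


def all_forbidden(_directory: str, title: str, index: int) -> str:
    """Hypothetical variant that replaces every Windows-forbidden character in the title."""
    return '{}{}-{}'.format(_directory, title.translate(REPLACE_ALL), index)


if __name__ == "__main__":
    sample_dir = "C:\\gallery-dl\\deviantart\\artist\\"  # made-up directory
    sample_title = 'Fan art: AC/DC "live"?'              # made-up title
    print(slash_only(sample_dir, sample_title, 12345))
    print(all_forbidden(sample_dir, sample_title, 12345))
```

Only `slash_only` corresponds to the expression above; whether the `translate` variant can be used inside a `\fE` format string in gallery-dl is untested here.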
I'd recommend checking the metadata JSON files before deciding whether to download the web pages at all. Much, if not all, of the metadata available on the web page is already included in the JSON file. Similarly, a JSON postprocessor dumps everything available to gallery-dl, so a description postprocessor would only make it slightly more readable. Anyway, take furaffinity as an example; here is the complete configuration:
"furaffinity": {
"#": "remove this line if you prefer plain text instead",
"descriptions": "html",
"postprocessors": [
{
"#": "this is already included in the json file!",
"name": "metadata",
"mode": "custom",
"directory": "Descriptions",
"content-format": "{description}\n",
"extension-format": "descr.htm"
},
{
"name": "metadata",
"mode": "json",
"filename": "{id}_info.json",
"event": "file"
},
{
"name": "exec",
"command": [
"C:\\Users\\tfaho\\curl\\curl-7.86.0_2-win64-mingw\\bin\\curl.exe",
"--create-dirs", "-o", "{_directory}mydir\\{title}-{id}.htm",
"https://www.furaffinity.net/view/{id}/"
]
}
]
}
You can run
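Following the advice to check the metadata JSON files first, here is a short sketch that opens one of the `{id}_info.json` files written by the second `metadata` postprocessor and prints its `description` field, so you can see whether the web page adds anything. It assumes the key is named `description` (the same field the custom postprocessor's `content-format` uses); the helper name and the command-line usage are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def read_description(json_path: Path) -> Optional[str]:
    """Return the 'description' value from a metadata JSON file, or None if absent."""
    with json_path.open(encoding="utf-8") as fp:
        metadata = json.load(fp)
    return metadata.get("description")


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python read_description.py path/to/12345_info.json ...
    for arg in sys.argv[1:]:
        print(arg, "->", read_description(Path(arg)))
```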
The exec postprocessor does not support "directory". In order to tell curl to download your HTML document to a separate directory, you need to modify the argument for "-o" and use "--create-dirs", as in the furaffinity configuration above.
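Because `exec` has no `directory` option, the target folder has to be part of the `-o` path, and `--create-dirs` makes curl create any missing folders along that path. The sketch below shows the same idea in Python: it joins a base directory, a subfolder, and a file name, then creates the missing parent folders. It also shows why an unsanitized `/` in the title matters: it is treated as a path separator, so part of the title turns into an extra folder. `mydir` comes from the configuration; the function name and sample file names are hypothetical.

```python
from pathlib import Path


def prepare_output(base_dir: str, subdir: str, filename: str) -> Path:
    """Build base_dir/subdir/filename and create any missing parent folders,
    roughly what curl's --create-dirs does for the -o path."""
    target = Path(base_dir) / subdir / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as base:
        ok = prepare_output(base, "mydir", "Some title-12345.htm")
        # Unsanitized "/" in the title: "AC" becomes an extra folder under mydir.
        bad = prepare_output(base, "mydir", "AC/DC-12345.htm")
        print(ok.relative_to(base))
        print(bad.relative_to(base))
```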