Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added an additional channel for downloading the metadata of an entire post or gallery #511

Closed
wants to merge 5 commits into from

Conversation

GiovanH
Copy link
Contributor

@GiovanH GiovanH commented Dec 9, 2019

This is a potential way to resolve #461

Uses a new message, Metadata, a new kwarg, metadata_only, and a new "postprocessor", Metadata_Bypost.

The preprocessor only uses the prepare function, so it does not depend on a successful file download.

Messages go through the standard handle_url route, and handle_url calls all postprocessor prepare functions. Then, if the metadata_only flag is set, it exits prematurely (like it does when the file exists.)

When this postprocessor is enabled (without the Metadata postprocessor at all), metadata is correctly saved for each post once, including posts that don't have any other media to download.

@GiovanH
Copy link
Contributor Author

GiovanH commented Dec 9, 2019

======================================================================
FAIL: test_find (test.test_postprocessor.TestPostprocessorModule)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/mikf/gallery-dl/test/test_postprocessor.py", line 35, in test_find
    self.assertIs(cls.__base__, PostProcessor)
AssertionError: <class 'gallery_dl.postprocessor.metadata.MetadataPP'> is not <class 'gallery_dl.postprocessor.common.PostProcessor'>

Metadata_bypostPP is a subclass of MetadataPP, which is a subclass of PostProcessor. I'm not sure if this is a quirk of the tests or if you'd prefer the change.

@mikf
Copy link
Owner

mikf commented Dec 9, 2019

In general, I think this is pretty neat, but I'd rather do things a slight bit (or completely) different:

    • Instead of running the metadata-only message through handle_url(), use a separate handle_metadata() method (copypaste the code from handle_url() and delete everything unnecessary)
    • Create a new entry-point for post processors (run_metadata()?) and call this from handle_metadata()
    • Instead of deriving from the original MetadataPP, implement the functionality in its run_metadata() (or whatever) method
    • Scrap everything and use text:… URLs: Putting text: in front of a bunch of text will write said text into its own file.
    • It doesn't have the options of the metadata post processor, but it can be done in 3 or so lines:
if self.config("write-content"):  # option name should be something different
    post["extension"] = "txt"
    yield Message.Url, "text:" + (content or ""), post

You can also run all tests on your own machine:

  • Either with make test or ./scripts/run_tests.sh if you have make and bash installed
  • Or go into the test directory and run the test_….py files: python test_postprocessor.py etc.

@GiovanH
Copy link
Contributor Author

GiovanH commented Dec 9, 2019

I'm not familiar with the codebase, so I wasn't sure how to go about adding an entirely new handler, which is why I tried to route through existing paths where I could, but I'll give it a shot.

I would slightly prefer using the pluggable postprocessor system instead of trying to hardcode a text-only dump, although the URL idea is appealing

@GiovanH
Copy link
Contributor Author

GiovanH commented Dec 9, 2019

I hate to re-implement the same metadata behavior in two places, since the metadata about the gallery has the same formatting and options as the standard metadata, but if that's how you'd prefer it, here's that isolated at 59759dd.

@GiovanH
Copy link
Contributor Author

GiovanH commented Dec 9, 2019

I should note that this has the side effect of re-saving metadata even if the target gallery/files already exist, but that doesn't come at the expense of any additional downloads, because gallery-dl has to get the metadata in order to compute the destination path anyway.

@GiovanH
Copy link
Contributor Author

GiovanH commented Dec 10, 2019

Check seems to have failed due to something unrelated?

@mikf
Copy link
Owner

mikf commented Dec 16, 2019

Sorry this took so long. I've now merged your commits into master (excluding the last one, because copy-pasting the other metadata code is not what I meant; apologies for not being clear enough), made some smaller changes myself, and included the metadata-per-post functionality into the "main" metadata post-processor with an option called bypost. (63e6993)

I've tested this with the following as config file and it works as it should:

{
    "postprocessors": [{
        "name": "metadata",
        "mode": "custom",
        "bypost": true,
        "content-format": "{content}\n"
    }]
}

Thank you for the contribution and my apologies for any inconveniences.

@mikf mikf closed this Dec 16, 2019
@chrsmlls333
Copy link

Notice: Exec postprocessor after Metadata with bypost:true does not trigger to run!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request: Save Patreon text
3 participants