Plurk - More granular control #1111

Open
musjj opened this issue Nov 10, 2020 · 6 comments

musjj commented Nov 10, 2020

The Plurk extractor lacks the granular control that other Twitter-like extractors provide.
Some options that I think would be useful:

  • Content type filter
    From what I gather, there are a few possible types of content within a plurk. Being able to choose which kinds of content you want to get would be useful:
    • Native media
      Images uploaded directly to Plurk. Right now the extractor seems to treat them as plain direct links without any filename customization support.
    • Emoticons
      Don't know if anyone would find this useful though.
    • Plurk links
      Links to other plurks.
    • External content
      Links to external sites like YouTube, Imgur, etc.
  • Plurk type filter
    • Plurk
      Just a normal post, equivalent to a tweet.
    • Replurks
      These seem to be the equivalent of retweets. Filtering them out would be useful if you want to download content only from a single user profile.
      There doesn't seem to be any equivalent for quoted tweets yet.
    • Likes
      You can like plurks, but from what I can see your likes are not publicly displayed as they are on Twitter. Maybe the API supports it?
    • Encourage this plurk(?)
      Not sure what this does. Would appreciate it if anyone can clarify.
  • Comments filter
    It would also be nice to be able to process only comments made by the original poster of the plurk. If the API allows requesting the comments a user has made on plurks from other profiles, that would be a neat thing to include too.
  • R18 content?
    Not sure if scraping R18 content requires authentication or not.
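The content-type and comments filters proposed above could work roughly like the following Python sketch. It is not gallery-dl code: the helper names are hypothetical, only images.plurk.com is confirmed by output elsewhere in this thread, the emoticon hosts and the `/p/` permalink check are assumptions, and the `user_id`/`owner_id` field names for comments and plurks are assumed rather than taken from the Plurk API.

```python
from urllib.parse import urlsplit

# Hypothetical host sets; only images.plurk.com is confirmed in this thread.
NATIVE_MEDIA_HOSTS = {"images.plurk.com"}
EMOTICON_HOSTS = {"emos.plurk.com", "s.plurk.com"}  # assumption


def content_type(url):
    """Sort a URL found in a plurk into one of the proposed content types."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in NATIVE_MEDIA_HOSTS:
        return "media"
    if host in EMOTICON_HOSTS:
        return "emoticon"
    # Assumed permalink pattern for individual plurks.
    if host in ("plurk.com", "www.plurk.com") and parts.path.startswith("/p/"):
        return "plurk"
    # Everything else, including other plurk.com pages such as profiles,
    # falls through to "external" in this sketch.
    return "external"


def filter_urls(urls, wanted):
    """Keep only URLs whose content type the user asked for."""
    return [url for url in urls if content_type(url) in wanted]


def op_comments(plurk, comments):
    """Keep only comments written by the plurk's original poster (assumed field names)."""
    return [c for c in comments if c["user_id"] == plurk["owner_id"]]


# Made-up example data.
urls = [
    "https://images.plurk.com/1LjmLKh7htkja9vB6z7EB3.png",
    "https://www.plurk.com/p/abc123",
    "https://www.youtube.com/watch?v=xyz",
]
print(filter_urls(urls, {"media", "external"}))

plurk = {"owner_id": 1}
comments = [
    {"user_id": 1, "content": "reply from the original poster"},
    {"user_id": 2, "content": "reply from someone else"},
]
print(op_comments(plurk, comments))
```

Keeping the filters as set membership tests means a config option could simply list the wanted types, e.g. `{"media", "external"}`.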

If anyone thinks something is missing here, please do tell in the comments.

nisehime commented Nov 20, 2020

I'd also note that Plurk URLs are processed by DirectlinkExtractor, so by default all of them end up in the directlink folder instead of being grouped by user. This also makes it harder to filter out other non-Plurk links such as Twitter, even with whitelist/blacklist.

UPD: Actually, never mind, I understand why it works that way. Still, more granular control is indeed needed.

Hrxn (Contributor) commented Nov 20, 2020

Your update means that it is working as expected for you now?
It should, based on what you are describing here. Try setting the category-transfer option inside the plurk extractor options to true.

nisehime commented

> Your update means that it is working as expected for you now?

No, and I'm not the author of this issue. The filters described by the author would be really helpful. Also, I've reconsidered my update: I think handling native Plurk links with the plurk extractor is not a bad idea.

musjj (Author) commented Nov 30, 2020

Yes, the bigger issue with this is that the Plurk extractor currently does not provide any filename or directory keywords. So if you want to customize your filename or extract metadata, it's not possible right now.

nisehime commented

@mikf By the way, I tried to implement a temporary workaround for the directory issue, and this is what I found.
This is the config:

{
    "extractor": {
        "plurk": {
            "comments": true,
            "directory": ["plurk_test"],
            "whitelist": ["directlink"],
            "parent-metadata": true,
            "filename": "{owner_id} {plurk_id}",
            "category-transfer": true
        }
    }
}

As mentioned above, Plurk's images are passed to the directlink extractor for some reason, so by default the result is:

F:\gallery-dl>gallery-dl --chapter-range -2 https://www.plurk.com/BOW99
# .\gallery-dl\directlink\images.plurk.com__1LjmLKh7htkja9vB6z7EB3.png
# .\gallery-dl\directlink\images.plurk.com__286ILNC4UKy8rZPxm5Pi7O.png

Unless I misunderstand how category-transfer works, the expected result with the above config should be:

F:\gallery-dl>gallery-dl --chapter-range -2 https://www.plurk.com/BOW99
# .\gallery-dl\plurk_test\{owner_id} {plurk_id}.png
# .\gallery-dl\plurk_test\{owner_id} {plurk_id}.png

But instead it looks like this:

F:\gallery-dl>gallery-dl --chapter-range -2 https://www.plurk.com/BOW99
# .\gallery-dl\plurk\images.plurk.com__1LjmLKh7htkja9vB6z7EB3.png
# .\gallery-dl\plurk\images.plurk.com__286ILNC4UKy8rZPxm5Pi7O.png

So the directory has changed, but it is the plurk extractor's default directory name, not the one from the config. Meanwhile, the filename hasn't changed at all and remains the directlink extractor's default.
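The default filenames above can be reproduced with a small Python sketch, assuming the directlink extractor's filename template is `{domain}/{path}/{filename}.{extension}` with `/` then replaced by `_`. That template is inferred from the observed output, not checked against gallery-dl's source, and `url_fields`/`render` are hypothetical helpers. The second half shows why a template such as `{owner_id} {plurk_id}` cannot render when only URL-derived fields reach the child extractor.

```python
from urllib.parse import urlsplit

# Assumed directlink template, inferred from the filenames seen in this thread.
DIRECTLINK_TEMPLATE = "{domain}/{path}/{filename}.{extension}"


def url_fields(url):
    """Split a direct image URL into directlink-style fields (hypothetical helper)."""
    parts = urlsplit(url)
    path, _, name = parts.path.rpartition("/")
    filename, _, extension = name.rpartition(".")
    return {
        "domain": parts.netloc,
        "path": path.strip("/"),
        "filename": filename,
        "extension": extension,
    }


def render(template, fields):
    """Fill in the template, then replace path separators as the observed names suggest."""
    return template.format(**fields).replace("/", "_")


fields = url_fields("https://images.plurk.com/1LjmLKh7htkja9vB6z7EB3.png")
print(render(DIRECTLINK_TEMPLATE, fields))
# images.plurk.com__1LjmLKh7htkja9vB6z7EB3.png

try:
    render("{owner_id} {plurk_id}", fields)
except KeyError as missing:
    # The plurk-specific keywords are not among the URL-derived fields.
    print("missing keyword:", missing)
```

If this assumption holds, a custom filename would only render once the plurk extractor's keywords are available to the child extractor.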

Is it intended that category-transfer passes the parent extractor's default options to its child? And why, in this case, was only the default directory name passed, but not the filename?

Also, setting parent-directory to true instead of category-transfer doesn't seem to work. It just downloads everything to directlink's default folder again.

Another question: why is the plurk extractor treated as a manga chapter extractor?

mikf added a commit that referenced this issue Apr 19, 2021
nisehime commented

@mikf This still doesn't seem to work properly: the image-filter set for the plurk extractor is not passed to the directlink extractor.

mikf added a commit that referenced this issue Apr 27, 2021