Plurk - More granular control #1111

musjj · 2020-11-10T19:35:28Z

The Plurk extractor lacks the granular control that other Twitter-like extractors provide.
Some options that I think would be useful:

Content type filter
From what I gather, there are a few possible types of content within a plurk. Being able to control which kinds of content you want to get would be useful:
- Native media
  Images uploaded directly to Plurk. Right now the extractor seems to treat them as simple direct links, without any filename customization support.
- Emoticons
  Don't know if anyone would find this useful though.
- Plurk links
  Links to other plurks.
- External content
  Links to external sites like YouTube, Imgur, etc.
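
A rough sketch of what such a content-type filter could look like, in Python since that is what gallery-dl is written in. Everything here is hypothetical: `classify` and `filter_urls` are not gallery-dl functions, the `emos.plurk.com` host and the `/p/<id>` permalink shape are assumptions, and only `images.plurk.com` is taken from URLs seen in this thread.

```python
import re
from urllib.parse import urlparse

# Hosts per content type. Only images.plurk.com is confirmed by this thread;
# the emoticon host is an assumption for illustration.
NATIVE_HOSTS = {"images.plurk.com"}
EMOTICON_HOSTS = {"emos.plurk.com"}

# Assumed shape of a link to another plurk: /p/<base36 id>
PLURK_PATH = re.compile(r"^/p/[0-9a-z]+$")


def classify(url):
    """Return 'native', 'emoticon', 'plurk' or 'external' for a URL."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host in NATIVE_HOSTS:
        return "native"
    if host in EMOTICON_HOSTS:
        return "emoticon"
    if host in ("plurk.com", "www.plurk.com") and PLURK_PATH.match(parts.path):
        return "plurk"
    return "external"


def filter_urls(urls, wanted):
    """Keep only the URLs whose content type is in the 'wanted' set."""
    return [url for url in urls if classify(url) in wanted]
```

With something like this, a user-facing option could simply be a list of wanted types, e.g. `filter_urls(urls, {"native", "external"})`.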
Plurk type filter
- Plurk
  Just a normal post, equivalent to a tweet.
- Replurks
  These seem to be the equivalent of retweets. Filtering these out would be useful if you want to download content only from a single user profile.
  There doesn't seem to be any equivalent for quoted tweets yet.
- Likes
  You can like plurks, but from what I can see your likes are not publicly displayed like on Twitter. Maybe the API supports it?
- Encourage this plurk(?)
  Not sure what this does. Would appreciate it if anyone can clarify.
Comments filter
It would also be nice if you could choose to only process comments made by the original poster of the plurk. If the API somehow allows you to request comments that a user has made on plurks from other profiles, that would be a neat thing to include too.
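
To illustrate the "original poster only" case, here is a minimal sketch. It assumes a plurk is a dict with an `owner_id` field and each comment is a dict with a `user_id` field; both the function and the `user_id` name are hypothetical and not taken from the extractor or the Plurk API.

```python
def comments_by_owner(plurk, comments):
    """Return only the comments written by the plurk's original poster.

    Assumes plurk["owner_id"] and comment["user_id"] hold comparable IDs
    (hypothetical field names, for illustration only).
    """
    owner = plurk["owner_id"]
    return [c for c in comments if c.get("user_id") == owner]
```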
R18 content?
Not sure if scraping R18 content requires authentication or not.

If anyone thinks something is missing here, please do tell in the comments.

nisehime · 2020-11-20T19:01:22Z

I'd also note that plurk URLs are processed by DirectlinkExtractor, so by default all of them end up in the directlink folder and are not grouped by user. This also makes it harder to filter out other non-plurk links (Twitter etc.), even with whitelist/blacklist.

UPD: Actually nvm, I understood why it is like that. Still, more granular control is indeed needed.

Hrxn · 2020-11-20T22:07:43Z

Your update means that it is working as expected for you now?
It should, based on what you are describing here. Try setting the category-transfer option inside the plurk extractor options to true.

nisehime · 2020-11-20T23:29:36Z

> Your update means that it is working as expected for you now?

No, and I'm not the author of this issue. The filters described by the author would be really helpful. Also, I guess I've rethought my update: I think handling native plurk links with the plurk extractor is not that bad an idea.

musjj · 2020-11-30T11:16:17Z

Yes, the bigger issue with this is that the Plurk extractor currently does not provide any filename or directory keywords. So if you want to customize your filename or extract metadata, it's not possible right now.

nisehime · 2021-04-18T19:23:40Z

@mikf By the way, I tried to implement a temporary solution to the directory issue, and this is what I figured out.
This is the config:

{
    "extractor": {
        "plurk": {
            "comments": true,
            "directory": ["plurk_test"],
            "whitelist": ["directlink"],
            "parent-metadata": true,
            "filename": "{owner_id} {plurk_id}",
            "category-transfer": true
        }
    }
}

As was said, plurk's images are transferred to the directlink extractor for some reason, so by default the result will be:

F:\gallery-dl>gallery-dl --chapter-range -2 https://www.plurk.com/BOW99
# .\gallery-dl\directlink\images.plurk.com__1LjmLKh7htkja9vB6z7EB3.png
# .\gallery-dl\directlink\images.plurk.com__286ILNC4UKy8rZPxm5Pi7O.png

Unless I misunderstand how category-transfer works, with the above config the expected result should be:

F:\gallery-dl>gallery-dl --chapter-range -2 https://www.plurk.com/BOW99
# .\gallery-dl\plurk_test\{owner_id} {plurk_id}.png
# .\gallery-dl\plurk_test\{owner_id} {plurk_id}.png

But instead it looks like this:

F:\gallery-dl>gallery-dl --chapter-range -2 https://www.plurk.com/BOW99
# .\gallery-dl\plurk\images.plurk.com__1LjmLKh7htkja9vB6z7EB3.png
# .\gallery-dl\plurk\images.plurk.com__286ILNC4UKy8rZPxm5Pi7O.png

So the directory has changed, but to the plurk extractor's default directory name, not the one from the config. Meanwhile, the filename hasn't changed at all and is still the directlink extractor's default.

Is it intended that category-transfer transfers the extractor's default options to its child? And why, in this case, was only the default directory name passed, but not the filename?
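
To make my expectation concrete, here is a toy model of how I assumed the option lookup would behave. This is not gallery-dl's actual code; `lookup` and `child_option` are hypothetical helpers that only mimic "the child reads its options from the parent's category when category-transfer is enabled".

```python
# Trimmed-down version of the config above.
CONFIG = {
    "extractor": {
        "plurk": {
            "directory": ["plurk_test"],
            "filename": "{owner_id} {plurk_id}",
            "category-transfer": True,
        }
    }
}


def lookup(config, category, key, default=None):
    """Read extractor.<category>.<key> from a nested config dict."""
    return config.get("extractor", {}).get(category, {}).get(key, default)


def child_option(config, parent, child, key, default=None):
    """Resolve an option for a child extractor spawned by 'parent'.

    Toy rule: with category-transfer enabled on the parent, the child
    looks up options under the parent's category instead of its own.
    """
    transfer = lookup(config, parent, "category-transfer", False)
    category = parent if transfer else child
    return lookup(config, category, key, default)
```

Under this model, `child_option(CONFIG, "plurk", "directlink", "filename")` would resolve to the user-supplied `"{owner_id} {plurk_id}"`, and `directory` to `["plurk_test"]`, which is what I expected to see.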

Also, setting parent-directory to true instead of category-transfer doesn't seem to work. It just downloads everything to directlink's default folder again.

Another question: why is the plurk extractor considered a manga chapter extractor?

nisehime · 2021-04-26T12:12:53Z

@mikf It still doesn't seem to work properly: an image-filter set for the plurk extractor is not passed to the directlink extractor.

mikf added the site:feature label Nov 27, 2020

mikf added a commit that referenced this issue Apr 19, 2021

fix 'category-transfer' (#1111)

b4ed7cb

broken since commit 055c32e

mikf added a commit that referenced this issue Apr 27, 2021

reorder config access in Job constructor

5b4da4b

(#1111)
