Some twitter posts are ignored #2875

Open
rivke41levp656 opened this issue Aug 29, 2022 · 15 comments

Comments

@rivke41levp656

Some Twitter posts are embedded in such a way that gallery-dl does not detect them as media.

Two example video URLs:

https://twitter.com/AyakaOhashi/status/1555841160312025089
https://twitter.com/bang_dream_1242/status/1561548715348746241

Both of these can be downloaded directly via yt-dlp, but gallery-dl ignores them with the following output:

gallery-dl -v --ignore-config https://twitter.com/AyakaOhashi/status/1555841160312025089
[gallery-dl][debug] Version 1.23.0
[gallery-dl][debug] Python 3.10.6 - Linux-5.19.4-arch1-1-x86_64-with-glibc2.36
[gallery-dl][debug] requests 2.28.1 - urllib3 1.26.12
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/AyakaOhashi/status/1555841160312025089'
[twitter][debug] Using TwitterTweetExtractor for 'https://twitter.com/AyakaOhashi/status/1555841160312025089'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/ItejhtHVxU7ksltgMmyaLA/TweetDetail?variables=%7B%22focalTweetId%22%3A%221555841160312025089%22%2C%22with_rux_injections%22%3Afalse%2C%22withCommunity%22%3Atrue%2C%22withQuickPromoteEligibilityTweetFields%22%3Atrue%2C%22withBirdwatchNotes%22%3Afalse%2C%22includePromotedContent%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withBirdwatchPivots%22%3Afalse%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withClientEventToken%22%3Afalse%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Afalse%2C%22__fs_interactive_text%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Afalse%7D HTTP/1.1" 200 3893
[twitter][info] No results for https://twitter.com/AyakaOhashi/status/1555841160312025089

And here is an image URL that fails similarly: https://twitter.com/bang_dream_1242/status/1561674543323910144

Perhaps notably, these posts do not appear under twitter.com/user/media.

@nisehime

@rivke41levp656
Author

I considered that but didn't realize it had the 'ytdl' option. In my config I had cards set to true. After switching it to 'ytdl', the videos now work, but the image still fails:

downloader.ytdl: ERROR: [twitter] 1561674543323910144: No video formats found!; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
download: Failed to download ytdl:https://twitter.com/i/web/status/1561674543323910144

But there is no video in this tweet, so it shouldn't be passed to yt-dlp.

@nisehime

nisehime commented Aug 29, 2022

but the image still fails

Yeah, even if you set cards to true, it still can't download it.

@mikf
Owner

mikf commented Aug 29, 2022

The card type of 1561674543323910144 is not yet supported by gallery-dl, hence it forwards it to ytdl.
(It will be supported the next time I do a git push.)

@rivke41levp656
Author

OK, sounds good. But now I remember why I didn't have cards set to "ytdl". The problem is that some people post YouTube embeds, so just running gallery-dl twitter.com/user could result in it downloading dozens of hours of YouTube videos instead of just a Twitter feed. Is there any way to filter these out? I could use --filesize-max, but that still downloads some data. I want to download only the cards that have native content on Twitter, not those that link elsewhere.

@Hrxn
Copy link
Contributor

Hrxn commented Aug 30, 2022

What about "cards": true and "videos": "ytdl"?
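As a config fragment, that suggestion might look like the following minimal sketch. It assumes the usual gallery-dl layout with Twitter options under extractor → twitter, and it only prints the JSON so it can be merged into an existing config file by hand.

```python
import json

# Hrxn's suggestion as a gallery-dl config fragment (extractor.twitter layout assumed):
# media from supported cards is downloaded by gallery-dl, video tweets go to ytdl.
config = {
    "extractor": {
        "twitter": {
            "cards": True,     # download media from supported cards
            "videos": "ytdl",  # forward video downloads to youtube-dl/yt-dlp
        }
    }
}

# Print the JSON so it can be pasted into an existing config file.
config_text = json.dumps(config, indent=4)
print(config_text)
```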

@rivke41levp656
Copy link
Author

What about "cards": true and "videos": "ytdl"?

That's my current setup. It won't download the videos in the OP, I guess because they're unsupported, so gallery-dl doesn't recognize them as videos to begin with.

@mikf
Copy link
Owner

mikf commented Aug 30, 2022

I looked a bit more into this and the video from https://twitter.com/bang_dream_1242/status/1561548715348746241 is easy enough to support as well.

https://twitter.com/AyakaOhashi/status/1555841160312025089 on the other hand is a "broadcast"/livestream and requires m3u8/HLS support, so gallery-dl cannot download it without ytdl.

I guess I could implement some sort of card filter option with which it would be possible to ignore unwanted cards like YT embeds.

@Hrxn
Copy link
Contributor

Hrxn commented Aug 30, 2022

I guess I could implement some sort of card filter option with which it would be possible to ignore unwanted cards like YT embeds.

Sounds great to me!

mikf added a commit that referenced this issue Aug 31, 2022
just removing the 'type' check seems to work
@mikf
Owner

mikf commented Aug 31, 2022

General support for all(?) unified cards got added in commit 4d7cb0b, meaning it now also downloads image_website, video_website, etc. cards. (https://twitter.com/bang_dream_1242 has quite a lot of them)

There's now also a cards-blacklist option (4d78ca8).
YT embed cards are of type player, but so are probably those of many other external video sites.
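A minimal sketch of blacklisting by card type, assuming Twitter options live under extractor → twitter in the config file. As noted here, "player" is broad: it would skip YouTube embeds together with every other player-type card.

```python
import json

# Ignore all cards of type "player" (option from 4d78ca8; config layout assumed).
# This also drops non-YouTube player cards, e.g. other external video sites.
config = {
    "extractor": {
        "twitter": {
            "cards": True,
            "cards-blacklist": ["player"],
        }
    }
}

config_text = json.dumps(config, indent=4)
print(config_text)
```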

@Hrxn
Copy link
Contributor

Hrxn commented Aug 31, 2022

Would it be a good idea to write the URLs blocked by "cards-blacklist": ["player"], for example, to the unsupported file? Or maybe to the log?

@mikf
Copy link
Owner

mikf commented Sep 1, 2022

gallery-dl does not know which URL a specific card has before applying cards-blacklist, and I don't think there is an easy way of extracting this value. Each specific card type can be very different.

I could implement this functionality for player cards specifically, but there might be several others where this would also be necessary, and three lines of code could become 50 or 100 just to get a URL. I don't know if I want to do that.

@Hrxn
Copy link
Contributor

Hrxn commented Sep 1, 2022

Oh, okay, I just assumed that the URL would already be known somehow for the cards, and that different card types would be rather similar? Apparently not.

I mean, the primary use case here would only be player cards anyway, to avoid embedded YouTube clips (as mentioned in this comment), because, as I've discovered myself, these can get really huge.

Of course, just a suggestion; feel free to disregard.

@biggestsonicfan
Copy link

Be warned when using some search filters on some users. For example, gallery-dl "https://twitter.com/search?q=from:S_ABOTEN__ filter:links" only grabs a single file, whereas gallery-dl https://twitter.com/S_ABOTEN__ grabs all 30.

mikf added a commit that referenced this issue Sep 17, 2022
allow blacklisting domains and 'name:domain',
where 'domain' depends on a card's 'vanity_url' value
@mikf
Copy link
Owner

mikf commented Sep 18, 2022

@Hrxn I've updated the cards-blacklist option a bit, so it should now be possible to ignore YouTube videos by simply specifying "youtube.com" or "player:youtube.com" in the list. It depends on the vanity_url value of a card, which should be present for at least all player cards. (e99a9b2)
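The matching rule described here and in the commit message (an entry can be a card name, a domain, or 'name:domain', with the domain taken from vanity_url) can be illustrated with a small predicate. This is a hypothetical sketch, not gallery-dl's actual implementation: the helper names card_domain and card_blacklisted are made up, and it assumes vanity_url is either a bare host such as "youtube.com" or a full URL whose host is used as-is.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def card_domain(vanity_url: str) -> str:
    """Return the host of a card's vanity_url (hypothetical helper).

    Accepts either a bare host ("youtube.com") or a full URL; no 'www.'
    stripping or subdomain matching is attempted.
    """
    if "://" not in vanity_url:
        # Prefix with '//' so urlsplit treats the value as a network location.
        vanity_url = "//" + vanity_url
    return urlsplit(vanity_url).hostname or ""


def card_blacklisted(name: str, vanity_url: Optional[str],
                     blacklist: Iterable[str]) -> bool:
    """Check a card against entries of the form 'name', 'domain' or 'name:domain'."""
    entries = set(blacklist)
    if name in entries:      # e.g. "player" blocks every player card
        return True
    if not vanity_url:       # no domain available, so only the name can match
        return False
    domain = card_domain(vanity_url)
    return domain in entries or f"{name}:{domain}" in entries


if __name__ == "__main__":
    blacklist = ["youtube.com", "player:twitch.tv"]
    print(card_blacklisted("player", "youtube.com", blacklist))   # True
    print(card_blacklisted("player", "twitch.tv", blacklist))     # True
    print(card_blacklisted("summary", "twitch.tv", blacklist))    # False
    print(card_blacklisted("image_website", None, blacklist))     # False
```

With a blacklist of just ["youtube.com"], only YouTube cards would be skipped in this sketch, while other player cards pass through, which is the narrower behavior asked for earlier in the thread.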
