Large Twitter Galleries Not Fully Downloading #2226

Closed
ghost opened this issue Jan 25, 2022 · 23 comments

@ghost

ghost commented Jan 25, 2022

I tried to download every MP4 from this gallery (NSFW), and it only went as far back as this tweet (also NSFW). After that tweet, it just stopped and acted as though it had downloaded the entire gallery, meaning that any older tweets, such as this one, were excluded.

If I don't use the link for the media tab, it stops at an even more recent tweet (NSFW).

For reference, the command I ran was gallery-dl "https://twitter.com/furui_1111/media" --filter "extension in ('mp4')"

@Zoodee

Zoodee commented Jan 25, 2022

This is a limit on Twitter's end, unfortunately.

#1396

@ImportTaste
Contributor

I personally ended up writing a PowerShell script that cycles through two-week intervals, starting from the user's registration date, using Twitter search queries with one day of overlap in both since: and until: so nothing gets skipped (with include:nativeretweets in the search and &f=live instead of &f=top in the URL).

Even so, I don't think that gets everything, because Twitter is dumb like that: it caps retrieving a user's timeline at around 3200 tweets (and that includes retweets).
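A minimal Python sketch of the windowing idea above, not ImportTaste's actual PowerShell script. It assumes Twitter's since:/until: date operators bound each window, uses 14-day windows that each start one day before the previous one ended, and builds &f=live URLs with include:nativeretweets as described. The registration date has to be looked up separately; `search_windows` and `window_url` are hypothetical helper names.

```python
from datetime import date, timedelta
from urllib.parse import quote


def search_windows(start, end, span_days=14, overlap_days=1):
    """Yield (since, until) date pairs covering start..end.

    Each window spans `span_days`, and each new window starts
    `overlap_days` before the previous one ended, so consecutive
    windows overlap and no day falls through the cracks.
    """
    if overlap_days >= span_days:
        raise ValueError("overlap must be shorter than the window")
    since = start
    while since < end:
        until = min(since + timedelta(days=span_days), end)
        yield since, until
        if until >= end:
            break
        since = until - timedelta(days=overlap_days)


def window_url(user, since, until):
    """Build a "Latest" (f=live) search URL for one window."""
    # Assumed operators: since:/until: for dates, include:nativeretweets for RTs.
    query = (f"from:{user} since:{since.isoformat()} "
             f"until:{until.isoformat()} include:nativeretweets")
    # quote() encodes spaces as %20 and leaves ":" readable.
    return "https://twitter.com/search?q=" + quote(query, safe=":") + "&f=live"
```

Running every pair from `search_windows(registration_date, date.today())` through `window_url` gives one URL per window, which could be saved one per line and passed to gallery-dl.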

@tux93
Contributor

tux93 commented Feb 10, 2022

What usually works for me is using a search query to download:
gallery-dl "https://twitter.com/search?q=(from:USER)"

@wankio
Contributor

wankio commented May 25, 2022

Twitter shadow-hides all NSFW content from search, so I don't think there's a way to download everything; you have to use
https://stevesie.com/apps/twitter-api/scrape/tweets/by-user

@Twi-Hard

In my experience, people usually don't flag their posts as "sensitive content", and I've seen that with both art and IRL content. I've been using the API to get the total tweet count of fully NSFW accounts and comparing it against how much I could get from the search results, and search usually returns most of the tweets. I also tried searching an NSFW tag in the browser, and tweets marked by Twitter as "sensitive content" (the ones that ask you to confirm you want to see them) still popped up occasionally.
Here's how they determine what's allowed in search results: https://help.twitter.com/en/using-twitter/twitter-search-not-working

@wankio
Contributor

wankio commented May 25, 2022

It's been blocked since 2020.

@biggestsonicfan
Copy link

biggestsonicfan commented Aug 9, 2022

Bumping this because I'm trying to consolidate my various twitter archives. I have a lot of content that was downloaded from twMediaDownloader and I planned to merge this content with gallery-dl using twitter-click-and-save to minimize file duplication by hardlinking across the drive.

My example case is casulcasulcasul with the following settings:

"twitter":
        {
            "sleep": 3.0,
            "sleep-request": 3.0,
            "archive": "/run/media/xxx/bfd18/dl/gallery-dl/sql/twitter.sqlite3",
            "archive-format": "{author[name]}—{date:%Y.%m.%d}—{tweet_id}—{filename}",
            "username": "xxx",
            "password": "yyy",
            "cards": false,
            "conversations": false,
            "quoted": false,
            "replies": true,
            "retweets": false,
            "text-tweets": false,
            "twitpic": false,
            "users": "timeline",
            "videos": true,
            "filename": "[twitter] {author[name]}—{date:%Y.%m.%d}—{tweet_id}—{filename}.{extension}"
        },

As this account seems to have fewer than 1000 media tweets, I tried gallery-dl https://twitter.com/casulcasulcasul/media.

Out of the 970 media files twMediaDownloader counted (using a dry run), gallery-dl downloaded 944 with the above command, seemingly omitting anything earlier than November 2019.

Using gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)" gets a bit further (though the process is much slower) and results in 968 files.

I don't quite know what/where the issue is, but the two tweets gallery-dl seems to miss are this one and this one. Manually grabbing those tweets with gallery-dl downloads just fine.
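To find which tweets one download is missing relative to another (like the two tweets mentioned above), a rough Python sketch can compare tweet IDs pulled out of filenames. It assumes both tools include the tweet ID in each filename (the gallery-dl config above does via {tweet_id}; twMediaDownloader's naming is an assumption) and treats any run of 15 to 20 digits as a tweet ID. The helper names are hypothetical.

```python
import re

# Snowflake tweet IDs are long decimal numbers; this assumes each tool
# puts the tweet ID somewhere in the filename.
TWEET_ID = re.compile(r"(?<!\d)\d{15,20}(?!\d)")


def tweet_ids(filenames):
    """Collect every tweet ID that appears in a list of filenames."""
    ids = set()
    for name in filenames:
        ids.update(TWEET_ID.findall(name))
    return ids


def missing_from(reference, candidate):
    """Tweet IDs present in `reference` but not in `candidate`, oldest first."""
    return sorted(tweet_ids(reference) - tweet_ids(candidate), key=int)
```

Passing `[p.name for p in Path(folder).iterdir()]` (with `from pathlib import Path`) for each download folder lists the tweet IDs worth checking by hand.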

EDIT: I guess theoretically, you could use twMediaDownloader to generate a list of media tweets and use the .csv file it provides as input for gallery-dl. 🤔
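If the twMediaDownloader CSV contains tweet URLs (an assumption; its column layout isn't shown in this thread), a sketch like the following could turn it into an input file for gallery-dl's -i/--input-file option. Rather than relying on column names, it collects every twitter.com/<user>/status/<id> URL in the text, de-duplicated in order of first appearance. `status_urls` and `write_input_file` are hypothetical helpers.

```python
import re
from pathlib import Path

# Matches tweet URLs anywhere in the CSV text, so no column layout is assumed.
STATUS_URL = re.compile(
    r"https://(?:mobile\.)?twitter\.com/[A-Za-z0-9_]{1,15}/status/\d+")


def status_urls(csv_text):
    """Unique tweet URLs in order of first appearance."""
    return list(dict.fromkeys(STATUS_URL.findall(csv_text)))


def write_input_file(csv_path, out_path="urls.txt"):
    """Write one tweet URL per line and return how many were written."""
    urls = status_urls(Path(csv_path).read_text(encoding="utf-8"))
    Path(out_path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return len(urls)
```

After `write_input_file("media.csv")` (a hypothetical filename), running gallery-dl -i urls.txt would fetch each listed tweet; with the archive option from the config above, files already recorded there should be skipped.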

EDIT2: Some issues suggest including filter:media in your gallery-dl command, but that has never once worked for me. However, I have found that f=media works, so the full command would be gallery-dl "https://twitter.com/search?q=(from:username)&f=media". The quotes are required on the command line but not when the URL is in an input file. This command gets more media than gallery-dl https://twitter.com/username/media alone and is faster than gallery-dl "https://twitter.com/search?q=(from:username)".
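A small Python sketch for building these search URLs without hand-quoting, assuming only the q and f parameters seen in this thread. It encodes spaces as %20 rather than +, since later in the thread mikf notes that gallery-dl does not treat + as a space; `search_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def search_url(query, tab=None):
    """Build a twitter.com search URL whose spaces are encoded as %20."""
    params = {"q": query}
    if tab:
        params["f"] = tab  # e.g. "media" or "live", as used in this thread
    # quote_via=quote yields %20 instead of the "+" that urlencode uses
    # by default; ":" and parentheses stay readable.
    return "https://twitter.com/search?" + urlencode(
        params, quote_via=quote, safe=":()")
```

For example, `search_url("(from:username)", "media")` reproduces the f=media URL above, and multi-term queries come out with %20 between the terms.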

@nisehime

Are you sure gallery-dl "https://twitter.com/search?q=(from:username)&f=media" makes a difference? From what I see it shouldn't.

You can use both https://twitter.com/casulcasulcasul/media and https://twitter.com/search?q=(from:casulcasulcasul) links. Once you download first one, copy the ID of the last downloaded tweet and put it in a search like this: https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE. To speed up the process you can add filter:links
(https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE+filter:links).
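The two-step approach above can be partly automated. This sketch uses hypothetical helpers and assumes gallery-dl's default Twitter filenames of the form 1081020936185274371_1.jpg, as seen in mikf's output later in this thread (a custom filename format like the config earlier needs a different pattern). It finds the oldest downloaded tweet ID, which is the smallest because tweet IDs grow over time, and builds the max_id: search URL, joining terms with %20 rather than + per mikf's later comment.

```python
import re

# Assumed default gallery-dl naming: "{tweet_id}_{num}.{extension}".
DEFAULT_NAME = re.compile(r"^(\d+)_\d+\.\w+$")


def oldest_tweet_id(names):
    """Smallest (oldest) tweet ID among default-format filenames, or None."""
    ids = [int(m.group(1)) for n in names if (m := DEFAULT_NAME.match(n))]
    return min(ids) if ids else None


def continue_url(user, max_id, links_only=True):
    """Search URL that resumes at `max_id`, with %20 between terms."""
    terms = [f"from:{user}", f"max_id:{max_id}"]
    if links_only:
        terms.append("filter:links")
    return "https://twitter.com/search?q=" + "%20".join(terms)
```

In mikf's output the max_id tweet itself is downloaded again, so the boundary tweet is included; an archive file avoids fetching it twice.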

Or you can download the latest artifact and simply paste https://twitter.com/casulcasulcasul if you don't want to bother with two links yourself.

@biggestsonicfan

biggestsonicfan commented Aug 14, 2022

I will scream it from the rooftops:

xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=from:casulcasulcasul+max_id:1081020936185274371+filter:links
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=from:casulcasulcasul+max_id:1081020936185274371
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links"
bash: syntax error near unexpected token `('
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371+filter:links"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+max_id:1081020936185274371"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)+filter:links"
twitter: NotFoundError: Requested user could not be found
xxx@DESKTOP-KLLQALU:~> gallery-dl "https://twitter.com/search?q=(from:casulcasulcasul)"
/run/media/xxx/bfd18/dl/gallery-dl/twitter/casulcasulcasul/[twitter] casulcasulcasul—2022.08.13—1558577888386920448—FaEnVbxUIAAuR8J.jpg
/run/media/xxx/bfd18/dl/gallery-dl/twitter/casulcasulcasul/[twitter] casulcasulcasul—2022.08.12—1558220785201713152—FZ-ytOZakAEydm7.jpg

Adding + or filter: will not work with my install of gallery-dl. --version output: 1.23.0-dev (linux)

@mikf
Owner

mikf commented Aug 14, 2022

@cglmrfreeman use %20 or plain spaces instead of + signs

$ gallery-dl https://twitter.com/search?q=from:casulcasulcasul%20max_id:1081020936185274371%20filter:links
/tmp/twitter/casulcasulcasul/1081020936185274371_1.jpg
/tmp/twitter/casulcasulcasul/1071670086094643201_1.jpg
...
$ gallery-dl "https://twitter.com/search?q=from:casulcasulcasul max_id:1081020936185274371 filter:links"
/tmp/twitter/casulcasulcasul/1081020936185274371_1.jpg
/tmp/twitter/casulcasulcasul/1071670086094643201_1.jpg
...

@biggestsonicfan

Huh, that one worked. I don't think I've ever seen anyone suggest that before. The advice is always to "copy the Twitter search URL" (https://twitter.com/search?q=from%3Acasulcasulcasul+max_id%3A1081020936185274371+filter%3Alinks), which does not work, or to use + or & signs, which seemingly throw "Requested user could not be found".

I will def be using this from now on, thanks!

@nisehime

nisehime commented Aug 14, 2022

You probably should have put the link in double quotes "https://twitter.com/search?q=from:casulcasulcasul+max_id:ID_HERE+filter:links" to get + working.

But, as mikf said, plain spaces work too, also inside double quotes.
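For the bash errors above, the quoting can also be left to Python's shlex module. This sketch (hypothetical helper name) builds a command line in which any argument containing shell-special characters such as (, ), ? or & is single-quoted; for URLs like these, single quotes work just as well as the double quotes suggested here.

```python
import shlex


def gallery_dl_command(url, *options):
    """Return a copy-pasteable shell command for gallery-dl.

    shlex.join() single-quotes any argument containing characters the
    shell would interpret, such as "(", ")", "?", "&" or spaces.
    """
    return shlex.join(["gallery-dl", *options, url])
```

For example, `gallery_dl_command("https://twitter.com/search?q=(from:casulcasulcasul)")` returns the URL wrapped in single quotes, which bash accepts without the "syntax error near unexpected token" seen above.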

@mikf
Owner

mikf commented Aug 14, 2022

No, + signs as space replacements do not work in gallery-dl.

The function that parses query parameters does not "support" them, meaning it just returns + as is and does not replace them with a space character as might be expected.
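To illustrate the distinction mikf describes, here is how Python's standard library treats +. This is not gallery-dl's own parsing function, just a contrast: plain percent-decoding (unquote) leaves + alone, while form-style decoding (unquote_plus, parse_qs) turns it into a space.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

raw = "from:casulcasulcasul+max_id:1081020936185274371+filter:links"

# Percent-decoding alone leaves "+" untouched, so the query stays one
# whitespace-free term ...
percent_only = unquote(raw)
# ... while form-style decoding (and parse_qs) turns "+" into spaces.
form_style = unquote_plus(raw)
from_parse_qs = parse_qs("q=" + raw)["q"][0]

print(percent_only)
print(form_style)
```

Both decoders turn %20 into a space, which is why the %20 form works either way.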

@nisehime

I see. Well, it still works with Twitter specifically: pluses in a query string are just ignored by Twitter (or treated as spaces).

@mikf
Owner

mikf commented Aug 14, 2022

Oh, so the "NotFoundError"s are a bug introduced with 77bdd8f.

This commit splits search queries by whitespace only, and throws an error because there is no user named casulcasulcasul+max_id:1081020936185274371+filter:links
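A hypothetical reconstruction of that failure mode (not the code from 77bdd8f): if the query is split on whitespace only, a +-joined query is a single term, and everything after from: is taken as the username.

```python
def naive_user(query):
    """Hypothetical reconstruction, not gallery-dl's code: split on
    whitespace only and return whatever follows the first "from:"."""
    for term in query.split():
        term = term.strip("()")
        if term.startswith("from:"):
            return term[len("from:"):]
    return None
```

Called with the +-joined query from the error above, it returns casulcasulcasul+max_id:1081020936185274371+filter:links as the "user", while the space-separated form yields just casulcasulcasul.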

@biggestsonicfan

Ah, I only recently started using gallery-dl for Twitter archiving, and I definitely updated after that commit, so that might explain it.

@nisehime

I see x2. I'm on the latest stable version, so I didn't notice. I thought you would leave the search behavior as it was. You should also consider that a query can contain multiple from: terms, if you haven't already. Also, @ can be used instead of from: in queries.
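Along the lines of the suggestion above, a more tolerant extractor could collect every user named via from: or @ and treat + like whitespace. This is a sketch with hypothetical names, not gallery-dl's implementation; it assumes usernames are 1 to 15 characters of letters, digits and underscores.

```python
import re

# A user term starts the query or follows whitespace, "(" or "+"
# (treating "+" as a separator, as Twitter itself does).
USER_TERM = re.compile(r"(?:^|[\s(+])(?:from:|@)([A-Za-z0-9_]{1,15})")


def search_users(query):
    """Every user named with from: or @, de-duplicated in query order."""
    return list(dict.fromkeys(m.group(1) for m in USER_TERM.finditer(query)))
```

With this approach, both the +-joined and the %20/space-separated forms of the casulcasulcasul query resolve to the same single user, and queries such as (from:alice OR from:bob) yield both names.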

mikf added a commit that referenced this issue Aug 18, 2022
... and do not raise exception if searched user does not exist
@biggestsonicfan

For smaller galleries gallery-dl "https://twitter.com/search?q=from:Cotonus filter:links" does not grab nearly as much as gallery-dl https://twitter.com/Cotonus, and gallery-dl "https://twitter.com/search?q=from:Cotonus filter:media" only grabs 3 files. Twitter filters really suck these days.

@nisehime

nisehime commented Sep 3, 2022

If you mean retweets you should add include:nativeretweets in the search

@biggestsonicfan

I don't mean retweets.
gallery-dl "https://twitter.com/search?q=from:Cotonus filter:links" - 28 files
gallery-dl https://twitter.com/Cotonus - 32 files
gallery-dl "https://twitter.com/search?q=from:Cotonus filter:media" - 3 files

@nisehime

nisehime commented Sep 3, 2022

Yeah, there's 2 posts which don't appear in the search at all. Even without filters.

@mikf mikf closed this as completed Dec 4, 2022
@biggestsonicfan

Popping back in here to say after fairly extensive testing, gallery-dl https://twitter.com/username is actually giving the maximum number of results at this point.

@wankio
Contributor

wankio commented Jan 24, 2023

> Popping back in here to say after fairly extensive testing, gallery-dl https://twitter.com/username is actually giving the maximum number of results at this point.

Usually username and username/media work, but I'm pretty sure that if an account has a lot of retweets and media, you can't get everything. Try an account with 5-10k tweets and you'll see; that's Twitter's limit.

8 participants