Only downloads the first 150 profiles it sees with :story. #724

Closed
Liz-chan opened this issue Jul 10, 2020 · 9 comments
Labels: bug

@Liz-chan

Describe the bug

When you use :stories to download the stories of all the profiles you follow, it only gets those of the first 150 profiles.

To Reproduce

Run Instaloader with the :stories target and note that it misses accounts beyond the first 150.

Expected behavior

I expected the stories of all followed profiles to be downloaded, not just those of the first 150 it can see.

Error messages and tracebacks

No error message; it just reports that it has completed.

Instaloader version

4.4.4

Additional context

I follow around 1.5k accounts, which may be why it has trouble downloading the stories of all of them.

Liz-chan added the bug label Jul 10, 2020
@e5150
Contributor

e5150 commented Jul 10, 2020

This might be the same issue as #204. How many users' stories are available through the website? If only the first 150 are visible there, you'd have to resort to instaloader --stories @myusername. But if more are loaded as you scroll, then pagination has been implemented on the server side, and a fix in Instaloader shouldn't be too hard.

@Liz-chan
Author

Oh I see, yeah, only the first 150 are available on the website. I did try the combination of options that downloads only stories, and only those of the people you follow, but it kept hitting the rate limit and seemed not to be doing anything: after 2 hours it was still stuck in the same place.

Here's the command I used, for reference; there doesn't seem to be anything wrong with it, right?
instaloader --login=xxxxxxxxxxxxxx --dirname-pattern={profile} --filename-pattern={profile}-{date_utc}-{target} --no-video-thumbnails --no-metadata-json --stories --no-posts --no-profile-pic @xxxxxxxxxxxxxx
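For reference, a roughly equivalent run through Instaloader's Python API might look like the sketch below. It is unofficial and assumes Instaloader 4.x's `Instaloader` constructor options, `Instaloader.load_session_from_file()`, `Profile.from_username()`, `Profile.get_followees()` (which needs a logged-in session) and `Instaloader.download_stories()`; the helper names `followee_userids` and `download_followee_stories` are hypothetical, and `xxxxxxxxxxxxxx` is the same redacted placeholder as in the command. Passing all followee IDs to a single `download_stories()` call mirrors `--stories --no-posts --no-profile-pic @user`, so it may well run into the same rate limits.

```python
"""Sketch: download the current stories of every followee via the Python API.

Assumes Instaloader 4.x is installed and that a session for USERNAME was
saved earlier, e.g. by running `instaloader --login=USERNAME` once.
"""
from typing import Iterable, List

USERNAME = "xxxxxxxxxxxxxx"  # placeholder, as in the command above


def followee_userids(followees: Iterable) -> List[int]:
    """Collect the numeric user IDs of the given Profile-like objects."""
    return [profile.userid for profile in followees]


def download_followee_stories(loader, profile) -> None:
    """Download the stories of everyone `profile` follows in one call.

    `loader` is an instaloader.Instaloader and `profile` an
    instaloader.Profile; any objects with the same attributes will do.
    Posts and profile pictures are never requested here, which matches
    --no-posts --no-profile-pic.
    """
    userids = followee_userids(profile.get_followees())
    # filename_target=None asks Instaloader to use each story owner's
    # name for {target} instead of the default ":stories".
    loader.download_stories(userids=userids, filename_target=None)


def main() -> None:
    import instaloader  # imported here so the helpers work without it

    loader = instaloader.Instaloader(
        dirname_pattern="{profile}",
        filename_pattern="{profile}-{date_utc}-{target}",
        download_video_thumbnails=False,  # --no-video-thumbnails
        save_metadata=False,              # --no-metadata-json
    )
    loader.load_session_from_file(USERNAME)
    profile = instaloader.Profile.from_username(loader.context, USERNAME)
    download_followee_stories(loader, profile)
```

Call `main()` from an interactive session, or add the usual `if __name__ == "__main__":` guard when saving it as a script.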

I was thinking this could be solved with an option to mark the initial 150 as read while downloading them and then restart, so that the fresh, unread ones become accessible. :stories is so much faster than --stories --no-posts that I think this would be the most efficient approach.

@e5150
Contributor

e5150 commented Jul 11, 2020

Your command looks fine, but with that many followees I can see how it might take too long.
I quickly made a potential fix that marks stories as seen after downloading them, which I've been thinking about for a while anyway. It's available at https://github.com/e5150/instaloader/tree/seen_stories
It unconditionally sends a POST request to the IG servers after each StoryItem has been downloaded, similar to what the official website does, but I don't think that counts towards the rate limit. I don't follow enough accounts to really stress-test it, though.

@Liz-chan
Author

Liz-chan commented Jul 12, 2020

Alright, thank you! I'll give it a try and report back.

EDIT: I'll try later; I'm getting a 500 error for now.

@Liz-chan
Author

Liz-chan commented Jul 12, 2020

Hm, it seems to keep trying to download existing media because it isn't marking some stories as seen, or it just isn't working: on the website and in the app they still look unseen, though it did work for others. Maybe there's a rate limit? When I went through the stories in the app, it marked them correctly, so I'm not sure what's up. Ah, actually that might be it: the website isn't updating the read status anymore.

429 Error

Maybe it'd be better to mark only the latest story as viewed, so that you don't send that many requests? Marking the latest one as viewed marks the earlier ones as well. If possible, it might also be good to skip downloading stories from profiles whose stories have all been viewed.

@e5150
Contributor

e5150 commented Jul 13, 2020

Instaloader downloads stories regardless of their seen status, since stories may already have been watched in the app and would otherwise be skipped by Instaloader. Skipping already-seen stories could be achieved by implementing a --story-filter and comparing Story.last_seen_utc with Story.latest_media_utc, but making that the default would be problematic.
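A minimal sketch of that comparison, written as a plain predicate. It assumes Instaloader's `Story.last_seen_utc` (UTC datetime of the newest item already watched, or `None`) and `Story.latest_media_utc` (UTC datetime of the newest item); the names `story_has_unseen_items` and `unseen_stories` are hypothetical, and because a `--story-filter` option is only proposed here, the filter would be applied by hand while iterating over `Instaloader.get_stories()`.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional


def story_has_unseen_items(last_seen_utc: Optional[datetime],
                           latest_media_utc: datetime) -> bool:
    """Return True if a user's story still contains an item not yet watched.

    last_seen_utc is None when nothing in this user's story has been watched.
    """
    return last_seen_utc is None or last_seen_utc < latest_media_utc


def unseen_stories(stories: Iterable) -> Iterator:
    """Yield only the Story-like objects that still have unseen items."""
    for story in stories:
        if story_has_unseen_items(story.last_seen_utc, story.latest_media_utc):
            yield story
```

Usage would be along the lines of `for story in unseen_stories(loader.get_stories()): ...`, downloading only what passes the filter.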
Marking only the latest could be done, though I feel it would be rather ugly. I did make a small change, however, so that only "new" stories (ones not previously downloaded by Instaloader) are marked.

Ultimately I don't think this can be solved seamlessly; the best case is to mark downloaded stories as seen and have the IG servers show another 150 on the next invocation of Instaloader.
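The mark-and-rerun strategy can be sketched as a loop that keeps fetching the size-limited tray until nothing unseen is left. Everything here is a hypothetical stand-in: `fetch_tray`, `download` and `mark_seen` represent the real requests, and the assumption that the server lists unseen trays first (so marking reveals further accounts) comes from this thread, not from any documented API. `max_rounds` only bounds the loop; it does not avoid the rate limiting discussed here.

```python
from typing import Callable, List


def drain_story_tray(fetch_tray: Callable[[], List[int]],
                     download: Callable[[int], None],
                     mark_seen: Callable[[int], None],
                     max_rounds: int = 20) -> int:
    """Repeatedly fetch the story tray until it has nothing new.

    fetch_tray returns the user IDs whose stories are currently unseen;
    download and mark_seen act on one user ID. Returns the number of
    rounds that found something to download.
    """
    rounds = 0
    while rounds < max_rounds:
        batch = fetch_tray()
        if not batch:
            break  # every visible story has been seen
        for userid in batch:
            download(userid)
            mark_seen(userid)  # assumed to let the next fetch show more accounts
        rounds += 1
    return rounds
```

With a simulated tray of 150 entries and 400 followees with unseen stories, this takes three rounds, which matches the "another 150 on the next invocation" behaviour described above.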

@Liz-chan
Author

Liz-chan commented Jul 14, 2020

That's true, so I don't think it should be the default either, but an option would be helpful, since watching Instaloader try to download a profile again and just report "exists" on the latest post is inefficient. On the website, profiles without new stories have a grey outline, so maybe Instaloader could ignore those completely when given, say, --skip-all-seen? Then, instead of trying all 150 profiles, it would only try the ones with new stories.
Second, because of the rate limit, it will never be possible to download a huge volume of stories without being rate limited for almost 24 hours; the app API doesn't have this limitation, but we're not using it here. The only workaround I see would be to mark only the latest StoryItem as seen, since with the number of stories I have to download, I get rate limited by the 3rd run. Not marking stories as seen when they already exist is good, though, congrats. For marking only the latest, the option could be something like --only-mark-latest-as-seen. It's already kind of ugly, though: Instaloader downloads the latest StoryItem first, marks it as seen, and then marks the earlier ones as seen, as if you were viewing them backwards.
Unfortunately, the rate limit can't be worked around by waiting, so I think the only ways to maximize how many stories you can mark as seen are either to do that or to use the app API, which apparently has a much higher limit, since I've never been rate limited for marking stories as seen there.

EDIT: Ah! I was able to do what I wanted. I put
if count == 1: item.mark_as_seen()
inside the download_stories function, and it works perfectly, although now I can't access the downloaded variable. Any idea how I could access it?
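One way to keep the download result in reach is to decide about marking after the download call rather than in a separate `count == 1` check before it. The sketch below assumes a `download_storyitem(item) -> bool` callable that returns whether a new file was written (Instaloader's `Instaloader.download_storyitem()` returns such a bool) and a hypothetical `mark_as_seen(item)` callable standing in for the method from the seen_stories branch. Following this thread's observations, it assumes items arrive newest first and that marking the newest marks the earlier ones too; `download_story_items` and `only_mark_latest` are hypothetical names.

```python
from typing import Callable, Iterable, Optional, TypeVar

Item = TypeVar("Item")


def download_story_items(items: Iterable[Item],
                         download_storyitem: Callable[[Item], bool],
                         mark_as_seen: Callable[[Item], None],
                         only_mark_latest: bool = True) -> int:
    """Download one user's story items and mark them as seen.

    Assumes `items` yields the newest item first. With only_mark_latest=True,
    a single mark request is sent for the newest item, and only if at least
    one item was newly downloaded. Returns the number of new downloads.
    """
    newest: Optional[Item] = None
    new_count = 0
    for item in items:
        if newest is None:
            newest = item  # first item yielded is assumed to be the newest
        downloaded = download_storyitem(item)  # False if it already existed
        if downloaded:
            new_count += 1
            if not only_mark_latest:
                mark_as_seen(item)  # one request per newly downloaded item
    if only_mark_latest and new_count and newest is not None:
        mark_as_seen(newest)
    return new_count
```

Deferring the single mark request until the loop has finished also means a story whose items all already existed triggers no request at all.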

@github-actions

There has been no activity on this issue for an extended period of time. This issue will be closed after a further 14 days of inactivity.

github-actions bot added the stale label (Issue is inactive for a long time) Oct 12, 2020
@Liz-chan
Author

Liz-chan commented Dec 5, 2020

I think it would be good to merge these changes into the official branch, since I've been running this setup for months without any issues.

EDIT: I can submit a pull request if needed.

aandergr added a commit that referenced this issue Dec 12, 2020
Fixes efficiency of the download_profiles() function when called with
--no-posts --no-profile-pic by reordering an if statement.

This inefficiency has been reported in #724.

Co-authored-by: André Koch-Kramer <koch-kramer@web.de>
aandergr removed the stale label Dec 12, 2020