Backfill statuses from remote accounts when first subscribed #34

Open
Gargron opened this issue Sep 10, 2016 · 142 comments · May be fixed by #32634
Labels
activitypub Protocol-related changes, federation

Comments

@Gargron
Member

Gargron commented Sep 10, 2016

No description provided.

@mjankowski
Contributor

I'd like to take a crack at this, but am still trying to wrap my head around the world of services/workers and which things are triggered when.

My guess here is that somewhere in the collection of remote follow services and workers we want to add a background worker task which pulls the feed of the remote account which was just followed, and import the last 5 or 10 or so items.

The other thing I'm struggling to locate is any code which is effectively "go grab this account's atom feed and import some entries" ... it strikes me as possible that this doesn't exist, because all statuses are pushed onto the server and never (yet) pulled?

Any pointers here are appreciated!

@Gargron
Member Author

Gargron commented Apr 15, 2017

@mjankowski The main roadblock to this is actually #1059, because if you went back and imported 5 statuses from an account that last posted 6 months ago, you would get 5 6mo-old posts on the top of the public timeline (and potentially home timelines as well)

So this should probably stay a wontfix until something is done about IDs and their sorting.
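
The ordering problem described above can be reproduced in a few lines. This is only a sketch: the `Status` class and the `time_based_id` scheme are hypothetical illustrations, not Mastodon's actual code, and the bit layout is an assumption chosen to show why IDs derived from the post time would keep backfilled posts in place.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Status:
    id: int               # assigned locally, in order of arrival
    created_at: datetime  # when the author actually posted it
    text: str


def timeline_by_id(statuses):
    """Newest-first timeline, as produced by sorting on the local ID."""
    return sorted(statuses, key=lambda s: s.id, reverse=True)


def time_based_id(created_at, sequence=0):
    """Hypothetical ID: millisecond timestamp in the high bits, 16-bit sequence below."""
    millis = int(created_at.timestamp() * 1000)
    return (millis << 16) | (sequence & 0xFFFF)


# A fresh post arrives first; a six-month-old post is backfilled afterwards
# and therefore receives the larger sequential ID.
fresh = Status(1, datetime(2017, 4, 15, 12, 0, tzinfo=timezone.utc), "fresh")
old = Status(2, datetime(2016, 10, 15, 12, 0, tzinfo=timezone.utc), "backfilled")

by_arrival = [s.text for s in timeline_by_id([fresh, old])]
rekeyed = [Status(time_based_id(s.created_at), s.created_at, s.text) for s in (fresh, old)]
by_post_time = [s.text for s in timeline_by_id(rekeyed)]
```

With sequential IDs the old post sorts to the top of the timeline; with IDs derived from `created_at` it sorts below the fresh one.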

@mjankowski
Contributor

Thanks, I read that thread and looked at more code, and I agree that backfilling is blocked on improving the ordering approach to either get away from ID-sorting, or make ID-sorting reliable for backfilled records. I'll pause on this, and leave a comment over there.

@deutrino

#3307 is about accurate follower counts, but in the comments I discuss some implementation ideas which could theoretically encompass this.

@kit-ty-kate

Any updates on this?

@Gargron
Member Author

Gargron commented Feb 25, 2018

It's now possible to implement this, but it's not clear how many items to fetch (and if you say "all", think about accounts with 100,000 statuses...)

@kit-ty-kate

Mmh, maybe it can be done by displaying a button where the missing toots are, to fetch the next 5 or 10 toots? That way users would fill things in manually, little by little.

@deutrino

What about just fetching blocks of 20 toots, triggered by scrollbar position?

@kit-ty-kate

@deutrino Generally this seems good, but some toots might already have been fetched manually by users, so there would be holes to fill here and there.
For example, if a profile shows no toots on one instance and you have the URLs of some of its toots, pasting them into the search bar fetches them and adds them to that user's profile (from your instance's point of view).

@kit-ty-kate

@deutrino OK, I was thinking back on what you suggested, and it seems to be the right thing to do, considering that Mastodon already serves 20 toots at a time when looking at a profile. So everything can probably be done smoothly on the server side, with a field saying "I know I'm up-to-date up to this toot/date" and contacting the other instance when we try to get toots that are in the "probably not up-to-date" range.
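
A minimal sketch of that "up-to-date up to this toot/date" field, assuming a per-account watermark stored on the local server. `RemoteAccountState`, `complete_since` and `needs_remote_fetch` are hypothetical names invented for illustration; nothing here is taken from Mastodon's codebase.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RemoteAccountState:
    # Oldest point in time back to which the local copy is believed complete.
    # None means nothing is known about this account's history yet.
    complete_since: Optional[datetime] = None


def needs_remote_fetch(state: RemoteAccountState, oldest_requested: datetime) -> bool:
    """True if the requested page reaches further back than the known-complete range."""
    if state.complete_since is None:
        return True
    return oldest_requested < state.complete_since


state = RemoteAccountState(complete_since=datetime(2018, 1, 1))
inside_known_range = needs_remote_fetch(state, datetime(2018, 2, 1))
before_known_range = needs_remote_fetch(state, datetime(2017, 12, 1))
```

A page that lies entirely inside the known range is served locally; one that reaches before `complete_since` would trigger a request to the origin instance.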

@kit-ty-kate

Maybe fixing this would also partly fix #6137, as looking for a deleted account's toots would hint to the server that it should remove the account from its local database?

@kit-ty-kate

Now that #7459 has been merged, I guess it seems simpler to finally implement this. @ThibG ?

@kit-ty-kate

Mmh, thinking again, maybe not. Sorry. They seemed closely related in my head but seems like I'm too tired today :/

@kit-ty-kate

Any chance of this issue getting fixed, or partially fixed, by the next release (2.5)?

@ClearlyClaire
Contributor

@kit-ty-kate with some logic similar to #7459 we could probably fetch private toots from a remote user, yeah.
Fetching “up to the N last toots” on first follow is easy enough, but:

  • it would not be applied to users already followed before the update
  • it would not be applied to users not followed
  • “up to the last N toots” is pretty arbitrary

The suggestion of having a “gap” that users could click sounds very nice, but it is a lot harder to implement. Indeed, the protocol mandates toots to be strictly ordered, but I don't think there is a mechanism to make sure you're not missing toots, nor to request a certain range of items, so efficiently filling a gap seems pretty hard. Add to that that some items may only be displayed to some authenticated users, and things can get quite complicated…

@JCViscera

JCViscera commented Aug 29, 2023

I made an account on Github just to add my support for this addition and to comment on it. I'm a relatively new user on Mastodon, and I must say, it's incredibly confusing, annoying, and discouraging to necessarily have to go and find another account on a separate instance/domain just to see what toots they've made before. I understand that the point is to not have an owner of any instance to host a lot of data, but am I going to necessarily have to refigure out which instance the public toots of another user are every time I want to see them?

I should also add that this is doubly annoying because some instances don't allow you to do certain things (like view certain toots) unless you have an account on there. What if I want to favorite or retoot a toot that isn't on my instance and also is not loaded on my instance? I don't want to have to create multiple accounts on Mastodon just because toots that I want to go through in someone's backlog and favorite aren't available on the particular instance my account is on. I specifically favorite toots to look back on and find later, so it really breaks a big way that I currently use social media.

If backfilling was ever implemented, I feel there could be a couple options:

  • When a user on one instance follows someone located on another instance, backfill a set number of toots of that followed profile to the current instance. The number of toots to be backfilled could be a number set by the owner of the instance (25, 50, 100, etc), and if the owner doesn't want to allow backfilling, then they can turn off the option altogether/set the number to 0. Could also maybe include an option for amount of data retrieved per profile (1MB, 10MB, etc) if the profile includes a lot of images/video in their toots.
  • Backfills of public accounts are able to be seen by anyone on the current instance. If an account is Locked, they would only be "Unlocked" to the person who has had their follow accepted and only on the original page. This would be for privacy purposes, since if locked/private accounts' toots were downloaded to the instance owner's server, this would be an obvious security issue. A flag to allow or disallow this would be able to be set by the owner of each instance. The user would also be notified of this when their follow is accepted.
  • In the same vein, backfills of public toots are able to be seen by anyone on the current instance. If an account has private toots on it, they would only be viewable to the person who has had their follow accepted and only on the original page. This would again be for privacy purposes, etc. A flag to allow or disallow this would also be able to be set by the owner of each instance. The user would also be notified of this when they follow someone with private toots for the first time.
  • If some user spams or follows a lot of accounts at once (either intentionally cuz they're going on a binge run or disingenuously), only a set number of accounts will be backfilled per day. This number per day (or potentially per account) could be set by the owner.
  • If a user did follow a bunch of accounts disingenuously, the owner could either choose to not have accounts followed by them be backfilled or ban the user from the site altogether. Could be a flag or something to set.
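
The admin-configurable limits in the list above could look roughly like the following. This is a sketch only: `BackfillPolicy` and its fields are hypothetical names, and the default numbers are placeholders rather than proposed values.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BackfillPolicy:
    posts_per_account: int = 25          # 0 turns backfilling off entirely
    accounts_per_user_per_day: int = 10  # cap for users following many accounts at once
    _used: dict = field(default_factory=lambda: defaultdict(int))

    def allowance(self, user_id: int, today: date) -> int:
        """Posts to backfill for one newly followed account; 0 means skip."""
        if self.posts_per_account <= 0:
            return 0
        key = (user_id, today)
        if self._used[key] >= self.accounts_per_user_per_day:
            return 0
        self._used[key] += 1
        return self.posts_per_account


policy = BackfillPolicy(posts_per_account=50, accounts_per_user_per_day=2)
day = date(2023, 8, 29)
allowances = [policy.allowance(1, day) for _ in range(3)]
```

Here the third follow on the same day gets no backfill, while a different user or a later day starts with a fresh budget.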

I really want to give this place a chance, because I'm really missing a social media that has a kind of community atmosphere to it, but this issue in specific really discourages me from staying, as I do go back through people's profiles often when first following them and also to determine whether I want to follow them or not (based on what they toot/retoot). Please at least consider the option. Thank you!

@tesaguri

  • When a user on one instance follows someone located on another instance, backfill a set number of toots of that followed profile to the current instance. The number of toots to be backfilled could be a number set by the owner of the instance (25, 50, 100, etc), and if the owner doesn't want to allow backfilling, then they can turn off the option altogether/set the number to 0. Could also maybe include an option for amount of data retrieved per profile (1MB, 10MB, etc) if the profile includes a lot of images/video in their toots.

Couldn't we just use the existing rate limit machinery instead of introducing a new system?

  • Backfills of public accounts are able to be seen by anyone on the current instance. If an account is Locked, they would only be "Unlocked" to the person who has had their follow accepted and only on the original page. This would be for privacy purposes, since if locked/private accounts' toots were downloaded to the instance owner's server, this would be an obvious security issue. A flag to allow or disallow this would be able to be set by the owner of each instance. The user would also be notified of this when their follow is accepted.
  • In the same vein, backfills of public toots are able to be seen by anyone on the current instance. If an account has private toots on it, they would only be viewable to the person who has had their follow accepted and only on the original page. This would again be for privacy purposes, etc. A flag to allow or disallow this would also be able to be set by the owner of each instance. The user would also be notified of this when they follow someone with private toots for the first time.

The concern about visibility is not unique to backfilling. It also applies, for example, to manually fetching posts by putting a URL in the search field, so I guess the necessary precautions are already implemented.

  • If some user spams or follows a lot of accounts at once (either intentionally cuz they're going on a binge run or disingenuously), only a set number of accounts will be backfilled per day. This number per day (or potentially per account) could be set by the owner.
  • If a user did follow a bunch of accounts disingenuously, the owner could either choose to not have accounts followed by them be backfilled or ban the user from the site altogether. Could be a flag or something to set.

Again, doesn't rate limiting of following suffice? Perhaps, we might want to reconsider the default limit value based on the increased cost, though (given that we're actually going to backfill when following an account instead of e.g. when the user explicitly requested to backfill (#34 (comment)) or when scrolling a user timeline (#34 (comment)) as suggested before).

@tesaguri

Well, the issue has gotten really long, and it has become harder and harder to grasp the points and blockers.

We need a summary of existing discussions, I suppose? Maybe I can make a write-up if I find time.

@trwnh
Member

trwnh commented Aug 31, 2023

closest thing to that: #34 (comment)

points:

the current plan i think is to have a button on profiles that lets you fetch more statuses on-demand, rather than having it be automatically done on follow, but there's issues with doing this consistently as mentioned above. the only truly consistent way to do it is to fetch the entire outbox every single time. the next best thing is to start at the beginning and keep loading new pages until you start seeing stuff that you haven't seen before.
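
A rough sketch of that "next best thing", under one reading of it: walk the pages from the first one, skip whatever is already stored locally, and collect posts that are not. `collect_unseen`, the `wanted` count and the `max_pages` cap are hypothetical and only there to bound the walk; as noted above, this cannot guarantee that no gaps remain.

```python
def collect_unseen(pages, known_ids, wanted=20, max_pages=10):
    """Walk newest-first pages and return up to `wanted` post IDs not stored locally.

    `pages` can be any iterable of pages (lists of post IDs), including a lazy
    generator that fetches each page on demand.
    """
    unseen = []
    for page_number, page in enumerate(pages, start=1):
        unseen.extend(post_id for post_id in page if post_id not in known_ids)
        # Check the limits after processing so no extra page is requested.
        if len(unseen) >= wanted or page_number >= max_pages:
            break
    return unseen[:wanted]


remote_pages = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
already_stored = {9, 8, 7, 6}
fetched = collect_unseen(iter(remote_pages), already_stored, wanted=3)
```

In this example the first page is entirely known, so the walk continues until three unseen posts have been found.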

@tesaguri

tesaguri commented Sep 1, 2023

I'm really sorry for the duplication, but I had already begun working on my write-up, so… Here's an A-to-Z-version summary:

Pitch

Add a mechanism for backfilling the user timeline of a remote account (or in ActivityPub jargon, the outbox of a remote actor) on behalf of a local user.

The issue title suggests that the backfilling occur when a local user follows a remote account, but there are multiple suggestions as to when the backfilling should be done, as detailed in the Design questions section below.

Motivation

I tried to include the main points mentioned in the issue, but this section is not a comprehensive summary of the thread, nor do I think it needs to be: the Collaborators are already in favor of the feature (in particular, Gargron himself created this issue, and ClearlyClaire has kept giving helpful insights on design questions), so lack of motivation does not seem to be what is blocking the issue.

When you open the profile page of a remote account, sometimes you will see an empty timeline, saying that "Older posts from other servers are not displayed." (assuming you are using the Web app). But if you open the profile page in its origin server, you would find out that the account has made a lot of posts in fact. That is often the case when no users in the local server have followed the account and thus the server has not subscribed to the account's outbox. The behavior leads to the following problems:

  • Confusion and loss of social opportunity: the behaviour is not quite intuitive unless you are familiar with the mechanism of the Fediverse, possibly misleading people into concluding that the account is inactive. This may result in missed opportunities to follow the account or interact with its posts that would otherwise have occurred.
  • Inability to view non-public posts: the original profile page doesn't let you view followers-only posts even if you follow the remote account, whereas followers of the account on that server can view all the posts.
  • Low-grade UX: even for technically-knowledgeable users, opening a browser tab every time they want to view a user timeline is not quite convenient.

Also, if you want to interact with a post that you saw when scrolling through the remote profile page, you will need to put the post's URL in the search field of your client to open the post in the client. Again, the UX here is arguably not the most intuitive nor convenient.

This is a kind of problem that wouldn't arise in centralized platforms, and that fact might make users feel penalized for being on a small server, or being in a federated network in general. Making the UX here close to those platforms would benefit the promotion of decentralized networks.

Design questions

When/how many to backfill

If we are to backfill the timeline when the user followed a remote account, it's not clear how many posts we should fetch. Fetching all the posts is not practical when the account has an enormous number of posts (#34 (comment)). But "fetching last N posts" is pretty arbitrary, and it doesn't apply to accounts not followed by local users (#34 (comment)).

Another approach is to fetch last N posts when a local user saw or interacted with a post of the remote account (#34 (comment), #34 (comment)). It applies to accounts not followed by local users, but the number of the posts fetched is still arbitrary. Also, it has concerns about overloading the remote server (#34 (comment)).

Alternatively, we can let the user request backfilling on demand by displaying a button on the timeline view in place of the missing posts (#34 (comment)) or trigger backfilling when the user scrolled through the timeline (#34 (comment)).

"Gaps" problem and lack of outbox cursoring

If we are to backfill the timeline on demand, it's still not clear how exactly we should backfill the timeline, because some posts might have already been fetched manually via the search field so the timeline might have "gaps" between those posts (#34 (comment)).

An approach would be to keep track of the range of posts within which the server knows it has fetched/received every post (#34 (comment)), but there is no mechanism in ActivityPub to ensure that the server hasn't missed activities nor to fetch a certain range of activities (#34 (comment), w3c/activitypub#378 (comment)) (Editor's note: even if you follow the account, delivery of activities might fail for some reason). Also, it doesn't always work for follower-only posts or posts with any other novel visibility (circles-only, group-only etc.) (w3c/activitypub#378 (comment)).

In fact, ActivityPub's outboxes (or Activity Streams Collections in general) have the concept of "paging", but it only lets you scan from the first page or the last page, and doesn't let you request something like "posts between this post and that post" (cursor pagination). While many server implementations (including Mastodon) actually implement paging by appending parameters like max_id to the outbox URL, the format is not standardized and thus cannot be relied upon (#34 (comment)).
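
To illustrate the paging that is available, here is a sketch that walks an outbox using only the standard Activity Streams properties `first`, `next` and `orderedItems`, treating page URLs as opaque rather than constructing `max_id`-style parameters. The `fetch` callable, the `remote.example` URLs and the `max_pages` cap are assumptions for illustration; a real fetch would involve HTTP, content negotiation and possibly request signing.

```python
def iter_outbox_items(outbox_url, fetch, max_pages=5):
    """Yield items from an OrderedCollection by following `first`, then `next`.

    `fetch` is any callable that maps a URL to the parsed JSON document.
    """
    outbox = fetch(outbox_url)
    page_ref = outbox.get("first")
    for _ in range(max_pages):
        if page_ref is None:
            return
        # `first` and `next` may be a URL or an embedded page object.
        page = fetch(page_ref) if isinstance(page_ref, str) else page_ref
        yield from page.get("orderedItems", [])
        page_ref = page.get("next")


# In-memory stand-in for a remote server (all URLs are made up).
EXAMPLE_DOCS = {
    "https://remote.example/outbox": {
        "type": "OrderedCollection",
        "totalItems": 3,
        "first": "https://remote.example/outbox?page=1",
    },
    "https://remote.example/outbox?page=1": {
        "type": "OrderedCollectionPage",
        "orderedItems": ["post-c", "post-b"],
        "next": "https://remote.example/outbox?page=2",
    },
    "https://remote.example/outbox?page=2": {
        "type": "OrderedCollectionPage",
        "orderedItems": ["post-a"],
    },
}

all_items = list(iter_outbox_items("https://remote.example/outbox", EXAMPLE_DOCS.__getitem__))
```

The walk can only proceed page by page from one end, which is exactly the limitation described above: there is no standard way to jump to the posts between two known posts.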

Miscellaneous

Alternatives

Related features

@jakewinter1

“Connecting to the remote mastodon instance and loading content directly”

This is exactly what Mammoth does and it’s a really clean and seamless experience.

@tesaguri

tesaguri commented Sep 1, 2023

“Connecting to the remote mastodon instance and loading content directly”

This is exactly what Mammoth does and it’s a really clean and seamless experience.

Ah, a client feature to fetch the outbox is an important alternative indeed. I'm not sure that the original comment actually meant that, but maybe client-side fetch and server-side proxying (and possibly cross-domain auth) in combination should sufficiently represent "Connecting to the remote mastodon instance and loading content directly," I suppose.

I've updated the list to replace the item with that one.

@tesaguri

tesaguri commented Sep 2, 2023

I know the issue has already had enough discussion about the motivation for the feature, but I'd like to add a new one that has not been mentioned yet, concerning privacy.

While you can open the profile page on the original server to browse a remote user timeline, directly connecting to the original server means that your fingerprinting information like IP address is exposed to the server. And theoretically speaking, if you interact with the remote account using your account shortly afterward, the administrator of the server might be able to link the fingerprint with your account.

That might sound irrationally nervous, but with the upcoming participation of closed-sourced corporate implementations like Tumblr, Post News and Threads in the Fediverse, I expect the problem of tracking will become more serious.

And in connection with this motivation, I'm concerned about the use of authorized fetches in backfilling. While using the user's private key to fetch outboxes enables fetching non-public posts, it reveals the user's identity to the remote server. If the backfilling is to occur when the user follows an account, that's fine, since following an account essentially involves revealing one's identity to the followed account. But if the backfilling is to occur automatically when just scrolling through a user timeline, it is arguably not the user's intention to tell the remote server (possibly operated by the owner of the remote account) that it is the user who is scrolling through the remote account's user timeline right then.

So I think that the server actor's key should always be used to backfill timelines at least if the user who triggered it is not following the target account, and even if the user is following the account, there should be a setting to opt in/out of the use of the user's key, or at least some sort of warning message in the UI.

Also, we can use the key of a randomly selected follower of the account (similar to #7459) to "anonymize" the requester, but that's only a mitigation and doesn't help at all when you are the only follower of the account on the server (or if the account's followers are very few), and I don't expect it would work properly for remote servers with a novel visibility support like "group-only".
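
The key-selection rule proposed in this comment can be written down as a small decision function. This is a sketch of the proposal only; `Signer`, `choose_signer` and the opt-in flag are hypothetical names, and the random-follower variant is left out.

```python
from enum import Enum


class Signer(Enum):
    SERVER_ACTOR = "server_actor"
    REQUESTING_USER = "requesting_user"


def choose_signer(follows_target: bool, opted_in_to_user_key: bool) -> Signer:
    """Pick whose key signs a backfill request.

    The user's own key is used only when the user already follows the target
    account (so their identity is already known to it) and has opted in.
    """
    if follows_target and opted_in_to_user_key:
        return Signer.REQUESTING_USER
    return Signer.SERVER_ACTOR


browsing_stranger = choose_signer(follows_target=False, opted_in_to_user_key=True)
following_opted_in = choose_signer(follows_target=True, opted_in_to_user_key=True)
```

Defaulting to the server actor means that merely scrolling through a timeline never reveals which local user is looking.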

@brendanjones

What about follows/followers? Wouldn't the moment that statuses are backfilled be a good time to also fetch the follows/followers of the account?

Assuming of course it's not already done at the moment you first view the profile, as requested by #19880 and #20533.

As suggested in those issues, I think it better to always be able to see the full follows/followers list of an account, but if for some technical reason that can't or shouldn't be done, then grabbing the list when you follow an account seems like a good second-best solution.

@smiba
Contributor

smiba commented Nov 20, 2023

(Slightly off-topic, but in response to @brendanjones)

The struggle with this is that Mastodon doesn't like unresolved items (which is sensible), meaning that by resolving this list, it will also need to load and store information for all these profiles (like bio, profile picture, profile name). And as there is no other connection to these accounts, they will grow stale and show incorrect information (name doesn't update, profile picture doesn't update etc.)

There is no real way to solve this, other than to store and exchange massive amounts of information that are very unlikely ever to be seen by any users. (And this times 100,000, for every instance of Mastodon that exists.)

This is simply due to the federated nature of Mastodon and the zero-trust approach. You'd either introduce a massive amount of data to be exchanged or break the zero-trust model; both are very serious trade-offs with real drawbacks.

@trwnh trwnh added the activitypub Protocol-related changes, federation label Dec 31, 2023
@zakwilson

So it seems to me there's mostly consensus that this should be done, some debate over the details, and a lack of resources to actually work on it.

I know Rails. I like Mastodon. I want this to happen. What behavior would actually be accepted if implemented in an otherwise-satisfactory PR? My thinking is that a page of the ActivityPub outbox should get loaded, and if the user scrolls to the bottom, the next page should get loaded.

@brendanjones

Pinging @Gargron , there’s^^ someone offering to do the work. Are you able to define requirements/acceptance criteria?

@agambon

agambon commented Feb 24, 2024

As others have said this seems absolutely paramount to wider adoption. I just configured my own selfhosted Mastodon instance, and was stunned to see that this issue exists. Was more stunned to see that it's been flagged for 7 years, with no solve. Was even more stunned to see that someone offered to do the work 2 months ago, and no one is taking them up on it.

@smiba
Contributor

smiba commented Feb 24, 2024

It's an open source repo; I assume it will get accepted as a PR, but even if it doesn't, if it works fine I'd be more than happy to adopt it for my instance.

@zakwilson If you're up for making this still, I'm confident you'll make a lot of users happy!

@Sythelux

At this point I can only recommend that @zakwilson make a ready PR at the glitch-soc repo (https://glitch-soc.github.io/docs/ ) and talk with them about how to get a PR upstreamed here. This discussion has been going on for so long that there won't be any more movement back and forth.

@FediVideos

FediVideos commented Mar 18, 2024

At this point I can only recommend that @zakwilson make a ready PR at the glitch-soc repo (https://glitch-soc.github.io/docs/ ) and talk with them about how to get a PR upstreamed here. This discussion has been going on for so long that there won't be any more movement back and forth.

If Glitch introduced some kind of backfilling, I think a lot of instances would switch over to it. That would be a massive advantage over vanilla Mastodon.

@ShadowJonathan
Contributor

Changes from glitch-soc are often upstreamed to Mastodon proper; in many ways it's a develop branch of Mastodon where features get beta-tested in production.
