Backfill statuses from remote accounts when first subscribed #34
I'd like to take a crack at this, but am still trying to wrap my head around the world of services/workers and which things are triggered when. My guess here is that somewhere in the collection of remote follow services and workers we want to add a background worker task which pulls the feed of the remote account which was just followed, and import the last 5 or 10 or so items. The other thing I'm struggling to locate is any code which is effectively "go grab this account's atom feed and import some entries" ... it strikes me as possible that this doesn't exist, because all statuses are pushed onto the server and never (yet) pulled? Any pointers here are appreciated! |
@mjankowski The main roadblock to this is actually #1059, because if you went back and imported 5 statuses from an account that last posted 6 months ago, you would get 5 6mo-old posts on the top of the public timeline (and potentially home timelines as well) So this should probably stay a wontfix until something is done about IDs and their sorting. |
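To make the ordering concern concrete, here is a minimal sketch. It assumes, purely for illustration, that statuses get a sequential ID when the local server inserts them and that timelines are sorted by that ID; the `Status` class and `insert` helper are hypothetical and not taken from the Mastodon codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import count


@dataclass
class Status:
    id: int               # assigned by the local server at insert time (assumption)
    author: str
    created_at: datetime  # when the author originally posted it


_next_id = count(1)


def insert(author: str, created_at: datetime) -> Status:
    """Store a status locally, giving it the next sequential ID."""
    return Status(next(_next_id), author, created_at)


now = datetime(2017, 4, 1)
timeline = [
    insert("local_user", now - timedelta(hours=2)),
    insert("local_user", now - timedelta(hours=1)),
]
# Backfill a post that is six months old; it still receives the highest ID.
timeline.append(insert("remote_user", now - timedelta(days=180)))

by_id = sorted(timeline, key=lambda s: s.id, reverse=True)
by_time = sorted(timeline, key=lambda s: s.created_at, reverse=True)

print([s.author for s in by_id])    # the backfilled post comes first
print([s.author for s in by_time])  # the backfilled post comes last
```

Sorting by insertion ID puts the six-month-old post at the top, which is the behavior described above; sorting by the original timestamp does not.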
Thanks, I read that thread and looked at more code, and I agree that backfilling is blocked on improving the ordering approach to either get away from ID-sorting, or make ID-sorting reliable for backfilled records. I'll pause on this, and leave a comment over there. |
#3307 is about accurate follower counts, but in the comments I discuss some implementation ideas which could theoretically encompass this. |
Any updates on this ? |
It's now possible to implement this, but it's not clear how many items to fetch (and if you say "all", think about accounts with 100,000 statuses...) |
Mmh maybe it can be done by displaying a button where the missing toots are, to fetch the 5 or 10 next toots ? So it would be done manually by users little by little. |
What about just fetching blocks of 20 toots, triggered by scrollbar position? |
@deutrino Generally this seems good, but some toots might be already fetched manually by users and so there would be holes to fill here and there. |
@deutrino ok, I was thinking back on what you were suggesting, and it seems to be the right thing to do, considering that Mastodon already serves 20 toots at a time when looking at a profile. So everything can probably be done smoothly on the server side, with a field saying "I know I'm up-to-date up to this toot/date" and contacting the other instance if we are trying to get toots that are in the "probably not up-to-date" range. |
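A rough sketch of the "up-to-date up to this date" idea might look like the following. `BackfillState`, `complete_since`, and both helper functions are hypothetical names, and the sketch assumes fetches proceed contiguously from the newest post toward older ones, so that everything newer than the marker can be treated as locally known.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BackfillState:
    # Hypothetical per-account field: every post at or after this time is
    # believed to be stored locally. None means nothing is known yet.
    complete_since: Optional[datetime] = None


def needs_remote_fetch(state: BackfillState, oldest_requested: datetime) -> bool:
    """Return True if the requested page reaches into the unknown range."""
    if state.complete_since is None:
        return True
    return oldest_requested < state.complete_since


def record_fetch(state: BackfillState, oldest_fetched: datetime) -> None:
    """Extend the believed-complete range back to the oldest post just fetched."""
    if state.complete_since is None or oldest_fetched < state.complete_since:
        state.complete_since = oldest_fetched


state = BackfillState()
first_needed = needs_remote_fetch(state, datetime(2020, 1, 1))
record_fetch(state, datetime(2020, 1, 1))
newer_needed = needs_remote_fetch(state, datetime(2020, 6, 1))
older_needed = needs_remote_fetch(state, datetime(2019, 6, 1))
```

The marker only records what the server believes; a missed delivery inside the range would go unnoticed, so this is a heuristic rather than a guarantee.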
Maybe fixing this would also partly fix #6137, as looking for a deleted account toots would give hints to the server to remove the account from its local database ? |
Mmh, thinking again, maybe not. Sorry. They seemed closely related in my head but seems like I'm too tired today :/ |
any chance for this issue to get fixed or partially fixed by the next release (2.5) ? |
@kit-ty-kate with some logic similar to #7459 we could probably fetch private toots from a remote user, yeah.
The suggestion of having a “gap” that users could click sounds very nice, but it is a lot harder to implement. Indeed, the protocol mandates toots to be strictly ordered, but I don't think there is a mechanism to make sure you're not missing toots, nor to request a certain range of items, so efficiently filling a gap seems pretty hard. Add to that that some items may only be displayed to some authenticated users, and things can get quite complicated… |
I made an account on Github just to add my support for this addition and to comment on it. I'm a relatively new user on Mastodon, and I must say, it's incredibly confusing, annoying, and discouraging to necessarily have to go and find another account on a separate instance/domain just to see what toots they've made before. I understand that the point is to not have an owner of any instance to host a lot of data, but am I going to necessarily have to refigure out which instance the public toots of another user are every time I want to see them? I should also add that this is doubly annoying because some instances don't allow you to do certain things (like view certain toots) unless you have an account on there. What if I want to favorite or retoot a toot that isn't on my instance and also is not loaded on my instance? I don't want to have to create multiple accounts on Mastodon just because toots that I want to go through in someone's backlog and favorite aren't available on the particular instance my account is on. I specifically favorite toots to look back on and find later, so it really breaks a big way that I currently use social media. If backfilling was ever implemented, I feel there could be a couple options:
I really want to give this place a chance, because I'm really missing a social media that has a kind of community atmosphere to it, but this issue in specific really discourages me from staying, as I do go back through people's profiles often when first following them and also to determine whether I want to follow them or not (based on what they toot/retoot). Please at least consider the option. Thank you! |
Couldn't we just use the existing rate limit machinery instead of introducing a new system?
The concern about visibility is not unique to backfilling. It applies to manual fetch of posts by putting URL in the search field, for example, and so I guess the necessary precautions are already implemented.
Again, doesn't rate limiting of following suffice? Perhaps, we might want to reconsider the default limit value based on the increased cost, though (given that we're actually going to backfill when following an account instead of e.g. when the user explicitly requested to backfill (#34 (comment)) or when scrolling a user timeline (#34 (comment)) as suggested before). |
Well, the issue has gotten really long, and it has become harder and harder to grasp the points and blockers. We need a summary of existing discussions, I suppose? Maybe I can make a write-up if I find time. |
closest thing to that: #34 (comment) points:
the current plan i think is to have a button on profiles that lets you fetch more statuses on-demand, rather than having it be automatically done on follow, but there's issues with doing this consistently as mentioned above. the only truly consistent way to do it is to fetch the entire outbox every single time. the next best thing is to start at the beginning and keep loading new pages until you start seeing stuff that you haven't seen before. |
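The "keep loading pages until you see something you already have" approach could be sketched roughly as below. This is not Mastodon's implementation: `fetch_page`, the page tokens, and the `max_pages` cap are assumptions for illustration, and a real outbox page would be an ActivityStreams `OrderedCollectionPage` fetched over HTTP rather than a dictionary lookup.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A page is modelled as (item_ids, next_page_token); newest items come first.
Page = Tuple[List[str], Optional[str]]


def backfill_until_seen(
    fetch_page: Callable[[str], Page],
    first_page: str,
    known_ids: Set[str],
    max_pages: int = 5,
) -> List[str]:
    """Walk the outbox from the newest page and collect unseen item IDs.

    Stops at the first item that is already stored locally, when the pages
    run out, or after max_pages requests (a hypothetical cap meant to limit
    load on the remote server).
    """
    new_ids: List[str] = []
    token: Optional[str] = first_page
    pages = 0
    while token is not None and pages < max_pages:
        items, token = fetch_page(token)
        pages += 1
        for item_id in items:
            if item_id in known_ids:
                return new_ids
            new_ids.append(item_id)
    return new_ids


# Fake three-page outbox used only to exercise the function.
pages: Dict[str, Page] = {
    "p1": (["s9", "s8", "s7"], "p2"),
    "p2": (["s6", "s5", "s4"], "p3"),
    "p3": (["s3", "s2", "s1"], None),
}
result = backfill_until_seen(pages.__getitem__, "p1", {"s5"})
print(result)  # ['s9', 's8', 's7', 's6']
```

Note that the walk stops at `s5` even though `s4` and older are missing locally, which is exactly the consistency problem mentioned above: a single post fetched by hand can hide a gap behind it.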
I'm really sorry for the duplication, but I had already begun working on my write-up, so… Here's an A-to-Z-version summary:
Pitch
Add a mechanism for backfilling the user timeline of a remote account (or in ActivityPub jargon, the outbox of a remote actor) on behalf of a local user. The issue title suggests that the backfilling occur when a local user follows a remote account, but there are multiple suggestions as to when the backfilling should be done, as detailed in the Design questions section below.
Motivation
I tried to include the main points mentioned in the issue, but this section still isn't a comprehensive summary of the thread, nor did I think it needs to be, since the Collaborators are already in favor of the feature (in particular, it is Gargron himself who created the issue, and ClearlyClaire has kept giving helpful insights on design questions), so it doesn't seem to be a lack of motivation that is blocking the issue.
When you open the profile page of a remote account, sometimes you will see an empty timeline, saying that "Older posts from other servers are not displayed." (assuming you are using the Web app). But if you open the profile page on its origin server, you find that the account has in fact made a lot of posts. That is often the case when no users on the local server have followed the account and thus the server has not subscribed to the account's outbox. The behavior leads to the following problems:
Also, if you want to interact with a post that you saw when scrolling through the remote profile page, you will need to put the post's URL in the search field of your client to open the post in the client. Again, the UX here is arguably neither the most intuitive nor the most convenient. This is a kind of problem that wouldn't arise on centralized platforms, and that fact might make users feel penalized for being on a small server, or being in a federated network in general. Bringing the UX here closer to those platforms would benefit the promotion of decentralized networks.
Design questions
When/how many to backfill
If we are to backfill the timeline when the user follows a remote account, it's not clear how many posts we should fetch. Fetching all the posts is not practical when the account has an enormous number of posts (#34 (comment)). But "fetching last N posts" is pretty arbitrary, and it doesn't apply to accounts not followed by local users (#34 (comment)). Another approach is to fetch the last N posts when a local user sees or interacts with a post of the remote account (#34 (comment), #34 (comment)). It applies to accounts not followed by local users, but the number of posts fetched is still arbitrary. Also, it raises concerns about overloading the remote server (#34 (comment)). Alternatively, we can let the user request backfilling on demand by displaying a button on the timeline view in place of the missing posts (#34 (comment)) or trigger backfilling when the user scrolls through the timeline (#34 (comment)).
"Gaps" problem and lack of outbox cursoring
If we are to backfill the timeline on demand, it's still not clear how exactly we should backfill the timeline, because some posts might have already been fetched manually via the search field, so the timeline might have "gaps" between those posts (#34 (comment)). 
An approach would be to keep track of the range of posts within which the server knows it has fetched/received every post (#34 (comment)), but there is no mechanism in ActivityPub to ensure that the server hasn't missed activities nor to fetch a certain range of activities (#34 (comment), w3c/activitypub#378 (comment)) (Editor's note: even if you follow the account, delivery of activities might fail for some reason). Also, it doesn't always work for follower-only posts or posts with any other novel visibility (circles-only, group-only etc.) (w3c/activitypub#378 (comment)). In fact, ActivityPub's outboxes (or Activity Stream's Miscellaneous
Alternatives
Related features |
“Connecting to the remote mastodon instance and loading content directly” This is exactly what Mammoth does and it’s a really clean and seamless experience. |
Ah, a client feature to fetch the outbox is an important alternative indeed. I'm not sure that the original comment actually meant that, but maybe client-side fetch and server-side proxying (and possibly cross-domain auth) in combination should sufficiently represent "Connecting to the remote mastodon instance and loading content directly," I suppose. I've updated the list to replace the item with that one. |
I know the issue has already had enough discussion about the motivation for the feature, but I'd like to add a new one that has not been mentioned yet, concerning privacy. While you can open the profile page on the original server to browse a remote user timeline, directly connecting to the original server means that your fingerprinting information like IP address is exposed to the server. And theoretically speaking, if you interact with the remote account using your account shortly afterward, the administrator of the server might be able to link the fingerprint with your account. That might sound irrationally nervous, but with the upcoming participation of closed-source corporate implementations like Tumblr, Post News and Threads in the Fediverse, I expect the problem of tracking will become more serious. And in connection with this motivation, I'm concerned about the use of authorized fetches in backfilling. While using the user's private key to fetch outboxes enables fetching non-public posts, it reveals the user's identity to the remote server. If the backfilling is to occur when the user follows an account, that's fine, since following an account essentially involves revealing one's identity to the followed account. But if the backfilling is to occur automatically when just scrolling through a user timeline, it is arguably not the user's intention to tell the remote server (possibly operated by the owner of the remote account) that it is the user who is scrolling through the remote account's user timeline right then. So I think that the server actor's key should always be used to backfill timelines at least if the user who triggered it is not following the target account, and even if the user is following the account, there should be a setting to opt in/out of the use of the user's key, or at least some sort of warning message in the UI. 
Also, we can use the key of a randomly selected follower of the account (similar to #7459) to "anonymize" the requester, but that's only a mitigation and doesn't help at all when you are the only follower of the account on the server (or if the account's followers are very few), and I don't expect it would work properly for remote servers with a novel visibility support like "group-only". |
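The key-selection policy proposed in this comment can be written down as a small decision function. This is only a sketch of the proposal as worded here: `Trigger`, `SigningKey`, `Viewer`, and the `allow_user_key` opt-in setting are all hypothetical names, not existing Mastodon settings or APIs, and the random-follower variant is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    FOLLOW = "follow"  # backfill started because the user followed the account
    SCROLL = "scroll"  # backfill started by scrolling through the profile


class SigningKey(Enum):
    SERVER_ACTOR = "server_actor"
    USER = "user"


@dataclass
class Viewer:
    follows_target: bool
    allow_user_key: bool = False  # hypothetical per-user opt-in setting


def choose_signing_key(viewer: Viewer, trigger: Trigger) -> SigningKey:
    """Pick which key signs the outbox fetch, per the proposal above."""
    if trigger is Trigger.FOLLOW:
        # Following already reveals the user's identity to the remote account.
        return SigningKey.USER
    if not viewer.follows_target:
        # Never reveal who is browsing an account they do not follow.
        return SigningKey.SERVER_ACTOR
    # A follower who is merely scrolling: use their key only if they opted in.
    return SigningKey.USER if viewer.allow_user_key else SigningKey.SERVER_ACTOR


print(choose_signing_key(Viewer(follows_target=False), Trigger.SCROLL))
print(choose_signing_key(Viewer(follows_target=True, allow_user_key=True), Trigger.SCROLL))
```

Defaulting the opt-in to off keeps the privacy-preserving behavior unless the user explicitly chooses otherwise, at the cost of not backfilling follower-only posts by default.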
What about follows/followers? Wouldn't the moment that statuses are backfilled be a good time to also fetch the follows/followers of the account? Assuming of course it's not already done at the moment you first view the profile, as requested by #19880 and #20533. As suggested in those issues, I think it better to always be able to see the full follows/followers list of an account, but if for some technical reason that can't or shouldn't be done, then grabbing the list when you follow an account seems like a good second-best solution. |
(Slightly off-topic, but in response to @brendanjones) The struggle with this is that Mastodon doesn't like unresolved items (which is sensible), meaning that by resolving this list, it will also need to load and store information for all these profiles (like bio, profile picture, profile name). And as there is no other connection to these accounts, they will grow stale and show incorrect information (name doesn't update, profile picture doesn't update etc.) There is no real way to solve this, other than to store and exchange massive amounts of information that are very unlikely to ever be seen by any users. (And this times 100,000, for every instance of Mastodon that exists) This is simply due to the federated nature of Mastodon, and the zero-trust approach. You'd either introduce a massive amount of data to be exchanged or break the zero-trust; both are very serious considerations with drawbacks |
So it seems to me there's mostly consensus that this should be done, some debate over the details, and a lack of resources to actually work on it. I know Rails. I like Mastodon. I want this to happen. What behavior would actually be accepted if implemented in an otherwise-satisfactory PR? My thinking is that a page of the ActivityPub outbox should get loaded, and if the user scrolls to the bottom, the next page should get loaded. |
Pinging @Gargron , there’s^^ someone offering to do the work. Are you able to define requirements/acceptance criteria? |
As others have said this seems absolutely paramount to wider adoption. I just configured my own selfhosted Mastodon instance, and was stunned to see that this issue exists. Was more stunned to see that it's been flagged for 7 years, with no solve. Was even more stunned to see that someone offered to do the work 2 months ago, and no one is taking them up on it. |
It's an open source repo, I assume it will get accepted as a PR, but even if it doesn't, if it works fine I'd be more than happy to adopt it for my instance. @zakwilson If you're still up for making this, I'm confident you'll make a lot of users happy! |
At this point I can only recommend that @zakwilson make a ready PR at the glitch-soc repo (https://glitch-soc.github.io/docs/ ) and talk with them about how to get an upstream PR here. This discussion has been going on for so long that there won't be any more movement back and forth. |
If Glitch introduced some kind of backfilling, I think a lot of instances would switch over to it. That would be a massive advantage over vanilla Mastodon. |
Changes from glitch-soc often get upstreamed to Mastodon proper; in many ways it's a