Support Post Migration #12423
Once more I fully support this and have been eagerly awaiting the much-requested feature. I would love to see the ability to import the data archives (favorites, boosts, and toots) exported from the https://instance.social/settings/export section, which is currently a one-way system only. Unfortunately there doesn't seem to be a plan for this yet, but I'm staying hopeful that will change eventually. I believe Eugen mentioned performance concerns in the previous discussion. |
Reiterating points from previous discussion:
|
Thanks @trwnh!
What's backfilling? |
backfilling is #34 -- fetching old statuses from a profile that were never delivered to the instance. you can't expect to deliver 100,000 activities every time you migrate, because that's wildly inefficient. |
Regarding performance and backfilling: This is why I believe that post migration is possible, but those who choose to use it will need to accept a long waiting period. There will probably have to be three types of limitations to make it viable:
Obviously, if you're importing something like 10.000 boosts + faves + toots, the process may take days if not weeks. It's a price the user will have to pay. But in this format it sounds like it should be performance viable at least. |
That makes sense! Personally I'm perfectly fine with them taking a long while to move over, so long as they move over eventually. I'm not quite sure what the limit should be, though – one post per 10 seconds seems awfully slow to me, unless you have like 100 accounts being migrated at once. Maybe set it relatively fast by default (like 1/s or maybe even a little higher), since most instances are small and won't need to deal with a lot of migrations, then larger instances can set it slower? |
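To put rough numbers on the pacing discussed in the last few comments, here is a minimal Python sketch of a per-instance import throttle. `ImportThrottle` and `interval_seconds` are hypothetical names made up for illustration; they are not Mastodon settings, and the sketch only computes a schedule rather than importing anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImportThrottle:
    """Hypothetical per-instance setting: seconds to wait between imported items."""
    interval_seconds: float = 1.0

    def schedule(self, item_count: int, start: float = 0.0) -> List[float]:
        """Return the offset in seconds at which each item would be imported."""
        return [start + i * self.interval_seconds for i in range(item_count)]

    def total_duration(self, item_count: int) -> float:
        """Seconds from the first import until the last of item_count items."""
        if item_count == 0:
            return 0.0
        return (item_count - 1) * self.interval_seconds


# Assumed example values: a small instance at 1 item/s, a large one at 1 item per 10 s.
small_instance = ImportThrottle(interval_seconds=1.0)
large_instance = ImportThrottle(interval_seconds=10.0)
hours_fast = small_instance.total_duration(10_000) / 3600
hours_slow = large_instance.total_duration(10_000) / 3600
```

Under these assumed rates, 10,000 items take a little under three hours at one item per second and a little over a day at one item per ten seconds, before counting media processing or several accounts migrating at once.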
Although I would really, really prefer if everything gets migrated, even if it takes weeks or even months. I don't want to lose my old posts – that's basically an archive of what I'm like, so new people can get to know me a little better before they decide whether to follow me. It's also history, and while some may not want to keep that sort of thing, others (like me) would. (Even if you technically have everything in the backup, that's incredibly cumbersome to look at, given I'm not aware of any software that can actually import it.) |
stuff can get imported, but it is absolutely a bad idea to redeliver it. i am firmly of the opinion that a Move with post import should maybe do a rewrite of existing statuses in the database, but nothing more. it is unwise to even consider backfilling until the actual decoupling and importing gets implemented. more to the point in general, though: it is a fundamentally bad idea to assume that every instance should have a copy of every single post at all times. we don't need to duplicate everything 6000x or more. what is fundamentally happening with a "migration" is that the authority is being shifted; the location is being shifted, that's all. it's a Bad Idea to treat them as new statuses. hopefully i've explained why |
Hm...well, would they be new statuses? I figured they would simply be reassigned, but then maybe redelivered as the originals, not duplicated and "reposted". Or maybe I'm completely misunderstanding how federation works. I know very little about the backend.
Is this actually being discussed? I'd say for a migration the situation is a bit different; only the destination server has to receive a copy of every single post, not everybody. |
yeah, and it should be limited to this. looking at prior art from zot, you can do an online migration (fetch old messages from your outbox and inbox to the new server) or an offline migration (using an exported account archive that contains at minimum your keys, your posts, and your address book) |
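As a rough illustration of the offline case, the "at minimum" contents of such an archive could be modelled like this in Python. The class and field names are assumptions made up for this sketch; they do not describe the zot archive format or Mastodon's export.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccountArchive:
    """Hypothetical offline-migration archive; all field names are illustrative."""
    private_key_pem: Optional[str] = None      # the account's keys
    posts: Optional[List[dict]] = None         # the account's own posts
    address_book: Optional[List[str]] = None   # followed / following addresses

    def missing_parts(self) -> List[str]:
        """List the minimum parts that are absent (None), in a fixed order."""
        missing = []
        if self.private_key_pem is None:
            missing.append("keys")
        if self.posts is None:
            missing.append("posts")
        if self.address_book is None:
            missing.append("address_book")
        return missing


# An archive with keys and an (empty) post list but no address book.
partial = AccountArchive(private_key_pem="-----BEGIN PRIVATE KEY-----...", posts=[])
```

An empty list counts as present here, so a brand-new account with no posts would still produce a complete archive; only a part that was never exported is reported as missing.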
That sounds lovely! I've been a little worried in the back of my mind about how to handle a server that just suddenly dies. |
Is there at least a first-party tool that users (not servers) can use to work their downloaded backup and generate something usable? Something like an HTML file (plus media?) that can be uploaded into a remote server. Would at least temporarily solve the problem of making the lost content accessible again. |
@lmachucab I made a NodeJS script that I used to port my faves and boosts to other accounts. Unfortunately it no longer worked last time I tried it, and it was a crude solution that put quite a bit of stress on the target server and even triggered an instance's flood protections once. If you think you might have some use for it still, it's still available on my Github: |
Hi there! Mastodon.cloud literally has a month to live. We're being shut down because the domain is being forced to bow to insane regulations, and the admin isn't having any of it. We had been the target of immense DDoS attacks for the past six months. |
I'd like to second that Mastodon.cloud disappearing shows this is a crucial feature. Mastodon is a great piece of software, and designing it to be decentralized was a great idea. Making it possible for anyone to set up an instance, yet having instances be able to communicate with one another, makes the network resilient and eliminates single points of failure. But without the ability to jump from one instance to another at will, this decentralization has a large drawback. If an instance decides to call it quits, you're just going to lose your data. This also means that as a user, you can't be sure if the instance you picked is going to live a long or a short life. Given how many instances there are, it's totally reasonable to expect that some of them are going to disappear. Aside from users choosing to migrate somewhere else, there should also be a migration plan for instance admins. If an instance shuts down, it shouldn't just be that active users who care about their account get to keep their data, because that means even if user migration is possible there's still going to be a big loss of online culture and data. I think this should probably be a separate issue and come with a host of problems by itself, but without it I think we'll see Mastodon content regularly disappearing forever as instances come and go. |
Yeah even I didn't expect this sad news. mastodon.cloud banned me randomly and without explanation after a few months of using it... at the time I was upset, now I'm glad I escaped this event. It's sad to hear even Japan is becoming a dangerous place for internet freedom and has a government failing to understand technology and the flow of information. I know mastodon.cloud was one of the largest instances out there. If it can go down, the fediverse as a whole is under serious threat of significant data loss. That's why I would further implore that we please support FULL account migration now... even if as an option disabled by default which the instance admin can enable if they so wish, in case performance is such a big concern. An equally good idea would be a node-based approach to storage, similar in concept to IPFS / DAT: instead of an instance being stored on and served by one physical server, it can be stored in a decentralized cloud where anyone may run a node to serve a copy of the data. Sadly this would require such a rewrite that we'd be talking about a whole new project over the existing Mastodon. |
A reminder that we're supposed to "Own Your Own Data With Mastodon!" If we don't find a way to do that with toots, we are bound to be sued for false advertising. I smell a frivolous lawsuit that's bound to cripple the Fediverse and we need to get ahead of it. While I may not ideologically agree with Gab, which accounts for 25% of the Fediverse's mass, they have every right to be here as we do, and we need to prepare for pointed media smear campaigns that could potentially harm our reputation. We need to emphasize our content controls, and remind users that they can ban who they want from their own console, and that moderators do not have to do it for them. We have that advantage over Twitter and we need to advertise th out of it. |
Mastodon's own code is under the AGPL, right? That's the situation you're talking about, right? I'll agree the AGPL would have no effect on media outlets saying whatever charged things about mastodon they want to, though. |
That is an interesting point. I think they forked Mastodon, though (as of 2.8.5) and created Gab out of it. Ironically, it seems they were able to preserve their "toots" during the migration. My comment revolved around the idea that you can "Own Your Own Data With Mastodon". If this is the case, we must ensure that people have the capability to export their toots otherwise that slogan could be misconstrued as false advertising. We must have a provision that allows users to migrate to another server if the one they're on is being shut down through no fault of its own (regulatory crackdowns in the case of Mastodon.cloud). What happened in Minneapolis is a distraction. The attacks against the Fediverse and all other entities like it will intensify. We must fortify our infrastructure and address any outstanding issues so that we're ready for any exodus. |
Even a static representation of said toots is fine with me, similar to Twitter's archive, which you can navigate offline. They don't need to be integrated into the post chronology. Perhaps we could create a designation for such toots, and also prohibit their editing or removal to ensure a smooth migration. |
there's an issue for that #9461 (export including some bare html-version of your toots) With regard to the European GDPR, there's the rule that you must have access to a machine-readable version of your data. This is satisfied already. While this satisfies the letter of this particular law, I do wish it was easier for everyone to use that data (hence the ticket above). I also think re-importing would be nice. Yet, I do not believe there are any grounds to your "false advertising" point. Sorry. (edit: I linked to the wrong issue before, this has been corrected.) |
Thank you for clarifying that. I'll follow that issue too. ^_^ |
I strongly support this. I have my own independent instance that I don't want to maintain anymore so I moved over to another instance. DigitalOcean doesn't let you download complete images as a backup so I can't wait to upgrade to a later version that supports it and transfer the old content over. |
Honestly, it's remarkable that Mastodon does not have this feature in 2020. Without the ability to export and import post history, not only can accounts be lost but also identities, as even if an account is migrated to another server, at the moment this is a partial "in name only" process. Any content that the old account had posted is still bound to the lifetime of the old service (and its FQDN). After having tested some scripts for extracting the information from the exported data dumps, I think there's enough of those scripts and projects wandering about that it's perfectly feasible to grab one (1), integrate it into Mastodon, and offer it as a utility that, e.g., generates full sessions / scripts for clients like |
@lmachucab do the existing scripts deal with not treating the imported posts as new posts that would be picked up for federation? From what I'm reading above, that's what should be avoided to reduce load on servers. I would be happy if my posts were only viewable from my account page/feed rather than immediately (or ever) federated (i.e., they would only be federated if someone was looking at my account). All I want to be able to do is change my username or move to another instance if needed. |
I thought I'd bring this up: One of the SPC admins revived bofa.lol, a previously discontinued instance, only for it to regurgitate its old content over TWKN, some of which was excessively vile. bofa.lol aside, I'd like to focus on the mechanism that caused those toots to refederate and see if we can harness that mechanism to archive them for other instances. As of this writing, bofa.lol was taken down less than a day after it was briefly revived. *Edited for context and some spelling mistakes at 21:50-0500 |
2021* |
This is going to be incredibly important to ensuring Fediverse platforms don't centralize around the most popular instances. If I have the ability to completely move my account from one instance to another with minor hassle, chances are I'm going to be more willing to take risks on smaller instances. Some of my first accounts on Masto/Peertube were on small instances, but they ended up closing down on me unexpectedly. |
@Gargron, do you consider this to be a feature worth pursuing? if so, what is needed to make it happen? it's been open for nearly 2 years without any comments from mastodon's lead, so i thought i would ask. |
Apologies for the delay in sharing this here. A development/test version of the solution outlined here is now available as an external command-line tool, at https://mastodoncontentmover.github.io/ It's not as sophisticated a solution as many in this thread are looking for, but it provides basic functionality to move post content, including media attachments, for people who need to migrate between instances. It does not preserve threaded interactions, but it does preserve self-reply threads (involving threads where all posts are by the account owner themselves). It respects the Mastodon standard API rate limits with an additional margin, and it is possible for users to optionally slow the tool's activity even more. Media posts in particular are significantly throttled because media processing is resource intensive, and by default it suppresses public posts so that timelines are not flooded. It's possible to selectively save and/or repost only bookmarked posts. It runs on any platform that has a Java runtime environment (version 1.7 or above), and the source code is available on Github. It is only intended as a workaround; I built it because I needed to move my own content, but was mindful throughout that this functionality was also needed by others — it makes sense to share it, but because of how it came to be and because of time constraints it is a little rough around the edges. Nonetheless, I hope it helps some of those facing the loss of their content due to the absence of this functionality from Mastodon itself, and ultimately I hope it might prompt Mastodon to consider building at least this basic level of functionality into core (which would offer some advantages such as allowing admins more control over post importing and how that is prioritised on their instance, and possibly allowing posts to be correctly backdated as well). |
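The distinction drawn above between threaded interactions and self-reply threads can be sketched as follows. This is a minimal Python illustration, not the tool's actual code: the `id`, `account_id` and `in_reply_to_id` keys are assumed field names loosely modelled on what a post export might contain, and `self_thread_root` is a hypothetical helper.

```python
from typing import Dict, Optional


def self_thread_root(post_id: str, posts: Dict[str, dict], owner_id: str) -> Optional[str]:
    """Return the root post id if post_id sits in a thread written only by the owner.

    Walks the reply chain upwards. Returns None as soon as an ancestor belongs to
    someone else, is missing from the archive, or the chain loops back on itself.
    """
    seen = set()
    current = posts.get(post_id)
    while current is not None and current["id"] not in seen:
        if current["account_id"] != owner_id:
            return None
        seen.add(current["id"])
        parent_id = current["in_reply_to_id"]
        if parent_id is None:
            return current["id"]  # reached the top of a thread owned throughout
        current = posts.get(parent_id)
    return None


# Assumed sample archive: posts 1 and 2 form a self-reply thread,
# post 3 replies to someone else's post (id 99) that is not in the archive.
archive = {
    "1": {"id": "1", "account_id": "me", "in_reply_to_id": None},
    "2": {"id": "2", "account_id": "me", "in_reply_to_id": "1"},
    "3": {"id": "3", "account_id": "me", "in_reply_to_id": "99"},
}
```

With this sample, posts 1 and 2 resolve to root 1 and could be reposted as a thread, while post 3 has no recoverable parent and would have to be reposted as a standalone post or skipped.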
Really fantastic to see some progress toward a content migration solution! Thank you for sharing your work. |
I favor a very simple ‘forwarding address’ approach that leans on existing federation features as a form of archiving:
|
@MastodonContentMover this looks great! Will definitely help a lot of people solve this problem for themselves without requiring any major changes to mastodon's architecture. I'm also continuing to work on the solution I described in my earlier posts. I'm hoping it will provide a seamless solution to the Fediverse's issues with account and post migration in general. I've gotten started on it and have figured out a lot of the implementation details, but have been struggling to implement them due to my poor understanding of OAuth 2.0 and OIDC. I'm brushing up on that at the moment. For more regular updates or if you want to get involved, follow the discussion thread I've started here. |
I think doing a form of auto-boosting into the new instance might be the simplest way forward. Internally this could be a special boost that would then be "assigned" to the new account so that it shows up in archives. |
I've been reading through this and past issues to get a gist of what the technical problems here are. There's a lot of fluff above and in previous threads, so to save others the time of catching up here's a summary of what needs to be done.
If one is migrating from an instance about to bite the dust: that's all you need, really. This is pretty much implemented with MastodonContentMover above. It fucks up posting dates but that's not fixable without being an admin. If one is migrating from a live instance, there's more to think about. Ideally, if the original server is up & cooperating, it would be nice for posts & conversations to function just as they did on the original post: i.e. there needs to be a way to say "this toot is now owned by a different instance, send your requests there". So if someone comes across a pre-migration post, they can see all replies to it and can like/boost/reply to it. To the user, this would probably look like:
Paging @trwnh as they seem to deeply understand the issues here: Is this an accurate summary? Did I miss anything? I'm unable to find a tracking issue for "use GUIDs", so I suppose that's just this one. |
Datapoint: Mastodon's federation inboxes aren't "smart", and shared inboxes are a thing. Likes are sent as ActivityPub objects, and those are just sent to inboxes. Mastodon's inboxes just store the incoming data in a queue to be processed later. This way, one request doesn't hold up the line, and the federation worker is immediately ready to receive the next one. This would make any "this post isn't ours anymore" response have to happen asynchronously, or the "data owner" of a post needs to be changed, and the URIs rewritten when that happens, though that has other problems (such as ID types not being consistent between servers, needing every ID to get re-fetched to re-map them, resulting in a multi-MB blob of remapping, possibly.) |
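A minimal Python sketch of the queue-then-process pattern described above, showing why a "this post isn't ours anymore" answer can only be produced later by the worker and not in the HTTP response. The `Inbox` class, the `moved_posts` mapping and the shape of the outgoing notice are assumptions for illustration; they are not Mastodon internals or an ActivityPub activity type.

```python
from collections import deque


class Inbox:
    """Toy inbox: accepts activities immediately, processes them afterwards."""

    def __init__(self, moved_posts):
        self.queue = deque()
        self.moved_posts = moved_posts  # hypothetical map: old post URI -> new post URI
        self.outgoing = []              # notices a worker would deliver later

    def receive(self, activity):
        """Request handler: only enqueue, then answer 202 Accepted right away."""
        self.queue.append(activity)
        return 202

    def process_next(self):
        """Worker step: handle one queued activity. Returns False if the queue is empty."""
        if not self.queue:
            return False
        activity = self.queue.popleft()
        target = activity["object"]
        if target in self.moved_posts:
            # The sender already got its 202, so the notice has to go out separately.
            self.outgoing.append({
                "to": activity["actor"],
                "moved_from": target,
                "moved_to": self.moved_posts[target],
            })
        return True


inbox = Inbox({"https://old.example/posts/1": "https://new.example/posts/1"})
status = inbox.receive({
    "type": "Like",
    "actor": "https://third.example/users/bob",
    "object": "https://old.example/posts/1",
})
queued_before_processing = list(inbox.outgoing)  # still empty at this point
inbox.process_next()
```

The sender sees only the 202 at request time; the notice about the moved post appears in `outgoing` after the worker step, which is the asynchronous behaviour the comment refers to.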
@ShadowJonathan I think the "this post isn't ours anymore" response happening asynchronously should be fine? (if it isn't, can you elaborate?) With #10745 this only ever happens once per instance after all as the instance will then update its user id (my example pipeline is a bit wrong in this respect, I wrote it before understanding #10745). Also as I understand it |
Organizationally I would expect that if a user has migrated away from one (working) server to another, then the source server might not want to host or publish the content of the user's posts. With that in mind, the "this content isn't ours anymore" can become a very simple problem to solve by having a single, static redirect on the user level. "That user is over there now, you might be able to find this post there or you might not" is a perfectly valid answer IMHO. |
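The single user-level redirect suggested above might look like this. It is a Python sketch under assumptions: the `/users/<name>/...` path layout and the `moved_users` table are made up for illustration and are not taken from Mastodon's routing.

```python
def resolve(path, moved_users):
    """Return (status, location) for a request path on the old server.

    Any path under a moved user's prefix gets the same redirect to the new
    profile; the old server makes no promise that the specific post exists there.
    """
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "users" and parts[1] in moved_users:
        return 301, moved_users[parts[1]]
    return 200, None


# Hypothetical table kept by the old server: local username -> new profile URL.
moved = {"alice": "https://new.example/users/alice"}
```

For example, `resolve("/users/alice/statuses/42", moved)` yields a 301 to alice's new profile, while requests for users who have not moved are served as before.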
@omentic yeah mostly, the only thing i'd point out is that using GUIDs isn't enough because anyone can claim any id. so you need to bind it to some authority -- either an HTTP origin (DNS hostname) or some public key cryptography alternative (which is its own big breaking change, but would ultimately grant more portability). "identity" in general has to be rooted in one of the two. you'd basically need an indirection layer along the lines of something like #10745 would make it easier to get there but would not solve the issue entirely on its own. it just allows the username to change. you'd still need a way to allow the authority to change (or otherwise defer to some other authority). |
@trwnh could you elaborate on this? What's wrong with having a Really Big GUID namespace and assigning ids randomly? If I understand things, then right now definitionally that authority is |
the problem is bad actors that intentionally use a conflicting id. it doesn't matter how big the namespace is. you need some authority, like a domain or otherwise signing the guid. right now the authority is the best we can do right now is you could mint identifiers against some stable authority, like a PURL service. basically, if changing something is a problem, you have two possible solutions:
|
@trwnh In FEP-ef61 the authority is a DID (so it's a breaking change), but I think the interoperability with existing software can be preserved if implementations will generate IDs as HTTPS URLs containing a DID URL instead of just DID URLs (this idea is discussed in the "Compatibility" section of the proposal). Similar to how IPFS objects can be referenced either by |
@trwnh I still don't see the problem. Where do bad actors come into play? You have to trust the migrated-from instance to correctly point at your new profile, and subsequently point other instances to update their internal id mapping. Another instance's GUID is only relied upon by the migrated-to instance when importing posts, so that they have the same GUID (actually, going forth they wouldn't even need to use the old authority In general I think that fully decoupling identity from instances requires a significant amount of additional complexity and could be done later anyway, so if it isn't strictly necessary for post migration I'd like to avoid thinking about it. |
(you might have to kind of hand-hold me through an explanation of why bad actors are a problem here - i only started reading through these issues last night, and have no experience running an instance or dealing with federation) |
Once a post's UID becomes known, any bad-acting instance admin can claim that post for their own just by adding a post with that ID to their server. At best, there is a duplicate ID now floating out there in the fediverse. At worst, the forged post is now considered to be the real post. I believe the commenters above are correct; there has to be some tether to an authority to validate a post identifier. |
OK, so here's an example:
The problem as it relates to migration of posts is that |
Ah, I see now. I don't think this needs to be a problem. Posts currently have an internal UUID: there are duplicate ids between instances floating about, and it doesn't matter because remote posts are not resolved by relying entirely on the UUID. What I thought was being proposed with the GUIDs (and what I think should be proposed) is that the "global" aspect of the GUID only becomes relevant upon post import. Things can function as they currently function otherwise: despite ostensibly being a GUID were all actors trusted, the GUID is treated exactly how internal post identifiers are currently treated - as a UUID - with the sole exception being during post import, where you trust both parties. So this would mean that upon a move to a new username/domain the old |
(@trwnh if you've got the time, i'm curious if my thinking here is accurate? if so i might move ahead with a GUID proposal) |
@omentic How can Instance A, the moved-from instance, communicate the migration of a user to all other relevant instances in the network? At the end of the day, all relevant instances need to change the information about the user's origin in their databases. You can't just assign a GUID, because in fedi, there is no single Authority. Nothing ensures a GUID's uniqueness, nor can you resolve it uniquely to an origin server or a public key. So, sure, using GUIDs would mean that all TRUSTED servers wouldn't have to worry about id collisions between posts. However, any malicious server can claim: "I generated this GUID for this post. It has the content: 'i am a moron, signed, Donald Duck'" Even though Mr. Duck has never posted something like that. The mechanism we use to keep ids globally unique right now is the Domain Name System. The AUTHORITY who can delete a post, receives replies for the post, can edit the post, etc, is baked right in. A "full move" of a post is essentially a transfer of authority, and given that ids are tied to the hostname, IDS MUST CHANGE in order for the move to take place. The only way around this would be cryptography. This is a massive breaking change with its own problems regarding key management and key ownership, as trwnh said. But with cryptography, you could theoretically spin up a new instance, dig out your private key, and sign a thousand messages saying 'i'm here now'. And then all existing fedi servers would receive the proof, and add a redirection entry into their database, or change the ids of existing objects, though that would be a greater risk to data integrity. |
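To make the "authority is baked into the id" point concrete, here is a minimal Python sketch of the kind of origin check a receiving server can do when ids are HTTPS URLs, and cannot do for a bare GUID. `same_authority` is a hypothetical helper and the URLs are made-up examples; real implementations also rely on request signatures and fetching from the origin, which this sketch leaves out.

```python
from urllib.parse import urlsplit


def same_authority(object_id: str, actor_id: str) -> bool:
    """True only if both ids are HTTPS URLs on the same host and port."""
    obj, actor = urlsplit(object_id), urlsplit(actor_id)
    if obj.scheme != "https" or actor.scheme != "https":
        return False  # e.g. a bare "urn:uuid:..." GUID names no host to check against
    if obj.hostname is None or actor.hostname is None:
        return False
    return (obj.hostname, obj.port) == (actor.hostname, actor.port)


# A post served by the same host as its author passes.
genuine = same_authority(
    "https://duckburg.example/users/donald/statuses/1",
    "https://duckburg.example/users/donald",
)
# A post whose id names another server's host is rejected.
forged = same_authority(
    "https://duckburg.example/users/donald/statuses/1",
    "https://evil.example/users/mallory",
)
# A bare GUID gives the receiver nothing to compare.
bare_guid = same_authority(
    "urn:uuid:123e4567-e89b-12d3-a456-426614174000",
    "https://evil.example/users/mallory",
)
```

This is also why a move is a transfer of authority in this model: once the post lives on a different host, either its id changes or something other than the hostname (such as a key) has to vouch for it.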
My guess would be: Solution 1:
Solution 2:
|
Solution 2 also doesn't work as a proof to remote instances that are newer than your move - they have never seen your uncompromised actor jsonld |
Your solution 1 seems like the least complicated system of migration, if I am understanding correctly - the 301s create authority over the post, with the notification of the change federating in a similar way to edit notifications. That way any further actions taken on the post originating from the new instance will be trusted. |
@kartonrad I think you're missing what I'm saying. I am saying that
this is not true, and I am also saying that
this is not relevant, and that
this is false. (the transfer of authority part is true. the ids needing to change and the only way around this being cryptography is false.) |
I keep getting notifications here. Stop yapping and write the code that will make this happen. Update: This is not to the core team (who I hope will some day get to it), this is to everybody who's commenting here. This is open source and it thrives off contributions. The amount of energy wasted here is incredible. |
@kartonrad Nomadic identity already works in Hubzilla and Streams, both are Fediverse projects. They use different protocols alongside ActivityPub, but a similar solution can be implemented in pure ActivityPub too. That solution is described in proposal FEP-ef61: https://codeberg.org/fediverse/fep/src/branch/main/fep/ef61/fep-ef61.md It introduces a new kind of object ID where authority is indicated by a cryptographic key instead of a domain name. |
@silverpill But... is there really any project that is fully ActivityPub compliant? Mastodon could, of course, move towards this system, theoretically. But, as stated in the document you linked:
I guess i did say "fedi" not "AP" |
Unsubscribed. IMO this should be a discussion. Good luck! |
#177 – Support Account Migration – was closed after implementing follower migration, but this is only one small part of a true migration. To really be able to change instances, you need to be able to take your posts with you. There was some good discussion on this over there; I'm opening a new issue to make it clear that this is a separate concern from that issue, which seemed to evolve into only being about followers.
Personally, I couldn't care less about migrating my followers/following list. I can refollow people and have them refollow me. It'll all work out. My posts, however, are currently impossible to restore.