MSC2783: Homeserver Migration Data Format #2783

Draft
wants to merge 6 commits into
base: old_master

ShadowJonathan
Contributor

@ShadowJonathan ShadowJonathan commented Sep 19, 2020

Rendered

Related to #2760

Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>

@ShadowJonathan ShadowJonathan changed the title [WIP] Homeserver Migration Data Format [WIP] MSC2783: Homeserver Migration Data Format Sep 19, 2020
@ShadowJonathan ShadowJonathan marked this pull request as draft September 19, 2020 13:42
@ara4n
Member

ara4n commented Sep 19, 2020

ooh, interesting - thanks for starting this. heads up though that the chances are high that the core team ends up sinking time into decentralised accounts (#915, as unlocked by MSC #1228), which then solves the migration problem (at least on a per-user basis, but could obviously be automated for a serverwise migration), and which in turn is required for P2P, and also solves server scalability, vhosting, HA and geo-HA requirements...

@turt2live turt2live added kind:feature MSC for not-core and not-maintenance stuff proposal A matrix spec change proposal labels Sep 19, 2020
@ShadowJonathan
Contributor Author

ShadowJonathan commented Sep 19, 2020

@ara4n, I seriously want to emphasise that I see decentralized user accounts as a non-solution for the problem this MSC is trying to solve: backing up, easy sysadmin migration, and safekeeping of homeserver state, while also avoiding potential lock-in. This is about lateral movement on a software/admin level, not lateral movement on a user level.

This sentiment is outlined in the Values of the foundation, which I plan to adhere to, so I think it's important that this problem is solved before it ever comes up.

I understand the priorities the core team has, but I wanted to repeat my views on matrix-org/matrix-spec#246, as I saw this proposal being dismissed in favour of it, which I don't see as helpful.

(Note: I currently can't participate in #matrix-spec:matrix.org because of matrix-org/synapse#8340; this issue is still a WIP until then, and until I've gotten good feedback)

@ara4n
Member

ara4n commented Sep 20, 2020

Judging by the strength of your reaction I may be missing something :)

In an MSC1228 world: any server which hosts the private part of your user_key can participate in a room on behalf of that user. So if you were a server admin migrating from synapse to dendrite or whatever, you would copy all your user_keys over to the new server, and it would participate in the rooms as that user (and replicate over account_data somehow - probably by representing it as a room). The details are a bit fuzzy given they haven't been fleshed out yet, but it's the same mechanism by which a P2P user would log into two instances as the same user. To complete the migration, the old server would then be turned off. In other words: the migration is simplified to just copying keys, and then Matrix replicates the rest of the data over... via Matrix, obviating the need to specify and maintain a new interchange format in addition to core Matrix itself.

That said, there have been some concerns voiced about giving servers freestyle responsibility for the user_key - in an E2EE-by-default world you could argue the user should look after their user_key themselves and sign which servers are allowed to host their account. If this came to pass, then sysadmins would not be able to force-migrate their users (which would be a pain both for this use case, as well for vhosting & HA purposes), and an interchange format would be more important.

Surely you agree that if we can solve both use cases with one solution, we should do so to avoid the spec getting too sprawling?

@ShadowJonathan
Contributor Author

Judging by the strength of your reaction I may be missing something :)

Sorry for the strong wording, I simply wanted to make something clear up-front

Surely you agree that if we can solve both use cases with one solution, we should do so to avoid the spec getting too sprawling?

I do agree on the spec sprawling if this would be included, but I think this counters some other key problems: domain hot-potato, consistent backup format, atomic and reversible migration

I don't see these problems being solved if 2 servers exchange data via p2p upon the live migration, both servers would need different domains (though if the server part is a pubkey, it's essentially the same problem, because clients or links could be keyed to that server, which doesn't exist then anymore, unless some other spec abstracts away serverparts even more to provide an interface for selection and redirection), and a latent need for a backup format (instead of just backing up application data) still isn't fulfilled. There is also the possibility of "dead links" in the form of room aliases currently not being resolvable because the old server is down, and the new server living on an unknown subdomain.

I'm just saying this to counter some of your arguments. I agree that the spec should be reduced as much as possible, but I don't think a p2p framework overlaying this will fix most things.

Please correct me if I make assumptions that are just plain wrong, though. I have yet to completely drill down to the core of the p2p ideas presented from matrix.org over the years, so I might be missing a crucial piece that indeed makes this much more possible.

@ara4n
Member

ara4n commented Sep 21, 2020

I think part of the problem here is that we don't have a full proposal for decentralised accounts yet. It is not at all p2p specific though: the intention is to find a solution that lets both server admins and end-users pick a set of servers to host the users' accounts, rather than just being stuck with one server per account. The user's account would then be replicated between the servers via normal Matrix (nothing P2P-specific at all).

Concretely, one strawman approach (which I sketched out a few months ago but haven't put into a proper MSC yet, given MSC1228 is on the critical path first) would be:

  • A user's account can exist on any server which has that user's private key
  • Each user instance connecting from a different server is modelled as a separate room member
  • Room members with the same user_key are merged together in the CS API and shown as a single logical user.
  • (This conveniently kills off "device lists" as a concept from the spec, as each device in the room gets its own member)
  • Auth rules are extended to ensure that the same user on different servers can see the same events in all places.
  • The user's account data is replicated by the users by storing it as room state in a room specific to that user.
  • Users auth themselves by cryptographic proof, by proving they own the private half of the key whose private half is in account_data (as SSSS). In other words, we use today's recovery key/passphrase to login rather than having the separate confusing account password.

...at which point I think you have decentralised accounts, without any P2P voodoo beyond switching mxids to be pubkeys (MSC1228), using normal Matrix to replicate the data around. This could be used equally for migrating between servers on the same domain, for balancing users within servers on the same domain, for users migrating between domains, or indeed for balancing themselves between domains.
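As a rough illustration of the "merged together ... and shown as a single logical user" step in the strawman above, here is a minimal sketch. The `Member` type, its field names, and the example keys and server names are all hypothetical; nothing here is taken from MSC1228 or from any existing homeserver.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    user_key: str  # hypothetical: public key identifying the account
    server: str    # hypothetical: server this instance of the user connects from


def merge_members(members):
    """Group per-server room members into logical users keyed by user_key."""
    logical = defaultdict(list)
    for member in members:
        logical[member.user_key].append(member.server)
    # Sort so the result does not depend on the order members were seen in.
    return {key: sorted(servers) for key, servers in logical.items()}


members = [
    Member("key_alice", "synapse.example"),
    Member("key_alice", "dendrite.example"),
    Member("key_bob", "synapse.example"),
]
logical_users = merge_members(members)
```

In this sketch a client would render two logical users, one of which is currently hosted by two servers.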

Now, it'd be easy to write this all off as scifi, but we have a really urgent need for it for P2P to work, as well as to support synapse->dendrite migrations etc, where we are not planning to spend any time on speccing interchange formats, but instead charge off and try to get the nirvana of decentralised accounts working.

That said, that's just the current proposed direction for the core team, and very very happy to sanity-check it and consider alternatives :)

@ShadowJonathan
Contributor Author

ShadowJonathan commented Sep 23, 2020

Looking at that strawman approach, I think I would need to dive a little deeper before I could have an argument on that, and would also need some more details, but I see how that could indeed transparently migrate stuff (and effectively make servers temporary valets for users' account authority). I don't want to clutter this MSC with responses to that; is there any channel I can join to follow this development? I have some vague concerns, but I think I need some extra information before I could voice them correctly. (Particularly: administrative complexity, "what is happening" obscurity to the user, and how to communicate it simply, effectively, and correctly.)

@ShadowJonathan
Contributor Author

I'm going to try to spend time unlocking this issue, transitioning it from draft status to full status, and then i'll probably see to making PoCs for synapse to reliably (per major version) extract data from the database to the format described here.

@ShadowJonathan ShadowJonathan changed the title [WIP] MSC2783: Homeserver Migration Data Format MSC2783: Homeserver Migration Data Format Mar 14, 2021
@ShadowJonathan ShadowJonathan marked this pull request as ready for review March 14, 2021 14:05
@ShadowJonathan
Contributor Author

ShadowJonathan commented Mar 14, 2021

There are still a few problems with this, and I think I haven't yet enumerated everything, but I consider this solid enough for now.

I'll be trying to find time to make a PoC exporter/importer for Synapse 1.29.0 to this format, updating the format with my findings and challenges as I go along.

There are some points left where I'm explicitly asking for feedback with XXX; be sure to give a comment if you have one.

@ShadowJonathan ShadowJonathan marked this pull request as draft May 11, 2021 15:50
@ShadowJonathan
Contributor Author

ShadowJonathan commented May 11, 2021

I recognise this is a gargantuan effort, and that some parts of it aren't even clear if they should exist in the spec at all, so for now
I'm making this a draft while I work on this independently from the Matrix Spec.

I'll take the ideas in here and put them into a personal repository with which I'll experiment and research further efforts, I'll get a good tested PoC going for some major matrix homeservers, while also thinking about which parts of this should be applicable to the spec, if any at all.

For now, I'll say that I'm suspending this MSC until I've decided what should go into it, thanks for some of the feedback I've gotten that made me realise this, I'll take it into consideration.

Edit: I made a room for this, you can discuss it on #mhmf:matrix.org

@erkinalp

If this came to pass, then sysadmins would not be able to force-migrate their users

That is a data protection feature, as there is no in-band way of knowing that a server would not change operators during a migration.

## General Structure

The proposal mainly defines a directory structure. This directory structure can be captured in ZIP files,
RAR files, `.tar.gz` files, or any other sort of archival or indexable directory "target".

Nit: Just pick one. Choice is bad for interoperability. I would probably start with .tar.gz because gzip is good for text and tar+compression has cross-file compression which will probably be helpful here.

Contributor Author

All of the above formats can be interpreted as a file system directory structure, and I'm not going to mandate which one to pick, but to allow any sort of "input" to the importer

The point is that a user will move from host $X to host $Y and won't be able to upload their data because $X exported a zipfile and $Y only accepts tarballs. It is better for the ecosystem to define the best supported export format to avoid this issue. For technical users and servers that accept many common formats this may not be an issue but I really don't see much benefit in providing choice here.
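To make the trade-off in this thread concrete, here is a minimal sketch of an importer front end that treats either a ZIP or a (compressed) tar archive as the same directory listing. The function name `list_export_members` is hypothetical, and RAR is left out because Python's standard library cannot read it, which is arguably the interoperability cost being pointed out above.

```python
import io
import tarfile
import zipfile


def list_export_members(data: bytes) -> list[str]:
    """Return the file paths inside an export archive, whether ZIP or tar(.gz)."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            # Skip explicit directory entries, which end with "/".
            return sorted(n for n in zf.namelist() if not n.endswith("/"))
    buf.seek(0)
    # Mode "r:*" lets tarfile auto-detect gzip, bz2, or xz compression.
    with tarfile.open(fileobj=buf, mode="r:*") as tf:
        return sorted(m.name for m in tf.getmembers() if m.isfile())
```

Every additional container format needs its own branch like the two above, on every importer, which is the argument for mandating a single one.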

claim a specifier with this prefix (such as MSCs and custom implementations).

However, when processing a manifest, *all* items prefixed with `m.` MUST be processed or otherwise handled;
when an importer encounters an `m.`-prefixed item specifier it does not understand, it must abort the import process.

@kevincox kevincox May 13, 2021

This seems strong. What if an old deprecated feature is not supported? I think ideally the operator would be notified and given the choice.

I think the spec isn't the right place to decide that some imports MUST be aborted. In general for transfers like this "as much as possible with a report of failures" is what I want to see.

Contributor Author

"As much as possible" is lossy, I explicitly don't want a lossy import, the m.-prefixed keys mandate core data structures, ones that cannot be ignored, such as events, keys, login details, and other such things.

Any other key is best-effort, in that namespace you can add impl-specific config or data, and there it's free game, but not with core data.

But lossy is often better than nothing. I agree that it isn't the best option but who are we to decide. I would rather let the user decide what they need at that moment. I think we could mandate something like "MUST notify the operator that the import was not successful" but may allow the operator to accept the lossy import. Maybe only one room has an unknown event and I am fine just dropping that room.

To be honest it probably doesn't matter too much because the tools will just ignore this part of the spec but I would rather not require this in the first place.

Contributor Author

Core data is core data, if you lose out on core data, no matter what it is, your homeserver will still have scars, so no.

Lossiness will only apply to the non-m.-prefixed areas, because non-m. is "best effort only", but missing out a m. is unacceptable (you wouldn't want user accounts to be malformatted when imported, or entirely missing, would you?).
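The two positions in this thread can be sketched side by side. In the sketch below, `plan_import`, `ImportAborted`, `KNOWN_ITEMS`, and the `allow_lossy` flag are all hypothetical names; the default behaviour follows the proposal text (abort on an unknown `m.` item, best-effort for everything else), while `allow_lossy=True` models the reviewer's suggestion of letting the operator accept a lossy import.

```python
KNOWN_ITEMS = {"m.events"}  # hypothetical: item specifiers this importer supports


class ImportAborted(Exception):
    """Raised when a core (m.-prefixed) item cannot be handled."""


def plan_import(item_specifiers, known=KNOWN_ITEMS, allow_lossy=False):
    """Split manifest items into (to_process, skipped).

    Unknown `m.` items abort the import unless the operator opted into a
    lossy import; unknown items outside `m.` are always skipped (best effort).
    """
    to_process, skipped = [], []
    for spec in item_specifiers:
        if spec in known:
            to_process.append(spec)
        elif spec.startswith("m.") and not allow_lossy:
            raise ImportAborted(f"unsupported core item: {spec}")
        else:
            skipped.append(spec)
    return to_process, skipped
```

Either way the skipped list is returned, so a tool can report to the operator exactly what was not imported.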


It owns the directory `m.events/`, and files `events.*.cbor`.

The files contain CBOR-encoded mappings of room ID -> array of events.

The spec is all JSON except for #3079 which isn't accepted yet. Should we just stick with JSON for now to ensure that there are no ambiguities?

Contributor Author

JSON is way too sparse of a data structure, costly without reason, CBOR is a direct alternative without compromises.

Though, I'm probably going to change this to SQLite databases because CBOR files of this type don't have iterative parsers/consumers for every language out there, and I want memory usage to be low, and not that a 1 GB CBOR file causes a 2 GB memory spike on top of it.

The compromise is that the spec is defined in JSON and any incompatibility can be a source of issues.

I think CBOR is fine but it makes sense to consider the alternatives. I am curious if you have a size comparison between compressed JSON and CBOR. I suspect there isn't that much of a difference in size. (Parse time is maybe more of a concern; however, I suspect that disk IO is a bigger bottleneck for most cases.)

I am hesitant about requiring an implementation as the format. What about the simple alternative of newline-separated JSON, which can be parsed with low memory usage?
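A minimal sketch of the newline-separated JSON alternative, assuming one record per line with hypothetical `room_id` and `event` fields (the proposal itself defines no such layout). Only one line is parsed at a time, so memory use is bounded by the largest single event rather than by the file size.

```python
import io
import json


def iter_events(lines):
    """Yield (room_id, event) pairs one line at a time from a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        yield record["room_id"], record["event"]


# Stand-in for an open export file; any iterable of text lines works.
sample = io.StringIO(
    '{"room_id": "!a:example.org", "event": {"type": "m.room.message"}}\n'
    '{"room_id": "!a:example.org", "event": {"type": "m.room.member"}}\n'
)
counts = {}
for room_id, event in iter_events(sample):
    counts[room_id] = counts.get(room_id, 0) + 1
```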

Member

Directly mapped JSON to CBOR has minimal size advantages. For a 352 byte standard event:

{"type":"m.room.member","sender":"@username:matrix.org","content":{"avatar_url":"mxc://matrix.org/FLOIFFxSxJVrtqHqKJAvaZGNRJ","membership":"join","displayname":"Username"},"event_id":"$15943751151sWhiO:matrix.org","unsigned":{"age":14,"replaces_state":"$15943751120CQQql:matrix.org"},"state_key":"@username:matrix.org","origin_server_ts":1594375115332}
  • As direct CBOR: 308 bytes
  • As gzipped JSON: 259 bytes
  • As MSC3079 CBOR: 205 bytes
  • As gzipped MSC3079 CBOR: 178 bytes

Contributor Author

just for curiosity's sake, what about gzipped CBOR?

Member

254 bytes.
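For anyone wanting to repeat the JSON half of this comparison, a minimal sketch using only the standard library follows. The exact byte counts depend on key order, whitespace, and gzip settings, so they are not guaranteed to match the figures quoted above; the CBOR variants would need a third-party encoder and are not shown.

```python
import gzip
import json

# The sample event from the comment above.
event = {
    "type": "m.room.member",
    "sender": "@username:matrix.org",
    "content": {
        "avatar_url": "mxc://matrix.org/FLOIFFxSxJVrtqHqKJAvaZGNRJ",
        "membership": "join",
        "displayname": "Username",
    },
    "event_id": "$15943751151sWhiO:matrix.org",
    "unsigned": {"age": 14, "replaces_state": "$15943751120CQQql:matrix.org"},
    "state_key": "@username:matrix.org",
    "origin_server_ts": 1594375115332,
}

# Compact separators avoid counting optional whitespace.
raw = json.dumps(event, separators=(",", ":")).encode("utf-8")
packed = gzip.compress(raw)
print(f"JSON: {len(raw)} bytes, gzipped JSON: {len(packed)} bytes")
```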


The files contain CBOR-encoded mappings of user ID -> user details.

A user ID key mapping MUST only exist *once* across all files.

Why? A server can have a lot of user info and it may be useful to split it into multiple files to enable parallel dumping/loading. For example imagine a bot user that is in many thousands of rooms. This also seems inconsistent with the m.rooms export.

Contributor Author

@ShadowJonathan ShadowJonathan May 13, 2021

User details contain information about account data and keys and such, user membership is in the state events given with the regular "event data", and a server can/should reconstruct membership based on that data, like the old server can.

If that data is over one gigabyte, then that's on the user and the admin partially, I don't expect to be handling a gigabyte of account data, but even if that's the case, I don't wanna prescribe hell to the migration tools by splitting and merging keys across multiple files.

I would rather prescribe "hell" than have import fail. For a lot of lower-end devices slow is acceptable.

You are right that rooms are not in this file, but I see that room_tags are, and those can still be unbounded in size. I guess a future change could define an alternate format for those that is size-bounded?
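The "MUST only exist once across all files" rule is cheap for an importer to verify before it commits anything. A minimal sketch, assuming each file has already been decoded into a plain mapping of user ID to details; the function name and the file names in the example are hypothetical.

```python
def find_duplicate_user_ids(files):
    """Return {user_id: [file names]} for every user ID found in more than one file.

    `files` maps a file name to its decoded {user_id: details} mapping.
    """
    seen = {}
    for name, mapping in files.items():
        for user_id in mapping:
            seen.setdefault(user_id, []).append(name)
    return {uid: names for uid, names in seen.items() if len(names) > 1}


duplicates = find_duplicate_user_ids({
    "users.0": {"@alice:example.org": {}, "@bob:example.org": {}},
    "users.1": {"@bob:example.org": {}},
})
```

Allowing a user to be split across files, as the reviewer suggests, would turn this check into a merge step instead.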


XXX: Need expertise for this, I don't know how much or what specifically i should or could capture here.

## Potential issues

Import object size.

  1. Large files may be hard for servers to manage. Ideally we would allow non-streaming parsing to be viable for ease of client implementations. Maybe we need to limit files to be <16MiB or something? (Note that the in-memory representation of CBOR/JSON can be multiple times larger than the serialized representation).
  2. Large files that must be parsed sequentially may prevent parallelism and cause unacceptably slow imports. For example, large files that are a single JSON or CBOR object cannot be parsed in parallel. Maybe we should consider a format such as JSON objects on separate lines, which can be parsed in parallel. (That being said, tar archives can't fully be parsed in parallel anyway, but they can be streamed and the import can be parallelized.)
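One way an exporter could honour a per-file size cap like the one floated in point 1 is to roll over to a new batch whenever the next record would exceed the limit. This is a sketch under assumptions: `batch_by_size` is a hypothetical helper, and sizes are measured on compact JSON Lines output rather than on CBOR.

```python
import json


def batch_by_size(events, max_bytes):
    """Group events into batches whose serialized size stays within max_bytes.

    A single event larger than max_bytes still gets a batch of its own,
    so the cap is a target rather than a hard guarantee.
    """
    batch, size = [], 0
    for event in events:
        line = json.dumps(event, separators=(",", ":")) + "\n"
        n = len(line.encode("utf-8"))
        if batch and size + n > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(event)
        size += n
    if batch:
        yield batch
```

Each yielded batch can then be written to its own file and parsed independently by the importer.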

Contributor Author

  1. Yes, large object files are bad, but many small files are also bad, as that is relatively heavy on the file system, so I decided to experiment with SQLite for the time being.
  2. The parallelisation needs to be at both ends, and I don't know if a server importer could support importing multiple events from multiple timelines at once, maybe "per room", but even then for the stateres (for it to be efficient) it has to happen sequentially per room.
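A minimal sketch of what the SQLite experiment mentioned in point 1 might look like from the importer's side. The table layout (`events` with `room_id`, `seq`, `body`) is entirely hypothetical; the point is only that a cursor hands back one row at a time, so a large export does not have to be decoded into memory at once.

```python
import json
import sqlite3

# In-memory database standing in for an export file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (room_id TEXT NOT NULL, seq INTEGER NOT NULL, body TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("!a:example.org", 1, json.dumps({"type": "m.room.create"})),
        ("!a:example.org", 2, json.dumps({"type": "m.room.member"})),
        ("!b:example.org", 1, json.dumps({"type": "m.room.create"})),
    ],
)


def iter_room_events(conn, room_id):
    """Stream one room's events in order, decoding a single row at a time."""
    cursor = conn.execute(
        "SELECT body FROM events WHERE room_id = ? ORDER BY seq", (room_id,)
    )
    for (body,) in cursor:
        yield json.loads(body)


types = [event["type"] for event in iter_room_events(conn, "!a:example.org")]
```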

I would rather err on the side of many small files than many large files. Small files are slow, large files fail. I agree that the ideal spot is somewhere in the middle, which is why it makes sense to allow batching things into files but splitting them at sensible points.

Right now all of the files appear to be single JSON objects, which can be very memory demanding to parse (unless you are using streaming parsing, which is rare). A mitigation would be to ensure that the file format can be naturally parsed in a streaming manner, such as JSON-lines or SQLite.

I agree that both ends must be able to be parallelized. I'm not sure if stateres can be parallelized but I wouldn't want the import format to be blocking that. I can also imagine it would be possible to do a bulk parallel import then run a stateres which may be able to avoid outdated work and may be significantly faster than doing a serial import even on the per-room level. For human chats the size of an individual room shouldn't be too bad but Matrix does try to be general in what rooms are for.

The parallelisation needs to be at both ends, and I don't know if a server importer could support importing multiple events from multiple timelines at once, maybe "per room", but even then for the stateres (for it to be efficient) it has to happen sequentially per room.

If you treat the import as a bulk data input, then you can do the state resolution after completing the import. Do not couple the import and processing.
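The decoupling suggested here can be sketched as two phases: load batches in parallel without interpreting them, then run resolution once per room. All names below are hypothetical, and `resolve_room` is a caller-supplied placeholder; this sketch does not implement Matrix state resolution.

```python
from concurrent.futures import ThreadPoolExecutor


def load_batch(batch):
    """Phase 1 worker: decode or validate raw records; no state resolution here."""
    return [(room_id, event) for room_id, event in batch]


def bulk_import(batches, workers=4):
    """Load all batches in parallel and collect events per room."""
    store = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so per-room order is kept.
        for loaded in pool.map(load_batch, batches):
            for room_id, event in loaded:
                store.setdefault(room_id, []).append(event)
    return store


def resolve_all(store, resolve_room):
    """Phase 2: run the supplied resolution step once per room, after the import."""
    return {room_id: resolve_room(events) for room_id, events in store.items()}
```

For example, `resolve_all(bulk_import(batches), resolve_room=len)` would just count events per room; a real importer would pass its own state resolution routine instead.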

@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Jun 8, 2021
@nyabinary

This still seems extremely useful (and necessary) current day.

Labels
kind:feature MSC for not-core and not-maintenance stuff needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal

7 participants