Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3972: Lexicographical strings as an ordering mechanism #3972

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Dominaezzz
Copy link
Contributor

Rendered

This MSC is meant to address the existing client implementation bugs of converting space order strings to numbers and back. Particularly the ones that arose from implementing MSC3230.
The existing implementations either create gaps (reducing the number of possible order strings) or create collisions which can subtly break drag and drop.

Signed-off-by: Dominic Fischer dominicfischer7@gmail.com

@turt2live turt2live added proposal A matrix spec change proposal client-server Client-Server API kind:maintenance MSC which clarifies/updates existing spec needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. labels Feb 26, 2023
@Dominaezzz
Copy link
Contributor Author

I'm not sure how to satisfy the spell check there but I'll take this out of draft regardless.

@Dominaezzz Dominaezzz marked this pull request as ready for review February 26, 2023 18:36
@Dominaezzz
Copy link
Contributor Author

I don't have a client using this yet but I have this implemented multiple ways in Kotlin with plenty unit tests in https://github.com/Dominaezzz/MatrixSort .

Would this satisfy the "needs-implementation"?

Comment on lines +11 to +12
The existing implementations take the obvious route of converting the lexicographical strings to numbers,
performing the generative operation with the numbers, then converting the numbers back to strings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are describing below is a mapping to an integer. That's however not what the ordering fields were supposed to be for. They are actually an implementation of a floating point number, since you can always insert new characters to insert something in between the previous 2 elements. Hardcoding some integer conversion would be somewhat limiting then imo. Can you show some existing implementations that actually do, what you described? In the end lexical strings can always be compared correctly without converting them to numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you can always insert new characters to insert something in between the previous 2 elements

Yes, this works, until you hit the character limit, then you have to get more creative.

Hardcoding some integer conversion would be somewhat limiting then imo

How so? This algorithm creates a one to one mapping between a string and an integer, so limits stay the same. (Assuming the language uses some big integer implementation of course).

show some existing implementations that actually do

Sure! In the MSC I referenced, there's

  • Element Android which creates collisions as multiple strings map to the same number. For example "A" and "AA" are both 0 when they should be 1 and 2 respectively.
  • Element Web which creates gaps as the strings don't map to a contiguous set of numbers. For example "A" and "AA" map to 27 and 36 respectively. The gap is problematic when doing midpoint calculations as you can imagine.

In the end lexical strings can always be compared correctly without converting them to numbers.

Comparison is easy to do correctly but calculating the midpoint between two strings is not and that is what I'm solving with this MSC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? This algorithm creates a one to one mapping between a string and an integer, so limits stay the same. (Assuming the language uses some big integer implementation of course).

You would need to resend all space children events whenever you reorder rooms. The current implementation allows for resending just a single event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where I lost you but this MSC doesn't affect the number of events sent. Element Web and Android currently convert the order strings to integers but they're doing it wrong. This MSC proposes a correct way to do the conversion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently space events have an ordering, which uses these strings. Lets assume you have 2 elements, using 2 consecutive strings:

  • A
  • AA

Those map to 1 and 2 using your algorithm. Now you want to insert into the middle of those. Using your algorithm, you would need to renumber AA, because you don't have fractions:

  • A
  • AA <- new element
  • AAA <- renumbered

With the current spec, you can inset a new Element without renumbering

  • A
  • A0 <- new element.
  • AA <- no renumbering

This always works as long as the first and second Element are not both of the max string length (50) AND have a distance of only 1 or the second Element is not a single space. You have to be very intentional to break that.

Your algorithm however makes it very easy to break by accident. If only "", "A", "AA" are the 3 smallest numbers, then clients will start making the first space in a list "", and the second one "A". Then moving any space up to the top there means you need to renumber everything after it.

You can still make broken orders in the existing spec, that need lots of reorderings to update the space list, but I would say with your approach it is significantly more likely. Yes, the existing spec makes the midpoint calculation harder, but only if you want an exact midpoint. In most cases finding some string in between is already enough.

I guess you could fix that by not making the mapping strict integer mappings, but still allowing the floating behaviour where you can just make the original string longer to add more insertion points, but currently your proposal removes that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone back to reread my MSC and I think I see where the confusion is from.

For the sake of simplicity, I will be limiting the range of characters to be just A, B and C with a limit of 3 characters.

This limitation I stated here was for the sake of explanation. I'm not actually saying we should change the dictionary that's currently use in the spec, I'm saying that the explanation below will use a small dictionary just so it's easier to comprehend.

With the current spec, you can inset a new Element without renumbering

0 isn't a valid character in my example so you wouldn't be able to use it.

Your algorithm however makes it very easy to break by accident. If only "", "A", "AA" are the 3 smallest numbers, then clients will start making the first space in a list "", and the second one "A". Then moving any space up to the top there means you need to renumber everything after it.

I could say the same thing for the existing spec's dictionary.
The existing algorithm however makes it very easy to break by accident. If only "", " ", " " are the 3 smallest numbers, then clients will start making the first space in a list "", and the second one " ". Then moving any space up to the top there means you need to renumber everything after it.

Yes, the existing spec makes the midpoint calculation harder, but only if you want an exact midpoint.

I'm not changing the difficulty of midpoint calculation (this algorithm definitely wasn't the easiest to come up with), I'm just fixing a common bug in one of the steps.

In most cases finding some string in between is already enough.

Sure, but who wants to deploy code that only works most of the time, instead of all of the time.

I guess you could fix that by not making the mapping strict integer mappings, but still allowing the floating behaviour where you can just make the original string longer to add more insertion points

I'm not sure what your issue is with strict integer mappings but when you have a max length and a finite set of characters, the set of all possible strings is finite. Which means you can represent every possible string with a unique integer.

but currently your proposal removes that.

No it doesn't. My proposal provides a reliable way for clients to do midpoint calculations correctly.
It is up to clients to make sure that when they assign order strings that they do it in an evenly distributed fashion to allow plenty space for inserting without renumbering.

See this thread here. The "Recommended algorithm to compute mid points:" section is precisely what I'm trying to fix. Nothing more, nothing less.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API kind:maintenance MSC which clarifies/updates existing spec needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants