-
-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change DB structure #1902
Merged
jubalh
merged 3 commits into
profanity-im:master
from
H3rnand3zzz:feature/improved-history
Nov 6, 2023
Merged
Change DB structure #1902
jubalh
merged 3 commits into
profanity-im:master
from
H3rnand3zzz:feature/improved-history
Nov 6, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
October 27, 2023 14:24
8b85929
to
36b6fc7
Compare
H3rnand3zzz
changed the title
Change DB structure [POC, WIP, DANGEROUS (READ DESCRIPTION BEFORE TESTING)]
Change DB structure [POC, WIP, READ DESCRIPTION BEFORE TESTING]
Oct 27, 2023
H3rnand3zzz
changed the title
Change DB structure [POC, WIP, READ DESCRIPTION BEFORE TESTING]
Change DB structure [READ DESCRIPTION BEFORE TESTING]
Oct 27, 2023
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
October 28, 2023 12:28
36b6fc7
to
73afa26
Compare
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Oct 28, 2023
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Performance boost is noticeable. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
October 28, 2023 16:05
73afa26
to
574d010
Compare
H3rnand3zzz
changed the title
Change DB structure [READ DESCRIPTION BEFORE TESTING]
Change DB structure
Oct 28, 2023
jubalh
reviewed
Oct 28, 2023
jubalh
reviewed
Oct 28, 2023
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Oct 29, 2023
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Performance boost is noticeable. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
October 29, 2023 07:08
574d010
to
fcd6605
Compare
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Oct 29, 2023
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
October 29, 2023 07:09
fcd6605
to
d17671a
Compare
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Oct 29, 2023
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Write messages with duplicate stanza IDs in the DB and allow LMC for them. Index `timestamp` column as well. Further details are available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
October 29, 2023 14:17
d17671a
to
4383e00
Compare
jubalh
reviewed
Oct 29, 2023
jubalh
reviewed
Oct 29, 2023
jubalh
reviewed
Oct 29, 2023
jubalh
reviewed
Oct 29, 2023
jubalh
reviewed
Oct 29, 2023
jubalh
reviewed
Oct 29, 2023
jubalh
reviewed
Oct 29, 2023
jubalh
reviewed
Oct 29, 2023
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 3, 2023
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 3, 2023 15:16
fc20e1f
to
539dd10
Compare
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 3, 2023
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 3, 2023 16:56
539dd10
to
a1fd2a1
Compare
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 3, 2023
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 3, 2023 17:12
a1fd2a1
to
0e17651
Compare
Last version has been successfully retested (tests list is in the PR message). |
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 3, 2023
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 3, 2023 17:34
0e17651
to
846642e
Compare
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 3, 2023
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 3, 2023 18:34
846642e
to
56453b4
Compare
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 3, 2023
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 3, 2023 18:59
56453b4
to
ac03776
Compare
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 3, 2023
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 3, 2023 19:19
ac03776
to
d8d4ad7
Compare
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 3, 2023 19:32
d8d4ad7
to
d89b5a0
Compare
Enhance data consistency by updating the database to treat empty strings as NULL values. This change simplifies queries and improves overall database integrity.
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 6, 2023 19:57
bd13b30
to
6c5e79e
Compare
Disallow correcting historical MUC messages, as the XEP-308 requires. Previous changes introduce problem with "Illicit LMC attempt from conference@server/user for message from user" During investigation it was revealed that XEP does not recommend support of historical MUC messages correction. ``` When used in a Multi-User Chat (XEP-0045) context, corrections must not be allowed (by the receiver) for messages received before the sender joined the room - particularly a full JID leaving the room then rejoining and correcting a message SHOULD be disallowed, as the entity behind the full JID in the MUC may have changed. ``` https://xmpp.org/extensions/xep-0308.html#rules XEP details mentioned by @jubalh Bug discovered and solution improved by @jaeckel
H3rnand3zzz
force-pushed
the
feature/improved-history
branch
from
November 6, 2023 21:08
6c5e79e
to
a671cff
Compare
sjaeckel
approved these changes
Nov 6, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! Thanks
Thanks everybody for the detailed work and reviewing and patience. |
H3rnand3zzz
added a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 11, 2023
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
H3rnand3zzz
pushed a commit
to H3rnand3zzz/profanity
that referenced
this pull request
Nov 11, 2023
Follow-up of 89dc7a4 resp. profanity-im#1902 Signed-off-by: Steffen Jaeckel <jaeckel-floss@eyet-services.de>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backup your DB before performing any testing.
All tests should be done with the final version.
/receipts
on/offEven though it does not drop any tables, I want to be on the cautious side and avoid any potential data loss by current or future profanity developers.
It's not ready for release yet and just shows PoC. The point of the PR is to share the idea for further discussion and development. I am not even sure if it works properly or even works in some parts. In 1 day I plan to tidy it up and make it generally better.
Read commit message for further details. Discussion links are going to be attached here and in commit message later.
LMC is (not) fun
Abstract
The initial problem is that when we receive messages from old client, they can have duplicate IDs and currently we don't even write them to DB, but it goes much further than that: current DB visualization (message history) relies on heavy JOIN with redacted messages. Explanation for necessity of
replaces_db_id
can be found in earlier discussions (#1893, but primarily in #1899), but to simplify, it would improve performance as well as give another benefit which is going to be discussed further. I will provide explanation for message editing (chained edits vs direct edits), short explanation for duplicated IDs problem and explained new message storage approach.Chained Edits vs Direct Edits
Chained Edit is edit that points to previous edit on second and further edits. E.g.
We have to traverse all the edits until we can find the original message. This is not an ideal solution and LMC was not designed to be this way. Quoting the XEP,
If the same message is to be corrected several times, the id of the original message is used in each case (e.g. If a message of id 1 is corrected by a message of id 2, a subsequent correction should correct 1, not 2).
Though this misconception is so common that even Profanity, as well as some other popular XMPP clients (e.g. Psi) use this approach for subsequent edits.
Note: sometimes it's regarded in conversation as edit->edit->edit->original.
Direct edit is edit that always points to the original message
This approach allows us to always find the original message without any need to follow chain of edits. Some clients (e.g. Gajim) follow this XEP.
But as some clients make chained edits with different stanza ids, the best way for us to handle both ways would be introducing
replaces_db_id
and using it for direct-edit approach internally, thus accepting both ways for editing messages.Here we would check xx's replace_db_id and set
replaces_db_id
as 1 (because it's the original message). Thus we don't rely on external client's approach and can always save messages in a suitable manner (performance- and design-wise).Duplicated ID problem
Duplicated ID is not a problem within itself. We can easily put messages with duplicated IDs in the DB as even DB field for
stanza_id
is not set to beUNIQUE
. The problem starts with message replacement. When we get message replacement (LMC) for the message withstanza_id
that is present 2 or more times, with old approach it wouldn't be (or barely would be) possible to replace them correctly on visualization, not to mention trying to stay performant while doing so.replaces_db_id
could be a key to solving this problem, but it still would be just as an inefficient visualization with JOIN which can be called on every scroll up.New DB approach
Introduced fields:
replaces_db_id
database ID for correcting message of the original messagereplaced_by_db_id
database ID for original message of the last correcting messageThis way we connect all chained edits as direct edits, hence always have reference to the original message.
replaced_by_db_id
provides ability to replace messages quickly and correctly (JOIN only with latest change wasn't possible even withreplaces_db_id
).Editable message storage approach (unacceptable, rejected by design police, left for historical ref.)
- `replace_id` -> `replaces_stanza_id` stanza ID of the replaced message - `replaces_db_id` db ID of the replaced message - `original_message` original message before replacement - `message` now is used to store visual message (replaced by LMC)So to show messages, we just need to select. But to store previous message, we can use
original_message
which stores a message which was in the original's message column at the moment of change. E.g. "Hlo" being replaced to "hello", and then to "Hello world!" we save "Hlo" tooriginal_message
field of the LMC and set the original message to "hello", after this on further change we will save "hello" as anoriginal_message
and set the original's message to "Hello world". In the DB it will look like that:Despite looking unintuitive (theoretically,
original_message
could be renamed toprevious_message
), we have performant solution that keeps full edit history, performant on visualization (just a select) and robust enough to allow direct linking despite external chain link.Other changes
Edited visualization of LMC, allowing to receive messages that do not have original message counterpart in the DB (as original might be lost, e.g. received by other client during disconnect). I couldn't find good way to allow deeper (3+ edits) chained edits, so it's now similar to gajim's approach (every second edit we display a new message and then it can be redact once ad infinitum). In future we can use DB ID as a key for buffer instead of stanza ids.
Fix #1899