Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race in application of pending edits leads to stability issues #8770

Closed
hubertp opened this issue Jan 16, 2024 · 7 comments
Closed

Race in application of pending edits leads to stability issues #8770

hubertp opened this issue Jan 16, 2024 · 7 comments
Assignees
Labels
--bug Type: bug -language-server p-highest Should be completed ASAP

Comments

@hubertp
Copy link
Contributor

hubertp commented Jan 16, 2024

Individual edit requests are submitted by GUI client and served by ApplyEditHandler while preserving their order.

The actual application of pending edits is delayed until EditFileCmd is executed (as submitted by collaborative buffer). There we enqueue and trigger a compilation. The latter will eventually attempt to get all edits, apply them and execute the compilation.

We were seeing numerous stability issues, like #8174, #7978 and plenty of others that have not been reported, which we were never able to reproduce and fix properly.

After adding some debugging statements I'm fairly sure that what we are seeing is a race-condition in the execution of EditFileCmds. While they are submitted to the executor in the same order as the client requests, nothing really guarantees that the executor will run them in that same order. Most of the time that will be the case, or otherwise one would not be able to use the IDE, but occasionally one of them will execute out-of-order leading to immediate synchronization issues. Old IDE would reopen the file on such failure but a) it stopped doing that b) it's a hack leading to data loss.

The solution is to ensure that all EditFileCmd commands are executed sequentially and in the order of submission. The latter is already ensured with the appropriate use of locks.

@hubertp hubertp added --bug Type: bug p-highest Should be completed ASAP -language-server labels Jan 16, 2024
@JaroslavTulach JaroslavTulach moved this from ❓New to 🔧 Implementation in Issues Board Jan 16, 2024
@enso-bot
Copy link

enso-bot bot commented Jan 16, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-15):

Progress: Trying to come up with a small reproducible case for the truffle assertion failure that manifested itself in #8595. The equality check seems to be dependent on the order of specializations making it hard. The wrong definition for Nothing with Warnings will be addressed separate, as discussed with Pavel. After looking into reported logs I think I finally came up with a scenario that leads to ongoing stability issues. Will need to specialize some commands to execute sequentially and in order of submission (currently not guaranteed for edits). It should be finished by 2024-01-17.

Next Day: Next day I will be working on the #8770 task. Continue investigating the issue

@hubertp
Copy link
Contributor Author

hubertp commented Jan 16, 2024

Rather than adding another pool for executing yet another synchronous commands I think we should generalize the approach and allow for executing sequentially any tasks grouped under the same key.

hubertp added a commit that referenced this issue Jan 17, 2024
This is a quick fix to a long standing problem of
`org.enso.interpreter.service.error.FailedToApplyEditsException` which
would prevent backend from processing any more changes, rendering GUI
(and backend) virtually useless.
Edits are submitted for (background) processing in the order they are
handled. However the order of execution of such tasks is not guaranteed.
Most of the time edits are processed in the same order as their requests
but when they don't, files get quickly out of sync.

I'm not a fan of this change because it essentially blocks all open/file
requests until all edits are processed and we already have logic to deal
with that appropriately. Moreover those tasks can and should be
processed independently. Since we already had the single thread executor
present to ensure correct synchronization of open/file/push commands, we
are simply adding edit commands to the list.

Ideally we want to have a specialized executor that executes tasks
within the same group sequentially but groups of tasks can be executed
in parallel, thus ensuring sufficient throughput. The latter will take
much longer and will require significant rewrite of the command
execution.

Related to #8770.
@enso-bot
Copy link

enso-bot bot commented Jan 17, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-16):

Progress: Reviewing #8712. Trying to come up with a generic solution for executing (some) tasks sequentially for #8770. It should be finished by 2024-01-17.

Next Day: Next day I will be working on the #8770 task. Continue investigating the issue

@enso-bot
Copy link

enso-bot bot commented Jan 18, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-17):

Progress: Provided a temporary workaround that fixes most of stability issues. Still trying to figure a more generic solution. Stumbled upon some bugs (#8793, #8792) during testing and verifying logs submitted by the users. Having a hard time coming up with a unit test case. It should be finished by 2024-01-17.

Next Day: Next day I will be working on the #8770 task. Investigating a generic solution + test case.

hubertp added a commit that referenced this issue Jan 19, 2024
This is a quick fix to a long standing problem of
`org.enso.interpreter.service.error.FailedToApplyEditsException` which
would prevent backend from processing any more changes, rendering GUI
(and backend) virtually useless.
Edits are submitted for (background) processing in the order they are
handled. However the order of execution of such tasks is not guaranteed.
Most of the time edits are processed in the same order as their requests
but when they don't, files get quickly out of sync.

I'm not a fan of this change because it essentially blocks all open/file
requests until all edits are processed and we already have logic to deal
with that appropriately. Moreover those tasks can and should be
processed independently. Since we already had the single thread executor
present to ensure correct synchronization of open/file/push commands, we
are simply adding edit commands to the list.

Ideally we want to have a specialized executor that executes tasks
within the same group sequentially but groups of tasks can be executed
in parallel, thus ensuring sufficient throughput. The latter will take
much longer and will require significant rewrite of the command
execution.

Related to #8770.
@enso-bot
Copy link

enso-bot bot commented Jan 19, 2024

Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-01-18):

Summary: There is 2 days delay in implementation of the Race in application of pending edits leads to stability issues (#8770) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Unexpected time off

@enso-bot
Copy link

enso-bot bot commented Jan 19, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-18):

Progress: Continued working on a test case. It should be finished by 2024-01-19.

Next Day: Next day I will be working on the #8770 task. Still investigating

@hubertp hubertp moved this from 🔧 Implementation to 👁️ Code review in Issues Board Jan 22, 2024
@enso-bot
Copy link

enso-bot bot commented Jan 22, 2024

Hubert Plociniczak reports a new STANDUP for the provided date (2024-01-19):

Progress: Struggled with JPMS to allow for mocking elements of our infrastructure. With help from Pavel we were able to get it working. Looked into random timeouts reported for engine tests (#8806); will need proper investigation on reducing resources used by zio. It should be finished by 2024-01-19.

Next Day: Next day I will be working on the #8770 task. Address PR review. Pick up next item.

mergify bot pushed a commit that referenced this issue Jan 22, 2024
This is a quick fix to a long standing problem of
`org.enso.interpreter.service.error.FailedToApplyEditsException` which would prevent backend from processing any more changes, rendering GUI (and backend) virtually useless.
Edits are submitted for (background) processing in the order they are handled. However the order of execution of such tasks is not guaranteed. Most of the time edits are processed in the same order as their requests but when they don't, files get quickly out of sync.

Related to #8770.

# Important Notes
I'm not a fan of this change because it essentially blocks all open/file requests until all edits are processed and we already have logic to deal with that appropriately. Moreover those tasks can and should be processed independently. Since we already had the single thread executor present to ensure correct synchronization of open/file/push commands, we are simply adding edit commands to the list.

Ideally we want to have a specialized executor that executes tasks within the same group sequentially but groups of tasks can be executed in parallel, thus ensuring sufficient throughput. The latter will take much longer and will require significant rewrite of the command execution.

Added tests that would previously fail due to non-deterministic execution.
@hubertp hubertp closed this as completed Jan 24, 2024
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
--bug Type: bug -language-server p-highest Should be completed ASAP
Projects
Archived in project
Development

No branches or pull requests

1 participant