-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race in application of pending edits leads to stability issues #8770
Comments
Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-15): Progress: Trying to come up with a small reproducible case for the truffle assertion failure that manifested itself in #8595. The equality check seems to be dependent on the order of specializations making it hard. The wrong definition for Nothing with Warnings will be addressed separate, as discussed with Pavel. After looking into reported logs I think I finally came up with a scenario that leads to ongoing stability issues. Will need to specialize some commands to execute sequentially and in order of submission (currently not guaranteed for edits). It should be finished by 2024-01-17. Next Day: Next day I will be working on the #8770 task. Continue investigating the issue |
Rather than adding another pool for executing yet another synchronous commands I think we should generalize the approach and allow for executing sequentially any tasks grouped under the same key. |
This is a quick fix to a long standing problem of `org.enso.interpreter.service.error.FailedToApplyEditsException` which would prevent backend from processing any more changes, rendering GUI (and backend) virtually useless. Edits are submitted for (background) processing in the order they are handled. However the order of execution of such tasks is not guaranteed. Most of the time edits are processed in the same order as their requests but when they don't, files get quickly out of sync. I'm not a fan of this change because it essentially blocks all open/file requests until all edits are processed and we already have logic to deal with that appropriately. Moreover those tasks can and should be processed independently. Since we already had the single thread executor present to ensure correct synchronization of open/file/push commands, we are simply adding edit commands to the list. Ideally we want to have a specialized executor that executes tasks within the same group sequentially but groups of tasks can be executed in parallel, thus ensuring sufficient throughput. The latter will take much longer and will require significant rewrite of the command execution. Related to #8770.
Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-16): Progress: Reviewing #8712. Trying to come up with a generic solution for executing (some) tasks sequentially for #8770. It should be finished by 2024-01-17. Next Day: Next day I will be working on the #8770 task. Continue investigating the issue |
Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-17): Progress: Provided a temporary workaround that fixes most of stability issues. Still trying to figure a more generic solution. Stumbled upon some bugs (#8793, #8792) during testing and verifying logs submitted by the users. Having a hard time coming up with a unit test case. It should be finished by 2024-01-17. Next Day: Next day I will be working on the #8770 task. Investigating a generic solution + test case. |
This is a quick fix to a long standing problem of `org.enso.interpreter.service.error.FailedToApplyEditsException` which would prevent backend from processing any more changes, rendering GUI (and backend) virtually useless. Edits are submitted for (background) processing in the order they are handled. However the order of execution of such tasks is not guaranteed. Most of the time edits are processed in the same order as their requests but when they don't, files get quickly out of sync. I'm not a fan of this change because it essentially blocks all open/file requests until all edits are processed and we already have logic to deal with that appropriately. Moreover those tasks can and should be processed independently. Since we already had the single thread executor present to ensure correct synchronization of open/file/push commands, we are simply adding edit commands to the list. Ideally we want to have a specialized executor that executes tasks within the same group sequentially but groups of tasks can be executed in parallel, thus ensuring sufficient throughput. The latter will take much longer and will require significant rewrite of the command execution. Related to #8770.
Hubert Plociniczak reports a new 🔴 DELAY for yesterday (2024-01-18): Summary: There is 2 days delay in implementation of the Race in application of pending edits leads to stability issues (#8770) task. Delay Cause: Unexpected time off |
Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-18): Progress: Continued working on a test case. It should be finished by 2024-01-19. Next Day: Next day I will be working on the #8770 task. Still investigating |
Hubert Plociniczak reports a new STANDUP for the provided date (2024-01-19): Progress: Struggled with JPMS to allow for mocking elements of our infrastructure. With help from Pavel we were able to get it working. Looked into random timeouts reported for engine tests (#8806); will need proper investigation on reducing resources used by zio. It should be finished by 2024-01-19. Next Day: Next day I will be working on the #8770 task. Address PR review. Pick up next item. |
This is a quick fix to a long standing problem of `org.enso.interpreter.service.error.FailedToApplyEditsException` which would prevent backend from processing any more changes, rendering GUI (and backend) virtually useless. Edits are submitted for (background) processing in the order they are handled. However the order of execution of such tasks is not guaranteed. Most of the time edits are processed in the same order as their requests but when they don't, files get quickly out of sync. Related to #8770. # Important Notes I'm not a fan of this change because it essentially blocks all open/file requests until all edits are processed and we already have logic to deal with that appropriately. Moreover those tasks can and should be processed independently. Since we already had the single thread executor present to ensure correct synchronization of open/file/push commands, we are simply adding edit commands to the list. Ideally we want to have a specialized executor that executes tasks within the same group sequentially but groups of tasks can be executed in parallel, thus ensuring sufficient throughput. The latter will take much longer and will require significant rewrite of the command execution. Added tests that would previously fail due to non-deterministic execution.
Individual edit requests are submitted by GUI client and served by ApplyEditHandler while preserving their order.
The actual application of pending edits is delayed until EditFileCmd is executed (as submitted by collaborative buffer). There we enqueue and trigger a compilation. The latter will eventually attempt to get all edits, apply them and execute the compilation.
We were seeing numerous stability issues, like #8174, #7978 and plenty of others that have not been reported, which we were never able to reproduce and fix properly.
After adding some debugging statements I'm fairly sure that what we are seeing is a race-condition in the execution of
EditFileCmd
s. While they are submitted to the executor in the same order as the client requests, nothing really guarantees that the executor will run them in that same order. Most of the time that will be the case, or otherwise one would not be able to use the IDE, but occasionally one of them will execute out-of-order leading to immediate synchronization issues. Old IDE would reopen the file on such failure but a) it stopped doing that b) it's a hack leading to data loss.The solution is to ensure that all
EditFileCmd
commands are executed sequentially and in the order of submission. The latter is already ensured with the appropriate use of locks.The text was updated successfully, but these errors were encountered: