
[FL-3783] Asynchronous Infrared remote manipulation #3503

Merged
Merged 13 commits into dev from gsurkov/3783_ir_freeze_fix on Mar 11, 2024

Conversation

gsurkov
Member

@gsurkov gsurkov commented Mar 7, 2024

What's new

  • Load, delete, rename, and move whole remotes and individual buttons in a separate thread so that the UI does not stall.

Verification

  1. Install both the firmware and the Infrared application.
  2. Go to Infrared -> Universal Remotes -> TVs.
  3. Mash the buttons while the busy animation is playing. The app should ignore the input and load the file successfully.
  4. Perform all other potentially lengthy operations: rename and move signals. Mash the buttons during the busy animation here as well (this works best on large remotes with hundreds of signals). The result should be the same as in step 3.
  5. Test all other features to make sure that no regressions are present.

Checklist (For Reviewer)

  • PR has description of feature/bug or link to Confluence/Jira task
  • Description contains actions to verify feature/bugfix
  • I've built this code, uploaded it to the device and verified feature/bugfix

github-actions bot commented Mar 7, 2024

Compiled f7 firmware for commit bbcc3917.

@CookiePLMonster
Contributor

What happens to the concurrent jobs if the user leaves the scenes that triggered them very quickly? It seems impossible to block on the runner task and wait before freeing the context data gracefully, which could cause issues if the app gets closed before the task has a chance to finish its work - maybe not in the current use cases, but it would likely happen if future users of this toolbox lib put truly long operations in there.

Ideally, the runner spawn could return a handle the user could join/block on, but that would also require freeing it manually - unless thread semantics were applied and the user could perform a detach operation at their own risk.

@gsurkov
Member Author

gsurkov commented Mar 8, 2024

What happens to the concurrent jobs if the user leaves the scenes triggering them very fast?

This should not be possible: before a concurrent job starts, a busy animation view is pushed onto the view stack, which blocks all input events. In the improbable case that it happens anyway, no memory leak would occur, as the runner frees its resources automatically and the event is delivered in some other view.

Your remark is perfectly valid, though. I will make it clearer in the docs that the application is responsible for keeping all relevant contexts valid until the runner completes (completion is signaled by the finished_callback).

Adding a return handle would defeat the purpose of this helper, as it would be no different from using a FuriThread directly in terms of code verbosity.

@gsurkov gsurkov marked this pull request as ready for review March 8, 2024 11:29
@CookiePLMonster
Contributor

CookiePLMonster commented Mar 8, 2024

Your remark is perfectly valid though, I will make it more clear in the docs that the application is responsible for keeping all relevant contexts valid until the runner completes (the completion is signaled by the finished_callback).

I'm just somewhat worried that it might be technically impossible to safeguard 100% against the code getting ripped out from under this thread. It's an edge case, but possibly a valid one nonetheless. Fair warning: this is nitpicky, but probably technically valid unless threads keep a strong reference to the application code, which I don't know whether they do.

My concern is a scenario covered by Raymond Chen on The Old New Thing - while it concerns Windows, the problem is not platform-specific. Consider the following chain of actions:

  1. App A starts.
  2. App A wants to spawn a Concurrent Runner (I will call it Job from now on, to make it shorter), but it also wants to make sure that the app cannot terminate before the job is done. For this, an Event X gets created.
  3. App A spawns Job 1, and passes Event X in the context, then it proceeds with other work.
  4. Job 1 starts doing its work. In the meantime, App A wants to terminate, but in order to avoid use-after-free of the context, it waits for Event X to become signaled.
  5. Job 1 finishes and executes the finish callback. In that callback, the event gets signaled with furi_event_flag_set.
  6. Now consider what happens if the context switch back to the main thread of App A happens right after furi_event_flag_set is called, before control returns to the app code to execute the function epilogue: the thread wakes up because the event is set, frees resources, and terminates. The Flipper Application Loader proceeds to unload the FAP code from memory.
  7. In the meantime, the context switches back to the finish callback and furi_event_flag_set returns. However, App A's code has already been unloaded, which may cause a crash.

Now, I am actually not sure how to fix this. On Windows, this class of issues is resolved with FreeLibraryAndExitThread, which is a noreturn function: it exits the thread without returning to the application code. However, a theoretical furi_event_flag_set_and_exit_thread would not help (even if FreeRTOS had the means to implement it, which I don't know whether it does), because the finish callback is executed on the timer thread! Unless... "exiting" is implemented by returning "two functions up" somehow, maybe via setjmp? The goal would be to jump from inside furi_event_flag_set_and_exit_thread straight to concurrent_runner_timer_callback - a firmware-function-to-firmware-function jump - without executing even a single instruction of application code in that window.

One way to solve this would be to let the concurrent runner accept an optional event flag parameter, and have that flag set from inside concurrent_runner_timer_callback itself. This way if proper synchronization is used, the job code would not ever race against the app teardown. However, that adds complexity to setting up a concurrent runner just to close an extreme edge case, so I wouldn't necessarily be a fan of it either, unless maybe as an _ex call.

EDIT:
setjmp and longjmp would technically be fit for this purpose (longjmp is marked with noreturn!), but they require passing the context parameter around, so it's not good enough :/

EDIT2:
Alternatively, since the "code disappears from under the running thread" problem only affects process exit and everything else can be guarded against well, maybe the concurrent runner should keep a count of active runners and provide a function that waits for this count to drop to 0? It could then be called once on exit (maybe even by the Flipper Application Loader rather than the user), and the code would only be freed once all runners have finished.

EDIT3:
Yet another idea, possibly the easiest one: disabling the scheduler (furi_kernel_lock) for the duration of the finish callback would close this potential race entirely. It would only need an additional note in the docs that you may not wait on events/mutexes from the finish callback. This should be fine to enforce; if the application really needs to do that, it can do so at the end of the job callback rather than in the finish callback. Besides, even without this change, stalling the timer thread is a bad idea.

@DrZlo13
Member

DrZlo13 commented Mar 11, 2024

@CookiePLMonster The concurrent_runner_timer_callback function resides in the MCU flash, so it is safe to jump there

@CookiePLMonster
Contributor

CookiePLMonster commented Mar 11, 2024

@CookiePLMonster The concurrent_runner_timer_callback function resides in the MCU flash, so it is safe to jump there

@DrZlo13 Yes, but the finish_callback can still race with another thread that may wake up immediately upon an event being signaled in that callback. Consider a theoretical scenario like:

void finish_callback_in_my_app(void* context) {
    // Do stuff
    furi_event_flag_set(this_event_will_let_the_app_quit, 1);

    // The function epilogue (and RAII destructors in C++) can still execute
    // here, even though the exiting app may already have released this code
}

To fix this, the function signaling the event must either be a noreturn function (not viable here, because the teardown after the finish_callback still needs to execute), or the scheduler must not be allowed to context switch until it has returned from the user callback - hence the idea to employ furi_kernel_lock before calling into finish_callback.

So, a function like this would be safe from the race, with an additional note in the docs that the finish callback must not perform any blocking operations (sleeping, locking mutexes, waiting on queues). Those shouldn't be performed there even now, because the callback executes on the timer thread, but with the scheduler disabled this becomes a matter of utmost importance.

static void concurrent_runner_timer_callback(void* context, uint32_t arg) {
    UNUSED(arg);

    ConcurrentRunner* instance = context;
    furi_thread_join(instance->thread);

    if(instance->finished_callback) {
        furi_kernel_lock();
        instance->finished_callback(instance->context);
        furi_kernel_unlock();
    }

    furi_thread_free(instance->thread);
    free(instance);
}

EDIT:
I apologize for the wrong ping, I need more coffee.

@gsurkov
Member Author

gsurkov commented Mar 11, 2024

@CookiePLMonster Thanks for the clear example!

The solution you have proposed won't work, though, because we HAVE to wait on a queue in view_dispatcher_send_custom_event(), which is currently called in the finished_callback every time.

After some discussion, we have decided that this implementation leaves a lot of room for misuse, and we'll come up with a better one in the future (such an API would be nice to have anyway). For now, it will be removed from the firmware API and become part of the Infrared application, where it serves a very particular purpose and will not suffer from any of the above-mentioned concerns due to its limited scope of usage.

P.S. Maybe I'll get rid of it altogether, because there's a simpler way to do just that with plain threads.

@CookiePLMonster
Contributor

CookiePLMonster commented Mar 11, 2024

The solution you have proposed won't work though, because we HAVE to wait on a queue in view_dispatcher_send_custom_event(), which is currently called in the finished_callback every time.

Could you instead call it at the very end of your work callback rather than in the finished callback? From the app's point of view this would most likely be equivalent, but then you'd need to implement the finished callback via something like an event if you want to prevent the app from terminating too early.

EDIT:

P.S maybe I'll get rid of it altogether, because there's a simpler way to do just that with plain threads.

Yeah, since you spin up a new thread every time, you may be right. It might be easier to spawn a single thread with a queue that services those jobs, and avoid all of those issues altogether.

@gsurkov gsurkov marked this pull request as draft March 11, 2024 13:07
@CookiePLMonster
Contributor

CookiePLMonster commented Mar 11, 2024

Looking at the revised approach - what happens if furi_thread_start and furi_thread_set_callback are called on a thread that hasn't finished yet? Do they wait for it to finish first, or will they crash/throw a furi_check*?

* I realize it is impossible to run two jobs at once with the current design, since these jobs block input, but it's a curiosity that popped into my head when I saw this pattern.

@gsurkov
Copy link
Member Author

gsurkov commented Mar 11, 2024

@CookiePLMonster Such code will crash if compiled with DEBUG=1 passed to fbt, due to the furi_assert checks. However, unlike furi_check, these checks are omitted when the debug flags are off, which leads to unpredictable behaviour. This distinction was chosen way back as a size-saving measure.

Right now, a separate effort is being made to replace furi_assert with furi_check in all user-facing methods in order to secure them further.

Running into this problem would indeed be improbable in this particular situation.

@gsurkov gsurkov marked this pull request as ready for review March 11, 2024 16:12
skotopes
skotopes previously approved these changes Mar 11, 2024
@skotopes skotopes self-requested a review March 11, 2024 16:55
@skotopes skotopes dismissed their stale review March 11, 2024 16:55

button delete is not working

@skotopes skotopes merged commit 022fccf into dev Mar 11, 2024
9 checks passed
@skotopes skotopes deleted the gsurkov/3783_ir_freeze_fix branch March 11, 2024 17:35