ECS crash after conditionally adding to an entity when using Explorer #1180

Closed
robo-todd opened this issue Mar 29, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@robo-todd

I'm working on a large Flecs application in C++ and have been progressively adding meta support so it can be used with the Flecs Explorer.
The application uses tags (empty structs of a specific type) and adds them to entities to turn various rendering options on and off.
The application itself works fine and has never had an issue with this.

When I connect the Flecs Explorer and then add a tag to an entity (during the execution of a system that checks UI inputs), flecs itself crashes shortly afterwards. It is 100% reproducible.

The line of code that adds the tag is shown below, where ui_root is an entity that carries a number of these tags, ShowActuatorStates is just a 'tag' struct, and showActuatorStates is a boolean variable.

ui_root.add_if<ShowActuatorStates>(showActuatorStates);

This normally flips my visualizer rendering of various things on and off just fine, but with explorer connected it seems to crash on the next loop after this is executed.
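For readers unfamiliar with the call, add_if adds the tag when the condition is true and removes it when the condition is false. The sketch below mimics that behaviour with a hypothetical stand-in class (TagSet is not part of flecs and does not model its storage); it is only meant to show what the one-liner toggles.

```cpp
#include <typeindex>
#include <typeinfo>
#include <unordered_set>

// Hypothetical stand-in for the set of tags on one entity. NOT the flecs API.
class TagSet {
public:
    template <typename Tag>
    void add() { tags_.insert(std::type_index(typeid(Tag))); }

    template <typename Tag>
    void remove() { tags_.erase(std::type_index(typeid(Tag))); }

    template <typename Tag>
    bool has() const { return tags_.count(std::type_index(typeid(Tag))) != 0; }

    // Same contract as add_if: add when cond is true, remove when it is false.
    template <typename Tag>
    void add_if(bool cond) {
        if (cond) {
            add<Tag>();
        } else {
            remove<Tag>();
        }
    }

private:
    std::unordered_set<std::type_index> tags_;
};

// An empty struct used purely as a marker, like the tags in this report.
struct ShowActuatorStates {};
```

In flecs itself, each such add or remove moves the entity to a different table, which is why the rest of the thread looks at table storage.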

Additional context
This is on Ubuntu 22.04, Flecs v3.2.11, gcc 11.4.0-1ubuntu1~22.04

Here is the crash dump from gdb when this happens:

#0  0x0000555555decb07 in flecs_rule_var_get_entity (ctx=<optimized out>, 
    ctx=<optimized out>, var_id=1 '\001')
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:174
#1  flecs_get_ref_entity (ref=<optimized out>, ref=<optimized out>, 
    flag=<optimized out>, ctx=<optimized out>)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:279
#2  flecs_get_ref_entity (ref=0x55555ab2e030, flag=<optimized out>, 
    ctx=<optimized out>)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:271
#3  0x0000555555decc36 in flecs_rule_op_get_id_w_written (op=0x55555ab2e010, 
    written=3, ctx=ctx@entry=0x7fffffffaaf0)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:303
#4  0x0000555555def05b in flecs_rule_op_get_id (ctx=0x7fffffffaaf0, 
    op=0x55555ab2e010)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:322
#5  flecs_rule_ids (ctx=0x7fffffffaaf0, redo=<optimized out>, 
    op=0x55555ab2e010)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:1322
#6  flecs_rule_dispatch (op=0x55555ab2e010, redo=<optimized out>, 
    ctx=0x7fffffffaaf0)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2417
#7  0x0000555555defe56 in flecs_rule_run_until (redo=<optimized out>, 
    ctx=0x7fffffffaaf0, ops=0x55555ab2dc80, first=18, cur=<optimized out>, 
    until=until@entry=EcsRuleEnd)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2472
#8  0x0000555555deff25 in flecs_rule_run_block (redo=<optimized out>, 
    ctx=ctx@entry=0x7fffffffaaf0, op_ctx=op_ctx@entry=0x55555a9f7340)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2152
#9  0x0000555555dee7ce in flecs_rule_run_block_w_reset (ctx=0x7fffffffaaf0, 
    redo=<optimized out>, op=0x55555ab2dfe0)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2196
#10 flecs_rule_optional (ctx=0x7fffffffaaf0, redo=<optimized out>, 
    op=0x55555ab2dfe0)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2222
#11 flecs_rule_dispatch (op=0x55555ab2dfe0, redo=<optimized out>, 
    ctx=0x7fffffffaaf0)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2424
#12 0x0000555555defe56 in flecs_rule_run_until (redo=<optimized out>, 
    ctx=ctx@entry=0x7fffffffaaf0, ops=ops@entry=0x55555ab2dc80, 
    first=first@entry=-1, cur=<optimized out>, until=until@entry=EcsRuleYield)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2472
#13 0x0000555555df0085 in ecs_rule_next_instanced (it=it@entry=0x7fffffffb780)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2627
#14 0x0000555555df0590 in ecs_rule_next (it=0x7fffffffb780)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2666
#15 ecs_rule_next (it=0x7fffffffb780)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rules/engine.c:2656
#16 0x0000555555dc22e7 in ecs_iter_next (iter=<optimized out>)
    at /home/todd/Projects/sandbox/build/src/flecs/src/iter.c:575
#17 ecs_page_next_instanced (it=<optimized out>)
    at /home/todd/Projects/sandbox/build/src/flecs/src/iter.c:915
#18 ecs_page_next (it=<optimized out>)
    at /home/todd/Projects/sandbox/build/src/flecs/src/iter.c:998
#19 ecs_page_next (it=it@entry=0x7fffffffb4b0)
    at /home/todd/Projects/sandbox/build/src/flecs/src/iter.c:985
#20 0x0000555555dff21e in ecs_iter_to_json_buf (
    world=world@entry=0x5555575d9eb0, it=it@entry=0x7fffffffb4b0, 
    buf=buf@entry=0x7fffffffc3f8, desc=desc@entry=0x7fffffffb490)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/json/serialize.c:2301
#21 0x0000555555de58a7 in flecs_rest_iter_to_reply (
    world=world@entry=0x5555575d9eb0, req=req@entry=0x7fffc2a7a8a0, 
    reply=reply@entry=0x7fffffffc3f0, it=it@entry=0x7fffffffb780)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rest.c:402
#22 0x0000555555de5bb0 in flecs_rest_reply_query (world=0x5555575d9eb0, 
    req=0x7fffc2a7a8a0, reply=0x7fffffffc3f0)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rest.c:496
#23 0x0000555555ddc936 in http_handle_request (srv=srv@entry=0x555558c20b40, 
    req=0x7fffc2a7a8a0)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/http.c:1336
#24 0x0000555555ddd96b in http_dequeue_requests (delta_time=0, 
    srv=0x555558c20b40)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/http.c:1405
#25 ecs_http_server_dequeue (srv=0x555558c20b40, delta_time=<optimized out>)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/http.c:1611
#26 0x0000555555de4f3b in DequeueRest (it=0x7fffffffd470)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/rest.c:1064
#27 0x0000555555da879e in ecs_run_intern (world=world@entry=0x5555575d9eb0, 
    stage=0x55555763ca70, system=system@entry=522, system_data=0x5555571f0fc0, 
    stage_index=stage_index@entry=0, stage_count=stage_count@entry=1, 
    delta_time=delta_time@entry=0.0338850953, offset=0, limit=0, 
    param=<optimized out>)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/system/system.c:116
#28 0x0000555555da2c8e in flecs_run_pipeline_ops (world=0x5555575d9eb0, 
    stage=stage@entry=0x55555763ca70, stage_index=stage_index@entry=0, 
    stage_count=stage_count@entry=1, delta_time=delta_time@entry=0.0338850953)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/pipeline/pipeline.c:569
#29 0x0000555555da2f0c in flecs_run_pipeline (world=<optimized out>, 
    world@entry=0x55555763ca70, pq=pq@entry=0x5555570d1750, 
    delta_time=delta_time@entry=0.0338850953)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/pipeline/pipeline.c:638
#30 0x0000555555da46c9 in flecs_workers_progress (
    world=world@entry=0x5555575d9eb0, pq=0x5555570d1750, 
    delta_time=delta_time@entry=0.0338850953)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/pipeline/worker.c:240
#31 0x0000555555da3337 in ecs_progress (world=0x5555575d9eb0, 
    user_delta_time=<optimized out>)
    at /home/todd/Projects/sandbox/build/src/flecs/src/addons/pipeline/pipeline.c:752
#32 0x0000555555685691 in flecs::world::progress (this=0x5555575d9dd8, 

@robo-todd robo-todd added the bug Something isn't working label Mar 29, 2024
@robo-todd
Author

I'm starting to run into this in other places. It seems that when I add a tag (a struct with no contents that has been declared as a component), Flecs Explorer somehow does not handle it. I must be doing something wrong, but adding a tag to anything kills the process inside the REST response handling.

@SanderMertens
Owner

SanderMertens commented Apr 1, 2024

According to the stack trace it's crashing on this line:

ecs_entity_t *entities = table->data.entities.array;
var->entity = entities[var->range.offset]; // <--

suggesting that entities is either NULL (likely) or a bad address (that would be bad). That's very odd: the only table that I can think of for which entities is NULL is the root table, and it should never get matched with a query.

EDIT: it could also be that var->range.offset is out of range for the array, which could point to a bug in the engine.
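To make the two failure modes concrete, here is a minimal sketch of the same lookup with both conditions checked explicitly. The structs are hypothetical stand-ins for the fields used in the two lines above (the count field in particular is an assumption); this is not the engine's code, just an illustration of what a debug assert would be guarding against.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the fields touched by the crashing lines.
// The real flecs structs contain much more and may be laid out differently.
struct EntityVec {
    const std::uint64_t* array;  // NULL when the table stores no entities
    std::int32_t count;          // assumed: number of valid elements in array
};

struct TableRange {
    std::int32_t offset;
};

// Writes the entity at range.offset to out and returns true, or returns
// false for either failure mode: a NULL array or an out-of-range offset.
bool entity_at(const EntityVec& entities, const TableRange& range,
               std::uint64_t& out) {
    if (entities.array == nullptr) {
        return false;
    }
    if (range.offset < 0 || range.offset >= entities.count) {
        return false;
    }
    out = entities.array[range.offset];
    return true;
}
```

In a release build nothing performs these checks, so either condition turns into the kind of raw memory fault seen in frame #0.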

A couple of things to verify/try out:

  • Is it possible that multiple threads are accessing/mutating the flecs world? Flecs APIs are not inherently thread safe, so any mutation on the world while the REST API is attempting to read it could cause a crash
  • Are you running the application in debug mode? This could show additional information about the issue
  • Are you able to tell for which query the issue is occurring? (should be able to figure out by checking the network tab in the browser devtools). If it's a user query, can you share which one?
  • Is the issue reproducible on latest master?

If that doesn't help, are you able to share a reproducer with example code that crashes the explorer? It can't just be related to adding tags, that's something that would've crashed any app connected to the explorer.
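On the first item in the checklist (other threads touching the world), one common way to rule it out is to let worker threads only enqueue changes and apply them on the thread that runs the progress loop. The backtrace shows REST requests being dequeued inside ecs_progress, so changes applied on that same thread before progress cannot overlap with a REST read. The class below is a hypothetical helper, not a flecs API, and the names are made up for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper, not part of flecs: any thread may post a change,
// but only the thread that owns the world applies it.
class MainThreadQueue {
public:
    // Safe to call from any thread.
    void post(std::function<void()> change) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(change));
    }

    // Call from the main thread only, for example right before progressing
    // the world. Returns the number of changes that were applied.
    std::size_t drain() {
        std::vector<std::function<void()>> batch;
        {
            // Hold the lock only long enough to take ownership of the batch.
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto& change : batch) {
            change();
        }
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};
```

A worker would post a lambda that performs the add or remove, and the main loop would call drain() once per frame before progress.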

@robo-todd
Author

robo-todd commented Apr 2, 2024

Thanks for reviewing. I'll see what I can add. I can't quite get a smaller repro yet; it only happens in the large application right now.

  1. In this configuration I'm not running the application multithreaded, even for flecs threads. It is just the progress loop. There are threads in the application, but they are specifically isolated from any flecs calls or structure mutation. I'll double-check this.
  2. This happens just running the Flecs Explorer on the main tab with no query in the query interface. I can connect to the running application and things are fine: I can look around, check out entities and components, and see my component definitions. The moment I open a UI box (which is triggered by adding the 'tag' to a top-level UI entity), the app continues running until the next REST 'update' comes through. It takes about 1 second for the refresh to hit.
  3. When I've built the application in Debug I don't get much more from gdb. The build above uses the RelWithDebInfo setting in CMake. Maybe I need to read up on this: is debug mode a switch in the engine?
  4. I'll try latest master and report.

@SanderMertens
Owner

  1. That sounds like you're running in release mode. RelWithDebInfo likely passes an NDEBUG flag when compiling, which disables things like asserts. It's possible/likely that whatever is causing this will error out with an assert once asserts are enabled.

You should be able to enable debug mode with:

-DCMAKE_BUILD_TYPE=Debug
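As a small illustration of why the build type matters: with GCC, CMake's default RelWithDebInfo flags include -DNDEBUG, and the standard assert macro compiles to nothing when NDEBUG is defined. The sketch below uses only the standard library; the function names are hypothetical, and it assumes that the engine's own asserts are switched off in a comparable way, as the comment above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reports whether standard asserts are active in this translation unit.
constexpr bool asserts_enabled() {
#ifdef NDEBUG
    return false;  // e.g. Release or RelWithDebInfo with default flags
#else
    return true;   // e.g. -DCMAKE_BUILD_TYPE=Debug
#endif
}

// In a debug build, a bad index stops here with a readable message.
// With NDEBUG defined the check disappears and a bad index becomes an
// unchecked out-of-bounds read.
int checked_get(const std::vector<int>& values, std::size_t index) {
    assert(index < values.size() && "index out of range");
    return values[index];
}
```

Printing asserts_enabled() once at startup is a quick way to confirm which configuration the running binary was actually compiled with.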

@robo-todd
Author

Just making sure: do I need to rebuild flecs in Debug as well, or just my application?

Also, another tidbit of information: the application calls entity.add() with all kinds of tag structs all over the place, before the main loop starts while loading things up. This appears to cause no issues. But as soon as I add one to the UI entity I get the crash.

@SanderMertens
Owner

SanderMertens commented Apr 2, 2024

I'd build both of them in debug.

> But as soon as I add one to the UI entity I get the crash.

Right, that's what's leading me to believe it could have something to do with multithreading. If a thread is adding a tag at the same moment as the REST API is querying the world, a crash could happen.

If that's not the case, it's possible that the world was already in a bad state. Asserts should catch most of that and provide more information on what's happening.

@robo-todd
Author

Following up:

I ran everything against debug builds of both the current flecs release and mainline. This helped me trace the problem back. This is 100% my bug and not a flecs engine issue. Thanks for the advice on running with a debug build of flecs. I was really scratching my head before that.
