
Running with an executor _sometimes_ hits a bug #48

Open
liamhuber opened this issue Oct 23, 2023 · 12 comments
Labels: bug (Something isn't working), wontfix (This will not be worked on)

Comments

@liamhuber

It's quite infrequent (< 1 in 10), but sometimes this test fails in the following way:

======================================================================
FAIL: test_with_executor (test_macro.TestMacro.test_with_executor)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\pyiron_workflow\pyiron_workflow\tests\unit\test_macro.py", line 522, in test_with_executor
    self.assertIs(
AssertionError: <pyiron_workflow.macro.Macro object at 0x0000021B611DB610> is not None : Returned nodes should get the macro as their parent

----------------------------------------------------------------------

I.e. when a composite is run on an executor and copies of its children are returned as the result, these new children are somehow failing to get the parent-child relationship set up reflexively (which is the expected behaviour). I vaguely suspect it has something to do with Future.result() returning new instances each call, or some sort of delayed execution that I am wrongly assuming happens synchronously, or similar. Working with executors is pretty new to me, though, and the error is not consistently reproducible, so I haven't managed to track it down yet.
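
For reference, here is a minimal sketch (not pyiron_workflow code; all names are illustrative) of the kind of race suspected above: a Future's done-callback runs in an executor-internal thread, so result() in the main thread can return before the callback has finished mutating state.

import time
from concurrent.futures import ThreadPoolExecutor

state = {"parent_set": False}


def work():
    time.sleep(0.05)  # keep the future pending while the callback is attached
    return "payload"


def on_done(future):
    # Called in an executor-internal thread once the future finishes; nothing
    # guarantees it has completed by the time result() returns to the caller.
    time.sleep(0.1)  # stand-in for slow post-processing (e.g. re-parenting nodes)
    state["parent_set"] = True


with ThreadPoolExecutor() as pool:
    future = pool.submit(work)
    future.add_done_callback(on_done)
    returned = future.result()      # typically returns before on_done finishes
    print(state["parent_set"])      # often False here -- the race
    time.sleep(0.2)
    print(state["parent_set"])      # True once the callback has run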

liamhuber added the bug label on Oct 23, 2023
@liamhuber

Happened again today, on the Windows 3.11 CI. (I want to start noting this here in case there is some pattern to where it comes up, e.g. only on Windows.)

liamhuber added the wontfix label on Nov 2, 2023
@liamhuber

Happened again on the Windows 3.11 CI.

1/10 seems like an overestimate of how often it crops up, and so far it really seems like a Windows thing.

@liamhuber

Happened on Ubuntu 3.11 today, so I guess it's not just Windows.

@liamhuber

Happened on Ubuntu 3.11 today.

I can't remember the last time it happened on my local machine, so maybe it's exclusively an issue in the CI environment?

@liamhuber

Seems to be happening consistently today in #70, so maybe there's a chance to nail this down.

@liamhuber

Ok, I added a little sleep and then suddenly the CI passed, so maybe this is fixed after all? Could it simply be that the callback function is executing too slowly on the CI (but fast enough on my local machine), so that the test's assertions run before the state is ready for them?

returned_nodes = result.result()  # Wait for the process to finish
from time import sleep  # The "fixing" change (so far)
sleep(1)  # The "fixing" change (so far)
self.assertIsNot(
    original_one,
    returned_nodes.one,
    msg="Executing in a parallel process should be returning new instances"
)
self.assertIs(
    macro,
    macro.nodes.one.parent,
    msg="Returned nodes should get the macro as their parent"
    # Once upon a time there was some evidence that this test was failing
    # stochastically, but I just ran the whole test suite 6 times and this test
    # 8 times and it always passed fine, so maybe the issue is resolved...
)
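
If the fixed sleep turns out to be flaky or too slow, one alternative (just a sketch, not the project's code; wait_until is a hypothetical helper) would be to poll for the condition with a bounded timeout instead of a fixed delay:

import time


def wait_until(condition, timeout=5.0, interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()


# In the test, instead of sleep(1), something like:
# self.assertTrue(
#     wait_until(lambda: macro.nodes.one.parent is macro),
#     msg="Callback should eventually set the parent",
# )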

@liamhuber

Finally happened again today (Windows), but it's been over a month, so I guess the sleep call is doing the trick.

@liamhuber

Again on Windows today. Pretty rare, but horrible that it exists at all. Adding the sleep when collecting results really seems to have helped, but it's already a full second of extra waiting -- more would be a big pain. Is this going to scale horribly as the amount of data on the nodes increases? I still find this bug very disconcerting.

@liamhuber

It finally happened on Windows again! #241. It's been a while.

@liamhuber

Again today on Windows, in #773. Maybe the months-long stretch was just some kind of good luck?

@liamhuber

The interval is nice and long, but it is still happening: Windows, running on the merge to main. https://github.com/pyiron/pyiron_workflow/actions/runs/9420579496/job/25952940926

@liamhuber

Again after two months. Must be a thread safety thing?
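
If it really is a thread-safety/ordering issue, an explicit handshake might be more robust than a fixed sleep. Just a sketch, not the project's code (on_done and the Event are illustrative): the done-callback signals when its post-processing is complete, and the caller waits on that signal instead of sleeping.

import threading
from concurrent.futures import ThreadPoolExecutor

callback_finished = threading.Event()


def on_done(future):
    # ... re-establish parent-child relationships here ...
    callback_finished.set()  # signal that post-processing is complete


with ThreadPoolExecutor() as pool:
    future = pool.submit(lambda: "payload")
    future.add_done_callback(on_done)
    result = future.result()
    # Wait for the callback rather than a fixed sleep(1); returns False on timeout.
    assert callback_finished.wait(timeout=5)
    # Safe to inspect post-callback state here.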
