Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional race condition during compilation #282

Closed
rvalek opened this issue Jul 27, 2023 · 9 comments
Closed

Occasional race condition during compilation #282

rvalek opened this issue Jul 27, 2023 · 9 comments

Comments

@rvalek
Copy link
Contributor

rvalek commented Jul 27, 2023

Hi @EvgSkv!

Problem statement:

I am noticing a rather infrequent crash during compilation:

  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 457, in __init__
    rules = self.UnfoldRecursion(rules)
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 506, in UnfoldRecursion
    annotations = Annotations(rules, {})
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 138, in __init__
    self.annotations = self.ExtractAnnotations(
  File "/.venv/lib/python3.9/site-packages/logica/compiler/universe.py", line 396, in ExtractAnnotations
    ql = expr_translate.QL(Thrower(), ThrowException, ThrowException,
  File "/.venv/lib/python3.9/site-packages/logica/compiler/expr_translate.py", line 146, in __init__
    self.built_in_functions = copy.deepcopy(self.bulk_functions)
  File "/usr/local/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.9/copy.py", line 229, in _deepcopy_dict
    for key, value in x.items():
RuntimeError: dictionary changed size during iteration

(the line numbers might be a bit shifted because I caught this on a slightly outdated compiler version -- this shouldn't relevant here).

This is coming from inside the builtin copy.deepcopy, and indicates that the number of keys of the dict in self.bulk_functions was modified while it was being copied.
I am getting this in a python process that may concurrently run several compilation (LogicaProgram(parsed_logica, user_flags=flags).FormattedPredicateSql(...)) threads.

So, the guess is that one thread crashes during copy.deepcopy while another is somehow modified the same dict.


Sentiment:

I know that compiler thread-safety is nowhere guaranteed, so I would totally understand if you chose to disregard issue.

That said, I chose to post here because I am a little dumbfounded: I see this error periodically happening on this threaded service, but I can't reproduce it directly. I am also out of ideas for now as to what causes it, because going through the code I reach the following conclusions:

  1. If compilation in thread 2 is modifying the dict in self.bulk_functions that is simultaneously being copied in another compilation in thread 1, then the thread 2 is at a stage of compilation that is before or after making it's own copy.deepcopy.
  2. It can't be AFTER, because once thread 2 compilation has made it's own deepcopy, it will operate on it and will not touch the original.
  3. All operations that compiler does to the bulk_functions dict BEFORE making a deep copy must've already been done, inside the thread 1 compilation (because it has reached copy.deepcopy and crashed), and looking at the code I see that once bulk_functions is populated (prior to copying), it will not be changed.

So, I can't even put a finger or a specific place where thread 2 might be mutating bulk_functions during it's own compilation to cause a crash in thread 1's compilation at copy.deepcopy.

What am I missing? Do you have an idea where I should be looking? Does the reasoning make sense? Is the whole direction wrong?

P.S.

While analysing the above, I was trying to understand the relevance of CleanDictionary, as I see it happens sometimes, but I never actually reach deletion of any keys when I try to go step-by-step with the debugger. Either way, it shouldn't be related to the above issue because CleanDictionary operates on the deepcopy after it has been made, so it can't touch the original dict, right?

Any ideas on this subject would be appreciated!
Sorry for a tl;dr...

@EvgSkv
Copy link
Owner

EvgSkv commented Aug 7, 2023

Hi @rvalek ,
sorry, just saw the issue! I will dive in and see how complex is it to fix.
Thanks for bringing it up!

@EvgSkv
Copy link
Owner

EvgSkv commented Aug 8, 2023

Hi @rvalek ,
can you please try ti2023 branch to see if it helps with the issue?

git clone -b ti2023 https://github.com/evgskv/logica

It does look like there might be some kind of race condition in the way main is written, but it's hard to be sure.
Please let me know if the fix is works or not.

@EvgSkv
Copy link
Owner

EvgSkv commented Aug 8, 2023

PS: It should also be easy to manually patch in the change into your fork if needed: 7164c91

@rvalek
Copy link
Contributor Author

rvalek commented Aug 8, 2023

Hey @EvgSkv! Thank you for looking into it! We applied your suggested change to try it out. I'll let you know whether the error is gone by the end of the week (it was occurring pretty infrequently, 5-10 times a week).

@rvalek
Copy link
Contributor Author

rvalek commented Aug 8, 2023

@EvgSkv sorry to report that the errors are still happening. We noticed a few within an hour of applying your change :\
The issue seems to be elsewhere. I shared my meager efforts at analysis it in my original message, in case it makes sense.

@EvgSkv
Copy link
Owner

EvgSkv commented Aug 12, 2023

Hi @rvalek
it does look to me that there was a possibility of the dictionary being modified while copying, even though I can't reproduce it myself.
And as I was trying to debug it further I understood that my previous fix had a bug and wasn't really a fix at all. I've pushed a new version to the same branch (ec506cd)

Can you please try patching that and let me know if it helps?

Apologies for the bug, I am sure it won't take too many iterations to have it eliminated.
And thank you for your help!
Evgeny.

@KantorSerhiy
Copy link

Hey @EvgSkv!
Thanks for this solution, we tested for a week and it works for us :)

@EvgSkv
Copy link
Owner

EvgSkv commented Aug 27, 2023

Great to hear that @KantorSerhiy ! 😄

@rvalek
Copy link
Contributor Author

rvalek commented Oct 19, 2023

thanks again for helping out, @EvgSkv ! closing the issue

@rvalek rvalek closed this as completed Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants