Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finding a better solution instead of simply using check_call #3

Closed
ganler opened this issue Dec 20, 2021 · 11 comments
Closed

Finding a better solution instead of simply using check_call #3

ganler opened this issue Dec 20, 2021 · 11 comments
Labels
enhancement New feature or request

Comments

@ganler
Copy link
Member

ganler commented Dec 20, 2021

check_call(f'python -u -m nnsmith.graph_gen --output_path {output_path}'

It is not a good workaround to use check_call here which:

  • requires everything to be re-initialized.
  • makes it extremely harder to perform stateful fuzzing (I find it hard to merge my new implementation here).

We can discuss better solutions in this thread.

Previously my implementation makes the generation phase in-process which works well. It seems there are some issues regarding timeout and generation failure.

  • If timeout is a bug of z3 then we should avoid any usage of it;
  • In terms of generation failure, we should be able to detect it by catching exceptions;
  • At least we should use sub-process instead of check_call to allow stateful fuzzing;

We can discuss more in this thread.

@ganler ganler added the enhancement New feature or request label Dec 20, 2021
@ganler
Copy link
Member Author

ganler commented Dec 20, 2021

i am working on a fix to at least get rid of check_call by using fork and IPC. tho it might not have to be this complicated.

@ganler
Copy link
Member Author

ganler commented Dec 20, 2021

https://github.com/ise-uiuc/nnsmith/blob/main/nnsmith/graph_input_gen.py#L38

Made a commit to use real fork.

Still need to understand the reason of hangs to further improve this to a single-process mode for best efficiency and engineering convenience.

BTW, inconsistency in code is bad so we might want to stop using/migrate gen_model_and_range into the new one. @lazycal

@lazycal
Copy link
Collaborator

lazycal commented Dec 20, 2021

Though not sure what functionality you desire, one alternative I tried is to use multiprocess (you can toggle it by copy-pasting forked function from https://gist.github.com/schlamar/2311116#gistcomment-3932763 into util.py and uncomment this line

# @util.forked # call this function in a forked process to work around unknown z3 issues
). This solves your initialization problem, not sure about the stateful thing though. I forfeit this solution main because of

  1. It is more difficult to reproduce a generation failure, since it is different from directly calling graph_gen.py.
  2. Preivously I found it still hanged, so I thought the forked process crashed and the main process failed to detect it (this is indeed possible based on that forked function)
  3. Now I think the hang issue is more likely because of z3. check_call is convenient in that it has a timeout option as well. Not sure about multiprocess.

So if what you meant by fork and IPC is what I described, then the 3 problems above may be worth noting.

@ganler
Copy link
Member Author

ganler commented Dec 20, 2021

@lazycal What I mean by "stateful" is that: say coverage guided fuzzing, we need to let the generation function in the subprocess know the coverage change. But it is not quite good to pass the coverage information through command lines (i.e., check_call)

@ganler
Copy link
Member Author

ganler commented Dec 20, 2021

@lazycal Yeah. my new implementation has the exact functionality of a new process. e.g., timeout, etc.

Check here. https://github.com/ise-uiuc/nnsmith/blob/main/nnsmith/graph_input_gen.py#L66

But it becomes much easier as you can simply dispatch a subprocess to execute a python function with real forking (e.g., copy-on-write, isolation, IPC, etc.).

@ganler
Copy link
Member Author

ganler commented Dec 20, 2021

Data communication can be done using shared variables so that we don't have to ship everything into the disk which is slow and inconvenient.

Check here: https://github.com/ise-uiuc/nnsmith/blob/main/nnsmith/graph_input_gen.py#L58

@lazycal
Copy link
Collaborator

lazycal commented Dec 20, 2021

Just read your code. It looks like you are doing the same thing as the forked function I cited, but yours fixes Problem 2 && 3. Cool! If we don't worry about reproducibility then it's fine.

@ganler
Copy link
Member Author

ganler commented Dec 20, 2021

Regarding reproducibility you mean seed? I think I also adapted that but not sure if it is accurately done.

@ganler
Copy link
Member Author

ganler commented Dec 20, 2021

BTW, it is important to accurately locate why it hangs.

If it is an issue of our operators, then we might need to come up with some ideas to solve it.

We should definitely report it to z3 community if you believe it is a bug by z3.

From my last hang experience, it is due to the operators (i.e., Reshape5D).

You can try to diagnose it by printing the constraint expressions and try to write some manual and simple z3 expressions including the patterns to see if it is slow.

Last time I tried "abcd = ef*g" which is very slow that I thought it was not an issue from z3 but the expressions themselves.

@lazycal
Copy link
Collaborator

lazycal commented Dec 20, 2021

Regarding reproducibility you mean seed? I think I also adapted that but not sure if it is accurately done.

Probably not beacause of seed, but z3's problem. When I tried the fork approach, I find there are some generation failures (timeout) and when I fed the same seed into graph_gen.py they usually got passed. (Note that I can reproduce for those successful generations, so it's not seed issue). Maybe it's because the import order or something? Or maybe it's no longer an issue. Let's see.

BTW, it is important to accurately locate why it hangs.

If it is an issue of our operators, then we might need to come up with some ideas to solve it.

We should definitely report it to z3 community if you believe it is a bug by z3.

From my last hang experience, it is due to the operators (i.e., Reshape5D).

You can try to diagnose it by printing the constraint expressions and try to write some manual and simple z3 expressions including the patterns to see if it is slow.

Last time I tried "a_b_c_d = e_f*g" which is very slow that I thought it was not an issue from z3 but the expressions themselves.

I think z3 is really not stable. I found many weird z3 problems before, so in my mind hang in z3 is not surprising and a must to be handled. That's also why I am more inclined to believe it's a z3 issue and didn't prioritize locating hangs... I will probably do it next week though.

@ganler
Copy link
Member Author

ganler commented Dec 20, 2021

I understood that z3 is unstable if we use some uncommon features (z3.set_param(...)) like setting timeout or so. If the hang still exists without the timeout setting, then I really doubt if it is the fault of z3 because I used to have such kind of feeling but eventually found that the root cause is something else. It is not likely that z3 will produce wrong results and unreasonable hanging under the simplest use case (or say if we don't set any other things).

I do think this is important to fix as any of our results will not be reliable if the code logic from ours or z3 is wrong. I will also help investigate this case to see if it is a z3 issue or some lagging operators.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants