Fleshing: Vulkan::Calling vkWaitForFences Timeout #53
Comments
I tried the Amber file on my laptop, which also has Intel Mesa drivers (not sure of the driver version), and I get the same result. A vkWaitForFences timeout means that the test case has timed out. I tried this using the NVIDIA drivers on fennel (I built Amber locally; you should be able to do the same, e.g. by making a jack directory under /data) and it passes instantly there. This could already be a compiler bug. Can you look into simplifying the SPIR-V assembly to get something with the same control flow but even simpler instructions, to see whether the problem still triggers? E.g., the data on when to go left/right could all be embedded in the SPIR-V, so there is no need for corresponding Amber declarations. |
I'll have a look at simplifying this tomorrow. I tried it with the SwiftShader driver enabled on my desktop and it also passes instantly there. |
I have a few strange findings.
Note that I've reduced both of the above examples by removing input related code and hard-coding the path. The reduced program that still fails is below:
|
This is looking more and more like a Mesa bug (albeit perhaps a slightly boring one, since it depends on a block having duplicate successors which, as you say, will be disallowed in future SPIR-V versions). Would you be able to spend, say, 30 mins preparing as simple a repro for this bug as you can, and then maybe we can go through it carefully in our meeting with @johnwickerson today? Then if all looks good you can report it to Mesa (assuming it still repros on the latest release driver, if that's not what you're testing). In the repro, I think removing output code that isn't relevant to the expected output is a good thing, as long as the test still fails. Compiler bugs often have really weird effects, and changing seemingly unrelated parts of the test case often has an impact (because something bad has gone wrong). Aside: if "duplicate successors" is the telltale sign for this bug then perhaps you could write some Mesa-testing infrastructure to skip test cases that have this property. More generally, perhaps your experimental scripts should support an "ignore" script, which will run on each asm file before it is fleshed and discard the asm file if it fails some target-specific checks. This could be a nice way to get around what @johnwickerson has dubbed "The Cluedo Problem". |
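To make the "ignore" script idea concrete, here is a rough sketch of what such a check could look like. It assumes the asm file is standard SPIR-V textual assembly (as printed by a disassembler), which may not match the format the fleshing scripts actually consume; the function names `duplicate_successor_lines` and `should_ignore` are made up for illustration. It flags any `OpBranchConditional` whose true and false labels are the same, and any `OpSwitch` that names the same target label more than once. Whether the Mesa problem also triggers for the `OpSwitch` case has not been checked in this thread, so that part is a guess at what "duplicate successors" should cover.

```python
import re

# Matches SPIR-V result ids such as %12 or %LH0 in textual assembly.
_ID = re.compile(r"%[A-Za-z0-9_]+")


def duplicate_successor_lines(asm_text):
    """Return (line_number, line) pairs whose branch names one target twice."""
    hits = []
    for number, raw in enumerate(asm_text.splitlines(), start=1):
        # SPIR-V assembly comments start with ';'; drop them before parsing.
        line = raw.split(";", 1)[0].strip()
        tokens = line.split()
        if not tokens:
            continue
        ids = _ID.findall(line)
        if tokens[0] == "OpBranchConditional":
            # Operands: condition, true label, false label, optional weights.
            targets = ids[1:3]
        elif tokens[0] == "OpSwitch":
            # Operands: selector, default label, then (literal, label) pairs.
            targets = ids[1:]
        else:
            continue
        if len(targets) != len(set(targets)):
            hits.append((number, line))
    return hits


def should_ignore(asm_text):
    """True if the asm file should be discarded before fleshing."""
    return bool(duplicate_successor_lines(asm_text))
```

A wrapper script could read each asm file, call `should_ignore` on its contents, and skip fleshing when it returns true. Keeping the check as a separate function per target would let other drivers plug in their own filters.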
I've been able to reproduce the issue on the latest release and on the main branch of Mesa. As you suggested I created a modified version with a block B5 that is the target of the false case in the Here is the minimal amber file that reproduces the issue. There must be a side effect in LH0 in order for that path to be taken, so the output variable is still necessary. Is this in a suitable form to submit as a bug report?
The output of running the above is:
|
Great! Please go ahead and file to Mesa. If you look in the doc about building and fuzzing Mesa, it links to some bug reports that we filed previously. Please follow the structure of one of those for decent practice on what info to provide in the report. |
When executing some of the Amber files that result from fleshing the XML file in the s057 folder that is generated by the Vulkan CTS scraper, I repeatedly get a
Vulkan::Calling vkWaitForFences Timeout
error. I have drawn out the CFG below:
![PNG image](https://user-images.githubusercontent.com/13239506/160412119-86691e99-db49-46b1-a8bc-40d6e29c7645.png)
The error doesn't occur when following the path B4->LH1->B0. Whenever the LH1->LH0 branch is taken, it results in the error. I've generated many different Amber files containing paths that go around the loop different numbers of times, and they all reproduce the error 100% of the time.
I've checked the generated SPIR-V and it looks OK to me. Does anyone see anything wrong?
This was run on my laptop, which uses the Intel open-source Mesa driver version 88088582 (API version 1.2.182).
Has anyone encountered this error before?