VS2015 Build Errors #31
Interesting. Maybe I'm using an old fork; actually, I am using an old fork. What happens if you replace the clBLAS referenced by DeepCL with a newer fork? I think to do that you would do something like:
... and then rebuild again. If that works, please submit a pull request, or at least the commit hash that you tested against and found working, and I will take a look. Edit: hmmm, the error is appearing in the header files. It's possible I'm calling something in an unsupported way. Don't suppose... can you provide the full build output, please? (Edit 2: hmmm, having looked at the header file, it seems unlikely that it's to do with how I'm calling it (though it's possible), so I reckon the first thing to do is try using a newer fork, as per the start of this reply.) Edit 3: Per https://github.com/clMathLibraries/clBLAS/releases/tag/v2.8 , v2.8 supports VS2015. That strongly implies to my mind that pre-2.8 does not support VS2015. Please confirm, using the test outlined at the start of this reply, and then I'll update DeepCL to point to clBLAS 2.8.
Changing the clBLAS library to the latest version does solve a lot of problems (alignment and a couple of others), but it still doesn't build successfully. Full (new) output log here. Building only the clBLAS-external project, I get an error:
I'll continue by checking whether the OpenCL libraries are configured incorrectly, and carry on with my "investigation" from there.
I created a branch that uses clBLAS 2.8.0. It builds OK for me, but doesn't run on my NVIDIA GPU (I get clMathLibraries/clBLAS#153); it might work for you, though? Branch is
For the sprintf build warnings, pzawal found a solution in #30 (comment).
For the clEnqueueBarrierWithWaitList error, the
I was a minute or two faster; practically the same time. I downloaded your clblas-2.8.0 branch and it compiled without any problems :). Thanks!
It works with my older AMD GPU, which doesn't support OpenCL 2.0; OpenCL 2.0 seems to be behind the problem I get on my notebook. Thank you very much for your help! Based on my experimenting today, it's really awesome. Keep up the good work!
Ok. What do you mean by 'doesn't support OpenCL 2.0'? Do you mean that if OpenCL 2.0 is available, then something in DeepCL stops working?
I believe everything is OK with DeepCL itself; I was referring to the clMathLibraries/clBLAS#153 error that you and I both received (though with a different message). It doesn't appear on my older AMD GPU, but it does on a newer one with OpenCL 2.0. So I'd say the problem is purely clBLAS-specific.
Hmmm, ok. Arguably, it's still DeepCL's responsibility to fix it somehow, e.g. by not using clBLAS, or by fixing it in clBLAS, but since it's a new release of clBLAS, I reckon it might be worth waiting a few weeks, in case someone fixes it for us :-) (Note that I'm wondering whether my own segfaults on NVIDIA might be because clBLAS really is using clEnqueueBarrierWithWaitList, which just points into the void on my card, hence the segfault. I might see if I can add a guard to clew for that, which would ideally at least throw a useful exception instead.)
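A guard of the kind described above could look roughly like the following minimal sketch. It assumes that a runtime loader such as clew leaves an unresolved entry point as a null function pointer; the names `EnqueueBarrierFn`, `pEnqueueBarrierWithWaitList` and `guardedEnqueueBarrier` are hypothetical, and the signature is deliberately simplified so the sketch compiles without OpenCL headers.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified signature: the real clEnqueueBarrierWithWaitList takes
// (cl_command_queue, cl_uint, const cl_event*, cl_event*); this sketch avoids OpenCL headers.
typedef int (*EnqueueBarrierFn)(void* queue);

// Hypothetical slot that a runtime loader would fill in.
// Assumption: it stays nullptr when the driver does not export the entry point.
EnqueueBarrierFn pEnqueueBarrierWithWaitList = nullptr;

// Check the pointer first, so a missing entry point becomes a descriptive
// exception instead of a call through a null pointer (i.e. a segfault).
int guardedEnqueueBarrier(void* queue) {
    if (pEnqueueBarrierWithWaitList == nullptr) {
        throw std::runtime_error(
            "clEnqueueBarrierWithWaitList is not available from this OpenCL driver "
            "(it was introduced in OpenCL 1.2)");
    }
    return pEnqueueBarrierWithWaitList(queue);
}
```

The design choice is simply to fail early and loudly: the caller gets an exception naming the missing function, rather than a crash with no indication of which OpenCL call was responsible.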
Hi jakakonda, I've now fixed both of the errors I was getting with clBLAS, and updated DeepCL to point to those fixes. It's still a bit leaky (very leaky, actually :-( ), but at least there are no obvious failures or segfaults. Do you want to pull down the latest clblas-2.8.0 branch of DeepCL and see what happens?
Wow, that was fast! I was just getting ready to wait for a week or two :).
Ok :-) (Edit: make sure to monitor your memory usage carefully if you run the unit tests; I can only run about half of them before I have to Ctrl-C out :-P )
Hi. Quick update: commit d93a41a on the clblas-2.8.0 branch of DeepCL fixes the memory leaks, and now the tests run to completion for me:
Hmmm, but the error you posted earlier, about the failure to build the program, is quite different from the errors I've been seeing... nothing I've changed will have addressed that... (edit: well, it might have, if it was caused by the initializer list... I guess you can try, and if it persists you will probably need to log the issue with clBLAS, along with as many log files and so on as you can find)
After a couple of hiccups while building (purely my own stupidity), it builds fine, and all tests pass until:
Output for the unit tests. The tests ran on an Intel HD 4000, but the same assertion occurs while running MNIST training on AMD.
Hi jakakonda, I accidentally replied to the gist rather than to the issue here :-P Can you do the following, please:
then rebuild and rerun.
(Edit: hopefully the sgemm sample will fail for you with the same error about build options, in which case it's easy to raise an issue in the clBLAS project issue tracker)
I suspect the problem originates from the different OpenCL version, which would also explain why it works on an older GPU.
That's interesting. Hmmm, so what do you see if, from the DeepCL
There is a setting 'OPENCL_VERSION'. What does it say? What happens if you change it to read '1.2', press 'c' to configure and 'g' to generate (at which point ccmake will exit), and then rebuild?
OPENCL_VERSION:STRING is set to 1.2, which is strange, as the kernel is compiled with 2.0 (based on the build options output). Now I'm compiling clBLAS with OpenCL 2.0 to see what happens. Edit: the same error occurs with DeepCL (assertion... xgemm.cc line 163). So I'd say there are two bugs:
Hmmm, that's... odd :-P There's nothing in DeepCL that knowingly requests OpenCL 2.0. In fact, my own GPUs have been OpenCL 1.1 for a very long time (until about 3 days ago, when I upgraded the drivers of one of them to the lofty heights of 1.2 :-P ). But I have a few plausible ideas about what could be going on, and will have a dig through the code.
Hi jakakonda, in your DeepCL directory, can you open the file
Sure, I can:
Everywhere, the string value is
I also searched the entire DeepCL directory for the string "cl2.0" and got:
clBLAS is definitely set to 1.2:
The only place where the values above could become 2.0 (based on my findings) is at
... but it is preceded by a pretty obvious "if", which makes everything odd...
Hmmm... that's kind of odd. So, let's work up the chain. I'll do it here, as I'm writing:
I reckon the rebuild somehow hasn't fully recompiled AutoGemmBuildOptionsSource. One thing you might consider is either using gdb to examine the stack frame at each point in this chain, or else putting printfs at strategic points along it.
Just an intermediate finding about:
I added the following line to line number 424 in xgemm.cc:
It fired 88 times, always with value
Last few lines of the unit test output:
"Find usages" also gave me a few additional results (special cases):
I'm currently working on a quick command-line debugging tutorial, or on how to reach the crash point from VS to get the call stack (I imagine everything will get much easier then). EDIT1: Successfully hooked up the VS debugger. EDIT2: -cl-std=2.0 is present at GemmSpecialCases.cpp line 670; working further up the chain. EDIT3: It gets really easy from there on, as the values are hardcoded:
Is this a clBLAS bug?
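For comparison, here is a minimal sketch of deriving the language-version option from the device instead of hardcoding 2.0. It assumes the input is the string that `clGetDeviceInfo` returns for `CL_DEVICE_OPENCL_C_VERSION`, which has the form "OpenCL C <major>.<minor> <vendor info>"; the query itself is omitted so the sketch compiles without OpenCL headers, and `clStdOptionFor` is a hypothetical helper, not a clBLAS function. Note that the option is spelled `-cl-std=CL2.0` (with the `CL` prefix) in the OpenCL build-option syntax.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the -cl-std option from the device's OpenCL C version string rather than
// hardcoding 2.0. Expected input: "OpenCL C <major>.<minor> <vendor info>".
std::string clStdOptionFor(const std::string& openclCVersion) {
    int major = 0;
    int minor = 0;
    // sscanf returns how many %d fields it parsed; anything other than 2 means an unknown format.
    if (std::sscanf(openclCVersion.c_str(), "OpenCL C %d.%d", &major, &minor) != 2) {
        return "";  // unknown format: pass no -cl-std option and let the compiler use its default
    }
    if (major < 1 || (major == 1 && minor < 1)) {
        return "";  // this sketch emits no -cl-std value for OpenCL C 1.0
    }
    return "-cl-std=CL" + std::to_string(major) + "." + std::to_string(minor);
}
```

With this approach a 1.2-only device such as the Intel HD 4000 would get `-cl-std=CL1.2`, and only devices that actually report OpenCL C 2.0 would get `-cl-std=CL2.0`.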
Got the full call stack! (There was a slight problem with the project CMake generated, so initially I couldn't get the program database for DeepCL.dll to load, and some frames were missing. I fixed it by setting DeepCL project properties -> Linker -> Debugging -> Generate Program Database File to inherit.) I added the call stack with parameter values to the gist for better readability, plus a slightly cleaner version of the call stack.
EDIT1: Changed lines 11 to 13 of UserGemmKernelSourceIncludes.h to version 1.2, just to see what happens. The problematic test passed, but after that, nothing pretty...
Nice!
Error -38? That sounds familiar. That was the error I was getting when it was reusing an old kernel from a previous OpenCL context, because the previous one hadn't been cleared down during clblasTeardown. It seems plausible that the cause could be similar this time too.
Hi, it looks like if we define (Edit: e.g. you could add to the top of clBLAS.h:
)
Hmmm, I think it would be good to find out the following, at line 669 of GemmSpecialCases.cpp:
I usually need a bit of hacking around to print addresses, because the compiler tries to make sure we really do intend to do that, but maybe something like:
... but I remember that in VS, Edit: basically, what I reckon is that one of these values is non-zero, and is not being wiped correctly by clblasTeardown. I suspect it is the binary field, since I already added wiping for the kernel field, i.e. clMathLibraries/clBLAS#163
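As a general illustration of printing such addresses portably, here is a minimal sketch. The struct `CachedKernelSlot` and the helper `reportNonNull` are hypothetical stand-ins for the kernel and binary fields discussed above, not the actual clBLAS declarations. The key point is that `%p` expects a `void*`, so other pointer types need an explicit cast.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the cached fields discussed above (a kernel handle and
// a program binary); the real clBLAS declarations are not reproduced here.
struct CachedKernelSlot {
    const char* name;
    void* kernel;           // would be a cl_kernel in real code
    unsigned char* binary;  // cached program binary, or nullptr
};

// Print both addresses and return how many fields are still non-null.
// %p expects a void*, hence the explicit cast on the binary pointer.
int reportNonNull(const CachedKernelSlot& slot) {
    std::printf("%s: kernel=%p binary=%p\n", slot.name, slot.kernel,
                static_cast<void*>(slot.binary));
    int nonNull = 0;
    if (slot.kernel != nullptr) ++nonNull;
    if (slot.binary != nullptr) ++nonNull;
    return nonNull;
}
```

Calling something like this just before teardown and again at the start of the next test would show whether a field survives across contexts.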
Oh: sgemm_Col_TN_B1_MX032_NX032_KX16_BRANCH_clKernel is not in AutoGemmClKernels.cpp. It probably needs to be wiped too, somehow. Edit: looks like it is in
I'll try to get all of the logs and outputs this evening (I'm GMT+1), or tomorrow afternoon at the latest. I'm at the faculty all day (student here).
Hi, I've updated clBLAS to wipe the AutoGemm user kernels during teardown. I don't have solid evidence that this is the cause of your error, but it seems likely. Can you pull down the latest DeepCL and rerun
Hi, I was redirected here from clBLAS issue #169. Browsing through your conversation, I have a couple of comments.
Updated everything (re-downloaded, reconfigured and rebuilt the entire solution, just to be sure), and kept lines 11 and 13 of UserGemmKernelSourceIncludes.h at version 1.2.
Added.
VS (surprisingly) did not complain.
If you're making educated guesses and fixing things (basically working blind) and need access to a machine with specific hardware, I'm sure we can work something out...
Hi jakakonda, Hmmm, ok. Don't suppose you can provide the output of the following, please, just to check that you really do have the latest clBLAS version? Starting from the
On my build, this gives:
Yes, I really lack an AMD card to test on... Edit: oh, I sent you Linux commands :-P Ok, let me rethink that.
I can give you the output from the first two commands...
...but you slightly lost me with nm. I haven't switched from VS yet (I can do that tomorrow); instead I tried dumpbin.exe /ALL, but couldn't find initUserGemmClKernels, so I hooked up the debugger to make sure it's executed (it is) and calculated SHA1 hashes of both files to make sure they are the same (they are identical; I assume that was the point behind those two lines).
Added the call stack for the current error. While running the unit tests, Based on the values and the error (
(gah, Ctrl-C causes everything to be deleted, starting over again :-( ) Ok, will take a look through this. In the meantime, here is what is happening conceptually. I already typed this once and accidentally lost it, which is a bit annoying :-P Anyway, for each of the unit tests, what happens is:
That's how it works without clBLAS. Note that the kernel is bound to a context: if you create a new context, you'll need to compile the kernel again. Now, let's throw clBLAS into the mix. This means we need to add a clblasSetup and a clblasTeardown, and let's add a call to xgemm:
Ok, so far so good; that will all run OK. But what if we call xgemm twice within the same test? Well, it will reuse the existing kernel, and that's OK, because it is the same context. But when we run multiple tests one after the other, each test runs in its own OpenCL context. Therefore, it's vital that the kernel variable in clBLAS is set back to 0; otherwise it will try to use a kernel object created for a different context, and we'll get error -38. So the test should call clblasTeardown at the end, and clblasTeardown should release the kernel (ideally) and set the kernel variable to 0 (essential). So there are at least a couple of reasons why we might get error -38:
In fact, it's possible that both of these reasons are occurring... Anyway, this is how I see the issue conceptually.
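The lifecycle described above can be modeled in a few lines. This is a toy sketch that uses no OpenCL at all: `Context`, `Kernel`, `getKernel` and `teardown` are hypothetical names. Using a kernel from another context is simulated with an exception rather than a real -38 error code, and the cached pointer plays the role of the static kernel variable inside clBLAS.

```cpp
#include <cassert>
#include <stdexcept>

// Toy model: a "kernel" remembers which context created it.
struct Context { int id; };
struct Kernel { int contextId; };

// Library-level cache, analogous to a static kernel variable inside the library.
static Kernel* g_cachedKernel = nullptr;
static int g_compileCount = 0;

// Reuse the cached kernel if there is one; otherwise "compile" it for this context.
// Using a kernel created for another context is reported as an error.
Kernel* getKernel(const Context& ctx) {
    if (g_cachedKernel == nullptr) {
        g_cachedKernel = new Kernel{ctx.id};
        ++g_compileCount;
    }
    if (g_cachedKernel->contextId != ctx.id) {
        throw std::runtime_error("kernel belongs to a different context");
    }
    return g_cachedKernel;
}

// Teardown releases the kernel (ideally) and resets the pointer to null (essential).
void teardown() {
    delete g_cachedKernel;
    g_cachedKernel = nullptr;
}
```

Two calls within one context compile once; after `teardown()` a new context triggers a fresh compile; skipping `teardown()` before switching contexts hits the stale kernel, which is the situation suspected behind error -38.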
Hi Timmy, Thanks! A couple of questions:
jakakonda, I'm going to see if I can create a test case that calls UserKernels, so I can check whether the kernels are being cleaned up during teardown.
Hi jakakonda and Timmy, I logged an issue for the 2.0 hardcoding at clMathLibraries/clBLAS#172, including a sample test case, which reproduces the failure on my machine too.
Hi jakakonda, I've reproduced the bug you see above on my machine (see the test case at clMathLibraries/clBLAS#169 (comment)), fixed it in patches to clBLAS, hughperkins/clBLAS@79ea756 and hughperkins/clBLAS@7d708e4, and pushed this to the clblas-2.8.0 branch of DeepCL. Don't suppose... can you pull down the latest version of the DeepCL clblas-2.8.0 branch and run
Hi, sorry for the late response.
And in case there is anything interesting in it, here is the full output for those failed tests.
Cool :-) I think this is really closed this time :-) Of the two tests that failed, one is just too stochastic (I should probably make it less so), and the other requires the NORB dataset, so they don't worry me. Cool, looks good :-)
The discussion quickly deviated from the primary problem (VS2015 build problems), but everything got solved eventually :-). Thanks a lot!
:-D I put your name in the 'recent changes' on the front page, by the way, for helping with getting this working (*). Thank you for all your help. Very much appreciated :-) (*) If you don't want your name there, feel free to let me know, and I can remove it. Up to you :-)
Hi!
I'm trying to build the DeepCL libraries from scratch with Visual Studio 2015 (MSVC 14.0 compiler). I know that it's not officially supported, but I love the way this library is written (and that it supports OpenCL), so I decided to give it a shot.
The error below is shown ~4500 times, on a couple of different lines, and I have no idea where to start dealing with it.
I also tried building clBLAS standalone, cloned from the clMathLibraries/clBLAS git repository, and it builds without any problems.
The same problem occurs with both 32-bit and 64-bit compiler settings; tested under Windows 10 x64.