SmallptGPU regression #44

Closed
kraiskil opened this issue Dec 17, 2013 · 11 comments

@kraiskil
Member

This example program http://davibu.interfree.it/opencl/smallptgpu/smallptGPU.html works with current pocl when built against LLVM 3.3 in scripts mode. It does not work with LLVM 3.3 in API mode or with LLVM 3.4.
The example needs this patch to compile: https://gist.github.com/kraiskil/8004743

The ocl-toys, which seem to be newer despite carrying a lower version number, are available here: http://code.google.com/p/ocltoys/

@pjaaskel
Member

And LLVM 3.4-scripts doesn't work either?

@kraiskil
Member Author

No. 3.4 hits an assert in LLVM, and 3.3-api possibly does too (I'm not sure I have a debug build of it). Only 3.3-scripts looks correct.

@kraiskil
Member Author

At least one kind of failure is caused by 49af9a9 (which added -DNDEBUG) when LLVM is built with debug symbols. I guess we need to check at ./configure time whether LLVM was built with or without NDEBUG, and set CXXFLAGS accordingly.

@pjaaskel
Member

I checked out SmallPTGPU v1.0 from code.google.com and it seems to work fine (LLVM 3.4 + LLVMAPI), at something like 1.8M samples/sec on this Intel 4-core CPU. How exactly should it break? The version you used must be different, as I could not find the file to patch. Are you sure this is not a bug in the app itself that causes undefined behavior?

@kraiskil
Member Author

With 167840e, which fixes 49af9a9, I no longer see the asserts on any build.

@fabiand
Contributor

fabiand commented Jan 1, 2014

Sorry, the latest master doesn't fix this issue for me. I'm seeing:

(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:171
#1  0x00007ffff7290da8 in POclSetKernelArg (kernel=0xffffffffe80008e0, arg_index=3975150316, arg_size=4, arg_value=0x7fffeceffaec)
    at clSetKernelArg.c:72
#2  0x0000003eb620aa4f in clSetKernelArg () from /lib64/libOpenCL.so.1
#3  0x000000000045b395 in SmallPTGPU::RenderThreadImpl(SmallPTGPU*, unsigned int) ()
#4  0x0000000000495be4 in thread_proxy ()
#5  0x0000003eb6e07f33 in start_thread (arg=0x7fffecf00700) at pthread_create.c:309
#6  0x0000003eb66f4ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) 

@fabiand
Contributor

fabiand commented Jan 1, 2014

This is on the following CPU:

$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 20
model       : 1
model name  : AMD E-350 Processor
stepping    : 0
microcode   : 0x5000029
cpu MHz     : 800.000
cache size  : 512 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 6
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips    : 3193.41
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
...

@kraiskil kraiskil reopened this Jan 1, 2014
@pjaaskel
Member

pjaaskel commented Jan 2, 2014

OK, I tracked this down on fabiand's box. It is somehow related to the lack of aligned_malloc on that system. The compiler complains about it during compilation, but the build still succeeds. After I replaced those calls with plain malloc, it works.

What I do not yet understand is which function ends up being called for aligned_malloc() when the function is not found (there is no linkage error). The result of the call is an illegal pointer, which causes the crash when it tries to copy the argument data to the parameter buffer. The proper fix is to find or implement a replacement for aligned_malloc that also works correctly on this Fedora box.

@pjaaskel
Member

pjaaskel commented Jan 2, 2014

Actually it was the other way around: the system has aligned_alloc(), and that gets used for aligned allocation instead of posix_memalign, which works. If I force posix_memalign, it works (I added these lines to pocl_util.h):

+#undef HAVE_ALIGNED_ALLOC
+#define HAVE_POSIX_MEMALIGN
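
To illustrate what forcing the posix_memalign path means, here is a minimal sketch in C. The helper name `sketch_aligned_malloc` is hypothetical and is not pocl's actual function; only the two macro names come from the lines above. It assumes a POSIX system where `posix_memalign` is available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* ask the C library to declare posix_memalign */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same effect as the two lines added to pocl_util.h in the comment above. */
#undef HAVE_ALIGNED_ALLOC
#define HAVE_POSIX_MEMALIGN

/* Hypothetical helper for illustration only; not pocl's real implementation. */
static void *sketch_aligned_malloc(size_t alignment, size_t size)
{
#if defined(HAVE_POSIX_MEMALIGN)
    void *ptr = NULL;

    /* The alignment must be a non-zero power of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* posix_memalign also requires a multiple of sizeof(void *). */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);

    /* Unlike aligned_alloc, size does not need to be a multiple of alignment. */
    if (posix_memalign(&ptr, alignment, size) != 0)
        return NULL;
    return ptr;
#else
#error "this sketch only covers the posix_memalign path"
#endif
}
```

A call such as `sketch_aligned_malloc(64, 4)` asks for a 4-byte block on a 64-byte boundary; the returned pointer is released with plain `free()`. The value 64 is an assumed alignment chosen for the example, and 4 matches the `arg_size` seen in the backtrace.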

@pjaaskel
Member

pjaaskel commented Jan 2, 2014

"The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment." (http://en.cppreference.com/w/c/memory/aligned_alloc) I think the alignment can be larger than the size in our case, so the call might not work. I think we can fix this by simply removing the option to use aligned_alloc. If there is no posix_memalign, we already have the custom fallback function. Testing this.
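
The restriction quoted above can be made concrete with a small sketch in C. Both helper names are hypothetical and do not exist in pocl, and the alignment of 64 used in the note below is an assumed value. The rounding helper is shown only to illustrate what a conforming size would look like; the fix proposed in this thread is to stop using aligned_alloc, not to round sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when size is a multiple of alignment,
   which is the restriction quoted above for aligned_alloc(). */
static int sketch_size_ok_for_aligned_alloc(size_t size, size_t alignment)
{
    return alignment != 0 && size % alignment == 0;
}

/* Hypothetical helper: rounds size up to the next multiple of alignment.
   The alignment is assumed to be a non-zero power of two. */
static size_t sketch_round_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

With a 4-byte argument (the `arg_size` in the backtrace) and an assumed 64-byte alignment, `sketch_size_ok_for_aligned_alloc(4, 64)` is false, while `sketch_round_up(4, 64)` yields 64, which does satisfy the restriction.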

@fabiand
Contributor

fabiand commented Jan 2, 2014

Yes Pekka (@pjaaskel), that was it! It now runs smoothly :)
