
NOP instructions in Matrix Multiplication #5

Closed
albertorodes opened this issue Jun 1, 2021 · 3 comments

Comments

@albertorodes

albertorodes commented Jun 1, 2021

Hello!

I am running a series of tests to compare the reliability of different versions of the matrix multiplication kernel. The kernels I am using have a parameter that sets the thread block size. With this parameter set to 32x32 I had no problems or unexpected results. However, when I changed it to 16x16 or 8x8, I started getting results like this:

inspecting: voidmatrixMulCUDA<8>(float*,float*,float*,int,int)
num_static_instrs: 90
maxregs: 30(30)
Injection data
index: 0
kernel_name: voidmatrixMulCUDA<8>(float*,float*,float*,int,int)
ctas: 256
instrs: 10452992
grp 0: 0 grp 1: 2097152 grp 2: 3145728 grp 3: 278528 grp 4: 1671168 grp 5: 3260416 grp 6: 8781824 grp 7: 8503296
mask: 0x0
beforeVal: 0x0;afterVal: 0x0
regNo: -1
opcode: NOP
pcOffset: 0x0
tid: -1
Error not injected

I checked the injection file in the logs and found lines like this one in all the injections that failed:
1;voidmatrixMulCUDA<8>(float*,float*,float*,int,int);0;28898422;0.947758577437;0.204871567272:0x0:NOP: -1:0x0:15.610934:19::value_before0x0:value_after0x0

As I said, these injections on NOP instructions never happened with the 32x32 thread block size, but with the other values they happen almost 80% of the time.
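To measure a rate like the 80% above over many runs, the saved outputs can be tallied automatically. The following is a minimal Python sketch, assuming each injection run's output was saved as text with the same layout as the report above, where a failed run prints `Error not injected` and an `opcode:` line. The names `parse_injection_data`, `summarize` and `SAMPLE_FAILED` are hypothetical helpers, not part of the fault-injection tool.

```python
import re
from collections import Counter
from typing import Dict, List

# Excerpt of the per-injection output shown in the report above
# (hypothetical test fixture; some lines omitted for brevity).
SAMPLE_FAILED = """\
Injection data
index: 0
kernel_name: voidmatrixMulCUDA<8>(float*,float*,float*,int,int)
ctas: 256
instrs: 10452992
mask: 0x0
regNo: -1
opcode: NOP
pcOffset: 0x0
tid: -1
Error not injected
"""


def parse_injection_data(log: str) -> Dict[str, str]:
    """Collect simple 'key: value' lines such as 'opcode: NOP'.

    Lines without a single-word key followed by ':' (e.g. the
    'grp 0: ...' counters or 'Error not injected') are skipped.
    """
    fields: Dict[str, str] = {}
    for line in log.splitlines():
        m = re.match(r"^(\w+):\s*(.*)$", line.strip())
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def summarize(logs: List[str]) -> Dict[str, object]:
    """Fraction of runs reporting 'Error not injected', plus their opcodes."""
    failed = [log for log in logs if "Error not injected" in log]
    opcodes = Counter(parse_injection_data(log).get("opcode", "?") for log in failed)
    rate = len(failed) / len(logs) if logs else 0.0
    return {"not_injected_rate": rate, "failed_opcodes": dict(opcodes)}


if __name__ == "__main__":
    print(summarize([SAMPLE_FAILED]))
```

Feeding it real results would need a small loader (for example reading each saved output with `pathlib.Path.read_text`), which depends on how the logs are stored.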

Thank you in advance!

@sivahari
Contributor

sivahari commented Jun 1, 2021 via email

@albertorodes
Author

I checked, and yes, the profiler is running and produces different results depending on the parameters. However, the block sizes that produce these errors do have a much higher instruction count than the ones that don't.
To be specific, with a 32x32 block size the profiler counts 263168 instructions (no problems), and with an 8x8 block size it counts 1052672 (about 80% of injections end in "Error not injected").
It could be something in the implementation, but the difference in instruction count seems too large.
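A quick check of the two profiler counts quoted above: the 8x8 count is exactly four times the 32x32 count, so it grows with the ratio of block edge lengths rather than with the 16x ratio of threads per block. This is only arithmetic on the reported numbers; by itself it does not show whether the growth comes from the kernel's loop structure or from the profiler.

$$
\frac{1\,052\,672}{263\,168} = 4 = \frac{32}{8},
\qquad
\frac{32 \times 32}{8 \times 8} = 16
$$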

@sergicuen

The issue was solved using the workaround described in: Error not injected when threads/block different to 1024 #7
