
Unnecessary integer moves #245

Closed
christiealappatt opened this issue Apr 15, 2020 · 4 comments

@christiealappatt

For some stencils, the YASK compiler generates unnecessary integer move operations. They are caused by the construction of unaligned vectors, even though that construction is not required in many cases.

As an example, see the attached RHS_LC stencil. When compiled with fold 'x=1,y=1,z=4' and radius 4 for arch=hsw, it has many unaligned vector constructions inside 'calc_loop_of_clusters', but most of them are not necessary. (I mean the code after the comment "Construct unaligned vector starting at ..." in the generated file yask_stencil_code.hpp.)

These unnecessary moves produce unwanted 'movq' instructions in AVX/AVX2 code, which hurts performance significantly for this kernel. For example, on an Intel Haswell CPU (E5-2695), avoiding them gave a 1.8x speedup on one socket.

yask_movq_issue.zip
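To illustrate why these constructions are avoidable, here is a minimal scalar sketch, assuming 4 lanes along the folded z dimension (fold z=4 with doubles on AVX2). It is not YASK's generated code: `construct_unaligned`, `needs_unaligned_construction`, and `VLEN` are hypothetical names. An unaligned vector has to be assembled from two neighboring aligned vectors (on real hardware, via permute/blend instructions), but when the stencil offset along the folded dimension is a multiple of the vector length, the aligned vector can be used directly and no construction is needed.

```cpp
#include <array>
#include <cassert>

// Assumption: 4 lanes along the folded z dimension (e.g., fold z=4, doubles, AVX2).
constexpr int VLEN = 4;
using Vec = std::array<double, VLEN>;

// Scalar emulation of building the vector that starts at element `shift`
// of the aligned vector `lo`, with the remaining lanes taken from the next
// aligned vector `hi`. Real code would use permute/blend instructions.
Vec construct_unaligned(const Vec& lo, const Vec& hi, int shift) {
    assert(shift >= 0 && shift < VLEN);
    Vec out{};
    for (int i = 0; i < VLEN; ++i) {
        int j = i + shift;
        out[i] = (j < VLEN) ? lo[j] : hi[j - VLEN];
    }
    return out;
}

// True only when the offset (in elements along the folded dimension,
// relative to an aligned base) is not a multiple of the vector length,
// i.e., when an unaligned vector really has to be constructed.
bool needs_unaligned_construction(int offset) {
    // Normalize to a non-negative remainder so offsets like z-3 work.
    int r = ((offset % VLEN) + VLEN) % VLEN;
    return r != 0;
}
```

With this check, offsets such as 0, 4, or -4 map straight onto an aligned vector, while offsets such as 1 or -3 still need the two-vector construction.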

@chuckyount
Contributor

Thanks for this report. I will investigate.

@chuckyount
Contributor

I reproduced the issue. Looks like a bug all right. Debugging.

@chuckyount chuckyount added the bug label Apr 15, 2020
@chuckyount
Contributor

chuckyount commented Apr 16, 2020

The issue turned out to be unexpected expressions in the stencil indices ("t + 0", for example). These are now recognized as equivalent to the simpler form ("t" in that example) and handled properly. I pushed a change to the "develop" branch if you want to test it before I merge it into "master", probably sometime tomorrow, assuming all regression tests pass. Performance of the test case you provided increased 1.86x on my system, consistent with your measurement.
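The fix described above amounts to normalizing index expressions before deciding how to load a vector. A minimal sketch of that idea (C++17), assuming a hypothetical tiny expression tree rather than YASK's actual compiler classes: `Expr`, `Index`, `var`, `lit`, `add`, and `normalize` are illustrative names only. Each index is reduced to a "variable + constant offset" pair, so "t + 0" yields the same pair as "t" and the code generator sees a zero offset instead of an unrecognized expression.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical index-expression node (illustration only, not YASK's API).
struct Expr {
    enum class Kind { Var, Const, Add };
    Kind kind;
    std::string name;                // set when kind == Var
    int value = 0;                   // set when kind == Const
    std::shared_ptr<const Expr> lhs; // set when kind == Add
    std::shared_ptr<const Expr> rhs;
};
using ExprPtr = std::shared_ptr<const Expr>;

ExprPtr var(std::string n) {
    Expr e{Expr::Kind::Var, std::move(n)};
    return std::make_shared<Expr>(std::move(e));
}
ExprPtr lit(int v) {
    Expr e{Expr::Kind::Const, "", v};
    return std::make_shared<Expr>(std::move(e));
}
ExprPtr add(ExprPtr a, ExprPtr b) {
    Expr e{Expr::Kind::Add, "", 0, std::move(a), std::move(b)};
    return std::make_shared<Expr>(std::move(e));
}

// An index reduced to "var + offset"; var is empty for a pure constant.
struct Index {
    std::string var;
    int offset = 0;
};

// Folds constants so that "t + 0", "0 + t", and "t + 1 + -1" all become {t, 0}.
// Returns nullopt for forms that are not a single variable plus a constant.
std::optional<Index> normalize(const ExprPtr& e) {
    switch (e->kind) {
    case Expr::Kind::Var:
        return Index{e->name, 0};
    case Expr::Kind::Const:
        return Index{"", e->value};
    case Expr::Kind::Add: {
        auto a = normalize(e->lhs);
        auto b = normalize(e->rhs);
        if (!a || !b) return std::nullopt;
        // "t + z" is not a simple offset from one variable.
        if (!a->var.empty() && !b->var.empty()) return std::nullopt;
        return Index{a->var.empty() ? b->var : a->var, a->offset + b->offset};
    }
    }
    return std::nullopt;
}
```

Comparing normalized pairs, rather than the raw expression text, is what lets an index like "t + 0" take the same aligned path as "t".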

@christiealappatt
Author

The "t+0" occurs because the stencil itself is produced by a different tool. Thank you for the hotfix; it works.
