Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance is generally weak for short inputs #11

Open
eyalroz opened this issue Apr 20, 2017 · 0 comments
Open

Performance is generally weak for short inputs #11

eyalroz opened this issue Apr 20, 2017 · 0 comments

Comments

@eyalroz
Copy link
Owner

eyalroz commented Apr 20, 2017

There's currently almost no attempt to optimize performance for short inputs - e.g. tweaking the serialization factor or even the core algorithm to allow for better parallelizing cases in which we can fill entire blocks the way we would like to.

This would require scrutiny of the kernels, and no less importantly the launch configuration resolution code (both the general mechanism and the kernel-specific parts).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant