Question about supporting CPUs and accelerators #125

Closed
romix opened this issue May 21, 2018 · 0 comments

PlaidML looks very interesting. It is impressive that it can quickly produce very efficient kernels without requiring manual scheduling descriptions, as TVM or Halide do!

But I've got a couple of questions about supporting CPUs and accelerators:

  • The internal representation and the target model in PlaidML seem rather OpenCL-like. It is not clear how well this approach supports CPUs. For example, is the generated code for CPUs as good as for GPUs? Are there any CPU benchmarks comparing it with, e.g., TF, Caffe2, TVM, or Halide?

  • Can PlaidML support more complex targets such as custom accelerators, which often have more levels of memory hierarchy than OpenCL devices, dedicated multipliers for tensors of specific sizes, support for quantized tensors, etc.?

    • How would one model memory buffers at different levels of the memory hierarchy?
    • How would one model explicit transfers between host memory and target memory, or between different levels of the target's memory hierarchy?
    • Are there any plans for improvements in this area? Is PlaidML/Tile expressive enough to support such targets? Which parts of the PlaidML codebase would need to be extended to support them? Would that require conceptual changes (e.g. new abstractions for memory hierarchy levels) or just a new backend? How much would the increased target complexity affect the current PlaidML code for generating efficient kernels?
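
To make the memory-hierarchy questions more concrete, here is a minimal Python sketch of the kind of target description such an accelerator might need. Every name in it (`MemoryLevel`, `TargetModel`, `transfer_path`, `fits`, and the example sizes) is hypothetical and chosen only for illustration; none of it is PlaidML or Tile API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class MemoryLevel:
    """One level of a target's memory hierarchy (hypothetical model)."""
    name: str
    size_bytes: int
    explicit_transfers: bool  # True if software must issue copies into this level


@dataclass
class TargetModel:
    """Hypothetical accelerator description.

    Levels are ordered from farthest from compute (host) to nearest.
    """
    name: str
    levels: List[MemoryLevel] = field(default_factory=list)

    def transfer_path(self, src: str, dst: str) -> List[Tuple[str, str]]:
        """Return the level-to-level copies needed to move data from src to dst."""
        names = [lvl.name for lvl in self.levels]
        i, j = names.index(src), names.index(dst)
        step = 1 if j >= i else -1
        return [(names[k], names[k + step]) for k in range(i, j, step)]

    def fits(self, level: str, tile_bytes: int) -> bool:
        """Check whether a tile of the given size fits in the named level."""
        lvl = next(l for l in self.levels if l.name == level)
        return tile_bytes <= lvl.size_bytes


# Example sizes are made up for illustration only.
accel = TargetModel(
    name="example_accelerator",
    levels=[
        MemoryLevel("host", 16 * 2**30, explicit_transfers=False),
        MemoryLevel("device_dram", 4 * 2**30, explicit_transfers=True),
        MemoryLevel("sram", 2 * 2**20, explicit_transfers=True),
        MemoryLevel("register_file", 64 * 2**10, explicit_transfers=True),
    ],
)

# Copies a code generator would have to schedule to stage a tile in SRAM.
print(accel.transfer_path("host", "sram"))
```

A kernel generator targeting such a device would presumably need to pick tile sizes per level (the `fits` check) and emit the copies along `transfer_path`, which is what the questions above are asking about.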