PlaidML looks very interesting. It is impressive that it can quickly produce efficient kernels without requiring manual scheduling descriptions, as e.g. TVM or Halide do!
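For context, this is what I mean: a Tile contraction describes only the mathematics of the operation, in the style of the matrix-multiply example from PlaidML's documentation (the exact syntax here is from memory, so treat it as approximate):

```
function (A[M, L], B[L, N]) -> (C) {
    C[i, j : M, N] = +(A[i, k] * B[k, j]);
}
```

Tiling, vectorization, and work-group sizes are then derived automatically per device, whereas in Halide one would hand-write a schedule on top of the algorithm (something like `C.tile(...).vectorize(...).parallel(...)`), and TVM similarly expects explicit schedule primitives or a tuning search.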
But I've got a couple of questions about supporting CPUs and accelerators:
The internal representation and the target model in PlaidML seem rather OpenCL-like. It is not quite clear how well CPUs are served by this approach. For example, is the generated code for CPUs as efficient as for GPUs? Are there any CPU benchmarks comparing it with e.g. TF, Caffe2, TVM, or Halide?
Is it possible for PlaidML to support more complex kinds of targets, e.g. custom accelerators, which often have more memory-hierarchy levels than OpenCL devices, dedicated multiplier units for tensors of specific sizes, support for quantized tensors, etc.?
How would one model memory buffers at different levels of the memory hierarchy?
How would one model explicit transfers, e.g. between host memory and target memory, or between different levels of the target's memory hierarchy?
Are there any plans for improvements in this area? Is PlaidML/Tile expressive enough to support such targets? Which parts of the PlaidML codebase would need to be extended to support them? Would that require conceptual changes (e.g. introducing new abstractions for memory-hierarchy levels), or just adding a new backend? And how much would the increased target complexity affect PlaidML's current machinery for generating efficient kernels?