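The GPU algorithm file contains two maps, algorithmMap and kernelThreadMap. When a model consists only of simple OPs (eltwise, power, etc.), no search over tiling and similar parameters is needed, so algorithmMap is empty, while kernelThreadMap still holds the local search results for those OPs.
This gives a corner case: algorithmMap.size() == 0 && kernelThreadMap.size() > 0
In this case void saveMapToFile() has a bug: the local search results of such a model are not saved to the algorithm file. As a result, the next time the model is initialized it has to redo the local search even though the algorithm file is linked, which makes the model's first execution very slow. The visible symptom is a large difference in execution time between -w 0 and -w 1.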
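The corner case can be pictured with a small stand-alone model. This is only a sketch and not the Bolt source: `StringMap`, `appendEntries`, `saveGuarded`, the text format, and the placement of the size guard are all assumptions, chosen so that the model reproduces the reported symptom.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for both maps: plain string keys and values.
using StringMap = std::map<std::string, std::string>;

// Appends one "<key> <value>" line per entry.
void appendEntries(std::ostream &out, const StringMap &m) {
    for (const auto &kv : m) {
        // The toy format is space-separated, so keys must not contain spaces.
        assert(kv.first.find(' ') == std::string::npos);
        out << kv.first << ' ' << kv.second << '\n';
    }
}

// Hypothetical guarded save: nothing at all is emitted unless
// algorithmMap has entries (assumed guard placement).
std::string saveGuarded(const StringMap &algorithmMap,
                        const StringMap &kernelThreadMap) {
    std::ostringstream out;
    if (algorithmMap.size() > 0) {
        appendEntries(out, algorithmMap);
        appendEntries(out, kernelThreadMap);
    }
    return out.str();
}
```

With an empty `algorithmMap`, `saveGuarded` returns an empty string even though `kernelThreadMap` has entries, so in this model the local search results are lost and would have to be recomputed on the next initialization.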
Thanks for the feedback. The problem described above does exist. You can try removing the if (targetMap.size() > 0) check at line 377 of common/uni/include/algorithm_map.h.
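One way to picture why dropping a size guard helps is a writer that records every map unconditionally, including an empty one. The following is a minimal sketch under stated assumptions, not the code in algorithm_map.h: `StringMap`, `writeMap`, `readMap`, `saveMaps`, and the count-prefixed text format are hypothetical names and choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for both maps: plain string keys and values.
using StringMap = std::map<std::string, std::string>;

// Writes the entry count first, even when it is 0, so a reader can
// tell an empty section apart from a missing one.
void writeMap(std::ostream &out, const StringMap &targetMap) {
    out << targetMap.size() << '\n';
    for (const auto &kv : targetMap) {
        // The toy format is space-separated, so keys must not contain spaces.
        assert(kv.first.find(' ') == std::string::npos);
        out << kv.first << ' ' << kv.second << '\n';
    }
}

// Reads back one section produced by writeMap.
StringMap readMap(std::istream &in) {
    StringMap result;
    std::size_t count = 0;
    in >> count;
    for (std::size_t i = 0; i < count; ++i) {
        std::string key, value;
        in >> key >> value;
        result[key] = value;
    }
    return result;
}

// Saves both maps in a fixed order with no size guard.
std::string saveMaps(const StringMap &algorithmMap,
                     const StringMap &kernelThreadMap) {
    std::ostringstream out;
    writeMap(out, algorithmMap);
    writeMap(out, kernelThreadMap);
    return out.str();
}
```

In this sketch an empty `algorithmMap` is stored as a section with count 0, so the `kernelThreadMap` entries that follow are still written and can be read back in the same order.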