Out of memory error on Power 8 IBM machine #6
Comments
@jtian0 @codyjrivera @sheltongeosx Hi Shelton, Thanks for testing. For your questions:
Please test cuSZ first to understand the compression quality and throughput on your datasets. We are actively developing these features, which will be released in October. Thanks for your understanding about the delay. Hi @codyjrivera, It seems we have a bug in the radix_sort function on the Power8+P100 machine. Do you have any idea about this? Thanks, |
@codyjrivera @dingwentao This seems a known bug from |
Hi @sheltongeosx @MauricioAP @jiemeng-total We have released cuSZ v0.1.1. Please check it out. The major changes include:
For the reported bug, as pointed out by @jtian0, it seems to come from Thrust with CUDA 10 on P100. We are testing cuSZ with CUDA 9 on P100 to see whether that resolves the issue. Thanks, |
@dingwentao, @MauricioAP, @jiemeng-total Hi Dingwen, thank you very much for your updates! The following is what I experienced when running the new code:
Best, |
Hi Shelton, Thanks for your follow-up. Yes, we did notice this issue when we were testing with GCC 6. Are you using GCC 6 with cuSZ? If so, can you try with GCC 7.3? As we mentioned in the README, cuSZ requires GCC 7.3+. We'll investigate soon what triggers the cuSZ free() error with GCC 6. Thanks! |
Hi Dingwen, I am using gcc/7.3.0 and cuda/10.1. Shelton |
Hi Shelton, Thanks for the information. Yes, you're right. It doesn't make any sense for *.outlier to be larger than *.dat, since *.outlier stores only the unpredictable data, which is a small portion of the original data. Would you be willing to share the input data (if it's not sensitive) and your compression configuration to help us investigate the bug? Please contact me directly via dingwen.tao@wsu.edu. Thanks! Best, |
Hi @sheltongeosx @MauricioAP @jiemeng-total, We have released cuSZ v0.1.3. Please check it out. The two main updates are as follows:
The bug that broke cuSZ on Pascal GPUs turned out to be unrelated to Thrust, contrary to what we first suspected, and we fixed our code accordingly. I have tested our new release on a few different Pascal GPUs, on CUDA 10.1 and CUDA 11.0 (see I hope this new release fixes the issue you brought up in your first message. Best, |
Hi @sheltongeosx @MauricioAP @jiemeng-total,
Thanks, |
@dingwentao @MauricioAP @jiemeng-total Hi Dingwen,
Hope this is helpful. Best, |
@sheltongeosx |
Hi @sheltongeosx @MauricioAP @jiemeng-total, Please check out our latest commit 81248dd. Below is the output from a test on a V100 GPU. Compression with breakdown time:
So, 5.687 s in total: 3.592 s for loading, writing the Huffman binary, and tar, and 2.095 s for "real" cuSZ. The 2.095 s includes the CUDA malloc host call (~1 second), which is a one-time cost, so performance should improve considerably once the cuSZ library API is available. We're actively working on the development of the cuSZ API and library support. Verified decompression:
Please let us know if this will work on your P100 and Parihaka dataset. Thanks for your help! Best, |
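The timing breakdown quoted in the comment above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function and variable names are made up for this example, and the 1.0 s one-time cost is the approximate "~1 second" figure from the comment, not a measurement.

```python
def timing_breakdown(io_s, core_s, one_time_s):
    """Summarize a run from its I/O time, core compression time,
    and the portion of the core time that is a one-time setup cost."""
    total_s = io_s + core_s
    return {
        "total_s": total_s,
        # Share of the wall time spent outside the compressor itself.
        "io_fraction": io_s / total_s,
        # Core time left once the one-time setup cost is amortized away.
        "core_without_one_time_s": core_s - one_time_s,
    }


# Figures quoted above: 3.592 s for loading + Huffman binary write + tar,
# 2.095 s for "real" cuSZ, of which roughly 1 s is CUDA malloc host (assumed).
result = timing_breakdown(io_s=3.592, core_s=2.095, one_time_s=1.0)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these numbers, roughly 63% of the total is I/O and packaging rather than compression, which is why the comment separates the two.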
@dingwentao @MauricioAP @jiemeng-total, Hi Dingwen,
But the issue is that the data does not appear to have been compressed much:
Even worse for -e=0.000001:
Hope you are able to repeat the issues. Best |
Hi @sheltongeosx, Thanks for your quick feedback. Yes, based on our observation, a relative error bound of 1e-4 on this dataset only gives a compression ratio of about 5, so a lower ratio with 1e-5 or 1e-6 is expected. I'm not sure if you've tested the data with CPU SZ. Based on our preliminary experiment, this dataset is very hard to compress with SZ as well. Since cuSZ is a variant of SZ that focuses on improving compression speed, I recommend that you first work with our SZ team (szlossycompressor@gmail.com) to improve the compression algorithm for a higher ratio. Then, cuSZ can incorporate this algorithm into the code. Again, thanks a lot for helping us solve several issues for large and hard-to-compress data. Best, |
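To put a compression ratio of about 5 in perspective, it can be converted to an average number of bits stored per value. The sketch below assumes 32-bit floating-point input, which the thread does not state, and uses hypothetical helper names; it is plain arithmetic, not part of cuSZ.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Ratio of original size to compressed size (higher is better)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


def bits_per_value(ratio, original_bits=32):
    """Average bits kept per value; original_bits=32 assumes float32 input."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return original_bits / ratio


# A ratio of about 5, as quoted above for the 1e-4 error bound.
ratio = compression_ratio(original_bytes=5_000, compressed_bytes=1_000)
print(f"ratio = {ratio:.1f}, bits per value = {bits_per_value(ratio):.1f}")
```

Under that float32 assumption, a ratio of 5 still spends 6.4 bits on every value, so tighter error bounds such as 1e-5 or 1e-6 leave little room for further reduction on data this hard to predict.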
@dingwentao, @MauricioAP, @jiemeng-total
Hi Dingwen,
Thank you very much for letting me know that the cuda version has been released!
Here are a couple of issues/questions:
cusz_errmsg.txt
Best,
Shelton