Issues when using density inside the Blosc meta-compressor #47
Hey @FrancescAlted, thanks for your issue! Do you have an idea on these?
This is what I get on OS X with the latest dev version:

Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
BloscLZ: 1.0.5
LZ4: 1.7.0
Snappy: unknown
Zlib: 1.2.5
DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 4, 2097152, 8, 19, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 19 significant bits (out of 32)
Dataset size: 2097152 bytes Type size: 8 bytes
Working set: 256.0 MB Number of threads: 4
********************** Running benchmarks *********************
memcpy(write): 595.1 us, 3360.7 MB/s
memcpy(read): 218.6 us, 9149.1 MB/s
Compression level: 0
comp(write): 331.4 us, 6034.9 MB/s Final bytes: 2097168 Ratio: 1.00
decomp(read): 214.7 us, 9313.6 MB/s OK
Compression level: 1
comp(write): 2216.0 us, 902.5 MB/s Final bytes: 1204240 Ratio: 1.74
decomp(read): 537.3 us, -0.0 MB/s FAILED. Error code: -1
OK
Compression level: 2
comp(write): 2206.0 us, 906.6 MB/s Final bytes: 1204240 Ratio: 1.74
decomp(read): 699.4 us, -0.0 MB/s FAILED. Error code: -1
OK
Compression level: 3
comp(write): 2218.4 us, 901.5 MB/s Final bytes: 1204240 Ratio: 1.74
decomp(read): 737.3 us, -0.0 MB/s FAILED. Error code: -1
OK
Compression level: 4
comp(write): 1621.4 us, 1233.5 MB/s Final bytes: 1159184 Ratio: 1.81
decomp(read): 1165.2 us, -0.0 MB/s FAILED. Error code: -1
OK
Compression level: 5
comp(write): 1390.6 us, 1438.2 MB/s Final bytes: 1159184 Ratio: 1.81
decomp(read): 1189.5 us, -0.0 MB/s FAILED. Error code: -1
OK
Compression level: 6
comp(write): 949.2 us, 2106.9 MB/s Final bytes: 1136656 Ratio: 1.85
decomp(read): 1355.1 us, -0.0 MB/s FAILED. Error code: -1
OK
Compression level: 7
comp(write): 743.6 us, 2689.6 MB/s Final bytes: 1125520 Ratio: 1.86
decomp(read): 1497.2 us, -0.0 MB/s FAILED. Error code: -1
OK
Compression level: 8
comp(write): 761.6 us, 2626.1 MB/s Final bytes: 1125520 Ratio: 1.86
decomp(read): 1562.7 us, -0.0 MB/s FAILED. Error code: -1
OK
Compression level: 9
comp(write): 785.8 us, 2545.2 MB/s Final bytes: 1119824 Ratio: 1.87
decomp(read): 1980.3 us, -0.0 MB/s FAILED. Error code: -1
OK
Round-trip compr/decompr on 7.5 GB
Elapsed time: 9.6 s, 1751.7 MB/s

So it's very similar to your results. I'll need to check what's going on.
Thanks for the speedy response. Yes, what Blosc does is basically split the data to be compressed into small blocks (in order to use the L1 cache as efficiently as possible, but also to leverage multi-threading). It then applies a shuffle filter (which does not compress as such, but helps compressors achieve better compression ratios in many binary-data scenarios) and then passes the shuffled data to the compressor. There is more info about how it works in the first 10 minutes of this presentation: https://www.youtube.com/watch?v=E9q33wbPCGU

Regarding the size of the blocks (I suppose this is important for DENSITY), they typically range from 8 KB up to around 1 MB, depending on the compression level, the data type size and the compressor that is going to be used. See the algorithm that computes block sizes here: https://github.com/FrancescAlted/c-blosc/blob/density/blosc/blosc.c#L918

Please tell me if you need more clarifications. I am eager to use DENSITY inside Blosc because I think it is a good fit, but I am trying to understand it first (then I will need to figure out how to use C89 and C99 code in the same project ;)
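To make the shuffle step concrete, here is a minimal scalar sketch of the data movement for a given type size (Blosc's actual filter is SIMD-optimized, so treat this as an illustration only, not Blosc's code):

```c
#include <stddef.h>
#include <stdint.h>

/* Gather the k-th byte of every element together, so that similar
   bytes (e.g. the always-zero high bytes of small integers) end up
   adjacent in the output buffer. */
static void shuffle(const uint8_t *src, uint8_t *dst,
                    size_t nbytes, size_t typesize)
{
    size_t nelems = nbytes / typesize;
    for (size_t k = 0; k < typesize; k++)    /* byte position inside an element */
        for (size_t i = 0; i < nelems; i++)  /* element index */
            dst[k * nelems + i] = src[i * typesize + k];
}

/* The inverse transform, applied after decompression. */
static void unshuffle(const uint8_t *src, uint8_t *dst,
                      size_t nbytes, size_t typesize)
{
    size_t nelems = nbytes / typesize;
    for (size_t k = 0; k < typesize; k++)
        for (size_t i = 0; i < nelems; i++)
            dst[i * typesize + k] = src[k * nelems + i];
}
```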
Oh, and regarding the question of why I am using just Chameleon: it is because I am still experimenting. If everything goes well, the idea is to use Chameleon for low compression levels and Cheetah for higher ones. Then, depending on how slow compression is, I might decide to use Lion for the highest compression level. I suppose I can use…
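As a sketch of that plan (the thresholds below are purely illustrative, not a decided design):

```c
/* Hypothetical mapping of Blosc compression levels to DENSITY
   algorithms; the cut-off points are placeholders. */
typedef enum { ALGO_CHAMELEON, ALGO_CHEETAH, ALGO_LION } density_algo;

static density_algo algo_for_clevel(int clevel)
{
    if (clevel <= 3) return ALGO_CHAMELEON;  /* fastest, lowest ratio */
    if (clevel <= 8) return ALGO_CHEETAH;    /* middle ground */
    return ALGO_LION;                        /* best ratio, slowest */
}
```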
OK, I got everything to work properly using the following patch applied to your density tree: https://gist.github.com/gpnuma/e159fb6b505ef9b11e00. Here is a test run:

Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
BloscLZ: 1.0.5
LZ4: 1.7.0
Snappy: unknown
Zlib: 1.2.5
DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 4, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB Number of threads: 4
********************** Running benchmarks *********************
memcpy(write): 2526.0 us, 3167.0 MB/s
memcpy(read): 1291.0 us, 6196.7 MB/s
Compression level: 0
comp(write): 1101.3 us, 7264.3 MB/s Final bytes: 8388624 Ratio: 1.00
decomp(read): 1313.1 us, 6092.6 MB/s OK
Compression level: 1
comp(write): 2871.6 us, 2785.9 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 2388.5 us, 3349.3 MB/s OK
Compression level: 2
comp(write): 2750.1 us, 2909.0 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 2395.7 us, 3339.3 MB/s OK
Compression level: 3
comp(write): 2749.2 us, 2910.0 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 2407.5 us, 3323.0 MB/s OK
Compression level: 4
comp(write): 2977.3 us, 2687.0 MB/s Final bytes: 4511568 Ratio: 1.86
decomp(read): 2269.7 us, 3524.7 MB/s OK
Compression level: 5
comp(write): 3043.9 us, 2628.2 MB/s Final bytes: 4511568 Ratio: 1.86
decomp(read): 2270.0 us, 3524.2 MB/s OK
Compression level: 6
comp(write): 4438.5 us, 1802.4 MB/s Final bytes: 3622608 Ratio: 2.32
decomp(read): 4439.0 us, 1802.2 MB/s OK
Compression level: 7
comp(write): 4256.3 us, 1879.6 MB/s Final bytes: 3601120 Ratio: 2.33
decomp(read): 4279.2 us, 1869.5 MB/s OK
Compression level: 8
comp(write): 4248.0 us, 1883.2 MB/s Final bytes: 3601120 Ratio: 2.33
decomp(read): 4408.4 us, 1814.7 MB/s OK
Compression level: 9
comp(write): 11095.0 us, 721.0 MB/s Final bytes: 1887328 Ratio: 4.44
decomp(read): 12044.7 us, 664.2 MB/s OK
Round-trip compr/decompr on 7.5 GB
Elapsed time: 7.9 s, 2141.1 MB/s

I set the significant bits to 32, otherwise the data to compress isn't very interesting (it's like processing a file full of zeroes). Here is a sample run with snappy, which exhibits a similar, although lower (1.60), plateau in the compression ratio:

Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
BloscLZ: 1.0.5
LZ4: 1.7.0
Snappy: unknown
Zlib: 1.2.5
DENSITY: 0.12.6
Using compressor: snappy
Running suite: single
--> 4, 8388608, 8, 32, snappy
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB Number of threads: 4
********************** Running benchmarks *********************
memcpy(write): 2402.9 us, 3329.3 MB/s
memcpy(read): 1203.4 us, 6648.0 MB/s
Compression level: 0
comp(write): 1345.3 us, 5946.4 MB/s Final bytes: 8388624 Ratio: 1.00
decomp(read): 1285.3 us, 6224.3 MB/s OK
Compression level: 1
comp(write): 6389.5 us, 1252.1 MB/s Final bytes: 5232684 Ratio: 1.60
decomp(read): 2433.4 us, 3287.5 MB/s OK
Compression level: 2
comp(write): 4867.7 us, 1643.5 MB/s Final bytes: 5232684 Ratio: 1.60
decomp(read): 2394.4 us, 3341.1 MB/s OK
Compression level: 3
comp(write): 4901.1 us, 1632.3 MB/s Final bytes: 5232684 Ratio: 1.60
decomp(read): 2389.7 us, 3347.6 MB/s OK
Compression level: 4
comp(write): 5716.6 us, 1399.4 MB/s Final bytes: 3990010 Ratio: 2.10
decomp(read): 2806.1 us, 2850.9 MB/s OK
Compression level: 5
comp(write): 5746.6 us, 1392.1 MB/s Final bytes: 3990010 Ratio: 2.10
decomp(read): 2786.3 us, 2871.2 MB/s OK
Compression level: 6
comp(write): 6050.9 us, 1322.1 MB/s Final bytes: 3339270 Ratio: 2.51
decomp(read): 2944.6 us, 2716.8 MB/s OK
Compression level: 7
comp(write): 6181.5 us, 1294.2 MB/s Final bytes: 3012514 Ratio: 2.78
decomp(read): 3119.4 us, 2564.6 MB/s OK
Compression level: 8
comp(write): 6235.0 us, 1283.1 MB/s Final bytes: 3012514 Ratio: 2.78
decomp(read): 3143.5 us, 2544.9 MB/s OK
Compression level: 9
comp(write): 5757.8 us, 1389.4 MB/s Final bytes: 2558737 Ratio: 3.28
decomp(read): 3115.5 us, 2567.8 MB/s OK
Round-trip compr/decompr on 7.5 GB
Elapsed time: 8.1 s, 2097.7 MB/s

The workaround for the output buffer size that I used in the aforementioned patch will no longer be needed in 0.12.6, as a set of functions which precisely define the minimum output buffer size for compression/decompression will appear.
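Until those functions land, a caller can only over-allocate; a minimal sketch of the pattern, with a placeholder bound standing in for the exact minimums the upcoming 0.12.6 helpers will provide:

```c
#include <stdint.h>
#include <stdlib.h>

/* Placeholder worst-case bound (sketch only): once the 0.12.6 helpers
   exist, this guess gets replaced by the exact minimum they report. */
static uint64_t compress_safe_size_guess(uint64_t input_size)
{
    return input_size + input_size / 4 + 64;  /* generous slack */
}

static uint8_t *alloc_output_buffer(uint64_t input_size)
{
    return (uint8_t *)malloc((size_t)compress_safe_size_guess(input_size));
}
```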
Oh yeah, I forgot to mention: this was compiled and run against the latest dev branch version. Overall, if I may add, I think you should test Blosc against a real file instead of synthetic data. Your current method has the advantage of creating very precise entropy levels, but its drawback is that it does not represent anything real.
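For reference, the idea behind "significant bits" in the synthetic data can be sketched like this (this is not the actual bench generator, just the principle):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Keeping only `bits` significant bits leaves the high bytes zeroed,
   which is why low-bit datasets behave almost like a file full of
   zeroes once shuffled. */
static void fill_synthetic(int32_t *buf, size_t nelems, int bits)
{
    uint32_t mask = (bits >= 32) ? UINT32_MAX : ((1u << bits) - 1u);
    for (size_t i = 0; i < nelems; i++)
        buf[i] = (int32_t)((uint32_t)rand() & mask);
}
```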
Hmm, something is going wrong on my machine (Ubuntu 14.10 / clang 3.5):
The above is with the dev branch. With master:
So that's not any better.
Did you try to apply the patch I provided to c-blosc?
Ah, nope. I applied (part of) it here: FrancescAlted/c-blosc@f505fd8. With this, I am not getting segfaults anymore:
BTW, I am not changing the block size in the benchmark because the current one (2 MB) is already a bit large for chunked datasets (for a hint on why small data chunks are important to us, see http://bcolz.blosc.org/). Curiously enough, density works best without threading:
Not sure exactly why.
Regarding your suggestion of testing Blosc on actual data: well, the gist of it is to work as a compressor for binary data, where zero bytes are by far the most common. Also, the whole point of using the shuffle filter is to increase the probability of finding runs of zeroed bytes in the buffers. The fact is that Blosc works pretty well in practice, as you can see for example in: https://www.youtube.com/watch?v=TZdqeEd7iTM or https://www.youtube.com/watch?v=kLP83HZvbfQ
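A tiny runnable demo of that effect (it assumes a little-endian machine): shuffling four small int32 values turns their scattered zero high bytes into one long zero run:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int32_t vals[4] = {1, 2, 3, 4};
    uint8_t src[16], dst[16];
    memcpy(src, vals, sizeof vals);      /* 01 00 00 00  02 00 00 00 ... */
    for (int k = 0; k < 4; k++)          /* byte position inside an element */
        for (int i = 0; i < 4; i++)      /* element index */
            dst[k * 4 + i] = src[i * 4 + k];
    for (int j = 0; j < 16; j++)
        printf("%02x ", dst[j]);         /* 01 02 03 04, then twelve 00 bytes */
    printf("\n");
    return 0;
}
```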
That is very strange with regard to threading. On my test platform (Core i7, OS X), here is what I get:

1 thread:

$ bench/bench density single 1
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
BloscLZ: 1.0.5
LZ4: 1.7.0
Snappy: unknown
Zlib: 1.2.5
DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 1, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB Number of threads: 1
********************** Running benchmarks *********************
memcpy(write): 2366.5 us, 3380.5 MB/s
memcpy(read): 1228.9 us, 6509.6 MB/s
Compression level: 0
comp(write): 1268.7 us, 6305.8 MB/s Final bytes: 8388624 Ratio: 1.00
decomp(read): 1374.2 us, 5821.7 MB/s OK
Compression level: 1
comp(write): 8289.4 us, 965.1 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 6334.8 us, 1262.9 MB/s OK
Compression level: 2
comp(write): 8155.4 us, 980.9 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 6509.8 us, 1228.9 MB/s OK
Compression level: 3
comp(write): 8433.1 us, 948.6 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 6459.7 us, 1238.4 MB/s OK
Compression level: 4
comp(write): 6900.0 us, 1159.4 MB/s Final bytes: 4511568 Ratio: 1.86
decomp(read): 4903.2 us, 1631.6 MB/s OK
Compression level: 5
comp(write): 6945.7 us, 1151.8 MB/s Final bytes: 4511568 Ratio: 1.86
decomp(read): 4941.9 us, 1618.8 MB/s OK
Compression level: 6
comp(write): 8646.8 us, 925.2 MB/s Final bytes: 3622608 Ratio: 2.32
decomp(read): 9722.9 us, 822.8 MB/s OK
Compression level: 7
comp(write): 7820.2 us, 1023.0 MB/s Final bytes: 3601120 Ratio: 2.33
decomp(read): 8835.1 us, 905.5 MB/s OK
Compression level: 8
comp(write): 7845.3 us, 1019.7 MB/s Final bytes: 3601120 Ratio: 2.33
decomp(read): 8817.7 us, 907.3 MB/s OK
Compression level: 9
comp(write): 21697.2 us, 368.7 MB/s Final bytes: 1887328 Ratio: 4.44
decomp(read): 23950.2 us, 334.0 MB/s OK
Round-trip compr/decompr on 7.5 GB
Elapsed time: 16.5 s, 1022.6 MB/s

2 threads:

$ bench/bench density single 2
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
BloscLZ: 1.0.5
LZ4: 1.7.0
Snappy: unknown
Zlib: 1.2.5
DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 2, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB Number of threads: 2
********************** Running benchmarks *********************
memcpy(write): 2292.8 us, 3489.3 MB/s
memcpy(read): 1232.9 us, 6488.8 MB/s
Compression level: 0
comp(write): 1088.8 us, 7347.3 MB/s Final bytes: 8388624 Ratio: 1.00
decomp(read): 1307.0 us, 6120.7 MB/s OK
Compression level: 1
comp(write): 4619.7 us, 1731.7 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 3784.3 us, 2114.0 MB/s OK
Compression level: 2
comp(write): 4642.2 us, 1723.3 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 3688.3 us, 2169.0 MB/s OK
Compression level: 3
comp(write): 4585.2 us, 1744.7 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 3743.4 us, 2137.1 MB/s OK
Compression level: 4
comp(write): 3968.9 us, 2015.7 MB/s Final bytes: 4511568 Ratio: 1.86
decomp(read): 2929.8 us, 2730.5 MB/s OK
Compression level: 5
comp(write): 3946.0 us, 2027.4 MB/s Final bytes: 4511568 Ratio: 1.86
decomp(read): 2964.6 us, 2698.5 MB/s OK
Compression level: 6
comp(write): 5236.9 us, 1527.6 MB/s Final bytes: 3622608 Ratio: 2.32
decomp(read): 5659.9 us, 1413.5 MB/s OK
Compression level: 7
comp(write): 6199.0 us, 1290.5 MB/s Final bytes: 3601120 Ratio: 2.33
decomp(read): 6393.8 us, 1251.2 MB/s OK
Compression level: 8
comp(write): 6170.7 us, 1296.4 MB/s Final bytes: 3601120 Ratio: 2.33
decomp(read): 6286.6 us, 1272.5 MB/s OK
Compression level: 9
comp(write): 10581.0 us, 756.1 MB/s Final bytes: 1887328 Ratio: 4.44
decomp(read): 11585.6 us, 690.5 MB/s OK
Round-trip compr/decompr on 7.5 GB
Elapsed time: 9.9 s, 1699.6 MB/s

4 threads:

$ bench/bench density single 4
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
List of supported compressors in this build: blosclz,lz4,lz4hc,snappy,zlib,density
Supported compression libraries:
BloscLZ: 1.0.5
LZ4: 1.7.0
Snappy: unknown
Zlib: 1.2.5
DENSITY: 0.12.6
Using compressor: density
Running suite: single
--> 4, 8388608, 8, 32, density
********************** Run info ******************************
Blosc version: 1.7.0.dev ($Date:: 2015-05-27 #$)
Using synthetic data with 32 significant bits (out of 32)
Dataset size: 8388608 bytes Type size: 8 bytes
Working set: 256.0 MB Number of threads: 4
********************** Running benchmarks *********************
memcpy(write): 2379.6 us, 3362.0 MB/s
memcpy(read): 1199.0 us, 6672.4 MB/s
Compression level: 0
comp(write): 1090.6 us, 7335.2 MB/s Final bytes: 8388624 Ratio: 1.00
decomp(read): 1305.6 us, 6127.5 MB/s OK
Compression level: 1
comp(write): 2906.1 us, 2752.9 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 2453.8 us, 3260.3 MB/s OK
Compression level: 2
comp(write): 2772.4 us, 2885.6 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 2427.6 us, 3295.4 MB/s OK
Compression level: 3
comp(write): 2786.6 us, 2870.9 MB/s Final bytes: 4566672 Ratio: 1.84
decomp(read): 2404.4 us, 3327.3 MB/s OK
Compression level: 4
comp(write): 2714.1 us, 2947.5 MB/s Final bytes: 4511568 Ratio: 1.86
decomp(read): 2168.6 us, 3689.0 MB/s OK
Compression level: 5
comp(write): 2717.3 us, 2944.1 MB/s Final bytes: 4511568 Ratio: 1.86
decomp(read): 2152.0 us, 3717.5 MB/s OK
Compression level: 6
comp(write): 4490.2 us, 1781.7 MB/s Final bytes: 3622608 Ratio: 2.32
decomp(read): 4443.0 us, 1800.6 MB/s OK
Compression level: 7
comp(write): 4247.7 us, 1883.4 MB/s Final bytes: 3601120 Ratio: 2.33
decomp(read): 4253.4 us, 1880.9 MB/s OK
Compression level: 8
comp(write): 4250.4 us, 1882.2 MB/s Final bytes: 3601120 Ratio: 2.33
decomp(read): 4271.5 us, 1872.9 MB/s OK
Compression level: 9
comp(write): 11015.6 us, 726.2 MB/s Final bytes: 1887328 Ratio: 4.44
decomp(read): 12085.9 us, 661.9 MB/s OK
Round-trip compr/decompr on 7.5 GB
Elapsed time: 7.8 s, 2166.7 MB/s

So threading is visibly improving things, except maybe for Lion, where 4 threads are no faster than 2.
But after further comparisons, yes, you're right: it seems like snappy, for example, scales better with multithreading (it goes from 25.5 s on 1 thread to 8.2 s on 4 threads, which is 3 times faster). BTW, there is a slight overhead in setting up a buffer in density, as buffer initialization involves some mallocs; that's why I had increased the block size, and maybe that's the reason heavy multithreading is not helping a lot with small block sizes (the small overhead in setting up compression is probably what actually limits the scalability). Regarding Blosc and binary data, yes, I understand what you are trying to do! The only problem with random data is that you actually deny any obvious "patterns" in non-zero data which inevitably appear when manipulating "human" data.
Yes, the malloc call inside density could be the root of the poor threading scalability. Thanks for being willing to tackle this. Blosc does not shuffle using 8-byte blocks by default, but rather the size of the data type that you are compressing (2 for short int, 4 for int and float32, 8 for long int and float64, and other sizes for structs too). Using this data type size is critical for the reasons explained in the talks above. Regarding real data, you may want to have a look at this notebook: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb where real data is being used and where you can see that the compression ratio can reach 20x in this case. Also, it can be seen that some operations take less time (on a decent modern computer) on compressed datasets than on uncompressed ones.
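A common cure for per-call allocation overhead is sketched below; whether density can accept a caller-provided workspace like this is an assumption, not a known API:

```c
#include <stddef.h>
#include <stdlib.h>

/* Keep one scratch buffer per worker thread and grow it lazily,
   instead of malloc'ing inside every compress call. */
typedef struct {
    unsigned char *scratch;
    size_t scratch_size;
} worker_ctx;

static int worker_ctx_reserve(worker_ctx *ctx, size_t needed)
{
    if (ctx->scratch_size >= needed)
        return 0;                        /* current buffer is big enough */
    unsigned char *p = (unsigned char *)realloc(ctx->scratch, needed);
    if (p == NULL)
        return -1;                       /* caller handles allocation failure */
    ctx->scratch = p;
    ctx->scratch_size = needed;
    return 0;
}
```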
Needs retesting with 0.14.0.
Hi, I am trying to add support for DENSITY to the Blosc meta-compressor. Right now, my attempt lives here: https://github.com/FrancescAlted/c-blosc/tree/density, and in particular you can see how DENSITY is called here: https://github.com/FrancescAlted/c-blosc/blob/density/blosc/blosc.c#L504
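For orientation, here is a schematic of the dispatch shape (names, codec id, and the stub are hypothetical; the real call is in the linked blosc.c):

```c
/* A Blosc-style meta-compressor routes each block through a thin
   per-codec wrapper with a common signature. */
typedef int (*compress_fn)(const void *src, int srclen, void *dst, int dstcap);

static int density_wrap_compress(const void *src, int srclen,
                                 void *dst, int dstcap)
{
    (void)src; (void)srclen; (void)dst; (void)dstcap;
    return -1;  /* stub: the real body hands the block to DENSITY */
}

static int compress_block(int codec_id, const void *src, int srclen,
                          void *dst, int dstcap)
{
    compress_fn fn = NULL;
    switch (codec_id) {
    /* cases for blosclz, lz4, snappy, zlib elided */
    case 5:                              /* hypothetical DENSITY codec id */
        fn = density_wrap_compress;
        break;
    }
    return fn != NULL ? fn(src, srclen, dst, dstcap) : -1;
}
```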
However, I am running into issues when selecting the DENSITY codec:
The above is with the 'master' branch (refreshed some minutes ago). With the 'dev' branch of DENSITY, I get somewhat better results:
So, I suppose DENSITY is still in beta, but please consider c-blosc as another testing bench. Second, I wonder why the speed is so low. For example, by using the LZ4 codec I am getting this:
which is roughly 10x faster.
In case you want to experiment by yourself, the support for DENSITY in c-blosc is via a shared library for now (requiring C99 is not possible right now in c-blosc because it has to support other codecs whose code is not C99 compliant). So, in case the shared libraries for DENSITY are installed in the system (say /usr/local/lib, with headers in /usr/local/include), here is how to compile c-blosc:
Thanks!