
[Backend] Utilizes no_alloc variant of JErasure to avoid heap allocation #1

Merged: 4 commits into develop on Feb 2, 2016

Conversation

windkit (Contributor) commented Jan 19, 2016

Description

The JErasure library uses a lot of heap allocation for temporary structures.
The large number of small allocations/deallocations may have a performance impact.
When used in a NIF, memory issues are encountered (random segmentation faults).

Proposed Solution

Added a separate code path that avoids heap allocation: structures are either allocated by the caller beforehand or kept as temporaries on the stack.
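
A minimal sketch of the two code paths in plain C (the function names and structure here are illustrative only, not the actual jerasure API):

#include <stdlib.h>
#include <string.h>

/* Original style: the library heap-allocates its temporary matrix on
 * every call and frees it before returning. */
static void decode_with_alloc(int k, int w) {
    int *decoding_matrix = malloc(sizeof(int) * k * k * w * w);
    if (decoding_matrix == NULL) return;
    memset(decoding_matrix, 0, sizeof(int) * k * k * w * w);
    /* ... build the decoding matrix and decode ... */
    free(decoding_matrix);
}

/* no_alloc style used by this PR: the temporary lives on the caller's
 * stack (C99 variable-length array), so there is no heap traffic. */
static void decode_noalloc(int k, int w) {
    int decoding_matrix[k * k * w * w];
    memset(decoding_matrix, 0, sizeof(decoding_matrix));
    /* ... build the decoding matrix and decode ... */
}

The trade-off is that the worst-case temporary size now has to fit on the calling thread's stack, which is what the +sss discussion below is about.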

Related Issue

leo-project/leofs#440

Related PRs

leo-project/jerasure#3
leo-project/jerasure#4
#1

mocchira (Member) commented:

@windkit
Since this PR uses a large amount of stack,
in order to tune the +sss Erlang emulator flag (http://www.erlang.org/doc/man/erl.html)
we need to know the maximum amount of stack used per thread in the worst case (the most deeply nested call tree that includes jerasure).
Can you estimate that with the practical maximum k, m and w?
I'd recommend using a runtime estimation (http://stackoverflow.com/questions/1756285/stack-size-estimation).
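
One possible runtime estimation, along the lines of the linked Stack Overflow question (a sketch only, not code from this PR; the helper names are made up): take the address of a local variable at the NIF entry point and again at the deepest point of the call tree, and look at the difference.

#include <stddef.h>
#include <stdio.h>

static char *stack_base;                 /* set at the NIF entry point */

/* Call once at the top of the entry function. */
void stack_mark_base(void) {
    char marker;
    stack_base = &marker;
}

/* Call at the deepest point of the call tree (e.g. inside the innermost
 * jerasure helper); on x86 the stack grows downwards, so base - here
 * approximates the bytes of stack consumed since the entry point. */
void stack_report_usage(const char *where) {
    char marker;
    ptrdiff_t used = stack_base - &marker;
    fprintf(stderr, "%s: ~%ld bytes of stack used\n", where, (long)used);
}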

windkit (Contributor, Author) commented Jan 19, 2016

@mocchira Stack size is indeed one of my concerns about these PRs.

The biggest usage happens in jerasure_schedule_decode_data_lazy:

int smart_n[k*m*w*w+1][5];    /* per-operation entries of the decoding schedule */
int *smartptr[k*m*w*w+1][5];  /* schedule pointer table */
...
// jerasure_generate_decoding_data_schedule_noalloc()
int real_decoding_matrix[k*w*(cdf+ddf)*w];
int decoding_matrix[k*k*w*w];
int inverse[k*k*w*w];

Each instance would consume roughly 10*k*k*w*w*(size of int) bytes.
Practically, w = 8 (since k + m < 2^w) should be enough for Cauchy-RS, with k = 20 (number of data blocks) and m < k.
In total, it would take about 2 MB.

To set a suitable suggested stack size, we first need to limit the parameters users can choose. Alternatively, we can move these large structures back to the heap (with enif_alloc provided by erl_nif.h).
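
For reference, the enif_alloc fallback would look roughly like this (a sketch; enif_alloc/enif_free are the real erl_nif.h allocator calls, but the surrounding function is hypothetical):

#include <erl_nif.h>

/* Hypothetical helper: keep the big temporary on the heap managed by the
 * Erlang VM allocators instead of the scheduler thread's stack. */
static int decode_with_enif_alloc(int k, int w) {
    int *decoding_matrix = enif_alloc(sizeof(int) * k * k * w * w);
    if (decoding_matrix == NULL) return -1;
    /* ... build the decoding matrix and decode ... */
    enif_free(decoding_matrix);
    return 0;
}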

mocchira (Member) commented:

@windkit Thanks.
2MB sounds reasonable to me.

Given that there are 32 cores (which typically means 32 scheduler threads with the default VM parameters),
it will take approximately 64 MB (32 * 2 MB) of stack, which is no problem.

I may ask you to benchmark erasure coding features with different +sss settings.

mocchira (Member) commented:

@windkit
Please link this information (#1) somewhere around leo_erasure/README.md.

windkit (Contributor, Author) commented Jan 20, 2016

While the size of int varies across platforms, it is usually 4 bytes.
Tested on Ubuntu 14.04.3 x64 with GCC 4.8.4 and Erlang 17.5: setting the suggested stack size to 64 kilowords (+sss 64, i.e. 512 KB on a 64-bit system) works fine with Cauchy-RS {20,4,8} during the decoding test.

I will start benchmarking the performance with different stack sizes.

mocchira (Member) commented:

LGTM.
I will merge after the benchmarking is finished.

windkit (Contributor, Author) commented Jan 22, 2016

Benchmark results are uploaded. Cauchy-RS is used as it is the heaviest scheme in terms of stack size.

In short, the PR improves performance slightly (~5%), and the stack size (+sss) has no significant effect on performance.

Encoding

https://github.com/leo-project/notes/tree/master/leofs/benchmark/libs/leo_erasure/20160121_1m_cauchyrs_k10m4_t32_enc

Decoding

https://github.com/leo-project/notes/tree/master/leofs/benchmark/libs/leo_erasure/20160121_1m_cauchyrs_k10m4_t32_dec

mocchira (Member) commented:

@windkit Thanks.
To make sure, can you do a long-running test of around 8 hours?

windkit (Contributor, Author) commented Jan 22, 2016

Sure, I am running a 10-hour encode+decode 1:1 test now.

yosukehara (Member) commented:

@windkit After the current benchmark, I'd like to ask you to benchmark LeoFS v1.4.0 with the latest leo_erasure for a long duration - 3 hours, 6 hours and more - similar to 20151222_isars_k10m4_15m_r49w1_60min_1.

windkit (Contributor, Author) commented Jan 25, 2016

@mocchira Please find the long-running test result at
https://github.com/leo-project/notes/tree/master/leofs/benchmark/libs/leo_erasure/20160122_1m_cauchyrs_k10m4_t32_10hr
With R:W = 1:1, the throughput is the average of the two, around 7,000 ops.

@yosukehara I will now move on to testing with LeoFS.

yosukehara (Member) commented:

@windkit Thanks for benchmarking that. It is a good result to me, and I've just noticed it is almost the same result as 20151222_isars_k10m4_15m_r49w1_60min_1.

I'd like to ask you to benchmark both isars and vandrs with LeoFS v1.4.0-pre.3-dev for a long duration, 4 or 6 hours.

windkit (Contributor, Author) commented Jan 26, 2016

@yosukehara I will start testing the two coding schemes for 6 hours to check their stability.

mocchira (Member) commented:

@windkit Thanks.
I'll also check your PR for jerasure tomorrow and
merge both PRs if there is no problem.

yosukehara (Member) commented:

@windkit Thanks a lot.

mocchira (Member) commented:

@windkit
To make sure: since the jerasure code base was changed,
I'd recommend you run the primary benchmarks (the longest long-running one, etc.) again.

windkit (Contributor, Author) commented Jan 29, 2016

@mocchira I am starting a 6-hour test for all the supported coding schemes.

windkit (Contributor, Author) commented Feb 1, 2016

By mistake I messed up the branches. I am currently fixing them; please wait a moment.

windkit (Contributor, Author) commented Feb 1, 2016

The problem has been fixed, sorry about that.

In the process I spotted another memory leak with Coding* getCoder in c_src/leo_erasure_nif.cpp; I will fix it separately.

mocchira added a commit that referenced this pull request on Feb 2, 2016: [Backend] Utilizes no_alloc variant of JErasure to avoid heap allocation
mocchira merged commit 59464fa into leo-project:develop on Feb 2, 2016
mocchira (Member) commented Feb 2, 2016

LGTM.
Thank you as always!
