support nogc concatenation in the outermost dimension #3

timotheecour · 2016-01-10T09:45:14Z

algo:

auto  cat!0(S0 a0, S1 a1) if ( isSlice!S0 && isSlice!S1) {
  auto shape0=a0.shape;
  auto shape1=a1.shape;
  shape0[0]+=shape1[0];
  assert(shape0[1..$]==shape1[1..$]);
  return chain(a0.byElement, a1.byElement).sliced(shape);
}

generalizes trivially to variadic arguments

The text was updated successfully, but these errors were encountered:

9il · 2016-01-10T09:46:46Z

This is slow. ByElement is random access range, but it is slow for random access.

timotheecour · 2016-01-10T10:11:51Z

why would it have to be used in a random-access way?
i would assume it to be faster than having to allocate the result of the concatenation. eg:

auto a=range0.sliced(m, n0)
auto b=range1.sliced(m, n1)
cat!0(a,b)[2..$-1, 2..$-1].transposed.imwrite("result.jpg");
// => imwrite requires sequential access to elements to obtain a non-strided D array, followed by calling libjpeg. No random access here.

alternative:

auto c=catAllocating(a,b)[2..$-1, 2..$-1].transposed.imwrite("result.jpg");

this is not good as not only it allocates, but it has to call byElement twice: once at the concatenation, and once at the end (inside materializing to an array in imwrite)

9il · 2016-01-10T10:21:35Z

why would it have to be used in a random-access way?

See section Binary Representation at struct Slice description.

cat!0(a,b)[2..$-1, 2..$-1].transposed.imwrite("result.jpg");

It is possible to implement fast version with input range primitives. However transposed and other functions would not work.

9il · 2016-01-10T10:23:07Z

But you will be able to use fast version to save images.

timotheecour · 2016-01-10T10:41:23Z

i think I see what you're trying to say:
you're saying: if the user uses
cat(a,b) in a random-access way, he should probably allocate first to get a pointer-backed slice so that random-accesses are fast.

I agree with that but i think it's good to give user the option to allocate or not in this case.
I would assume the example I gave above would indeed be faster with the nogc cat(a,b).

Furthermore, instead of catAllocating why not define a general function:

auto reallocate(S)(S a) if (isSlice!S){
  return a.byElement.array.sliced(a.shape);
}
// with another overload that takes a preallocated buffer

//apologies if it already exists in std.experimental.ndslice, but couldnt' find it!

Then: we don't need catAllocating, we can just call:

auto c=cat(a,b).reallocate;
// fast to use c in random access

9il · 2016-01-10T10:48:52Z

What is the use case for concatenation without allocation?
If you want so save a file, it is possible to do with allocation algorithm (but without memory allocation!).
An OutputRange can be used to save concatenation to memory or to a file. For GC allocation OutputRange is std,array.Appender and for file-streams it is std.stdio.lockingTextWriter (if I am not wrong).

timotheecour · 2016-01-10T11:04:15Z

What is use case for concatenation without allocation?

delayed allocation (after applying further transformations as in my above example) [1],
no allocation nor random access (eg: passing through some reduce operaton)

Concrete example inspired from http://jackstouffer.com/blog/nd_slice.html

"This function does not calculate border cases in which a window overlaps the image partially. However, the function can still be used to carry out such calculations. That can be done by creating an amplified image, with the edges reflected from the original image"

This is a perfect use case of cat (at least in the case of upper and lower borders):

auto image = ...
auto border_up=image(_,R(0,border).reverse);
auto border_down=image(_,R($-1,$-1-border,-1));
auto padded_image=cat(border_up, image, border_down);

// now call some algorithm that uses padded_image and is mostly non-random-access.
// movingWindowByChannel could maybe qualify since most of the access [windows] are sequential access, not random access.

[1] same rationale as "only to be assembled at the very end to avoid unnecessary allocations" in http://jackstouffer.com/blog/nd_slice.html

9il · 2016-01-10T11:24:00Z

Concrete example inspired from http://jackstouffer.com/blog/nd_slice.html

This is my example. And I am 99.99% sure that median filter will be faster with allocation.

timotheecour · 2016-01-10T12:21:52Z

This is my example.

I know :-), and I like it :-)

And I am 99.99% sure that median filter will be faster with allocation.

haven't tried w your example but see this simple test:

https://github.com/timotheecour/dtools/blob/master/dtools/scratch/bench_ndslice/test1.d
=> 2X speedup with lazy cat

I'm obviously not saying it'll always be faster without reallocate after cat, but that it can definitely be faster, so that makes it worth it to provide as a library for user. User can choose to reallocate or not for his use case.

9il · 2016-01-10T21:15:25Z

=> 2X speedup with lazy cat

Lazy cat of Slice over Iota (It not allocated at memory at all) VS Lazy cat + allocation into memory.
It is not fair. It is not from real world.

Fair:
Lazy cat of Slice over array VS Reallocation + (without cat) Slice over array.

BTW: indexSlice could help to create fast lazy cat.

timotheecour · 2016-01-11T02:21:11Z

the speedup was also 2X when I replaced:

auto image = iota(n0 * n1).sliced(n0, n1)[1 .. $, 1 .. $];

with

auto image = iota(n0 * n1).array.sliced(n0, n1)[1 .. $, 1 .. $];

Also, I made a more real-world example, see updated code in
https://github.com/timotheecour/dtools/blob/master/dtools/scratch/bench_ndslice/test1.d
which computes an integral image over a mirror-padded image.

The speedup with lazy cat is now 1.4X in this example.

I will take a look at indexSlice.

9il · 2016-01-11T03:17:28Z

I will take a look at indexSlice.

The idea is to propagate Slice primitives over a simple generic Cat (with multidimensonal auto opIndex(auto ref size_t[N] lengths) ) using IndexSlice.

add undefined type

9il · 2017-03-06T14:01:08Z

Partially solved by stack. We can add opIndex primitive to it, so it will be more flexible.

9il · 2017-03-06T17:18:01Z

libmir/mir-algorithm#29 fixes this issue.

auto s = stack!dim(...);
auto v = s.indexed(s.shape.ndiota);
static assert(isSlice!(typeof(v)));

This is more accurate and faster then flattened workaround.

9il · 2017-03-09T03:31:31Z

constant padding was added libmir/mir-algorithm#31

9il · 2017-03-09T07:58:14Z

edge, symmetric and wrap paddings libmir/mir-algorithm#32

9il · 2017-03-09T13:45:19Z

renamed http://docs.algorithm.dlang.io/latest/mir_ndslice_concatenation.html#.concatenation

9il added enhancement allocators labels Mar 25, 2016

dextorious mentioned this issue May 15, 2016

Performance issue with difference indexing syntax #215

Closed

wilzbach added a commit that referenced this issue Jun 27, 2016

Merge pull request #3 from 9il/generic_uniform

95a6f88

add undefined type

9il added the concatenation label Aug 9, 2016

9il closed this as completed Mar 6, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support nogc concatenation in the outermost dimension #3

support nogc concatenation in the outermost dimension #3

timotheecour commented Jan 10, 2016 •

edited by 9il

Loading

9il commented Jan 10, 2016

timotheecour commented Jan 10, 2016

9il commented Jan 10, 2016

9il commented Jan 10, 2016

timotheecour commented Jan 10, 2016

9il commented Jan 10, 2016

timotheecour commented Jan 10, 2016

9il commented Jan 10, 2016

timotheecour commented Jan 10, 2016

9il commented Jan 10, 2016

timotheecour commented Jan 11, 2016

9il commented Jan 11, 2016

9il commented Mar 6, 2017 •

edited

Loading

9il commented Mar 6, 2017

9il commented Mar 9, 2017

9il commented Mar 9, 2017

9il commented Mar 9, 2017

support nogc concatenation in the outermost dimension #3

support nogc concatenation in the outermost dimension #3

Comments

timotheecour commented Jan 10, 2016 • edited by 9il Loading

9il commented Jan 10, 2016

timotheecour commented Jan 10, 2016

9il commented Jan 10, 2016

9il commented Jan 10, 2016

timotheecour commented Jan 10, 2016

9il commented Jan 10, 2016

timotheecour commented Jan 10, 2016

9il commented Jan 10, 2016

timotheecour commented Jan 10, 2016

9il commented Jan 10, 2016

timotheecour commented Jan 11, 2016

9il commented Jan 11, 2016

9il commented Mar 6, 2017 • edited Loading

9il commented Mar 6, 2017

9il commented Mar 9, 2017

9il commented Mar 9, 2017

9il commented Mar 9, 2017

timotheecour commented Jan 10, 2016 •

edited by 9il

Loading

9il commented Mar 6, 2017 •

edited

Loading