Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slice and makeSlice should accept ranges #81

Closed
wilzbach opened this issue Apr 12, 2016 · 16 comments
Closed

slice and makeSlice should accept ranges #81

wilzbach opened this issue Apr 12, 2016 · 16 comments

Comments

@wilzbach
Copy link
Member

Reason: When one uses an input range like iota or iotaSlice which is doesn't allow modifications (=doesn't implement lvalue index), the following two don't work:

auto a = 6.iotaSlice(2, 3);
a[0, 0] = 1; // not an lvalue
auto b = 6.iota.sliced(2, 3);
b[0, 0] = 1; // not an lvalue

However when using the new allocation APIs, we can allocate memory:

auto a = 6.iotaSlice(2, 3).slice; // we create a modifyable copy
a[0, 0] = 1; 
auto b = 6.iota.array.sliced(2, 3); // or duplicate directly
b[0, 0] = 1;

Thus imho it would be nice if slice directly accepts a range, that fills the allocated array.
Basically a short wrapper around:

auto a = slice!int(2, 2);
a[] = iota(4).sliced(2, 2);

What do you think?

@9il
Copy link
Member

9il commented Apr 13, 2016

auto b = 6.iota.array.sliced(2, 3);

Current version is already clear auto b = 6.iota.sliced(2, 3).slice;

@9il
Copy link
Member

9il commented Apr 13, 2016

for ndslice module common range is Slice ;)

@wilzbach
Copy link
Member Author

Current version is already clear auto b = 6.iota.sliced(2, 3).slice;

... but not efficient as 6.iota.sliced(2, 3).slice gets translated to 6.iota.sliced(2, 3).byElement.array.sliced!replaceArrayWithPointer(slice.shape) - would be nice if we could avoid the duplicate allocation. slice already allocates, we just want to fill it.

@9il
Copy link
Member

9il commented Apr 13, 2016

No, it is already fast

@9il
Copy link
Member

9il commented Apr 13, 2016

see implementation

@wilzbach
Copy link
Member Author

would be nice if we could avoid the duplicate allocation. slice already allocates, we just want to fill it.

Yeah just saw it - ignore my complaints for now. sorry.

@wilzbach
Copy link
Member Author

It does make a huge difference - m1 and m3 take roughly twice the time (I guess the 20% are the overhead created from byElement) or am I missing something here?

fastest: 1
rel. diff for 0: 1.871
rel. diff for 1: 1.000
rel. diff for 2: 1.855
All unit tests have been run successfully.
dub test  21.39s user 0.38s system 99% cpu 21.772 total
auto m1()
{
    size_t k = 100, s1 = 10, s2 = 10;

    import std.range: iota;
    import mir.ndslice.slice: slice, sliced;
    auto r = iota(k).sliced(s1, s2).slice;
    assert(r[9, 9] == k - 1);
}

auto m2()
{
    size_t k = 100, s1 = 10, s2 = 10;

    import std.range: iota;
    import mir.ndslice.slice: slice;
    import mir.ndslice.selection: byElement;
    auto r = slice!ulong(s1, s2);
    auto range = iota(k);

    foreach(ref el; r.byElement)
    {
        el = range.front;
        range.popFront;
    }

    assert(r[9, 9] == k - 1);
}

auto m3()
{
    size_t k = 100, s1 = 10, s2 = 10;

    import std.range: iota;
    import mir.ndslice.slice: slice, sliced;
    import mir.ndslice.selection: byElement;
    auto r = slice!ulong(s1, s2);
    auto range = iota(k);

    r[] = range.sliced(s1, s2);

    assert(r[9, 9] == k - 1);
}

unittest
{
    auto n = 1_000_000;

    import std.datetime: benchmark;
    auto result = benchmark!(m1, m2, m3)(n);

    import std.stdio;
    import std.range: enumerate;
    import std.algorithm: map, minPos;

    auto rs = [result[0], result[1], result[2]].map!`a.length`;
    auto minEl = rs.enumerate.minPos!`a.value < b.value`.front;

    writeln("fastest: ", minEl.index);

    foreach (i, el; rs.enumerate)
        writefln("rel. diff for %d: %.3f", i , el * 1. / minEl.value);
}

@wilzbach wilzbach reopened this Apr 13, 2016
@9il
Copy link
Member

9il commented Apr 13, 2016

LDC results?

@wilzbach
Copy link
Member Author

LDC results?

nope - dmd, but with ldc it's still measurable!.

fastest: 1
rel. diff for 0: 1.225
rel. diff for 1: 1.000
rel. diff for 2: 1.273
All unit tests have been run successfully.
dub test --compiler=ldc  21.21s user 0.88s system 99% cpu 22.208 total

@9il
Copy link
Member

9il commented Apr 13, 2016

add --build=unittest-nobounds

@9il
Copy link
Member

9il commented Apr 13, 2016

add --build=unittest-nobounds

add --build=release-nobounds

@wilzbach
Copy link
Member Author

add --build=release-nobounds

ldc: Unknown command line argument '-disable-boundscheck'. Try: 'ldc -help'
ldc: Did you mean '-disable-dfa-sched'?

@9il
Copy link
Member

9il commented Apr 13, 2016

ok, lets just --build=release

@wilzbach
Copy link
Member Author

done - I was shocked a bit initially because the runtime went done from 20s to less than a second, but I added a global variable (which is printed) to add the result of the verification assert.

fastest: 2
rel. diff for 0: 2.079
rel. diff for 1: 1.780
rel. diff for 2: 1.000
true
dub --compiler=ldc --build=release  28.88s user 0.09s system 100% cpu 28.958 total

I verified the output by running in multiple orders.

Source

@9il
Copy link
Member

9il commented Apr 13, 2016

oh, I thought the implementation is differ. Thanks, will be fixed

@9il
Copy link
Member

9il commented Apr 13, 2016

Fixed

@9il 9il closed this as completed Apr 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants