ENH: Implement radix sort #12586

hameerabbasi · 2018-12-18T14:46:01Z

Fixes #6050

This PR implements radix/counting sort for integers.

cc: @mattip A review would be nice, along with next steps. Feel free to force-push if need be.

numpy/core/src/npysort/radixsort.c.src

mattip · 2018-12-18T18:45:37Z

tests and benchmarks come before making radix the default for integers.

hameerabbasi · 2018-12-18T19:23:56Z

Looks like I need to implement a generic npy_radixsort and npy_aradixsort going by the quicksort.c.src file... Can I have an example of how I'd dispatch based on dtype? I assume I have to do something with PyArray_DESCR.

andravin · 2018-12-18T23:52:55Z

Will you also support radix sort for floats? One can make this work by deriving keys from the floats, as documented here: https://github.com/eloj/radix-sorting#float-keys

numpy/core/src/npysort/radixsort.c.src

hameerabbasi · 2018-12-19T20:59:38Z

Will you also support radix sort for floats? One can make this work by deriving keys from the floats, as documented here: https://github.com/eloj/radix-sorting#float-keys

I can do float32/float64 easily but I'm not sure if there are integer equivalents to complex*/float80/float128. Those might require multiple passes and/or copies.

mattip · 2018-12-21T11:25:02Z

an example of how I'd dispatch based on dtype

You would have to check the dtypes in PyArray_ArgSort, PyArray_ArgPartition, PyArray_Sort, and PyArray_Partition. You can use the PyArray_ISINTEGER macro.

hameerabbasi · 2018-12-24T10:49:07Z

You can use the PyArray_ISINTEGER macro.

I still need to check for bool and float/double. Also, I need to decide the exact method based on the exact dtype.

mattip · 2018-12-24T12:15:42Z

You attach the sort methods to the dtype, so this line will return NULL if that dtype cannot do radix sort

hameerabbasi · 2018-12-24T13:30:42Z

Next, I think, comes attaching kind='radixsort' to the sort and argsort functions, writing tests, benchmarks and docs.

It'd also be nice to know where the default sort is decided, if the benchmarks show an improvement and we want to change it.

mattip · 2018-12-24T13:59:40Z

where the default sort is decided

Grep for PyArray_SortkindConverter and NPY_QUICKSORT. The first is used to convert a string to the proper enum. I think there are some other places that used NY_QUICKSORT directly.

numpy/core/src/npysort/radixsort.c.src

andravin

Might need to zero-initialize cnt array before use.

numpy/core/src/npysort/radixsort.c.src

hameerabbasi · 2018-12-25T15:44:39Z

Okay, while this works, it's massively slower than quicksort (by about a factor of 3). I went back to the reference repo: https://github.com/eloj/radix-sorting

And it seems he's doing two things different:

Multiple passes for 64-bit and more, along with a bit of OOP to decide the key function. Can be simulated by function pointer, but I highly doubt it's any faster/slower than what we're doing.
In his benchmark, he excludes the cost of allocating the auxiliary array.

mattip · 2018-12-25T15:50:32Z

allocating the auxiliary array

Can you do this once for each int dtype and store it somewhere?

hameerabbasi · 2018-12-25T15:51:59Z

Can you do this once for each int dtype and store it somewhere?

It's the exact same size as the input array, so unfortunately not.

EDIT: Here's the original implementation. Everything besides the auxiliary array is on the stack, so we avoid allocations.

andravin · 2018-12-25T22:24:48Z

Can you do this once for each int dtype and store it somewhere?

It's the exact same size as the input array, so unfortunately not.

EDIT: Here's the original implementation. Everything besides the auxiliary array is on the stack, so we avoid allocations.

Usually we should avoid using a malloc in a numeric function. Often it is better to change the API, as the original implementation you reference does, so that any workspace or output buffer is passed as an argument. The user manages the memory and potentially re-uses buffers instead of allocating them repeatedly.

The next level of abstraction, which hides memory management from the user but still affords reuse, is to wrap such an allocation-free inner function with one that uses a memory pool to allocate workspace/output buffers. This makes memory allocation very fast for frequently used buffer sizes.

Unfortunately, I am not familiar enough with numpy internals to understand if these design patterns are already in use elsewhere.

Also, this is a nice C/C++ idiom for zero-initialization of a heap allocated array:

size_t cnt0[256] = { 0 };

It might generate the same code as memset, but it is much cleaner.

[edited for clarity]

andravin · 2018-12-25T22:42:30Z

Would be nice to have multi-core support so that all available memory bandwidth can be utilized.

Would be nice to try 11-bit (2048 entry) counters in radix-sort instead of 8-bit (256 entry). These require 3 passes for 32-bit keys instead of 4, but still fit in L1 cache.

Would be nice to have integer and float merge-sort that use SIMD instructions and compare with radix-sort.

This is a lot of stuff, I am just putting it is on the radar. Cheers.

hameerabbasi · 2018-12-26T11:28:28Z

The next level of abstraction, which hides memory management from the user but still affords reuse, is to wrap such an allocation-free inner function with one that uses a memory pool to allocate workspace/output buffers. This makes memory allocation very fast for frequently used buffer sizes.

From what I understand the pymalloc allocator already does this. @mattip, do we have the PyObjectArenaAllocator held somewhere or would I need to use the getter function?

EDIT: Transposing two loops and getting rid of some unnecessary arrays leads to a huge speedup: 2x over quicksort on a 10 ** 6-sized array.

I'll also add in some other optimisations from this implementation, which is the one the author uses in the benchmark.

hameerabbasi · 2018-12-26T11:52:51Z

Usually we should avoid using a malloc in a numeric function. Often it is better to change the API, as the original implementation you reference does, so that any workspace or output buffer is passed as an argument. The user manages the memory and potentially re-uses buffers instead of allocating them repeatedly.

I'm not sure how much effort this would take, but the user doesn't usually manage buffers in NumPy, the exception being out kwargs, which are rarely used. I agree with the mixing sentiment, I was hoping to avoid this myself, but the sorting code is written in a way that would disallow this. The NOT_USED pointer is a PyArray... which means it can't be used for passing in the auxiliary array. I would need to change all sorts to change this.

Would be nice to have multi-core support so that all available memory bandwidth can be utilized.

In general, NumPy doesn't use multi-core anywhere in its codebase.

Would be nice to try 11-bit (2048 entry) counters in radix-sort instead of 8-bit (256 entry). These require 3 passes for 32-bit keys instead of 4, but still fit in L1 cache.

Apparently, the author of the original repo tried this and found it wasn't all it was hyped up to be. I'm assuming it has to do with byte alignment being more efficient.

Would be nice to have integer and float merge-sort that use SIMD instructions and compare with radix-sort.

This would either require different compiler flags or ASM code, which has no precedent in NumPy.

Also, this is a nice C/C++ idiom for zero-initialization of a heap allocated array:

size_t cnt0[256] = { 0 };

Done!

jakirkham · 2018-12-26T21:00:31Z

@jschueller probably can give you some tips on how to structure the C code to get the compiler to make use of SIMD instructions.

hameerabbasi · 2018-12-26T21:22:04Z

I added more optimisations:

It sorts by just the columns needed instead of all of them.
It terminates early on already sorted.
Bit twiddling instead of arithmetic should make operations faster.

~~The problem is somehow these don't work currently. :/~~

Edit: They do work. 😄 Apparently, Jupyter Lab needs a python setup.py build_ext -i before it imports the most recent version, whereas runtests.py has its own version.

Edit 2: CI is broken because of warnings, tests pass.

andravin · 2018-12-26T22:26:09Z

@jschueller probably can give you some tips on how to structure the C code to get the compiler to make use of SIMD instructions.

After some digging I found https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/simd.inc.src

/*
 * This file is for the definitions of simd vectorized operations.
 *
 * Currently contains sse2 functions that are built on amd64, x32 or
 * non-generic builds (CFLAGS=-march=...)
 * In future it may contain other instruction sets like AVX or NEON detected
 * at runtime in which case it needs to be included indirectly via a file
 * compiled with special options (or use gcc target attributes) so the binary
 * stays portable.
 */

It uses the Intel intrinsics, which are basically a C API for SIMD instructions:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/

Here is a paper that describes how to sort numbers using the 128-bit SSE intrinsics:
http://www.vldb.org/pvldb/1/1454171.pdf

So SIMD accelerated merge-sort appears possible by adding the vector sort primitives documented in the paper to umath/simd.inc.src.

andravin · 2018-12-26T23:12:12Z

Would be nice to have multi-core support so that all available memory bandwidth can be utilized.

In general, NumPy doesn't use multi-core anywhere in its codebase.

I see. But NumPy already depends on multi-threaded BLAS libraries. I actually don't see why NumPy would not also enable multi-threading elsewhere using the same OpenMP style environment variables that BLAS libraries use for threading control. Default to 1 thread for backwards compatibility (otherwise users who have already built multi-processing into their applications will be impacted).

Edit: #11826 seeks a numpy API to control the number of BLAS threads, so that applications that are tuned for multi-processing can force it to 1 (the trick is there is no standard BLAS API for setting the number of threads, and there is no way to know which BLAS library that numpy is using). Of course this same API could also control the number of threads available to any numpy internal functions. I have yet to see that the lack of threading in numpy internals is a policy, rather historically most of the heavy lifting is deferred to BLAS libraries which are themselves multi-threaded.

hameerabbasi · 2018-12-27T13:06:31Z

cc @mattip I think this is ready, a review would be awesome. I have to update docs and changelog, but otherwise this is ready.

hameerabbasi · 2018-12-27T13:20:54Z

The test failures don't seem to be related to this PR, as a best guess.

hameerabbasi · 2019-04-08T17:19:48Z

Benchmarks above. :-) It'd be nice to have it as an option, I agree. But the issue is, it's impossible to add a sort without breaking the ABI, currently this PR replaces mergesort (which is now timsort) for relevant dtypes.

hameerabbasi · 2019-04-08T17:42:22Z

(I still have the branch around, just doing some housekeeping on PRs, I'd be willing to work on this if NumPy decided their ABI was okay to break)

rgommers · 2019-04-08T18:07:18Z

I'd be willing to work on this if NumPy decided their ABI was okay to break)

not for this I'd say, however you can turn it around too - when we eventually break the ABI then that's a good moment to add radixsort

ahaldane · 2019-04-08T18:58:49Z

I added it to the list: https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release

charris · 2019-04-08T20:13:48Z

A test that seems to be missing from the benchmarks is a simple random array. I noted that when I put them in, so maybe we should add one. The current tests are for arrays that are already ordered to some extent, which is what timsort is designed to deal with.

hameerabbasi · 2019-04-08T20:32:26Z

A test that seems to be missing from the benchmarks is a simple random array. I noted that when I put them in, so maybe we should add one. The current tests are for arrays that are already ordered to some extent, which is what timsort is designed to deal with.

Opened a PR for this, #13287.

hameerabbasi · 2019-04-09T10:26:01Z

Here are the benchmarks for randomly-shuffled arrays:

This branch:

               merge   int16            ('random',)              157±0.6μs
               merge   int32            ('random',)               168±2μs
               merge   int64            ('random',)               200±4μs
               merge   int16            ('random',)              51.7±0.2μs
               merge   int32            ('random',)               99.2±6μs
               merge   int64            ('random',)              202±0.5μs
               merge   int16            ('random',)              10.5±0.1μs
               merge   int32            ('random',)              10.6±0.3μs
               merge   int64            ('random',)              8.96±0.4μs

master

               merge   int16            ('random',)               900±5μs
               merge   int32            ('random',)               893±20μs
               merge   int64            ('random',)               903±5μs
               merge   int16            ('random',)               507±9μs
               merge   int32            ('random',)               502±4μs
               merge   int64            ('random',)               522±6μs
               merge   int16            ('random',)              11.1±0.1μs
               merge   int32            ('random',)              9.74±0.1μs
               merge   int64            ('random',)             10.8±0.04μs

Which proves that radix sort is faster for truly random data (reversed is its worst case).

Now, there are two ways to look at this:

Statistically, (assuming a randomly shuffled array), most arrays have little to no sorted area. (favors this PR)
Practically, it could be that arrays aren't really randomly shuffled, and instead have sorted areas. (favors timsort)

Which is true? There is absolutely no way to know without some usage data. My instinct relies on the latter, as I think stable sorts are usually used to sort in succession (or preserve some kind of partial order), and so do have sorted areas.

In any case, this should be discussed.

hameerabbasi · 2019-04-09T10:37:27Z

Here's the full set of benchmarks:

This branch:

python runtests.py --bench sort
Building, see build.log...
Build OK
· No executable found for python 3.6
· Discovering benchmarks
· Running 4 total benchmarks (1 commits * 1 environments * 4 benchmarks)
[  0.00%] ·· Benchmarking existing-py_Users_hameerabbasi_anaconda_envs_numpy_bin_python
[ 12.50%] ··· Running (bench_function_base.Sort.time_argsort--)...
[ 50.00%] ··· Running (bench_function_base.Sort.time_sort_worst--).
[ 62.50%] ··· bench_function_base.Sort.time_argsort                           ok
[ 62.50%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge   int16            ('random',)               152±2μs
               merge   int16            ('ordered',)             14.2±0.3μs
               merge   int16           ('reversed',)             138±0.4μs
               merge   int16            ('uniform',)            13.9±0.08μs
               merge   int16        ('sorted_block', 10)          183±2μs
               merge   int16       ('sorted_block', 100)          174±2μs
               merge   int16       ('sorted_block', 1000)         135±1μs
               merge   int16       ('swapped_pair', 0.01)         142±3μs
               merge   int16       ('swapped_pair', 0.1)         147±0.2μs
               merge   int16       ('swapped_pair', 0.5)          153±2μs
               merge   int16   ('random_unsorted_area', 0.5)      137±1μs
               merge   int16   ('random_unsorted_area', 0.1)      144±2μs
               merge   int16   ('random_unsorted_area', 0.01)     148±1μs
               merge   int16        ('random_bubble', 1)          146±4μs
               merge   int16        ('random_bubble', 5)         140±0.4μs
               merge   int16       ('random_bubble', 10)          144±3μs
               merge   int32            ('random',)               164±1μs
               merge   int32            ('ordered',)            16.2±0.08μs
               merge   int32           ('reversed',)              155±5μs
               merge   int32            ('uniform',)             15.9±0.1μs
               merge   int32        ('sorted_block', 10)          211±9μs
               merge   int32       ('sorted_block', 100)          187±5μs
               merge   int32       ('sorted_block', 1000)         148±2μs
               merge   int32       ('swapped_pair', 0.01)        148±0.8μs
               merge   int32       ('swapped_pair', 0.1)          162±3μs
               merge   int32       ('swapped_pair', 0.5)          164±1μs
               merge   int32   ('random_unsorted_area', 0.5)      152±4μs
               merge   int32   ('random_unsorted_area', 0.1)      147±2μs
               merge   int32   ('random_unsorted_area', 0.01)     158±4μs
               merge   int32        ('random_bubble', 1)         155±0.8μs
               merge   int32        ('random_bubble', 5)          153±2μs
               merge   int32       ('random_bubble', 10)          149±3μs
               merge   int64            ('random',)               210±8μs
               merge   int64            ('ordered',)             16.2±0.2μs
               merge   int64           ('reversed',)              184±2μs
               merge   int64            ('uniform',)             16.7±0.2μs
               merge   int64        ('sorted_block', 10)          236±6μs
               merge   int64       ('sorted_block', 100)          217±1μs
               merge   int64       ('sorted_block', 1000)         180±1μs
               merge   int64       ('swapped_pair', 0.01)         195±7μs
               merge   int64       ('swapped_pair', 0.1)          194±4μs
               merge   int64       ('swapped_pair', 0.5)          191±7μs
               merge   int64   ('random_unsorted_area', 0.5)     186±0.6μs
               merge   int64   ('random_unsorted_area', 0.1)      190±5μs
               merge   int64   ('random_unsorted_area', 0.01)    194±0.6μs
               merge   int64        ('random_bubble', 1)          196±4μs
               merge   int64        ('random_bubble', 5)          193±3μs
               merge   int64       ('random_bubble', 10)          196±3μs
              ======= ======= ================================ =============

[ 75.00%] ··· bench_function_base.Sort.time_sort                              ok
[ 75.00%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge   int16            ('random',)               52.0±1μs
               merge   int16            ('ordered',)             14.0±0.3μs
               merge   int16           ('reversed',)             53.9±0.3μs
               merge   int16            ('uniform',)             14.4±0.1μs
               merge   int16        ('sorted_block', 10)          50.7±1μs
               merge   int16       ('sorted_block', 100)         52.2±0.9μs
               merge   int16       ('sorted_block', 1000)        54.5±0.8μs
               merge   int16       ('swapped_pair', 0.01)        54.6±0.2μs
               merge   int16       ('swapped_pair', 0.1)         53.4±0.5μs
               merge   int16       ('swapped_pair', 0.5)         51.1±0.5μs
               merge   int16   ('random_unsorted_area', 0.5)      52.5±1μs
               merge   int16   ('random_unsorted_area', 0.1)     57.0±0.7μs
               merge   int16   ('random_unsorted_area', 0.01)    63.0±0.7μs
               merge   int16        ('random_bubble', 1)          63.1±2μs
               merge   int16        ('random_bubble', 5)         57.7±0.7μs
               merge   int16       ('random_bubble', 10)          55.9±1μs
               merge   int32            ('random',)               94.3±5μs
               merge   int32            ('ordered',)             15.2±0.1μs
               merge   int32           ('reversed',)              123±10μs
               merge   int32            ('uniform',)             14.9±0.3μs
               merge   int32        ('sorted_block', 10)          131±2μs
               merge   int32       ('sorted_block', 100)          129±3μs
               merge   int32       ('sorted_block', 1000)         120±10μs
               merge   int32       ('swapped_pair', 0.01)        108±0.4μs
               merge   int32       ('swapped_pair', 0.1)          119±8μs
               merge   int32       ('swapped_pair', 0.5)         100±0.3μs
               merge   int32   ('random_unsorted_area', 0.5)      122±10μs
               merge   int32   ('random_unsorted_area', 0.1)      114±2μs
               merge   int32   ('random_unsorted_area', 0.01)     120±2μs
               merge   int32        ('random_bubble', 1)          118±2μs
               merge   int32        ('random_bubble', 5)          113±1μs
               merge   int32       ('random_bubble', 10)          113±2μs
               merge   int64            ('random',)               199±4μs
               merge   int64            ('ordered',)            14.4±0.06μs
               merge   int64           ('reversed',)              179±5μs
               merge   int64            ('uniform',)             15.0±0.2μs
               merge   int64        ('sorted_block', 10)          234±4μs
               merge   int64       ('sorted_block', 100)          221±9μs
               merge   int64       ('sorted_block', 1000)         177±3μs
               merge   int64       ('swapped_pair', 0.01)         179±2μs
               merge   int64       ('swapped_pair', 0.1)          189±3μs
               merge   int64       ('swapped_pair', 0.5)         194±0.6μs
               merge   int64   ('random_unsorted_area', 0.5)      182±2μs
               merge   int64   ('random_unsorted_area', 0.1)     177±0.9μs
               merge   int64   ('random_unsorted_area', 0.01)     181±2μs
               merge   int64        ('random_bubble', 1)          189±2μs
               merge   int64        ('random_bubble', 5)          185±6μs
               merge   int64       ('random_bubble', 10)          182±5μs
              ======= ======= ================================ =============

[ 87.50%] ··· bench_function_base.Sort.time_sort_inplace                      ok
[ 87.50%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge   int16            ('random',)             10.7±0.07μs
               merge   int16            ('ordered',)            10.4±0.07μs
               merge   int16           ('reversed',)             10.7±0.2μs
               merge   int16            ('uniform',)             10.3±0.1μs
               merge   int16        ('sorted_block', 10)        10.4±0.09μs
               merge   int16       ('sorted_block', 100)         10.5±0.3μs
               merge   int16       ('sorted_block', 1000)        10.2±0.1μs
               merge   int16       ('swapped_pair', 0.01)       10.4±0.07μs
               merge   int16       ('swapped_pair', 0.1)         10.6±0.2μs
               merge   int16       ('swapped_pair', 0.5)         10.6±0.2μs
               merge   int16   ('random_unsorted_area', 0.5)     10.7±0.4μs
               merge   int16   ('random_unsorted_area', 0.1)    10.4±0.03μs
               merge   int16   ('random_unsorted_area', 0.01)    10.7±0.1μs
               merge   int16        ('random_bubble', 1)         10.7±0.2μs
               merge   int16        ('random_bubble', 5)         10.3±0.2μs
               merge   int16       ('random_bubble', 10)         10.5±0.2μs
               merge   int32            ('random',)              10.7±0.2μs
               merge   int32            ('ordered',)            10.5±0.04μs
               merge   int32           ('reversed',)            10.5±0.07μs
               merge   int32            ('uniform',)             10.7±0.1μs
               merge   int32        ('sorted_block', 10)        10.4±0.03μs
               merge   int32       ('sorted_block', 100)         10.9±0.2μs
               merge   int32       ('sorted_block', 1000)       10.5±0.04μs
               merge   int32       ('swapped_pair', 0.01)        10.6±0.2μs
               merge   int32       ('swapped_pair', 0.1)         10.7±0.3μs
               merge   int32       ('swapped_pair', 0.5)         10.5±0.1μs
               merge   int32   ('random_unsorted_area', 0.5)     10.8±0.2μs
               merge   int32   ('random_unsorted_area', 0.1)     10.6±0.2μs
               merge   int32   ('random_unsorted_area', 0.01)    10.8±0.4μs
               merge   int32        ('random_bubble', 1)         10.6±0.1μs
               merge   int32        ('random_bubble', 5)         10.5±0.1μs
               merge   int32       ('random_bubble', 10)         10.7±0.1μs
               merge   int64            ('random',)              8.48±0.2μs
               merge   int64            ('ordered',)            8.30±0.05μs
               merge   int64           ('reversed',)             8.63±0.1μs
               merge   int64            ('uniform',)             8.36±0.2μs
               merge   int64        ('sorted_block', 10)        8.50±0.05μs
               merge   int64       ('sorted_block', 100)        8.48±0.09μs
               merge   int64       ('sorted_block', 1000)        8.64±0.2μs
               merge   int64       ('swapped_pair', 0.01)       8.44±0.08μs
               merge   int64       ('swapped_pair', 0.1)        8.46±0.02μs
               merge   int64       ('swapped_pair', 0.5)         9.47±0.1μs
               merge   int64   ('random_unsorted_area', 0.5)     8.51±0.2μs
               merge   int64   ('random_unsorted_area', 0.1)     8.94±0.5μs
               merge   int64   ('random_unsorted_area', 0.01)   8.50±0.05μs
               merge   int64        ('random_bubble', 1)         8.45±0.2μs
               merge   int64        ('random_bubble', 5)        8.32±0.05μs
               merge   int64       ('random_bubble', 10)        8.35±0.07μs
              ======= ======= ================================ =============

[100.00%] ··· bench_function_base.Sort.time_sort_worst                  96.1±2ms

master:

python runtests.py --bench sort
Building, see build.log...
Build OK
· No executable found for python 3.6
· Discovering benchmarks
· Running 4 total benchmarks (1 commits * 1 environments * 4 benchmarks)
[  0.00%] ·· Benchmarking existing-py_Users_hameerabbasi_anaconda_envs_numpy_bin_python
[ 12.50%] ··· Running (bench_function_base.Sort.time_argsort--)...
[ 50.00%] ··· Running (bench_function_base.Sort.time_sort_worst--).
[ 62.50%] ··· bench_function_base.Sort.time_argsort                           ok
[ 62.50%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge   int16            ('random',)               883±3μs
               merge   int16            ('ordered',)             16.2±0.1μs
               merge   int16           ('reversed',)             21.4±0.5μs
               merge   int16            ('uniform',)             15.8±0.1μs
               merge   int16        ('sorted_block', 10)          207±4μs
               merge   int16       ('sorted_block', 100)         146±0.4μs
               merge   int16       ('sorted_block', 1000)        78.1±0.9μs
               merge   int16       ('swapped_pair', 0.01)        133±0.4μs
               merge   int16       ('swapped_pair', 0.1)          364±5μs
               merge   int16       ('swapped_pair', 0.5)          727±3μs
               merge   int16   ('random_unsorted_area', 0.5)      172±2μs
               merge   int16   ('random_unsorted_area', 0.1)     35.6±0.1μs
               merge   int16   ('random_unsorted_area', 0.01)   18.4±0.04μs
               merge   int16        ('random_bubble', 1)         17.8±0.3μs
               merge   int16        ('random_bubble', 5)         24.7±0.2μs
               merge   int16       ('random_bubble', 10)         36.1±0.2μs
               merge   int32            ('random',)               885±20μs
               merge   int32            ('ordered',)            15.9±0.05μs
               merge   int32           ('reversed',)             21.8±0.4μs
               merge   int32            ('uniform',)             16.4±0.4μs
               merge   int32        ('sorted_block', 10)          207±7μs
               merge   int32       ('sorted_block', 100)          134±1μs
               merge   int32       ('sorted_block', 1000)        74.5±0.5μs
               merge   int32       ('swapped_pair', 0.01)         140±5μs
               merge   int32       ('swapped_pair', 0.1)          366±5μs
               merge   int32       ('swapped_pair', 0.5)          733±5μs
               merge   int32   ('random_unsorted_area', 0.5)      177±3μs
               merge   int32   ('random_unsorted_area', 0.1)      41.1±1μs
               merge   int32   ('random_unsorted_area', 0.01)    18.6±0.2μs
               merge   int32        ('random_bubble', 1)         18.3±0.3μs
               merge   int32        ('random_bubble', 5)         26.7±0.1μs
               merge   int32       ('random_bubble', 10)         41.4±0.9μs
               merge   int64            ('random',)               877±5μs
               merge   int64            ('ordered',)            16.6±0.09μs
               merge   int64           ('reversed',)             21.5±0.8μs
               merge   int64            ('uniform',)             16.7±0.3μs
               merge   int64        ('sorted_block', 10)          202±4μs
               merge   int64       ('sorted_block', 100)         134±0.9μs
               merge   int64       ('sorted_block', 1000)         72.7±1μs
               merge   int64       ('swapped_pair', 0.01)         138±2μs
               merge   int64       ('swapped_pair', 0.1)          366±7μs
               merge   int64       ('swapped_pair', 0.5)          773±8μs
               merge   int64   ('random_unsorted_area', 0.5)      174±2μs
               merge   int64   ('random_unsorted_area', 0.1)     40.0±0.8μs
               merge   int64   ('random_unsorted_area', 0.01)   18.1±0.08μs
               merge   int64        ('random_bubble', 1)         18.8±0.3μs
               merge   int64        ('random_bubble', 5)         24.7±0.4μs
               merge   int64       ('random_bubble', 10)         40.4±0.8μs
              ======= ======= ================================ =============

[ 75.00%] ··· bench_function_base.Sort.time_sort                              ok
[ 75.00%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge   int16            ('random',)               502±6μs
               merge   int16            ('ordered',)             14.5±0.2μs
               merge   int16           ('reversed',)            18.8±0.05μs
               merge   int16            ('uniform',)            14.4±0.06μs
               merge   int16        ('sorted_block', 10)          337±6μs
               merge   int16       ('sorted_block', 100)          255±5μs
               merge   int16       ('sorted_block', 1000)        141±0.8μs
               merge   int16       ('swapped_pair', 0.01)         256±2μs
               merge   int16       ('swapped_pair', 0.1)          390±2μs
               merge   int16       ('swapped_pair', 0.5)          490±8μs
               merge   int16   ('random_unsorted_area', 0.5)      136±1μs
               merge   int16   ('random_unsorted_area', 0.1)     36.2±0.6μs
               merge   int16   ('random_unsorted_area', 0.01)    17.1±0.2μs
               merge   int16        ('random_bubble', 1)         16.8±0.3μs
               merge   int16        ('random_bubble', 5)         25.0±0.5μs
               merge   int16       ('random_bubble', 10)         35.4±0.5μs
               merge   int32            ('random',)               515±10μs
               merge   int32            ('ordered',)            13.8±0.09μs
               merge   int32           ('reversed',)             20.3±0.2μs
               merge   int32            ('uniform',)             14.6±0.4μs
               merge   int32        ('sorted_block', 10)          341±2μs
               merge   int32       ('sorted_block', 100)          258±7μs
               merge   int32       ('sorted_block', 1000)        145±0.8μs
               merge   int32       ('swapped_pair', 0.01)         259±2μs
               merge   int32       ('swapped_pair', 0.1)          390±10μs
               merge   int32       ('swapped_pair', 0.5)          497±5μs
               merge   int32   ('random_unsorted_area', 0.5)     140±0.2μs
               merge   int32   ('random_unsorted_area', 0.1)     35.5±0.3μs
               merge   int32   ('random_unsorted_area', 0.01)    16.5±0.2μs
               merge   int32        ('random_bubble', 1)        16.0±0.09μs
               merge   int32        ('random_bubble', 5)         25.5±0.6μs
               merge   int32       ('random_bubble', 10)         36.4±0.2μs
               merge   int64            ('random',)               526±10μs
               merge   int64            ('ordered',)             17.6±0.3μs
               merge   int64           ('reversed',)            21.9±0.09μs
               merge   int64            ('uniform',)             17.4±0.2μs
               merge   int64        ('sorted_block', 10)          321±3μs
               merge   int64       ('sorted_block', 100)          262±4μs
               merge   int64       ('sorted_block', 1000)         149±4μs
               merge   int64       ('swapped_pair', 0.01)         275±5μs
               merge   int64       ('swapped_pair', 0.1)          388±4μs
               merge   int64       ('swapped_pair', 0.5)          505±6μs
               merge   int64   ('random_unsorted_area', 0.5)      142±1μs
               merge   int64   ('random_unsorted_area', 0.1)     40.4±0.5μs
               merge   int64   ('random_unsorted_area', 0.01)    19.3±0.5μs
               merge   int64        ('random_bubble', 1)         19.3±0.2μs
               merge   int64        ('random_bubble', 5)         27.3±0.2μs
               merge   int64       ('random_bubble', 10)         40.9±0.3μs
              ======= ======= ================================ =============

[ 87.50%] ··· bench_function_base.Sort.time_sort_inplace                      ok
[ 87.50%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge   int16            ('random',)              11.0±0.1μs
               merge   int16            ('ordered',)             10.7±0.2μs
               merge   int16           ('reversed',)             10.5±0.2μs
               merge   int16            ('uniform',)            10.6±0.08μs
               merge   int16        ('sorted_block', 10)         10.7±0.1μs
               merge   int16       ('sorted_block', 100)         10.8±0.3μs
               merge   int16       ('sorted_block', 1000)       10.3±0.05μs
               merge   int16       ('swapped_pair', 0.01)       10.9±0.08μs
               merge   int16       ('swapped_pair', 0.1)        10.6±0.08μs
               merge   int16       ('swapped_pair', 0.5)        10.8±0.08μs
               merge   int16   ('random_unsorted_area', 0.5)    10.3±0.09μs
               merge   int16   ('random_unsorted_area', 0.1)     10.5±0.2μs
               merge   int16   ('random_unsorted_area', 0.01)    10.4±0.2μs
               merge   int16        ('random_bubble', 1)        10.6±0.05μs
               merge   int16        ('random_bubble', 5)        10.5±0.08μs
               merge   int16       ('random_bubble', 10)         10.3±0.2μs
               merge   int32            ('random',)             9.64±0.04μs
               merge   int32            ('ordered',)             9.55±0.5μs
               merge   int32           ('reversed',)             9.72±0.5μs
               merge   int32            ('uniform',)            9.17±0.06μs
               merge   int32        ('sorted_block', 10)        9.92±0.08μs
               merge   int32       ('sorted_block', 100)         9.95±0.5μs
               merge   int32       ('sorted_block', 1000)        9.53±0.2μs
               merge   int32       ('swapped_pair', 0.01)       9.58±0.06μs
               merge   int32       ('swapped_pair', 0.1)        10.1±0.08μs
               merge   int32       ('swapped_pair', 0.5)        9.66±0.06μs
               merge   int32   ('random_unsorted_area', 0.5)     9.71±0.1μs
               merge   int32   ('random_unsorted_area', 0.1)    9.20±0.05μs
               merge   int32   ('random_unsorted_area', 0.01)   9.62±0.03μs
               merge   int32        ('random_bubble', 1)         9.63±0.5μs
               merge   int32        ('random_bubble', 5)        9.65±0.06μs
               merge   int32       ('random_bubble', 10)         9.23±0.1μs
               merge   int64            ('random',)             11.4±0.03μs
               merge   int64            ('ordered',)            10.4±0.06μs
               merge   int64           ('reversed',)             10.6±0.4μs
               merge   int64            ('uniform',)            10.3±0.02μs
               merge   int64        ('sorted_block', 10)         10.8±0.2μs
               merge   int64       ('sorted_block', 100)         11.0±0.2μs
               merge   int64       ('sorted_block', 1000)        10.8±0.3μs
               merge   int64       ('swapped_pair', 0.01)        10.8±0.2μs
               merge   int64       ('swapped_pair', 0.1)         10.9±0.2μs
               merge   int64       ('swapped_pair', 0.5)        10.9±0.04μs
               merge   int64   ('random_unsorted_area', 0.5)     10.8±0.2μs
               merge   int64   ('random_unsorted_area', 0.1)     10.5±0.2μs
               merge   int64   ('random_unsorted_area', 0.01)   10.4±0.07μs
               merge   int64        ('random_bubble', 1)        10.3±0.08μs
               merge   int64        ('random_bubble', 5)         10.4±0.1μs
               merge   int64       ('random_bubble', 10)        10.8±0.09μs
              ======= ======= ================================ =============

[100.00%] ··· bench_function_base.Sort.time_sort_worst                  99.6±2ms

shoyer · 2019-04-09T15:42:25Z

My instinct relies on the latter, as I think stable sorts are usually used to sort in succession (or preserve some kind of partial order), and so do have sorted areas.

I also think this is most likely, so the best default is probably timsort.

That said, I do really like the idea of having an option for radix sort. It's too bad that this seems to break C API compatibility (I don't really understand the nuance of that yet).

charris · 2019-04-09T15:47:04Z

We might also want to test int8. I don't know how often people sort that type, but it would seem to offer the best chance of good performance for radixsort. We should also fix the problems with the PyArray_ArrFuncs structure being setup by downstream users, which is why we cannot just add entries to the end like we do for other structures. See the discussion at #12945.

shoyer · 2019-04-09T15:54:55Z

It looks like there are already some compelling speedups, e.g., by up to 10x for random int16 data. I don't think we need to test it on random int8 data to believe this would speed it up.

hameerabbasi · 2019-04-09T17:27:26Z

not for this I'd say, however you can turn it around too - when we eventually break the ABI then that's a good moment to add radixsort

That's what I meant... I couldn't articulate it too well.

Becnhmarks (just for int8):

This PR:

python runtests.py --bench sort
Building, see build.log...
Build OK
· No executable found for python 3.6
· Discovering benchmarks
· Running 4 total benchmarks (1 commits * 1 environments * 4 benchmarks)
[  0.00%] ·· Benchmarking existing-py_Users_hameerabbasi_anaconda_envs_numpy_bin_python
[ 12.50%] ··· Running (bench_function_base.Sort.time_argsort--)...
[ 50.00%] ··· Running (bench_function_base.Sort.time_sort_worst--).
[ 62.50%] ··· bench_function_base.Sort.time_argsort                           ok
[ 62.50%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge    int8            ('random',)              7.66±0.7ms
               merge    int8            ('ordered',)            6.54±0.08ms
               merge    int8           ('reversed',)             6.77±0.4ms
               merge    int8            ('uniform',)             1.86±0.1ms
               merge    int8        ('sorted_block', 10)         8.39±0.7ms
               merge    int8       ('sorted_block', 100)         7.09±0.8ms
               merge    int8       ('sorted_block', 1000)        7.26±0.9ms
               merge    int8       ('swapped_pair', 0.01)        7.00±0.3ms
               merge    int8       ('swapped_pair', 0.1)         7.61±0.6ms
               merge    int8       ('swapped_pair', 0.5)         7.44±0.4ms
               merge    int8   ('random_unsorted_area', 0.5)     7.75±0.4ms
               merge    int8   ('random_unsorted_area', 0.1)     7.39±0.2ms
               merge    int8   ('random_unsorted_area', 0.01)    6.64±0.1ms
               merge    int8        ('random_bubble', 1)         6.82±0.3ms
               merge    int8        ('random_bubble', 5)        6.46±0.07ms
               merge    int8       ('random_bubble', 10)        6.56±0.07ms
              ======= ======= ================================ =============

[ 75.00%] ··· bench_function_base.Sort.time_sort                              ok
[ 75.00%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge    int8            ('random',)             2.51±0.04ms
               merge    int8            ('ordered',)            2.41±0.08ms
               merge    int8           ('reversed',)             2.38±0.1ms
               merge    int8            ('uniform',)              774±40μs
               merge    int8        ('sorted_block', 10)        2.39±0.08ms
               merge    int8       ('sorted_block', 100)        2.36±0.04ms
               merge    int8       ('sorted_block', 1000)       2.30±0.07ms
               merge    int8       ('swapped_pair', 0.01)        2.51±0.1ms
               merge    int8       ('swapped_pair', 0.1)        2.51±0.07ms
               merge    int8       ('swapped_pair', 0.5)        2.55±0.06ms
               merge    int8   ('random_unsorted_area', 0.5)    2.60±0.05ms
               merge    int8   ('random_unsorted_area', 0.1)    2.64±0.06ms
               merge    int8   ('random_unsorted_area', 0.01)   2.56±0.06ms
               merge    int8        ('random_bubble', 1)         2.49±0.1ms
               merge    int8        ('random_bubble', 5)        2.52±0.01ms
               merge    int8       ('random_bubble', 10)        2.53±0.02ms
              ======= ======= ================================ =============

[ 87.50%] ··· bench_function_base.Sort.time_sort_inplace                      ok
[ 87.50%] ··· ======= ======= ================================ ==========
                kind   dtype             array_type
              ------- ------- -------------------------------- ----------
               merge    int8            ('random',)             830±9μs
               merge    int8            ('ordered',)            815±8μs
               merge    int8           ('reversed',)            814±20μs
               merge    int8            ('uniform',)            697±9μs
               merge    int8        ('sorted_block', 10)        816±7μs
               merge    int8       ('sorted_block', 100)        810±10μs
               merge    int8       ('sorted_block', 1000)       790±5μs
               merge    int8       ('swapped_pair', 0.01)       831±20μs
               merge    int8       ('swapped_pair', 0.1)        820±9μs
               merge    int8       ('swapped_pair', 0.5)        823±8μs
               merge    int8   ('random_unsorted_area', 0.5)    815±3μs
               merge    int8   ('random_unsorted_area', 0.1)    807±5μs
               merge    int8   ('random_unsorted_area', 0.01)   805±9μs
               merge    int8        ('random_bubble', 1)        824±20μs
               merge    int8        ('random_bubble', 5)        836±6μs
               merge    int8       ('random_bubble', 10)        829±10μs
              ======= ======= ================================ ==========

[100.00%] ··· bench_function_base.Sort.time_sort_worst                   102±3ms

master:

python runtests.py --bench sort
Building, see build.log...
Build OK
· No executable found for python 3.6
· Discovering benchmarks
· Running 4 total benchmarks (1 commits * 1 environments * 4 benchmarks)
[  0.00%] ·· Benchmarking existing-py_Users_hameerabbasi_anaconda_envs_numpy_bin_python
[ 12.50%] ··· Running (bench_function_base.Sort.time_argsort--)..
[ 37.50%] ··· Running (bench_function_base.Sort.time_sort_inplace--)..
[ 62.50%] ··· bench_function_base.Sort.time_argsort                           ok
[ 62.50%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge    int8            ('random',)               94.6±2ms
               merge    int8            ('ordered',)              30.0±1ms
               merge    int8           ('reversed',)             30.2±0.6ms
               merge    int8            ('uniform',)            2.09±0.02ms
               merge    int8        ('sorted_block', 10)         64.8±0.8ms
               merge    int8       ('sorted_block', 100)         55.9±0.7ms
               merge    int8       ('sorted_block', 1000)        57.2±0.4ms
               merge    int8       ('swapped_pair', 0.01)        45.7±0.2ms
               merge    int8       ('swapped_pair', 0.1)          70.3±1ms
               merge    int8       ('swapped_pair', 0.5)          91.3±2ms
               merge    int8   ('random_unsorted_area', 0.5)     53.5±0.8ms
               merge    int8   ('random_unsorted_area', 0.1)     36.4±0.3ms
               merge    int8   ('random_unsorted_area', 0.01)    30.9±0.6ms
               merge    int8        ('random_bubble', 1)         30.3±0.6ms
               merge    int8        ('random_bubble', 5)          29.4±1ms
               merge    int8       ('random_bubble', 10)         30.4±0.7ms
              ======= ======= ================================ =============

[ 75.00%] ··· bench_function_base.Sort.time_sort                              ok
[ 75.00%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge    int8            ('random',)              83.2±0.5ms
               merge    int8            ('ordered',)             29.1±0.2ms
               merge    int8           ('reversed',)             29.3±0.2ms
               merge    int8            ('uniform',)            1.05±0.02ms
               merge    int8        ('sorted_block', 10)         62.1±0.3ms
               merge    int8       ('sorted_block', 100)         53.3±0.3ms
               merge    int8       ('sorted_block', 1000)        54.1±0.3ms
               merge    int8       ('swapped_pair', 0.01)        41.2±0.2ms
               merge    int8       ('swapped_pair', 0.1)         59.8±0.7ms
               merge    int8       ('swapped_pair', 0.5)         80.8±0.3ms
               merge    int8   ('random_unsorted_area', 0.5)     48.1±0.4ms
               merge    int8   ('random_unsorted_area', 0.1)     34.8±0.3ms
               merge    int8   ('random_unsorted_area', 0.01)    30.2±0.2ms
               merge    int8        ('random_bubble', 1)         29.3±0.7ms
               merge    int8        ('random_bubble', 5)         29.4±0.5ms
               merge    int8       ('random_bubble', 10)         29.6±0.2ms
              ======= ======= ================================ =============

[ 87.50%] ··· bench_function_base.Sort.time_sort_inplace                      ok
[ 87.50%] ··· ======= ======= ================================ =============
                kind   dtype             array_type
              ------- ------- -------------------------------- -------------
               merge    int8            ('random',)             8.43±0.05ms
               merge    int8            ('ordered',)            3.40±0.05ms
               merge    int8           ('reversed',)            3.30±0.01ms
               merge    int8            ('uniform',)              899±6μs
               merge    int8        ('sorted_block', 10)        5.96±0.04ms
               merge    int8       ('sorted_block', 100)        5.27±0.04ms
               merge    int8       ('sorted_block', 1000)       5.76±0.01ms
               merge    int8       ('swapped_pair', 0.01)       4.28±0.03ms
               merge    int8       ('swapped_pair', 0.1)        5.79±0.04ms
               merge    int8       ('swapped_pair', 0.5)        7.50±0.05ms
               merge    int8   ('random_unsorted_area', 0.5)    4.86±0.03ms
               merge    int8   ('random_unsorted_area', 0.1)    3.72±0.04ms
               merge    int8   ('random_unsorted_area', 0.01)   3.31±0.01ms
               merge    int8        ('random_bubble', 1)        3.25±0.02ms
               merge    int8        ('random_bubble', 5)        3.24±0.03ms
               merge    int8       ('random_bubble', 10)        3.49±0.03ms
              ======= ======= ================================ =============

[100.00%] ··· bench_function_base.Sort.time_sort_worst                   100±1ms

So you're not far off. 🙂

hameerabbasi · 2019-04-09T17:47:41Z

That said, I do really like the idea of having an option for radix sort. It's too bad that this seems to break C API compatibility (I don't really understand the nuance of that yet).

Not the API, but rather the ABI. It doesn't mean people have to change their code, but that code compiled for older versions of NumPy will no longer work for newer ones. This is because the positioning of some function pointers changed in a struct.

charris · 2019-04-09T23:20:08Z

Looks to me that a good case can be made for radix sort for at least some of the dtypes.

hameerabbasi · 2019-05-11T04:41:42Z

I have rebased, fixed the doc error/warning, and incorporated @charris's change about using this only for small dtypes.

Testing: - np.int8/np.uint8 - np.int16/np.uint16 - np.int32/np.uint32 - np.int64/np.uint64 - np.float32/np.float64 - np.float16/np.longdouble

charris · 2019-05-12T01:28:27Z

close/reopen

charris · 2019-05-12T03:01:32Z

Thanks Hameer.

jakirkham · 2019-05-12T16:33:54Z

Thanks for working on this Hameer. Glad to see it in.

I missed the last part about limiting radix sort to some smaller types, which types is it being used for?

mattip · 2019-05-12T16:43:17Z

It is this line which will choose radix sort for the first 5 types: npy_bool, npy_byte, npy_ubyte, npy_short, npy_ushort

mattip reviewed Dec 18, 2018

View reviewed changes

numpy/core/src/npysort/radixsort.c.src Outdated Show resolved Hide resolved

mattip reviewed Dec 18, 2018

View reviewed changes

numpy/core/src/npysort/radixsort.c.src Outdated Show resolved Hide resolved

andravin reviewed Dec 19, 2018

View reviewed changes

numpy/core/src/npysort/radixsort.c.src Outdated Show resolved Hide resolved

hameerabbasi commented Dec 24, 2018

View reviewed changes

numpy/core/src/npysort/radixsort.c.src Show resolved Hide resolved

andravin reviewed Dec 24, 2018

View reviewed changes

numpy/core/src/npysort/radixsort.c.src Outdated Show resolved Hide resolved

numpy/core/src/npysort/radixsort.c.src Show resolved Hide resolved

This was referenced Dec 26, 2018

Accelerate coordinate selection via keysort zarr-developers/zarr-python#236

Closed

Cython for radix argsort and fast indexing? pydata/sparse#62

Closed

hameerabbasi changed the title ~~[WIP] Implement radix sort~~ ENH: Implement radix sort Dec 27, 2018

hameerabbasi mentioned this pull request Apr 8, 2019

Add benchmark for sorting random array. #13287

Merged

hameerabbasi reopened this Apr 9, 2019

hameerabbasi force-pushed the radix-sort branch 2 times, most recently from d4824a0 to 512e7dd Compare May 11, 2019 04:40

hameerabbasi added 2 commits May 11, 2019 15:03

ENH: Radix sort

12fb101

TST: Test all dtypes for sort.

7f58678

Testing: - np.int8/np.uint8 - np.int16/np.uint16 - np.int32/np.uint32 - np.int64/np.uint64 - np.float32/np.float64 - np.float16/np.longdouble

hameerabbasi force-pushed the radix-sort branch from 2a4ddb5 to 7f58678 Compare May 11, 2019 22:13

charris closed this May 12, 2019

charris reopened this May 12, 2019

charris merged commit 3bdc915 into numpy:master May 12, 2019

hameerabbasi mentioned this pull request Jul 27, 2020

np.intersect1d is several times faster for return_indices=True #16923

Open

ENH: Implement radix sort #12586

ENH: Implement radix sort #12586

Conversation

hameerabbasi commented Dec 18, 2018 • edited Loading

mattip commented Dec 18, 2018

hameerabbasi commented Dec 18, 2018

andravin commented Dec 18, 2018

hameerabbasi commented Dec 19, 2018

mattip commented Dec 21, 2018

hameerabbasi commented Dec 24, 2018

mattip commented Dec 24, 2018

hameerabbasi commented Dec 24, 2018

mattip commented Dec 24, 2018

andravin left a comment

Choose a reason for hiding this comment

hameerabbasi commented Dec 25, 2018

mattip commented Dec 25, 2018

hameerabbasi commented Dec 25, 2018 • edited Loading

andravin commented Dec 25, 2018 • edited Loading

andravin commented Dec 25, 2018 • edited Loading

hameerabbasi commented Dec 26, 2018 • edited Loading

hameerabbasi commented Dec 26, 2018

jakirkham commented Dec 26, 2018

hameerabbasi commented Dec 26, 2018 • edited Loading

andravin commented Dec 26, 2018

andravin commented Dec 26, 2018 • edited Loading

hameerabbasi commented Dec 27, 2018

hameerabbasi commented Dec 27, 2018

hameerabbasi commented Apr 8, 2019

hameerabbasi commented Apr 8, 2019

rgommers commented Apr 8, 2019

ahaldane commented Apr 8, 2019

charris commented Apr 8, 2019

hameerabbasi commented Apr 8, 2019

hameerabbasi commented Apr 9, 2019

hameerabbasi commented Apr 9, 2019

shoyer commented Apr 9, 2019

charris commented Apr 9, 2019

shoyer commented Apr 9, 2019

hameerabbasi commented Apr 9, 2019

hameerabbasi commented Apr 9, 2019 • edited Loading

charris commented Apr 9, 2019

hameerabbasi commented May 11, 2019

charris commented May 12, 2019

charris commented May 12, 2019

jakirkham commented May 12, 2019

mattip commented May 12, 2019

hameerabbasi commented Dec 18, 2018 •

edited

Loading

hameerabbasi commented Dec 25, 2018 •

edited

Loading

andravin commented Dec 25, 2018 •

edited

Loading

andravin commented Dec 25, 2018 •

edited

Loading

hameerabbasi commented Dec 26, 2018 •

edited

Loading

hameerabbasi commented Dec 26, 2018 •

edited

Loading

andravin commented Dec 26, 2018 •

edited

Loading

hameerabbasi commented Apr 9, 2019 •

edited

Loading