Make take and drop more symmetric #1183

primo-ppcg · 2023-06-08T08:33:06Z

Related issue: #1178

- Allow `take` from the end of bytes or indexed (as `drop` does).
- Allow `drop` from fibers (as `take` does).
- Return table for `take` from dictionary types.
- Allow `drop` from dictionary types (as `take` does).
- Increase efficiency for `take` and `drop` with slices.
- Check indexed types before bytes.
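
As a rough illustration of the symmetry described above, the sketch below applies `take` and `drop` with positive and negative counts to an indexed value and to a bytes value. The names `xs` and `s` are made up for this example, and the commented results assume a Janet build that includes this change.

```janet
# Illustrative only; `xs` and `s` are names made up for this sketch.
(def xs [1 2 3 4 5])
(def s "hello")

# A positive n counts from the front.
(pp (take 2 xs))   # (1 2)
(pp (drop 2 xs))   # (3 4 5)

# A negative n counts from the end; with this change `take` accepts it too.
(pp (take -2 xs))  # (4 5)
(pp (drop -2 xs))  # (1 2 3)

# Bytes types behave the same way and yield strings.
(pp (take -2 s))   # "lo"
(pp (drop -2 s))   # "hel"
```

For any count `n` within range, `(take n xs)` and `(drop n xs)` together cover the whole input, which is the sense in which the two functions mirror each other.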

It isn't clear to me that there is a common use case for taking and dropping from dictionary types. However, the implementation is relatively straightforward, particularly for take, so I don't know that it should be explicitly forbidden either.

take for indexed types isn't slowed at all by this change, and drop has become significantly faster. take has become slightly slower for bytes objects due to the type check reordering. I believe that indexed types are the more common case, particularly when efficiency is a concern.

(def ind (range 200))
(repeat 10_000_000 (take 100 ind))
master     8.399327
take-drop  7.313185

(repeat 10_000_000 (drop 100 ind))
master    11.91337
take-drop  6.92769

(repeat 1_000_000 (take-until |(> $ 100) ind))
master     7.40999
take-drop  7.256637

(repeat 1_000_000 (drop-until |(> $ 100) ind))
master     7.251279
take-drop  7.15903

sogaiu · 2023-06-08T12:27:29Z

I tried the code out. Below are some preliminary remarks.

FWIW, I did something like:

(def ind (range 200))

## take

(def take-results @[])

(for i 0 10
  (def start (os/clock)) 
  (repeat 10_000_000 (take 100 ind))
  (array/push take-results
              (- (os/clock) start)))

So I could see individual batch run results.

For take and drop I got results that looked similar to the original posting.

However, for take-until and drop-until, the results were not so clearly different. One thing I noticed in the results was that there could sometimes be a "spike" result, or a result beyond which things consistently took more time. Perhaps this was some throttling kicking in.

If I throw out outlier-ish things, take-until looks slightly faster on the take-drop branch, but for drop-until I couldn't really tell.

In any case, I presume the main point of this PR is symmetry (though of course it's nice to get a boost for some cases and/or maintain a similar level of efficiency for others).

A rather minor thing -- there was a change that got lost in one of zevv's recent PRs and it looks like the same lines are touched here so I wonder if we might make a change: 472ec73#diff-e120880268b4f0f04177470180f50ee0d2c7ac13cb83bb778b6d81efda1cbbccR4171-R4172

If you don't mind, maybe we can go for one of the suggested options. No big deal though, just noticed a chance to tweak :)

sogaiu · 2023-06-08T12:57:12Z

If we're going the route of handling dictionaries [1] for drop-until, I wonder if it's worth the predicate getting access to both the key and value per encountered pair. At the moment it looks to me like the predicate only gets access to the value.

Not sure though, just bringing it up.

If so, I guess it would make sense to do similarly for take-until: b5407ac#diff-e120880268b4f0f04177470180f50ee0d2c7ac13cb83bb778b6d81efda1cbbccR1122

If those changes were made then take-while and drop-while are also affected, but I guess that doesn't involve any code changes.

[1] I'm in favor of leaving the functionality undocumented initially :)

primo-ppcg · 2023-06-08T14:06:38Z

I appreciate the feedback!

drop became significantly slower after #1114, which is understandable because it added extra functionality. My primary goal was to win back some of the lost performance, and also not to slow take any.

drop-until and take-until are essentially the same implementation as before, just restructured slightly. My point in including them is to demonstrate that they haven't been slowed by the refactoring. On my system, both are approximately 0.1µs per call faster for indexed types, and 0.05µs slower per call for bytes, pretty consistently. This is due primarily to the re-ordering of the checks.

For profiling, I use something similar to this: timeit.janet

(defmacro timeit
  ``Similar to `loop`, but outputs performance statistics after completion.``
  [head & body]
  (with-syms [clk cnt elp run]
    ~(do
       (var ,cnt 0)
       (def ,clk (os/clock))
       (loop ,head (++ ,cnt) ,;body)
       (def ,elp (- (os/clock) ,clk))
       (def ,run (/ ,elp ,cnt))
       (cond
         (< ,run 1e-3) (printf "elapsed %fs, %.4gµs/body" ,elp (* ,run 1_000_000))
         (< ,run 1) (printf "elapsed %fs, %.4gms/body" ,elp (* ,run 1_000))
         (printf "elapsed %fs, %.4gs/body" ,elp ,run)))))

(def ind (range 200))
(timeit [:repeat 10_000_000]
  (drop 100 ind))

> A rather minor thing -- there was a change that got lost in one of zevv's recent PRs and it looks like the same lines are touched here

I remembered to run my changes through format-file (unlike last time). ~~Should I revert those lines?~~ I see, he had meant to change the `(if-not ... (do` for `(unless ...` (and left it in a form the formatter doesn't like 😉).

Done.

primo-ppcg · 2023-06-08T16:31:40Z

> If we're going the route of handling dictionaries [1] for drop-until, I wonder if it's worth the predicate getting access to both the key and value per encountered pair. At the moment it looks to me like the predicate only gets access to the value.
>
> Not sure though, just bringing it up.

All other functions that accept a predicate (find, filter, etc.) treat dictionary types the same as any other, passing only the value to the predicate. For consistency, these should probably do the same.
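
To make the consistency point concrete, here is a small sketch of how existing predicate functions see a dictionary. The table `ages` and its contents are invented for illustration; the commented results reflect the usual behaviour of `filter`, `find`, and `count`, where only values reach the predicate.

```janet
# Illustrative data; `ages` is a made-up table.
(def ages @{:ann 31 :bob 17 :cy 45})

# The predicate is called with each value (31, 17, 45), never with a key.
# `sorted` is used only because table iteration order is unspecified.
(def adults (sorted (filter |(>= $ 18) ages)))
(pp adults)                   # @[31 45]
(pp (find |(< $ 18) ages))    # 17
(pp (count |(>= $ 18) ages))  # 2
```

Under the same convention, a predicate passed to take-until or drop-until on a dictionary would also receive values only.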

primo-ppcg added 3 commits June 7, 2023 18:12

take-drop symmetry

472ec73

Allow `take` from the end of bytes or indexed (as `drop` does). Allow `drop` from fibers (as `take` does).

take-drop dictionaries

b5407ac

Return table for `take` of dictionary types. Allow `drop` of dictionary types.

take-drop performance tweaks

a238391

Increase efficiency for `take` and `drop` with slices. Check indexed types before bytes types.

minor readability change

5317edc

As suggested by @sogaiu. @zevv forgot to push this change in a recent PR (janet-lang#1175 (comment)). Incidentally, the affected lines were already reformatted in the current PR, via fmt/format-file.

nitpick performance tweak

70b2e81

bakpakin merged commit e35c6b8 into janet-lang:master Jun 8, 2023
7 checks passed

primo-ppcg deleted the take-drop branch June 9, 2023 00:08
