Add ceilDiv and fastCeilDiv in math #18596

Merged
merged 10 commits into from
Aug 19, 2021

demotomohiro
Contributor

@demotomohiro demotomohiro commented Jul 27, 2021

ceilDiv is an integer division proc that rounds up, and fastCeilDiv is a faster version that works only with positive integer values.

@Araq
Member

Araq commented Jul 28, 2021

  • Insufficient tests for edge cases (low(int) etc.)
  • Insufficient justification for why this should be in the stdlib.

@timotheecour
Member

timotheecour commented Jul 28, 2021

Insufficient tests for edge cases (low(int) etc.)

fair, can be fixed in this PR by adding more tests

Insufficient justification for why this should be in the stdlib.

it's a reasonable API for std/math IMO (tricky enough that you don't want to inline the implementation in user code or depend on 3rd party package, when the need comes; and very little downside to adding this API objectively)

@Araq
Member

Araq commented Jul 28, 2021

when the need comes; and very little downside to adding this API objectively

The need didn't come for years. Costs of software development: 20% for the initial implementation, 80% maintenance. Plus lacking IC every proc gets re-compiled all the time, and even with IC the CI doesn't benefit from IC as much as it always starts from a fresh Docker instance or similar. And once accepted into math.nim we need to ensure:

  • That it works at compile-time. (For consistency!)
  • That it works with JS. (Think about the poor "code silos" that otherwise result.)
  • That it keeps working.
  • Edge cases can be "fixed" later or not, who knows how it's used in practice, so how about we deprecate fastCeilDiv because the behavior changed and introduce reallyFastCeilDiv instead? That works oh so well for the other things we have in the stdlib...

Now compare this to the "ceilDiv in a Nimble package" solution:

  • It doesn't slow down our builds.
  • You can measure more easily how often ceilDiv really is used in practice.
  • Its unittests are re-run when its git repo changes instead of when Nim's monorepo changes.
  • I can depend on a specific commit of the library and be shielded from its "bugfixes":

tricky enough that you don't want to inline the implementation in user code or depend on 3rd party package, when the need comes;

So all the potential pitfalls and bugs should be ours to tackle. Great.

@timotheecour
Member

timotheecour commented Jul 28, 2021

these points can be quantified objectively to avoid hand waving. The API is indeed less commonly used than floorDiv and I'm not going to argue over it strongly, but the costs are exaggerated.

That it works at compile-time. (For consistency!)
That it works with JS. (Think about the poor "code silos" that otherwise result.)

already the case, because tmath uses the recommended targets: "c cpp js" + main + static main() pattern
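
For readers unfamiliar with that pattern, here is a minimal sketch of what such a test file might look like. It uses floorDiv, which is already in std/math, as a stand-in; the checks themselves are illustrative and are not the tests from this PR.

```nim
discard """
  targets: "c cpp js"
"""
# The block above is the testament spec; it asks for the test to be built for
# the C, C++ and JS backends.

import std/math

template main() =
  # Illustrative checks only (assumption: not taken from tmath.nim).
  doAssert floorDiv(8, 3) == 2
  doAssert floorDiv(-8, 3) == -3
  doAssert floorDiv(8, -3) == -3

static: main()  # evaluate the checks in the compile-time VM
main()          # evaluate them again at run time
```

Running the same template under `static` and at run time is what covers the "works at compile-time" concern; the `targets` line covers the JS one.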

That it keeps working.

nothing to do, that's what CI is for; the problem with maintenance is almost always deficient testing or complex feature interaction; newer APIs (including here) have better test coverage and APIs like this have low risk of feature interaction.

Plus lacking IC every proc gets re-compiled all the time, and even with IC the CI doesn't benefit from IC as much as it always starts from a fresh Docker instance or similar.

  • total cost of testing: 0.000009 seconds for the ceilDiv + fastCeilDiv tests (c backend) (after instrumentation); this is a non-concern
  • cost of compiling overhead (when API is unused): I'm measuring about 0.00025 seconds per API, for most APIs (each generic func calls semFunc twice, non-generics call it once), including for ceilDiv and fastCeilDiv;
proc semFunc(c: PContext, n: PNode): PNode =
   let validPragmas = if n[namePos].kind != nkEmpty: procPragmas else: lambdaPragmas
   let t = cpuTime()
   result = semProcAux(c, n, skFunc, validPragmas)
   let t2 = cpuTime()
   echo (t2 - t, n[0]) # 0.0002

the only real downside IMO is the extra 0.00025 seconds required for semantic checking for clients of std/math regardless of whether the API is used or not. These do add up but are still small compared to other things (e.g. backend compilation). Lazy semantic checking would allow completely removing these costs for unused APIs (only the parsing cost would remain, but that's negligible).

@Varriount
Contributor

I'm on the fence.

I definitely don't think fastCeilDiv should be added - the use-case for ceilDiv is small enough that providing a special-cased version doesn't make much sense. The touchiness of modern processors in general also makes me unsure just how much "faster" it will be.

@timotheecour @demotomohiro What are some situations that require ceiling division?

@Araq
Member

Araq commented Jul 29, 2021

these points can be quantified objectively to avoid hand waving.

But they cannot easily be quantified objectively, as this is not about ceilDiv directly but about how we develop the stdlib. Little things add up and every addition is a potential "issue" for our issue tracker that keeps piling up issues. We need much more YAGNI for the stdlib evolution.

@timotheecour
Member

timotheecour commented Jul 29, 2021

how about let's remove fastCeilDiv from this PR for now and just keep ceilDiv, to move forward with this.

I'm still not convinced about the "potential future issues" such additions create; the vast majority of bugs occur when you either have complex feature interactions or inadequate tests, e.g. genEnumCaseStmt, which was introduced with neither tests nor runnableExamples (and, unsurprisingly, ended up with bugs).

@Araq
Member

Araq commented Jul 30, 2021

Bugs are only one aspect, the other is the CI and the ineffective testing algorithm. (You change one thing and 1000 unrelated things are tested yet again.) And the other is compile-time performance, every Nim user pays for ceilDiv, one actually uses it.

@timotheecour
Member

timotheecour commented Jul 30, 2021

CI and the ineffective testing algorithm. (You change one thing and 1000 unrelated things are tested yet again.)

test pruning based on module dependency graph

We've discussed this informally a few times but we should have an RFC otherwise nothing's gonna happen there; it's a good idea and can be done robustly against pruning algorithm bugs by ensuring that CI at least runs un-pruned on some schedule (eg on each commit to devel). Algo:

  • skipping running is easy: if a change was made in module A and a test B doesn't import A (transitively), then we can skip running B
  • skipping compiling is harder but doable: knowing whether B transitively imports A requires caching module dependencies (for a fixed compilation option), it can be done by keeping track of compile dependencies across multiple compilations
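
As a rough illustration of the first bullet, the pruning decision could be sketched like this. The graph representation, the proc names and the module names in the example are all hypothetical; nothing here reflects existing testament code.

```nim
import std/[tables, sets]

# Hypothetical representation: module name -> modules it imports directly.
type DepGraph = Table[string, seq[string]]

proc importsTransitively(g: DepGraph; start, target: string): bool =
  ## True if `start` is `target` or reaches it by following import edges.
  var seen = initHashSet[string]()
  var stack = @[start]
  while stack.len > 0:
    let m = stack.pop()
    if m == target:
      return true
    # Visit each module once so cyclic or shared imports terminate.
    if not seen.containsOrIncl(m):
      for dep in g.getOrDefault(m):
        stack.add dep
  false

proc testsToRun(g: DepGraph; tests: seq[string]; changed: string): seq[string] =
  ## Keeps only the tests that depend on the changed module.
  for t in tests:
    if importsTransitively(g, t, changed):
      result.add t

# Made-up dependency data for illustration.
let g = {
  "tmath": @["math"],
  "tstrutils": @["strutils"],
  "math": @["bitops"]
}.toTable

echo testsToRun(g, @["tmath", "tstrutils"], "bitops")  # @["tmath"]
```

The hard part named in the second bullet, obtaining and caching `g` without compiling every test, is not addressed by this sketch.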

the other is compile-time performance, every Nim user pays for ceilDiv, one actually uses it.

That's the only actual downside, and that's what I measured in #18596 (comment) (0.00025 sec per API is a good rough approximation); it would wash away to 0 (only parsing, i.e. negligible) with lazy semantic analysis:

lazy semcheck

(would have a huge impact on compile times)
I implemented a POC of this a while ago in a branch, but would need revival.
for a symbol declaration, don't semcheck it until it's used in an imperative context, in fact this should be the same algorithm as the one used for cgen/jsgen, for which dead-code elimination happens automatically in a top down way

proc fn1 = discard # fn1 not semchecked yet
proc fn2 = fn1() # fn1 still not semchecked because fn2 is only declared, not used
when declared(fn1): discard # still not using fn1 in imperative context

when compiles(fn2()): discard
  # this would trigger semcheck of fn2, which would trigger semcheck of fn1
fn2() # ditto

std/mathutils

Back to this PR; another possibility is std/mathutils which can contain less commonly used algorithms; this would negate the CT impact and avoid bloating an already large module

@konsumlamm
Contributor

Back to this PR; another possibility is std/mathutils which can contain less commonly used algorithms; this would negate the CT impact and avoid bloating an already large module

That really sounds like it should just be a nimble package.

@demotomohiro
Contributor Author

I have collected lines of code that do ceil division from popular open source projects.

https://github.com/demotomohiro/ceil-division-in-wild

I also found this question on Stack Overflow:

https://stackoverflow.com/questions/17944/how-to-round-up-the-result-of-integer-division

If I have x items which I want to display in chunks of y per page, how many pages will be needed?
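
A minimal sketch of that use case follows. `ceilDivSketch` is written to match the sign-aware ceilDiv benchmarked later in this thread, and `pagesNeeded` is a hypothetical helper name used only for illustration.

```nim
func ceilDivSketch[T: SomeInteger](x, y: T): T =
  ## Integer division rounded toward positive infinity.
  result = x div y
  # `div` truncates toward zero, so the quotient only needs a bump when the
  # exact result is positive (operands share a sign) and there is a remainder.
  if not ((x < 0) xor (y < 0)) and x mod y != 0:
    inc result

func pagesNeeded(items, perPage: int): int =
  ## Pages required to show `items` entries, `perPage` at a time.
  ceilDivSketch(items, perPage)

echo pagesNeeded(0, 10)   # 0
echo pagesNeeded(10, 10)  # 1
echo pagesNeeded(11, 10)  # 2
```

The call site states the intent directly, which is harder to see in the hand-written `(x + y - 1) div y` form.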

@disruptek
Contributor

I actually implemented this at SESCO. Big deal. It still doesn't belong in stdlib. Fix the bugs; leave the feature development to others -- you're not good at it.

@timotheecour
Member

timotheecour commented Jul 30, 2021

#18596 (comment)

this nails it, thanks; ceilDiv makes the intent clearer in all the cases mentioned in https://github.com/demotomohiro/ceil-division-in-wild.

@Varriount
Contributor

Varriount commented Aug 5, 2021

I am currently leaning in the direction of agreeing* that this should be added to the math module (@demotomohiro's "y items in x pages" example convinced me). The only other concern I have is whether users will actually look for this kind of routine in the first place.

Unfortunately, an addition to the standard library isn't guaranteed to be used, even if it is well-designed. Those using the language have to intuit or know that the standard library will provide the mechanism and feel that looking up the function's documentation is less effort than implementing the functionality manually.

*(I still don't believe the fastCeilDiv routine should be added)

@timotheecour
Member

The only other concern I have is whether users will actually look for this kind of routine in the first place.

we already have these in std/math:
div, floorDiv, floor, ceil

so it makes sense that one would also look there for ceilDiv. Plus, docgen search would reveal it.

@demotomohiro
Contributor Author

The previous fastCeilDiv code is not the fastest when the divisor is a const.

The compiler generates an idiv or div instruction when the divisor of Nim's div operator is not a const.
When the divisor is a const, the compiler generates code without a div or idiv instruction, because integer division is slow.
If the divisor is a const of unsigned int type and a power of 2, it generates simpler and faster code using a right shift instruction.
If the divisor is a const of signed int type, it still generates code that is faster than a div instruction, but it needs more code so that it works correctly with both positive and negative int values.
Even if both the dividend and the divisor are of Natural type, it generates code in the same way as for the int type.

Most ceil divisions are used with positive values, and some of them are used with a divisor that is a power-of-2 const, such as the number of bits or bytes per some kind of memory block.
https://github.com/demotomohiro/ceil-division-in-wild

Here is how Nim and the backend C compiler generate assembly code for different kinds of divisors:
https://godbolt.org/z/4fvbhrz5E

I compared the speed of fastCeilDiv with several const/runtime signed/unsigned int values using the following code.
When the divisor is a runtime value, there is almost no speed difference between signed and unsigned int, but it is about 10 times slower than with a non-power-of-2 const divisor.
When the divisor is a power-of-2 const of unsigned int type, it is about 2 times faster than with a power-of-2 const of signed int type.
Even when the divisor is a non-power-of-2 const, unsigned int is slightly faster than signed int.

import std/[strformat, strutils, monotimes, times]

template measure(title: static[string]; body: untyped) =
  block:
    const numLoop {.inject.} = 100_000_000
    let
      runtimeVal {.inject.} = cast[int](main)
      start = getMonoTime()
    let x {.inject.} = block:
      body
    let delta {.inject.} = getMonoTime() - start
    echo title & ": " & &"{delta.inMicroseconds.float / 1_000_000.0} sec, x = {x}"

proc fastCeilDiv[T: SomeInteger](x, y: T): T {.inline.} =
  (x + (y - 1.T)) div y

proc fastCeilDiv2[T: SomeInteger](x, y: T): T {.inline.} =
  ((x.uint + (y.uint - 1.uint)) div y.uint).T

proc main =
  measure("fastCeilDiv runtime value Signed  "):
    var x = 1
    for i in runtimeVal .. (runtimeVal + numLoop):
      x += fastCeilDiv(i, runtimeVal)
    x

  measure("fastCeilDiv runtime value Unsigned"):
    var x = 1'u
    for i in runtimeVal .. (runtimeVal + numLoop):
      x += fastCeilDiv(i.uint, runtimeVal.uint)
    x

  template constMeasure(n: static[int]): untyped =
    measure("fastCeilDiv  " & n.toHex(8) & " Signed  "):
      var x = 1
      for i in runtimeVal .. (runtimeVal + numLoop):
        x += fastCeilDiv(i, n)
      x

    measure("fastCeilDiv  " & n.toHex(8)  & " Unsigned"):
      var x = 1'u
      for i in runtimeVal .. (runtimeVal + numLoop):
        x += fastCeilDiv(i.uint, n.uint)
      x

    measure("fastCeilDiv2 " & n.toHex(8) & "         "):
      var x = 1
      for i in runtimeVal .. (runtimeVal + numLoop):
        x += fastCeilDiv2(i, n)
      x

  constMeasure(256)
  constMeasure(0x8000)
  constMeasure(0x8000000)
  constMeasure(255)
  #257 is a prime number.
  constMeasure(257)
  constMeasure(0xc000)

main()

I compiled it with the "-d:danger" option.
Here is the output on Wandbox:
https://wandbox.org/permlink/qM8e8VIlt6TwYBUJ

fastCeilDiv runtime value Signed  : 1.077261 sec, x = 1323047682
fastCeilDiv runtime value Unsigned: 1.031973 sec, x = 1323047682
fastCeilDiv  00000100 Signed  : 0.101275 sec, x = 21197050016659
fastCeilDiv  00000100 Unsigned: 0.04074 sec, x = 21197050016659
fastCeilDiv2 00000100         : 0.040962 sec, x = 21197050016659
fastCeilDiv  00008000 Signed  : 0.101335 sec, x = 165651562244
fastCeilDiv  00008000 Unsigned: 0.040438 sec, x = 165651562244
fastCeilDiv2 00008000         : 0.040427 sec, x = 165651562244
fastCeilDiv  08000000 Signed  : 0.101273 sec, x = 100000002
fastCeilDiv  08000000 Unsigned: 0.044366 sec, x = 100000002
fastCeilDiv2 08000000         : 0.042385 sec, x = 100000002
fastCeilDiv  000000FF Signed  : 0.133338 sec, x = 21280175506929
fastCeilDiv  000000FF Unsigned: 0.101151 sec, x = 21280175506929
fastCeilDiv2 000000FF         : 0.099565 sec, x = 21280175506929
fastCeilDiv  00000101 Signed  : 0.120323 sec, x = 21114571417369
fastCeilDiv  00000101 Unsigned: 0.09922 sec, x = 21114571417369
fastCeilDiv2 00000101         : 0.097479 sec, x = 21114571417369
fastCeilDiv  0000C000 Signed  : 0.11708 sec, x = 110451042008
fastCeilDiv  0000C000 Unsigned: 0.09665700000000001 sec, x = 110451042008
fastCeilDiv2 0000C000         : 0.097663 sec, x = 110451042008

Output on Raspberry PI3:

fastCeilDiv runtime value Signed  : 3.016756 sec, x = 1240087918
fastCeilDiv runtime value Unsigned: 2.644811 sec, x = 1240087918
fastCeilDiv  00000100 Signed  : 0.287002 sec, x = -882919215
fastCeilDiv  00000100 Unsigned: 0.143513 sec, x = 3412048081
fastCeilDiv2 00000100         : 0.143499 sec, x = -882919215
fastCeilDiv  00008000 Signed  : 0.286971 sec, x = -863255726
fastCeilDiv  00008000 Unsigned: 0.215222 sec, x = 3431711570
fastCeilDiv2 00008000         : 0.215227 sec, x = -863255726
fastCeilDiv  08000000 Signed  : 0.287075 sec, x = 100000002
fastCeilDiv  08000000 Unsigned: 0.215228 sec, x = 100000002
fastCeilDiv2 08000000         : 0.215221 sec, x = 100000002
fastCeilDiv  000000FF Signed  : 0.430497 sec, x = 1134583359
fastCeilDiv  000000FF Unsigned: 0.286981 sec, x = 1134583359
fastCeilDiv2 000000FF         : 0.286972 sec, x = 1134583359
fastCeilDiv  00000101 Signed  : 0.358714 sec, x = 2045299445
fastCeilDiv  00000101 Unsigned: 0.286985 sec, x = 2045299445
fastCeilDiv2 00000101         : 0.28696 sec, x = 2045299445
fastCeilDiv  0000C000 Signed  : 0.358696 sec, x = -558841289
fastCeilDiv  0000C000 Unsigned: 0.286982 sec, x = 3736126007
fastCeilDiv2 0000C000         : 0.28696 sec, x = -558841289

Output on Termux on NVIDIA Shield TV:

fastCeilDiv runtime value Signed  : 0.317217 sec, x = 200000002
fastCeilDiv runtime value Unsigned: 0.314767 sec, x = 200000002
fastCeilDiv  00000100 Signed  : 0.209821 sec, x = 163583061909072791
fastCeilDiv  00000100 Unsigned: 0.073613 sec, x = 163583061909072791
fastCeilDiv2 00000100         : 0.06937400000000001 sec, x = 163583061909072791
fastCeilDiv  00008000 Signed  : 0.209885 sec, x = 1277992720774642
fastCeilDiv  00008000 Unsigned: 0.072842 sec, x = 1277992720774642
fastCeilDiv2 00008000         : 0.07333099999999999 sec, x = 1277992720774642
fastCeilDiv  08000000 Signed  : 0.131097 sec, x = 312063325933
fastCeilDiv  08000000 Unsigned: 0.06884800000000001 sec, x = 312063325933
fastCeilDiv2 08000000         : 0.07158299999999999 sec, x = 312063325933
fastCeilDiv  000000FF Signed  : 0.262323 sec, x = 164224564112441711
fastCeilDiv  000000FF Unsigned: 0.209776 sec, x = 164224564112441711
fastCeilDiv2 000000FF         : 0.209854 sec, x = 164224564112441711
fastCeilDiv  00000101 Signed  : 0.209828 sec, x = 162946551940749557
fastCeilDiv  00000101 Unsigned: 0.209807 sec, x = 162946551940749557
fastCeilDiv2 00000101         : 0.209847 sec, x = 162946551940749557
fastCeilDiv  0000C000 Signed  : 0.218569 sec, x = 851995163850635
fastCeilDiv  0000C000 Unsigned: 0.210577 sec, x = 851995163850635
fastCeilDiv2 0000C000         : 0.209827 sec, x = 851995163850635

@Varriount
Contributor

The previous fastCeilDiv code is not the fastest when the divisor is a const.

This is a reason why I don't believe a "fast" version should be added. As I stated previously:

The touchiness of modern processors in general also makes me unsure just how much "faster" [fastCeilDiv] will be.

@demotomohiro
Contributor Author

@Varriount

The only other concern I have is whether users will actually look for this kind of routine in the first place.

How about having the Nim compiler detect code that does ceil division without the stdlib and show a warning like "Use ceilDiv in math module"?
Examples of ceil division code:

  • (x + y - 1) div y
  • x div y + 1
  • (x shr n) + (if (x and (2^n - 1)) != 0: 1 else: 0)

But this would slow down compilation.
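
To make the three patterns listed above concrete, here is a small sketch that evaluates them side by side for non-negative x and the divisor 8 (n = 3). The function names are made up for illustration. Note that the second pattern only equals the ceiling when x is not a multiple of y, so code found in the wild that uses it is not always a drop-in match for ceilDiv.

```nim
func addThenDiv(x, y: int): int =
  (x + y - 1) div y

func divPlusOne(x, y: int): int =
  x div y + 1

func shiftAndMask(x, n: int): int =
  # The divisor is 2^n; add one when any of the low n bits are set.
  (x shr n) + (if (x and ((1 shl n) - 1)) != 0: 1 else: 0)

for x in [0, 7, 8, 9]:
  echo x, ": ", addThenDiv(x, 8), " ", divPlusOne(x, 8), " ", shiftAndMask(x, 3)
# 0: 0 1 0
# 7: 1 1 1
# 8: 1 2 1
# 9: 2 2 2
```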

The previous fastCeilDiv code is not the fastest when the divisor is a const.

This is a reason why I don't believe a "fast" version should be added. As I stated previously:

The touchiness of modern processors in general also makes me unsure just how much "faster" [fastCeilDiv] will be.

Most Nim code uses int variables even when the variable is always 0 or positive.
This is because Nim recommends signed int over unsigned int, Nim performs overflow/underflow checks on signed int but not on unsigned int,
and an integer literal without a suffix is of int type.
If ceilDiv is used with signed int, it must work correctly with negative values even if the arguments are actually always positive.
And there are many cases where ceil division is used with a positive constant value.
https://github.com/demotomohiro/ceil-division-in-wild

But this page says

When you divide an integer (that is known to be positive or zero) by a constant, convert the integer to unsigned
https://en.wikibooks.org/wiki/Optimizing_C++/Code_optimization/Faster_operations#Integer_division_by_a_constant

This answer explains why:
https://stackoverflow.com/a/49478562
https://stackoverflow.com/questions/49477554/why-a-constant-int-is-faster-when-a-is-unsigned-vs-signed

So if you use a signed int type whose value is known to be positive or zero, you need to call fastCeilDiv, which converts its arguments to unsigned.

I compared the generated assembly code of ceilDiv and fastCeilDiv with runtime value, const power-of-2 and const non-power-of-2 divisors in Compiler Explorer.
fastCeilDiv always generates fewer instructions than ceilDiv.
https://godbolt.org/z/3f5nz18MG

The following code compares the speed of ceilDiv and fastCeilDiv:
Nim version 1.4.8
Compiled with nim c -d:danger compareCeilDiv.nim.

On Wandbox, the speeds of ceilDiv and fastCeilDiv with a runtime value divisor are almost the same, but fastCeilDiv is slightly faster on NVIDIA Shield TV and Raspberry Pi 3.
With a const signed power-of-2 divisor, fastCeilDiv is about 3 times faster than ceilDiv on all test machines.
With a const signed non-power-of-2 divisor, fastCeilDiv is about 2 times faster than ceilDiv on all test machines.

import std/[strformat, strutils, monotimes, times]

template measure(title: static[string]; body: untyped) =
  block:
    const numLoop {.inject.} = 100_000_000
    let
      runtimeVal {.inject.} = cast[int](main)
      start = getMonoTime()
    let x {.inject.} = block:
      body
    let delta {.inject.} = getMonoTime() - start
    echo title & ": " & &"{delta.inMicroseconds.float / 1_000_000.0} sec, x = {x}"

func ceilDiv[T: SomeInteger](x, y: T): T {.inline.} =
  result = x div y
  if not (x < 0 xor y < 0) and x mod y != 0:
    inc result

proc fastCeilDiv[T: SomeInteger](x, y: T): T {.inline.} =
  when sizeof(T) == 8:
    type UT = uint64
  elif sizeof(T) == 4:
    type UT = uint32
  elif sizeof(T) == 2:
    type UT = uint16
  else:
    type UT = uint8

  assert x >= 0 and y > 0
  when T is SomeUnsignedInt:
    assert x + y - 1 >= x

  ((x.UT + (y.UT - 1.UT)) div y.UT).T

proc main =
  echo "-- Runtime value --"
  measure("ceilDiv     signed   runtime value"):
    var x = 1
    for i in runtimeVal .. (runtimeVal + numLoop):
      x += ceilDiv(i, runtimeVal)
    x

  measure("fastCeilDiv signed   runtime value"):
    var x = 1
    for i in runtimeVal .. (runtimeVal + numLoop):
      x += fastCeilDiv(i, runtimeVal)
    x

  measure("ceilDiv     unsigned runtime value"):
    var x = 1'u
    for i in runtimeVal .. (runtimeVal + numLoop):
      x += ceilDiv(i.uint, runtimeVal.uint)
    x

  measure("fastCeilDiv unsigned runtime value"):
    var x = 1'u
    for i in runtimeVal .. (runtimeVal + numLoop):
      x += fastCeilDiv(i.uint, runtimeVal.uint)
    x

  template constMeasure(n: static[int]): untyped =
    measure("ceilDiv     signed   const: " & n.toHex(8)):
      var x = 1
      for i in runtimeVal .. (runtimeVal + numLoop):
        x += ceilDiv(i, n)
      x

    measure("fastCeilDiv signed   const: " & n.toHex(8)):
      var x = 1
      for i in runtimeVal .. (runtimeVal + numLoop):
        x += fastCeilDiv(i, n)
      x

    measure("ceilDiv     unsigned const: " & n.toHex(8)):
      var x = 1'u
      for i in runtimeVal .. (runtimeVal + numLoop):
        x += ceilDiv(i.uint, n.uint)
      x

    measure("fastCeilDiv unsigned const: " & n.toHex(8)):
      var x = 1'u
      for i in runtimeVal .. (runtimeVal + numLoop):
        x += fastCeilDiv(i.uint, n.uint)
      x

  echo "-- Power of 2 divisors --"
  constMeasure(256)
  constMeasure(0x8000)
  constMeasure(0x8000000)

  echo "-- Non-Power of 2 divisors --"
  constMeasure(255)
  #257 is a prime number.
  constMeasure(257)
  constMeasure(0xc000)

main()

Output on Wandbox:
https://wandbox.org/permlink/6tSxLwzhEAvpE7BB

-- Runtime value --
ceilDiv     signed   runtime value: 1.141367 sec, x = 1323047682
fastCeilDiv signed   runtime value: 1.062625 sec, x = 1323047682
ceilDiv     unsigned runtime value: 1.08019 sec, x = 1323047682
fastCeilDiv unsigned runtime value: 1.04149 sec, x = 1323047682
-- Power of 2 divisors --
ceilDiv     signed   const: 00000100: 0.155666 sec, x = 21197050016659
fastCeilDiv signed   const: 00000100: 0.041768 sec, x = 21197050016659
ceilDiv     unsigned const: 00000100: 0.094292 sec, x = 21197050016659
fastCeilDiv unsigned const: 00000100: 0.040721 sec, x = 21197050016659
ceilDiv     signed   const: 00008000: 0.149477 sec, x = 165651562244
fastCeilDiv signed   const: 00008000: 0.041419 sec, x = 165651562244
ceilDiv     unsigned const: 00008000: 0.096973 sec, x = 165651562244
fastCeilDiv unsigned const: 00008000: 0.040661 sec, x = 165651562244
ceilDiv     signed   const: 08000000: 0.147635 sec, x = 100000002
fastCeilDiv signed   const: 08000000: 0.040369 sec, x = 100000002
ceilDiv     unsigned const: 08000000: 0.093613 sec, x = 100000002
fastCeilDiv unsigned const: 08000000: 0.040393 sec, x = 100000002
-- Non-Power of 2 divisors --
ceilDiv     signed   const: 000000FF: 0.205451 sec, x = 21280175506929
fastCeilDiv signed   const: 000000FF: 0.09621300000000001 sec, x = 21280175506929
ceilDiv     unsigned const: 000000FF: 0.156617 sec, x = 21280175506929
fastCeilDiv unsigned const: 000000FF: 0.09919500000000001 sec, x = 21280175506929
ceilDiv     signed   const: 00000101: 0.206868 sec, x = 21114571417369
fastCeilDiv signed   const: 00000101: 0.098992 sec, x = 21114571417369
ceilDiv     unsigned const: 00000101: 0.154835 sec, x = 21114571417369
fastCeilDiv unsigned const: 00000101: 0.095625 sec, x = 21114571417369
ceilDiv     signed   const: 0000C000: 0.185543 sec, x = 110451042008
fastCeilDiv signed   const: 0000C000: 0.098065 sec, x = 110451042008
ceilDiv     unsigned const: 0000C000: 0.159565 sec, x = 110451042008
fastCeilDiv unsigned const: 0000C000: 0.09755900000000001 sec, x = 110451042008

Output on NVIDIA Shield TV:

-- Runtime value --
ceilDiv     signed   runtime value: 0.438935 sec, x = 200000002
fastCeilDiv signed   runtime value: 0.314849 sec, x = 200000002
ceilDiv     unsigned runtime value: 0.367393 sec, x = 200000002
fastCeilDiv unsigned runtime value: 0.314872 sec, x = 200000002
-- Power of 2 divisors --
ceilDiv     signed   const: 00000100: 0.216189 sec, x = 150114616999388343
fastCeilDiv signed   const: 00000100: 0.06979200000000001 sec, x = 150114616999388343
ceilDiv     unsigned const: 00000100: 0.091862 sec, x = 150114616999388343
fastCeilDiv unsigned const: 00000100: 0.07210900000000001 sec, x = 150114616999388343
ceilDiv     signed   const: 00008000: 0.216184 sec, x = 1172770494915764
fastCeilDiv signed   const: 00008000: 0.070558 sec, x = 1172770494915764
ceilDiv     unsigned const: 00008000: 0.091839 sec, x = 1172770494915764
fastCeilDiv unsigned const: 00008000: 0.07041600000000001 sec, x = 1172770494915764
ceilDiv     signed   const: 08000000: 0.216182 sec, x = 286378063148
fastCeilDiv signed   const: 08000000: 0.07195799999999999 sec, x = 286378063148
ceilDiv     unsigned const: 08000000: 0.091858 sec, x = 286378063148
fastCeilDiv unsigned const: 08000000: 0.072923 sec, x = 286378063148
-- Non-Power of 2 divisors --
ceilDiv     signed   const: 000000FF: 0.577276 sec, x = 150703301771738891
fastCeilDiv signed   const: 000000FF: 0.209962 sec, x = 150703301771738891
ceilDiv     unsigned const: 000000FF: 0.309038 sec, x = 150703301771738891
fastCeilDiv unsigned const: 000000FF: 0.209955 sec, x = 150703301771738891
ceilDiv     signed   const: 00000101: 0.488301 sec, x = 149530513431491893
fastCeilDiv signed   const: 00000101: 0.212696 sec, x = 149530513431491893
ceilDiv     unsigned const: 00000101: 0.312741 sec, x = 149530513431491893
fastCeilDiv unsigned const: 00000101: 0.211091 sec, x = 149530513431491893
ceilDiv     signed   const: 0000C000: 0.600892 sec, x = 781847013276710
fastCeilDiv signed   const: 0000C000: 0.210914 sec, x = 781847013276710
ceilDiv     unsigned const: 0000C000: 0.368974 sec, x = 781847013276710
fastCeilDiv unsigned const: 0000C000: 0.210902 sec, x = 781847013276710

Output on Raspberry PI 3:

-- Runtime value --
ceilDiv     signed   runtime value: 3.651427 sec, x = 1128591562
fastCeilDiv signed   runtime value: 2.753733 sec, x = 1128591562
ceilDiv     unsigned runtime value: 3.376447 sec, x = 1128591562
fastCeilDiv unsigned runtime value: 2.681804 sec, x = 1128591562
-- Power of 2 divisors --
ceilDiv     signed   const: 00000100: 0.429088 sec, x = 2062994921
fastCeilDiv signed   const: 00000100: 0.214556 sec, x = 2062994921
ceilDiv     unsigned const: 00000100: 0.286077 sec, x = 2062994921
fastCeilDiv unsigned const: 00000100: 0.143028 sec, x = 2062994921
ceilDiv     signed   const: 00008000: 0.429097 sec, x = 736817514
fastCeilDiv signed   const: 00008000: 0.143033 sec, x = 736817514
ceilDiv     unsigned const: 00008000: 0.357638 sec, x = 736817514
fastCeilDiv unsigned const: 00008000: 0.214549 sec, x = 736817514
ceilDiv     signed   const: 08000000: 0.429107 sec, x = 100000002
fastCeilDiv signed   const: 08000000: 0.143033 sec, x = 100000002
ceilDiv     unsigned const: 08000000: 0.28606 sec, x = 100000002
fastCeilDiv unsigned const: 08000000: 0.214558 sec, x = 100000002
-- Non-Power of 2 divisors --
ceilDiv     signed   const: 000000FF: 0.71515 sec, x = 588704231
fastCeilDiv signed   const: 000000FF: 0.286036 sec, x = 588704231
ceilDiv     unsigned const: 000000FF: 0.572167 sec, x = 588704231
fastCeilDiv unsigned const: 000000FF: 0.286072 sec, x = 588704231
ceilDiv     signed   const: 00000101: 0.643589 sec, x = -100677363
fastCeilDiv signed   const: 00000101: 0.28615 sec, x = -100677363
ceilDiv     unsigned const: 00000101: 0.500618 sec, x = 4194289933
fastCeilDiv unsigned const: 00000101: 0.286051 sec, x = 4194289933
ceilDiv     signed   const: 0000C000: 0.715162 sec, x = 507882482
fastCeilDiv signed   const: 0000C000: 0.286041 sec, x = 507882482
ceilDiv     unsigned const: 0000C000: 0.643684 sec, x = 507882482
fastCeilDiv unsigned const: 0000C000: 0.286076 sec, x = 507882482

@timotheecour
Member

timotheecour commented Aug 8, 2021

here's an independent benchmark, showing fastCeilDiv is 2.5X faster than ceilDiv, tested on OSX

("ceilDiv", 44432, 5747)
("fastCeilDiv", 17738, 5747)
("ceilDiv", 44278, 5747)
("fastCeilDiv", 19486, 5747)
("ceilDiv", 44200, 5747)
("fastCeilDiv", 17576, 5747)
("ceilDiv", 44388, 5747)
("fastCeilDiv", 19476, 5747)

(measured via rdtsc which gives nanosecond granularity, allowing benchmarks with smaller sizes to avoid other effects such as cache, but in this case it doesn't matter and epochTime or other timers would get similar results)

proc getCpuTicksImpl(): uint64 {.importc: "__rdtsc".}
template getCpuTicks*(): int64 = cast[int64](getCpuTicksImpl()) # see https://github.com/timotheecour/Nim/issues/773 + upcoming PR

import std/random
import std/math

template mainAux(algo)=
  block:
    let t1 = getCpuTicks()
    var c = 0
    for i in 0..<n-1:
      let ci = algo(a[i], a[i+1])
      c+=ci
    let t2 = getCpuTicks()
    echo (astToStr(algo), t2 - t1, c)

proc main()=
  const n = 1000
  var a: array[n, int]
  for i in 0..<n:
    a[i] = 1 + rand(10000)
  for j in 0..<4:
    mainAux(ceilDiv)
    mainAux(fastCeilDiv)
main()

Now that all the concerns against this PR have been properly addressed, can we please just merge this already.

  • maintenance cost => addressed by adding comprehensive test coverage, and the tests in place use the recommended targets: "c cpp js" + main + static main() pattern so that it works in all backends.
  • no performance improvement => the benchmarks are clear
  • no use in the wild => see https://github.com/demotomohiro/ceil-division-in-wild; it's used a lot, but less efficiently and less clearly than what this PR would allow
  • not discoverable => i don't buy that, ceilDiv, fastCeilDiv sit in the same module as floorDiv, ceil, etc, just where you'd expect to find it with the name you'd expect; and docgen search works

@Varriount
Contributor

I still don't feel that there is much worth to be gained in adding fastCeilDiv.

In adding such a routine, an assertion is being made that this routine is faster than ceilDiv for some range of inputs. This is difficult to guarantee over multiple platforms, compilers, and build flags (and over time). While fastCeilDiv may indeed be faster than ceilDiv for some range of inputs, this doesn't mean that someone who switches to using fastCeilDiv (if they can) will necessarily see an actual increase in the overall performance of their program.

If a user is in a situation where the conditional check in ceilDiv is having an actual, measurable performance impact on their overall program, then they can look at ceilDiv and come up with their own specialized copy - the implementation is extremely simple. The only reason I see ceilDiv as a worthwhile addition to the standard library is because there are a handful of general use-cases for it, and because the math module already has floorDiv.

However, perhaps there's a compromise here: the standard library already has a Natural type for numbers which are guaranteed (to a greater or lesser extent) to be non-negative. An overload to ceilDiv could exist for the Natural type which forgoes the conditional check. I would be fine with this, as long as any assertions about the overload's performance are tentative ("Perform ceiling division for natural numbers. This may be faster than the general version in certain circumstances").

@timotheecour
Member

timotheecour commented Aug 9, 2021

This is difficult to guarantee over multiple platforms, compilers, and build flags (and over time). While fastCeilDiv may indeed be faster than ceilDiv for some range of inputs, this doesn't mean that someone who switches to using fastCeilDiv (if they can) will necessarily see an actual increase in the overall performance of their program.

Oh, come on. The point is that fastCeilDiv is faster on common architectures for positive inputs; who cares if it has the same speed as ceilDiv on some hypothetical architecture? With your logic, we'd design the stdlib for the lowest common denominator, performance- and capability-wise, preventing optimizations on platforms that support certain common operations.

An overload to ceilDiv could exist for the Natural type which forgoes the conditional check

Natural overload can't overload generic SomeInteger, and also doesn't help with unsigned inputs

func ceilDiv*[T: SomeInteger](x, y: T): T = 1
func ceilDiv*(x, y: Natural): int = 2
echo ceilDiv(3, 4) # 1

then they can look at ceilDiv and come up with their own specialized copy - the implementation is extremely simple.

I don't see how this 11-line function is "extremely simple"; that's the point of having it in stdlib, so users don't have to re-invent the wheel, poorly or less efficiently; the examples in the wild in fact don't benefit from the optimization introduced in this PR, precisely because it's not "extremely simple" as you claim dismissively.

@Varriount
Contributor

The point is that fastCeilDiv is faster on common architectures for positive inputs; who cares if it has the same speed as ceilDiv on some hypothetical architecture?

Users care. When I read documentation, I generally assume that it is accurate.

Natural overload can't overload generic SomeInteger, and also doesn't help with unsigned inputs

Ah, I concede that point then. I would like to point out that one can overload based on whether a type is signed or unsigned, and converting natural, signed integers to unsigned integers is a no-op (excluding range checks).

I don't see how this 11-line function is "extremely simple"

8 of those 11 lines are handling different integer inputs. This is hardly complex. The 3 remaining lines, though unintuitive in a mathematical sense, are not difficult to read and understand. Given the mathematical complexity, I could see pointing this formula out in ceilDiv's documentation, however I still don't feel that using fastCeilDiv (compared to ceilDiv) would have enough of a performance impact in enough situations to warrant its introduction as a full procedure.
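For readers who want to see what a sign-aware ceiling division looks like, here is a minimal sketch. It is not the body of the PR's ceilDiv (the thread does not reproduce it); the name `ceilDivGeneral` is hypothetical, and the sketch simply corrects Nim's truncating `div`. It does not guard against the `low(int) div -1` overflow.

```nim
# Hypothetical sketch, NOT the PR's implementation: ceiling division that
# also accepts negative operands, built on Nim's truncating `div`.
func ceilDivGeneral(x, y: int): int =
  result = x div y   # `div` truncates toward zero
  # When the exact quotient is negative, truncation already is the ceiling.
  # Only an inexact quotient with same-signed operands needs rounding up.
  if x mod y != 0 and ((x < 0) == (y < 0)):
    inc result

doAssert ceilDivGeneral(7, 2) == 4
doAssert ceilDivGeneral(-7, 2) == -3
doAssert ceilDivGeneral(7, -2) == -3
doAssert ceilDivGeneral(-7, -2) == 4
doAssert ceilDivGeneral(8, 2) == 4
```

The extra branch on the signs is the kind of conditional check that the positive-only variant discussed in this thread avoids.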

With your logic, we'd design the stdlib for the lowest common denominator, performance- and capability-wise, preventing optimizations on platforms that support certain common operations.

No.
With my logic, if one was to take it to the extreme, and exclude actual common sense, any and all platform-specific optimizations and capabilities would be invisible to a user. This does not mean such optimizations and capabilities wouldn't be used by the implementation. A user would only be able to benefit from them indirectly, or through cross-platform abstractions.

As things stand, this is what is usually done anyway. Why? Because to do otherwise is to introduce decisions on the user's part - decisions that are either mostly meaningless, or that can be automatically made by the compiler. When I write a program, I don't want to have to decide which variation of some function will be most performant - I want the compiler to do that! As an example, this is why we moved from having separate binarySearch and smartBinarySearch procedures, to just the one binarySearch procedure.


Let me reiterate my points, for clarity:

  • I do feel that ceilDiv should be added to the standard library:
    • It serves as a logical complement to floorDiv.
    • It has at least a few general use-cases.
    • The implementation is simple to maintain, but not necessarily intuitive enough for one to write on their own.
  • I do not feel that fastCeilDiv should be added:
    • The number of situations where a more performant, less flexible variation of ceilDiv will result in any measurable performance difference to an overall program is negligible.
    • Introducing a variation of ceilDiv imposes a decision cost upon the user, which is unwarranted when considered in light of the previous point.
    • The fastCeilDiv formula can be used in a procedure overload for unsigned integers, and the possible performance difference can be mentioned in the documentation of both procedures.

Please note that these are points, plural, and no single one should be mistaken as representing my entire motivation. I am not stating that user decisions should always be avoided. I am not stating that any simple-but-unintuitive procedure should be added to the standard library. I am stating that, taken as a whole, these are my reasons for supporting or rejecting these procedures' introduction to the standard library.

@timotheecour
Member

timotheecour commented Aug 10, 2021

Users care. When I read documentation, I generally assume that it is accurate.

the documentation in this PR is accurate and the benchmarks are reproducible

I would like to point out that one can overload based on whether a type is signed or unsigned, and converting natural, signed integers to unsigned integers is a no-op (excluding range checks).

When I write a program, I don't want to have to decide which variation of some function will be most performant - I want the compiler to do that

The fastCeilDiv formula can be used in a procedure overload for unsigned integers.

except you can't, otherwise things like ceilDiv(160'u8, 100'u8) would be incorrect.
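To make the objection concrete, the following sketch applies the `(x + y - 1) div y` formula directly to `uint8` values. It assumes only that Nim's unsigned arithmetic wraps around on overflow; the variable names are illustrative.

```nim
let x = 160'u8
let y = 100'u8

# uint8 arithmetic wraps modulo 256: 160 + 100 - 1 = 259, which wraps to 3.
let naive = (x + y - 1'u8) div y
doAssert naive == 0'u8        # wrong: the ceiling of 160 / 100 is 2

# Widening the operands before adding avoids the wraparound.
let widened = (x.uint16 + y.uint16 - 1'u16) div y.uint16
doAssert widened == 2'u16
```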

Just don't use fastCeilDiv if you don't care about performance; no one is forcing you to use it.

@Varriount
Contributor

Varriount commented Aug 10, 2021

Just don't use fastCeilDiv if you don't care about performance; no one is forcing you to use it.

This does not negate the decision cost that a variant procedure like this adds. In this instance, that cost outweighs any benefit this procedure provides.

The whole idea that "oh, let's add this to the standard library, and if people don't like it, they don't have to use it. It won't do any harm to non-users", is flawed. Adding functionality to the standard library increases maintenance costs, increases the amount of documentation that must be read and searched through, and in this case, makes using the standard library more difficult.

User: oh, I need ceiling division... and there's a ceilDiv and a fastCeilDiv... I want my program to be fast, so I guess I should use fastCeilDiv... oh, but fastCeilDiv only accepts positive numbers... are all my numbers positive? is there a way I can make them all positive? I don't want my program to be slow...

By labelling a variant as "fast", or "smart", or any other similar adjective, an implicit effect is that the original is considered "slow", or "dumb", etc. Most people, even if they aren't explicitly seeking performance, don't want to use something that is considered "slow", "dumb", etc.

When adding something to the standard library, these facts must be balanced against the worth of the functionality being added. And this is not in the context of what worth the function has to just you, or just me, but to all the users of the standard library.

Please stop taking the stance that adding something to the standard library has no impact - it does. This has been mentioned to you, repeatedly, over and over, across multiple issues.

@Varriount Varriount closed this Aug 10, 2021
@Varriount Varriount reopened this Aug 10, 2021
@Araq
Member

Araq commented Aug 11, 2021

ceilDiv is acceptable for the stdlib and fastCeilDiv isn't. The documentation can refer to a fastCeilDiv Nimble package or similar; the probability that fastCeilDiv will make your code faster in the real world is really slim. Runtime costs are dominated by memory bandwidth / cache effectiveness anyway.

The problem with fastCeilDiv is that it could set a standard, for "consistency" we then can also accept fastDivMod and fastSum, in fact, it's likely that by exploiting special cases or avoiding edge cases some performance can be gained. That is not the purpose of a standard library, a standard library has to be protected against misuse, hence caring about all the edge cases. (Yes, I know we have plenty of old, bad code, but new code must follow a higher standard.)

Which is also why "stdlib lacked ceilDiv so I wrote my own (and it fails for high(int) but my application doesn't use it with high(int))" really does work and why application development is much cheaper than stdlib development.

@demotomohiro
Contributor Author

How about adding only fastCeilDiv, renamed to ceilDiv?
fastCeilDiv works with any positive signed int value, including high(int), but not with negative values.
(See the test code of fastCeilDiv.)
I think ceilDiv is rarely used with negative values.
I couldn't find a ceil division with a possibly negative value in this repo:
https://github.com/demotomohiro/ceil-division-in-wild
In most of the ceil divisions I have found so far, the dividend is a number of bits/bytes/threads/pixels/items, and the divisor is the number of these things per memory block/thread group/tile/page.
Obviously, these numbers are never negative.
If ceilDiv is called with a negative value, there is likely a bug before the call, and an assert in ceilDiv will detect the negative value.

fastCeilDiv doesn't work when the arguments are unsigned and (x + y - 1) overflows, but Nim doesn't recommend using uint types.
I don't think this causes problems, because the Linux kernel does ceil division in the same way as fastCeilDiv.
https://github.com/demotomohiro/ceil-division-in-wild/tree/main/grep/linux
And much of the C code I found implements ceil division in the same way.
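As a rough illustration of the idiom referred to here, the sketch below writes the `(x + y - 1) div y` formula in Nim and checks it against a remainder-based reference over a small range. Both function names are hypothetical, and the idiom is only valid for `x >= 0`, `y > 0` and as long as `x + y - 1` does not overflow.

```nim
# The widely used idiom; assumes x >= 0, y > 0 and no overflow in x + y - 1.
func ceilDivIdiom(x, y: int): int =
  (x + y - 1) div y

# Reference definition: truncated quotient, plus one if there is a remainder.
func ceilDivRef(x, y: int): int =
  x div y + (if x mod y != 0: 1 else: 0)

# Exhaustive comparison over a small range of non-negative inputs.
for x in 0 .. 200:
  for y in 1 .. 50:
    doAssert ceilDivIdiom(x, y) == ceilDivRef(x, y)

doAssert ceilDivIdiom(4097, 4096) == 2   # e.g. bytes to 4096-byte pages
```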

If people ask for support for negative integer values or arbitrary unsigned int values,
the implementation of ceilDiv can be changed to support such cases without breaking compatibility.

If this idea is still unacceptable, may I add the following code to ceilDiv after removing fastCeilDiv?

proc ceilDiv*[T: SomeInteger](x, y: T): T {.inline.} =
  when T is SomeSignedInt and T is range and low(T) >= 0:
    # Use code in fastCeilDiv
    ...
  else:
    # Use code in ceilDiv
    ...

This code runs as fast as fastCeilDiv when T is a signed range type with low(T) >= 0, and works with any value in the range.
But it requires more test code.

@timotheecour
Member

timotheecour commented Aug 11, 2021

Let's do this:

func ceilDiv*[T: SomeInteger](x, y: T, fastMode: static bool = true): T {.inline, since: (1, 5, 1).} =
  ## When `fastMode = true`, we assume that `x >= 0` and `y > 0` (and, if `T` is
  ## `SomeUnsignedInt`, that `x + y - 1 <= high(T)`) and check this when assertions are on; this enables faster code.
  ## When `fastMode = false`, we produce exact results for all inputs, but with slower code.

it's clear enough and satisfies all use cases, and doesn't pollute the namespace, and follows nim-lang/RFCs#376.
The default to fastMode = true reflects real world use cases.
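The proposal above gives only a signature and doc comment. One way it might be fleshed out is sketched below, under several assumptions: the name `ceilDivModes` is hypothetical, the sketch is restricted to `int` rather than generic `SomeInteger`, the `since` pragma is omitted, and the slow branch is an illustrative sign-aware formula rather than the PR's code.

```nim
# Hypothetical sketch of a compile-time `fastMode` switch; not the PR's code.
func ceilDivModes(x, y: int, fastMode: static bool = true): int =
  when fastMode:
    # Fast path: only valid for x >= 0 and y > 0 (checked when assertions are on).
    assert x >= 0 and y > 0
    # Adding in unsigned arithmetic keeps x + (y - 1) from overflowing `int`.
    result = int((uint(x) + (uint(y) - 1)) div uint(y))
  else:
    # General path: correct the truncating `div` for every sign combination.
    result = x div y
    if x mod y != 0 and ((x < 0) == (y < 0)):
      inc result

doAssert ceilDivModes(7, 2) == 4            # default: fast path
doAssert ceilDivModes(-7, 2, false) == -3   # general path handles negatives
```

Because `fastMode` is `static`, the branch is selected at compile time, so the fast path carries no runtime test for the flag.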

@Araq
Member

Araq commented Aug 11, 2021

fastMode: static bool = true still makes the interface harder to understand. Better to only add fastCeilDiv, renamed to ceilDiv with well documented restrictions.

@timotheecour
Member

timotheecour commented Aug 11, 2021

fastMode: static bool = true still makes the interface harder to understand. Better to only add fastCeilDiv, renamed to ceilDiv with well documented restrictions.

to make some progress, let's do that (add only ceilDiv with the implied meaning of fastMode = true, ie with the meaning of fastCeilDiv only) and defer to future work the question of whether to add fastMode: static bool = true, since this design wouldn't be prevented by adding ceilDiv with fastMode implied.

The tests specific to ceilDiv (with fastMode = false implied) can be commented out with when false for now so they don't get lost.

@Varriount
Contributor

Varriount commented Aug 12, 2021

I'd rather see the more general version introduced.

Going with the alternate version means introducing a function that exhibits undefined behavior for a substantial range of inputs, all for the sake of a largely insignificant performance difference.

@Araq
Member

Araq commented Aug 12, 2021

Going with the alternate version means introducing a function that exhibits undefined behavior for a substantial range of inputs, all for the sake of a largely insignificant performance difference.

The contributor cares more about the fast case and the known use cases (pagination) only care about positive numbers.
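A small sketch of the pagination use case mentioned here, using `ceilDiv` from `std/math` as merged by this PR (so it assumes a Nim version that ships it, 1.5.1 or later). The `pageSize` constant and the `pageCount` helper are made up for illustration.

```nim
import std/math

const pageSize = 25   # illustrative items-per-page value

# Number of pages needed to show `nItems` items; assumes nItems >= 0.
func pageCount(nItems: int): int =
  ceilDiv(nItems, pageSize)

doAssert pageCount(0) == 0
doAssert pageCount(25) == 1
doAssert pageCount(26) == 2
doAssert pageCount(101) == 5
```

Both operands are non-negative by construction in this kind of code, which is why the documented restriction to positive inputs rarely matters for it.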

@demotomohiro
Contributor Author

I copied ceilDiv and its test code to https://github.com/demotomohiro/divmath.
Then I removed ceilDiv and renamed fastCeilDiv to ceilDiv.

lib/pure/math.nim
Comment on lines 972 to 973
else:
  type UT = uint8
Contributor

Is it theoretically possible that sizeof(T) is 16 or something? And do we really want to use uint32/uint64 if T is int (not sure if this would make a difference, but #18445 seems like a better solution).

Contributor Author

As T is SomeInteger, T can be int8 or int16.
If the arguments were signed ints and were not converted to unsigned ints,
it would not work for every positive signed int value, because x + (y - 1) can overflow.
toUnsigned in #18445 would be better,
but I cannot use it until it is merged.

Contributor

I don't see how this is relevant to my question... I'm asking if it is possible that T (in particular int, since the others have a fixed size) can be 16 bytes (i.e. 128 bits) or more. In that case, UT would be uint8, which would be wrong.

Contributor Author

I'm sorry for misunderstanding your question.
I don't think sizeof(T) can be 16 or larger, as there are no int128 or uint128 types in the system module.
https://nim-lang.github.io/Nim/system.html
But I fixed the code so that calling ceilDiv with an int type where sizeof(T) > 8 is a compile error.
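Putting the fragments quoted in this review thread together, the approach might look roughly like the sketch below. It is a reconstruction under assumptions, not a copy of the merged code: the name `ceilDivSketch` is hypothetical, and only the `sizeof(T) == 1` branch, the `else` branch and the conversion to unsigned are confirmed by the snippets shown in this thread.

```nim
# Reconstruction sketch: pick an unsigned type of the same width as T, then
# add in that type so x + (y - 1) cannot overflow for signed inputs.
func ceilDivSketch[T: SomeInteger](x, y: T): T =
  when sizeof(T) == 8:
    type UT = uint64
  elif sizeof(T) == 4:
    type UT = uint32
  elif sizeof(T) == 2:
    type UT = uint16
  elif sizeof(T) == 1:
    type UT = uint8
  else:
    {.fatal: "Unsupported int type".}

  # Only non-negative dividends and positive divisors are supported.
  assert x >= 0 and y > 0
  T((UT(x) + (UT(y) - UT(1))) div UT(y))

doAssert ceilDivSketch(high(int), 2) == high(int) div 2 + 1
doAssert ceilDivSketch(127'i8, 127'i8) == 1'i8
doAssert ceilDivSketch(9'u16, 4'u16) == 3'u16
```

With signed `int8` inputs, `127 + 126` would overflow if added as `int8`, but fits in `uint8`; this is the reason given in the thread for converting to unsigned. For unsigned `T` the sum can still wrap, which is the documented restriction.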

Comment on lines 979 to 987
# If divisor is const, backend compiler generate code without `div` instruction
# as it is slow on most of CPU.
# If divisor is a power of 2 and a const unsigned integer type,
# compiler generates faster code.
# If divisor is const and signed int, generated code become slower than
# the code with unsigned int because division with signed int need to works
# both positive and negative value without `idiv`/`sdiv`.
# That is why this code convert parameters to unsigned.
# And also this works with any positive int value unless T is unsigned.
Contributor

Suggested change
- # If divisor is const, backend compiler generate code without `div` instruction
- # as it is slow on most of CPU.
- # If divisor is a power of 2 and a const unsigned integer type,
- # compiler generates faster code.
- # If divisor is const and signed int, generated code become slower than
- # the code with unsigned int because division with signed int need to works
- # both positive and negative value without `idiv`/`sdiv`.
- # That is why this code convert parameters to unsigned.
- # And also this works with any positive int value unless T is unsigned.
+ # If divisor is const, the compiler generates code without a `div` instruction,
+ # as it is slow on most CPUs.
+ # If the divisor is a power of 2 and a const unsigned integer type, the
+ # compiler generates faster code.
+ # If the divisor is const and a signed integer, generated code become slower than
+ # the code with unsigned integers, because division with signed integers need to works
+ # for both positive and negative value without `idiv`/`sdiv`.
+ # That is why this code convert parameters to unsigned.
+ # And also this works with any positive integer value, unless T is unsigned.

Does it really matter that much, how much better the performance for const divisors is, anyway?

Contributor Author

Please see this post:
#18596 (comment)

Contributor

That just explains how the performance for const divisors is better afaict, not whether it's really relevant in the first place.

Contributor Author

The compiler can calculate the return value at compile time only when all arguments are const.
There are cases where the dividend is a runtime value and the divisor is const:
https://github.com/demotomohiro/ceil-division-in-wild

lib/pure/math.nim
elif sizeof(T) == 1:
  type UT = uint8
else:
  {.fatal: "Unsupported int type".}
Contributor

Suggested change
- {.fatal: "Unsupported int type".}
+ type UT = T

I think this would be more sane, it's also what toUnsigned would do.

Contributor Author

I think it is better for someone to hit the issue and have it fixed quickly than for people to keep using code that needs fixing without knowing about it.

@Araq
Member

Araq commented Aug 19, 2021

Thank you for your contribution and your patience.

@Araq Araq merged commit 373bbd9 into nim-lang:devel Aug 19, 2021
@demotomohiro
Contributor Author

All participants, thank you for your help.

PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022
* Use assert in runnableExamples and improve boundary check
* Add more tests for ceilDiv
* Fix comment in ceilDiv
* Calling ceilDiv with int type T such like sizeof(T) > 8 is error
7 participants