-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ceilDiv and fastCeilDiv in math #18596
Conversation
|
fair, can be fixed in this PR by adding more tests
it's a reasonable API for std/math IMO (tricky enough that you don't want to inline the implementation in user code or depend on 3rd party package, when the need comes; and very little downside to adding this API objectively) |
The need didn't come for years. Costs of software development: 20% for the initial implementation, 80% maintenance. Plus lacking IC every proc gets re-compiled all the time, and even with IC the CI doesn't benefit from IC as much as it always starts from a fresh Docker instance or similar. And once accepted into math.nim we need to ensure:
Now compare this to the "ceilDiv in a Nimble package" solution:
So all the potential pitfalls and bugs should be ours to tackle. Great. |
these points can be quantified objectively to avoid hand waving. The API is indeed less commonly used than
already the case, because tmath uses the recommended
nothing to do, that's what CI is for; the problem with maintenance is almost always deficient testing or complex feature interaction; newer APIs (including here) have better test coverage and APIs like this have low risk of feature interaction.
proc semFunc(c: PContext, n: PNode): PNode =
let validPragmas = if n[namePos].kind != nkEmpty: procPragmas else: lambdaPragmas
let t = cpuTime()
result = semProcAux(c, n, skFunc, validPragmas)
let t2 = cpuTime()
echo (t2 - t, n[0]) # 0.0002 the only real downside IMO is the extra |
I'm on the fence. I definitely don't think @timotheecour @demotomohiro What are some situations that require ceiling division? |
But they cannot easily quantified objectively as this is not about ceilDiv directly, but how about we develop the stdlib. Little things add up and every addition is a potential "issue" for our issue tracker that keeps piling up issues. We need much more YAGNI for the stdlib evolution. |
how about let's I'm still not convinced about the "potential future issues" such additions creates; the vast majority of bugs occur when you either have complex feature interactions or inadequate tests. eg, |
Bugs are only one aspect, the other is the CI and the ineffective testing algorithm. (You change one thing and 1000 unrelated things are tested yet again.) And the other is compile-time performance, every Nim user pays for |
test pruning based on module dependency graphWe've discussed this informally a few times but we should have an RFC otherwise nothing's gonna happen there; it's a good idea and can be done robustly against pruning algorithm bugs by ensuring that CI at least runs un-pruned on some schedule (eg on each commit to devel). Algo:
That's the only actual downside, and that's what I measured i #18596 (comment) (0.00025 sec per API is a good rough approximation); it would wash away to 0 (only parsing, ie negligible) with lazy semantic analysis: lazy semcheck(would have a huge impact on compile times)
std/mathutilsBack to this PR; another possibility is |
That really sounds like it should just be a nimble package. |
I have collected lines of code that do ceil division from popular open source project. https://github.com/demotomohiro/ceil-division-in-wild I also found in stackoverflow: https://stackoverflow.com/questions/17944/how-to-round-up-the-result-of-integer-division
|
I actually implemented this at SESCO. Big deal. It still doesn't belong in stdlib. Fix the bugs; leave the feature development to others -- you're not good at it. |
this nails it, thanks; ceilDiv makes the intent clearer in all the cases mentioned in https://github.com/demotomohiro/ceil-division-in-wild. |
I am currently leaning in the direction of agreeing* that this should be added to the math module (@demotomohero's "y items in x pages" example convinced me). The only other concern I have is whether users will actually look for this kind of routine in the first place. Unfortunately, adding an addition to the standard library doesn't guarantee that it will be used, even if the addition is well-designed. Those using the language have to intuit or know that the standard library will provide the mechanism and feel that looking up the function's documentation is less effort than implementing the functionality manually. *(I still don't believe the |
we already have these in std/math: so it makes sense that one would also look there for ceilDiv. Plus, docgen search would reveal it. |
Previous Compiler generate a Most of ceil divisions are used with positive value, and some of them are used with a divisor that is a power of 2 const value like a number of bits/bytes per some kinds of memory block. Here is how nim and backend C compiler generates assembly code with different kind of divisor: I compared speed of import std/[strformat, strutils, monotimes, times]
template measure(title: static[string]; body: untyped) =
block:
const numLoop {.inject.} = 100_000_000
let
runtimeVal {.inject.} = cast[int](main)
start = getMonoTime()
let x {.inject.} = block:
body
let delta {.inject.} = getMonoTime() - start
echo title & ": " & &"{delta.inMicroseconds.float / 1_000_000.0} sec, x = {x}"
proc fastCeilDiv[T: SomeInteger](x, y: T): T {.inline.} =
(x + (y - 1.T)) div y
proc fastCeilDiv2[T: SomeInteger](x, y: T): T {.inline.} =
((x.uint + (y.uint - 1.uint)) div y.uint).T
proc main =
measure("fastCeilDiv runtime value Signed "):
var x = 1
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv(i, runtimeVal)
x
measure("fastCeilDiv runtime value Unsigned"):
var x = 1'u
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv(i.uint, runtimeVal.uint)
x
template constMeasure(n: static[int]): untyped =
measure("fastCeilDiv " & n.toHex(8) & " Signed "):
var x = 1
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv(i, n)
x
measure("fastCeilDiv " & n.toHex(8) & " Unsigned"):
var x = 1'u
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv(i.uint, n.uint)
x
measure("fastCeilDiv2 " & n.toHex(8) & " "):
var x = 1
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv2(i, n)
x
constMeasure(256)
constMeasure(0x8000)
constMeasure(0x8000000)
constMeasure(255)
#257 is a prime number.
constMeasure(257)
constMeasure(0xc000)
main() I compiled it with "-d:danger" option.
Output on Raspberry PI3:
Output on termux on nvidia shield tv:
|
This is a reason why I don't believe a "fast" version should be added. As I stated previously:
|
How about to Nim compiler find code that do ceil division without stdlib and show warning like "Use ceilDiv in math module".
But this will slow down compile speed.
Most of Nim code use int type variable even if the variable is always 0 or positive value. But this page says
This answer explains why: So if you use signed int type but that is know to be positive or zero, you need to call I compared generated assembly code of Following code compares speed of On wandbox, speed of import std/[strformat, strutils, monotimes, times]
template measure(title: static[string]; body: untyped) =
block:
const numLoop {.inject.} = 100_000_000
let
runtimeVal {.inject.} = cast[int](main)
start = getMonoTime()
let x {.inject.} = block:
body
let delta {.inject.} = getMonoTime() - start
echo title & ": " & &"{delta.inMicroseconds.float / 1_000_000.0} sec, x = {x}"
func ceilDiv[T: SomeInteger](x, y: T): T {.inline.} =
result = x div y
if not (x < 0 xor y < 0) and x mod y != 0:
inc result
proc fastCeilDiv[T: SomeInteger](x, y: T): T {.inline.} =
when sizeof(T) == 8:
type UT = uint64
elif sizeof(T) == 4:
type UT = uint32
elif sizeof(T) == 2:
type UT = uint16
else:
type UT = uint8
assert x >= 0 and y > 0
when T is SomeUnsignedInt:
assert x + y - 1 >= x
((x.UT + (y.UT - 1.UT)) div y.UT).T
proc main =
echo "-- Runtime value --"
measure("ceilDiv signed runtime value"):
var x = 1
for i in runtimeVal .. (runtimeVal + numLoop):
x += ceilDiv(i, runtimeVal)
x
measure("fastCeilDiv signed runtime value"):
var x = 1
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv(i, runtimeVal)
x
measure("ceilDiv unsigned runtime value"):
var x = 1'u
for i in runtimeVal .. (runtimeVal + numLoop):
x += ceilDiv(i.uint, runtimeVal.uint)
x
measure("fastCeilDiv unsigned runtime value"):
var x = 1'u
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv(i.uint, runtimeVal.uint)
x
template constMeasure(n: static[int]): untyped =
measure("ceilDiv signed const: " & n.toHex(8)):
var x = 1
for i in runtimeVal .. (runtimeVal + numLoop):
x += ceilDiv(i, n)
x
measure("fastCeilDiv signed const: " & n.toHex(8)):
var x = 1
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv(i, n)
x
measure("ceilDiv unsigned const: " & n.toHex(8)):
var x = 1'u
for i in runtimeVal .. (runtimeVal + numLoop):
x += ceilDiv(i.uint, n.uint)
x
measure("fastCeilDiv unsigned const: " & n.toHex(8)):
var x = 1'u
for i in runtimeVal .. (runtimeVal + numLoop):
x += fastCeilDiv(i.uint, n.uint)
x
echo "-- Power of 2 divisors --"
constMeasure(256)
constMeasure(0x8000)
constMeasure(0x8000000)
echo "-- Non-Power of 2 divisors --"
constMeasure(255)
#257 is a prime number.
constMeasure(257)
constMeasure(0xc000)
main() Output on Wandbox:
Output on NVIDIA Shield TV:
Output on Raspberry PI 3:
|
here's an independent benchmark, showing fastCeilDiv is 2.5X faster than ceilDiv, tested on OSX
(measured via proc getCpuTicksImpl(): uint64 {.importc: "__rdtsc".}
template getCpuTicks*(): int64 = cast[int64](getCpuTicksImpl()) # see https://github.com/timotheecour/Nim/issues/773 + upcoming PR
import std/random
import std/math
template mainAux(algo)=
block:
let t1 = getCpuTicks()
var c = 0
for i in 0..<n-1:
let ci = algo(a[i], a[i+1])
c+=ci
let t2 = getCpuTicks()
echo (astToStr(algo), t2 - t1, c)
proc main()=
const n = 1000
var a: array[n, int]
for i in 0..<n:
a[i] = 1 + rand(10000)
for j in 0..<4:
mainAux(ceilDiv)
mainAux(fastCeilDiv)
main() Now that all the concerns against this PR have been properly addressed, can we please just merge this already.
|
I still don't feel that there is much worth to be gained in adding In adding such a routine, an assertion is being made that this routine is faster than ceilDiv for some range of inputs. This is difficult to guarantee over multiple platforms, compilers, and build flags (and over time). While If a user is in a situation where the conditional check in However, perhaps there's a compromise here: the standard library already has a |
oh, cmon. The point is that fastCeilDiv is faster on common architectures for positive inputs; who cares if it has same speed as fastCeil on some hypothetical architecture. With you logic, we'd design stdlib for the common base denominator performance and capability wise, preventing optimizations on platforms that support certain common operations.
Natural overload can't overload generic SomeInteger, and also doesn't help with unsigned inputs func ceilDiv*[T: SomeInteger](x, y: T): T = 1
func ceilDiv*(x, y: Natural): int = 2
echo ceilDiv(3, 4) # 1
I don't see how this 11-line function is "extremely simple"; that's the point of having it in stdlib, so users don't have to re-invent the wheel, poorly or less efficiently; the examples in the wild in fact don't benefit from the optimization introduced in this PR, precisely because it's not "extremely simple" as you claim dismissively. |
Users care. When I read documentation, I generally assume that it is accurate.
Ah, I concede that point then. I would like to point out that one can overload based on whether a type is signed or unsigned, and converting natural, signed integers to unsigned integers is a no-op (excluding range checks).
8 of those 11 lines are handling different integer inputs. This is hardly complex. The 3 remaining lines, though unintuitive in a mathematical sense, are not difficult to read and understand. Given the mathematical complexity, I could see pointing this formula out in
No. As things stand, this is what is usually done anyway. Why? Because to do otherwise is to introduce decisions on the user's part - decisions that are either mostly meaningless, or that can be automatically made by the compiler. When I write a program, I don't want to have to decide which variation of some function will be most performant - I want the compiler to do that! As an example, this is why we moved from having separate Let me reiterate my points, for clarity:
Please note that these are point_s_, plural, and no single one should be mistaken as representing my entire motivation. I am not stating that user decisions should always be avoided. I am not stating that any simple-but-unintuitive procedure should be added to the standard library. I am stating that, taken as a whole, these are my reasons for supporting or rejecting these procedures' introduction to the standard library. |
the documentation in this PR is accurate and the benchmarks are reproducible
except you can't, otherwise things like Just don't use |
This does not negate the decision cost that a variant procedure like this adds. In this instance, that cost outweighs any benefit this procedure provides. The whole idea that "oh, let's add this to the standard library, and if people don't like it, they don't have to use it. It won't do any harm to non-users", is flawed. Adding functionality to the standard library increases maintenance costs, increases the amount of documentation that must be read and searched through, and in this case, makes using the standard library more difficult. User: oh, I need ceiling division... and there's a By labelling a variant as "fast", or "smart", or any other similar adjective, an implicit effect is that the original is considered "slow", or "dumb", etc. Most people, even if they aren't explicitly seeking performance, don't want to use something that is considered "slow", "dumb", etc. When adding something to the standard library, these facts must be balanced against the worth of the functionality being added. And this is not in the context of what worth the function has to just you, or just me, but to all the users of the standard library. Please stop taking the stance that adding something to the standard library has no impact - it does. This has been mentioned to you, repeatedly, over and over, across multiple issues. |
The problem with Which is also why "stdlib lacked ceilDiv so I wrote my own (and it fails for high(int) but my application doesn't use it with high(int))" really does work and why application development is much cheaper than stdlib development. |
How about to add only
If there are people who ask for supporting negative integer value or any unsigned int value, If this idea still unacceptable, may I add a following code to proc ceilDiv*[T: SomeInteger](x, y: T): T {.inline.} =
when T is SomeSignedInt and T is range and low(T) >= 0:
# Use code in fastCeilDiv
...
else:
# Use code in ceilDiv
... This code runs as fast as |
Let's do this: func ceilDiv*[T: SomeInteger](x, y: T, fastMode: static bool = true): T {.inline, since: (1, 5, 1).} =
## When `fastMode = true`, we assume that `x >= 0` and `y > 0` and (`x + y - 1 <= high(T)`
## if T is SomeUnsignedInt) and check for it with assertions on; this enables faster code.
## When `fastMode = false`, we produce exact for all inputs but with slower code. it's clear enough and satisfies all use cases, and doesn't pollute the namespace, and follows nim-lang/RFCs#376. |
|
to make some progress, let's do that (add only ceilDiv with the implied meaning of The tests for specific for |
I'd rather see the more general version introduced. Going with the alternate version means introducing a function that exhibits undefined behavior for a substantial range of inputs, all for the sake of a largely insignificant performance difference. |
The contributor cares more about the fast case and the known use cases (pagination) only care about positive numbers. |
I copied |
lib/pure/math.nim
Outdated
else: | ||
type UT = uint8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it theoretically possible that sizeof(T)
is 16 or something? And do we really want to use uint32
/uint64
if T
is int
(not sure if this would make a difference, but #18445 seems like a better solution).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As T is SomeInteger, T can be int8 or int16.
If arguments were signed int and they were not converted to unsigned int,
it doesn't work any positive signed int value because x + (y - 1)
can overflow.
toUnsigned
in #18445 would be better,
but I cannot use it until it is mergerd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this is relevant to my question... I'm asking if it is possible that T
(in particular int
, since the others have a fixed size) can be 16 bytes (i.e. 128 bits) or more. In that case, UT
would be uint8
, which would be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for misunderstanding your question.
I don't think sizeof(T) can be 16 or larger as there is no int128
or uint128
types in system
module.
https://nim-lang.github.io/Nim/system.html
But I fixed the code so that when ceilDiv
is called with int type such like sizeof(T) > 8, it becomes compile error.
lib/pure/math.nim
Outdated
# If divisor is const, backend compiler generate code without `div` instruction | ||
# as it is slow on most of CPU. | ||
# If divisor is a power of 2 and a const unsigned integer type, | ||
# compiler generates faster code. | ||
# If divisor is const and signed int, generated code become slower than | ||
# the code with unsigned int because division with signed int need to works | ||
# both positive and negative value without `idiv`/`sdiv`. | ||
# That is why this code convert parameters to unsigned. | ||
# And also this works with any positive int value unless T is unsigned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# If divisor is const, backend compiler generate code without `div` instruction | |
# as it is slow on most of CPU. | |
# If divisor is a power of 2 and a const unsigned integer type, | |
# compiler generates faster code. | |
# If divisor is const and signed int, generated code become slower than | |
# the code with unsigned int because division with signed int need to works | |
# both positive and negative value without `idiv`/`sdiv`. | |
# That is why this code convert parameters to unsigned. | |
# And also this works with any positive int value unless T is unsigned. | |
# If divisor is const, the compiler generates code without a `div` instruction, | |
# as it is slow on most CPUs. | |
# If the divisor is a power of 2 and a const unsigned integer type, the | |
# compiler generates faster code. | |
# If the divisor is const and a signed integer, generated code become slower than | |
# the code with unsigned integers, because division with signed integers need to works | |
# for both positive and negative value without `idiv`/`sdiv`. | |
# That is why this code convert parameters to unsigned. | |
# And also this works with any positive integer value, unless T is unsigned. |
Does it really matter that much, how much better the performance for const divisors is, anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see this post:
#18596 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That just explains how the performance for const divisors is faster afaict, not if it's really releveant in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler can calculate the return value at compile time only when all arguments are const.
There are cases where dividend is runtime value and divisor is const.
https://github.com/demotomohiro/ceil-division-in-wild
elif sizeof(T) == 1: | ||
type UT = uint8 | ||
else: | ||
{.fatal: "Unsupported int type".} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{.fatal: "Unsupported int type".} | |
type UT = T |
I think this would be more sane, it's also what toUnsigned
would do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think someone find an issue and it get fixed quickly is better rather than people keep using the code that need to be fixed without knowing about it.
Thank you for your contribution and your patience. |
All participants, thank you for your help. |
* Use assert in runnableExamples and improve boundary check * Add more tests for ceilDiv * Fix comment in ceilDiv * Calling ceilDiv with int type T such like sizeof(T) > 8 is error
ceilDiv
is a round up integer division proc andfastCeilDiv
is faster version that works only positive integer value.