Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: Use constant folding for divisions not a power of two #9609

Merged
merged 5 commits into from
Oct 10, 2022
Merged

Optimization: Use constant folding for divisions not a power of two #9609

merged 5 commits into from
Oct 10, 2022

Conversation

zwim
Copy link
Contributor

@zwim zwim commented Oct 8, 2022

This PR is a minimal optimization. See https://luapower.com/luajit-notes#luajit-assumptions

When c is a constant x/c is slower than x * (1/c) (if c is not a power of two).


This change is Reviewable

@Frenzie Frenzie added this to the 2022.10 milestone Oct 8, 2022
@@ -889,10 +889,10 @@ function FileManagerMenu:_getTabIndexFromLocation(ges)
if not ges then
return last_tab_index
-- if the start position is far right
elseif ges.pos.x > 2 * Screen:getWidth() / 3 then
elseif ges.pos.x > Screen:getWidth() * (2/3) then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bigger question's what they are on ARM than on x86 though. (Not an objection. :-) )

Copy link
Member

@NiLuJe NiLuJe Oct 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the CPU (some... don't even have native euclidean div hardware ;)).

But generally the same principles apply (i.e., it's a few clock cycles slower).

Which makes getting these out of the ARM technical manuals kind of a bitch, because it's optional on the A7 & A8 (IIRC, it's always supported on Kindle & Kobo, though).

But, if we look at a slightly more modern processor where it's stock: https://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/

(So, yeah, much slower ;)).

Copy link
Member

@NiLuJe NiLuJe Oct 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In compiled languages, it'd be the compiler's job to transform as many divisions as possibly in MULs or shifts, but I'm not quite sure what happens in Lua's case ;o).

Copy link
Member

@Frenzie Frenzie Oct 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LuaJIT tends to be pretty good about these things but you'd have to check. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made a small benchmark

x = 0
for i = 1,500000 do
    for j = 1,1000 do
        x = x + (j + i) * (1/3)
    end
end

print(x)

and

x = 0
for i = 1,500000 do
    for j = 1,1000 do
        x = x + (j + i) / 3
    end
end

pint(x)

Run both with time luajit bench.lua.
On a Sage the first one takes 5.4s the second one 12.4s.
On my Laptop the first on needs 0.5s the second one 2.1s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LuaJIT tends to be pretty good about these things but you'd have to check. :-)

In https://luapower.com/luajit-notes#luajit-assumptions there is

divisions are 4x slower than multiplications on x86, so when dividing by a constant, it helps turning x / c into x * (1 / c) since the constant expression is folded – LuaJIT does this already for power-of-2 constants where the semantics are equivalent.

Copy link
Member

@NiLuJe NiLuJe Oct 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context re: my link above, the Sage is running on a quad-A7 @ 1.8GHz, FWIW (although we mostly only keep a single core online).

Our other devices tend to run on a A8 or an A9 at 1GHz.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks.

Copy link
Member

@NiLuJe NiLuJe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing time.to_number in there, but LGTM ;).

@poire-z
Copy link
Contributor

poire-z commented Oct 8, 2022

See https://luapower.com/luajit-notes#luajit-assumptions

If these assumptions/advices are still valid, I find it quite sad that Luajit should force us to write less readable code :/
(Looking at the changes, there are many places that will require me to spent 3-10 seconds to figure out what this does - which is an awful lot more that the 3ns a processor would spent doing the expensive division if it was written in plain readable english :)

@zwim
Copy link
Contributor Author

zwim commented Oct 8, 2022

@poire-z d'accord. Luajit has some potential for optimizations here.

My simple bemchmark shows that the infos on luapower.com are not outdated.

Your are right that some of the changes hinder readability a bit, but other changes improve exactly this. So in respect of readability these changes are a null-sum-game, what stays, is the mini performance improvement and a sensitation (for us ) for that constant folding optimization potential (until Mike Pall implements this in luajit).

(And maybe writing * (1/3) instead of /3 is not so bad if one gets used to it?)

@NiLuJe
Copy link
Member

NiLuJe commented Oct 8, 2022

The only ones I had to stop for a bit to grok are the / 1.5 => * (2/3) ones, FWIW ;).

@zwim
Copy link
Contributor Author

zwim commented Oct 9, 2022

(Beeing a teacher I would say 1.5=3/2, and division is the same as multiplication with the reciprocal so 1/1.5=1/(3/2)=2/3. But I dont want to teach.)

@poire-z
Copy link
Contributor

poire-z commented Oct 9, 2022

Your are right that some of the changes hinder readability a bit, but other changes improve exactly this. So in respect of readability these changes are a null-sum-game

It's a null sum game only when the nb of bad readability == nb of good readability :)
I agree some of your changes improve readability - but other don't.
Anyway, I won't re-read it and won't tag those that don't.

I call to your good sense to not pop in every upcoming PR (by newcomers, but also oldtimers :) and chase such divisions and advise to use * ( 1 / N). You can do it when it's a hot code path, but when it's not, let a brain's intuitive output "let divide this by 6" be - it doesn't have to be "let multiply this by 1/6".
Ie, if opening the ToC window does four /6, it's just fine and not hot. If opening the ToC window could do four /6 for each ToC item, and having 2000 ToC items is possible, it could be considered "hot" (lukewarm for me :).

(And may be once a year, for your birthday, you'll be allowed to pass over code that is not yours and optimize it and make it less readable :) Not too long ago, people did sacrify some of their children to the angry gods to make them friendlier - it was also an optimization: less mouths to feed :)

@zwim
Copy link
Contributor Author

zwim commented Oct 9, 2022

@poire-z I totally agree :)

@NiLuJe
Copy link
Member

NiLuJe commented Oct 9, 2022

Psst:

I'm not seeing time.to_number in there, but LGTM ;).

;p

@zwim
Copy link
Contributor Author

zwim commented Oct 10, 2022

@poire-z I will implement a plugin to remind me to force this optimizations. Stay tuned ;-)

@zwim zwim merged commit 4969811 into koreader:master Oct 10, 2022
@NiLuJe
Copy link
Member

NiLuJe commented Oct 23, 2022

This may have subtly broken a number of things because of different rounding semantics paired with near-zero/negative values and math.floor's own rounding behavior for near-zero negative values.

┌─(niluje@illyria:pts/10)─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────(~/MPLAYER/koreader)─┐
└─(0.75:31%:21:42:%)── echo $(( 12 - (12 / 5.) * 5 ))                                                                                                                                                                                                                                                                                                  ──(Sun, Oct 23)─┘
0.
┌─(niluje@illyria:pts/10)─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────(~/MPLAYER/koreader)─┐
└─(0.28:31%:21:43:%)── echo $(( 12 - (12 * (1/5.)) * 5 ))                                                                                                                                                                                                                                                                                              ──(Sun, Oct 23)─┘
-1.7763568394002505e-15
┌─(niluje@illyria:pts/10)─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────(~/MPLAYER/koreader)─┐
└─(0.22:31%:21:43:%)── echo $(( 12 / 5. ))                                                                                                                                                                                                                                                                                                             ──(Sun, Oct 23)─┘
2.3999999999999999
┌─(niluje@illyria:pts/10)─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────(~/MPLAYER/koreader)─┐
└─(0.22:31%:21:44:%)── echo $(( 12 * (1/5.) ))                                                                                                                                                                                                                                                                                                         ──(Sun, Oct 23)─┘
2.4000000000000004

(Example in ZSH because Lua lies to you in print, at least for the intermediate part ;)).

> print(12 - (12 / 5) * 5)
0
> print(12 - (12 * (1/5)) * 5)
-1.7763568394003e-15

Caught in the Kobo light toggle ramp, when starting with a frontlight set to 12, it lead to the final step being floored to -1 instead of 0, which breaks because we're bypassing the clamping in this codepath.

NiLuJe added a commit to NiLuJe/koreader that referenced this pull request Oct 23, 2022
Because of floating point computery math stuff.

Regression since koreader#9609
@zwim
Copy link
Contributor Author

zwim commented Oct 23, 2022

F.... whats that for. Floating point is always difficult. Maybe we had pure luck up to now with using math.floor. In future we should consider to use a sort of rounding.

. @NiLuJe thanks for investigating.

@NiLuJe
Copy link
Member

NiLuJe commented Oct 23, 2022

Yeah, Math.round would have worked around the issue, but I kept the div because it makes the intent much clearer (i.e., that n / 5 * 5 == 0).

NiLuJe added a commit to NiLuJe/koreader that referenced this pull request Oct 23, 2022
Because of floating point computery math stuff.

Regression since koreader#9609
c.f., koreader#9609 (comment)
NiLuJe added a commit to NiLuJe/koreader that referenced this pull request Oct 23, 2022
Because of floating point computery math stuff.

Regression since koreader#9609
c.f., koreader#9609 (comment)
NiLuJe added a commit to NiLuJe/koreader that referenced this pull request Oct 23, 2022
Because of floating point computery math stuff.

Regression since koreader#9609
c.f., koreader#9609 (comment)
@zwim
Copy link
Contributor Author

zwim commented Oct 23, 2022

Edit of my prev post: Please replace difficult with dangerous.
No problem with /5.
But for future changes/additions we really should consider to rethink the usage of flooring vs. roundig.
I might write an elaborate description of whats happenimg in this special case (whenI am not on mobile )

@NiLuJe NiLuJe mentioned this pull request Oct 23, 2022
@poire-z
Copy link
Contributor

poire-z commented Oct 23, 2022

Maybe we had pure luck up to now with using math.floor
[...]
But for future changes/additions we really should consider to rethink the usage of flooring vs. roundig.

Never had any such issues. I guess we should trust our brain naive output ("divide by 5", our play in our head with "normal" values when you write the maths; and math.floor never failed me) and not try to be too clever by twisting things making them less naive and readable. (And not make things "more complicated" because "complicated" didn't work - "simple" is nice :)

@zwim zwim deleted the constant_folding branch October 24, 2022 04:37
NiLuJe added a commit that referenced this pull request Oct 24, 2022
Because of floating point computery math stuff.

Regression since #9609
c.f., #9609 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants