Optimization: Use constant folding for divisions not a power of two #9609
@@ -889,10 +889,10 @@ function FileManagerMenu:_getTabIndexFromLocation(ges)
 if not ges then
 return last_tab_index
 -- if the start position is far right
-elseif ges.pos.x > 2 * Screen:getWidth() / 3 then
+elseif ges.pos.x > Screen:getWidth() * (2/3) then
The bigger question is what they are on ARM rather than on x86, though. (Not an objection. :-) )
Depends on the CPU (some... don't even have native euclidean div hardware ;)).
But generally the same principles apply (i.e., it's a few clock cycles slower).
Which makes getting these out of the ARM technical manuals kind of a bitch, because it's optional on the A7 & A8 (IIRC, it's always supported on Kindle & Kobo, though).
But, if we look at a slightly more modern processor where it's stock: https://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/
(So, yeah, much slower ;)).
In compiled languages, it'd be the compiler's job to transform as many divisions as possible into MULs or shifts, but I'm not quite sure what happens in Lua's case ;o).
LuaJIT tends to be pretty good about these things but you'd have to check. :-)
I have made a small benchmark
x = 0
for i = 1,500000 do
for j = 1,1000 do
x = x + (j + i) * (1/3)
end
end
print(x)
and
x = 0
for i = 1,500000 do
for j = 1,1000 do
x = x + (j + i) / 3
end
end
print(x)
Run both with time luajit bench.lua.
On a Sage the first one takes 5.4s, the second one 12.4s.
On my laptop the first one needs 0.5s, the second one 2.1s.
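For anyone who wants to reproduce this without the shell's time, both loops can go into one script and be timed with os.clock. This is only a sketch: the helper name bench and the reduced iteration counts are assumptions chosen for illustration, and absolute timings will differ by device and Lua/LuaJIT version.

```lua
-- Sketch: division by a constant vs. multiplication by its folded reciprocal.
-- `bench` is a hypothetical helper, not part of KOReader or LuaJIT.
local function bench(fn, outer, inner)
    local start = os.clock()
    local x = fn(outer, inner)
    return os.clock() - start, x
end

local function with_mul(outer, inner)
    local x = 0
    for i = 1, outer do
        for j = 1, inner do
            -- (1/3) is a constant expression, folded when the chunk is compiled
            x = x + (j + i) * (1/3)
        end
    end
    return x
end

local function with_div(outer, inner)
    local x = 0
    for i = 1, outer do
        for j = 1, inner do
            x = x + (j + i) / 3
        end
    end
    return x
end

-- Smaller iteration counts than the original benchmark, to keep the run short.
local t_mul, x_mul = bench(with_mul, 5000, 1000)
local t_div, x_div = bench(with_div, 5000, 1000)
print(string.format("mul: %.3fs (x = %.6f)", t_mul, x_mul))
print(string.format("div: %.3fs (x = %.6f)", t_div, x_div))
```

Using local accumulators instead of a global x keeps global table lookups out of the measurement. The two printed sums may differ in their last digits, since the two forms do not round identically.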
LuaJIT tends to be pretty good about these things but you'd have to check. :-)
In https://luapower.com/luajit-notes#luajit-assumptions there is
divisions are 4x slower than multiplications on x86, so when dividing by a constant, it helps turning x / c into x * (1 / c) since the constant expression is folded – LuaJIT does this already for power-of-2 constants where the semantics are equivalent.
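The caveat in that quote about power-of-2 constants can be checked directly. A rough sketch (the helper count_mismatches is made up for illustration): for c = 4 the reciprocal 0.25 is exactly representable, so x / 4 and x * (1/4) always agree, whereas for c = 3 the reciprocal is itself rounded and the two forms differ in the last bit for some inputs.

```lua
-- Count integers x in [1, n] where x / c and x * (1/c) give different doubles.
-- `count_mismatches` is a hypothetical helper for illustration only.
local function count_mismatches(c, n)
    local inv = 1 / c
    local count = 0
    for x = 1, n do
        if x / c ~= x * inv then
            count = count + 1
        end
    end
    return count
end

print(count_mismatches(4, 1000)) -- 0: power of two, both forms are equivalent
print(count_mismatches(3, 1000)) -- non-zero: some inputs differ by one ulp
print(5 / 3 == 5 * (1/3))        -- false
```

This is presumably why LuaJIT only performs the rewrite itself where the semantics are equivalent: for other constants the change is observable, if only in the last bit.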
For context re: my link above, the Sage is running on a quad-A7 @ 1.8GHz, FWIW (although we mostly only keep a single core online).
Our other devices tend to run on an A8 or an A9 at 1GHz.
Cool, thanks.
I'm not seeing time.to_number in there, but LGTM ;).
If these assumptions and this advice are still valid, I find it quite sad that LuaJIT should force us to write less readable code :/
@poire-z agreed. LuaJIT has some potential for optimizations here. My simple benchmark shows that the info on luapower.com is not outdated. You are right that some of the changes hinder readability a bit, but other changes improve exactly that. So with respect to readability these changes are a zero-sum game; what remains is the small performance improvement and a heightened awareness (for us) of that constant-folding optimization potential (until Mike Pall implements this in LuaJIT). (And maybe writing * (1/3) instead of /3 is not so bad once one gets used to it?)
The only ones I had to stop for a bit to grok are the
(Being a teacher I would say 1.5 = 3/2, and division is the same as multiplication by the reciprocal, so 1/1.5 = 1/(3/2) = 2/3. But I don't want to teach.)
It's a zero-sum game only when the number of readability losses == the number of readability gains :) I appeal to your good sense not to pop into every upcoming PR (by newcomers, but also old-timers :) and chase such divisions and advise to use (And maybe once a year, for your birthday, you'll be allowed to go over code that is not yours and optimize it and make it less readable :) Not too long ago, people sacrificed some of their children to the angry gods to make them friendlier - it was also an optimization: fewer mouths to feed :)
@poire-z I totally agree :)
Psst:
;p |
@poire-z I will implement a plugin to remind me to force these optimizations. Stay tuned ;-)
This may have subtly broken a number of things because of different rounding semantics paired with near-zero/negative values and
(Example in ZSH because Lua lies to you in
Caught in the Kobo light toggle ramp: when starting with a frontlight set to 12, it led to the final step being floored to -1 instead of 0, which breaks because we're bypassing the clamping in this codepath.
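To illustrate the kind of rounding difference at play (a generic example, not the actual frontlight code): the reciprocal of a non-power-of-two constant is itself rounded, so the product can land just below an integer that the division hits exactly, and math.floor then drops a whole step.

```lua
-- Generic illustration, NOT the KOReader frontlight code path.
local x = 49
local by_div = x / 49      -- exactly 1
local by_mul = x * (1/49)  -- just below 1 in IEEE 754 double precision

print(math.floor(by_div))  -- 1
print(math.floor(by_mul))  -- 0
print(by_mul < 1)          -- true
```

The same one-ulp error turns a result that should be exactly 0 into a tiny negative number when it ends up in a subtraction, and flooring that gives -1.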
Because of floating point computery math stuff. Regression since koreader#9609
F.... what's that for. Floating point is always difficult. Maybe we have had pure luck up to now with using math.floor. In future we should consider using some sort of rounding. @NiLuJe thanks for investigating.
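One possible reading of "a sort of rounding" is to round to the nearest integer and clamp the result, rather than floor it. A minimal sketch, where round and clamp are hypothetical helper names for illustration (not claimed to match any existing KOReader function) and the 0..24 range is an arbitrary assumption:

```lua
-- Hypothetical helpers for illustration only.
local function round(x)
    -- Rounds halves up; it has its own edge cases, but absorbs one-ulp errors.
    return math.floor(x + 0.5)
end

local function clamp(x, lo, hi)
    return math.max(lo, math.min(hi, x))
end

local step = 49 * (1/49)            -- just below 1 because 1/49 is rounded
print(math.floor(step))             -- 0: floor is sensitive to the tiny error
print(round(step))                  -- 1: rounding to nearest absorbs it
print(clamp(math.floor(-1e-17), 0, 24)) -- 0: clamping catches the stray -1
```

Rounding changes behaviour for values that are legitimately fractional, so whether it is an acceptable replacement for math.floor depends on the call site.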
Yeah, |
Because of floating point computery math stuff. Regression since koreader#9609 c.f., koreader#9609 (comment)
Edit of my previous post: please replace "difficult" with "dangerous".
Never had any such issues. I guess we should trust our brain's naive output ("divide by 5", the way we play it through in our head with "normal" values when writing the maths; and math.floor never failed me) and not try to be too clever by twisting things, making them less naive and less readable. (And not make things "more complicated" because "complicated" didn't work - "simple" is nice :)
Because of floating point computery math stuff. Regression since #9609 c.f., #9609 (comment)
This PR is a minimal optimization. See https://luapower.com/luajit-notes#luajit-assumptions
When c is a constant, x/c is slower than x * (1/c) (if c is not a power of two).