Assorted optimizations #62

debrouxl · 2017-09-01T15:07:10Z

Together, these commits shave more than 33500 bytes, i.e. more than 4.7%, over the size of the ARM binary compiled by the arm-none-eabi toolchain packaged by Debian unstable (GCC 6.3.1, binutils 2.28). They optimize the source code's size, too, but that doesn't matter :D
More than half of the optimization potential of an old version of this PR was destroyed by integrating #123 - essentially, my optimizations mostly remain, but jacobly's have disappeared.

Premature optimization is known to be bad... but so is letting known pessimization patterns (e.g. copying nontrivial classes such as Complex by value when const references do the job, copy-paste jobs, letting hundreds of trivial constructors / getters / setters / action methods out of line) stay, grow and multiply in the code base :)

I am well aware that the pull request contains way too many commits, and that some of those commits are very large (despite some splitting work). As long as I hadn't signed that that stupid, counter-productive CLA that you insist on people signing, it wasn't much of an issue, as you wouldn't have looked at the code anyway ;)
Before I signed the CLA, you couldn't integrate those changes even if they were coming from other persons (and in fact, you probably independently reinvented several changes on your side - as I wrote below, I noticed it when fixing conflicts), unless said changes were suggested by other persons before I implemented them, as occurred with a couple ideas from jacobly. But the code can still be useful to other persons for their own purposes, and can be subject to review, in the meantime.

CLAassistant · 2017-09-01T15:07:16Z

All committers have signed the CLA.

debrouxl · 2017-12-10T19:58:41Z

I've finished the rebase onto current master @ 8c37e8a .

Most of the optimizations I performed several months ago on 1.1.x remain valid, under some form, on 1.2.x. The largest conflicts were, probably as expected, in Poincare. The absolute savings of this PR are nearly 2 KB higher in 1.2.x than in 1.1.x, partially because I added some more optimizations, but as the whole code base is significantly larger, the relative savings are lower (5.8% -> 5.3%).

The number of patches was reduced from 18 to 16 by squashing two consecutive patches together, as well as removing a patch which added -fno-math-errno to the build flags - that used to be an optimization, it's now a 8-byte pessimization. However, the modified file count raised by several dozens.

Resolving some of the conflicts showed that a tiny subset of my changes was integrated upstream, probably independently reinvented, after I published them ;)

… on (mostly final, some singleton) classes with trivial constructors, getters and/or setters. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…ader, so that the compiler can leverage its prior knowledge of the fact that the destructor is trivial (noticed by disassembling the code) to greatly optimize derived classes' destructors. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…declared only once; optimize several strlcpy() with short strings as source; add and use strcpy() to replace strlcpy(dest, src, strlen(src)+1); add and use a function to factor out some redundant code in three app controllers. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

… while making classes final - all of that only when it does not hurt size. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…, setters, forwards, while making classes final - all of that only when it does not hurt size. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…setters, forwards, while making classes final - all of that only when it does not hurt size. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…rds, while making classes final - all of that mostly when it does not hurt size. NOTE: switching KDCoordinate from int16_t to int32_t would save nearly 1 KB. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…, BoundedStaticHierarchy, Complex, Integer and Symbol constructors, destructors and method bodies to the header, to optimize the code quite a bit by giving inlining a chance. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…ine::ComplexAndComplexReduction, Rational::Rational and their implementations + subroutines, take const Complex<T> & instead of (const) Complex<T>, to avoid inefficiently copying Complex instances around; inline most clone() methods, which saves a bit of space as well. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…ng the creation of instances from NAN, INFINITY and real numbers. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…ate() / toScalar() / delete triplets by calls to approximateToScalar(), which does exactly this. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…rsions, factor out methods between the FloatPairStore children, perform a bit of manual CSE. This reduces accuracy a tiny bit, but... if accuracy were a goal, only double would be used in the first place. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

…new[] and operator delete[]. It increases size very slightly (small code generation changes in some functions), but it allows executing a single bl instead of a bl followed by a b. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

debrouxl · 2018-05-28T18:31:32Z

At long last, the Epsilon licensing issue was alleviated (today's updates to #38 and the license change).
I still hate CLAs, because they're a barrier to contribution and they drive some contributors away (the GNU project, Canonical and others saw that effect)... but with NumWorks' reaffirmed and reinforced commitment to a CLA, despite the potential harm, me remaining adamant about not signing this kind of PoS isn't the most constructive behaviour I can display. Therefore, I signed the f* CLA, and let's proceed integrating some of my work to Epsilon.

I've finished the rebase onto current master ( fde4883 ), in several steps, all of which are published on my repo. However, I'll stagger force pushes to my repo's master branch, so that I can write comments here every time.

Nearly all of the optimizations I performed several months ago on 1.2.x remain valid, under some form, on 1.3.x. The absolute savings of this PR are ~200 bytes higher in 1.3.0 than in 1.2.0, but again, as the whole code base is significantly larger, the relative savings are lower (5.3% -> 5.1%).

No change on the number of patches.

Resolving some of the conflicts showed that a tiny subset of my changes was integrated upstream, probably independently reinvented, after I published them ;)

debrouxl · 2018-05-28T19:25:33Z

Guess what, nearly all optimizations still hold on 1.4.0 and 1.4.1. The absolute savings are just about the same, but the relative savings are now barely above 5.0%.

debrouxl · 2018-05-28T19:32:53Z

Likewise for 1.5.0. The absolute savings are several dozens of bytes higher, but epsilon.elf gained over 36000 bytes, which deflates the relative savings to ~4.75%.

EmilieNumworks · 2020-03-06T12:50:32Z

I am going to close this PR as it is outdated. I am sorry we did not review it at that time.

adriweb · 2020-03-06T12:57:35Z

Rebasing some of the commits for a "part 2" could likely still work, but will need some time, hmm

debrouxl · 2020-03-06T13:23:53Z

Some of the optimizations scattered around the PR are certainly still valid indeed, but this PR predated the usage of LTO, which was the way to go to reduce the size a lot, but which made the result of optimizations unstable over time, and as such, harder to predict and maintain in the face of code evolution...

LTO probably made a small subset of these optimizations automatically performed, but the main consequence was counter-intuitiveness and volatility over time of the results. Making the code of a function objectively better (individually smaller), or a class explicitly final (good for correctness and formerly usually for size optimization), can change link-time optimization decisions such as inlining, partial specialization and finalization, and end up causing a size increase. Explicit noinline marking could work around that in the current state of the code, but could later prove detrimental to optimization, after the code has changed to fulfill functional requirements.
Also, LTO made the build cycle much slower, and therefore, it's harder to test the result of individual optimizations, the way I did it when I worked on this PR in pre-LTO times.

Even if there were several speed-oriented changes causing minor size increases in my PR, e.g. inlining operator new / new[] / delete / delete[], the main goal has always been to decrease the size, so as to be able to cram more functionality within the N0100's low Flash memory size.

EDIT: the TL;DR of this message is that merging the whole set of changes from this PR would probably have been a bad thing, as some changes would have had adverse effect on size with LTO, though others kept being useful with LTO, as shown by issue #519 . However, I probably would have liked to know that you'd eventually be doing nothing with the changes in this PR, I would have spent less time on it ;)

debrouxl force-pushed the master branch 4 times, most recently from bb2336d to 4fc9647 Compare September 6, 2017 19:44

debrouxl force-pushed the master branch 6 times, most recently from e55cc80 to 98870a1 Compare September 11, 2017 20:48

jacobly0 mentioned this pull request Sep 12, 2017

Use math.h header instead of std one #122

Closed

debrouxl force-pushed the master branch 8 times, most recently from 3cc4494 to c98fef0 Compare September 18, 2017 20:11

debrouxl force-pushed the master branch 10 times, most recently from 5dd8936 to 7b686fb Compare September 25, 2017 20:55

debrouxl mentioned this pull request Apr 21, 2018

"Life is short and ROM is full" #110

Closed

debrouxl mentioned this pull request May 17, 2018

License forbids modification, but pull requests accepted #38

Closed

debrouxl added 15 commits May 28, 2018 19:55

[poincare, escher, kandinsky, apps] Several easy, large optimizations…

75fa1e1

… on (mostly final, some singleton) classes with trivial constructors, getters and/or setters. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

[poincare] Inline a number of trivial constructors, getters, setters,…

ac8cfe4

… while making classes final - all of that only when it does not hurt size. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

[escher] Inline a significant number of trivial constructors, getters…

baf6f1f

…, setters, forwards, while making classes final - all of that only when it does not hurt size. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

[apps] Inline a significant number of trivial constructors, getters, …

cdaec68

…setters, forwards, while making classes final - all of that only when it does not hurt size. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

[apps] Inline a significant number of trivial constructors, getters, …

3f8bdb9

…setters, forwards, while making classes final - all of that only when it does not hurt size. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

[apps] Inline a significant number of trivial constructors, getters, …

7fde066

…setters, forwards, while making classes final - all of that only when it does not hurt size. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

[poincare] Add several static methods to Complex for further optimizi…

cbb3ad1

…ng the creation of instances from NAN, INFINITY and real numbers. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

[poincare] Initialize several local arrays directly; replace approxim…

d18110c

…ate() / toScalar() / delete triplets by calls to approximateToScalar(), which does exactly this. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>

debrouxl force-pushed the master branch from a598994 to b198977 Compare May 28, 2018 18:17

debrouxl force-pushed the master branch from b198977 to fc753c6 Compare May 28, 2018 19:24

debrouxl force-pushed the master branch from fc753c6 to 41120e8 Compare May 28, 2018 19:31

debrouxl mentioned this pull request May 28, 2018

Some size (+ usually speed) optimizations, round 1... #519

Merged

LeaNumworks force-pushed the master branch from 2f31f04 to f30df04 Compare November 23, 2018 13:47

EmilieNumworks closed this Mar 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assorted optimizations #62

Assorted optimizations #62

debrouxl commented Sep 1, 2017 •

edited

CLAassistant commented Sep 1, 2017 •

edited

debrouxl commented Dec 10, 2017 •

edited

debrouxl commented May 28, 2018

debrouxl commented May 28, 2018 •

edited

debrouxl commented May 28, 2018

EmilieNumworks commented Mar 6, 2020

adriweb commented Mar 6, 2020

debrouxl commented Mar 6, 2020 •

edited

Assorted optimizations #62

Assorted optimizations #62

Conversation

debrouxl commented Sep 1, 2017 • edited

CLAassistant commented Sep 1, 2017 • edited

debrouxl commented Dec 10, 2017 • edited

debrouxl commented May 28, 2018

debrouxl commented May 28, 2018 • edited

debrouxl commented May 28, 2018

EmilieNumworks commented Mar 6, 2020

adriweb commented Mar 6, 2020

debrouxl commented Mar 6, 2020 • edited

debrouxl commented Sep 1, 2017 •

edited

CLAassistant commented Sep 1, 2017 •

edited

debrouxl commented Dec 10, 2017 •

edited

debrouxl commented May 28, 2018 •

edited

debrouxl commented Mar 6, 2020 •

edited