-
Notifications
You must be signed in to change notification settings - Fork 7.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major performance improvements to idf_size.py (IDFGH-2404) #4518
Conversation
@ajcasagrande Thanks for the contribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good, @ajcasagrande . And an unbelievably good result, a well justified optimization pass!!
We may be able to look into running this as part of the standard app build, which would be nice.
I think I remember in an early version of idf_size.py
thinking about optimisation and deciding it was premature, but clearly not the case now! (At least not after it's gotten more complex.)
Just two requests before we progress the merge process:
There's a test script tools/test_idf_size/test.sh
that we run in our internal CI, could you please check it still passes with this version? It's plausible that some "failures" may just be due to changed but still legitimate output, in which case updating the test itself is an acceptable fix.
Also we haven't gotten around to running flake8
automatically on GitHub PRs yet, you can run this locally by pip install flake8
and then flake8 --config .flake8 tools/idf_size.py
from the IDF_PATH directory.
if section is not None and m is not None: | ||
sym_backup = m.group("sym_name") | ||
if not RE_PRE_FILTER.match(line): | ||
# line does not match our quick check, so skip to next line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of total curiosity, what's the performance difference from leaving out the pre-filter step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tl;dr without the pre-filter it goes from 375 ms to 561 ms
So i started looking into optimizing it with various small things here and there before I narrowed down the issue was with that specific regex (RE_SOURCE_FILE
). I ended up getting fairly good gains, but not in the ballpark of what I have now. It wasn't until I discovered one very small thing that gave very high gains: there was an extra un-needed wildcard after the symbol name.
|
V
RE_SOURCE_LINE = r"\s*(?P<sym_name>\S*).* +0x(?P<address> ....
So by combining the cmake and source file regexes into 1 it cut the time down in almost half, but I just checked and with this small regex tweak and nothing else changing, even leaving the cmake as a separate regex (with the fix), brings it down to ~1.4 seconds.
Of course by the time i discovered this I had already made a lot of the other small tweaks whose performance gains became a lot less significant.
I played around with it a bit, commenting out fixes, and wrote a bash script to run it 10 times and print the average run. The results are kinda interesting to see. The first row is the code as it is now, the following rows an ⨯
represents a fix/tweak that I undid for bench-marking. As you can see without the regex fix the pre-filter and early exit made a much larger difference than they do once you include the regex fix.
Pre-Filter | Early Exit | Fixed Regex | Avg Millis |
---|---|---|---|
✓ | ✓ | ✓ | 375 |
⨯ | ✓ | ✓ | 561 |
✓ | ⨯ | ✓ | 505 |
⨯ | ⨯ | ✓ | 675 |
✓ | ✓ | ⨯ | 1291 |
⨯ | ✓ | ⨯ | 4672 |
✓ | ⨯ | ⨯ | 6023 |
⨯ | ⨯ | ⨯ | 9294 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the very comprehensive writeup!
@ajcasagrande Thanks a lot for this PR. I think it even improves readability of the code. As @projectgus said, please test with test.sh in tools/test_idf_size/ to verify that it works with the test .map file. |
Also fixes flake8 warning because the line was too long
Thanks for your review, guys. I appreciate it.
anthony@linux:~/esp/esp-idf$ flake8 --config .flake8 tools/idf_size.py
anthony@linux:~/esp/esp-idf$
Name Stmts Miss Cover
------------------------------------------------------------------------------
/home/anthony/esp/esp-idf/tools/idf_size.py 193 2 99%
test_idf_size.py 17 1 94%
------------------------------------------------------------------------------
TOTAL 210 3 99% |
@ajcasagrande Thanks for that. I've pushed this into our internal review & merge queue, PR will be updated once it's merged to master. |
Merged in 874cfda, thanks @ajcasagrande! |
I really like the information available from
idf_size.py
, however myfirmware.map
file is15 MB
and it takes a whopping 20 seconds to run!After investigating I discovered that a majority of the time is spent running the regexes, especially
RE_SOURCE_LINE
. I tweaked that one and combined it with the cmake archive-less alternative. I also added a bunch of other small tweaks such as pre-filtering the input lines and exiting when we reach the next section in the file.If you have ever run
idf.py size
oridf.py size-components
on a fairly large project you will most likely not even believe how fast my version runs.Input file size
anthony@linux:~/esp/esp-idf/tools$ du -h /tmp/firmware.map 15M /tmp/firmware.map
Current code in master branch
My version
Results