Major performance improvements to idf_size.py (IDFGH-2404) #4518

Closed
wants to merge 2 commits into from

ajcasagrande
Contributor

I really like the information available from idf_size.py. However, my firmware.map file is 15 MB and the script takes a whopping 20 seconds to run!

After investigating, I discovered that the majority of the time is spent running the regexes, especially RE_SOURCE_LINE. I tweaked that one and combined it with the CMake archive-less alternative. I also added several smaller tweaks, such as pre-filtering the input lines and exiting when we reach the next section in the file.
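
To illustrate the pre-filter and early-exit ideas described above, here is a minimal Python sketch. It is not the code from this PR: the two patterns, the `END_MARKER` string, the `parse_sizes` helper, and the sample lines are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only; they are not the ones in idf_size.py.
# Cheap check: a candidate line is whitespace followed by a dot-prefixed name.
RE_PRE_FILTER = re.compile(r"\s*\.\S")
# Full (more expensive) pattern: name, hex address, hex size, then the source file.
RE_FULL_LINE = re.compile(
    r"\s*(?P<sym_name>\S+) +0x(?P<address>[\da-f]+) +0x(?P<size>[\da-f]+) +(?P<file>.+)"
)
# Assumed marker for the start of the part of the file we do not need.
END_MARKER = "Cross Reference Table"


def parse_sizes(lines):
    """Return a list of (sym_name, size, file) tuples from map-like lines."""
    results = []
    for line in lines:
        if line.startswith(END_MARKER):
            break  # early exit: nothing of interest follows
        if not RE_PRE_FILTER.match(line):
            continue  # pre-filter: skip without running the full regex
        m = RE_FULL_LINE.match(line)
        if m:
            results.append((m.group("sym_name"), int(m.group("size"), 16), m.group("file")))
    return results


# Synthetic sample lines, not taken from a real map file.
sample = [
    "Linker script and memory map",
    " .text.foo      0x40080000       0x1c libfoo.a(foo.o)",
    "                0x40080000                foo",
    " .data.bar      0x3ffb0000        0x8 libbar.a(bar.o)",
    "Cross Reference Table",
    " .text.ignored  0x40081000       0x10 libfoo.a(late.o)",
]
print(parse_sizes(sample))
```

The cheap check rejects most lines before the larger pattern runs, and the `break` avoids scanning the tail of the file at all.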

If you have ever run idf.py size or idf.py size-components on a fairly large project you will most likely not even believe how fast my version runs.

Input file size

anthony@linux:~/esp/esp-idf/tools$ du -h /tmp/firmware.map 
15M	/tmp/firmware.map

Current code in master branch

anthony@linux:~/esp/esp-idf/tools$ time ~/esp/esp-idf-master/tools/idf_size.py /tmp/firmware.map
Total sizes:
 DRAM .data size:   14132 bytes
 DRAM .bss  size:   35128 bytes
Used static DRAM:   49260 bytes (  75320 available, 39.5% used)
Used static IRAM:  108605 bytes (  22467 available, 82.9% used)
      Flash code:  923103 bytes
    Flash rodata:  300984 bytes
Total image size:~1346824 bytes (.bin may be padded larger)

real	0m19.440s
user	0m19.395s
sys	0m0.044s

My version

anthony@linux:~/esp/esp-idf/tools$ time ./idf_size.py /tmp/firmware.map
Total sizes:
 DRAM .data size:   14132 bytes
 DRAM .bss  size:   35128 bytes
Used static DRAM:   49260 bytes (  75320 available, 39.5% used)
Used static IRAM:  108605 bytes (  22467 available, 82.9% used)
      Flash code:  923103 bytes
    Flash rodata:  300984 bytes
Total image size:~1346824 bytes (.bin may be padded larger)

real	0m0.385s
user	0m0.361s
sys	0m0.024s

Results

From 19.4 seconds down to 0.4 seconds

@claassistantio

claassistantio commented Dec 19, 2019

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot changed the title from "Major performance improvements to idf_size.py" to "Major performance improvements to idf_size.py (IDFGH-2404)" on Dec 19, 2019
@Alvin1Zhang
Collaborator

@ajcasagrande Thanks for the contribution.

Contributor

@projectgus projectgus left a comment

This looks really good, @ajcasagrande . And an unbelievably good result, a well justified optimization pass!!

We may be able to look into running this as part of the standard app build, which would be nice.

I think I remember in an early version of idf_size.py thinking about optimisation and deciding it was premature, but clearly not the case now! (At least not after it's gotten more complex.)

Just two requests before we progress the merge process:

There's a test script tools/test_idf_size/test.sh that we run in our internal CI, could you please check it still passes with this version? It's plausible that some "failures" may just be due to changed but still legitimate output, in which case updating the test itself is an acceptable fix.

Also, we haven't gotten around to running flake8 automatically on GitHub PRs yet. You can run it locally: pip install flake8, then flake8 --config .flake8 tools/idf_size.py from the IDF_PATH directory.

if section is not None and m is not None:
sym_backup = m.group("sym_name")
if not RE_PRE_FILTER.match(line):
# line does not match our quick check, so skip to next line
Contributor

Just out of total curiosity, what's the performance difference from leaving out the pre-filter step?

Contributor Author

tl;dr without the pre-filter it goes from 375 ms to 561 ms

I started out optimizing various small things here and there before I narrowed the issue down to that specific regex (RE_SOURCE_LINE). I ended up getting fairly good gains, but not in the ballpark of what I have now. It wasn't until I discovered one very small thing that I got very high gains: there was an extra, unneeded wildcard after the symbol name.

                                       |
                                       V
RE_SOURCE_LINE = r"\s*(?P<sym_name>\S*).* +0x(?P<address> ....

Combining the cmake and source file regexes into one cut the time almost in half. However, I just checked, and this small regex tweak alone, with nothing else changed and the cmake regex left separate (with the same fix), brings it down to ~1.4 seconds.
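
The cost of the extra wildcard can be reproduced in isolation. The sketch below is only an illustration: the quoted regex is truncated in the comment above, so the part after the address group is an assumption, and both test lines are synthetic. A greedy `.*` first consumes the rest of the line and then backtracks position by position, which is most expensive on lines that never match.

```python
import re
import timeit

# Simplified stand-ins for the truncated regex quoted above; everything after
# the address group is an assumption made for this sketch.
WITH_WILDCARD = re.compile(
    r"\s*(?P<sym_name>\S*).* +0x(?P<address>[\da-f]+) +0x(?P<size>[\da-f]+)"
)
WITHOUT_WILDCARD = re.compile(
    r"\s*(?P<sym_name>\S*) +0x(?P<address>[\da-f]+) +0x(?P<size>[\da-f]+)"
)

# One matching line and one long non-matching line (both synthetic).
hit = " .text.foo      0x40080000       0x1c libfoo.a(foo.o)"
miss = " " + "some words without any hex fields " * 20

for name, pattern in (("with .*", WITH_WILDCARD), ("without .*", WITHOUT_WILDCARD)):
    # Time both the matching and the non-matching case together.
    seconds = timeit.timeit(lambda: (pattern.match(hit), pattern.match(miss)), number=2000)
    print(f"{name:>10}: {seconds:.3f} s")
```

Both patterns extract the same groups from the matching line; only the time spent rejecting the non-matching line differs.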

Of course, by the time I discovered this I had already made a lot of the other small tweaks, whose performance gains then became much less significant.

I played around with it a bit, commenting out fixes, and wrote a bash script to run it 10 times and print the average run time. The results are kind of interesting to see. The first row is the code as it is now; each following row represents a fix/tweak that I undid for benchmarking. As you can see, without the regex fix the pre-filter and early exit made a much larger difference than they do once you include the regex fix.

Pre-Filter Early Exit Fixed Regex Avg Millis
375
561
505
675
1291
4672
6023
9294
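
The measurements above came from a bash script that is not shown in this thread. As a rough equivalent, the same "run 10 times and average" approach might look like this in Python; the `average_millis` helper and the stand-in command are hypothetical.

```python
import subprocess
import sys
import time


def average_millis(command, runs=10):
    """Run `command` `runs` times and return the mean wall-clock time in milliseconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so printing does not dominate the timing.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        total += time.perf_counter() - start
    return total / runs * 1000.0


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in something like
    # [sys.executable, "tools/idf_size.py", "/tmp/firmware.map"] to time the script.
    demo = [sys.executable, "-c", "pass"]
    print(f"{average_millis(demo, runs=3):.0f} ms")
```

Wall-clock averages include interpreter start-up, so they are comparable to the `real` figures reported by `time` rather than to in-process timings.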

Contributor

Thanks for the very comprehensive writeup!

@0xjakob
Collaborator

0xjakob commented Dec 20, 2019

@ajcasagrande Thanks a lot for this PR. I think it even improves readability of the code. As @projectgus said, please test with test.sh in tools/test_idf_size/ to verify that it works with the test .map file.

Also fixes flake8 warning because the line was too long
@ajcasagrande
Contributor Author

ajcasagrande commented Dec 21, 2019

Thanks for your review, guys. I appreciate it.

flake8 had one warning: the RE_SOURCE_FILE line was 163 chars (max 160). I was able to fix this by removing the re.M flag, which I believe is superfluous since we are reading the file one line at a time.
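
As a small check of the reasoning about re.M: in Python's `re` module the flag only changes where `^` and `$` may match, so it should make no difference when each line is matched on its own with `match()`. The pattern below is a hypothetical anchored example, not the one from idf_size.py.

```python
import re

# Hypothetical anchored pattern, used only to show what re.M changes.
PATTERN = r"^\s*(?P<sym_name>\S+) +0x(?P<size>[\da-f]+)$"
plain = re.compile(PATTERN)
multiline = re.compile(PATTERN, re.M)

two_lines = "header\n .text.foo 0x1c\n"

# On a multi-line string the flag matters: ^ and $ also match at line boundaries.
print(plain.search(two_lines))      # None
print(multiline.search(two_lines))  # a match on the second line

# On single lines, as produced when reading text line by line, both agree.
for line in two_lines.splitlines(keepends=True):
    assert bool(plain.match(line)) == bool(multiline.match(line))
```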

flake8 is now passing.

anthony@linux:~/esp/esp-idf$ flake8 --config .flake8 tools/idf_size.py
anthony@linux:~/esp/esp-idf$ 

test.sh passes:

Name                                                       Stmts   Miss  Cover
------------------------------------------------------------------------------
/home/anthony/esp/esp-idf/tools/idf_size.py                  193      2    99%
test_idf_size.py                                              17      1    94%
------------------------------------------------------------------------------
TOTAL                                                        210      3    99%

@projectgus
Contributor

@ajcasagrande Thanks for that. I've pushed this into our internal review & merge queue, PR will be updated once it's merged to master.

espressif-bot pushed a commit that referenced this pull request Jan 6, 2020
@igrr
Member

igrr commented Jan 13, 2020

Merged in 874cfda, thanks @ajcasagrande!

6 participants