Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to Construct 2.10 #477

Closed
wants to merge 1 commit into from

Conversation

SupremeMortal
Copy link

Move away from an internal copy of Construct to the latest version in order to support new DWARFv5 data types.

@eliben
Copy link
Owner

eliben commented Jul 10, 2023

Thanks, this is a very large change, PTAL at the CI test failures... Can you clarify how this is required to support DWARFv5 stuff?

@SupremeMortal
Copy link
Author

Some of the form types mentioned here require UBInt24. Although it would be possible to add these to the existing library, I noticed that it was on the TODO list to use the updated construct library so I thought I'd give it a shot.

Will investigate the CI failures.

@SupremeMortal SupremeMortal force-pushed the construct-2.10 branch 2 times, most recently from d6ea94c to ab6714e Compare July 10, 2023 18:25
@SupremeMortal
Copy link
Author

CI and unit test issues have been resolved.

Copy link
Owner

@eliben eliben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall very nice! Still thinking about this, left some initial comments for now

elftools/elf/sections.py Outdated Show resolved Hide resolved
elftools/elf/sections.py Outdated Show resolved Hide resolved
elftools/elf/segments.py Outdated Show resolved Hide resolved
return 0


class EmbeddableStruct(Struct):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the newer construct versions removed these? I wonder if they offer any alternatives -- writing these custom structs always felt like a kludge

Copy link
Author

@SupremeMortal SupremeMortal Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, there are no replacements as mentioned here construct/construct#1027 (comment)

I did however manage to remove RepeatUntilExcluding and StreamOffset

elftools/dwarf/callframe.py Outdated Show resolved Hide resolved
@eliben
Copy link
Owner

eliben commented Jul 10, 2023

We're long overdue on this :-) One thing that is slightly annoying is that we're adding more special construct utils/parsers instead of removing existing ones!

@sevaa - please take a look as well, when you have some time.

@SupremeMortal did you have a chance to measure performance? I want to make sure we're not regressing here

Move away from an internal copy of Construct to the latest version in order to support DWARF v5 required data types.
@SupremeMortal
Copy link
Author

I have not measured performance properly yet, but on my system, the unit tests now take on average around 9.1 seconds compared to before when it took 8.3 seconds. There are some optimisations that can be made to improve it which I will investigate.

@eliben
Copy link
Owner

eliben commented Jul 11, 2023

I have not measured performance properly yet, but on my system, the unit tests now take on average around 9.1 seconds compared to before when it took 8.3 seconds. There are some optimisations that can be made to improve it which I will investigate.

Running python3 test/all_tests.py takes around half a minute (wallclock time) at least on my machine, so this can provide a bit more runway for comparison.

@sevaa
Copy link
Contributor

sevaa commented Jul 11, 2023

On general grounds, I welcome not forking vendor packages gratutously :) That said, I'll take a closer look.

@eliben
Copy link
Owner

eliben commented Jul 11, 2023

On general grounds, I welcome not forking vendor packages gratutously :) That said, I'll take a closer look.

Hey now, it wasn't gratuitous :-)
Back in the day, construct development was completely stalled and semi-abandoned. They only worked on Python 2, and I had to fork to make pyelftools Python 3 compatible. It took a long while for construct to find a stable maintainer, and IIRC they took the Python 3 stuff back from pyelftools at some point.

That said, now seems like a fine time to relinquish the fork.

@eliben
Copy link
Owner

eliben commented Jul 17, 2023

My view on where things stand now:

  • In general I'm supportive of merging this
  • @SupremeMortal please report benchmarking/timing on something non trivial
  • I'd be very happy if @sevaa could do a code review as well, since he's had more mileage on hacking pyelftools internals recently than me

No rush on any of this, just thought a status update from my end would be useful.

@SupremeMortal
Copy link
Author

I'm unsure if the performance will be good enough. I tested with a 1.5GB ELF file with DWARF v5 debug, and it took 20 seconds with this PR compared to 10 seconds without which is quite a large regression in performance.

I also tried optimising structs with the new lazy-loaded structures in construct 2.10, but it ended up being unusable for most of the structs in this codebase.

@eliben
Copy link
Owner

eliben commented Jul 17, 2023

@SupremeMortal thanks for the update! This is indeed concerning. pyelftools doesn't excel at performance in general, so making it 2x slower for large inputs is a hard pill to swallow.

@arekbulski cc-ing you, given the discussion in #180

@sevaa
Copy link
Contributor

sevaa commented Jul 21, 2023

So a new convention to learn. Doesn't look too terrible. It did seem unnatural to me all along that parsers needed to have a name attached to them just in case the parser is embedded in a struct.

As for compiling, we could start by storing the basic building blocks (trivial DWARF forms, nondynamic headers, etc) in compiled form and see where it takes us.

@eliben
Copy link
Owner

eliben commented Jul 25, 2023

As for compiling, we could start by storing the basic building blocks (trivial DWARF forms, nondynamic headers, etc) in compiled form and see where it takes us.

What do you mean by "compiling" here? Can you elaborate?

@arekbulski
Copy link

Compilation is a feature in Construct.

@sevaa
Copy link
Contributor

sevaa commented Apr 5, 2024

@eliben Are we abandoning this? I'd rather not.
@SupremeMortal Can you please maybe bring this in line with the latest master? Also, any luck with compiling parsers, re: performance?

@eliben
Copy link
Owner

eliben commented Apr 5, 2024

@sevaa I don't have the capacity to spend too much time on this, personally. The 2x performance penalty seems like a problem to me - so it will have to be addressed. WDYT?

@sevaa
Copy link
Contributor

sevaa commented Apr 5, 2024

I have some bandwidth currently, wanted to revisit this. There's been several releases of Construct since, maybe they've done something about performance - or maybe we could apply some tweaks on our side. I don't see @SupremeMortal publishing any changes since the initial commit - if he looked at the performance issues, it was not mature enough for publishing, and anyway, the fork needs to be merged with the recent changes to the master. If @SupremeMortal doesn't respond in a few days, I could fork the fork and sort of take over.

@sevaa
Copy link
Contributor

sevaa commented Apr 5, 2024

FYI, for me, with this PR's fork and the latest construct, the performance difference with the master is not that great - the master is ~2% faster for a firehose parsing of DWARF CU/DIEs in a subset of our corpus. That's two percent, not times two,

@eliben
Copy link
Owner

eliben commented Apr 5, 2024

FYI, for me, with this PR's fork and the latest construct, the performance difference with the master is not that great - the master is ~2% faster for a firehose parsing of DWARF CU/DIEs in a subset of our corpus. That's two percent, not times two,

Interesting. I wonder what benchmark @SupremeMortal is alluding to in #477 (comment)

2% is definitely tolerable!

@sevaa
Copy link
Contributor

sevaa commented Apr 8, 2024

Status update. I have merged the branch with the master. Autotests pass.

I've slapped together a primitive performance test that isolates the parsing - reading all files under testfiles_for_dwarfdump into memory, then parsing them into CUs and DIEs under the timer (the I/O is deliberately not timed). I've added compile() calls to structs init logic here and there - but there are several construct features that we use all the time that are explicitly not compatible with compile(), notably anything with lambdas or function calls, and also Embed() (which is pretty much discontinued in the upstream construct). Notably, the LEB128 parser uses a lambda.

UPDATE: construct has a built-in parser called VarInt which is identical to ULEB128. It was inspired by protobuf's variable length int encoding. Protobufs has a provision for signed ints too, but their encoding ("zigzag") is different. Still good.

The latest result on said performance test is - 3.1 sec vs 2.7 in the current master. The previous test run, the one with a 2% discrepancy, I'm afraid, was marred with debugger interference. That's about a 15% slowdown. I might get more mileage out of compile() if something is done about the compiling of Embed and lambdas.

One rather counterintuitive result I've got - since DIE is the most parsed structure in DWARF (at least in the firehose scenario), I've tried prebuilding a parser for the attribute sequence of an abbreviation and reusing that in the DIE parser, but that only slowed down the benchmark. Even weirder, when I compiled the parser, it slowed down the benchmark even more.

@eliben
Copy link
Owner

eliben commented Apr 14, 2024

Superseded by #548

@eliben eliben closed this Apr 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants