pkg/symbol: Position independent executable #2910

Closed
wants to merge 1 commit into from

Contributor

@marselester commented Apr 1, 2023

I am working on a DIY symbolizer and decided to share one of the results: support for PIE. I started with simple cases (see a blog post), such as resolving function names using only the .symtab section; I haven't touched dynamic libraries yet.

PIE (position-independent executable) is the default in gcc on most modern Linux distributions as a security measure: it lets address space layout randomization (ASLR) apply to the executable itself. This means that the executable segment is mapped at a random high memory address such as 0x5646e2188000 instead of the usual fixed address 0x401000 (non-PIE).
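
A minimal sketch of telling the two cases apart, assuming only the ELF header is inspected: a PIE has type ET_DYN, whereas a fixed-address executable has type ET_EXEC. Shared libraries are ET_DYN too, so this check alone can't distinguish them from a PIE. The helper name isPositionIndependent is hypothetical and not part of this PR.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// isPositionIndependent reports whether an ELF header type (e_type) denotes
// an object that may be loaded at an arbitrary base address.
// ET_DYN covers both PIE executables and shared libraries;
// ET_EXEC is an executable linked for a fixed address.
// Hypothetical helper, for illustration only.
func isPositionIndependent(elfType uint16) bool {
	return elf.Type(elfType) == elf.ET_DYN
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pietype <elf-file>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	fmt.Printf("type=%v position independent=%t\n", f.Type, isPositionIndependent(uint16(f.Type)))
}
```

Compiling the same C program with `gcc -no-pie` and with `gcc -pie` and running this on both binaries should show the difference in the header type.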

See the https://github.com/marselester/diy-parca-agent/tree/main/cmd/addr2func tool if you want to inspect an ELF file.

@marselester requested a review from a team as a code owner April 1, 2023 00:51
Member

@kakkoyun left a comment

Thanks for the fantastic PR! 🤘❤️

I'm asking for your patience with this. I want to take some time to review this thoroughly. We have other open PRs that touch these parts.

@marselester
Contributor Author

@kakkoyun thank you!

Member

@kakkoyun left a comment

Thanks a lot for the contribution ❤️

As we discussed offline, these changes need more consideration. We might evolve this into something else.

@marselester
Contributor Author

Thank you for taking the time to walk me through the related code in the Parca Agent!

@marselester
Contributor Author

Here is a summary of address normalization. Please feel free to fill in the gaps.

Parca Agent stores CPU samples in pprof format and uploads them to Parca Server. The format itself allows storing function names, but Agent doesn't symbolize anything and delegates this responsibility to Server. Moreover, unlike pprof tools, Agent doesn't write "raw" sampled addresses into profiles; it normalizes them first. This decision was made to offload work from Server, so that it doesn't need to normalize an address on each PCToLines call.

Normalization makes it possible to look up a function name in a symbol table, because a normalized address falls within the ELF file's executable segment range (where .text resides).

// GetBase determines the base address to subtract from virtual
// address to get symbol table address. For an executable, the base
// is 0. Otherwise, it's a shared library, and the base is the
// address where the mapping starts. The kernel needs special handling.
base, _ := elfexec.GetBase(&ef.FileHeader, ph, f.m.kernelOffset, f.m.start, f.m.limit, f.m.offset)
normalizedAddress := addr - base
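
To make the arithmetic concrete, here is a simplified sketch of the user-space cases only (no kernel handling). It is not the elfexec.GetBase implementation: the types mapping and loadSegment, the function names, and all numeric values other than 0x5646e2188000 are assumptions chosen for illustration.

```go
package main

import "fmt"

// mapping describes where a file region was mapped into the process
// (as seen in /proc/<pid>/maps). Hypothetical type for this sketch.
type mapping struct {
	Start  uint64 // runtime address where the mapping begins
	Offset uint64 // file offset the mapping begins at
}

// loadSegment holds the fields of the executable PT_LOAD program header
// that matter here. Hypothetical type for this sketch.
type loadSegment struct {
	Vaddr uint64 // virtual address recorded in the ELF file
	Off   uint64 // file offset of the segment
}

// baseAddr returns the value to subtract from a sampled address.
// For a fixed-address executable the runtime and file addresses coincide,
// so the base is 0. For a PIE (or shared library) the base is the
// difference between where the segment was mapped and where the ELF file
// says it lives.
func baseAddr(positionIndependent bool, m mapping, seg loadSegment) uint64 {
	if !positionIndependent {
		return 0
	}
	return m.Start - m.Offset + seg.Off - seg.Vaddr
}

// normalize converts a sampled runtime address into a symbol table address.
func normalize(addr uint64, positionIndependent bool, m mapping, seg loadSegment) uint64 {
	return addr - baseAddr(positionIndependent, m, seg)
}

func main() {
	seg := loadSegment{Vaddr: 0x1000, Off: 0x1000}

	// PIE: the executable segment landed at a random high address.
	pie := mapping{Start: 0x5646e2188000, Offset: 0x1000}
	fmt.Printf("PIE:     %#x\n", normalize(0x5646e2188136, true, pie, seg))

	// Non-PIE: the sampled address is already a symbol table address.
	fixed := mapping{Start: 0x401000, Offset: 0x1000}
	fmt.Printf("non-PIE: %#x\n", normalize(0x401136, false, fixed, seg))
}
```

With these assumed values the PIE sample normalizes to 0x1136, while the non-PIE sample stays at 0x401136.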

See related Agent code:

Before an object file (an executable or a shared library) is uploaded, it is stripped of the ELF sections that don't help Server with symbolization. For example, the .text section, where machine instructions live, is deleted, while the .symtab section, which contains the symbol table, is preserved.
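
Since .symtab is what remains, a lookup by normalized address can be sketched with Go's debug/elf package alone. This is an illustration, not how Parca Server does it: the funcSym type and the funcSymbols and lookup helpers are hypothetical, and the demo table in main uses made-up addresses.

```go
package main

import (
	"debug/elf"
	"fmt"
	"sort"
)

// funcSym is a function entry taken from a symbol table.
// Hypothetical type for this sketch.
type funcSym struct {
	Name  string
	Start uint64 // symbol value, i.e. the function's address in the ELF file
	Size  uint64 // function size in bytes
}

// funcSymbols reads .symtab from an opened ELF file and returns its
// function symbols sorted by start address.
func funcSymbols(f *elf.File) ([]funcSym, error) {
	syms, err := f.Symbols()
	if err != nil {
		return nil, err
	}
	var out []funcSym
	for _, s := range syms {
		if elf.ST_TYPE(s.Info) == elf.STT_FUNC && s.Size > 0 {
			out = append(out, funcSym{Name: s.Name, Start: s.Value, Size: s.Size})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Start < out[j].Start })
	return out, nil
}

// lookup finds the function whose [Start, Start+Size) range contains addr.
// syms must be sorted by Start; addr must already be normalized.
func lookup(syms []funcSym, addr uint64) (string, bool) {
	// Index of the first symbol that starts after addr.
	i := sort.Search(len(syms), func(i int) bool { return syms[i].Start > addr })
	if i == 0 {
		return "", false
	}
	s := syms[i-1]
	if addr < s.Start+s.Size {
		return s.Name, true
	}
	return "", false
}

func main() {
	// Made-up table standing in for the result of funcSymbols.
	syms := []funcSym{
		{Name: "helper", Start: 0x1120, Size: 0x16},
		{Name: "main", Start: 0x1136, Size: 0x20},
	}
	name, ok := lookup(syms, 0x1140)
	fmt.Println(name, ok)
}
```

Keeping the table sorted makes each lookup a binary search, which matters when the same table serves many sampled addresses.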

2 participants