Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Convert stackwalking/cfi loading into a fixpoint iteration #450

Merged
merged 12 commits into from
Jun 8, 2021

Conversation

Swatinem
Copy link
Member

@Swatinem Swatinem commented Jun 2, 2021

Instead of two times, it will now do up to 5 stack walks, trying to load increasing CFI on each iteration.

@Swatinem
Copy link
Member Author

Swatinem commented Jun 2, 2021

Still to do:
The ModuleListBuilder is now run in every iteration, so it will over-report missing modules. I think I can move that part out of the loop eventually.

@Swatinem
Copy link
Member Author

Swatinem commented Jun 4, 2021

While updating the snapshots, I noticed the following:

The CFI status is being changed from unused to missing for some modules, some are more correct, as the module is being referenced in the stack trace, but obviously not being found/loaded.
For some though, the status is being set even though they are not being referenced from a stack trace. Maybe they are being referenced by referenced_modules, but the stack frames referencing that module are being discarded somewhere.

I think in this case, avoiding the false negatives (previously being marked as unused even though they are missing) is a good idea even though it might introduce some false positives (they are now being marked as missing even though they end up not being used).

@Swatinem Swatinem requested review from flub and a team June 4, 2021 10:28
@Swatinem Swatinem marked this pull request as ready for review June 4, 2021 10:28
@Swatinem
Copy link
Member Author

Swatinem commented Jun 4, 2021

False positive:
In the fixture which is used for both snapshots, we do two passes, one for the initial scan, which includes rpcrt4.dll and dbgcore.dll, but the second pass with the added CFI from the main exe does not include those anymore.
So indeed we try to load them, but they are still unused in the final stack walk.

Copy link
Contributor

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need another pass over the logic of how to collect statusses i'm afraid.

The CfiCacheModules went from something that was purely used to build a report post-hoc to something that now is merging that functionality with also providing and adding info during the fixpoint iterations. Which makes things more complicated? Haven't figured out yet if this can be simpler and more separated yet.

crates/symbolicator/src/services/symbolication.rs Outdated Show resolved Hide resolved
Comment on lines 164 to 165
// make sure to hold onto a reference to the CfiCacheFile, to make sure
// it will not be evicted in the middle of using it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it suddenly become this cache's job to keep these files alive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, they were moved into the stackwalking method, and that had an explicit drop at the end to keep the cache alive. This just moves this around a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe move this to the doc comment on this entry in the struct definition.


/// Returns a copy of the inner Module Map that can be used for processing.
fn for_processing(&self) -> BTreeMap<CodeModuleId, CfiModule> {
self.inner.clone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloning a BTreeMap sounds expensive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed unfortunate. Previously, the CfiCacheModules (which internally was a vec of tuples) would be serialized and sent via procspawn. Now it is the btreemap, so not too different actually.

// get_referenced_modules nor marked as scanned during
// stackwalking, it can only be unused.
obj_info.unwind_status = Some(ObjectFileStatus::Unused);
obj_info.unwind_status = None;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no longer unused?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind-of piggyback on the None to raising errors about files being missing.

crates/symbolicator/src/services/symbolication.rs Outdated Show resolved Hide resolved
crates/symbolicator/src/services/symbolication.rs Outdated Show resolved Hide resolved
Comment on lines 164 to 165
// make sure to hold onto a reference to the CfiCacheFile, to make sure
// it will not be evicted in the middle of using it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe move this to the doc comment on this entry in the struct definition.

@Swatinem Swatinem merged commit bb5de6d into master Jun 8, 2021
@Swatinem Swatinem deleted the fix/fixpoint-cfi branch June 8, 2021 13:30
@flub
Copy link
Contributor

flub commented Jun 9, 2021

@jan-auer @Swatinem so for some reason the bot never popped up on this PR, but it should have. Is the bot ok? It used to work in the past.

@Swatinem
Copy link
Member Author

Swatinem commented Jun 9, 2021

which bot are we talking about?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants