Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(objects): Make sure returned files can be used for purpose #279

Merged
merged 11 commits into from
Nov 12, 2020

Conversation

flub
Copy link
Contributor

@flub flub commented Oct 29, 2020

When the objects actor downloads files it ranks them and chooses the
highest ranking file. However before returning we must make sure the
file can actually be used for the requested purpose.

In particular this avoids asking for CFI but only getting a (64-bit)
PDB but no PE file. This file does not actually provide the requested
info, which then confuses the CfiActor because it tries to use it as a
CFI file, even though the file does not have CFI.

Floris Bruynooghe added 4 commits October 29, 2020 18:36
When the objects actor downloads files it ranks them and chooses the
highest ranking file.  However before returning we must make sure the
file can actually be used for the requested purpose.

In particular this avoids asking for CFI but only getting a (64-bit)
PDB but no PE file.  This file does not actually provide the requested
info, which then confuses the CfiActor because it tries to use it as a
CFI file, even though the file does not have CFI.
If there's an archive, but the file inside is malformed it was shown
as simply missing.  This fixes this to still give a malformed marker.
object_meta.features gets filled in with Default::default values for
other cache statuses.  So we only want to filter those for which we
actually downloaded but do not match the requirements.
Comment on lines +215 to +249
let future = request
.compute(temp_file.path())
.and_then(move |status: CacheStatus| {
if let Some(ref cache_path) = cache_path {
sentry::configure_scope(|scope| {
scope.set_extra(
&format!("cache.{}.cache_path", name),
cache_path.to_string_lossy().into(),
);
});

log::trace!("Creating {} at path {:?}", name, cache_path);
}

log::trace!("Creating {} at path {:?}", name, cache_path);
}
let byteview = ByteView::open(temp_file.path())?;

let byteview = ByteView::open(temp_file.path())?;
metric!(
counter(&format!("caches.{}.file.write", name)) += 1,
"status" => status.as_ref(),
);
metric!(
time_raw(&format!("caches.{}.file.size", name)) = byteview.len() as u64,
"hit" => "false"
);

metric!(
counter(&format!("caches.{}.file.write", name)) += 1,
"status" => status.as_ref(),
);
metric!(
time_raw(&format!("caches.{}.file.size", name)) = byteview.len() as u64,
"hit" => "false"
);
let path = match cache_path {
Some(ref cache_path) => {
status.persist_item(cache_path, temp_file)?;
CachePath::Cached(cache_path.to_path_buf())
}
None => CachePath::Temp(temp_file.into_temp_path()),
};

let path = match cache_path {
Some(ref cache_path) => {
status.persist_item(cache_path, temp_file)?;
CachePath::Cached(cache_path.to_path_buf())
}
None => CachePath::Temp(temp_file.into_temp_path()),
};

Ok(request.load(key.scope.clone(), status, byteview, path))
});
Ok(request.load(key.scope.clone(), status, byteview, path))
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code didn't change, I only added the type signature to the |status: CacheStatus| closure and the rest is rust-fmt's doing.

Comment on lines +618 to +635
.filter(|response| match response {
// Make sure we only return objects which provide the requested info
Ok(object_meta) => {
if object_meta.status == CacheStatus::Positive {
// object_meta.features is meaningless when CacheStatus != Positive
match purpose {
ObjectPurpose::Unwind => object_meta.features.has_unwind_info,
ObjectPurpose::Debug => {
object_meta.features.has_debug_info
|| object_meta.features.has_symbols
}
ObjectPurpose::Source => object_meta.features.has_sources,
}
} else {
true
}
}
Err(_) => true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This .filter() is the actual bugfix. The the minidump processor assumes that when it finds a file in the CFI cache for the module it has a file with valid cfi info. In turn the file gets into the cfi cache because the cfi actor assumes that the object actor returns a file fit for purpose, and it asked for a CFI file. However the object actor's ranking mechanism didn't respect that. Adding this makes it respect this.

Comment on lines 298 to 304
None => {
if archive.objects().all(|r| r.is_err()) {
return Ok(CacheStatus::Malformed);
} else {
return Ok(CacheStatus::Negative);
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a bug where if an archive contained only malformed objects it would show up as missing rather than malformed. Arguably this isn't aggressive enough and should return malformed if the object was not found and any member of the archive failed to be parsed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be good to write in a code comment

@flub flub marked this pull request as ready for review November 3, 2020 15:22
@flub flub requested a review from a team November 3, 2020 15:22
src/actors/common/cache.rs Outdated Show resolved Hide resolved
});
let future = request
.compute(temp_file.path())
.and_then(move |status: CacheStatus| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"hide whitespace changes" is your friend ;-) I wonder why you need the explicit type declaration here though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is entirely a subjective readability choice. I'd say this is no different than spelling out the types in functions, and this is a large enough function...

Comment on lines 298 to 304
None => {
if archive.objects().all(|r| r.is_err()) {
return Ok(CacheStatus::Malformed);
} else {
return Ok(CacheStatus::Negative);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be good to write in a code comment

src/actors/objects/mod.rs Outdated Show resolved Hide resolved
src/actors/objects/mod.rs Outdated Show resolved Hide resolved
true
}
}
Err(_) => true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just out of curiosity: the new filter now throws away things we can’t use, but why does it pass through errors and negative cache statuses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because all those could have been object files which might have contained the info we needed. but we don't know until we can actually fetch it. and passing them through allows us to report on their status later (download error, malformed, negative cache). this whole behaviour of only returning one result here a bit questionable and may have to change to improve reporting

Floris Bruynooghe and others added 3 commits November 4, 2020 11:14
Co-authored-by: Arpad Borsos <arpad.borsos@sentry.io>
Co-authored-by: Arpad Borsos <arpad.borsos@sentry.io>
Only just realised there's also a "batch" button.  oh well

Co-authored-by: Arpad Borsos <arpad.borsos@sentry.io>
Err(e) => {
log::debug!("Failed to download: {:#?}", e);
return (3, *i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc we did this for stable min such that ties are resolved by order of sources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The i here entirely depends on the order of the futures being executed above above calling download_svc.list_files(). Practically this means there is no order, it's a race on who gets executed by the executor first (for non-sentry sources) or how fast the network is (for the sentry source).

At least that's my understanding and the reason I took it out. I don't think it enforces anything and only makes the code look more complicated.

Copy link
Member

@untitaker untitaker Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join_all preserves order of futures. list_files can return any order it wants, but joining all list_files calls together resembles the source order.

In theory this gives you the possibility to upload your own Windows kernel symbols and let them have precedence over our own built-in symbol source. I am not sure if anybody relies on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely behavior we should retain.

}
})
.filter(|response| match response {
// Make sure we only return objects which provide the requested info
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we dedupe this with the content in min_by_key? iiuc all scores of 2 should basically be filtered out.

Perhaps the closure of min_by_key should go into a filter_map before min_by happens. Or we refactor this entire thing into a regular for-loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for complaining about this. I've looked at all the consumers again and concluded the filtering needs to be moved to before the min_by_key in order to still return a possible Err if we filtered out an object file.

I'm not sure I would like to merge the two, I think having the two as separate operations is easier to understand as each have a defined scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah no worries was just an idea.

Floris Bruynooghe added 3 commits November 11, 2020 17:25
If we didnt find the object in an archive but it was meant to be
there while at the same time there was a malformed file, it is safer
to assume that the malformed file was the one we needed rather than
treat it as missing.
Because the whole find function returns an
Option<Result<ObjectFileMeta, _>> we should fiter before we find the
most suitable object.  This way when we filtered out an object the
selection could still return a relevant Err.  Filtering after does not
allow this anymore.
@flub flub requested a review from untitaker November 12, 2020 11:42
@flub flub merged commit 3e4e534 into master Nov 12, 2020
@flub flub deleted the fix/bad-unwind-status branch November 12, 2020 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants