Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(profile): Introduce a common sample format #1462

Merged
merged 23 commits into from Sep 27, 2022

Conversation

phacops
Copy link
Contributor

@phacops phacops commented Sep 12, 2022

We have several formats for each platform based around sampling. This format is meant to consolidate all of them into one common format, the Sample format.

Compared to current formats, this one lets us deduplicate the stacks of frames to reduce the size of the profile and standardize common metadata for all platforms and in returns, in the rest of the pipeline, the logic to transform into calltrees will be adapted for this format only instead of having to be adapted for each platform for which it is missing (rust, python and node so far). Some of the fields were added or renamed to match what is in the event format as well (for example timestamp or release).

There are still some fields specific to cocoa. It's not great since it's meant to be a more generic format but we still need some things that exists only for cocoa. Not sure if we can do so much better here. I toyed with the idea of adding a generic tags field so specific metadata but we'd still have to manually validate it since that metadata is mandatory in our data model right now. I think once we'll adapt our data model to be more generic, we'd be able to update this format as well.

@phacops phacops requested a review from a team as a code owner September 12, 2022 13:26
@phacops phacops requested a review from a team September 12, 2022 13:26
@JonasBa
Copy link
Member

JonasBa commented Sep 12, 2022

One worry about indexing stacks that I have is that for some environments (node for example), we do not have control over sample scheduling meaning we only get access to the entire profile once the profile stop method has been called at which point creating an index of stacks may add a non negligible amount of CPU and memory overhead. In any case, I think we need to benchmark this format to see how it impacts our overhead.

As an alternative, could we maybe think about supporting both indexed and non indexed stacks? It could be a progressive enhancement where we just distinguish between samples that are int[][] vs int[] where int[][] is the non-indexed version and int[] is the indexed stacks version.

@phacops
Copy link
Contributor Author

phacops commented Sep 12, 2022

One worry about indexing stacks that I have is that for some environments (node for example), we do not have control over sample scheduling meaning we only get access to the entire profile once the profile stop method has been called at which point creating an index of stacks may add a non negligible amount of CPU and memory overhead. In any case, I think we need to benchmark this format to see how it impacts our overhead.

You don't need to use this feature if you don't want to. Push each stack as a unique one and give it the latest ID.

@JonasBa
Copy link
Member

JonasBa commented Sep 13, 2022

One worry about indexing stacks that I have is that for some environments (node for example), we do not have control over sample scheduling meaning we only get access to the entire profile once the profile stop method has been called at which point creating an index of stacks may add a non negligible amount of CPU and memory overhead. In any case, I think we need to benchmark this format to see how it impacts our overhead.

You don't need to use this feature if you don't want to. Push each stack as a unique one and give it the latest ID.

@phacops does that mean we no longer index frames thought? That could be a pretty big hit to the profile size

Comment on lines 153 to 156
match payload {
Ok(payload) => Ok(vec![payload]),
Err(err) => Err(err),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match payload {
Ok(payload) => Ok(vec![payload]),
Err(err) => Err(err),
}
Ok(vec![payload?])

This might be simpler and there is no need for another match here

fn remove_single_samples_per_thread(&mut self) {
let mut sample_count_by_thread_id: HashMap<u64, u32> = HashMap::new();

for sample in self.profile.samples.iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for sample in self.profile.samples.iter() {
for sample in self.profile.samples {

you shouldn't have to use .iter() explicitly here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it but I needed to borrow.

@phacops
Copy link
Contributor Author

phacops commented Sep 13, 2022

@phacops does that mean we no longer index frames thought? That could be a pretty big hit to the profile size

It wasn't there in the first commit but I added it in a different commit. It would be a shame not to have it.

Err(payload) => Err(payload),
let profile: MinimalProfile = minimal_profile_from_json(payload)?;
match profile.version {
Some(_) => expand_sample_profile(payload),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would match on v1 explicitly here. That would make the intent clearer, and if there's ever a v2, an old Relay instance would not try to expand it as a sample profile.

let mut items: Vec<Vec<u8>> = Vec::new();

for transaction in &profile.transactions {
let mut new_profile = profile.clone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of splitting each transaction into a separate profile? A doc comment would help on this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment, let me know if it's clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you 👍

let mut new_profile = profile.clone();

new_profile.transactions.clear();
new_profile.transactions.push(transaction.clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to consume profile.transactions and prevent the clone() here.

Copy link
Contributor Author

@phacops phacops Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to do this. I need to be able to copy the profile but Profile doesn't implement Copy since I have a bunch of Vec and String in the struct. So I explicitely call clone. Then, this is what the compiler suggests:

error[E0382]: borrow of partially moved value: `profile`
   --> relay-profiling/src/sample.rs:205:31
    |
204 |     for transaction in profile.transactions {
    |                        -------------------- `profile.transactions` partially moved due to this implicit call to `.into_iter()`
205 |         let mut new_profile = profile.clone();
    |                               ^^^^^^^^^^^^^^^ value borrowed here after partial move
    |
note: this function takes ownership of the receiver `self`, which moves `profile.transactions`
    = note: partial move occurs because `profile.transactions` has type `Vec<transaction_metadata::TransactionMetadata>`, which does not implement the `Copy` trait
help: consider iterating over a slice of the `Vec<transaction_metadata::TransactionMetadata>`'s content to avoid moving into the `for` loop
    |
204 |     for transaction in &profile.transactions {

And from there, it also makes me clone the transaction. What would you do differently?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed trickier than I thought, but what should work is

let mut profile = parse_profile(payload)?;

// [...]

// Replace profile.transactions with an empty vector
// (side benefit: makes calling .clear() unnecessary).
let transactions = std::mem::take(&mut profile.transactions);
for transaction in transactions {
    // [...]
    new_profile.profile.samples.retain_mut(|sample| {
        // [...]
    });

    new_profile.transactions.push(transaction);

return Err(ProfileError::NoTransactionAssociated);
}

profile.profile.frames.retain(|frame| frame.valid());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could frames be empty after this? If so, should an error be returned in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, since frames are indexed (and stacks as well), I can't remove any of them, valid or invalid. I'll remove this.

@@ -30,6 +30,10 @@ impl TransactionMetadata {
&& self.relative_start_ns < self.relative_end_ns
&& self.relative_cpu_start_ms <= self.relative_cpu_end_ms
}

pub fn duration_ns(&self) -> u64 {
self.relative_end_ns - self.relative_start_ns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could panic if end < start, you might want to use saturating_sub.

let mut new_profile = profile.clone();

new_profile.transactions.clear();
new_profile.transactions.push(transaction.clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed trickier than I thought, but what should work is

let mut profile = parse_profile(payload)?;

// [...]

// Replace profile.transactions with an empty vector
// (side benefit: makes calling .clear() unnecessary).
let transactions = std::mem::take(&mut profile.transactions);
for transaction in transactions {
    // [...]
    new_profile.profile.samples.retain_mut(|sample| {
        // [...]
    });

    new_profile.transactions.push(transaction);

let mut items: Vec<Vec<u8>> = Vec::new();

for transaction in &profile.transactions {
let mut new_profile = profile.clone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you 👍

deserialize_with = "deserialize_number_from_string",
skip_serializing_if = "is_zero"
)]
pub thread_id: u64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phacops could we call thread_id active_thread_id instead?

This way it's easier to figure out what it refers to.

@phacops phacops requested review from viglia and a team September 15, 2022 20:44
Copy link
Contributor

@viglia viglia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phacops I saw, that when you replaced is_active in thread_metadata with active_thread_id in transaction_metadata you've also removed is_main.

Is it intentional not to have that field anymore?

@phacops
Copy link
Contributor Author

phacops commented Sep 19, 2022

Yes, it's intentional to not have is_main.

The objective with is_main was to indicate which thread to be shown in the UI and since on mobile platforms, the main thread is the thread we want to focus on, we named it that way.

Once we live the mobile SDKs world, main thread and active thread become different. At the begining, adding a second metadata field to indicate the active thread seemed like a natural thing to do but it turns out we are not interested in the main thread, we're interested into the active thread. So is_main can disappear.

Copy link
Member

@Zylphrex Zylphrex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small questions, but otherwise LGTM!

&& self.relative_cpu_start_ms <= self.relative_cpu_end_ms
&& self.relative_start_ns < self.relative_end_ns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does relative_start_ns use < but relative_cpu_start_ms use <=? Are 0 duration profiles allowed when using relative_cpu_start_ms (this is android only right?)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to avoid considering the profile invalid when we don't have a value (fields will be defaulted to 0). Ideally we'd do a check based on the platform but we don't have access to that. We might also have to align both to check for <= because we might have CPU time only anyway.

@@ -1036,6 +1036,7 @@ impl EnvelopeProcessorService {
match relay_profiling::expand_profile(&item.payload()[..]) {
Ok(payloads) => new_profiles.extend(payloads),
Err(err) => {
relay_log::debug!("invalid profile: {:#?}", err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we able to see these logs in prod somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's local only.

Comment on lines +16 to +23
#[serde(default, skip_serializing_if = "Option::is_none")]
instruction_addr: Option<Addr>,
#[serde(default, skip_serializing_if = "Option::is_none")]
name: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
line: Option<u32>,
#[serde(default, skip_serializing_if = "Option::is_none")]
file: Option<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all the attributes are optional, is it possible the SDK sends an empty dict as the frame? Should we invalidate this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible yes. Since we do frame indexing, I wouldn't be really able to remove the frame if it's invalid. We could reject the whole profile after checking if frames referenced in a sample are all valid or not though.

"stack_id": 0,
"thread_id": "1",
"queue_address": "0x0000000102adc700",
"relative_timestamp_ns": "10500500"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phacops since this is not really a timestamp, should we call it elapsed_ns instead to be more accurate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call it elapsed_since_start_ns, makes it very clear that way. @viglia @armcknight

@phacops phacops enabled auto-merge (squash) September 27, 2022 19:21
@phacops phacops merged commit 4894c1d into master Sep 27, 2022
@phacops phacops deleted the pierre/profiling-sample-format branch September 27, 2022 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants