Skip to content
This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Change format to nested set for the dataframe #215

Merged
merged 7 commits into from Sep 13, 2022

Conversation

aocenas
Copy link
Member

@aocenas aocenas commented Sep 8, 2022

This PR changes the format in which we pack the data into dataFrame and which flamegraph expects it. This is used mainly with the goal of having at leas somehow reasonable and stable dataFrame format for the flamegraph before we move it to Grafana core so that we don't have to do super drastic changes later on.

Before we used a flamebearer format where we had each level encoded as an array of 4 number tuples ([start, value, self, label, start, value, self, label, ....]). This format is easy to render from and is efficient space wise but because the data frame needs for each column the same number of values and each level itself had a variable length we ended up encoding each level in a json string which wasn't very efficient and also would make it harder to create an appropriate query from different datasources.

This changes the format to a nested set format which encodes each item as [level, value, label]. By having the items sorted by depth-first iteration order of the tree we can recreate the parent/child hierarchy just from the level field and order. This means we can pack it into data frame easily as each field has the same number of values and dataFrame.length == number of items/bars in the profile.

At this point, this isn't super efficient still as we get flamebearer format from the API we need to convert to a tree, then to nested set data frame, and then back to flamebearer on the front end but this is just to keep this PR smaller and not change rendering or the Fire API in one go.

@@ -0,0 +1,181 @@
// This component is based on logic from the flamebearer project
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved not sure why github does not recognize that.

return [];
}
const dataView = new DataFrameView<Item>(data);
return nestedSetToLevels(dataView);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting the data frame to levels arrays so we don't have to change the rending much at this moment.

label: string;
};

export function getRectDimensionsForLevel(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mainly moved and split the rendering into functions for better testability. Otherwise, the logic itself should be the same as it still renders from levels arrays, just the items are now objects instead of just numbers.

@cyriltovena
Copy link
Collaborator

If you need a tree instead we can may be change the api btw.

Tree usually are harder on the frontend and less compressible

const ITEM_OFFSET = 4

type ProfileTree struct {
Start int64
Copy link

@leeoniya leeoniya Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is here to reconstruct the flamebearer struct (for now), but will eventually get dropped once we swap out renderers?

Copy link
Member Author

@aocenas aocenas Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes did not think about direct transform from flamebearer to nested set format so needed the intermediate tree structure but hope to get rid of it eventually. Removing this needs change in the Fire API rather than change on the frontend though.

@leeoniya
Copy link

leeoniya commented Sep 8, 2022

good [intermediate] step, i think.

likely all the levels structs code will evaporate once we go straight from tree -> nested set -> render, instead of levels -> tree -> nested set -> levels -> render.

@aocenas
Copy link
Member Author

aocenas commented Sep 9, 2022

@cyriltovena yeah we should probably take a look at the API response whether it make sense to change it.

The main problem with the flamebearer is that we have to do JSON marshal and unmarshal of each level once on backend once on fronted because otherwise we cannot put it in the dataframe. The nested set format fits nicely while still only needing [level, value, label] per item, and the render still needs just a single pass over all the items in the dataframe so that should not be worse either.

Main point of doing this now though isn't to get any performance gains but to get to some reasonable dataFrame format so if we move it to core and somebody starts to play with it the format is reasonably close to what we want in the end and they can give feedback on it.

@cyriltovena
Copy link
Collaborator

No problem I'm open to revisit the API, I think we will anyway at some point.

@aocenas aocenas marked this pull request as ready for review September 9, 2022 14:14
@leeoniya
Copy link

leeoniya commented Sep 9, 2022

to get to some reasonable dataFrame format so if we move it to core and somebody starts to play with it the format is reasonably close to what we want in the end and they can give feedback on it.

and also to work out a generic way of passing tree structures through dataframes, which will be useful beyond just the fire/profiling case.

@aocenas aocenas mentioned this pull request Sep 12, 2022
8 tasks
} from '../../constants';
import { ItemWithStart } from './dataTransform';

type RectData = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been calling them bars and I've heard blocks mentioned a few times too. Now we're going to call them Rect? We should stick to one name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rect is mainly to differentiate them as a rectangle which has an x/y position because it's called like that when you render. Before when it's just a profiling data I tried to keep with the bar naming

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so bar is what you get from the API, rect is a rectangle data for rendering

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but good question about the naming. @cyriltovena is there a common name for a single bar in the flamegraph?

@aocenas aocenas merged commit ed195eb into main Sep 13, 2022
@aocenas aocenas deleted the aocenas/change-dataframe-format branch September 13, 2022 08:38
simonswine pushed a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023
Change format to nested set for the dataframe
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants