Skip to content

Move timestamp monotonicity check to DataStreamBuffer::Head() to accommodate #1732#1761

Merged
vihangm merged 3 commits intopixie-io:mainfrom
benkilimnik:flaky-montonicity
Nov 8, 2023
Merged

Move timestamp monotonicity check to DataStreamBuffer::Head() to accommodate #1732#1761
vihangm merged 3 commits intopixie-io:mainfrom
benkilimnik:flaky-montonicity

Conversation

@benkilimnik
Copy link
Copy Markdown
Member

Summary: Preemptively adapts the timestamp monotonicity change introduced in #1733 to the last stitcher api PR #1732, which modifies ParseResult.frame_positions to be an unordered flat_hash_map. This changes the order in which GetTimestamp is called because we are now iterating over an unordered map of streamIDs to positions when matching timestamps with the parsed frames in the event_parser.

Previously, we were always iterating over the frame_position with the oldest timestamp first, meaning that prev_timestamp_ in the datastream buffer was set correctly. With frame_positions being an unordered map, we no longer have this guarantee.

To address this, we move the monotonicity check to the Head() implementation of the datastream buffer and enforce increasing timestamps for the contiguous chunk returned by Head() only.

Type of change: /kind bug

Test Plan: Extended the data stream buffer test + existing targets.

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@benkilimnik benkilimnik changed the title Move timestamp monotonicity check to DataStreamBuffer::Head() to accommodate in #1733 Move timestamp monotonicity check to DataStreamBuffer::Head() to accommodate #1732 Nov 3, 2023
@benkilimnik benkilimnik requested review from ddelnano and etep November 3, 2023 19:44
@benkilimnik benkilimnik marked this pull request as ready for review November 3, 2023 19:44
@@ -152,6 +152,15 @@ TEST_P(DataStreamBufferTest, Timestamp) {
EXPECT_EQ(stream_buffer.Head(), "123456789");
EXPECT_OK_AND_EQ(stream_buffer.GetTimestamp(8),
5); // timestamp is adjusted to previous timestamp + 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the test code we added last time relevant still? Do we need 3 non monotonic Add calls or are the ones you added in this PR enough?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I believe the ones added in this PR are sufficient - I've just updated them.

These tests check that prev_timestamp_ is updated correctly after each call to Head() under these conditions:

  1. The last timestamp was the same
  2. The last timestamp was larger (non-monotonically increasing)

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
}

it = events_.begin();
// end_it stopped at the first non-contiguous event (at the end of current head)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we guaranteed to call MergeContiguousEventsIntoHead for the situations we care about? I noticed that Head only calls this function if IsHeadAndEventsMergeable is true (source).

Copy link
Copy Markdown
Member Author

@benkilimnik benkilimnik Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsHeadAndEventsMergeable is true whenever we have events that line up with the end of the previous head_ such that we can merge at least one event from events_.

Since we only check for monotonicity when merging events into head_, any existing head_ will have monotonic timestamps (stored in head_pos_to_ts_).

Does that answer your question?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, just wanted to double check my understanding of that merge logic 👍

@benkilimnik benkilimnik requested a review from a team November 7, 2023 05:45
@vihangm vihangm merged commit 4dce436 into pixie-io:main Nov 8, 2023
JamesMBartlett pushed a commit that referenced this pull request Nov 9, 2023
)

Summary: Populates a map of streamIDs to deque of frames in
`ParseFramesLoop` instead of `ParseFrames`. This should provide a small
efficiency boost, as we won't have to loop over the frames twice. This
PR relies on #1761 due to the way timestamps are updated using
`ParseResult`.

Related issues: #1375

Type of change: /kind cleanup

Test Plan: Updated parsing tests to use new interface

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
ddelnano pushed a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…commodate pixie-io#1732 (pixie-io#1761)

Summary: Preemptively adapts the timestamp monotonicity change
introduced in pixie-io#1733 to the last stitcher api PR pixie-io#1732, which modifies
`ParseResult.frame_positions` to be an unordered `flat_hash_map`. This
changes the order in which `GetTimestamp` is called because we are now
iterating over an unordered map of streamIDs to positions when matching
timestamps with the parsed frames in the
[event_parser](https://github.com/pixie-io/pixie/blob/e6bfab707f1f4871f4b7b8ed53321ec9e7b5807d/src/stirling/source_connectors/socket_tracer/protocols/common/event_parser.h#L138C29-L138C36).

Previously, we were always iterating over the `frame_position` with the
oldest timestamp first, meaning that `prev_timestamp_` in the datastream
buffer was set correctly. With `frame_positions` being an unordered map,
we no longer have this guarantee.

To address this, we move the monotonicity check to the `Head()`
implementation of the datastream buffer and enforce increasing
timestamps for the contiguous chunk returned by `Head()` only.

Type of change: /kind bug

Test Plan: Extended the data stream buffer test + existing targets.

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
ddelnano pushed a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…xie-io#1732)

Summary: Populates a map of streamIDs to deque of frames in
`ParseFramesLoop` instead of `ParseFrames`. This should provide a small
efficiency boost, as we won't have to loop over the frames twice. This
PR relies on pixie-io#1761 due to the way timestamps are updated using
`ParseResult`.

Related issues: pixie-io#1375

Type of change: /kind cleanup

Test Plan: Updated parsing tests to use new interface

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
ddelnano pushed a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…commodate pixie-io#1732 (pixie-io#1761)

Summary: Preemptively adapts the timestamp monotonicity change
introduced in pixie-io#1733 to the last stitcher api PR pixie-io#1732, which modifies
`ParseResult.frame_positions` to be an unordered `flat_hash_map`. This
changes the order in which `GetTimestamp` is called because we are now
iterating over an unordered map of streamIDs to positions when matching
timestamps with the parsed frames in the
[event_parser](https://github.com/pixie-io/pixie/blob/e6bfab707f1f4871f4b7b8ed53321ec9e7b5807d/src/stirling/source_connectors/socket_tracer/protocols/common/event_parser.h#L138C29-L138C36).

Previously, we were always iterating over the `frame_position` with the
oldest timestamp first, meaning that `prev_timestamp_` in the datastream
buffer was set correctly. With `frame_positions` being an unordered map,
we no longer have this guarantee.

To address this, we move the monotonicity check to the `Head()`
implementation of the datastream buffer and enforce increasing
timestamps for the contiguous chunk returned by `Head()` only.

Type of change: /kind bug

Test Plan: Extended the data stream buffer test + existing targets.

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
ddelnano pushed a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026
…xie-io#1732)

Summary: Populates a map of streamIDs to deque of frames in
`ParseFramesLoop` instead of `ParseFrames`. This should provide a small
efficiency boost, as we won't have to loop over the frames twice. This
PR relies on pixie-io#1761 due to the way timestamps are updated using
`ParseResult`.

Related issues: pixie-io#1375

Type of change: /kind cleanup

Test Plan: Updated parsing tests to use new interface

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants