Skip to content

TWCC based congestion control - v0#3165

Merged
boks1971 merged 274 commits into
masterfrom
raja_pacer_hacks1
Nov 11, 2024
Merged

TWCC based congestion control - v0#3165
boks1971 merged 274 commits into
masterfrom
raja_pacer_hacks1

Conversation

@boks1971
Copy link
Copy Markdown
Contributor

Making a PR to kick off this development. It is missing some bits (see notes below), but wanted to get this going so that we can make incremental/iterative commits. This is already quite large a change.

There are comments in code, but some guidance on reviewing if anyone needs it.

  • This uses a no queue pacer when using TWCC (pacer/no_queue.go)
  • All send packets are queued up by this module and processed in a goroutine, i. e. all downstream packets go through this. This could become a performance bottleneck concern. Have to keep a watch for it. But, with TWCC, would be good to have packets going out in sequence.
  • In the forward path, RecordPacketSendAndGetSequenceNumber in sendsidebwe/packet_tracker.go records the outgoing packet with the TWCC sequence number.
  • In the reverse path, HandleRTCP in sendsidebwe/congestion_detector.go queues up TWCC feedback reports and processes it in a goroutine (processFeedbackReport in that file ☝️ ).
  • sendsidebwe/packet_group.go accumulates packets acknowledged in the TWCC feedback report into groups.
  • sendsidebwe/congestion_detector.go has congestion detection state machine and bandwidth estimation.
  • There are callbacks into StreamAllocator which uses the congestion signal to adjust allocation.

TODO:

  • Need to work on probing. It is probing now, but it is not long enough.
  • Pure loss based congestion is not handled. If there is a middle box that uses dropping as a strategy to deal with congestion rather than building up queue, it is not handled. Need to add a way to deal with it.
  • Need to do some code movement (refactor/re-stucturing) to make it cleaner. It is a bit of a tangle now, but lower in priority.

boks1971 added 30 commits June 2, 2023 16:32
packetsLostFromRR += (1 << 32)
r.packetsLostFromRR = r.packetsLostFromRR&0xFFFF_FFFF_FF00_0000 + uint64(rr.TotalLost)
if ((rr.TotalLost-r.lastRR.TotalLost)&((1<<24)-1)) < (1<<23) && rr.TotalLost < r.lastRR.TotalLost {
r.packetsLostFromRR += (1 << 24)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by clean up. There is a 24-bit field in TWCC report also. Was handling that and remembered this. So, doing the same handling here.

@boks1971 boks1971 requested a review from a team November 10, 2024 15:25
Copy link
Copy Markdown
Contributor

@paulwe paulwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only really have superficial feedback but lgtm 👍

onCongestionStateChange func(congestionState CongestionState, estimatedAvailableChannelCapacity int64)
}

func NewCongestionDetector(params congestionDetectorParams) *congestionDetector {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can't be called by other modules because the params are private. should the constructor also be private? this applies to a bunch of the new constructors

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah true. Will change those. I am always confused by those and I usually end up changing it a few times developing a module :-) . I was thinking if all methods should be private as well, but going to leave those as public, but just change the contructors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think using "public" members on private types can make things more readable it was just he constructors that seemed confusing

Comment on lines +201 to +202
c.lock.Lock()
defer c.lock.Unlock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might only need to guard PushBack?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have been bitten by "posting to a closed channel" panic enough times :-) . So, tend to lock that as well. But, that is not doing what it is intended here. Will add a check for stop fuse also.

Will select not attempt to post if channel is closed? Not able to find good behaviour documentation there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge this for now. Will change it is posting to channel can be done outside the lock. I think there are other places that can change as well if the posting can happen outside.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might make more sense to use c.stop.Break() to stop the worker then you don't have to worry about writing to a closed channel. you would never want to close wake anyway because as long as c.stop isn't broken the worker will infinitely loop selecting the closed wake channel

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will make a small PR to just address that.

type packetTracker struct {
params packetTrackerParams

lock sync.Mutex
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like access to this type is always externally synchronized?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no. I initially relied on the wrapping lock (from congestion_detector), but in the forward path, when sequence numbers are created, congestion_detector is not in the path.

Also, in the reverse path, did not want to rely on the wrapping lock as it would mean holding the lock while the report is processed (it could be fast, but it would still affect the forward path).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i missed that it's part of the public interface of congestion detector...

@boks1971 boks1971 merged commit a3f2ca5 into master Nov 11, 2024
@boks1971 boks1971 deleted the raja_pacer_hacks1 branch November 11, 2024 04:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants