mvcc : WatchResponse chan of watchStream use a lot of memory space #11906

Closed
cfc4n opened this issue May 17, 2020 · 3 comments · Fixed by #11987
Comments

cfc4n commented May 17, 2020

version

etcd Version: 3.3.17
Git SHA: 6d80523
Go Version: go1.12.9
Go OS/Arch: linux/amd64

issue

file: mvcc/watchable_store.go, line 102
The NewWatchStream() function allocates a chan WatchResponse with a buffer of chanBufLen. That is a very big number, and it uses a lot of memory. There are some pictures to illustrate the problem.
[screenshots: NewWatchStream allocation profile and WatchResponse chan length]

In watchable_store.go, the constant is annotated like this:

// chanBufLen is the length of the buffered chan
// for sending out watched events.
// TODO: find a good buf value. 1024 is just a random one that
// seems to be reasonable.
chanBufLen = 1024

I read the code around this chan and found that its value does not need to be this large. A buffer length of 2-4 per endpoint is enough.

Am I right?
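For scale, here is a rough back-of-the-envelope sketch of the pre-allocation per watch stream. The watchResponse struct below is only a stand-in with a layout similar to mvcc.WatchResponse, not the real type; the point is the order of magnitude.

package main

import (
	"fmt"
	"unsafe"
)

// Stand-in with roughly the same field layout as mvcc.WatchResponse
// (WatchID, Events slice header, Revision, CompactRevision); about
// 48 bytes on linux/amd64.
type watchResponse struct {
	WatchID         int64
	Events          []struct{} // slice header only
	Revision        int64
	CompactRevision int64
}

func main() {
	const chanBufLen = 1024 // current value in watchable_store.go

	// A buffered channel allocates its backing array up front,
	// no matter how many slots are ever used.
	perStream := uintptr(chanBufLen) * unsafe.Sizeof(watchResponse{})
	fmt.Printf("pre-allocated channel buffer per watch stream: ~%d KiB\n", perStream/1024)
}

With many concurrent watch streams this adds up, even though only a few slots per stream are typically in use.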

cfc4n commented May 27, 2020

About the chanBufLen parameter:
The chanBufLen parameter is the buffer size of the watcher's WatchResponse chan. When a change occurs, etcd looks up the watchers corresponding to the key, iterates over every watcher in that set, and writes a WatchResponse to the WatchResponse chan. The type of the set is watcherSet, which is map[*watcher]struct{}, so its size is not bounded.

The WatchResponse chan is consumed in the sendLoop method of serverWatchStream, which does the version comparison, wraps the Events, and calls gRPC to send them.

Once consumption is slower than production and the buffer fills up, change events are lost. In the real world, in most cases events are written faster than they are consumed.

Currently chanBufLen is set to 1024, so if the number of watcher streams on a key exceeds 1024, change events will be lost.
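To make the production side concrete, here is a simplified model of the fan-out described above. It assumes, as in the issue title, that the watchers on one stream share a single buffered channel; the types and the notify function are stand-ins, not the real mvcc code.

package main

import "fmt"

type watchResponse struct {
	WatchID  int64
	Revision int64
}

type watcher struct {
	id int64
	ch chan<- watchResponse // shared per-stream channel
}

// notify mimics the fan-out: one key change, one write per watcher.
// With a non-blocking send, anything beyond the channel's free
// capacity is dropped.
func notify(watchers map[*watcher]struct{}, rev int64) (sent, dropped int) {
	for w := range watchers {
		select {
		case w.ch <- watchResponse{WatchID: w.id, Revision: rev}:
			sent++
		default:
			dropped++
		}
	}
	return sent, dropped
}

func main() {
	const chanBufLen = 1024
	ch := make(chan watchResponse, chanBufLen)

	// 1500 watchers on the same stream, all watching the same key.
	watchers := make(map[*watcher]struct{})
	for i := int64(0); i < 1500; i++ {
		watchers[&watcher{id: i, ch: ch}] = struct{}{}
	}

	// One change while the consumer is not keeping up: the buffer
	// absorbs 1024 responses and the rest are dropped.
	sent, dropped := notify(watchers, 100)
	fmt.Println("sent:", sent, "dropped:", dropped)
}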

Conclusion:

  1. When the number of watcher streams on a key is greater than 1024, some watcher clients may fail to receive the change event.
  2. The WatchResponse chan buffer is 1024, but in most cases the number of watcher streams on a key is no more than 100, so the channel just pre-allocates a large amount of memory.
  3. A chan of WatchResponse does not seem to be a particularly good choice here... In other words, either make sure that the consumption speed stays higher than the production speed, or read the chan eagerly, buffer the responses into a slice, and then send them to the client (see the sketch below).

This is a difficult choice.
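A minimal sketch of the second option, reading the chan eagerly into a slice, might look like the following. The names (watchResponse, drainLoop) are made up for illustration; this is not how etcd is wired today.

package main

import "fmt"

type watchResponse struct{ Revision int64 }

// drainLoop moves responses from in (small buffer) to out, keeping the
// overflow in a growable slice instead of relying on channel capacity.
func drainLoop(in <-chan watchResponse, out chan<- watchResponse) {
	var pending []watchResponse
	for {
		// Only enable the send case when something is queued;
		// a send on a nil channel never fires in a select.
		var sendCh chan<- watchResponse
		var next watchResponse
		if len(pending) > 0 {
			sendCh = out
			next = pending[0]
		}
		select {
		case wr, ok := <-in:
			if !ok {
				// Producer is done: flush what is left, then stop.
				for _, p := range pending {
					out <- p
				}
				close(out)
				return
			}
			pending = append(pending, wr)
		case sendCh <- next:
			pending = pending[1:]
		}
	}
}

func main() {
	in := make(chan watchResponse, 4)  // small producer-side buffer
	out := make(chan watchResponse, 4) // consumer side
	go drainLoop(in, out)
	for i := int64(1); i <= 10; i++ {
		in <- watchResponse{Revision: i}
	}
	close(in)
	for wr := range out {
		fmt.Println("deliver revision", wr.Revision)
	}
}

The trade-off is that the pending slice is unbounded, so a slow client now costs memory instead of dropped events.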

/cc @gyuho @xiang90

xiang90 pushed a commit that referenced this issue Jun 10, 2020
128 seems to be enough.
Sometimes the consumption speed is more than the production speed.

See #11906 for more detail.
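Presumably the referenced change just lowers the constant in watchable_store.go, roughly along these lines:

// chanBufLen is the length of the buffered chan
// for sending out watched events.
chanBufLen = 128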
cfc4n commented Jun 12, 2020

I have been thinking about the WatchResponse chanLen.
When the channel is full, we could print a log for it after line 545.
Do we need it?

func (w *watcher) send(wr WatchResponse) bool {
	progressEvent := len(wr.Events) == 0
	if len(w.fcs) != 0 {
		ne := make([]mvccpb.Event, 0, len(wr.Events))
		for i := range wr.Events {
			filtered := false
			for _, filter := range w.fcs {
				if filter(wr.Events[i]) {
					filtered = true
					break
				}
			}
			if !filtered {
				ne = append(ne, wr.Events[i])
			}
		}
		wr.Events = ne
	}
	// if all events are filtered out, we should send nothing.
	if !progressEvent && len(wr.Events) == 0 {
		return true
	}
	select {
	case w.ch <- wr:
		return true
	default:
		return false
	}
}
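If we did add it, a sketch of the default branch could look like the following. The lg field and the exact log fields are assumptions for illustration, not how the watcher struct is wired today:

	select {
	case w.ch <- wr:
		return true
	default:
		// Sketch only: assumes the watcher could reach a *zap.Logger
		// (the lg field here is hypothetical).
		if w.lg != nil {
			w.lg.Warn(
				"dropping WatchResponse: watcher channel is full",
				zap.Int64("watch-id", int64(w.id)),
				zap.Int("channel-capacity", cap(w.ch)),
			)
		}
		return false
	}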

/cc @xiang90 @gyuho @spzala @jingyih

gyuho commented Jul 6, 2020

@cfc4n Can we highlight this in CHANGELOG?
