-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new metric And logger possiblity for peers error #18
Add new metric And logger possiblity for peers error #18
Conversation
groupcache.go
Outdated
logger.WithFields(logrus.Fields{ | ||
"err": err, | ||
"key": key, | ||
}).Error("groupcache: error retrieving key from peers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to include the peer such that log message looks like
error retrieving key from peer 'scout-n01'
Also, instead of prefix message with groupcache:
we add a category field. "category": "groupcache"
which makes it easy to search ELK stack for groupcache related errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding :
Would be nice to include the peer such that log message looks like
> error retrieving key from peer 'scout-n01'
Do you want :
- the peer's hostname that make the call to the remote peers
Or :
- the remote peer’s hostname that generate the error when being called
So,
If it is the peer's hostname that make the call to the remote peers
when setting the logger with SetLogger()
, the logger can be setup with a default field like this :
logger = logger.WithFields(logrus.Fields{
"peer": hostname,
})
groupcache.SetLogger(logger)
if it is the remote peer’s hostname that generate the error when being called
I don't see how to do it without modifying the code a lot ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ELK stacks should take care of identifying where the log entry is coming from. error retrieving key from peer 'scout-n01'
is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The peer
variable should have the hostname/ip of the remote peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peer is a variable of type ProtoGetter which is an interface
type ProtoGetter interface {
Get(context context.Context, in *pb.GetRequest, out *pb.GetResponse) error
Remove(context context.Context, in *pb.GetRequest) error
}
So we don't have access to any private field that might contains the return var, Am I wrong, or miss something ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is httpGetter
, you can add a new method GetURL()
to the ProtoGetter
interface which will return the peer url. This interface is only used internal to groupcache for contacting peers. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok gotcha,
error now will looks like :
ERRO[0035] error retrieving key from peer 'http://MBP-tommy-42.local:5071' category=groupcache err="dial tcp 127.0.0.1:5071: connect: connection refused" key="blabla" peer=MBP-tommy-42.local
…error Introducing a new metric GetFromPeersSlowestDuration which will be recording the slowest call made to getFromPeers. Introducing SetLogger() to set a logger and allow getFromPeers error to be logged of.
peer var was being accessed by scope heritage instead of being passed as param. making it unsafe to data race.
c29dbbc
to
b543958
Compare
groupcache_test.go
Outdated
@@ -285,6 +289,10 @@ func (p fakePeers) GetAll() []ProtoGetter { | |||
return p | |||
} | |||
|
|||
func (p fakePeers) GetURL(key string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this now; I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm nop :D , fakePeers Implement the ProtoGetter
interface with the new GetURL()
method that need to be implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetURL(key string) string
is a different interface than what ProtoGetter
requires GetURL() string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, fakePeers is a collection of []ProtoGetter
not an implementation of the ProtoGetter
interface, as such it doesn't have to implement the ProtoGetter
interface, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, miss-understand here :D , I removed it from fakePeerS****
but added it on fakePeer
so we should be good now :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work, thanks!
Introducing a new metric
GetFromPeersSlowestDuration
which will be recording the slowestcall made to getFromPeers.
This new metric requires to add a method
Store
to override the atomic value instead of just increasing its value.Along with this new metric, I Introduce
SetLogger()
to set a logger and allowgetFromPeers
errors to be logged ( previously fully ignored ).Also fixing a data race that could happen upon deletion of a key from the cache