Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new metric And logger possiblity for peers error #18

Merged
merged 8 commits into from
Jul 9, 2020

Conversation

Tommy-42
Copy link
Contributor

@Tommy-42 Tommy-42 commented Jul 2, 2020

Introducing a new metric GetFromPeersSlowestDuration which will be recording the slowest
call made to getFromPeers.

This new metric requires to add a method Store to override the atomic value instead of just increasing its value.

Along with this new metric, I Introduce SetLogger() to set a logger and allow getFromPeers errors to be logged ( previously fully ignored ).

Also fixing a data race that could happen upon deletion of a key from the cache

groupcache.go Outdated
logger.WithFields(logrus.Fields{
"err": err,
"key": key,
}).Error("groupcache: error retrieving key from peers")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to include the peer such that log message looks like

error retrieving key from peer 'scout-n01'

Also, instead of prefix message with groupcache: we add a category field. "category": "groupcache" which makes it easy to search ELK stack for groupcache related errors.

Copy link
Contributor Author

@Tommy-42 Tommy-42 Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding :

Would be nice to include the peer such that log message looks like
  > error retrieving key from peer 'scout-n01'

Do you want :

  • the peer's hostname that make the call to the remote peers

Or :

  • the remote peer’s hostname that generate the error when being called

So,
If it is the peer's hostname that make the call to the remote peers
when setting the logger with SetLogger(), the logger can be setup with a default field like this :

logger = logger.WithFields(logrus.Fields{
	"peer": hostname,
})
groupcache.SetLogger(logger)

if it is the remote peer’s hostname that generate the error when being called
I don't see how to do it without modifying the code a lot ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ELK stacks should take care of identifying where the log entry is coming from. error retrieving key from peer 'scout-n01' is sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The peer variable should have the hostname/ip of the remote peer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peer is a variable of type ProtoGetter which is an interface

type ProtoGetter interface {
	Get(context context.Context, in *pb.GetRequest, out *pb.GetResponse) error
	Remove(context context.Context, in *pb.GetRequest) error
}

So we don't have access to any private field that might contains the return var, Am I wrong, or miss something ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is httpGetter, you can add a new method GetURL() to the ProtoGetter interface which will return the peer url. This interface is only used internal to groupcache for contacting peers. Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok gotcha,

error now will looks like :

ERRO[0035] error retrieving key from peer 'http://MBP-tommy-42.local:5071'  category=groupcache err="dial tcp 127.0.0.1:5071: connect: connection refused" key="blabla" peer=MBP-tommy-42.local

Tommy PAGEARD added 3 commits July 6, 2020 18:55
…error

Introducing a new metric GetFromPeersSlowestDuration which will be recording the slowest
call made to getFromPeers.

Introducing SetLogger() to set a logger and allow getFromPeers error to be logged of.
peer var was being accessed by scope heritage instead of being passed as param.
making it unsafe to data race.
@Tommy-42 Tommy-42 force-pushed the tommy-42/add_metrics_and_logger branch from c29dbbc to b543958 Compare July 6, 2020 16:57
groupcache.go Outdated Show resolved Hide resolved
@@ -285,6 +289,10 @@ func (p fakePeers) GetAll() []ProtoGetter {
return p
}

func (p fakePeers) GetURL(key string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this now; I think?

Copy link
Contributor Author

@Tommy-42 Tommy-42 Jul 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm nop :D , fakePeers Implement the ProtoGetter interface with the new GetURL() method that need to be implemented

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetURL(key string) string is a different interface than what ProtoGetter requires GetURL() string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, fakePeers is a collection of []ProtoGetter not an implementation of the ProtoGetter interface, as such it doesn't have to implement the ProtoGetter interface, right?

Copy link
Contributor Author

@Tommy-42 Tommy-42 Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, miss-understand here :D , I removed it from fakePeerS**** but added it on fakePeer so we should be good now :D

Copy link
Contributor

@thrawn01 thrawn01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, thanks!

@thrawn01 thrawn01 merged commit 829b71f into mailgun:master Jul 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants