Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate aborted validation of Authenticate() to clientv3 side #10408

Closed
mitake opened this issue Jan 15, 2019 · 14 comments
Closed

Migrate aborted validation of Authenticate() to clientv3 side #10408

mitake opened this issue Jan 15, 2019 · 14 comments
Assignees
Labels

Comments

@mitake
Copy link
Contributor

mitake commented Jan 15, 2019

In #10218, the server side handling of aborted validation for auth store is deleted. We need to migrate it to clientv3 side.

@jingyih
Copy link
Contributor

jingyih commented Feb 5, 2019

@mitake Does this part of the code have the same logic? If the revision is already behind, does retry help?

func (s *EtcdServer) raftRequest(ctx context.Context, r pb.InternalRaftRequest) (proto.Message, error) {
for {
resp, err := s.raftRequestOnce(ctx, r)
if err != auth.ErrAuthOldRevision {
return resp, err
}
}
}

@mitake
Copy link
Contributor Author

mitake commented Feb 11, 2019

@jingyih yeah the motivation of the code is shared with the older version of doSerialize(). If the revision is behind, clients should call Authenticate() request again. So it isn't working. Like read only RPCs, this loop should also be moved to the client. Thanks for pointing out!

@xqzhang2015
Copy link

xqzhang2015 commented Oct 25, 2019

Hi @jingyih @mitake @horkhe how are you?

I'm using etcd v3.4.0 and still have this infinite loop issue. According to my troubleshooting, it's truly from raftRequest().

// in etcdserver/v3_server.go
539 func (s *EtcdServer) raftRequest(ctx context.Context, r pb.InternalRaftRequest) (proto.Message, error) {
540 |   i := 0
541 |   for {
542 |   |   i++
543 |   |   resp, err := s.raftRequestOnce(ctx, r)
544 |   |   plog.Errorf("...... raftRequest: %v th, err: %v", i, err)
545 |   |   if err != auth.ErrAuthOldRevision {
546 |   |   |   return resp, err
547 |   |   }
548 |   }
549 }

Here is the log output:

11:35:12 etcd1 | 2019-10-25 11:35:12.712630 E | etcdserver: ...... raftRequest: 1675 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.713339 E | etcdserver: ...... raftRequest: 1676 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.713993 E | etcdserver: ...... raftRequest: 1677 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.714770 E | etcdserver: ...... raftRequest: 1678 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.715464 E | etcdserver: ...... raftRequest: 1679 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.716177 E | etcdserver: ...... raftRequest: 1680 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.716832 E | etcdserver: ...... raftRequest: 1681 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.717561 E | etcdserver: ...... raftRequest: 1682 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.718271 E | etcdserver: ...... raftRequest: 1683 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.718933 E | etcdserver: ...... raftRequest: 1684 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.719672 E | etcdserver: ...... raftRequest: 1685 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.720348 E | etcdserver: ...... raftRequest: 1686 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.720991 E | etcdserver: ...... raftRequest: 1687 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.721675 E | etcdserver: ...... raftRequest: 1688 th, err: auth: revision in header is old
11:35:12 etcd1 | 2019-10-25 11:35:12.722433 E | etcdserver: ...... raftRequest: 1689 th, err: auth: revision in header is old

Reproduce:

  1. create user/role and open auth
  2. client auth and hold the token
  3. create or delete new user, grant to root role
  4. client call put() using the previous token, which is old revision now
  5. cluster CPU is 100%

My questions:

  1. Does anyone else still have this infinite loop issue for an old-revision token?
  2. If not, how did you solve it? It yes, why is it not fixed together?
  3. Will it be fixed recently? If not, is using-simple-token a better choice for now?

Thanks.

@horkhe
Copy link
Contributor

horkhe commented Oct 25, 2019

@xqzhang2015 as you can see this issue is still open. We solved it by running a patched version of etcd :-)

@xqzhang2015
Copy link

@horkhe thanks for your comment. Just for double confirmation, is your patch the same as the PR #10468 ?

@jingyih @mitake I got your idea about leave the PR #10468 open, but I don't think it's necessary to completely align with the golang client here. Some points:

  1. Without this PR, any auth store update will lead to 100% CPU(infinite loop) issue. So, JWT auth is not ready for PROD usage for now.
  2. The existing loop code couldn't provide any help to re-auth. (Maybe my understanding is not fully exact. If not, feel free to update me.)
  3. Not everyone uses the golang client. We are also using a C++ client, besides golang.

Thanks.

@xqzhang2015
Copy link

/cc @xiang90 any comment?

@jingyih
Copy link
Contributor

jingyih commented Oct 26, 2019

@xqzhang2015 Thanks! I think it makes sense to remove the loop on server side. I will update #10468.

@jingyih
Copy link
Contributor

jingyih commented Oct 26, 2019

@horkhe I believe the patch you mentioned is #10218. It fixes the issue for range request, but not other request types.

@horkhe
Copy link
Contributor

horkhe commented Oct 28, 2019

The patch is only part of the solution. You still need to fix the client that does not invalidate a session when auth.ErrAuthOldRevision is returned.

@stale
Copy link

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@mitake
Copy link
Contributor Author

mitake commented Apr 7, 2020

Sorry for keeping this for a long time... I have a plan for this so let me remove the stale label.

@stale stale bot removed the stale label Apr 7, 2020
@mitake mitake added stale and removed stale labels Apr 7, 2020
@stale
Copy link

stale bot commented Jul 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale
Copy link

stale bot commented Oct 10, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 10, 2020
@mitake mitake removed the stale label Oct 22, 2020
@stale
Copy link

stale bot commented Jan 20, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 20, 2021
@stale stale bot closed this as completed Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants