Fix responses for the token APIs #54532

Merged
merged 15 commits into from
Apr 16, 2020

jkakavas
Member

@jkakavas jkakavas commented Mar 31, 2020

This commit fixes our behavior for the use of token related APIs.
More concretely:

  • In the Get Token API with the refresh grant, when an invalid
    (already deleted, malformed, unknown) refresh token is used in the
    body of the request, we respond with 400 HTTP status code
    and an error_description header with the message "could not
    refresh the requested token".
    Previously we would erroneously return a 401 with a "token
    malformed" message.

  • In the Invalidate Token API, when using an invalid (already
    deleted, malformed, unknown) access or refresh token, we respond
    with 404 and a body that shows that no tokens were invalidated:

    {
      "invalidated_tokens":0,
      "previously_invalidated_tokens":0,
      "error_count":0
    }
    

    The previous behavior was to erroneously return
    a 400 or 401 (depending on the case).

  • In the Invalidate Token API, when the tokens index doesn't
    exist or is closed, we return 400 because we assume this is
    a user issue either because they tried to invalidate a token
    when there is no tokens index yet ( i.e. no tokens have
    been created yet or the tokens index has been deleted ) or the
    index is closed.

  • In the Invalidate Token API, when the tokens index is
    unavailable, we return a 503 status code because
    we want to signal to the caller of the API that the token they
    tried to invalidate was not invalidated and we can't be sure
    if it is still valid or not, and that they should try the request
    again.

Resolves: #53323
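
The Invalidate Token API status codes described above can be summarized as a small lookup. This is only an illustrative sketch: the class, enum, and method names are hypothetical and are not Elasticsearch code. The 404, 400, and 503 values come from the description; the 200 for a successful invalidation is assumed to be the existing, unchanged behavior.

```java
// Hypothetical sketch: these names are illustrative and are not Elasticsearch classes.
final class InvalidateTokenStatusSketch {

    // Outcomes of an Invalidate Token API call, as listed in the description.
    enum Outcome {
        INVALIDATED,             // the request invalidated at least one token (assumed unchanged: 200)
        TOKEN_NOT_FOUND,         // token already deleted, malformed, or unknown
        INDEX_MISSING_OR_CLOSED, // tokens index does not exist or is closed
        INDEX_UNAVAILABLE        // tokens index exists but is unavailable
    }

    // Maps each outcome to the HTTP status code the description assigns to it.
    static int statusFor(Outcome outcome) {
        switch (outcome) {
            case INVALIDATED:
                return 200;
            case TOKEN_NOT_FOUND:
                return 404;
            case INDEX_MISSING_OR_CLOSED:
                return 400;
            case INDEX_UNAVAILABLE:
                return 503;
            default:
                throw new IllegalArgumentException("unknown outcome: " + outcome);
        }
    }
}
```

Only the 503 case tells the caller that the token may still be valid and that the request should be tried again.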

This commit fixes our behavior for the use of token related APIs.
More concretely:

- In the Get Token API with the `refresh` grant, when an invalid
(already deleted, malformed, unknown) refresh token is used in the
body of the request, we respond with 400 and an error_description
header with the message "could not refresh the requested token" as
opposed to sometimes doing that and sometimes returning `401` and
a message "token malformed"

- In the Invalidate Token API, when using an invalid (already
deleted, malformed, unknown) access or refresh token, we respond
with 200 and a body that shows that no tokens were invalidated:
```
{
  "invalidated_tokens":0,
  "previously_invalidated_tokens":0,
  "error_count":0
}
```
as opposed to the current behavior which was to throw an error
with 400 or 401 ( depending on the case )
@jkakavas jkakavas added >bug :Security/Authentication v8.0.0 v7.7.0 labels Mar 31, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-security (:Security/Authentication)

@@ -48,7 +48,7 @@ POST /_security/oidc/logout
"refresh_token": "vLBPvmAB6KvwvJZr27cS"
}
--------------------------------------------------
-// TEST[catch:unauthorized]
+// TEST[catch:request]
Member Author

The (expected for this test) 500 was masked by the 401 that was thrown in

invalidateRefreshToken(request.getRefreshToken(), ActionListener.wrap(ignore -> {

Member

Is this necessary because the response changes to 200 and the error now happens in the OIDC part as a 500?

Member Author

This is necessary because we don't do the login flow, so we don't have tokens to invalidate for the logout. We expect that this call would fail for that reason, but failing with a 401 was a mistake.

Member

@ywangd ywangd left a comment

Could you please help me understand the motivation for returning 200 for more "invalid" token invalidation requests? It feels like a form of leniency to me. For the same "not found" token, what is the benefit of returning a 200 for an "invalidation request" compared to a 400 for a "refresh request"? Do we have a specific use case for it? Why not make them consistent?

Hiding internal errors could be misleading at times. I'd prefer more accurate and granular error responses, i.e. an error code and description that precisely describe the underlying issue (unless forbidden by security policy, e.g. password enumeration). Elasticsearch in general has pretty elaborate error messages. We even have the error_trace parameter to show the entire stack trace. So my understanding is that we are OK to tell users the actual issues. The current tokenService code sometimes hides the actual issue behind a generic 401 malformed error. This can be improved. But I am not sure whether the answer is to replace all of them with 200.

There are many things that can go wrong with a request. To return 200 for error situations, we need to carefully define which error types qualify, which I think is tricky. There are gray areas. For example, this variant of invalidateAccessToken returns 400 for a null user, and this is different from the new 200 logic. Another example is that I could even argue that an "empty string" should be treated the same as malformed ones and also get a 200 response.

In summary, I am not convinced that more 200 responses are what we want. But I am open to discussions.

@@ -584,7 +585,8 @@ public void invalidateAccessToken(String accessToken, ActionListener<TokensInval
final Iterator<TimeValue> backoff = DEFAULT_BACKOFF.iterator();
decodeToken(accessToken, ActionListener.wrap(userToken -> {
if (userToken == null) {
-listener.onFailure(traceLog("invalidate token", accessToken, malformedTokenException()));
+logger.trace("The access token [{}] is expired and already deleted", accessToken);
+listener.onResponse(TokensInvalidationResult.emptyResult());
Member

With this change, a 200 response will be returned if the token index does not exist. This behaviour is different from the refresh token invalidation, which returns 400 invalidGrant when the index does not exist.

@@ -872,7 +891,7 @@ private void findTokenFromRefreshToken(String refreshToken, Iterator<TimeValue>
}
} catch (IOException e) {
logger.debug(() -> new ParameterizedMessage("Could not decode refresh token [{}].", refreshToken), e);
-listener.onFailure(malformedTokenException());
+listener.onResponse(SearchHits.empty());
Member

All usages of malformedTokenException seem to have been deleted, so the method itself can be removed as well.

@jkakavas
Member Author

jkakavas commented Apr 2, 2020

Could you please help me understand the motivation of returning 200 for more "invalid" token invalidation requests. It feels like a form of leniency to me.

Sure thing. This is not about leniency.

So, in principle, when we get a request to invalidate one or more tokens and we can't find the token in our tokens index, we have to ask:

  • Was the token created by us, but it has already expired and we deleted it (as a cleanup) from the tokens index?
  • Was the token never created by us? In this case:
    • Was the token a random string?
    • Was the token for another cluster?
    • Was the token for another application?
  • Was the token truncated by mistake by the user, or was a bit flipped in transit?
  • Does the token not even exist yet?

Our current behavior is broken as we would reply with:

  • for refresh token: could not refresh the requested token
  • for access token: token malformed

and I'm trying to figure out how best to solve it. I took the stance that we shouldn't care about why the token we received can't be invalidated, since it's already invalid. I'm open to discussing this, of course. Is your problem with the response code specifically? I selected it because the request "succeeds in its desired outcome" and I felt the response body was indication enough. Do you think a 400 with the same response body would be more appropriate?

For the same "not found" token, what is the benefit of returning a 200 for "invalidation request" compared to 400 for "refresh request"? Do we have a specific use case for it? Why not make them consistent?

Because it's not the same thing IMO. When the request is to refresh a token and the token is malformed, we can't refresh the token and cannot satisfy the user's request, so we return a 400 saying it's the caller's fault because the token they sent was not valid.

When the request is to invalidate the token the caller sends us a refresh token that they think is valid and their purpose is to make this token "not valid". If we determine that the token is not valid either way, the user request is satisfied (because the token is invalid) and we indicate in the response body that we did not invalidate the token now with

{
  "invalidated_tokens":0, 
  "previously_invalidated_tokens":0, 
  "error_count":0
}

The more I think about it though, the more a 4xx seems better suited

Hiding internal errors could be misleading at times. I'd prefer more accurate and granular error responses, i.e. error code and description that precisely describe the underlying issue (unless forbidden by security policy, e.g. password enumeration). Elasticsearch in general has pretty elaborated error messages. We even have error_trace parameter to show the entire stacktrace. So my understanding is that we are OK to tell users the actual issues. The current tokenService code sometimes hides the actual issue by a generic 401 malformed error. This can be improved.

I don't know if I agree. Why would someone attempting to authenticate with an access token care whether this failed because we couldn't base64-decode a string, or because the version number of the encoded token is not what we expected, or because we couldn't perform a search in an internal index? These are internal implementation details, are prone to change, and offer no value to the caller. The caller should treat the access token as an opaque string with a binary status: valid or invalid. The information on why we failed might be interesting for an administrator, and this is why we log it.

This can be improved. But I am not sure whether the answer is to replace all of them with 200.

TBC, this is not what this PR attempts to do. It's not a blanket change to always return 200 everywhere, it affects the calls to the invalidate API.

To return 200 for error situations, we need carefully define what error types qualify it, which I think is tricky.

Agreed :/

There are gray areas. For example, this variant of invalidateAccessToken returns 400 for a null user, and this is different from the new 200 logic.

I deliberately left that unchanged because it is called from TransportSamlInvalidateSessionAction with a token that we just found in our security index ourselves, and not with a token that a user sent us via the API.

Another example is that I could even argue that an "empty string" should be treated the same as malformed ones and also get a 200 response.

This breaks the contract of our Invalidate Token API though ( that you must provide a non empty string ) and thus we fail on validation on the Request level.

Maybe we could chat about this tomorrow too and save us some back 'n forth, I'll hit you up in my morning !

@jkakavas
Member Author

jkakavas commented Apr 2, 2020

I had a discussion with @albertzaharovits now about this too and we came to an agreement on :

  • We should return 4xx instead of 200 when the invalidation request doesn't cause the token to be invalidated explicitly by us in that time
  • We should return 400 when the index or the shard is not available so that the caller knows that the token might not be invalidated and that they should re-run the request to invalidate it. The response should be an empty response
    {
    "invalidated_tokens":0, 
    "previously_invalidated_tokens":0, 
    "error_count":0
    }
    
  • We should return 404 when the token cannot be invalidated because it's malformed, or because we can't find it in the tokens index. The response should be an empty response
    {
    "invalidated_tokens":0, 
    "previously_invalidated_tokens":0, 
    "error_count":0
    }
    

I'll adjust the code. @ywangd, we can still discuss this in your afternoon!

@azasypkin
Member

I had a discussion with @albertzaharovits now about this too and we came to an agreement on :

That sounds like it will be a breaking change for Kibana though, unless I'm missing something. We never expected invalidate to fail unless something went terribly wrong and the situation isn't recoverable. We invalidate tokens in a couple of different scenarios, and a 4** code may break a couple of user flows until Kibana is patched to handle them differently:

  • When user has a session created by SAML/Kerberos/PKI/Token provider and tries to logout - in this case we try to invalidate tokens from the cookie and redirect user to a special logged_out page. If invalidate returns non-200 - we'll return error page instead.

  • When user has a session created by a PKI provider and changes client certificate for some reason - in this case we're invalidating old token and trying to create a new one based on the new certificate. If invalidate returns non-200 - we'll return error page instead.

  • When user has a session created by a SAML provider and tries to log in once again with IdP initiated login - in this case we're invalidating tokens from the existing session and switching to the ones returned in exchange for the new SAML response. If invalidate returns non-200 - we'll return error page instead. We can ignore this case here since it doesn't work correctly right now anyway - see "Properly handle SAML IdP initiated login with existing session containing expired access token" (kibana#59629), the issue that triggered this discussion initially.

Are you still planning this change for 7.7.0?

@jkakavas
Member Author

jkakavas commented Apr 2, 2020

@azasypkin I was under the impression that it is broken now for kibana ( as you do get 4xx for already invalidated tokens - but with misleading errors ), and this is the motivation for changing this behavior.

TBC, nothing we're doing here changes the current behavior when you'd get a 200. If invalidation is successful, you still get a 200. What changes is what happens when invalidation is not successful, and that is that you will get more consistent 4** statuses and appropriate error messages where now you might get misleading error messages (but still with 4** statuses )

Are you still planning this change for 7.7.0?

I'm treating this as a bug since it doesn't change a correct behavior but fixes an incorrect one. As such (and assuming that we don't figure out that it does change a behavior in a way that is breaking for kibana ) if this is merged in time to make it to a BC for 7.7, it will be in 7.7, otherwise it will be in 7.7.1

@ywangd
Member

ywangd commented Apr 2, 2020

@jkakavas I understand that the intention is to bring consistency to invalidate token calls for both access and refresh tokens. My main question was about when and why we should return a 200 response (other than for a truly successful response), i.e. which internal failures qualify for a 200.

Based on the analysis so far and existing code behaviours, the following status codes seem to make sense to me:

  • Token document not found -> 404
  • Malformed token -> 400
  • Token index or security index does not exist -> 400 or 500?
  • Token already expired -> 200
  • Token already invalidated -> 200
  • Other failures like timeout -> existing behaviours

As for the response content, I'd personally prefer to tell user the failure reason. Maybe not to the details of base64 decoding failure, rather something like "malformed" or "not found". But the status code already signifies it. So I won't cling to my idea. I am available for discussion tomorrow when you have time. Thanks

@azasypkin Correct me if I am wrong, but I think Kibana should not be affected by these non-200 codes since I'd assume when Kibana invalidates the tokens, they should all fall into above cases where 200 will be returned. Also we are not changing any existing 200 responses into something else.

@azasypkin
Member

TBC, nothing we're doing here changes the current behavior when you'd get a 200. If invalidation is successful, you still get a 200. What changes is what happens when invalidation is not successful, and that is that you will get more consistent 4** statuses and appropriate error messages where now you might get misleading error messages (but still with 4** statuses )

Ah, I must have misunderstood then, that's good to know! If in all the cases when we were returning 200 we'll continue to return 200 then we're good, thanks and sorry for the noise!

@jkakavas
Member Author

jkakavas commented Apr 2, 2020

My main question was about when and why we should return a 200 response (other than for a truly successful response), i.e. which internal failures qualify for a 200.

And that is a valid question that made me rethink this through and decide that we shouldn't return 200. 💯

Token document not found -> 404

✔️ . The caveat is that we can never know for sure if this is an access token that was at some point valid but has now expired and been removed OR a token that is not meant for us.

Malformed token -> 400

I disagree on that. I think 404 is appropriate. Access tokens should be opaque strings for the clients; they shouldn't care / know / be aware of whether a token can be decoded on our side. It's not like they are encoding or producing these tokens on the client side, so that such an error message would be useful or an indication to do something differently. It's a server-side implementation detail. We discussed this with Albert too and concluded that we could do it, but we don't see the benefit in it.

I was also thinking we probably want to keep the 404 vs X distinction and only return X when we want to signal to the caller that they should retry the request. Now, 400 may not be the right code for that; I'll think this through.

Token index or security index does not exist -> 400 or 500?

I guess 400. If it doesn't exist, then what is this token that we need to invalidate anyway? Probably not ours, since if we had created it, we would have stored it too.

Token already expired -> 200

✔️ But more precisely, a token that has already expired, but not for more than 24hrs, since by then we would have cleaned up the document. After ~24hrs from a token's expiration this would fall into the "Token document not found" case.

Token already invalidated -> 200

✔️ With the same caveat as above.

Most things make sense and correspond to what we were laying out in #54532 (comment)

As for the response content, I'd personally prefer to tell user the failure reason. Maybe not to the details of base64 decoding failure, rather something like "malformed" or "not found".

See my comments above; I'm still not persuaded this is something we should do.

@ywangd
Member

ywangd commented Apr 3, 2020

@jkakavas Thanks a lot for your detailed response and background knowledge. I appreciate that. 👍 I feel we are progressing here and have narrowed down the discussions.

i disagree on that. I think 404 is appropriate. Access tokens should be opaque strings for the clients, they shouldn't care / know / be aware of whether it can be decoded on our side.

400 is a catch-all code for all client-side errors. It includes, but is not limited to, decoding errors. I picked it since no specific status code is available. In contrast, other 4xx codes have specific meanings, e.g. 404 has the specific meaning of "not found". Hence I find it hard to link this specific meaning to any underlying error other than "something not found". The difference between status codes could also come in handy for support, i.e. if users tell us a 404 is returned, we know for sure it is about a missing document.

The caveat is that we can never know for sure if this is an access token that was at some point valid but has now expired and been removed OR a token that is not meant for us.

This is a great one. We have two choices here: 1) 404 as discussed above; or 2) 200 with a TokensInvalidationResult.emptyResult and error_count set to 1. The current code returns 40x for this. So a 404 is not really breaking existing behaviour. But do we have any complaint about it from users or Kibana? e.g. Are users complaining that they always get an error page when they try to logout after a long idle time (24+ hours) (@azasypkin)? If yes, I'd prefer a 200 response. So overall 404 is acceptable here and 200 may be better if we have concrete user needs.

Lastly, for 4xx responses, would it be better to set error_count to 1? It does not really provide extra information to users since the status code already says it. But I feel it's more in line with the semantics of an "error" response.

@azasypkin
Member

azasypkin commented Apr 3, 2020

Are users complaining that they always get an error page when they try to logout after a long idle time (24+ hours) (@azasypkin)?

It's definitely possible, but we haven't heard such complaints yet. Likely because there are just a few cases when user can get into such situation:

  • user was inactive for 24h+ on a page in Kibana that doesn't send any requests to a server on its own (so that token isn't automatically refreshed). And the first action after such inactivity period is local logout.
  • IdP initiated logout after a long period of inactivity in Kibana
  • IdP initiated login after a long period of inactivity (currently we don't call invalidate in this case for expired/missing tokens, but we want to fix that once this PR merges)

If yes, I'd prefer a 200 response. So overall 404 is acceptable here and 200 may be better if we have concrete user needs.

Kibana would definitely prefer 200 since we know that tokens we send are always well-formed and were created by Elasticsearch, but they can be:

  • expired or already removed from ES index (we treat these cases the same in Kibana, but I understand that it's not that easy to handle them in a similar way on ES side)
  • created by another ES version (after stack upgrade or something like this)

To summarize:

  • If we don't change any flow to return non-200 that currently returns 200 we won't break anything in Kibana
  • If returning 200 isn't feasible in the cases above we can deal with 4** codes as well, but we'll need to know that status code "matrix" to distinguish between cases when we can proceed or when we should stop user flow entirely cause something went terribly wrong and we want to signal that user may not be logged out and should contact admin (5**?).

@jkakavas
Member Author

jkakavas commented Apr 3, 2020

But do we have any complaint about it from users or Kibana? e.g. Are users complaining that they always get an error page when they try to logout after a long idle time (24+ hours) (@azasypkin)?

I think this is up to Kibana to consume and not up to us to change. We will define the status code that we return in such cases and Kibana will make sure that the correct behavior is presented to the user.

404 also mimics what we do in other APIs in such cases ( i.e. when you try to delete a document that is not there, or a user/role that is not there - no matter if it existed or not at some point )

Hence I find it hard to link this specific meaning to any underlying errors other than "something not found".

We basically can't find a token document for that token. The reason why ( we can't decode the token string in order to make a search request) is - to me - an implementation detail.

Difference between status codes could also come in handy for support, i.e. if users tell us a 404 is returned, we know for sure it is about a missing document.

As I mentioned before, end users do not care about this. If this comes up for troubleshooting, it will be from the perspective of an administrator that has access to logs.

I still prefer the simplicity of 2 error codes ( 404 for when we can't find the doc for any reason, and xxx for when the index/shard is unavailable, so that clients can easily code around this and only retry on the latter ) over trying to add more error codes for different error cases. I can be persuaded to do this but I think it adds complexity with no apparent gain.
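
To make the "only retry on the latter" point concrete, here is a rough client-side sketch. It is hypothetical and is not Kibana or Elasticsearch code. It assumes that the retryable status is the 503 the PR description settles on for an unavailable index, and that a caller is content to treat a 404 as "nothing left to invalidate". How to handle any other status is an assumption of the sketch.

```java
// Hypothetical client-side sketch; not Kibana or Elasticsearch code.
final class InvalidateClientSketch {

    enum Action { DONE, RETRY, FAIL }

    // Decides what a caller might do with the status of an invalidate request.
    static Action actionFor(int status) {
        if (status == 200 || status == 404) {
            // 200: invalidation succeeded. 404: no token document was found,
            // so there is nothing left to invalidate (assumed to be acceptable to the caller).
            return Action.DONE;
        }
        if (status == 503) {
            // Index or shard unavailable: the token may still be valid, so retry.
            return Action.RETRY;
        }
        // Assumption: any other status is surfaced as an error without retrying.
        return Action.FAIL;
    }
}
```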

Lastly, for 4xx responses, would it be better to set error_count to 1? It does not really provide extra information to users since the status code already says it. But I feel it's more in line with the semantics of an "error" response.

We also discussed this yesterday and came to the conclusion that we shouldn't. The reasoning is that error_count was introduced to signify actual errors in invalidating tokens, so that the caller is aware that they should try again. So let's say you get a response that is

{
  "invalidated_tokens":8,
  "previously_invalidated_tokens":0,
  "error_count":3
}

it means there were 11 tokens and we could invalidate only 8 of them, so there are 3 that are still valid. If we return "error_count":1 for our 404s, it's as if we're saying that the token is not successfully invalidated == is still valid, which is not the case here. I'm fine doing this for the case where the index/shard is unavailable.
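
The arithmetic above can be sketched as follows. The class is a hypothetical stand-in for the response body and is not the real TokensInvalidationResult; only the three field meanings come from the discussion.

```java
// Hypothetical stand-in for the invalidation response body; not the real TokensInvalidationResult.
final class InvalidationCountsSketch {
    final int invalidatedTokens;
    final int previouslyInvalidatedTokens;
    final int errorCount;

    InvalidationCountsSketch(int invalidatedTokens, int previouslyInvalidatedTokens, int errorCount) {
        this.invalidatedTokens = invalidatedTokens;
        this.previouslyInvalidatedTokens = previouslyInvalidatedTokens;
        this.errorCount = errorCount;
    }

    // Total number of tokens covered by the response.
    int totalTokens() {
        return invalidatedTokens + previouslyInvalidatedTokens + errorCount;
    }

    // error_count counts tokens that could not be invalidated and are therefore
    // still valid, so a non-zero value means the caller should try again.
    boolean shouldRetry() {
        return errorCount > 0;
    }
}
```

With the counts from the example (8, 0, 3), `totalTokens()` is 11 and `shouldRetry()` is true. For the empty result (0, 0, 0) returned with a 404, `shouldRetry()` is false, which matches the argument for not setting error_count to 1 in that case.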

@ywangd
Member

ywangd commented Apr 3, 2020

I still prefer the simplicity of 2 error codes ( 404 for when we can't find the doc for any reason, and xxx for when the index/shard is unavailable, so that clients can easily code around this and only retry on the latter )

Let's go with this then. 4xx vs 200 is a rather big difference. But between different 4xx codes, I don't have a super strong feeling and would prefer progress over perfection in this case. Consistent behaviour is the goal here and I think we can achieve it regardless.

As for the error_count, it was an optional suggestion. So I am ok either way.

Thanks for all the discussions. It was good learning.

@jkakavas jkakavas requested a review from ywangd April 9, 2020 14:33
@jkakavas jkakavas added v7.7.1 and removed v7.7.0 labels Apr 9, 2020
Member

@ywangd ywangd left a comment

A few minor comments and the mock needs to be fixed for the failed tests, otherwise LGTM. Thanks for the iterations.

I think a 400 might be better to signal that it's the caller's fault instead of 404 which can also be observed when a token is simply invalidated and deleted

Do you still plan to do this?

when(license.isTokenServiceAllowed()).thenReturn(true);
}

public void testInvalidateTokensWhenIndexUnavailable() throws Exception {
Member

Is it worthwhile to test "unavailable" index in TokenAuthIntegTests by closing the token index before invalidation?

@jkakavas
Member Author

I think a 400 might be better to signal that it's the caller's fault instead of 404 which can also be observed when a token is simply invalidated and deleted

Do you still plan to do this?

I went back and forth to be honest, but I'll do this as it feels the right thing to do

@ywangd
Member

ywangd commented Apr 15, 2020

I think a 400 might be better to signal that it's the caller's fault instead of 404 which can also be observed when a token is simply invalidated and deleted

Do you still plan to do this?

I went back and forth to be honest, but I'll do this as it feels the right thing to do

Feel free to leave it for the next round if you need more time to ponder on it.

@jkakavas
Member Author

Is it worthwhile to test "unavailable" index in TokenAuthIntegTests by closing the token index before invalidation?

Wouldn't that mostly be testing SecurityIndexManager#checkIndexAvailable though ?

@ywangd
Member

ywangd commented Apr 15, 2020

Wouldn't that mostly be testing SecurityIndexManager#checkIndexAvailable though ?

Not entirely. A closed index makes SecurityIndexManager#isAvailable() return false, and this is checked in TokenService#getUserTokenFromId and TokenService#findTokenFromRefreshToken. These methods in turn call listener.onFailure(frozenTokensIndex.getUnavailableReason()).

My understanding is that you wanted to test the code behaviour when above failure happens?

@jkakavas
Member Author

jkakavas commented Apr 15, 2020

Not entirely. A closed index makes SecurityIndexManager#isAvailable() return false, and this is checked in TokenService#getUserTokenFromId and TokenService#findTokenFromRefreshToken. These methods in turn call listener.onFailure(frozenTokensIndex.getUnavailableReason()).

My understanding is that you wanted to test the code behaviour when above failure happens?

Yes, but what we check is what happens when isAvailable() returns false, not why it returns false

Now that I will be adding different behavior for exists vs available for the invalidate-related methods, I'll add a test case for closed, as this still triggers unavailable but should return 400 and not 503. If that's what you meant above, +1; I didn't make the connection.

@ywangd
Member

ywangd commented Apr 15, 2020

My original comment of "a closed index is counted as an unavailable index" was to your comment of

This was added because we couldn't trigger the index to be unavailable ( but rather only not existent ) in TokenAuthIntegTests

I thought your preference was to trigger the index to be unavailable in TokenAuthIntegTests, but could not, so you instead created TransportInvalidateTokenActionTests with mocks. So I was suggesting you could trigger unavailable index in the integration tests by closing the index before the token invalidation call. I was not suggesting it for differentiating between exists and available. But you are right it could be useful in this case.

@jkakavas
Member Author

So I was suggesting you could trigger unavailable index in the integration tests by closing the index before the token invalidation call.

Gotcha, the original comment (#54532 (comment)) is presented out of context in :octocat: UI :/ instead of as a response to my original comment ( that I had forgotten I had made up until now )

@jkakavas
Member Author

jkakavas commented Apr 15, 2020

Thanks for all the comments and discussion @ywangd , I changed the behavior to differentiate between closed/not-existent and unavailable in the invalidate token API, take a look if you please to validate that this is what you had in mind too

Comment on lines +611 to +615
if (e instanceof IndexNotFoundException || e instanceof IndexClosedException) {
    listener.onFailure(new ElasticsearchSecurityException("failed to invalidate token", RestStatus.BAD_REQUEST));
} else {
    listener.onFailure(unableToPerformAction(e));
}
Member

I'd prefer to have the new logic encapsulated inside unableToPerformAction(e). Otherwise it looks good.

Member Author

unableToPerformAction is meant to be a simple wrapper to throw an ElasticsearchSecurityException (ESS) with 503. If your issue is with the duplication of these 4 lines and you feel strongly about this, I can add a method, but I don't see much value in it.

Member

I often apply the rule of three, and this code is duplicated twice, not three or more times. So, no, I don't feel strongly about it.
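
For reference, the encapsulation discussed in this thread could look roughly like the sketch below. It is only an illustration, not code from this PR: `InvalidateFailures.toInvalidateFailure` is a hypothetical helper, the `Stub*` classes are stand-ins for `IndexNotFoundException`, `IndexClosedException` and `ElasticsearchSecurityException` so that the snippet compiles without Elasticsearch on the classpath, and the 503 message text is a placeholder.

```java
// Stand-ins (assumptions) for the Elasticsearch exception types named in the
// diff above, so that this sketch compiles on its own.
class StubIndexNotFoundException extends RuntimeException {
    StubIndexNotFoundException(String message) { super(message); }
}

class StubIndexClosedException extends RuntimeException {
    StubIndexClosedException(String message) { super(message); }
}

class StubSecurityException extends RuntimeException {
    final int status; // HTTP status reported to the caller

    StubSecurityException(String message, int status, Throwable cause) {
        super(message, cause);
        this.status = status;
    }
}

final class InvalidateFailures {
    private InvalidateFailures() {}

    // Hypothetical helper: the single place that chooses between 400 and 503.
    static StubSecurityException toInvalidateFailure(Exception e) {
        if (e instanceof StubIndexNotFoundException || e instanceof StubIndexClosedException) {
            // Missing or closed tokens index: treated as a user issue.
            return new StubSecurityException("failed to invalidate token", 400, e);
        }
        // Any other failure: same outcome as unableToPerformAction(e), a 503.
        // The message text is a placeholder, not the real one.
        return new StubSecurityException("unable to perform requested action", 503, e);
    }
}
```

Each of the two call sites would then reduce to a single `listener.onFailure(toInvalidateFailure(e))` call, at the cost of one more method to maintain.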

@jkakavas jkakavas merged commit de30a0e into elastic:master Apr 16, 2020
jkakavas added a commit to jkakavas/elasticsearch that referenced this pull request Apr 16, 2020
This commit fixes our behavior regarding the responses we
return in various cases for the use of token related APIs.
More concretely:

- In the Get Token API with the `refresh` grant, when an invalid
(already deleted, malformed, unknown) refresh token is used in the
body of the request, we respond with a `400` HTTP status code
and an `error_description` header with the message "could not
refresh the requested token".
Previously we would erroneously return a `401` with a "token
malformed" message.

- In the Invalidate Token API, when using an invalid (already
deleted, malformed, unknown) access or refresh token, we respond
with `404` and a body that shows that no tokens were invalidated:
   ```
   {
     "invalidated_tokens":0,
     "previously_invalidated_tokens":0,
     "error_count":0
   }
   ```
   The previous behavior was to erroneously return
a `400` or `401` (depending on the case).

- In the Invalidate Token API, when the tokens index doesn't
exist or is closed, we return `400` because we assume this is
a user issue: either they tried to invalidate a token when
there is no tokens index (i.e. no tokens have been created
yet or the tokens index has been deleted), or the index is
closed.

- In the Invalidate Token API, when the tokens index is
unavailable, we return a `503` status code because
we want to signal to the caller of the API that the token they
tried to invalidate was not invalidated and we can't be sure
if it is still valid or not, and that they should try the request
again.

Resolves: elastic#53323
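
From the caller's side, the status codes listed in this commit message suggest handling along the lines of the sketch below. This is an illustration under assumptions rather than client code shipped with the change: `InvalidateOutcome` and `InvalidateResponseHandling` are hypothetical names, and treating `200` as the success status is an assumption not stated in the message.

```java
// Hypothetical classification of Invalidate Token API responses by HTTP status.
enum InvalidateOutcome {
    INVALIDATED,         // request succeeded (assumed to be 200)
    NOTHING_INVALIDATED, // 404: token already deleted, malformed or unknown
    REJECTED,            // 400: tokens index missing or closed, a user issue
    RETRY_LATER,         // 503: tokens index unavailable, token state unknown
    UNEXPECTED           // any status not covered by the commit message
}

final class InvalidateResponseHandling {
    private InvalidateResponseHandling() {}

    static InvalidateOutcome classify(int httpStatus) {
        switch (httpStatus) {
            case 200:
                return InvalidateOutcome.INVALIDATED;
            case 404:
                return InvalidateOutcome.NOTHING_INVALIDATED;
            case 400:
                return InvalidateOutcome.REJECTED;
            case 503:
                return InvalidateOutcome.RETRY_LATER;
            default:
                return InvalidateOutcome.UNEXPECTED;
        }
    }

    // Only a 503 signals that the same request should be sent again.
    static boolean shouldRetry(int httpStatus) {
        return classify(httpStatus) == InvalidateOutcome.RETRY_LATER;
    }
}
```

The point of the split is that only `503` leaves the token's validity unknown; a `400` or `404` will not change on retry unless the caller changes the request or the state of the tokens index.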
jkakavas added a commit that referenced this pull request Apr 16, 2020
Resolves: #53323
@bpintea bpintea added v7.7.0 and removed v7.7.1 labels Apr 21, 2020
jkakavas added a commit to jkakavas/elasticsearch that referenced this pull request Apr 27, 2020
jkakavas added a commit to jkakavas/elasticsearch that referenced this pull request May 14, 2020
Resolves: elastic#53323
jkakavas added a commit that referenced this pull request May 14, 2020
Backport of #54532
Labels
>bug :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) v7.7.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Properly handle non-existent tokens in Token Invalidate API
7 participants