privacyidea-token-janitor delete action does not delete all matching tokens #1322

Open
fredreichbier opened this Issue Nov 26, 2018 · 4 comments

@fredreichbier
Member

fredreichbier commented Nov 26, 2018

I just stumbled upon this problem, which can be reproduced as follows:

  • Have a lot of tokens
  • Run privacyidea-token-janitor find --action delete --chunksize 10, which should delete all tokens
  • The token janitor deletes a lot of tokens and exits successfully after a while
  • Run privacyidea-token-janitor find --action delete --chunksize 10
  • The token janitor still finds some tokens to delete!

This is probably related to the chunksize feature we added in #1224 and #1264, which uses the pagination feature of get_tokens:

  • Assume we have 100 tokens numbered from 0 to 99
  • The token janitor fetches the first chunk of 10 tokens to delete with get_tokens(..., page=1), i.e. tokens 0 to 9
  • It deletes these tokens
  • The token janitor now fetches the next chunk of 10 tokens to delete with get_tokens(..., page=2). As tokens 0 to 9 have been deleted, page 1 now contains tokens 10 to 19, so page 2 is now tokens 20 to 29!
  • Because of this, tokens 10 to 19 are never deleted.
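
The effect can be reproduced with a few lines of plain Python (a standalone simulation with made-up helper names, not privacyIDEA code):

tokens = list(range(100))  # tokens 0..99 all match the filter
chunksize = 10

def get_page(matching, page, size):
    # 1-indexed page of the *current* list of matching tokens, just like
    # get_tokens(..., page=...) re-runs the query for every chunk
    start = (page - 1) * size
    return matching[start:start + size]

page = 1
while True:
    chunk = get_page(tokens, page, chunksize)
    if not chunk:
        break
    for t in chunk:
        tokens.remove(t)  # "delete" the token; later tokens shift forward
    page += 1             # ...but we still advance to the next page

print(sorted(tokens))  # [10..19, 30..39, 50..59, 70..79, 90..99] remain

Half of the tokens survive, because every deleted chunk shifts the remaining matches one page forward.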

The underlying problem is that we modify the database while doing a paginated search. This is not only the case for token deletion, but also for actions like unassign and disable :-/

@fredreichbier

Member

fredreichbier commented Nov 27, 2018

Looking into this some more, it turns out to be quite nasty :-/

The bug only happens if the number of tokens is greater than the chunksize (otherwise, pagination has no effect).

privacyidea-token-janitor misses tokens for invocations in which (1) the action influences the tokens found by the filter criterion, and (2) the filter criterion is translated into a SQL query:

  • --action delete is a problem for any filter criterion: It deletes tokens, so they don't match the filter criterion anymore, which causes the above problem
  • --active true --action disable disables active tokens, so they don't match the filter criterion anymore, same problem

But the bug doesn't occur in the following cases:

  • --active false --action unassign unassigns inactive tokens. But as they are still inactive, they still match the filter criterion, so the bug does not occur!
  • For --orphaned 1 --action unassign, the check for orphaned tokens is not part of the SQL query. So from the SQL perspective, unassigned tokens still match the query, so the bug does not occur!

For the cases in which the action interferes with the filter criterion, we would need a solution like the one in delete_chunked for audit rotation:

def delete_chunked(session, table, filter, limit=1000):

Here, we do not use the pagination feature, but simply delete matching tokens until no matching tokens are left.
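
Transferred to tokens, that loop could look roughly like this (a sketch with hypothetical helper functions, not the actual lib/sqlutils.py implementation):

def delete_all_matching(fetch_first_page, delete_token, chunksize=1000):
    # fetch_first_page(size) returns up to `size` tokens that currently
    # match the filter. Deleted tokens no longer match, so fetching the
    # first page again always yields fresh work until nothing is left.
    while True:
        chunk = fetch_first_page(chunksize)
        if not chunk:
            break
        for token in chunk:
            delete_token(token)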

We could fix the bug by checking ahead-of-time if the filter criterion and action will interfere, and using a special implementation in these cases. But checking for such conditions seems pretty hard and error-prone to me.

We originally introduced the feature in #1224 to speed up the search for orphaned tokens. So I have the following proposal: we only use paginated search for actions that do not change the database at all (i.e. CSV export, listuser, export) and fall back to good old non-paginated search for all other actions. This will slow down those actions and may cause problems for big queries. But if the queries go through, the resulting database state will be correct, which I think is much better than the current behavior! :-)
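
As a sketch, the proposal could boil down to something like this (a hypothetical helper; the action names are just examples, not the janitor's exact CLI values):

READ_ONLY_ACTIONS = {"export", "listuser"}  # actions that never write

def effective_chunksize(action, requested_chunksize):
    # Only honor --chunksize for read-only actions; every modifying
    # action falls back to a single, non-paginated query.
    if action in READ_ONLY_ACTIONS:
        return requested_chunksize
    return None  # None = fetch all matching tokens in one go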

@fredreichbier

Member

fredreichbier commented Nov 27, 2018

Or: we could save the total number of matching tokens in the first invocation of get_tokens. For all further invocations, we check if the new total number of matching tokens is smaller, and if that is the case, we restart the query at page 1.

But this could have unexpected results if some user just deletes or disables some token via the WebUI in parallel.
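
Still, a rough sketch of that heuristic (a hypothetical helper, assuming the paginated query also reports the current total number of matches):

def iterate_pages(fetch_page, chunksize):
    # fetch_page(page, size) returns (tokens, total_match_count).
    # If the total shrinks between pages, the action (or a parallel
    # WebUI request) removed matches, so we restart from page 1.
    page = 1
    known_total = None
    while True:
        tokens, total = fetch_page(page, chunksize)
        if known_total is not None and total < known_total:
            page = 1
            known_total = total
            continue
        known_total = total
        if not tokens:
            return
        yield tokens
        page += 1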

fredreichbier added a commit that referenced this issue Nov 27, 2018

Token janitor now forbids --chunksize for modification actions
For several combinations of filter criteria and actions, the --chunksize
parameter resulted in matching tokens for which no action was performed.

Fixes #1322
@plettich

Contributor

plettich commented Nov 27, 2018

We could fix the bug by checking ahead-of-time if the filter criterion and action will interfere, and using a special implementation in these cases. But checking for such conditions seems pretty hard and error-prone to me.

Could you explain this a little more? We could check if the action is delete and then use the non-paginated query. The filter should be the same, or not?

But this could have unexpected results if some user just deletes or disables some token via the WebUI in parallel.

Is the database locked during the pagination requests? Otherwise this problem would arise there, too.

@fredreichbier

Member

fredreichbier commented Nov 28, 2018

Could you explain this a little more? We could check if the action is delete and then use the non-paginated query. The filter should be the same, or not?

I'll try :) You're right: for delete, the case is easy because we just use the DeleteLimit query from lib/sqlutils.py regardless of the filter criterion. But it's slightly harder for, e.g., disable: in that case, whether we can use pagination depends on the filter criterion:

  • For --active true, we must not use pagination, because active tokens will be disabled and won't match the SQL query anymore
  • For --orphaned 1, we can use pagination, because the orphaned token check is not part of the SQL query

We could think of falling back to non-paginated queries for cases in which we aren't sure, but this could be complicated too :)
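
To illustrate why I think such a check is error-prone, a naive version might look like this (purely hypothetical, covering only the cases discussed above):

def pagination_is_safe(action, filters):
    # filters: dict of CLI filter options, e.g. {"active": "true"}
    if action == "delete":
        return False  # deleted tokens vanish from every SQL result set
    if action == "disable" and filters.get("active") == "true":
        return False  # disabled tokens no longer match active=true
    if action == "unassign" and filters.get("assigned") == "true":
        return False  # unassigned tokens no longer match assigned=true
    # --orphaned is checked in Python, not in SQL, so it never interferes.
    return True  # ...until someone adds a new filter or action

Every new filter or action would need another case here, and a forgotten one would silently reintroduce the bug.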

But this could have unexpected results if some user just deletes or disables some token via the WebUI in parallel.

Is the database locked during the pagination requests? Otherwise this problem would arise there, too.

Good point! No, the database is not locked, so all sorts of funny things could happen anyway.

cornelinux added this to the 3.0 Code Cleanup milestone Nov 29, 2018
