
Add ability to limit the count of matched items. #20

Open
Sazpaimon opened this issue Mar 25, 2014 · 3 comments

Comments

@Sazpaimon
Contributor

In a scan, limit restricts the number of items scanned. It would be nice if there were a way to limit the number of matched items a scan returns. Vogels should also probably recurse through each LastEvaluatedKey until the limit is met.

Query also appears to have a similar issue, but only when the query response is paginated (such as when it hits the 1MB per-response limit). If I do a limit(500) and only get, for example, 100 items back along with a LastEvaluatedKey, Vogels should recurse and keep fetching until the limit is met or no more results are available.

Right now the only workaround for these is loadAll, which does not respect limit and always fetches every available item (which can potentially waste throughput).
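For illustration, the behavior being requested might look roughly like the following. This is a minimal standalone sketch, not vogels code: `fetchPage` is a hypothetical stand-in for a single DynamoDB Scan/Query request, with pagination simulated by an offset instead of a real LastEvaluatedKey.

```javascript
// Hypothetical sketch: keep issuing requests, following LastEvaluatedKey,
// until `limit` matched items are collected or no more pages exist.
// `fetchPage` simulates one paginated DynamoDB response (page size 3).
function fetchPage(table, startKey) {
  var start = startKey || 0;
  var pageSize = 3;
  var page = table.slice(start, start + pageSize);
  var next = start + pageSize < table.length ? start + pageSize : null;
  return { Items: page, LastEvaluatedKey: next };
}

function collectUpToLimit(table, matches, limit) {
  var collected = [];
  var startKey = null;
  do {
    var res = fetchPage(table, startKey);
    res.Items.forEach(function (item) {
      if (collected.length < limit && matches(item)) collected.push(item);
    });
    startKey = res.LastEvaluatedKey;
  } while (startKey !== null && collected.length < limit);
  return collected;
}

// Example: find up to 2 items named 'bob' across paginated responses.
var table = [{name: 'alice'}, {name: 'bob'}, {name: 'carol'},
             {name: 'bob'}, {name: 'bob'}];
var bobs = collectUpToLimit(table, function (i) { return i.name === 'bob'; }, 2);
// bobs holds 2 matches; the loop stopped as soon as the limit was met.
```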

@ianmurrays

Any update on this? I need this too 😄

@zhiyelee

+1

@ryanfitz
Owner

Due to the way DynamoDB works, this would be difficult to implement correctly in a generic way. I'll give an example:

  // find all users named bob
  Account.scan().where('name').equals('bob').limit(500);

Let's say there are 500,000 accounts in total and 800 are named 'bob'.

The first issue: suppose all the accounts named 'bob' sit toward the end of the scan iterator. You are going to scan over all 500K items before you reach 500 Bobs, potentially using up all your provisioned throughput. If you enable loadAll() on your scan, vogels will scan 500 items at a time but will eventually iterate over the entire table; combining limit with loadAll is probably a bit confusing for new users. Limit in DynamoDB is really a way to cap the read throughput used per request, not a way to limit the number of items returned.
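To make that worst case concrete, here is a small illustrative simulation (not vogels code; table size and names are made up). When every match sits at the end of the scan order, every single item has to be read, and billed, before the limit is reached.

```javascript
// Sketch of the worst case: matching items are last in scan order, so the
// whole table is scanned (consuming read throughput for every item) even
// though only a handful of items match.
function scanForMatches(table, matches, wanted) {
  var scanned = 0;
  var found = [];
  for (var i = 0; i < table.length && found.length < wanted; i++) {
    scanned++;
    if (matches(table[i])) found.push(table[i]);
  }
  return { scanned: scanned, found: found };
}

// 1,000 accounts; the 5 'bob's are the last 5 items in scan order.
var accounts = [];
for (var n = 0; n < 1000; n++) {
  accounts.push({ name: n < 995 ? 'user' + n : 'bob' });
}
var result = scanForMatches(accounts, function (a) { return a.name === 'bob'; }, 5);
// result.scanned is 1000: every item was read just to find 5 matches.
```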

The second issue: suppose the first scan request returns 100 Bobs, and a subsequent request is made to try to fill out the full 500; that next request returns 450 Bobs (because many Bobs happen to sit near this part of the iterator). Should vogels return just 500 users, or the 550 it found in total? Data isn't sorted on scans, so it would have to arbitrarily return 500 users and drop the other 50. The LastEvaluatedKey returned from DynamoDB would then be invalid if you attempted to load the next 500 Bobs, because 50 of them were never returned. The same issue applies to queries on secondary indexes.
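The overshoot can be sketched as follows. This is an illustrative simulation with invented page contents mirroring the numbers above (100 matches on page one, 450 on page two): trimming to the requested 500 silently drops 50 items, and the final LastEvaluatedKey points past them, so resuming from it would skip them forever.

```javascript
// Sketch of the overshoot problem: merging paginated results past the
// limit forces either returning extra items or dropping some, and the
// pagination key no longer lines up with what was actually returned.
function mergePages(pages, limit) {
  var all = [];
  pages.forEach(function (p) { all = all.concat(p.Items); });
  var dropped = Math.max(0, all.length - limit);
  return { returned: all.slice(0, limit), dropped: dropped };
}

var pages = [
  { Items: new Array(100).fill({ name: 'bob' }), LastEvaluatedKey: 'key-1' },
  { Items: new Array(450).fill({ name: 'bob' }), LastEvaluatedKey: 'key-2' }
];
var merged = mergePages(pages, 500);
// merged.returned.length is 500 and merged.dropped is 50; resuming the
// scan from 'key-2' would never revisit those 50 dropped items.
```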

I'm open to suggestions on how to make this as user-friendly as possible. We need to work within the limitations of DynamoDB, and I'd want to make it as explicit as possible to developers that they might be consuming a lot of throughput when executing certain functions.
