Deal with super large quantities #25

Closed
CrackerJackMack opened this issue Feb 7, 2013 · 16 comments

Comments

@CrackerJackMack
Contributor

We should probably have some type of sane way of dealing with limiting the number of objects returned.

Something like limiting the default to 200 and allowing the client to specify --offset/--from.

Or maybe expose --limit and default to returning everything.

More discussion is needed.

@sudorandom
Contributor

I agree. For calls that can list a lot of items (e.g. sl cci list), --limit and --offset should be added to the parser for that CLIRunnable class. To standardize on that, I propose we add an 'add_limit_args' function (or a better name) in SoftLayer/CLI/__init__.py to reduce duplicate code when adding arguments to the parser.
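
For illustration, a minimal sketch of what such a helper could look like, assuming the commands use argparse-style parsers. Only the name add_limit_args comes from the proposal; the defaults, help text and the stand-in parser are assumptions.

```python
import argparse


def add_limit_args(parser, default_limit=None):
    """Attach --limit and --offset to a parser (hypothetical helper)."""
    parser.add_argument(
        '--limit', type=int, default=default_limit,
        help='maximum number of items to return (default: no limit)')
    parser.add_argument(
        '--offset', type=int, default=0,
        help='number of items to skip before the first result')
    return parser


# Stand-in for the parser of a single list command.
parser = add_limit_args(argparse.ArgumentParser(prog='sl cci list'))
args = parser.parse_args(['--limit', '200', '--offset', '400'])
print(args.limit, args.offset)
```

Each list command would call the helper once on its own parser, so the flag names and help text stay identical across commands.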

@CrackerJackMack
Contributor Author

Isn't there already a section where we stuff some other default parser options such as --config and --really? If so, I suggest we add them there.

I'm more inclined to use --limit and --from, but --offset maps to our API more directly. Regardless, we'll need to add additional output such as "only showing X, Y remaining". Most services have a getXCount() method, so this is possible, but it costs a second API call.
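
A rough sketch of how that summary line could be computed, assuming the total comes from a separate count call. The function name and message wording are hypothetical.

```python
def paging_summary(shown, offset, total):
    """Build the trailing summary line for a possibly truncated listing."""
    # Items that come after the current window and were not displayed.
    remaining = max(total - offset - shown, 0)
    if remaining == 0:
        return 'showing %d of %d' % (shown, total)
    return 'showing %d of %d, %d remaining' % (shown, total, remaining)


print(paging_summary(shown=200, offset=0, total=950))
```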

@sudorandom
Contributor

> Isn't there already a section where we stuff some other default parser options such as --config and --really? If so, I suggest we add them there.

Yep, all the other helpers are in SoftLayer/CLI/__init__.py (where I suggested), but since there seem to be more and more of them (and candidates for more), maybe we should move those helpers somewhere else. If we put them in SoftLayer/CLI, we need to make sure the file doesn't get picked up as a module, so that 'utils' doesn't show up in the CLI.

@sudorandom
Contributor

This is on hold until I can find a way to filter CCI by multiple tags on the server side so we can leverage limits/offsets also on the server side. This resulted in finding a bug in the XML-RPC to SOAP translation layer. Waiting until it's fixed before going further.

@quiteliderally
Contributor

Until the last comment, this was on track to be a re-implementation of grep

@sudorandom
Contributor

I'm starting to think that we might be doing this the wrong way. Instead of bubbling up --limit and --offset to command-line options, what would you guys think about simulating streaming: fetch a reasonable number of results at a time and display them immediately until the entire list is exhausted? This way, we can periodically show the header row as well.

@sudorandom
Contributor

Here's an example:

:.......:............:....................................:.......:........:................:...............:..............:
:  id   : datacenter :                host                : cores : memory :   primary_ip   :   backend_ip  : provisioning :
:.......:............:....................................:.......:........:................:...............:..............:
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
:.......:............:....................................:.......:........:................:...............:..............:
:  id   : datacenter :                host                : cores : memory :   primary_ip   :   backend_ip  : provisioning :
:.......:............:....................................:.......:........:................:...............:..............:
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
: 12345 :   dal05    :               server               :   4   :   4G   :    12.34.56    :    65.43.32   :              :
:.......:............:....................................:.......:........:................:...............:..............:
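
The output above could be produced by something along these lines. This is only a sketch: render_stream is a hypothetical name, rows are assumed to be pre-formatted strings, and the default of 18 rows per block simply mirrors the example.

```python
def render_stream(rows, header, repeat_every=18):
    """Yield output lines, re-emitting the header before each block of rows."""
    for index, row in enumerate(rows):
        if index % repeat_every == 0:
            yield header
        yield row


# rows can be any iterable, including a generator that pages through the API,
# so each line is printed as soon as its chunk arrives.
for line in render_stream(('row %d' % i for i in range(5)), 'HEADER',
                          repeat_every=2):
    print(line)
```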

@quiteliderally
Contributor

The way I would prefer to solve this problem is more robust server-side filtering, rather than limits. If you search for "", and that's a big list, what did you expect? If you search for "tim-", that should be more manageable.

The use case where you really want to see every CCI seems very rare for internal users and even rarer for external ones. You either have a number of CCIs where fetching them all doesn't matter, or you have so many that you know a reasonable limiting query is required. I grep the list 100% of the time, but it obviously gets costly to fetch that list again and again.

Another option I mentioned to landreth is the ability to search by the username of whoever created the CCI. I don't even know if that's possible, but it would be nice to be certain the group of CCI IDs I'm operating on are mine. If we allow a user to set a default for that (#70), then you get a nice scoping of the account for free without having to mess with paging.

@underscorephil
Contributor

I am more inclined to see a default resultLimit set, with the ability to override it through flags, and server-side filtering support as a nice-to-have. However, rather than reprinting the entire table for each call, print the entire result list at once. Doing so would allow us to handle multiple output formats (csv, xml, json) with greater ease.

A default resultLimit will prevent API timeouts in almost all cases, as we have a predefined object mask and know the amount of data returned with each object.

Limiting interaction based on user is currently available through permissions, and tags would be a good solution for user organization if only viewing, and not interaction, should be limited.

@quiteliderally
Contributor

I'd hate to be mistaken for someone who cares about the problems of others, but that pretty strongly violates the Principle of Least Surprise™

Getting back a portion of the total records when you didn't ask for a limit is Surprising™. Getting back all CCIs that contain the string "tim" when you searched for "tim" is Not Surprising™.

Also, I don't think the scenario of having a default resultLimit and not supporting server-side filtering is a workable option. If the only way to search is to grep locally, but you've left an arbitrary number of records on the server, they won't get matched. That's Surprising™

@quiteliderally
Contributor

@SLphil, I think I understand what you said about user filtering. To clarify, I'd like an option to be able to pass to this:

https://gist.github.com/timariyeh/07cd403f8d3b8115b495

And be certain without scanning every single line item that I'm not cancelling @CrackerJackMack's servers. If it were impossible for my list output to return Joe Bro's CCI without a "-A" option, then it would be impossible to seed the cancel call with those IDs. It would also mean I could just do a "cci list" and not want to die reading each line.

@briancline
Member

In my mind the most intuitive solution is no limit on result set by default (--limit=none), with some sanity paging under the hood to prevent the API timeouts @SLphil mentioned.

For folks with a bajillion servers, if someone notices the retrieval is slow and it's enough of a problem for them, they could set --limit=100 and perhaps --offset=m. The automatic sanity paging that might occur is likely to be the delay someone in this case gets impatient with.

For folks who don't have enough servers to be inconvenienced with a delay, this default behavior is still intuitive; they didn't ask for a limit so they still get their full list as expected in a reasonable amount of time. Even if it has to automatically page once into a second API call, it's likely not a big deal; and they also got what they asked for.

And of course both users, should they be impatient, would still have the option of doing some sort of faster name- or tag-based search that occurs on the server side (i.e., sl cci list 'webscale*' or sl cci list --tags=yolo).
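
A sketch of that behavior, with no limit by default and paging under the hood. The names here are hypothetical, and fetch_page stands in for a single API call that accepts a limit and an offset and returns a list.

```python
def iter_paged(fetch_page, limit=None, offset=0, page_size=100):
    """Yield items from fetch_page(limit, offset), one page at a time.

    limit=None means "everything"; a user-supplied limit caps the total.
    """
    yielded = 0
    while limit is None or yielded < limit:
        want = page_size if limit is None else min(page_size, limit - yielded)
        page = fetch_page(want, offset)
        for item in page:
            yield item
        yielded += len(page)
        offset += len(page)
        if len(page) < want:  # short page: nothing more on the server
            break


# Fake backend with 250 records, standing in for the real API.
records = list(range(250))


def fake_fetch(limit, offset):
    return records[offset:offset + limit]


print(len(list(iter_paged(fake_fetch))))
print(list(iter_paged(fake_fetch, limit=3, offset=100)))
```

With no flags the caller still gets the full list, just assembled from several smaller calls; --limit and --offset only change the starting values.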

@underscorephil
Contributor

I believe we have a miscommunication which should be expected as nobody expects The Spanish Inquisition™ :). My comment on resultLimits is less to do with the data output but rather the backend. See this example of using result limits to prevent call timeouts while gathering an arbitrary amount of objects:
https://gist.github.com/SLphil/5147534

The string filtering can be done without server side filtering before being displayed to the user, preventing the need for piping to grep.

@quiteliderally
Contributor

I think as long as these things are true:

  1. The SoftLayer API supports server-side filtering
  2. The SoftLayer API command line client has options for filtering

Then it's counter-intuitive for those command line options to not leverage the server-side filtering. I'm in general opposed to introducing new stuff when there is already stuff.

The only reason I ever clicked on this issue to see how you gross python people live is because an immediate issue that pops into my mind is this scenario:

I want all instances that contain the phrase "2 chaynz" out of the hundreds (or thousands) of CCIs that exist on my account, but I don't want to process that intermediary result on my machine, in the same sense that I don't want to fetch an entire SQL table to find max(weight).

In this scenario, any pre-imposed limit is gross, because my results are misleading. Even if I limit to 50 and filter client-side, it would be purely coincidental if the 50 records I receive happen to contain all references to the string.
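
To make the problem concrete, here is a small simulation with made-up data. The hostnames, and the spelling "2chaynz" without a space, are invented for the example.

```python
# Fake account: 500 hostnames, 10 of which contain "2chaynz".
hostnames = ['web%03d' % i for i in range(500)]
for i in range(45, 500, 50):
    hostnames[i] = '2chaynz-%03d' % i


def matches(name):
    return '2chaynz' in name


# Limit first, filter second: only matches within the first 50 records survive.
limited_then_filtered = [h for h in hostnames[:50] if matches(h)]

# Filter first (what a server-side filter would do), then apply the limit.
filtered_then_limited = [h for h in hostnames if matches(h)][:50]

print(len(limited_then_filtered), len(filtered_then_limited))  # prints: 1 10
```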

If it's not server-side, I'd much rather use grep than whatever ad hoc filtering might be implemented, but I guess there are Windows users(??) to think about.

Working around timeouts or whatever steps just outside the zone of things I think about, so it might not be possible.

@sudorandom
Contributor

@SLphil I have a generic implementation of the pattern you mentioned/used that returns a generator that you can iterate over without regard to the number of API calls being made. You can see that here: #66

@ everyone I feel like if someone types sl cci list, they want to see all their CCIs. That seems reasonable. Filtering seems to be the favorite here, so sl cci list kmac* would show CCIs whose names start with kmac. That's pretty reasonable too. As for fleshing out the implementation (because eventually someone somewhere has to write this code, and that's half of what this issue is for), would you expect that to filter on hostname only? If so, that's pretty doable.
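
As a starting point, a sketch of translating such a pattern into a server-side hostname filter. The function name is hypothetical, and both the operation prefixes ('^=', '$=', '*=', '_=') and the shape of the filter dict are assumptions about the object-filter syntax that should be checked against the API documentation.

```python
def hostname_filter(pattern):
    """Translate a simple glob such as 'kmac*' into a filter dict."""
    starts_wild = pattern.startswith('*')
    ends_wild = pattern.endswith('*')
    value = pattern.strip('*')
    if starts_wild and ends_wild:
        operation = '*= ' + value   # contains (assumed syntax)
    elif ends_wild:
        operation = '^= ' + value   # starts with (assumed syntax)
    elif starts_wild:
        operation = '$= ' + value   # ends with (assumed syntax)
    else:
        operation = '_= ' + value   # exact match (assumed syntax)
    # The 'virtualGuests' key is an assumption about which property is listed.
    return {'virtualGuests': {'hostname': {'operation': operation}}}


print(hostname_filter('kmac*'))
```

Only leading and trailing wildcards are handled; a wildcard in the middle of the pattern would need something more elaborate.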

@quiteliderally
Contributor

Is this closed, or "closed" ?
