
List series should have option to return shard space mapping #867

Closed
pauldix opened this issue Aug 22, 2014 · 16 comments
Comments

@pauldix
Member

pauldix commented Aug 22, 2014

So users can be sure they've set things up properly, they should be able to see which shard space a given series will be mapped to.

It seems the easiest way to do this is to have an option on list series that will have it return the shard space name each series will be mapped to. Like

list series, space
list series, space /&stats.*/

The , space would be optional. If included, the result would look something like:

Name Space
seriesA one_week_space
seriesB 30_days_space

I'm open to other potential query syntax, this was just my first idea.

@sahilthapar

👍

@schmurfy
Contributor

Why not just add the space column to the data returned by "list series"? Is it expensive?
Other than that, what would your second example do, list the series stored in the spaces matching a regexp?

@jvshahid
Contributor

@schmurfy agreed, I don't see a reason to make the query more complicated than just list series returning the extra column.

@schmurfy
Contributor

schmurfy commented Sep 9, 2014

Is there any news on this feature? I currently have a test system running fine, but I would like to check that everything is going where I think it is before pushing it to production.

@pauldix
Member Author

pauldix commented Sep 9, 2014

@jvshahid @schmurfy Returning the space mapping per series would definitely be more expensive, unless we cached that mapping. Basically, for each series name we'd loop through the shard spaces and see if it matches the regex.

It's certainly easier to do this feature without the updates to the query grammar. I could run a few tests to see how much it slows down the query.
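
A minimal sketch of the lookup described above, using hypothetical names (ShardSpace, spaceFor) rather than InfluxDB's real types: walk the spaces in order and take the first regex match, falling back to the default space.

```go
package main

import (
	"fmt"
	"regexp"
)

// ShardSpace is an illustrative stand-in for a shard space definition:
// a name plus the regex that series names are matched against.
type ShardSpace struct {
	Name string
	Re   *regexp.Regexp
}

// spaceFor walks the spaces in order and returns the name of the first
// space whose regex matches the series name.
func spaceFor(series string, spaces []ShardSpace) string {
	for _, s := range spaces {
		if s.Re.MatchString(series) {
			return s.Name
		}
	}
	return "default"
}

func main() {
	spaces := []ShardSpace{
		{Name: "one_week_space", Re: regexp.MustCompile(`^stats\.`)},
		{Name: "30_days_space", Re: regexp.MustCompile(`^events\.`)},
	}
	for _, series := range []string{"stats.cpu", "events.login", "misc.temp"} {
		// This per-series loop is the extra work the space column costs.
		fmt.Printf("%s -> %s\n", series, spaceFor(series, spaces))
	}
}
```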

@pauldix
Member Author

pauldix commented Sep 9, 2014

Just tested on my laptop with 500k series. Without the space names, list series took about 2s to run; with space names it was about 2.7s. Obviously it'll have a greater impact when going over a network, but gzip will help quite a bit with that.

What do you think @jvshahid, good enough?

@pauldix
Member Author

pauldix commented Sep 9, 2014

I forgot to mention that was with 3 shard spaces defined.

@Dieterbe
Contributor

Dieterbe commented Sep 9, 2014

Alternatively, we could keep 'list series' as just the list of series, and have a command like inspect shard <shard-name> to see which series match it.

@schmurfy
Contributor

@pauldix As a temporary measure, could you share the modified list series? If you can make it available on a branch somewhere, I could use it for my current goal, and it would not impact the discussion on how to really implement it in an InfluxDB release.

I think the real question for implementing this feature is: how is list series currently used?
As I see it, you would not run it until needed and would probably keep a cached version. Aside from the admin tools, which need to show this information, I don't see any real use for it in production code; if that's really the case, adding more data and slightly slowing it down is not an issue.
Do you have any use case for list series in production code?
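
A rough sketch of the kind of client-side caching described above, assuming production code only needs a periodically refreshed copy of the series list; seriesCache and its fetch hook are hypothetical, not part of any InfluxDB client.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// seriesCache keeps the result of a "list series" call for a short TTL so
// production code does not have to re-run the query on every request.
type seriesCache struct {
	mu        sync.Mutex
	ttl       time.Duration
	fetchedAt time.Time
	series    []string
	fetch     func() ([]string, error) // hypothetical hook that would run "list series"
}

func (c *seriesCache) Get() ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.series != nil && time.Since(c.fetchedAt) < c.ttl {
		return c.series, nil // served from memory, no query issued
	}
	s, err := c.fetch()
	if err != nil {
		return nil, err
	}
	c.series, c.fetchedAt = s, time.Now()
	return s, nil
}

func main() {
	cache := &seriesCache{
		ttl: 30 * time.Second,
		fetch: func() ([]string, error) {
			// In real use this would issue "list series" against the database.
			return []string{"seriesA", "seriesB"}, nil
		},
	}
	names, _ := cache.Get() // first call runs the (stubbed) query
	fmt.Println(names)
	names, _ = cache.Get() // second call within the TTL hits the cache
	fmt.Println(names)
}
```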

pauldix added a commit that referenced this issue Sep 10, 2014
Fixes #867. Updated lexer and parser to work, added code to coordinator to insert spaces if requested.
@Dieterbe
Contributor

> I think the real question for implementing this feature is: how is list series currently used?

Great point!
People who use InfluxDB as a Graphite backend do so via graphite-influxdb. Let me explain that use case, because I think it's important and pretty common.
The Graphite server receives a request such as target=someFunc(foo.bar.*.something.{match1,match2}.blah), and it needs to convert this into the appropriate queries for InfluxDB. That goes as follows (see the sketch after this comment for step 1):

1. Figure out which series are matched.
2. Query the data of those series.
3. Apply someFunc on the data (in the Graphite process).
4. Convert the results into a PNG graph and return that to the user.

All of this, of course, needs to be as fast as possible. Certain things can be cached, but even on the first hit it should still be fast, and we can't cache for long anyway because new series should become visible quickly. That's why the ability to retrieve the list of series (filtered by regex) as fast as possible is so important for the graphite-api use case (see also #884)
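
A hedged sketch of step 1: turning a Graphite glob pattern into a regex that could be handed to a list series /.../ query. The conversion rules (and globToRegex itself) are a simplification for illustration, not graphite-influxdb's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegex converts a Graphite-style pattern such as
// foo.bar.*.something.{match1,match2}.blah into a regular expression that
// could be used in a "list series /.../" query. Simplified for illustration.
func globToRegex(pattern string) string {
	var b strings.Builder
	b.WriteString("^")
	for _, c := range pattern {
		switch c {
		case '*':
			b.WriteString("[^.]*") // * only matches within one path segment
		case '{':
			b.WriteString("(")
		case '}':
			b.WriteString(")")
		case ',':
			b.WriteString("|") // alternatives inside {...}
		default:
			b.WriteString(regexp.QuoteMeta(string(c)))
		}
	}
	b.WriteString("$")
	return b.String()
}

func main() {
	re := globToRegex("foo.bar.*.something.{match1,match2}.blah")
	fmt.Println(re)
	// The series-lookup step would then be roughly: list series /<re>/
	fmt.Println(regexp.MustCompile(re).MatchString("foo.bar.x.something.match2.blah")) // true
	fmt.Println(regexp.MustCompile(re).MatchString("foo.bar.x.other.match2.blah"))     // false
}
```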

@Civil

Civil commented Sep 10, 2014

Also, my 5c on the previous comment:
The ability to know about the retention scheme and space mapping would also be useful, because it's better to get this info from InfluxDB than to force the user to specify it in a config file. The speed of this query is more important though, because even now 'list series' is slow, very slow. For dashboards (a typical dashboard in my practice is around 10-15 graphs), Graphite executes at least one 'list series /regex/' per graph (and that's with out-of-tree patches; with the patches it's 1 query per line, so it can easily be 50 queries for a dashboard). Even now you can see how the dashboard redraws; if it gets 30% slower it'll be totally unacceptable for displaying data with Graphite (and, I think, for any other graphing system with InfluxDB as storage).

For 1M series (250k series with 4 spaces), it could take a really long time. 10 queries at, say, 500ms each is 5 seconds just to make sure the graph can be plotted, and the update period is 30-60s. And what if there are more graphs? In my experience there can be 700k series without spaces (and more than 2M with them); how fast would that work?

Though it's only my opinion as a user.

@schmurfy
Contributor

thanks :)

@schmurfy
Contributor

Are you sure the result is reliable?
It shows me everything going into default, which is a bit odd since my default space is supposed to retain 7d worth of data, yet I can get points from 2 weeks ago.

Can anyone confirm it can return something other than default?

@schmurfy
Contributor

@pauldix On a somewhat related topic, would it be possible to add a similar way to return the shard space when doing a select? It would return the actual space the point was read from, instead of where it would end up being stored; I think both make sense for tracking down configuration errors.

@pauldix
Member Author

pauldix commented Sep 12, 2014

The test checks mappings to other spaces: https://github.com/influxdb/influxdb/blob/master/integration/single_server_test.go#L146-L181

I'm wondering if there's a problem here where you created shard spaces but didn't have a catch-all space (like the default). Then you wrote data in, it fell through, and that created the default space.

The problem is that when spaces are created, they get put at the front of the list, so they are evaluated first. After that, everything gets assigned to the default. Not sure, just a guess.

Can you post a gist of http://localhost:8086/cluster/configuration?u=root&p=root?
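
A small sketch of the ordering issue guessed at above, with hypothetical types: when a catch-all space sits at the front of the list it shadows the more specific spaces, so every series appears to map to default.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative types, not InfluxDB's actual configuration structs.
type space struct {
	name string
	re   *regexp.Regexp
}

// firstMatch returns the name of the first space whose regex matches.
func firstMatch(series string, spaces []space) string {
	for _, s := range spaces {
		if s.re.MatchString(series) {
			return s.name
		}
	}
	return "<none>"
}

func main() {
	statsSpace := space{name: "one_week_space", re: regexp.MustCompile(`^stats\.`)}
	catchAll := space{name: "default", re: regexp.MustCompile(`.*`)}

	// Catch-all evaluated last: the more specific space wins.
	fmt.Println(firstMatch("stats.cpu", []space{statsSpace, catchAll})) // one_week_space

	// Catch-all created later and placed at the front of the list: it
	// shadows everything, so every series appears to map to "default".
	fmt.Println(firstMatch("stats.cpu", []space{catchAll, statsSpace})) // default
}
```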

@schmurfy
Contributor

It seems to work now; what I did in the interval was remove the database and recreate it with a master build (to get the "include spaces" option).
Now if I run list series with space, it shows what I expect. My old configuration had one regexp wrong, but the others were right, so showing everything going into default was still wrong. Anyway, I don't have the faulty database anymore, so I suppose it's fine if I was the only one with this issue ^^
