Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce query planning allocations #7385

Merged
merged 2 commits into from
Oct 3, 2016
Merged

Reduce query planning allocations #7385

merged 2 commits into from
Oct 3, 2016

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Sep 29, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated

This PR reduces allocations during query planning. The main change is to the Measurement.TagSets which was creating a lot of garbage.

The TagSets function was creating a lot of intermediate maps and
slices to calculate the sorted tag sets. It first creates a map
to group tag sets with their series, it then created an equally
sized slice of the tag keys and sorted them. Finally, it created
a new slice and added the tag sets in the original map by the ordering
of the sorted keys. It was also recreating the tags map multiple time
creating extra garbage in the loop.

This simplifies the code to create one map for grouping and than adding
the distinct sets to a slice which is then sorted. It also fixes the
multple tag maps getting created.

This could probably be optimized further using a ordered tree data structure.

1.0

$ influx_stress
Total Queries: 250
Average Query Response Time: 65.779923ms

1.1 master

$ influx_stress
Total Queries: 250
Average Query Response Time: 17.209273ms

cc @jsternberg

return nil, err
}

// For every series, get the tag values for the requested tag keys i.e. dimensions. This is the
// TagSet for that series. Series with the same TagSet are then grouped together, because for the
// purpose of GROUP BY they are part of the same composite series.
tagSets := make(map[string]*influxql.TagSet)
tagSets := make(map[string]*influxql.TagSet, len(filters))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may end up overallocating. While this performs well if you have GROUP BY *, I imagine it would perform poorly when there is no GROUP BY because everything maps to the same tag set. So you could have 10,000 series ids, but only need one tag set and this would allocate a map that was ready for storing 10,000 unique entries.

Copy link
Contributor Author

@jwilder jwilder Sep 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll test my datasets w/ queries w/o group by. The problem it has without it as that it almost always has to re-allocate and grow. Maybe a smaller default of of 1024 or would be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This certainly needs to change in some way and the AddFilter below also needs to. I'm just not sure the proper way to do it.

}

// Associate the series and filter with the Tagset.
tagSet.AddFilter(m.seriesByID[id].Key, filter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my previous comment, this is a hot-zone I found while I was working on lazy iterators. When no GROUP BY * is used, a lot of time is spent growing the slice inside of the tag set. I have not figured out a good way to deal with this since attempting to count the number of elements ahead of time didn't really reduce the amount of time.

tagsAsKey := string(MarshalTags(tags))
tagSet, ok := tagSets[tagsAsKey]
tagsAsKey := MarshalTags(tags)
tagSet, ok := tagSets[string(tagsAsKey)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember. Does this cause the map to use some kind of optimization for memory allocations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supposed to. But it reduces the []byte{} -> string conversion from 2 -> 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsternberg yeah this approach is significantly than the previous approach. Around 15ns per lookup versus 45ns from some recent benchmarking I did.


// Sort the series in each tag set.
for _, t := range tagSets {
sort.Sort(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this, while it increases speed by a lot, will cause results to be returned non-deterministically when we use things like LIMIT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1161,6 +1161,12 @@ func (e *Engine) createVarRefIterator(opt influxql.IteratorOptions, aggregate bo
return err
}

if opt.Limit > 0 || opt.Offset > 0 || opt.SLimit > 0 || opt.SOffset > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still needed even when these conditions aren't met. Otherwise, the output is still non-deterministic for queries without a limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tag sets are ordered by key still. This just order the series keys within each tagset.

Do you know of a query that could have this non-determinism?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag sets are ordered by the returned series. Internal to a series wouldn't be ordered so if you had this:

cpu,host=server01,region=uswest value=0
cpu,host=server01,region=useast value=1
cpu,host=server02 value=2

> SELECT * FROM cpu GROUP BY host

The host would be ordered, but the internal series grouped into the server01 bucket would not be deterministic. This is why it affects LIMIT <n> and OFFSET <n>. While I'm not sure it matters, I assume we still want that to be deterministic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted the sort removal so it will always sort as before. It seems like it is unnecessary in some cases though. For example,

SELECT SUM(value) FROM cpu GROUP BY host

Should not need to sort. Is there a way to detect this scenario here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is. We also may want to remove the sort here and instead perform the sort when we're using the tag set to create the iterators inside of the query engine.

The TagSets function was creating a lot of intermediate maps and
slices to calculate the sorted tag sets.  It first creates a map
to group tag sets with their series, it then created an equally
sized slice of the tag keys and sorted then.  Finally, it created
a new slice and added the tag sets in the original map by the ordering
of the sorted keys.  It was also recreating the tags map multiple time
creating extra garbage in the loop.

This simplifies the code to create one map for grouping and than adding
the distinct sets to a slice which is then sorted.  It also fixes the
multple tag maps getting created.
Copy link
Contributor

@e-dard e-dard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@jwilder jwilder merged commit f727eff into master Oct 3, 2016
@jwilder jwilder deleted the jw-query-allocs branch October 3, 2016 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants