Two-step aggregation #10
Yeah, I think a custom type is the way to go. Since it's intended as an intermediate value, the in/out functions are not too important. At first you could even just return "hello world" from the output function. Those functions are not really that hard to write, and Postgres has some nice helper functions. Here is the example I had in mind. I'm not sure it makes sense to have all of these, though.
Great, thanks for the link to that example and confirmation on the approach.
Yes, all the accessor functions should work off the same intermediate value.
I can't think of any concrete reason using a custom type would be a problem. Btw, in case you haven't noticed, there are special-purpose functions for exactly this sort of thing.
I've been working through the custom type, defined currently as

```sql
CREATE TYPE vecaggstats (
  internallength = variable,
  input = vecaggstats_in,
  output = vecaggstats_out
);
```

and thanks for your suggestion of using placeholder values for the in/out functions for now. I also tried caching the type's Oid at library load time:

```c
// cache our custom VecAggElementStats type Oid
static Oid VecAggElementStatsTypeOid;

void
_PG_init(void)
{
    // FIXME: this returns error "Unknown type: vecaggstats"... how to do this?
    VecAggElementStatsTypeOid = typenameTypeId(NULL, typeStringToTypeName("vecaggstats"));
}
```

but it throws an error when running it.
I don't think caching the type oid is correct. As you say, the type doesn't exist yet at load time, but also it seems like there may be problems if the type gets dropped/restored. Anyway, Postgres has its own type cache, so just looking the type up when you need it should be fine.
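For reference, at the SQL level the lookup the server performs is equivalent to a regtype cast, which only succeeds once the type actually exists in a visible schema (a sketch):

```sql
-- resolves the type name through the catalogs (and type cache) on demand,
-- rather than once at library load time
SELECT 'vecaggstats'::regtype::oid;
```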
I see, it was the load-time lookup that was the problem.
I am to the point in feature/two-step-aggs where this SQL runs:

```sql
SELECT vec_agg_count(vec_stat_agg(nums)) FROM measurements;
 vec_agg_count
---------------
 {4,3,4}
```

However I have so many doubts/questions that I'm unsure of the correctness of most of it. First up, the final function is currently declared as

```sql
CREATE OR REPLACE FUNCTION
vec_stat_agg_finalfn(internal, numeric[])
RETURNS vecaggstats[]
AS 'aggs_for_vecs', 'vec_stat_agg_finalfn'
LANGUAGE c;
```

returning an array of this per-element type:

```c
typedef struct VecAggElementStatsType {
  char vl_len_[4];
  Oid elemTypeId;
  uint32 count;
  Datum sum;
  Datum min;
  Datum max;
  Datum mean;
} VecAggElementStatsType;
```

I was thinking that actually the final function could return a single value instead of an array:

```sql
CREATE OR REPLACE FUNCTION
vec_stat_agg_finalfn(internal, numeric[])
RETURNS vecaggstats -- no longer array here
AS 'aggs_for_vecs', 'vec_stat_agg_finalfn'
LANGUAGE c;
```

backed by a single struct holding all the elements:

```c
typedef struct VecAggStatsType {
  char vl_len_[4];
  Oid elemTypeId;   // type of Datum used below
  uint32 elemCount; // number of elements in Datum arrays below
  Datum *counts;
  Datum *sums;
  Datum *mins;
  Datum *maxes;
  Datum *means;
} VecAggStatsType;
```

If so, is that even the way to define Datum arrays here, or would it need to be defined as

```c
typedef struct VecAggStatsType {
  char vl_len_[4];
  Oid elemTypeId;   // type of Datum used below
  uint32 elemCount; // number of elements in Datum arrays below
  ArrayType *counts;
  ArrayType *sums;
  ArrayType *mins;
  ArrayType *maxes;
  ArrayType *means;
} VecAggStatsType;
```
I think returning a single value instead of an array makes more sense. I think I'd make the array types match the input element type where possible. I don't think I'd include the means, since they can be derived. I think I'd do the ArrayType version.
Great, I'll give that a go.
Yes, you are correct; I was glossing over the fact that most of the array types are the same as the input element type, but not necessarily so for things like the counts.
The reason for this is that the mean value would actually be calculated by a function call like numeric_avg on the accumulated state.
Right-o, I'll go with the ArrayType fields. Thanks for all the feedback!
I converted to that approach, and this now runs:

```sql
SELECT vec_agg_count(vec_stat_agg(nums)) AS counts
     , vec_agg_sum(vec_stat_agg(nums)) AS sums
     , vec_trim_scale(vec_agg_mean(vec_stat_agg(nums))) AS means
FROM measurements;
 counts  |       sums        |      means
---------+-------------------+-----------------
 {4,3,4} | {4.92,5.91,14.80} | {1.23,1.97,3.7}
(1 row)
```

And I verified via a debugger that indeed the aggregate state is only computed once.
Awesome!
I've been testing out the performance of the new functions on a larger data set, and have a surprising result: the two-step aggregate is performing about the same as, or even slightly slower than, the one-step aggregates. For example, here is what I'm comparing:

```sql
-- one step variation
SELECT vec_to_count(data_i) AS counts
     , vec_to_sum(data_i) AS sums
     , vec_trim_scale(vec_to_mean(data_i)) AS means
     , vec_to_min(data_i) AS mins
     , vec_to_max(data_i) AS maxes
FROM measurements2;

-- two step variation; expected to be faster
SELECT vec_agg_count(vec_stat_agg(data_i)) AS counts
     , vec_agg_sum(vec_stat_agg(data_i)) AS sums
     , vec_trim_scale(vec_agg_mean(vec_stat_agg(data_i))) AS means
     , vec_agg_min(vec_stat_agg(data_i)) AS mins
     , vec_agg_max(vec_stat_agg(data_i)) AS maxes
FROM measurements2;
```

When run on this test data set, the two-step variation performs about 2x as fast, but on my larger data set it is about break-even and usually slightly slower. An example one-step plan looks like this, and a two-step plan like this:

I tried experimenting with a few things, but I'm a bit stumped! Have any ideas here?
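One thing that might be worth checking (a guess at a diagnosis, not a confirmed one) is whether the planner already collapses the five identical `vec_stat_agg(data_i)` calls into a single aggregate computation, which would erase most of the expected two-step advantage on large inputs:

```sql
-- the Aggregate node's Output list shows how many distinct aggregate
-- calls the executor actually evaluates
EXPLAIN (VERBOSE, COSTS OFF)
SELECT vec_agg_count(vec_stat_agg(data_i)) AS counts
     , vec_agg_sum(vec_stat_agg(data_i)) AS sums
FROM measurements2;
```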
I took a look at the benchmark support you created and threw in some benchmarks to profile the numeric support. The results I get look like:
So in this query the two-step variation is actually faster. Do you have any tips on approaching some low-level profiling of the functions? I've not done anything like that in C before. Also, I discovered that if the aggregate spools to temp files, the output of the two-step aggregation is incorrect and sometimes the server even crashes. Would that be from the lack of real in/out functions?
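For what it's worth, when an aggregate's state type is `internal`, Postgres cannot move the state across process boundaries unless the aggregate declares serialization support, and the spooling misbehavior could be related. A sketch of what that declaration looks like; the combine/serialize function names here are hypothetical:

```sql
CREATE AGGREGATE vec_stat_agg(numeric[]) (
    sfunc        = vec_stat_accum,
    stype        = internal,
    finalfunc    = vec_stat_agg_finalfn,
    combinefunc  = vec_stat_combine,      -- merges two partial states
    serialfunc   = vec_stat_serialize,    -- internal -> bytea
    deserialfunc = vec_stat_deserialize,  -- bytea -> internal
    parallel     = safe
);
```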
I am still curious about the two-step approach with the custom type; however, I also realized I could achieve a more limited "two-step" benefit the same way Postgres does for its built-in avg and sum: by having several aggregates share the same transition function and state type:

```sql
CREATE AGGREGATE vec_agg_count(numeric[]) (
  sfunc = vec_stat_accum,
  stype = internal,
  finalfunc = vec_agg_count_finalfn
);

CREATE AGGREGATE vec_agg_mean(numeric[]) (
  sfunc = vec_stat_accum,
  stype = internal,
  finalfunc = vec_agg_mean_finalfn
);
```

Then Postgres still only creates a single state object for all these columns, and I achieve that "two-step" performance benefit I was seeking:

```sql
SELECT vec_agg_count(data_i) AS counts
     , vec_agg_sum(data_i) AS sums
     , vec_trim_scale(vec_agg_mean(data_i)) AS means
     , vec_agg_min(data_i) AS mins
     , vec_agg_max(data_i) AS maxes
FROM measurements2;
```

This is working for me even against the data sets that cause the two-step approach to fail as mentioned previously. Also the benchmarks are similar.

What isn't quite as nice with this approach is how other statistics like the variance would require accumulating extra state. It is nice not to have to perform the extra calculations for sum-of-squares when not needed (especially for numerics, which are expensive), so I thought an aggregate argument to toggle that support on/off could be used, like

```sql
SELECT vec_agg_var_samp(data_i, variance => true)
     , vec_agg_mean(data_i, variance => true)
FROM measurements2;
```

But passing the flag to every call is awkward, whereas with the true two-step approach it would only need to be given to the state-building aggregate:

```sql
SELECT vec_agg_var_samp(vec_agg_stat(data_i, variance => true))
     , vec_agg_mean(vec_agg_stat(data_i, variance => true))
FROM measurements2;
```

But, the two-step branch does not fully work, so I'm missing something.
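The shared-state trick mirrors what the built-ins do; for instance, avg(int8) and sum(int8) declare the same transition function, which is what lets the executor share one state between them. That sharing is visible in the catalog (a sketch):

```sql
-- shows that both built-in aggregates name the same aggtransfn
SELECT aggfnoid::regprocedure AS aggregate, aggtransfn
FROM pg_aggregate
WHERE aggfnoid IN ('pg_catalog.avg(int8)'::regprocedure,
                   'pg_catalog.sum(int8)'::regprocedure);
```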
I didn't get a chance to dig into these questions on the weekend. I agree having to pass that argument to every call isn't ideal. Is that an hstore argument? I haven't seen that used for named arguments, but I like it!

Your benchmarks are really surprising to me. I'm interested in exploring that some myself, but I probably won't be able to right away. Same with the corruption when spooling to disk (but yeah, it's possibly because of the in/out functions).

I'm surprised that re-using the same transfn means it doesn't get called twice. I thought it was just a means of code re-use, not a potential for optimization. I suppose it's true that Postgres shouldn't need to compute it twice, though.

Nice! That does make me wonder why the Timescale folks aren't just relying on that. Perhaps their own intermediate result needs to do something in a finalfn.
I was just copying something I remembered seeing the Timescale folks doing, but actually I hadn't thought much about what it actually meant, sorry. I thought it was useful for documenting the example, is all.
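For reference, the `name => value` form is PostgreSQL's named-argument notation for function calls rather than anything hstore-specific; with a parameter that has a default it looks like this (a sketch using a made-up stats_demo function):

```sql
CREATE FUNCTION stats_demo(x numeric[], variance boolean DEFAULT false)
RETURNS text LANGUAGE sql
AS $$ SELECT CASE WHEN variance THEN 'with variance' ELSE 'basic' END $$;

-- named notation lets the caller skip positional defaults
SELECT stats_demo(ARRAY[1.0, 2.0], variance => true);
```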
This is where I'm really stumped. I've been trying all sorts of different approaches. It seems like the "numeric" quality gets lost when passed to another function before returning with the final query results. Here's the most concise example I have come up with, involving the vec_trim_scale call:

```sql
SELECT vec_agg_mean(nums) FROM measurements;
                          vec_agg_mean
----------------------------------------------------------------
 {1.23000000000000000000,1.9700000000000000,3.7000000000000000}
(1 row)

SELECT vec_trim_scale(vec_agg_mean(nums)) FROM measurements;
                 vec_trim_scale
------------------------------------------------
 {94239562249968,94239562249980,94239562249992}
(1 row)
```

I have yet to be able to figure out what's going on here.
This seems related to memory contexts to me, since the Datum is by-reference. It could also be TOAST-related: we could be failing to deTOAST, or the Datums in our struct could be getting serialized but then later point to something else. But if I'm reading your last message correctly, it doesn't actually depend on spooling to disk after all, and an intermediate value like that shouldn't be getting TOASTed (I think). The way we're calling built-in numeric functions to do the real work is suspicious to me too, since we don't have as much control over how things are being allocated. Perhaps try making a copy of each result Datum before storing it.
Yeah, it doesn't depend on spooling to disk. And this is based solely on invoking unnest on the result; here's what I see with some debug logging added:

```sql
SELECT unnest(vec_agg_mean(nums)) FROM measurements;
NOTICE:  avg 0 = 1.23000000000000000000
NOTICE:  avg 1 = 1.9700000000000000
NOTICE:  avg 2 = 3.7000000000000000
     unnest
----------------
 94777532913296
 94777532913308
 94777532913320
(3 rows)
```

The NOTICE lines show the values are still correct inside the final function. I have tried making a copy of the result of that call to numeric_avg:

```c
state->astate->dvalues[i] = datumCopy(DirectFunctionCall1(numeric_avg, state->vec_states[i]),
                                      state->astate->typbyval,
                                      state->astate->typlen);
```

I'm afraid I get the same results as before, however.
I tried asking for help on the pgsql-general mailing list but didn't get any takers. I've organized what I feel has culminated in the best-performing approach in a feature/accum-cached-fcinfo branch. I cache and re-use the FunctionCallInfo structs for the delegated function calls. Still, the same problem persists, and both the correct and corrupted output can be seen in a single query:

```sql
SELECT vec_agg_mean(nums), vec_trim_scale(vec_agg_mean(nums)) FROM measurements;
-[ RECORD 1 ]--+---------------------------------------------------------------
vec_agg_mean   | {1.23000000000000000000,1.9700000000000000,3.7000000000000000}
vec_trim_scale | {94597491308128,94597491308140,94597491308152}
```

There the vec_agg_mean column is correct while the vec_trim_scale column is corrupted. If you think of anything, let me know!
Tom Lane weighed in and found the issue: a dumb copy/paste problem in the SQL definition 😂. I fixed that, and now I think it's finally working! I have to get back in and test again, and add test cases, and then add support for other types...
Ha, that's great! I'm glad he was able to help figure it out. I'm still interested in exploring the weird benchmark results, and I may have some opportunity in the next couple days or this weekend. Also I'm still curious about the other open questions above.
I think I've been able to plow through everything on this issue, albeit not with true two-step aggregates but with "shared state" aggregates that achieve my immediate goal of fast-performing min/max/sum/mean numeric array aggregates. For example, this query computes a single state value that is shared by all output columns:

```sql
SELECT vec_agg_count(pad_vec(data_a, 3)) AS counts
     , vec_agg_sum(pad_vec(data_a, 3)) AS sums
     , vec_trim_scale(vec_agg_mean(pad_vec(data_a, 3))) AS means
     , vec_agg_min(pad_vec(data_a, 3)) AS mins
     , vec_agg_max(pad_vec(data_a, 3)) AS maxes
FROM measurements2;
-[ RECORD 1 ]------------------------------
counts | {184,0,180}
sums   | {1164885081,NULL,11117.700}
means  | {6330897.179347826087,NULL,61.765}
mins   | {6255484,NULL,61.765}
maxes  | {6477419,NULL,61.765}
```

And that query seems to be about 6x as fast as the no-extension SQL, and about 2x faster than the equivalent SQL using all the one-step functions.

I ended up following the approach Postgres does with its own aggregates: for each supported input type I delegate to the appropriate transition/final functions, which I discovered as I looked up how to call them for each type.

Everything is now pushed up in the feature/accum-cached-fcinfo branch. I've added tests as well. Feedback welcome! If you think this is worthy of a PR, I'm happy to submit one.
I'd be happy for a PR!
Completed by #13.
Per our discussion in #8, I started work on two-step aggregation support. The first goal is to support count/min/max/sum/mean access functions after a vec_stat_agg(numeric[]) aggregate. Note I use numeric[] here but hope the other number types can be supported as well. Essentially this would support a SQL statement like this.

I started exploring this on a feature/two-step-aggs branch. I have a lot to grok still... was hoping you might be able to provide feedback.
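The statement being described presumably resembles the queries that appear later in this thread, i.e. aggregate once and then apply accessor functions:

```sql
-- aggregate once, then pull individual statistics out of the result
SELECT vec_agg_count(vec_stat_agg(nums)) AS counts
     , vec_agg_sum(vec_stat_agg(nums)) AS sums
     , vec_agg_mean(vec_stat_agg(nums)) AS means
FROM measurements;
```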
Originally I thought I could just return an arbitrary struct that all the vec_to_X() functions would know how to use, and I could define the aggregate that way. However, Postgres does not like this and rejects the definition.
The only way I could get Postgres to accept this was by using anyarray and anyelement everywhere. But I started seeing how perhaps the vec_stat_agg aggregate's final function has to return an actual registered custom type, sort of like the Complex example in the docs? Do you think that is how this needs to work, by defining a custom type?

I started looking into this, in a new stats.c file.
The idea here is that I'd need a variable-length type (as Numeric is variable length), and I'd store the aggregate state values as Datums; then perhaps all the number types could be encoded there, even though they are not variable length.
If this is to be an actual custom type, then I see I'd have to implement *_in and *_out functions to de/serialize the object from/to text. But then I was getting a bit lost in how to actually approach that, with the variable-length aspect involved using Numeric types. Do you have any pointers here, or can you comment on the approach overall with regards to the need for a custom type like this?
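On the registration mechanics: a custom type is usually bootstrapped with a shell type so the in/out functions can reference it before it fully exists. A sketch, with names mirroring the ones used in this thread:

```sql
CREATE TYPE vecaggstats;  -- shell type: a forward declaration

CREATE FUNCTION vecaggstats_in(cstring) RETURNS vecaggstats
  AS 'aggs_for_vecs', 'vecaggstats_in' LANGUAGE c IMMUTABLE STRICT;
CREATE FUNCTION vecaggstats_out(vecaggstats) RETURNS cstring
  AS 'aggs_for_vecs', 'vecaggstats_out' LANGUAGE c IMMUTABLE STRICT;

-- the full definition replaces the shell type
CREATE TYPE vecaggstats (
  internallength = variable,
  input = vecaggstats_in,
  output = vecaggstats_out
);
```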