Two-step aggregation #10

Closed · msqr opened this issue Dec 6, 2021 · 24 comments

Comments

@msqr commented Dec 6, 2021

Per our discussion in #8, I started work on two-step aggregation support. The first goal is to support count/min/max/sum/mean access functions after a vec_stat_agg(numeric[]) aggregate. Note I use numeric[] here, but I hope the other number types can be supported as well. Essentially this would support a SQL statement like this:

SELECT vec_agg_count(vec_stat_agg(nums))
     , vec_agg_min(vec_stat_agg(nums))
     , vec_agg_max(vec_stat_agg(nums))
     , vec_agg_sum(vec_stat_agg(nums))
     , vec_agg_mean(vec_stat_agg(nums))
FROM measurements

I started exploring this on a feature/two-step-aggs branch. I still have a lot to grok, but I was hoping you might be able to provide feedback.

Originally I thought I could just return an arbitrary struct that all the vec_to_X() functions would know how to use, and I could define the aggregate like this:

CREATE OR REPLACE FUNCTION
vec_stat_agg_transfn(internal, numeric[])
RETURNS internal
AS 'aggs_for_vecs', 'vec_stat_agg_transfn'
LANGUAGE c;

CREATE OR REPLACE FUNCTION
vec_stat_agg_finalfn(internal, numeric[])
RETURNS internal
AS 'aggs_for_vecs', 'vec_stat_agg_finalfn'
LANGUAGE c;

CREATE AGGREGATE vec_stat_agg(numeric[]) (
  sfunc = vec_stat_agg_transfn,
  stype = internal,
  finalfunc = vec_stat_agg_finalfn,
  finalfunc_extra
);

-- vec_agg_count

CREATE OR REPLACE FUNCTION
vec_agg_count(internal)
RETURNS ANYARRAY
AS 'aggs_for_vecs', 'vec_agg_count'
LANGUAGE c;

However, Postgres does not like this; it complains:

ERROR:  unsafe use of pseudo-type "internal"
DETAIL:  A function returning "internal" must have at least one "internal" argument.

The only way I could get Postgres to accept this was by using anyarray and anyelement everywhere, like:

CREATE OR REPLACE FUNCTION
vec_stat_agg_transfn(internal, anyarray)
RETURNS internal
AS 'aggs_for_vecs', 'vec_stat_agg_transfn'
LANGUAGE c;

CREATE OR REPLACE FUNCTION
vec_stat_agg_finalfn(internal, anyarray)
RETURNS anyelement
AS 'aggs_for_vecs', 'vec_stat_agg_finalfn'
LANGUAGE c;

CREATE AGGREGATE vec_stat_agg(anyarray) (
  sfunc = vec_stat_agg_transfn,
  stype = internal,
  finalfunc = vec_stat_agg_finalfn,
  finalfunc_extra
);

-- vec_agg_count

CREATE OR REPLACE FUNCTION
vec_agg_count(anyelement)
RETURNS ANYARRAY
AS 'aggs_for_vecs', 'vec_agg_count'
LANGUAGE c;

But I started to see that perhaps the vec_stat_agg aggregate's final function has to return an actual registered custom type, sort of like the Complex example in the Postgres docs? Do you think that is how this needs to work, by defining a custom type?

I started looking into this, in a new stats.c file with this:

// Aggregate element statistics to use in two-step aggregation functions.
typedef struct VecAggElementStats {
  char    vl_len_[4];
  Oid     elementType;
  uint32  count;
  Datum   sum;
  Datum   min;
  Datum   max;
  Datum   mean;
} VecAggElementStats;

The idea here is that I'd need a variable-length type (as Numeric is variable length), and I'd store the aggregate state values as Datum so that perhaps all the number types could be encoded there, even though they are not all variable length.

If this is to be an actual custom type, then I see I'd have to implement *_in and *_out functions to de/serialize the object from/to text. But then I was getting a bit lost in how to actually approach that, with the variable-length aspect involved using Numeric types. Do you have any pointers here, or can you comment on the overall approach with regard to the need for a custom type like this?
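
For illustration, the allocation side of that variable-length idea would presumably look something like this (a sketch only; the helper name and the fixed-size layout are mine, not the branch's code):

#include "postgres.h"
#include "catalog/pg_type.h"

// Illustrative only: allocate the variable-length struct and stamp the
// varlena header via SET_VARSIZE so Postgres knows the total size. A
// real layout with trailing Numeric data would add that length too.
static VecAggElementStats *
make_element_stats(void)
{
  Size size = sizeof(VecAggElementStats);
  VecAggElementStats *stats = (VecAggElementStats *) palloc0(size);

  SET_VARSIZE(stats, size);
  stats->elementType = NUMERICOID;
  return stats;
}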

@pjungwir commented Dec 6, 2021

Yeah, I think a custom type is the way to go. Since it's intended as an intermediate value, the in/out functions are not too important. At first you could even just return "hello world" for out and something initialized to all zeros for in. Of course doing something "real" would be better.

Those functions are not really that hard to write. Postgres has some nice helper functions. Here is the _in function for multiranges, something I wrote a year or two ago. It might be a good example because it's not completely trivial, but it's still pretty easy to understand. A mini finite-state machine is a good way to go. And then even for something complicated, the _out function should be very easy.

I'm not sure it makes sense to have Datum fields in this struct instead of the internal types---but I understand how that keeps it generic. Perhaps it's okay.

All these Datums will be of type elementType, right? So then for your _out you can call that type's own _out. And for your _in, you can use a separator that is guaranteed not to be in those types (which should be easy since they're all numbers). Then you can read until that separator and call the appropriate _in.

@msqr commented Dec 6, 2021

Great, thanks for the link to that example and confirmation on the approach.

> I'm not sure it makes sense to have Datum fields in this struct instead of the internal types---but I understand how that keeps it generic. Perhaps it's okay.

> All these Datums will be of type elementType, right? So then for your _out you can call that type's own _out. And for your _in, you can use a separator that is guaranteed not to be in those types (which should be easy since they're all numbers). Then you can read until that separator and call the appropriate _in.

Yes, all the Datums will be of elementType so using that type's _in and _out was exactly what I was hoping to be able to do. Alternatively I thought these could be pgnum values, if you think that would be a better approach? I figured I'd be trying to convert them to Datum frequently enough that maybe Datum was better?

@pjungwir commented Dec 6, 2021

I can't think of any concrete reason using a Datum is bad. I suppose it just feels like Datum is for external interfaces and that struct is something internal. But yeah since you're using DirectFunctionCall a lot it's convenient to keep them that way. And it makes for some nice polymorphism.

Btw in case you haven't noticed, there are special-purpose InputFunctionCall and OutputFunctionCall functions you can use here instead of DirectFunctionCalling the component in/out functions.
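
For instance, an _out along those lines might look something like this (a sketch only, building on the VecAggElementStats struct above; the body is illustrative, not the branch's code):

#include "postgres.h"
#include "fmgr.h"
#include "lib/stringinfo.h"
#include "utils/lsyscache.h"

// Illustrative _out sketch: delegate each Datum to the element type's
// own output function, joining the fields with a separator.
PG_FUNCTION_INFO_V1(vecaggstats_out);

Datum
vecaggstats_out(PG_FUNCTION_ARGS)
{
  VecAggElementStats *stats =
    (VecAggElementStats *) PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
  StringInfoData buf;
  Oid      outFuncId;
  bool     isVarlena;
  FmgrInfo outFunc;

  // look up the element type's own output function once
  getTypeOutputInfo(stats->elementType, &outFuncId, &isVarlena);
  fmgr_info(outFuncId, &outFunc);

  initStringInfo(&buf);
  appendStringInfo(&buf, "%u;", stats->count);
  appendStringInfoString(&buf, OutputFunctionCall(&outFunc, stats->sum));
  appendStringInfoChar(&buf, ';');
  appendStringInfoString(&buf, OutputFunctionCall(&outFunc, stats->min));
  appendStringInfoChar(&buf, ';');
  appendStringInfoString(&buf, OutputFunctionCall(&outFunc, stats->max));

  PG_RETURN_CSTRING(buf.data);
}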

@msqr commented Dec 7, 2021

I've been working through the custom type, defined currently as

CREATE TYPE vecaggstats (
    internallength = variable,
    input = vecaggstats_in,
    output = vecaggstats_out
);

and thanks for your suggestion of using placeholder values for the _in and _out functions to start. I was thinking it would be useful to look up and cache the Oid of the vecaggstats type in the _PG_init function; however, if I do this:

// cache our custom VecAggElementStats type Oid
static Oid VecAggElementStatsTypeOid;

void
_PG_init(void)
{
  // FIXME: this returns error "Unknown type: vecaggstats"... how to do this?
  VecAggElementStatsTypeOid = typenameTypeId(NULL, typeStringToTypeName("vecaggstats"));
}

it throws an error when running CREATE EXTENSION aggs_for_vecs. I suppose the vecaggstats type is not available until after that _PG_init callback then? Is there another way to look up and cache the Oid just once after the extension is loaded? Or is this typeStringToTypeName() call not worth caching anyway?

@pjungwir commented Dec 7, 2021

I don't think caching the type oid is correct. As you say the type doesn't exist yet, but also it seems like there may be problems if the type gets dropped/restored. Anyway Postgres has its own type cache so just using the typenameTypeId function should be very fast (and cached after the first time).
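
That is, something like this on-demand lookup should be fine (a sketch; the helper name is illustrative):

#include "postgres.h"
#include "parser/parse_type.h"

// Sketch: resolve the custom type's Oid on demand rather than in
// _PG_init; Postgres's own caches make repeated lookups cheap.
static Oid
vecaggstats_type_oid(void)
{
  return typenameTypeId(NULL, typeStringToTypeName("vecaggstats"));
}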

@msqr commented Dec 8, 2021

I see, it was the typeStringToTypeName function I thought might be worth caching, but I'll just call that as needed then, thanks.

@msqr commented Dec 8, 2021

I am to the point in feature/two-step-aggs where this SQL runs:

SELECT vec_agg_count(vec_stat_agg(nums)) FROM measurements;
 vec_agg_count 
---------------
 {4,3,4}

However, I have so many doubts/questions that I'm unsure of the correctness of most of it. First up, the vec_stat_agg function is returning an array of VecAggElementStatsType as previously discussed:

CREATE OR REPLACE FUNCTION
vec_stat_agg_finalfn(internal, numeric[])
RETURNS vecaggstats[]
AS 'aggs_for_vecs', 'vec_stat_agg_finalfn'
LANGUAGE c;

typedef struct VecAggElementStatsType {
  char    vl_len_[4];
  Oid     elemTypeId;
  uint32  count;
  Datum   sum;
  Datum   min;
  Datum   max;
  Datum   mean;
} VecAggElementStatsType;

I was thinking that elemTypeId really applies to the entire array, not each element. Do you think a single value should be returned from vec_stat_agg instead, something like:

CREATE OR REPLACE FUNCTION
vec_stat_agg_finalfn(internal, numeric[])
RETURNS vecaggstats    -- no longer array here
AS 'aggs_for_vecs', 'vec_stat_agg_finalfn'
LANGUAGE c;

typedef struct VecAggStatsType {
  char   vl_len_[4];
  Oid    elemTypeId; // type of Datum used below
  uint32 elemCount;  // number of elements in Datum arrays below
  Datum *counts;
  Datum *sums;
  Datum *mins;
  Datum *maxes;
  Datum *means;
} VecAggStatsType;

If so, is that even the way to define Datum arrays here, or would they need to be defined as ArrayType *, like:

typedef struct VecAggStatsType {
  char       vl_len_[4];
  Oid        elemTypeId; // type of Datum used below
  uint32     elemCount;  // number of elements in Datum arrays below
  ArrayType *counts;
  ArrayType *sums;
  ArrayType *mins;
  ArrayType *maxes;
  ArrayType *means;
} VecAggStatsType;

@pjungwir commented Dec 8, 2021

I think returning a single value instead of an array makes more sense.

I think I'd make elemTypeId the type of the input vector's elements, not the type of the array.

I don't think I'd include means at all, since the accessor method can just do sums/counts.

I think I'd do Datum *foo rather than ArrayType *foo. Since these are just internal intermediate values, it doesn't seem worth it to build Postgres arrays for them.
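
For example, the transition function could unpack the incoming array once and work with bare Datums from then on (a sketch; the helper name is illustrative, and numeric has typlen -1, is passed by reference, and has int4 alignment):

#include "postgres.h"
#include "catalog/pg_type.h"
#include "utils/array.h"

// Sketch: unpack an incoming numeric[] into bare Datums, which the
// transition function can then accumulate into the Datum *fields.
static void
unpack_numeric_vec(ArrayType *vec, Datum **elems, bool **nulls, int *nelems)
{
  deconstruct_array(vec, NUMERICOID, -1, false, 'i', elems, nulls, nelems);
}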

@msqr commented Dec 8, 2021

> I think returning a single value instead of an array makes more sense.

Great, I'll give that a go.

> I think I'd make elemTypeId the type of the input vector's elements, not the type of the array.

Yes, you are correct; I was glossing over the fact that most of the array types are the same as the input element type, but not necessarily so for things like counts, which will be fixed to int8 even when the input is numeric.

> I don't think I'd include means at all, since the accessor method can just do sums/counts.

The reason for this is that the mean value would actually be calculated by a function call like numeric_avg on that element's state object. I thought that would be preferable (at least in the numeric case) because of all the edge cases numeric_avg takes care of. It is a bit of extra work if you don't want the mean, but getting the mean from this aggregate is, I think, a primary use case. If you didn't want all the mean/sum/count values this aggregate calculates, I figure you'd just use a non-two-step function. What do you think about that?

> I think I'd do Datum *foo rather than ArrayType *foo. Since these are just internal intermediate values, it doesn't seem worth it to build Postgres arrays for them.

Right-o, I'll go with Datum * then.

Thanks for all the feedback!

@msqr commented Dec 9, 2021

I've converted to that VecAggStatsType now, and cleaned up and implemented the vec_agg_sum and vec_agg_mean functions, so the following SQL runs:

SELECT vec_agg_count(vec_stat_agg(nums)) AS counts
	, vec_agg_sum(vec_stat_agg(nums)) AS sums
	, vec_trim_scale(vec_agg_mean(vec_stat_agg(nums))) AS means
FROM measurements;
 counts  |       sums        |      means      
---------+-------------------+-----------------
 {4,3,4} | {4.92,5.91,14.80} | {1.23,1.97,3.7}
(1 row)

And I verified via a debugger that indeed the vec_stat_agg(nums) part is executed just once! 🎉

@pjungwir commented Dec 9, 2021

Awesome!

@msqr commented Dec 10, 2021

I've been testing out the performance of the new functions on a larger data set, and have a surprising result: the two-step aggregate is performing about the same as, or even slightly slower than, the one-step aggregates. For example, here is what I'm comparing:

-- one step variation
SELECT vec_to_count(data_i) AS counts
	, vec_to_sum(data_i) AS sums
	, vec_trim_scale(vec_to_mean(data_i)) AS means
	, vec_to_min(data_i) AS mins
	, vec_to_max(data_i) AS maxes
FROM measurements2;

-- two step variation; expected to be faster
SELECT vec_agg_count(vec_stat_agg(data_i)) AS counts
	, vec_agg_sum(vec_stat_agg(data_i)) AS sums
	, vec_trim_scale(vec_agg_mean(vec_stat_agg(data_i))) AS means
	, vec_agg_min(vec_stat_agg(data_i)) AS mins
	, vec_agg_max(vec_stat_agg(data_i)) AS maxes
FROM measurements2;

When run on this test data set, the two-step variation performs about 2x as fast, but on my larger data set it is about break-even and usually slightly slower. An example one-step plan looks like:

HashAggregate  (cost=291202.49..291263.99 rows=200 width=72) (actual time=331.386..2979.986 rows=20885 loops=1)
  Output: (time_bucket(('900 seconds'::cstring)::interval, d.ts)), vec_trim_scale(vec_to_mean(d.data_i)), solarcommon.array_transpose2d(ARRAY[(vec_to_count(d.data_i))::numeric[], vec_to_min(d.data_i), vec_to_max(d.data_i)])
  Group Key: time_bucket(('900 seconds'::cstring)::interval, d.ts)
  Buffers: shared hit=1747

and a two-step plan like this:

HashAggregate  (cost=291202.49..291265.99 rows=200 width=72) (actual time=385.891..3263.576 rows=20885 loops=1)
  Output: (time_bucket(('900 seconds'::cstring)::interval, d.ts)), vec_trim_scale(vec_agg_mean(vec_stat_agg(d.data_i))), solarcommon.array_transpose2d(ARRAY[(vec_agg_count(vec_stat_agg(d.data_i)))::numeric[], vec_agg_min(vec_stat_agg(d.data_i)), vec_agg_max(vec_stat_agg(d.data_i))])
  Group Key: time_bucket(('900 seconds'::cstring)::interval, d.ts)
  Buffers: shared hit=1747

I tried experimenting with DirectFunctionCallX-style invocations instead of the FunctionCallInvoke style where possible, but that didn't make any noticeable difference.

I'm a bit stumped! Have any ideas here?

@msqr commented Dec 11, 2021

I took a look at the benchmark support you created and threw in some benchmarks to profile the numeric support. The results I get look like:

vec_to_mean
=============
SQL rows 20355.5 ms
C array 2247.86 ms

vec_to_mean_numeric
===================
SQL rows 21820.2 ms
C array 4162.31 ms

vec_agg_stat                   <-- this is the two-step variation for count/min/max/avg
============
SQL rows 23994.9 ms
C array 3767.61 ms

vec_agg_stats                  <-- this is the one-step variation for count/min/max/avg
=============
SQL rows 23695 ms
C array 5827.55 ms

So in this query the two-step variation is actually faster. Do you have any tips on approaching some low-level profiling of the functions? I've not done anything like that in C before.

Also, I discovered that if the aggregate spools to temp files, the output of the two-step aggregation is incorrect and sometimes the server even crashes. Would that be from the lack of real _in and _out functions? Or some other fault in the code?

@msqr commented Dec 12, 2021

I am still curious about the two-step approach with the custom type; however, I also realized I could achieve a more limited "two-step" benefit the same way Postgres does for the avg() and sum() aggregates: they share the same transition function and just use different final functions. Given I'm interested in count/min/max support as well, I started a feature/accum branch and threw those into a shared transition function vec_stat_accum, then added count, min, max, sum, and average final functions. For example:

CREATE AGGREGATE vec_agg_count(numeric[]) (
  sfunc     = vec_stat_accum,
  stype     = internal,
  finalfunc = vec_agg_count_finalfn
);
CREATE AGGREGATE vec_agg_mean(numeric[]) (
  sfunc     = vec_stat_accum,
  stype     = internal,
  finalfunc = vec_agg_mean_finalfn
);

Then Postgres still only creates a single state object for all these columns, and I achieve that "two-step" performance benefit I was seeking:

SELECT vec_agg_count(data_i) AS counts
	, vec_agg_sum(data_i) AS sums
	, vec_trim_scale(vec_agg_mean(data_i)) AS means
	, vec_agg_min(data_i) AS mins
	, vec_agg_max(data_i) AS maxes
FROM measurements2;
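
Under the hood, each final function then just reads a different part of that one shared state. As a sketch (the state layout and function body here are illustrative, not the branch's actual code):

#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_type.h"
#include "utils/array.h"

// Hypothetical shared transition state (illustrative only).
typedef struct VecStatAccumState
{
  int    nelems;
  Datum *counts;  // int8 count per element position
} VecStatAccumState;

// Sketch: a final function that only reads the shared state; every
// vec_agg_X_finalfn reads the same state, so the transition work
// happens once per group.
PG_FUNCTION_INFO_V1(vec_agg_count_finalfn);

Datum
vec_agg_count_finalfn(PG_FUNCTION_ARGS)
{
  VecStatAccumState *state;
  ArrayType *result;

  if (PG_ARGISNULL(0))
    PG_RETURN_NULL();

  state = (VecStatAccumState *) PG_GETARG_POINTER(0);
  result = construct_array(state->counts, state->nelems, INT8OID,
                           sizeof(int64), FLOAT8PASSBYVAL, 'd');
  PG_RETURN_ARRAYTYPE_P(result);
}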

This is working for me even against the data sets that cause the two-step approach to fail as mentioned previously. Also the benchmarks are similar:

vec_agg_stat                   <-- this is the new vec_agg_X variation for count/min/max/avg/sum
============
SQL rows 24211.1 ms
C array 3841.45 ms

vec_agg_stats                  <-- this is the original vec_to_X variation for count/min/max/avg/sum
=============
SQL rows 24292.2 ms
C array 7910.46 ms

What isn't quite as nice with this approach is that other statistics like the var_samp ones would require additional or different delegate function invocations to support calculating things like the sum-of-squares. For example, var_samp(numeric) calls numeric_accum instead of numeric_avg_accum like avg(numeric) does, the only difference between the two being that numeric_avg_accum does not calculate the sum-of-squares.

It is nice not to have to perform the extra calculations for sum-of-squares when not needed (especially for numerics, which are expensive), so I thought an aggregate argument to toggle that support on/off could be used, like

SELECT vec_agg_var_samp(data_i, variance => true)
     , vec_agg_mean(data_i, variance => true)
FROM measurements2;

But passing variance => true to things like vec_agg_mean just so the optimizer can create one state instance makes the SQL confusing, I think. This seems clearer to me:

SELECT vec_agg_var_samp(vec_agg_stat(data_i, variance => true))
     , vec_agg_mean(vec_agg_stat(data_i, variance => true))
FROM measurements2;

But the two-step branch does not fully work, so I'm missing something.

@pjungwir commented

I didn't get a chance to dig into these questions on the weekend. I agree having variance => true on vec_agg_{mean,var_samp} isn't really something I want, but it does make sense on vec_agg_stat I think.

Is that an hstore argument? I haven't seen that used for named arguments, but I like it!

Your benchmarks are really surprising to me. I'm interested in exploring that some myself, but I probably won't be able to right away. Same with the corruption when spooling to disk (but yeah, it's possibly because of the in/out functions).

I'm surprised that re-using the same transfn means it doesn't get called twice. I thought it was just a means of code re-use, not a potential for optimization. I suppose it's true that Postgres shouldn't need to compute it twice though. Nice!

That does make me wonder why the Timescale folks aren't just relying on that. Perhaps their own intermediate result needs to do something in a finalfn.

@msqr commented Dec 13, 2021

> Is that an hstore argument? I haven't seen that used for named arguments, but I like it!

I was just copying something I remembered seeing the Timescale folks doing, but I hadn't thought much about what it actually meant, sorry. I thought it was useful for documenting the example, is all.

> Same with the corruption when spooling to disk (but yeah, it's possibly because of the in/out functions).

This is where I'm really stumped. I've been trying all sorts of different approaches. It seems like the "numeric" quality gets lost when passed to another function before returning with the final query results. Here's the most concise example I have come up with, with the vec_agg_mean aggregate. I have a log that prints out the final average value, and the log lines are correct, and the results in psql are correct. However, if I instead pass the results to vec_trim_scale, the output turns into what appear to be memory addresses. Here are my queries:

SELECT vec_agg_mean(nums) FROM measurements;
                          vec_agg_mean                          
----------------------------------------------------------------
 {1.23000000000000000000,1.9700000000000000,3.7000000000000000}
(1 row)

SELECT vec_trim_scale(vec_agg_mean(nums)) FROM measurements;
                 vec_trim_scale                 
------------------------------------------------
 {94239562249968,94239562249980,94239562249992}
(1 row)

I have yet to be able to figure out what's going on here.

@pjungwir commented

This seems related to memory contexts to me, since the Datum is by-reference. It could also be TOAST-related, where we are failing to deTOAST or the Datums in our struct are getting serialized but then later point to something else. But if I'm reading your last message correctly, it doesn't actually depend on spooling to disk after all, and an intermediate value like that shouldn't be getting TOASTed (I think). The way we're calling built-in numeric functions to do the real work is suspicious to me too, since we don't have as much control over how things are being allocated. Perhaps try making a copy of each Datum before you store it in your struct?

@msqr commented Dec 13, 2021

Yeah, it doesn't depend on spooling to disk. And this is based solely on invoking numeric_avg on the numeric element's state object. I can even do this to trigger it:

SELECT unnest(vec_agg_mean(nums)) FROM measurements;
NOTICE:  avg 0 = 1.23000000000000000000
NOTICE:  avg 1 = 1.9700000000000000
NOTICE:  avg 2 = 3.7000000000000000
     unnest     
----------------
 94777532913296
 94777532913308
 94777532913320
(3 rows)

The NOTICE logs are in the aggregate final function, showing the result of calling numeric_avg. They produce the correct output there.

I have tried making a copy of the result of that call to numeric_avg, e.g.

state->astate->dvalues[i] = datumCopy(DirectFunctionCall1(numeric_avg, state->vec_states[i]), state->astate->typbyval, state->astate->typlen);

I'm afraid I get the same results as before, however.

@msqr commented Dec 14, 2021

I tried asking for help on the pgsql-general mailing list but didn't get any takers. I've organized what I feel is the best-performing approach in a feature/accum-cached-fcinfo branch. I cache and re-use FunctionCallInfo instances for the numeric_avg_accum and numeric_cmp functions on the aggregate state, and then create another final-function-local FunctionCallInfo to loop over the final array results in vec_agg_mean and vec_agg_sum.
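
The gist of it is caching the FmgrInfo lookup once and reusing a FunctionCallInfo per call (a simplified sketch; the helper name is mine, and the branch stores these on the aggregate state rather than building them locally):

#include "postgres.h"
#include "fmgr.h"
#include "utils/fmgroids.h"

// Sketch: invoke numeric_cmp through a pre-looked-up FmgrInfo. The
// one-time setup, e.g. in the transition function, would be:
//   fmgr_info(F_NUMERIC_CMP, &state->cmpFlinfo);
static int32
numeric_cmp_cached(FmgrInfo *cmpFlinfo, Datum a, Datum b)
{
  LOCAL_FCINFO(fcinfo, 2);
  Datum result;

  InitFunctionCallInfoData(*fcinfo, cmpFlinfo, 2, InvalidOid, NULL, NULL);
  fcinfo->args[0].value = a;
  fcinfo->args[0].isnull = false;
  fcinfo->args[1].value = b;
  fcinfo->args[1].isnull = false;

  result = FunctionCallInvoke(fcinfo);
  if (fcinfo->isnull)
    elog(ERROR, "numeric_cmp returned NULL");
  return DatumGetInt32(result);
}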

Still, the same problem persists, and it affects both the vec_agg_mean and vec_agg_sum final functions. You can see correct and incorrect results in a query like:

SELECT vec_agg_mean(nums), vec_trim_scale(vec_agg_mean(nums)) FROM measurements;
-[ RECORD 1 ]--+---------------------------------------------------------------
vec_agg_mean   | {1.23000000000000000000,1.9700000000000000,3.7000000000000000}
vec_trim_scale | {94597491308128,94597491308140,94597491308152}

There the vec_agg_mean result is correct, while the same result passed through vec_trim_scale produces incorrect results. So I'm stuck for now without any new ideas. 🤔

If you think of anything, let me know!

@msqr commented Dec 16, 2021

Tom Lane weighed in and found the issue: a dumb copy/paste problem in the SQL definition 😂. I fixed that, and now I think it's finally working! I have to get back in and test again, add test cases, and then add support for other types...

@pjungwir commented

Ha, that's great! I'm glad he was able to help figure it out.

I'm still interested in exploring the weird benchmark results, and I may have some opportunity the next couple days or this weekend. Also I'm still curious about vec_to_var_samp getting a different result from the built-in version. Is there anything else you need help with on this issue's line of work?

@msqr commented Dec 16, 2021

I think I've been able to plow through everything on this issue, albeit not with true two-step aggregates but with "shared-state" aggregates that achieve my immediate goal of a fast-performing min/max/sum/mean numeric array aggregate. For example, this query computes a single state value that is shared by all output columns:

SELECT vec_agg_count(pad_vec(data_a, 3)) AS counts
	, vec_agg_sum(pad_vec(data_a, 3)) AS sums
	, vec_trim_scale(vec_agg_mean(pad_vec(data_a, 3))) AS means
	, vec_agg_min(pad_vec(data_a, 3)) AS mins
	, vec_agg_max(pad_vec(data_a, 3)) AS maxes
FROM measurements2;

-[ RECORD 1 ]------------------------------
counts | {184,0,180}
sums   | {1164885081,NULL,11117.700}
means  | {6330897.179347826087,NULL,61.765}
mins   | {6255484,NULL,61.765}
maxes  | {6477419,NULL,61.765}

And that query seems to be about 6x as fast as the no-extension unnest() SQL:

vec_agg_stat
============
SQL rows 24083.8 ms
C array 3595.71 ms

And it is about 2x faster than the equivalent SQL using all the vec_to_X aggregates.

I ended up following the approach Postgres takes with its own sum() and avg() aggregates, in that for vec_agg_sum():

  • on int2, int4 input, int8 is output
  • on int8 and numeric input, numeric is output
  • on float4 and float8 input, float8 is output

For vec_agg_mean():

  • on float4 and float8 input, float8 is output
  • on other input, numeric is output
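
Encoded as a lookup, the vec_agg_sum mapping would be (a hypothetical helper for illustration, not the branch's actual code):

#include "postgres.h"
#include "catalog/pg_type.h"

// Hypothetical helper encoding the vec_agg_sum output-type rules above.
static Oid
vec_agg_sum_result_type(Oid elemType)
{
  switch (elemType)
  {
    case INT2OID:
    case INT4OID:
      return INT8OID;
    case INT8OID:
    case NUMERICOID:
      return NUMERICOID;
    case FLOAT4OID:
    case FLOAT8OID:
      return FLOAT8OID;
    default:
      elog(ERROR, "vec_agg_sum: unsupported element type %u", elemType);
      return InvalidOid;  // not reached
  }
}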

I discovered this as I looked up how to call the appropriate delegate transition/final functions for each type.

Everything is now pushed up in the feature/accum-cached-fcinfo branch. I've added tests as well. Feedback welcome! If you think this is worthy of a PR, I'm happy to submit one.

@pjungwir commented Dec 20, 2021 via email

@pjungwir commented

Completed by #13.
