Add support for pg custom type (tuple) arguments in window PL/R functions #56

ikasou · 2019-04-21T10:46:39Z

This is based on
a) the initial work by jconway which converted tuple arguments to PL/R non-window functions into one-row R dataframes (function pg_tuple_get_r_frame), and
b) the very recent additions by mlt as model for the addition of pg_window_frame_get_r_frame() to return a multi-row R dataframe containing PG window frame custom type (tuple) data in fargX.

Please let me know your thoughts on this. Incidentally I have spotted what looks like a bug in plr_convertargs() where numels gets set close to the end. Moreover this constant value optimization is changing past intended behaviour: Would an on-the-fly N-times copy of the constant element into the fargX vector be a better solution maybe?

mlt · 2019-04-27T03:59:59Z

It seems sql file for testing is missing from commits although expected output is there. Also note that AppVeyor tests actually failed in few placed. ~~I'll take a closer look why builds weren't reported as failed.~~ It is d754480 in another PR that is WIP.

I didn't look close, but would it be possible to work this into an existing function somehow? I feel like there is lots of repetitions in the code in general. I'm currently trying to DRY up set returning functions (along with x8 speed up in my tests). And it seems at the first glance that there is lots of commonality between our functions.

Regarding numels at the end… It is not set, but the vector is trimmed down to that length. I deliberately allocate vector as long as entire partition then I trim it down if window frame is bound, i.e. shorter than partition. Vector allocations are fast in R so why bother copying things twice? I hope setting the length actually trims the vector and frees up memory 😇

While it does somewhat break existing behavior and it is doable to pass a constant fargX, but why would you want that in the first place if it is constant anyway? Also shall it be allocated as a vector of that constant then as one could have used subscripts. Just use originally passed value. I don't know if it is common when some arguments aren't expected as constants but just happen to be such for whatever reason in some cases and aren't constant in other… But it just seemed odd to me when I implemented it. Anyway...if there is a legit reason to pass a vector full of constant value aside from code smell, then sure it has to be implemented.

ikasou · 2019-04-30T18:38:15Z

Hi there and thanks for the feedback. I take your point about repetition and will try to merge our code, although I am a little concerned that it may result in a huge and unwieldy function which may defeat the purpose - but let me get back to you on that towards the end of next week.

Regarding numels at the end, and correct me if I am wrong, but I get the sense you refer above to the end of the pg_window_frame_get_r() function whereas I was referring to the end of the plr_convertargs() function in plr.c and more specifically looking at the line reading:
numels = function->nargs > 0 ? GET_LENGTH(el) : 0;
In some cases el may equal R_NilValue because of:
el = get_fn_expr_arg_stable(fcinfo->flinfo, i) ? R_NilValue : XXXXX
So I was a bit concerned there may be a problem there, but I could always be missing something.

Finally, with regards to the constant argX issue my personal view based on my preference of a) consistency of function and hence b) user-friendliness over favouring extreme optimisation would be to not break existing behaviour and return what is expected, while obviously special-casing the situation and handling it with a fast copy. I think things are complicated enough already for someone attempting to use PL/R that they should not have to read the code to see why they get null when they really expected the frame's worth of data. For example, I use PL/R for rolling regressions over N-day time windows, and constant time series are just another case of valid time-series, so I would not expect to have to special-case them! Needless to say opinions here can vary ..

mlt · 2019-04-30T22:33:57Z

Thank you for catching that numels thing! Indeed, fnumrows would be zero if last argument is a stable expression. I will take maximum or 0 if all stable.

I know what you are saying about preserving the interface. However, sometimes breakage does happen. Even in the code as is, I preserve fargX count instead of simply skipping over stable expressions in case those are passed first. I stated my personal view on this in another GitHub issue. I personally am in favor of explicit request for frame arguments like getFrame(mycol) then deparse mycol and extract frame data for the namesake argument of a function. The simple case would be cumulative sum over partition. Tehnically, we do not need frame data at all (to minimize memory footprint) but the state variable so why bother extracting frame data implicitly every call or even once? I would say we need more discussion on this. Once again, I am not set in stone with my decision how I implemented it. I hope more users will chime in.

If you do store constant time series, they are not stable in PG terms. But indeed, if one let's say calls mean(farg1) on various things including constants, then it needs to be wrapped in ifelse(is.null(farg1), my_var, farg1).

Regarding your case, I would say you will see a drastic speed up if you do a rolling window regression entirely within R for entire partition (unbounded frame), then slowly return results one by one… unless you bound the frame by just a handful of rows. I would be curious to hear timing comparison.

On a side note, I'm not sure what your mileage with git is, the following might be useful. In the future, open PRs from a custom branch. Do not use master for your changes as it will be difficult to keep up with origin/master changes and rebasing. I see you merge changes back instead of rebasing. It results in a convoluted history later.... unless someone will squash PR 😇 For now, you can rename origin/master into something else like git fetch origin master:vanilla (that command will always overwrite vanilla branch with most up to date origin/master and you can abuse it) and then rebase your changes in your checked out master on top of vanilla with git rebase -i vanilla, followed by a force push git push your_remote --force-with-lease. This will clean up your commit history for this PR. I personally also like to abuse autosquashing.

Addresses issue raised in postgres-plr#56

ikasou · 2019-05-18T15:38:55Z

I managed to create a single function returning the window_frame that looks reasonably sane, please have a look and let me know thoughts.

Unlike the code, the git commit history is a complete mess though, although I did try to follow your suggestions - sorry about that.

Finally I just noticed your commit regarding the numels glitch we discussed, I think it is still not taking into account the cases where plr_is_unbound_frame(win obj) == TRUE, returning 0 in those cases. Is that intended?

Around the same location, I also noticed that a PROTECT() was missing around the 2 calls to pg_window_frame_get_r(winobj, i, function) and so I added that, I believe it is necessary, let me know what you think.

mlt · 2019-05-20T21:30:47Z

I'll take a closer look at your code.

Regarding numels when plr_is_unbound_frame, you are right. The thing is, I personally don't find fnumrows useful at all. In most cases, one does something in R without providing an explicit argument length. Also, it is always possible to use something like length(farg1) (essentially what code does). But regardless my preferences, you are correct, it is missing and shall be fixed for compatibility! Thank you for pointing it out.

Regarding PROTECT…The outer vector is protected and I think macros we use are safe in a sense they won't lead to memory allocations. There is a very "similar" example in "Writing R Extensions" where the result of eval is assigned to a vector element without wrapping that in PROTECT. I know it is tempting to be "overprotective". But I think lots of that can be safely removed from PL/R code. But before doing this, it would be nice to use gctorture(TRUE) while running tests, in case I am very wrong here😨

Last but not least…Not trying to start an eternal debate, but it seems PL/R as well as the rest of PG prefers tabs over spaces 😇 I think we should add .dir-locals.el as well as .editorconfig so github diffs are friendlier?

mlt · 2019-05-20T22:09:04Z

It seems draft PR's are not supported on Travis. Would you flip it and add a big bold statement it is a work in progress in the very first post? I hope it is okay with @davecramer

Also, remove a check from the test if you insist on passing constant expressions through.

davecramer · 2019-05-21T10:23:59Z

ya, it's fine... wonder why they aren't supported.

mlt · 2019-05-22T23:09:17Z

I think we can go a step further and allow RECORD input type mlt@ff2151e I am not sure whether there is a better test for setting arg_is_rel… Perhaps leave typrelid out and just switch (typtype) { case 'c': case 'p': ... default: ...}.

hm... I just realized we need to actually add corresponding CI tests. But we need to get uniform across PG versions testing into master first for that.

* Unify testing pre & post PG12 * Log with any PG error level This closes #9 * Inline language handler This allows to have SQL statements like do language plr $$ pg.throwlog("Hello, world!) $$; * Update Makefile for inline handler * Basic syntax checking validator Try to parse R function body. No other checks done. * Do not export unnecessary functions on Windows This partially reverts squashed c42eb9c * Split RECORD returning case from SRF * Unify handling one or many OUT arguments Make little to no difference between returning a RECORD or having one or a few OUT arguments * Test NaN * Numeric Datum isn't Float8 * Test OUT arguments * Fix incorrect fnumrows when last argument is a stable expression Addresses issue raised in #56 * Fix up NaN passed into R

ikasou · 2019-05-31T19:42:56Z

I have cleaned up the commit history by rebasing on 8.4 and squashing all intermediate commits
Kept just the cherry-picked one with record arguments by @mlt
(and had to stick the const argument check disable in that - sorry)
I believe it is ready for reviewing and potential inclusion upstream, so I am deleting the comment in the beginning.

mlt · 2019-05-31T20:13:18Z

Do you feel like fixing tabs? 😀 Also get an invite and join the chat.

ikasou · 2019-06-01T13:51:48Z

tabs should be good now ;-)

davecramer · 2021-05-24T11:31:29Z

@ikasou my apologies. It would appear this has languished far too long without attention. Can you confirm that this is still PR is still valid ?

pg_conversion.c

AndreMikulec · 2021-05-25T03:10:37Z

@ikasou and @mlt, if this feature "pg custom type (tuple) arguments in window PL/R functions"
would be integrated into a "personal" plr, today, by anyone,
with PostgreSQL 13 and R 4.1.0, what changes, if any, would one recommend?

AndreMikulec · 2021-05-25T03:32:29Z

@ikasou and @mlt,

The C code looks clean and well programmed.

I have created a starter branch here.

https://github.com/AndreMikulec/plr/tree/master_pg_custom_type_tuple_args_in_window_plr_funcs_dev

For my personal use, I am going to try to implement this feature (into a year 2021 personal plr). This integration may require some time.

ikasou · 2021-05-25T09:25:50Z

Hi @AndreMikulec I believe the fundamentals have not changed hence this feature addition should still be valid and would be worth incorporating in the codebase as it is quite useful.

AndreMikulec · 2021-05-25T12:14:50Z

@ikasou,

Hi @AndreMikulec I believe the fundamentals have not changed hence this feature addition should still be valid and would be worth incorporating in the codebase as it is quite useful.

O.K. Thanks.

codecov-commenter · 2021-05-25T14:28:26Z

Codecov Report

Merging #56 (27cbca3) into master (b019a01) will decrease coverage by 2.38%.
The diff coverage is 42.85%.

❗ Current head 27cbca3 differs from pull request most recent head 890e144. Consider uploading reports for the commit 890e144 to get more accurate results

@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
- Coverage   72.68%   70.29%   -2.39%     
==========================================
  Files           5        5              
  Lines        2120     2175      +55     
==========================================
- Hits         1541     1529      -12     
- Misses        579      646      +67

Impacted Files	Coverage Δ
pg_conversion.c	`68.85% <40.33%> (-6.12%)`	⬇️
plr.c	`85.43% <85.71%> (+0.67%)`	⬆️
pg_backend_support.c	`64.02% <0.00%> (-0.52%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bbf1e9...890e144. Read the comment docs.

…num_frame_row

AndreMikulec · 2021-05-26T00:51:22Z

@davecramer

From 1 workflow awaiting approval
Would you mind approving @ikasou to run on your Github Actions:
https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks

Thanks,
Andre

AndreMikulec · 2021-05-26T00:51:47Z

@ikasou
Thank you, for the excellent pull request.
This saves much much much time.

To make the regression test of opt_window_frame, run,
would you mind, in appveyor.yml, changing (or adding to) the line 166 from

bad_fun opt_window do out_args plr_transaction 2>&1 |

to

bad_fun opt_window do out_args plr_transaction opt_window_frame 2>&1 |

and re-submit the pull request again.

Thanks,
Andre

AndreMikulec · 2021-05-26T12:12:08Z

@ikasou and @davecramer

Both Travis and Github Actions did-not-pickup the force-push change 
(nor the entire MAY 25 2021 - 56 pull)

They ran something pre-"MAY 25 2021 - 56 pull".

https://travis-ci.org/github/postgres-plr/plr/jobs/772448332#L2398

https://github.com/postgres-plr/plr/pull/56/checks?sha=fc99faded1884eda7626d406ad025e0ee6dcfc5c#step:5:83

My mistake:
@davecramer picked up add opt_window_frame to regression tests in makefile
https://github.com/postgres-plr/plr/runs/2675452663

davecramer · 2021-05-26T12:35:02Z

it seems travis may pick it up if I close this PR and reopen it. Not sure about github actions though

davecramer · 2021-05-26T12:38:44Z

attempt to get travis to build off of the current sha

AndreMikulec · 2021-05-26T13:20:30Z

@ikasou

Updated regression checks (all not-passing) with "opt_window_frame" added to the Makefile are here: #113

--

@ikasou

May you take a look at? . . .

The opt_window_frame regression test is not passing (itself)
https://github.com/postgres-plr/plr/blob/e4e8887cd46cbc3237abd8108ccc62ab212b4c4f/expected/opt_window_frame.out

The expected result is with or without row 10.
https://ci.appveyor.com/project/davecramer/plr-lek5y/builds/39330178/job/yg5akbwatb896e9k?fullLog=true#L2597

davecramer · 2021-05-27T12:07:27Z

@ikasou thanks again for your diligence getting this merged int

AndreMikulec · 2021-05-27T21:02:15Z

@ikasou
Thanks!

…w PL/R functions postgres-plr#56"

mlt added a commit to mlt/plr that referenced this pull request May 10, 2019

Fix incorrect fnumrows when last argument is a stable expression

97848d3

Addresses issue raised in postgres-plr#56

mlt added a commit to mlt/plr that referenced this pull request May 10, 2019

Fix incorrect fnumrows when last argument is a stable expression

49ec6a4

Addresses issue raised in postgres-plr#56

mlt added a commit to mlt/plr that referenced this pull request May 10, 2019

Fix incorrect fnumrows when last argument is a stable expression

8181f1d

Addresses issue raised in postgres-plr#56

mlt added a commit to mlt/plr that referenced this pull request May 10, 2019

Fix incorrect fnumrows when last argument is a stable expression

01f517c

Addresses issue raised in postgres-plr#56

mlt added a commit to mlt/plr that referenced this pull request May 14, 2019

Fix incorrect fnumrows when last argument is a stable expression

d79a94b

Addresses issue raised in postgres-plr#56

ikasou marked this pull request as ready for review May 22, 2019 19:32

ikasou force-pushed the master branch from 16dbac4 to 645fe5e Compare May 22, 2019 19:37

ikasou force-pushed the master branch 2 times, most recently from e6c4782 to 72e2890 Compare May 31, 2019 19:36

ikasou force-pushed the master branch from 72e2890 to 920dbb6 Compare June 1, 2019 13:48

davecramer reviewed May 24, 2021

View reviewed changes

pg_conversion.c Outdated Show resolved Hide resolved

davecramer mentioned this pull request May 24, 2021

Appveyor: patching of Mkvcbuild.pm and vcregress.pl, should happen automatically #89

Closed

ikasou and others added 3 commits May 25, 2021 17:34

Tuple arguments support in PL/R Window Functions

4cd4151

Allow to take RECORD as input argument type

c8cfa22

Change var names for clarity: nr to num_partition_rows and numels to …

88ebb77

…num_frame_row

fix regtest for opt_window_frame

fc99fad

ikasou force-pushed the master branch from 8952a3d to fc99fad Compare May 26, 2021 11:02

davecramer closed this May 26, 2021

davecramer reopened this May 26, 2021

davecramer mentioned this pull request May 26, 2021

copy of pr #56 for testing #113

Closed

ikasou added 2 commits May 27, 2021 09:26

Remove last (empty) row of expected/opt_window_frame.out

890e144

Add opt_window_frame to REGRESS var in Makefile

96f10e9

davecramer merged commit fffa692 into postgres-plr:master May 27, 2021

davecramer mentioned this pull request May 27, 2021

Release 8.4.2 #114

Open

AndreMikulec added a commit to AndreMikulec/plr that referenced this pull request May 28, 2021

catchup to "Add support for pg custom type (tuple) arguments in windo…

1ea4bb5

…w PL/R functions postgres-plr#56"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for pg custom type (tuple) arguments in window PL/R functions #56

Add support for pg custom type (tuple) arguments in window PL/R functions #56

ikasou commented Apr 21, 2019 •

edited

Loading

mlt commented Apr 27, 2019 •

edited

Loading

ikasou commented Apr 30, 2019

mlt commented Apr 30, 2019

ikasou commented May 18, 2019

mlt commented May 20, 2019

mlt commented May 20, 2019

davecramer commented May 21, 2019

mlt commented May 22, 2019 •

edited

Loading

ikasou commented May 31, 2019

mlt commented May 31, 2019

ikasou commented Jun 1, 2019

davecramer commented May 24, 2021

AndreMikulec commented May 25, 2021

AndreMikulec commented May 25, 2021

ikasou commented May 25, 2021

AndreMikulec commented May 25, 2021

codecov-commenter commented May 25, 2021 •

edited

Loading

AndreMikulec commented May 26, 2021

AndreMikulec commented May 26, 2021

AndreMikulec commented May 26, 2021 •

edited

Loading

davecramer commented May 26, 2021

davecramer commented May 26, 2021

AndreMikulec commented May 26, 2021 •

edited

Loading

davecramer commented May 27, 2021

AndreMikulec commented May 27, 2021

Add support for pg custom type (tuple) arguments in window PL/R functions #56

Add support for pg custom type (tuple) arguments in window PL/R functions #56

Conversation

ikasou commented Apr 21, 2019 • edited Loading

mlt commented Apr 27, 2019 • edited Loading

ikasou commented Apr 30, 2019

mlt commented Apr 30, 2019

ikasou commented May 18, 2019

mlt commented May 20, 2019

mlt commented May 20, 2019

davecramer commented May 21, 2019

mlt commented May 22, 2019 • edited Loading

ikasou commented May 31, 2019

mlt commented May 31, 2019

ikasou commented Jun 1, 2019

davecramer commented May 24, 2021

AndreMikulec commented May 25, 2021

AndreMikulec commented May 25, 2021

ikasou commented May 25, 2021

AndreMikulec commented May 25, 2021

codecov-commenter commented May 25, 2021 • edited Loading

Codecov Report

AndreMikulec commented May 26, 2021

AndreMikulec commented May 26, 2021

AndreMikulec commented May 26, 2021 • edited Loading

davecramer commented May 26, 2021

davecramer commented May 26, 2021

AndreMikulec commented May 26, 2021 • edited Loading

davecramer commented May 27, 2021

AndreMikulec commented May 27, 2021

ikasou commented Apr 21, 2019 •

edited

Loading

mlt commented Apr 27, 2019 •

edited

Loading

mlt commented May 22, 2019 •

edited

Loading

codecov-commenter commented May 25, 2021 •

edited

Loading

AndreMikulec commented May 26, 2021 •

edited

Loading

AndreMikulec commented May 26, 2021 •

edited

Loading