Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for pg custom type (tuple) arguments in window PL/R functions #56

Merged
merged 6 commits into from
May 27, 2021

Conversation

ikasou
Copy link

@ikasou ikasou commented Apr 21, 2019

This is based on
a) the initial work by jconway which converted tuple arguments to PL/R non-window functions into one-row R dataframes (function pg_tuple_get_r_frame), and
b) the very recent additions by mlt as model for the addition of pg_window_frame_get_r_frame() to return a multi-row R dataframe containing PG window frame custom type (tuple) data in fargX.

Please let me know your thoughts on this. Incidentally I have spotted what looks like a bug in plr_convertargs() where numels gets set close to the end. Moreover this constant value optimization is changing past intended behaviour: Would an on-the-fly N-times copy of the constant element into the fargX vector be a better solution maybe?

@mlt
Copy link
Collaborator

mlt commented Apr 27, 2019

It seems sql file for testing is missing from commits although expected output is there. Also note that AppVeyor tests actually failed in few placed. I'll take a closer look why builds weren't reported as failed. It is d754480 in another PR that is WIP.

I didn't look close, but would it be possible to work this into an existing function somehow? I feel like there is lots of repetitions in the code in general. I'm currently trying to DRY up set returning functions (along with x8 speed up in my tests). And it seems at the first glance that there is lots of commonality between our functions.

Regarding numels at the end… It is not set, but the vector is trimmed down to that length. I deliberately allocate vector as long as entire partition then I trim it down if window frame is bound, i.e. shorter than partition. Vector allocations are fast in R so why bother copying things twice? I hope setting the length actually trims the vector and frees up memory 😇

While it does somewhat break existing behavior and it is doable to pass a constant fargX, but why would you want that in the first place if it is constant anyway? Also shall it be allocated as a vector of that constant then as one could have used subscripts. Just use originally passed value. I don't know if it is common when some arguments aren't expected as constants but just happen to be such for whatever reason in some cases and aren't constant in other… But it just seemed odd to me when I implemented it. Anyway...if there is a legit reason to pass a vector full of constant value aside from code smell, then sure it has to be implemented.

@ikasou
Copy link
Author

ikasou commented Apr 30, 2019

Hi there and thanks for the feedback. I take your point about repetition and will try to merge our code, although I am a little concerned that it may result in a huge and unwieldy function which may defeat the purpose - but let me get back to you on that towards the end of next week.

Regarding numels at the end, and correct me if I am wrong, but I get the sense you refer above to the end of the pg_window_frame_get_r() function whereas I was referring to the end of the plr_convertargs() function in plr.c and more specifically looking at the line reading:
numels = function->nargs > 0 ? GET_LENGTH(el) : 0;
In some cases el may equal R_NilValue because of:
el = get_fn_expr_arg_stable(fcinfo->flinfo, i) ? R_NilValue : XXXXX
So I was a bit concerned there may be a problem there, but I could always be missing something.

Finally, with regards to the constant argX issue my personal view based on my preference of a) consistency of function and hence b) user-friendliness over favouring extreme optimisation would be to not break existing behaviour and return what is expected, while obviously special-casing the situation and handling it with a fast copy. I think things are complicated enough already for someone attempting to use PL/R that they should not have to read the code to see why they get null when they really expected the frame's worth of data. For example, I use PL/R for rolling regressions over N-day time windows, and constant time series are just another case of valid time-series, so I would not expect to have to special-case them! Needless to say opinions here can vary ..

@mlt
Copy link
Collaborator

mlt commented Apr 30, 2019

Thank you for catching that numels thing! Indeed, fnumrows would be zero if last argument is a stable expression. I will take maximum or 0 if all stable.

I know what you are saying about preserving the interface. However, sometimes breakage does happen. Even in the code as is, I preserve fargX count instead of simply skipping over stable expressions in case those are passed first. I stated my personal view on this in another GitHub issue. I personally am in favor of explicit request for frame arguments like getFrame(mycol) then deparse mycol and extract frame data for the namesake argument of a function. The simple case would be cumulative sum over partition. Tehnically, we do not need frame data at all (to minimize memory footprint) but the state variable so why bother extracting frame data implicitly every call or even once? I would say we need more discussion on this. Once again, I am not set in stone with my decision how I implemented it. I hope more users will chime in.

If you do store constant time series, they are not stable in PG terms. But indeed, if one let's say calls mean(farg1) on various things including constants, then it needs to be wrapped in ifelse(is.null(farg1), my_var, farg1).

Regarding your case, I would say you will see a drastic speed up if you do a rolling window regression entirely within R for entire partition (unbounded frame), then slowly return results one by one… unless you bound the frame by just a handful of rows. I would be curious to hear timing comparison.

On a side note, I'm not sure what your mileage with git is, the following might be useful. In the future, open PRs from a custom branch. Do not use master for your changes as it will be difficult to keep up with origin/master changes and rebasing. I see you merge changes back instead of rebasing. It results in a convoluted history later.... unless someone will squash PR 😇 For now, you can rename origin/master into something else like git fetch origin master:vanilla (that command will always overwrite vanilla branch with most up to date origin/master and you can abuse it) and then rebase your changes in your checked out master on top of vanilla with git rebase -i vanilla, followed by a force push git push your_remote --force-with-lease. This will clean up your commit history for this PR. I personally also like to abuse autosquashing.

mlt added a commit to mlt/plr that referenced this pull request May 10, 2019
mlt added a commit to mlt/plr that referenced this pull request May 10, 2019
mlt added a commit to mlt/plr that referenced this pull request May 10, 2019
mlt added a commit to mlt/plr that referenced this pull request May 10, 2019
mlt added a commit to mlt/plr that referenced this pull request May 14, 2019
@ikasou
Copy link
Author

ikasou commented May 18, 2019

I managed to create a single function returning the window_frame that looks reasonably sane, please have a look and let me know thoughts.

Unlike the code, the git commit history is a complete mess though, although I did try to follow your suggestions - sorry about that.

Finally I just noticed your commit regarding the numels glitch we discussed, I think it is still not taking into account the cases where plr_is_unbound_frame(win obj) == TRUE, returning 0 in those cases. Is that intended?

Around the same location, I also noticed that a PROTECT() was missing around the 2 calls to pg_window_frame_get_r(winobj, i, function) and so I added that, I believe it is necessary, let me know what you think.

@mlt
Copy link
Collaborator

mlt commented May 20, 2019

I'll take a closer look at your code.

Regarding numels when plr_is_unbound_frame, you are right. The thing is, I personally don't find fnumrows useful at all. In most cases, one does something in R without providing an explicit argument length. Also, it is always possible to use something like length(farg1) (essentially what code does). But regardless my preferences, you are correct, it is missing and shall be fixed for compatibility! Thank you for pointing it out.

Regarding PROTECTThe outer vector is protected and I think macros we use are safe in a sense they won't lead to memory allocations. There is a very "similar" example in "Writing R Extensions" where the result of eval is assigned to a vector element without wrapping that in PROTECT. I know it is tempting to be "overprotective". But I think lots of that can be safely removed from PL/R code. But before doing this, it would be nice to use gctorture(TRUE) while running tests, in case I am very wrong here😨

Last but not least…Not trying to start an eternal debate, but it seems PL/R as well as the rest of PG prefers tabs over spaces 😇 I think we should add .dir-locals.el as well as .editorconfig so github diffs are friendlier?

@mlt
Copy link
Collaborator

mlt commented May 20, 2019

It seems draft PR's are not supported on Travis. Would you flip it and add a big bold statement it is a work in progress in the very first post? I hope it is okay with @davecramer

Also, remove a check from the test if you insist on passing constant expressions through.

@davecramer
Copy link
Collaborator

ya, it's fine... wonder why they aren't supported.

@ikasou ikasou marked this pull request as ready for review May 22, 2019 19:32
@mlt
Copy link
Collaborator

mlt commented May 22, 2019

I think we can go a step further and allow RECORD input type mlt@ff2151e I am not sure whether there is a better test for setting arg_is_rel… Perhaps leave typrelid out and just switch (typtype) { case 'c': case 'p': ... default: ...}.

hm... I just realized we need to actually add corresponding CI tests. But we need to get uniform across PG versions testing into master first for that.

davecramer pushed a commit that referenced this pull request May 23, 2019
* Unify testing pre & post PG12

* Log with any PG error level

This closes #9

* Inline language handler

This allows to have SQL statements like

do language plr $$
  pg.throwlog("Hello, world!)
$$;

* Update Makefile for inline handler

* Basic syntax checking validator

Try to parse R function body. No other checks done.

* Do not export unnecessary functions on Windows

This partially reverts squashed c42eb9c

* Split RECORD returning case from SRF

* Unify handling one or many OUT arguments

Make little to no difference between returning a RECORD
or having one or a few OUT arguments

* Test NaN

* Numeric Datum isn't Float8

* Test OUT arguments

* Fix incorrect fnumrows when last argument is a stable expression

Addresses issue raised in #56

* Fix up NaN passed into R
@ikasou ikasou force-pushed the master branch 2 times, most recently from e6c4782 to 72e2890 Compare May 31, 2019 19:36
@ikasou
Copy link
Author

ikasou commented May 31, 2019

I have cleaned up the commit history by rebasing on 8.4 and squashing all intermediate commits
Kept just the cherry-picked one with record arguments by @mlt
(and had to stick the const argument check disable in that - sorry)
I believe it is ready for reviewing and potential inclusion upstream, so I am deleting the comment in the beginning.

@mlt
Copy link
Collaborator

mlt commented May 31, 2019

Do you feel like fixing tabs? 😀 Also get an invite and join the chat.

@ikasou
Copy link
Author

ikasou commented Jun 1, 2019

tabs should be good now ;-)

@davecramer
Copy link
Collaborator

@ikasou my apologies. It would appear this has languished far too long without attention. Can you confirm that this is still PR is still valid ?

pg_conversion.c Outdated Show resolved Hide resolved
@AndreMikulec
Copy link
Collaborator

@ikasou and @mlt, if this feature "pg custom type (tuple) arguments in window PL/R functions"
would be integrated into a "personal" plr, today, by anyone,
with PostgreSQL 13 and R 4.1.0, what changes, if any, would one recommend?

@AndreMikulec
Copy link
Collaborator

@ikasou and @mlt,

The C code looks clean and well programmed.

I have created a starter branch here.

https://github.com/AndreMikulec/plr/tree/master_pg_custom_type_tuple_args_in_window_plr_funcs_dev

For my personal use, I am going to try to implement this feature (into a year 2021 personal plr). This integration may require some time.

@ikasou
Copy link
Author

ikasou commented May 25, 2021

Hi @AndreMikulec I believe the fundamentals have not changed hence this feature addition should still be valid and would be worth incorporating in the codebase as it is quite useful.

@AndreMikulec
Copy link
Collaborator

@ikasou,

Hi @AndreMikulec I believe the fundamentals have not changed hence this feature addition should still be valid and would be worth incorporating in the codebase as it is quite useful.

O.K. Thanks.

@codecov-commenter
Copy link

codecov-commenter commented May 25, 2021

Codecov Report

Merging #56 (27cbca3) into master (b019a01) will decrease coverage by 2.38%.
The diff coverage is 42.85%.

❗ Current head 27cbca3 differs from pull request most recent head 890e144. Consider uploading reports for the commit 890e144 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
- Coverage   72.68%   70.29%   -2.39%     
==========================================
  Files           5        5              
  Lines        2120     2175      +55     
==========================================
- Hits         1541     1529      -12     
- Misses        579      646      +67     
Impacted Files Coverage Δ
pg_conversion.c 68.85% <40.33%> (-6.12%) ⬇️
plr.c 85.43% <85.71%> (+0.67%) ⬆️
pg_backend_support.c 64.02% <0.00%> (-0.52%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bbf1e9...890e144. Read the comment docs.

@AndreMikulec
Copy link
Collaborator

@davecramer

From 1 workflow awaiting approval
Would you mind approving @ikasou to run on your Github Actions:
https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks

Thanks,
Andre

@AndreMikulec
Copy link
Collaborator

@ikasou
Thank you, for the excellent pull request.
This saves much much much time.

To make the regression test of opt_window_frame, run,
would you mind, in appveyor.yml, changing (or adding to) the line 166 from

bad_fun opt_window do out_args plr_transaction 2>&1 |

to

bad_fun opt_window do out_args plr_transaction opt_window_frame 2>&1 |

and re-submit the pull request again.

Thanks,
Andre

@AndreMikulec
Copy link
Collaborator

AndreMikulec commented May 26, 2021

@ikasou and @davecramer

Both Travis and Github Actions did-not-pickup the force-push change 
(nor the entire MAY 25 2021 - 56 pull)

They ran something pre-"MAY 25 2021 - 56 pull".

https://travis-ci.org/github/postgres-plr/plr/jobs/772448332#L2398

https://github.com/postgres-plr/plr/pull/56/checks?sha=fc99faded1884eda7626d406ad025e0ee6dcfc5c#step:5:83

My mistake:
@davecramer picked up add opt_window_frame to regression tests in makefile
https://github.com/postgres-plr/plr/runs/2675452663

@davecramer
Copy link
Collaborator

it seems travis may pick it up if I close this PR and reopen it. Not sure about github actions though

@davecramer davecramer closed this May 26, 2021
@davecramer
Copy link
Collaborator

attempt to get travis to build off of the current sha

@AndreMikulec
Copy link
Collaborator

AndreMikulec commented May 26, 2021

@ikasou

Updated regression checks (all not-passing) with "opt_window_frame" added to the Makefile are here: #113

--

@ikasou

May you take a look at? . . .

The opt_window_frame regression test is not passing (itself)
https://github.com/postgres-plr/plr/blob/e4e8887cd46cbc3237abd8108ccc62ab212b4c4f/expected/opt_window_frame.out

The expected result is with or without row 10.
https://ci.appveyor.com/project/davecramer/plr-lek5y/builds/39330178/job/yg5akbwatb896e9k?fullLog=true#L2597

@davecramer davecramer merged commit fffa692 into postgres-plr:master May 27, 2021
@davecramer
Copy link
Collaborator

@ikasou thanks again for your diligence getting this merged int

@davecramer davecramer mentioned this pull request May 27, 2021
@AndreMikulec
Copy link
Collaborator

@ikasou
Thanks!

AndreMikulec added a commit to AndreMikulec/plr that referenced this pull request May 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants