-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for pg custom type (tuple) arguments in window PL/R functions #56
Conversation
It seems sql file for testing is missing from commits although expected output is there. Also note that AppVeyor tests actually failed in few placed. I didn't look close, but would it be possible to work this into an existing function somehow? I feel like there is lots of repetitions in the code in general. I'm currently trying to DRY up set returning functions (along with x8 speed up in my tests). And it seems at the first glance that there is lots of commonality between our functions. Regarding numels at the end… It is not set, but the vector is trimmed down to that length. I deliberately allocate vector as long as entire partition then I trim it down if window frame is bound, i.e. shorter than partition. Vector allocations are fast in R so why bother copying things twice? I hope setting the length actually trims the vector and frees up memory 😇 While it does somewhat break existing behavior and it is doable to pass a constant fargX, but why would you want that in the first place if it is constant anyway? Also shall it be allocated as a vector of that constant then as one could have used subscripts. Just use originally passed value. I don't know if it is common when some arguments aren't expected as constants but just happen to be such for whatever reason in some cases and aren't constant in other… But it just seemed odd to me when I implemented it. Anyway...if there is a legit reason to pass a vector full of constant value aside from code smell, then sure it has to be implemented. |
Hi there and thanks for the feedback. I take your point about repetition and will try to merge our code, although I am a little concerned that it may result in a huge and unwieldy function which may defeat the purpose - but let me get back to you on that towards the end of next week. Regarding numels at the end, and correct me if I am wrong, but I get the sense you refer above to the end of the pg_window_frame_get_r() function whereas I was referring to the end of the plr_convertargs() function in plr.c and more specifically looking at the line reading: Finally, with regards to the constant argX issue my personal view based on my preference of a) consistency of function and hence b) user-friendliness over favouring extreme optimisation would be to not break existing behaviour and return what is expected, while obviously special-casing the situation and handling it with a fast copy. I think things are complicated enough already for someone attempting to use PL/R that they should not have to read the code to see why they get null when they really expected the frame's worth of data. For example, I use PL/R for rolling regressions over N-day time windows, and constant time series are just another case of valid time-series, so I would not expect to have to special-case them! Needless to say opinions here can vary .. |
Thank you for catching that numels thing! Indeed, fnumrows would be zero if last argument is a stable expression. I will take maximum or 0 if all stable. I know what you are saying about preserving the interface. However, sometimes breakage does happen. Even in the code as is, I preserve fargX count instead of simply skipping over stable expressions in case those are passed first. I stated my personal view on this in another GitHub issue. I personally am in favor of explicit request for frame arguments like If you do store constant time series, they are not stable in PG terms. But indeed, if one let's say calls mean(farg1) on various things including constants, then it needs to be wrapped in Regarding your case, I would say you will see a drastic speed up if you do a rolling window regression entirely within R for entire partition (unbounded frame), then slowly return results one by one… unless you bound the frame by just a handful of rows. I would be curious to hear timing comparison. On a side note, I'm not sure what your mileage with git is, the following might be useful. In the future, open PRs from a custom branch. Do not use master for your changes as it will be difficult to keep up with origin/master changes and rebasing. I see you merge changes back instead of rebasing. It results in a convoluted history later.... unless someone will squash PR 😇 For now, you can rename origin/master into something else like |
Addresses issue raised in postgres-plr#56
Addresses issue raised in postgres-plr#56
Addresses issue raised in postgres-plr#56
Addresses issue raised in postgres-plr#56
Addresses issue raised in postgres-plr#56
I managed to create a single function returning the window_frame that looks reasonably sane, please have a look and let me know thoughts. Unlike the code, the git commit history is a complete mess though, although I did try to follow your suggestions - sorry about that. Finally I just noticed your commit regarding the numels glitch we discussed, I think it is still not taking into account the cases where plr_is_unbound_frame(win obj) == TRUE, returning 0 in those cases. Is that intended? Around the same location, I also noticed that a PROTECT() was missing around the 2 calls to pg_window_frame_get_r(winobj, i, function) and so I added that, I believe it is necessary, let me know what you think. |
I'll take a closer look at your code. Regarding numels when plr_is_unbound_frame, you are right. The thing is, I personally don't find fnumrows useful at all. In most cases, one does something in R without providing an explicit argument length. Also, it is always possible to use something like Regarding Last but not least…Not trying to start an eternal debate, but it seems PL/R as well as the rest of PG prefers tabs over spaces 😇 I think we should add |
It seems draft PR's are not supported on Travis. Would you flip it and add a big bold statement it is a work in progress in the very first post? I hope it is okay with @davecramer Also, remove a check from the test if you insist on passing constant expressions through. |
ya, it's fine... wonder why they aren't supported. |
I think we can go a step further and allow RECORD input type mlt@ff2151e I am not sure whether there is a better test for setting arg_is_rel… Perhaps leave typrelid out and just hm... I just realized we need to actually add corresponding CI tests. But we need to get uniform across PG versions testing into master first for that. |
* Unify testing pre & post PG12 * Log with any PG error level This closes #9 * Inline language handler This allows to have SQL statements like do language plr $$ pg.throwlog("Hello, world!) $$; * Update Makefile for inline handler * Basic syntax checking validator Try to parse R function body. No other checks done. * Do not export unnecessary functions on Windows This partially reverts squashed c42eb9c * Split RECORD returning case from SRF * Unify handling one or many OUT arguments Make little to no difference between returning a RECORD or having one or a few OUT arguments * Test NaN * Numeric Datum isn't Float8 * Test OUT arguments * Fix incorrect fnumrows when last argument is a stable expression Addresses issue raised in #56 * Fix up NaN passed into R
e6c4782
to
72e2890
Compare
I have cleaned up the commit history by rebasing on 8.4 and squashing all intermediate commits |
Do you feel like fixing tabs? 😀 Also get an invite and join the chat. |
tabs should be good now ;-) |
@ikasou my apologies. It would appear this has languished far too long without attention. Can you confirm that this is still PR is still valid ? |
The C code looks clean and well programmed. I have created a starter branch here. https://github.com/AndreMikulec/plr/tree/master_pg_custom_type_tuple_args_in_window_plr_funcs_dev For my personal use, I am going to try to implement this feature (into a year 2021 personal plr). This integration may require some time. |
Hi @AndreMikulec I believe the fundamentals have not changed hence this feature addition should still be valid and would be worth incorporating in the codebase as it is quite useful. |
O.K. Thanks. |
Codecov Report
@@ Coverage Diff @@
## master #56 +/- ##
==========================================
- Coverage 72.68% 70.29% -2.39%
==========================================
Files 5 5
Lines 2120 2175 +55
==========================================
- Hits 1541 1529 -12
- Misses 579 646 +67
Continue to review full report at Codecov.
|
From Thanks, |
@ikasou To make the regression test of
to
and re-submit the pull request again. Thanks, |
@ikasou and @davecramer
My mistake: |
it seems travis may pick it up if I close this PR and reopen it. Not sure about github actions though |
attempt to get travis to build off of the current sha |
Updated regression checks (all not-passing) with "opt_window_frame" added to the Makefile are here: #113 -- May you take a look at? . . . The opt_window_frame regression test is not passing (itself) The expected result is with or without row 10. |
@ikasou thanks again for your diligence getting this merged int |
@ikasou |
This is based on
a) the initial work by jconway which converted tuple arguments to PL/R non-window functions into one-row R dataframes (function pg_tuple_get_r_frame), and
b) the very recent additions by mlt as model for the addition of pg_window_frame_get_r_frame() to return a multi-row R dataframe containing PG window frame custom type (tuple) data in fargX.
Please let me know your thoughts on this. Incidentally I have spotted what looks like a bug in plr_convertargs() where numels gets set close to the end. Moreover this constant value optimization is changing past intended behaviour: Would an on-the-fly N-times copy of the constant element into the fargX vector be a better solution maybe?