ENH: R poly compatibility #92

thequackdaddy · 2016-09-15T01:51:58Z

Hello,

I've added the function poly which attempts to reproduce R's poly function.

See here: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/poly.html

Thanks.

coveralls · 2016-09-15T01:58:29Z

Coverage increased (+0.007%) to 98.667% when pulling d290dd3 on thequackdaddy:poly_qr into 8b6c712 on pydata:master.

codecov-io · 2016-09-15T01:58:42Z

Codecov Report

Merging #92 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #92      +/-   ##
==========================================
+ Coverage   98.96%   98.98%   +0.02%     
==========================================
  Files          30       32       +2     
  Lines        5585     5703     +118     
  Branches      775      791      +16     
==========================================
+ Hits         5527     5645     +118     
  Misses         35       35              
  Partials       23       23

Impacted Files             Coverage Δ
patsy/__init__.py          96.66% <100%> (+0.11%) ⬆️
patsy/polynomials.py       100% <100%> (ø)
patsy/contrasts.py         100% <100%> (ø) ⬆️
patsy/test_poly_data.py    100% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4c613d0...4d107c6.

thequackdaddy · 2016-09-16T23:20:38Z

I realized after writing this that it may be related to #20.

#20 raises some good questions:

- The code could be simplified to use the various vander methods that numpy provides.
- Josef makes the point that QR might be an unnecessary complication. I think simply scaling the values might get you to the same end.
- My real need is for a method that captures non-linear relationships between exogenous and endogenous data and can extrapolate to values not in the training data. Changing the scope of this PR so that it still accommodates that might be a good initial step.
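For readers unfamiliar with the "vander methods" mentioned above, here is a minimal sketch of the Vandermonde-style builders in `numpy.polynomial`. The sample points and degree are arbitrary choices for illustration and are not taken from the PR.

```python
import numpy as np
from numpy.polynomial import chebyshev, laguerre, legendre, polynomial

# Arbitrary illustration values (not from the PR).
x = np.linspace(-1.0, 1.0, 5)
degree = 3

# Each builder returns an array of shape (len(x), degree + 1) whose
# column j holds the j-th basis polynomial evaluated at x.
bases = {
    "power": polynomial.polyvander(x, degree),     # 1, x, x**2, x**3
    "chebyshev": chebyshev.chebvander(x, degree),  # T_0 .. T_3
    "legendre": legendre.legvander(x, degree),     # P_0 .. P_3
    "laguerre": laguerre.lagvander(x, degree),     # L_0 .. L_3
}

for name, basis in bases.items():
    print(name, basis.shape)
```

All four span the same space of polynomials up to the given degree; they differ only in which basis the columns represent.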

Thanks.

njsmith · 2016-10-25T20:00:57Z

Is it possible for this to share the core implementation with patsy.contrasts.Poly? That was cribbed from R as well, and in principle at least they should be doing essentially the same thing...

thequackdaddy · 2016-10-25T21:48:26Z

> Is it possible for this to share the core implementation with patsy.contrasts.Poly?

I would think so. Honestly, I wrote this before I noticed the issue mentioned above, which had already been filed. Maybe I can incorporate those thoughts into this too.

thequackdaddy · 2016-10-26T03:45:32Z

@njsmith 3 changes:

1. patsy.contrasts.Poly now uses the same code as patsy.polynomials to generate the ContrastMatrix.
2. Per some notes in the PR, I made it so that you can use any of the XXXvander methods from numpy.
3. In addition to QR orthogonalization to "scale" the data, you can just use a straight standardizer.
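To make the difference between the two scaling options concrete, here is a rough sketch using plain numpy. It is not the PR's code; the data and variable names are made up for illustration.

```python
import numpy as np

# Made-up data: powers of a single predictor.
x = np.linspace(0.0, 10.0, 50)
powers = np.column_stack([x, x**2, x**3])

# Option 1: standardize each column (zero mean, unit variance).
# This fixes the scale, but the columns stay strongly correlated.
standardized = (powers - powers.mean(axis=0)) / powers.std(axis=0)
corr_standardized = np.corrcoef(standardized, rowvar=False)

# Option 2: QR-orthogonalize the centered columns.
# The columns of q are orthonormal, so their Gram matrix is the identity.
centered = powers - powers.mean(axis=0)
q, _ = np.linalg.qr(centered)
gram_q = q.T @ q

print(np.round(corr_standardized, 3))
print(np.round(gram_q, 3))
```

Standardizing only rescales each column, so it cannot remove the correlation between x and x**2; QR removes it by construction.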

thequackdaddy · 2016-10-26T04:00:05Z

Also dumb question...

Does patsy require pandas now? I noticed that my codecov score is not the best; the reason appears to be that I'm skipping the have_pandas == False scenario. I'm not sure that's even something we should be considering. Why write exception handling if the exception isn't supported?

njsmith · 2016-10-26T05:39:18Z

Hmm, that's a ton of options there, between raw versus not, different polynomial bases, standardizing... I guess my first question is, are any of these... useful? Besides the boring basic R-compatible orthonormalized polynomials? Is there some compelling argument for using them in some particular situation, or an existing audience who expects them?

Regarding pandas: nominally at least we still don't depend on pandas. You're right that there ought to be tests for this, though, oops. Possibly this doesn't make sense anymore; it's a different world now than it was back in 2012 or whatever when this decision was made... OTOH I dunno if people using patsy with scikit-learn for example necessarily use/want pandas.

I've just tried adding a no-pandas test to the travis matrix -- I guess in ~10 minutes we'll know if (a) I'm any good at convincing travis to do what I want, (b) if so, whether the no-pandas branches actually work!

njsmith · 2016-10-26T05:55:04Z

master branch is now testing the no-pandas configuration too, so the next time you push it then codecov should start calculating your stats more accurately :-)

thequackdaddy · 2016-10-26T06:07:09Z

Yeah I may have gone overboard there... :-/

I think QR vs. raw is necessary. Standardizing is kind of dumb... Not sure why you'd ever use it. AFAIU, the point of scaling in this way is just that (1) the columns of data in the design matrix end up relatively orthogonal and (2) the columns no longer have wildly different scales, which otherwise leaves np.linalg.matrix_rank without sufficient precision to rank the matrix correctly.
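A quick way to see the scale problem described here is to compare condition numbers. This is only a sketch with made-up values (x around 1000, degree 4), not something taken from the PR's tests.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander

# Made-up predictor far from zero, so the raw powers differ wildly in scale.
x = np.linspace(1000.0, 1010.0, 50)
degree = 4

# Raw design: columns 1, x, x**2, x**3, x**4 (roughly 1 up to 1e12).
raw = polyvander(x, degree)

# Orthogonalized design: center x, then take Q from a QR decomposition.
q, _ = np.linalg.qr(polyvander(x - x.mean(), degree))

cond_raw = np.linalg.cond(raw)
cond_q = np.linalg.cond(q)
rank_q = np.linalg.matrix_rank(q)

print("cond(raw):", cond_raw)
print("cond(q):  ", cond_q)
print("rank(raw):", np.linalg.matrix_rank(raw), "of", degree + 1)
print("rank(q):  ", rank_q, "of", degree + 1)
```

The raw matrix has an enormous condition number, which is the situation where a tolerance-based rank estimate can come out too low, while the orthonormal columns have a condition number of about 1.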

I don't really know the reasons for the differences in poly vs. chebyshev vs. legendre vs. laguerre. Josef mentioned that we should point to a generic XXXvander; I just assumed we'd let the user pick. I may have misinterpreted some of what Josef was trying to say.

Do you think I should drop standardizing and all the vanders other than polyvander? I doubt I would use anything but that. Plus, my tests are against R, which is just a QR-orthogonalized polyvander, so for the rest of this we are just assuming...

cc @josef-pkt - I know you're probably busy, but I'd appreciate your thoughts on whether there's any value in all this. Happy to remove this complexity if it's not worthwhile.

thequackdaddy · 2016-11-04T01:46:33Z

I've simplified this back down so that it mimics R's poly function and removed all the other vanders. When raw=False (the default), it uses QR to orthogonalize a polyvander.
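As a rough illustration of what "QR to orthogonalize a polyvander" means, here is a minimal sketch. The function name `poly_sketch` is hypothetical and this is not the code from the PR; it only mirrors the general approach of centering x, building the power basis, and keeping the non-constant columns of Q.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander


def poly_sketch(x, degree=1, raw=False):
    """Hypothetical sketch of an R-style poly(); not the PR's implementation."""
    x = np.asarray(x, dtype=float)
    if raw:
        # Raw powers x, x**2, ..., x**degree (constant column dropped).
        return polyvander(x, degree)[:, 1:]
    # Center x, build the power basis 1, x, ..., x**degree, and orthogonalize.
    q, r = np.linalg.qr(polyvander(x - x.mean(), degree))
    # Fix the column signs so they do not depend on the QR routine's choices.
    signs = np.sign(np.diag(r))
    # Drop the constant column; the rest are orthonormal and sum to zero.
    return (q * signs)[:, 1:]


x = np.arange(1.0, 11.0)
design = poly_sketch(x, degree=3)
print(design.shape)
print(np.round(design.T @ design, 6))
```

This sketch assumes x has more distinct values than `degree`, and it does not store the coefficients needed to evaluate the same basis on new data, which is what the extrapolation use case mentioned earlier in the thread would require.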

thequackdaddy · 2018-02-28T14:21:31Z

@njsmith - It's been over a year, but I think this is close to being good enough. Any comments? Thanks!


has2k1 · 2019-09-10T14:59:17Z

Any update on this PR?

jonathan-taylor · 2019-09-26T16:05:32Z

Is this merged yet?

matthewwardrop · 2021-09-07T18:09:51Z

Hi @thequackdaddy ! I took a quick look through, and this looks good to me. Let me know if this is still good to merge as is. If so, I'll merge it in :).

matthewwardrop · 2021-10-16T05:08:58Z

Support for poly has now been merged into formulaic.

thequackdaddy force-pushed the poly_qr branch from d290dd3 to 8fa9eef Compare October 26, 2016 03:25

thequackdaddy force-pushed the poly_qr branch from efa3cd6 to 2f9b292 Compare November 4, 2016 01:03

thequackdaddy force-pushed the poly_qr branch from d646b56 to 799be8d Compare March 1, 2017 00:57

thequackdaddy force-pushed the poly_qr branch from 799be8d to d18429a Compare February 27, 2018 15:49

thequackdaddy added 5 commits November 3, 2018 19:00

ENH: R poly compatibility

7dc23bd

Travis fixes

Use poly to calculate poly contrast. Add ability to use standarize in…

35e0923

… addition to qr decomposition. Added all numpy polynomial types. remove poly

Removing other polytypes, standardize procedure.

becad9a

Missed a few other places where polytype/scaler were.

0cd4299

assert_no_pickling again

0cb862b

thequackdaddy force-pushed the poly_qr branch from d18429a to 0cb862b Compare November 4, 2018 00:01

Increase test coverage

4d107c6

thequackdaddy force-pushed the poly_qr branch from 5a5e657 to 4d107c6 Compare November 4, 2018 01:04

has2k1 mentioned this pull request Sep 10, 2019

Add formula to stat_smooth has2k1/plotnine#314

Merged

4 tasks

matthewwardrop force-pushed the master branch 2 times, most recently from b07ba3f to 48fd2e4 Compare September 5, 2021 04:56

This was referenced Sep 26, 2021

How to do a polynomial? #20

Closed

Add poly transform matthewwardrop/formulaic#37

Closed

matthewwardrop added the fixed in formulaic label Oct 16, 2021
