Pairwise versions for rolling_cov, ewmcov and expanding_cov #4950

Merged
merged 5 commits into from Mar 28, 2014

Conversation

Projects
None yet
4 participants
Contributor

snth commented Sep 23, 2013

I added the functions rolling_cov_pairwise(), expanding_cov_pairwise() and ewmcov_pairwise(). I also modified the behaviour of rolling_corr_pairwise(), expanding_corr_pairwise() and ewmcorr_pairwise() slightly so that now these can be computed between different DataFrames and the resulting matrices at each time slice will be rectangular rather than square.

I think this is what was asked for in the original discussion that gave rise to issue #1853.

Fixes #1853

Contributor

jtratner commented Sep 23, 2013

It looks like these have nearly the same signature as the existing
functions. Would it make more sense to add a keyword argument
pairwise=False to the public api and then dispatch to the appropriate
functions based on that?

Contributor

snth commented Sep 25, 2013

I agree with you. I provided these implementations under the *_pairwise names to fit in with the current API as described on the Computational Tools page.

While the behaviour of rolling_cov and rolling_corr for the Series/Series and DataFrame/Series seems perfectly fine, the DataFrame/DataFrame behaviour doesn't make much sense to me. I can't think of any scenarios where I would want to match on column names and then describe the result as a rolling_cov rather than a rolling_var. For me a covariance is calculation between two different Series so the name matching doesn't make much sense. The tabular layout of the *_pairwise functions makes it clear which two inputs were used to calculate the covariance or correlation.

However I clearly don't know where people are using this so I rather provided these pairwise implementations under different function names in line with the current rollin_corr_pairwise API and let you core devs decide how to integrate it into the API.

Contributor

snth commented Sep 25, 2013

Ok, I've been thinking more about this and think that rather than taking guidance from my own biased preferences, a better approach is to try and minimise cognitive impedance in the API along the lines of the "Don't make me think" principle.

Now DataFrame already has cov and corr methods which compute pairwise covariances and correlations respectively. Presumably the raison d'etre of the rolling_* and expanding_* functions is a convenience to the user (to avoid having to loop over the index manually) as well as to take advantage of any performance improvements that might exist in an on-line algorithm over the naive loop implementation. As such it would seem that the rolling_cov and rolling_corr functions should preserve the API and behaviour of cov and corr respectively while adding an additional dimension to the result introduced by the rolling operation. Therefore {rolling|expanding}_cov and {rolling|expanding}_corr should produce Panels so that the invariant expanding_cov(df, ...).ix[-1] == df.cov() is maintained.

To deal with the case where the non-pairwise version is desired, I just learned that for DataFrame there exists a corrwith method which covers this case. Therefore how about renaming the existing rolling_corr function to rolling_corrwith and introducing covwith as well as {rolling_|expanding_|ewm}covwith. Then the *_pairwise methods I provided could just become the standard API without the _pairwise suffix.

Contributor

jtratner commented Sep 25, 2013

@snth It could be hard sell to change the existing functions in the way you describe (since they'd produce different behavior with current defaults). if you couldn't change the existing API (i.e., pd.rolling_corr(...) with the current defaults needs to return the same thing as it did in previous versions), are there still ways that you think are simple to make these changes?

It looks like there already is a rolling_corr_pairwise function in the pandas namespace.

That said, why couldn't we change the signature such that if you only passed one object, it computes the rolling pairwise correlation and otherwise computes the correlation between the two objects. Shouldn't be that hard to implement. The only quirk would be that you'd have to either change the signature to: def rolling_corr(arg1, arg2=None, window=None, ...). And then if arg2 is None, do the pairwise correlation. It would mean that you'd always have to use keyword arguments for rolling pairwise functions, but I don't think that's such a big deal.

Contributor

snth commented Sep 25, 2013

@jtratner I like your approach of overloading the signature. I can probably do this inside _flex_binary_moment() and then it should become available to all these functions we're discussing here. However I noticed that ewmcov() is currently not using _flex_binary_moment so I'll see if I can change that first.

Contributor

snth commented Sep 25, 2013

@jtratner I've tried to incorporate the API changes as you suggested. I still need to update the documentation but I ran out of time. I'll look at that tomorrow.

Contributor

snth commented Sep 26, 2013

@jtratner I've updated the documentation for the API changes. Along the way I unified the 3 doc template. Hopefully everything unaffected function docstrings are still intact. I did a couple of spot checks and everything looked ok to me. Would be good to have some more eyeballs on this.

I noticed that the center keyword argument isn't documented but appears in the function signatures. Should we add this?

The only other thing I'd like to add is to make these functions part of the DataFrame API as I'd quite like to be able to say things like df.rolling_cov(window=5). How do you feel about that?

Contributor

jreback commented Feb 16, 2014

@snth can you rebase this....looks pretty good

@snth I just merged a PR (#6362) that also touches the docstrings in this file, which will make it more difficult to rebase (there will be some merge conflicts). If you need any help to resolve them, just let it know!

Contributor

snth commented Feb 20, 2014

I tried a rebase of this onto on of the 0.13 release candidates some time ago and there were quite a few conflicts already then. I'm a little busy at the moment so I'll only be able to look at this next week at the earliest.

Contributor

jreback commented Mar 9, 2014

@snth when you have a chance

Contributor

snth commented Mar 14, 2014

@jreback I finally got around to doing that rebase. I rebased off of master as of this morning for which the whole test suite passed on my machine. Post-rebase tests look all ok as well but let's see what Travis says.

@jorisvandenbossche I tried to incorporate your docstring changes but as I had changed the template slightly I could have missed copying some changes across. If you wouldn't mind having a look to check that everything looks ok.

Contributor

jreback commented Mar 14, 2014

this look great!

  • can you add a 1-liner release note referncing the issue inImprovements and put in 0.14.0.txt as well.
  • if you want to put a nice example in the docs / v0.14.0 would be great
  • can you squash down

One remark: due to your (usefull!) refactoring of the docstrings, now the two notes I added (on "by default, result is set to right edge of window ..." and on "the freq keyword is used ...") are added to all functions, also to the expanding_ functions. However, there the center keyword argument is not sensible (although it is in the function signature, it is also not in the docstring). So the note on that should also only be shown in the rolling_ docstrings.

Contributor

snth commented Mar 17, 2014

Thanks @jorisvandenbossche . Please check if the last commit fixed this.

@jreback If @jorisvandenbossche is happy with the docstrings then I'll squash down and add a release note.

Contributor

snth commented Mar 17, 2014

A quick note about the additional behaviour I introduced.

Basically I took the pairwise behaviour from rolling_corr_pairwise() and moved it into _flex_binary_moment() in order to make it available to rolling_cov(), ewmcov(), expanding_cov(), ...

I am no statistician and cannot promise that this is statistically sound. In particular if there are missing values in the data then a different number of datapoints will be used in the calculation of different entries of the covariance matrices and these covariance matrices are not guaranteed to be positive semi-definite.

I just want to point that out as often the availability of something is taken as an implicit guarantee that it is correct, especially by novice users who are not trained in the field. In this case the user is responsible for ensuring that the results are suitable for his or her use-case.

Contributor

jreback commented Mar 17, 2014

  • can you add a blurb to the docstring about this (and in the doc in the moments section)?
  • can you add a release note / v0.14.0.txt
  • squash down pls

thanks

otherwise looks good

Contributor

snth commented Mar 27, 2014

I'm busy putting in release note. Should be done tomorrow.

Contributor

snth commented Mar 28, 2014

I added a release note and some example usage for pairwise=True to the docs.

I also marked rolling_corr_pairwise() and expanding_corr_pairwise() as deprecated since these simply call rolling_corr() and expanding_corr() respectively. If you don't want to deprecate these then I will simply omit that commit when I squash everything down.

I think you need to rebase (it can't be automatically merged)

@jreback jreback commented on an outdated diff Mar 28, 2014

doc/source/computation.rst
correls[df.index[-50]]
+.. note::
+
+ This was previously available through ``rolling_corr_pairwise`` which is
@jreback

jreback Mar 28, 2014

Contributor

put a prior to version 0.14

@jreback jreback commented on an outdated diff Mar 28, 2014

doc/source/computation.rst
correls[df.index[-50]]
+.. note::
+
+ This was previously available through ``rolling_corr_pairwise`` which is
+ now simply syntactic sugar for calling ``rolling_corr(..., pairwise=True)``
+ and deprecated. This is likely to be removed in a future release.
+
+.. ipython:: python
@jreback

jreback Mar 28, 2014

Contributor

you don't need to show the older (remember these are for 0.14 docs, older docs are always available)

@jreback jreback commented on the diff Mar 28, 2014

pandas/stats/moments.py
- correls : Panel
- """
- from pandas import Panel
- from collections import defaultdict
-
- all_results = defaultdict(dict)
-
- for i, k1 in enumerate(df.columns):
- for k2 in df.columns[i:]:
- corr = rolling_corr(df[k1], df[k2], window,
- min_periods=min_periods)
- all_results[k1][k2] = corr
- all_results[k2][k1] = corr
-
- return Panel.from_dict(all_results).swapaxes('items', 'major')
+@Substitution("Deprecated. Use rolling_corr(..., pairwise=True) instead.\n\n"
@jreback

jreback Mar 28, 2014

Contributor

this is all fine

Contributor

jreback commented Mar 28, 2014

minor comments, pls rebase and I think good to go

jsexauer referenced this pull request Mar 28, 2014

Open

DEPR: deprecations from prior versions #6581

0 of 51 tasks complete
Contributor

jreback commented Mar 28, 2014

The deprecation is fine (I marked a reminder to remove in #6581 at some point in the future)

snth added some commits Sep 23, 2013

@snth snth TST: Added tests for pairwise behaviour in rolling moments.
Added a test for rolling_cov_pairwise similar to the test for
rolling_corr_pairwise.

Added tests for ewmcov_pairwise and ewmcorr_pairwise.

Added test for expanding_cov_pairwise based on the test for
expanding_corr_pairwise.
50c0a03
@snth snth ENH: Implemented pairwise rolling moment functions.
Implemented rolling_cov_pairwise function.

Implemented ewmcov_pairwise and ewmcorr_pairwise.

Implemented expanding_cov_pairwise.

Refactored the rolling moment functions to use _flex_pairwise_moment.
9a92ad2
@snth snth ENH: Folded the *_pairwise rolling moment functions into the base fun…
…ction API.
aeba806
@snth snth DOC: Updated documentation for pairwise API changes.
Added a release note and example usage for pairwise=True.
27442b1
Contributor

snth commented Mar 28, 2014

@jreback I changed the rolling_corr_pairwise comment to say "Prior to version 0.14 ..." and then rebased onto the current master and squashed down.

Good to go from my side.

Contributor

jreback commented Mar 28, 2014

looks good...ping when green

Contributor

snth commented Mar 28, 2014

@jtratner Travis build is green.

@jreback jreback added a commit that referenced this pull request Mar 28, 2014

@jreback jreback Merge pull request #4950 from snth/pairwise
Pairwise versions for rolling_cov, ewmcov and expanding_cov
aa166bf

@jreback jreback merged commit aa166bf into pandas-dev:master Mar 28, 2014

1 check passed

default The Travis CI build passed
Details
Contributor

jreback commented Mar 28, 2014

@snth awesome! thanks!

Contributor

jreback commented Mar 29, 2014

http://pandas-docs.github.io/pandas-docs-travis/computation.html#moving-rolling-statistics-moments

the mention of the pairwise should be removed from the list for rolling and expanding yes?
eg rolling_corr_pairwise

Contributor

jreback commented Mar 30, 2014

I deleted those 2 references, pls review and submit a fix if the docs don't look correct

http://pandas-docs.github.io/pandas-docs-travis/computation.html#moving-rolling-statistics-moments

Contributor

snth commented Mar 31, 2014

You're right, the mention of the 2 *_corr_pairwise() functions should have been deleted. Thanks for sorting it out. Looks good to me.

@jreback jreback added a commit to jreback/pandas that referenced this pull request Dec 19, 2015

@jreback jreback DEPR: removal of expanding_corr_pairwise/rolling_cor_pairwise, xref #… e47bd99

jreback referenced this pull request Jul 24, 2016

Open

DEPR: deprecations log for removed issues #13777

65 of 65 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment