Reimplemented Pearson's correlation to use two pass Welford's model #62750

eicchen · 2025-10-19T08:25:41Z

closes BUG: Pearson correlation outside expected range -1 to 1 #59652
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This improves numerical stability for values with very large or very small magnitudes, such as the example given in the original issue.
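For readers unfamiliar with the approach, here is a minimal pure-Python sketch of a two-pass Pearson correlation. It is not the code from this PR: the function name `pearson_two_pass` is hypothetical, and it ignores missing values and works on two plain sequences only. The first pass computes the means; the second accumulates the centered sums, so no large uncentered products are ever formed.

```python
import math


def pearson_two_pass(xs, ys):
    """Illustrative two-pass Pearson correlation (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    # Pass 1: means, using fsum to limit rounding error in the sums.
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n

    # Pass 2: centered second moments and co-moment.
    ssx = ssy = sxy = 0.0
    for x, y in zip(xs, ys):
        dx = x - mean_x
        dy = y - mean_y
        ssx += dx * dx
        ssy += dy * dy
        sxy += dx * dy

    if ssx == 0.0 or ssy == 0.0:
        return math.nan  # correlation is undefined for a constant column

    # Take the square roots separately so the product ssx * ssy
    # cannot overflow or underflow before the root is taken.
    return sxy / (math.sqrt(ssx) * math.sqrt(ssy))
```

The trade-off is the one discussed in this thread: the data is traversed twice, so it costs more than a single-pass online update.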

…issue-59652

Alvaro-Kothe

Can you run the benchmarks?

eicchen · 2025-10-20T01:43:46Z

I've already run the relevant benchmarks. Unsurprisingly, there is a performance decrease in stat_ops.Correlation. I've also looked into other ways of solving the issue while keeping the online Welford algorithm.

The problem stems from the co-moment calculations at very large or very small values and the asymmetric nature of Welford's algorithm. The three values above are mathematically the same; however, when calculating them for the values provided in the test case, two are always correct and one is not.
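The three expressions referred to in this comment did not survive in the text here. As an assumption, they are likely the three standard, algebraically equal forms of the online co-moment increment. The sketch below (with a hypothetical function name) computes all three for the n-th observation; with exact rational arithmetic they coincide, while in floating point they can round differently.

```python
from fractions import Fraction


def comoment_increments(x, y, mean_x_old, mean_y_old, n):
    """Three algebraically equal forms of the online co-moment increment.

    Assumption: these are the forms meant in the comment; the originals
    are not shown in this thread. `n` is the count including (x, y).
    """
    dx_old = x - mean_x_old
    dy_old = y - mean_y_old
    mean_x_new = mean_x_old + dx_old / n
    mean_y_new = mean_y_old + dy_old / n

    a = dx_old * (y - mean_y_new)        # old x-mean, new y-mean
    b = (x - mean_x_new) * dy_old        # new x-mean, old y-mean
    c = dx_old * dy_old * (n - 1) / n    # both old means, rescaled
    return a, b, c


# Exact arithmetic: all three forms agree.
a, b, c = comoment_increments(
    Fraction(5), Fraction(7), Fraction(2), Fraction(3), 4
)
print(a, b, c)
```

Forms `a` and `b` each mix an old mean with a new one, which is the asymmetry mentioned above.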

We could pick the value on which two of the three agree as a redundancy measure. In theory this can only reduce errors compared to the current version, since the values should be equal. But without a larger pool of test cases, I wouldn't be confident enough to put it in a release. Two-pass provides the best numerical stability at the cost of performance.
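A rough sketch of the redundancy idea, purely for illustration: given three candidate values that should be equal, return one from a pair that agrees within a tolerance. The name `pick_agreeing`, the tolerance, and the median fallback are all assumptions of this sketch, not something implemented in the PR.

```python
import math


def pick_agreeing(a, b, c, rel_tol=1e-9):
    """Return a value that at least two of the three candidates agree on.

    Hypothetical helper; the tolerance and the fallback are arbitrary
    choices made for this sketch.
    """
    if math.isclose(a, b, rel_tol=rel_tol) or math.isclose(a, c, rel_tol=rel_tol):
        return a
    if math.isclose(b, c, rel_tol=rel_tol):
        return b
    # No pair agrees: fall back to the median of the three candidates.
    return sorted((a, b, c))[1]
```

Note that this would add comparisons to every update step, so it would carry its own performance cost on top of the online algorithm.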

eicchen added 6 commits October 18, 2025 17:15

Initial commit, no pre-commit

0823950

Added test case for welford failure

e20b045

Merge branch 'issue-59652' of https://github.com/eicchen/pandas into …

60471c2

…issue-59652

Removed test file

28fb765

Implemented two pass welford for improved numeric stability

d1c1e83

pre-commit edits, moved test case

8af06ed

Alvaro-Kothe reviewed Oct 19, 2025
