ENH: multiple small improvements to scipy.stats.circmean #20240

fancidev · 2024-03-13T02:34:31Z

Is your feature request related to a problem? Please describe.

While working on the MLE of the vonmises distribution, I came across the circmean function and had to read its source code to find out exactly what it does. A few improvements could make the function easier to use.

Describe the solution you'd like.

The improvements I’d suggest are:

1. The documentation can be clearer. In particular, the notions of “samples”, “boundary”, and “sample range” are rather confusing. (DOC: stats.{circmean, circvar, circstd}: improve accuracy/clarity #20726)
2. Deprecate the high and low arguments. They exist for radian/degree conversion (which explains why high comes before low), but such conversion should be handled by the user; otherwise every trigonometric function would have to accept high and low. The docs already provide a clear example of how to do the radian/degree conversion.
3. Rename the first argument to a and make it positional-only. The name is consistent with, e.g., np.mean, and making it positional-only (a breaking change) ensures callers don’t refer to it by name.
4. Until high and low are fully removed, rearrange the computation so that the conversions don’t introduce unnecessary numerical error. (MAINT: stats: minor numerical improvements to circular statistics #20766)
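For point (2), the radian/degree conversion can be done by the caller instead of through high and low. A minimal sketch comparing the two styles on made-up bearings (350° and 20°, whose circular mean is 5°); it assumes a SciPy version in which circmean still accepts high and low:

```python
import numpy as np
from scipy import stats

deg = np.array([350.0, 20.0])  # two bearings in degrees, straddling 0

# Current API: describe the sample range with high/low.
via_high_low = stats.circmean(deg, high=360, low=0)

# Proposed style: convert to radians, use the default range, convert back.
via_radians = np.rad2deg(stats.circmean(np.deg2rad(deg)))

print(via_high_low, via_radians)  # both approximately 5 degrees
```

Both calls agree up to rounding; the second keeps circmean itself unit-agnostic.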

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

No response


dschmitz89 · 2024-03-13T17:38:15Z

1. Documentation improvements are always welcome.
2. I think high and low are also there to handle situations where the data only live on a half circle, for example. So I’m not sure we should remove them.
3. samples is likely not the best possible name, but I’m not sure a is better. I guess whoever uses the function will assume that the first argument is the data array whose circular mean they want to compute. But if it confused you, I might be wrong. After all, users always surprise us devs ;).
4. What would the improved code look like? Could you provide a small example where a different ordering of the operations reduces the error?

fancidev · 2024-03-14T00:16:07Z

I’ll make separate PRs for (1) and (4). (4) is straightforward: I’d simply rewrite a*b/b as a*(b/b). For (1), I’d say that high and low correspond to the values of the complete angle and the zero angle, respectively, possibly with a shift. I’d also mention what the function returns if the points are symmetric and the resultant vector is zero.
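To make (4) concrete, here is a small sketch. It assumes the internal scaling is evaluated as (x - low) * 2π / (high - low), left to right; that is an assumption for illustration, not a quote of SciPy’s source. With the default range, computing the factor first makes it exactly 1.0, so angles already in radians are not perturbed:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2 * np.pi, size=10_000)  # sample angles in radians
high, low = 2 * np.pi, 0.0                    # circmean's default range

# Assumed current ordering: multiply by 2*pi, then divide by (high - low),
# i.e. the "a*b/b" pattern, which rounds twice.
left_to_right = (x - low) * (2 * np.pi) / (high - low)

# Proposed ordering: form the scale factor first ("a*(b/b)").
# (2*pi) / (2*pi) is exactly 1.0, so the angles pass through unchanged.
factor_first = (x - low) * ((2 * np.pi) / (high - low))

print("changed by a*b/b:  ", np.count_nonzero(left_to_right != x))
print("changed by a*(b/b):", np.count_nonzero(factor_first != x))
```

Only the second count is guaranteed to be zero; how many values the first ordering changes depends on the data.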

For (2), I don’t mean that scaling is not useful, but that it should be handled outside of circmean. For the case where data points live on a half circle, say between 0 and 180 degrees, is the circmean of 45 degrees and 135 degrees supposed to be 90 degrees, or is it mathematically undefined?
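The half-circle question can be checked numerically. The sketch below uses a hypothetical helper, resultant (not a SciPy function), that maps [low, high) onto the full circle the way high and low are meant to, and compares the two readings of 45 and 135 degrees:

```python
import numpy as np

def resultant(x, high, low):
    """Mean resultant vector after mapping [low, high) onto the full circle.

    Hypothetical helper for illustration only; not part of SciPy.
    """
    theta = (np.asarray(x, dtype=float) - low) * (2 * np.pi / (high - low))
    return np.mean(np.cos(theta)), np.mean(np.sin(theta))

x = [45.0, 135.0]

# Half circle (low=0, high=180): 45 -> pi/2 and 135 -> 3*pi/2, two opposite
# points whose unit vectors cancel, so the mean direction is undefined.
c_half, s_half = resultant(x, high=180.0, low=0.0)
r_half = np.hypot(c_half, s_half)  # rounding-level, not a real direction

# Full circle (low=0, high=360): the vectors do not cancel.
c_full, s_full = resultant(x, high=360.0, low=0.0)
mean_full = np.rad2deg(np.arctan2(s_full, c_full))  # 90 degrees

print(r_half, mean_full)
```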

fancidev · 2024-03-14T03:04:25Z

An example where two angles are numerically symmetric (they differ by π to within rounding):

scipy.stats.circmean([0.32202300504740655,3.4636156586372]) returns 0.

The same data gives inf for circstd, which is (approximately) mathematically correct, but we may (or may not) want to get rid of the resulting RuntimeWarning.
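A quick check of why both results are fragile: the two angles differ from π only at rounding level, so the mean resultant length R is essentially zero. The sketch assumes circstd is computed as sqrt(-2 ln R) and uses np.errstate as one possible way to silence the warning; neither is taken from SciPy’s source:

```python
import numpy as np

x = np.array([0.32202300504740655, 3.4636156586372])
print(x[1] - x[0] - np.pi)  # offset from pi, at rounding level

C, S = np.mean(np.cos(x)), np.mean(np.sin(x))
R = np.hypot(C, S)  # mean resultant length; essentially zero here

# Assumed formula for the circular standard deviation: sqrt(-2 ln R).
# If R is exactly 0, np.log emits a divide-by-zero RuntimeWarning;
# errstate suppresses it and the result is inf.
with np.errstate(divide="ignore"):
    std = np.sqrt(-2.0 * np.log(R))

print(R, std)
```

Whether R comes out as exactly 0 or merely tiny can depend on the platform’s libm, which is why the mean direction of such data is not meaningful either way.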

fancidev · 2024-03-14T03:46:02Z

A related corner case is that scipy.stats.circstd([0]) returns -0.0. This usually has no impact, but if someone writes 1/circstd(…), there’s a chance the wrong sign gets propagated. So it might be worth fixing.
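The -0.0 can be reproduced with plain NumPy, assuming circstd evaluates sqrt(-2 ln R): log(1.0) is +0.0, multiplying by -2 flips the sign of the zero, and IEEE 754 sqrt preserves it. Adding +0.0 at the end is one hypothetical fix, not necessarily what SciPy would adopt:

```python
import numpy as np

x = np.array([0.0])
R = np.hypot(np.mean(np.cos(x)), np.mean(np.sin(x)))  # exactly 1.0

# Assumed formula: sqrt(-2 ln R). log(1.0) is +0.0, -2.0 * +0.0 is -0.0,
# and IEEE 754 sqrt(-0.0) is -0.0, so the negative sign survives.
std = np.sqrt(-2.0 * np.log(R))
print(std, np.signbit(std))  # -0.0 True

# Hypothetical fix: adding +0.0 turns -0.0 into +0.0 under the default
# rounding mode and leaves every other value unchanged.
fixed = std + 0.0
print(fixed, np.signbit(fixed))  # 0.0 False
```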

fancidev added the enhancement label Mar 13, 2024

j-bowhay added the scipy.stats label Mar 13, 2024

This was referenced May 16, 2024:

DOC: stats.{circmean, circvar, circstd}: improve accuracy/clarity #20726 (Merged)

MAINT: stats: minor numerical improvements to circular statistics #20766 (Open)
