Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: multiple small improvements to scipy.stats.circmean #20240

Open
3 of 4 tasks
fancidev opened this issue Mar 13, 2024 · 4 comments · May be fixed by #20766
Open
3 of 4 tasks

ENH: multiple small improvements to scipy.stats.circmean #20240

fancidev opened this issue Mar 13, 2024 · 4 comments · May be fixed by #20766
Labels
enhancement A new feature or improvement scipy.stats

Comments

@fancidev
Copy link
Contributor

fancidev commented Mar 13, 2024

Is your feature request related to a problem? Please describe.

When working on the MLE of vonmises distribution, I came across the circmean function and had to read the source code to find out what exactly it’s doing. A few improvements could make it easier to use the function.

Describe the solution you'd like.

The improvements I’d suggest are:

  • The documentation can be clearer. Especially the notions of “samples”, “boundary”, and “sample range” are rather confusing. (DOC: stats.{circmean, circvar, circstd}: improve accuracy/clarity #20726)
  • Deprecate the high and low arguments. They are there for radian/degree conversion (which explains why high comes before low), but such conversion should be handled by the user. (Or otherwise all trigonometric functions would accept high and low.) The doc already provides a clear example of how to do the radian/degree conversion.
  • Rename the first argument to a and make it a position-only argument. The naming is consistent with e.g. np.mean. And making it position-only (a breaking change) ensures callers don’t reference it by name.
  • Before high and low are fully removed, rearrange the computation code so that the conversions don’t bring unnecessary numerical error. (MAINT: stats: minor numerical improvements to circular statistics #20766)

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

No response

@fancidev fancidev added the enhancement A new feature or improvement label Mar 13, 2024
@dschmitz89
Copy link
Contributor

dschmitz89 commented Mar 13, 2024

  1. Documentation improvements are always welcome.
  2. I think here high and low are also for handling situations where data only live on the half circle, for example. So not sure if we should remove them.
  3. samples is not the best possible name likely but not sure if a is better. I guess that whoever uses the function will assume that the first argument is the data array they want to compute the circular mean of. But if it confused you, I might be wrong. After all, users always surprise us devs ;).
  4. What would the improved code look like? Could you provide a small example where another ordering of the operations reduces the error?

@fancidev
Copy link
Contributor Author

I’ll make separate PRs for (1) and (4). (4) is straightforward and I’d simply rewrite a*b/b as a*(b/b). For (1) I’d say high and low correspond to the value of complete angle and zero angle, respectively, possibly with a shift. I’d also mention what the function returns if the points are symmetric and the resultant vector is zero.

For (2), I don’t mean that scaling is not useful, but that they should be handled outside of circmean. For the case where data points live on the half circle, say between 0 to 180 degrees, is the circmean of 45 degrees and 135 degrees supposed to be 90 degrees or mathematically undefined?

@fancidev
Copy link
Contributor Author

fancidev commented Mar 14, 2024

An example where two angles are symmetric numerically:

scipy.stats.circmean([0.32202300504740655,3.4636156586372]) returns 0.

The same data gives inf for circstd, which is mathematically (approximately) correct but we may (or may not) get rid of the RuntimeWarning caused.

@fancidev
Copy link
Contributor Author

A related corner case is that scipy.stats.circstd([0]) returns -0.0. This usually has no impact but if someone writes 1/circstd(…) there’s a chance the wrong sign gets propagated. So might worth fixing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants