-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: add stats/strided/dmadfmsorted
#9895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
- task: lint_filenames
status: passed
- task: lint_editorconfig
status: passed
- task: lint_markdown
status: passed
- task: lint_package_json
status: passed
- task: lint_repl_help
status: passed
- task: lint_javascript_src
status: passed
- task: lint_javascript_cli
status: na
- task: lint_javascript_examples
status: passed
- task: lint_javascript_tests
status: passed
- task: lint_javascript_benchmarks
status: passed
- task: lint_python
status: na
- task: lint_r
status: na
- task: lint_c_src
status: missing_dependencies
- task: lint_c_examples
status: missing_dependencies
- task: lint_c_benchmarks
status: missing_dependencies
- task: lint_c_tests_fixtures
status: na
- task: lint_shell
status: na
- task: lint_typescript_declarations
status: passed
- task: lint_typescript_tests
status: passed
- task: lint_license_headers
status: passed
---
stats/strided/dmadsorted ( median absolute deviation )
Coverage Report
The above coverage report was generated for the changes in this PR. |
---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
- task: lint_filenames
status: passed
- task: lint_editorconfig
status: passed
- task: lint_markdown
status: na
- task: lint_package_json
status: passed
- task: lint_repl_help
status: na
- task: lint_javascript_src
status: na
- task: lint_javascript_cli
status: na
- task: lint_javascript_examples
status: na
- task: lint_javascript_tests
status: na
- task: lint_javascript_benchmarks
status: na
- task: lint_python
status: na
- task: lint_r
status: na
- task: lint_c_src
status: na
- task: lint_c_examples
status: na
- task: lint_c_benchmarks
status: na
- task: lint_c_tests_fixtures
status: na
- task: lint_shell
status: na
- task: lint_typescript_declarations
status: passed
- task: lint_typescript_tests
status: na
- task: lint_license_headers
status: passed
---
|
Hey @kgryte, I hope this doesn’t already exist (I checked and couldn’t find it). The median absolute deviation (MAD) would be a really valuable addition to stats/strided. Please take a look when you get a chance. |
| * var v = dmadsorted( x.length, x, 1, 0 ); | ||
| * // returns 1.0 | ||
| */ | ||
| function dmadsorted( N, x, strideX, offsetX ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments:
- Use of the abbreviation
madshould be reserved for "mean absolute deviation", not median absolute deviation from the median. For the latter, the abbreviation should bemadfm. - Your algorithm below needs work. (a) If the array is already sorted, what does this tell us about the sort order of the deviations from the median? Are they not already sorted? (b) And if they are already sorted, is it actually necessary to compute the deviations across the entire array? (c) If not, how might you compute the median absolute deviation in O(1)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Re: 2. I see that madfm requires that the deviations be sorted based on their absolute differences. Even then, how you are allocating a generic array for handling this is not desirable. Furthermore, you should be using blas/ext/base/dsort for sorting a double-precision floating-point strided array.
kgryte
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left initial comments.
stats/strided/dmadsorted ( median absolute deviation )stats/strided/dmadsorted
| diffs = (double *)malloc( N * sizeof(double) ); | ||
| if ( diffs == NULL ) { | ||
| // Handle memory allocation failure | ||
| return 0.0 / 0.0; // NaN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is desirable, as it conflates invalid data points with an out-of-memory error.
| } | ||
|
|
||
| // 4. Sort this temporary array | ||
| qsort( diffs, N, sizeof(double), compare_doubles ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be using our blas/ext/base/dsort API.
| * @param offsetX starting index for X | ||
| * @return output value (median absolute deviation) | ||
| */ | ||
| double API_SUFFIX(stdlib_strided_dmadsorted_ndarray)( const CBLAS_INT N, const double *X, const CBLAS_INT strideX, const CBLAS_INT offsetX ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of dynamic memory allocation, I would suggest pushing this to userland and instead update the signature to
| double API_SUFFIX(stdlib_strided_dmadsorted_ndarray)( const CBLAS_INT N, const double *X, const CBLAS_INT strideX, const CBLAS_INT offsetX ) { | |
| double API_SUFFIX(stdlib_strided_dmadsorted_ndarray)( const CBLAS_INT N, const double *X, const CBLAS_INT strideX, const CBLAS_INT offsetX, double *Work, const CBLAS_INT strideW, const CBLAS_INT offsetW ) { |
where Work is a provided workspace strided array. That also applies to stdlib_strided_dmadsorted above. This is similar to how BLAS/LAPACK APIs can commonly have a workspace array which is provided from user land.
| } | ||
|
|
||
| // 2. Create a temporary array to store the absolute differences | ||
| diffs = (double *)malloc( N * sizeof(double) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are provided a workspace array, allocation is no longer necessary. However, you will need to copy values to the workspace array.
| for ( i = 0; i < N; i++ ) { | ||
| diffs[ i ] = fabs( X[ offsetX + (i * strideX) ] - median ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A manual for loop is not particularly desirable. If this is a common enough operation, we should create a dedicated strided API for it.
Furthermore, you shouldn't be using fabs. You should be using math/base/special/abs. Meaning, always prefer stdlib APIs if they are available.
|
|
||
| // 1. Calculate the median of the input array X | ||
| median = API_SUFFIX(stdlib_strided_dmediansorted_ndarray)( N, X, strideX, offsetX ); | ||
| if ( isnan(median) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing. Use stdlib APIs.
| mad = API_SUFFIX(stdlib_strided_dmediansorted_ndarray)( N, diffs, 1, 0 ); | ||
|
|
||
| // Free the allocated memory | ||
| free( diffs ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary if we support a workspace array.
---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
- task: lint_filenames
status: passed
- task: lint_editorconfig
status: passed
- task: lint_markdown
status: passed
- task: lint_package_json
status: passed
- task: lint_repl_help
status: passed
- task: lint_javascript_src
status: passed
- task: lint_javascript_cli
status: na
- task: lint_javascript_examples
status: passed
- task: lint_javascript_tests
status: passed
- task: lint_javascript_benchmarks
status: passed
- task: lint_python
status: na
- task: lint_r
status: na
- task: lint_c_src
status: passed
- task: lint_c_examples
status: passed
- task: lint_c_benchmarks
status: passed
- task: lint_c_tests_fixtures
status: na
- task: lint_shell
status: na
- task: lint_typescript_declarations
status: passed
- task: lint_typescript_tests
status: passed
- task: lint_license_headers
status: passed
---
stats/strided/dmadsortedstats/strided/dmadfmsorted
|
Hi @kgryte thank you for the earlier review and feedback. The previous implementation of the median absolute deviation from the median was intentionally written for simplicity, motivated by an earlier discussion Based on your comments, I’ve now done a full refactor and updated the implementation to use an O(log N) algorithm, which is the optimal complexity for computing MADFM for already-sorted arrays. Along with this refactor, I’ve also renamed the API to At a high level, the new algorithm works as follows:
Because this is a complete algorithmic refactor rather than an incremental change, some of the earlier review comments no longer apply in the current version. If you have time, I’d really appreciate another look at the updated implementation when convenient. Thanks again for the guidance it was very helpful in pushing this toward a more optimal and idiomatic solution. |
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
status: passed
status: passed
status: passed
status: passed
status: passed
status: passed
status: na
status: passed
status: passed
status: passed
status: na
status: na
status: missing_dependencies
status: missing_dependencies
status: missing_dependencies
status: na
status: na
status: passed
status: passed
status: passed
Resolves #N/A.
Description
This pull request introduces a new
stats/stridedpackage,dmadsorted, which computes the median absolute deviation (MAD) for sorted double-precision floating-point strided arrays.The median absolute deviation is a robust measure of statistical dispersion that is less sensitive to outliers than variance or standard deviation. Adding
dmadsortedextends the existing family of strided statistical functions (e.g.,dmediansorted) and provides a commonly used robust statistic in a performance-oriented, strided form.This PR includes:
stats/strided/*sortedconventionsRelated Issues
This pull request has no related issues.
Questions
No.
Other
The implementation assumes sorted input, consistent with other
*sortedstrided APIs.Checklist
AI Assistance
@stdlib-js/reviewers