@Om-A-osc
Contributor


type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:

  • task: lint_filenames
    status: passed
  • task: lint_editorconfig
    status: passed
  • task: lint_markdown
    status: passed
  • task: lint_package_json
    status: passed
  • task: lint_repl_help
    status: passed
  • task: lint_javascript_src
    status: passed
  • task: lint_javascript_cli
    status: na
  • task: lint_javascript_examples
    status: passed
  • task: lint_javascript_tests
    status: passed
  • task: lint_javascript_benchmarks
    status: passed
  • task: lint_python
    status: na
  • task: lint_r
    status: na
  • task: lint_c_src
    status: missing_dependencies
  • task: lint_c_examples
    status: missing_dependencies
  • task: lint_c_benchmarks
    status: missing_dependencies
  • task: lint_c_tests_fixtures
    status: na
  • task: lint_shell
    status: na
  • task: lint_typescript_declarations
    status: passed
  • task: lint_typescript_tests
    status: passed
  • task: lint_license_headers
    status: passed

Resolves #N/A.

Description

What is the purpose of this pull request?

This pull request introduces a new stats/strided package, dmadsorted, which computes the median absolute deviation (MAD) for sorted double-precision floating-point strided arrays.

The median absolute deviation is a robust measure of statistical dispersion that is less sensitive to outliers than variance or standard deviation. Adding dmadsorted extends the existing family of strided statistical functions (e.g., dmediansorted) and provides a commonly used robust statistic in a performance-oriented, strided form.
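
For readers unfamiliar with the statistic, a minimal reference sketch may help. It is illustrative only and is not the code proposed in this PR: the names `medianOfSorted` and `madfmNaive` are hypothetical, and the input is assumed to be sorted in ascending order along the traversal direction.

```javascript
'use strict';

// Hypothetical helper: median of a plain array already sorted ascending.
function medianOfSorted( v ) {
    var n = v.length;
    var h = n >> 1;
    if ( n === 0 ) {
        return NaN;
    }
    return ( n % 2 ) ? v[ h ] : ( v[ h-1 ] + v[ h ] ) / 2.0;
}

// Hypothetical reference implementation: gather the strided values, take
// absolute deviations from the median, sort them, and take their median.
// This costs O(N log N) time and O(N) extra memory.
function madfmNaive( N, x, strideX, offsetX ) {
    var vals = [];
    var devs = [];
    var m;
    var i;
    for ( i = 0; i < N; i++ ) {
        vals.push( x[ offsetX + ( i*strideX ) ] );
    }
    m = medianOfSorted( vals );
    for ( i = 0; i < N; i++ ) {
        devs.push( Math.abs( vals[ i ] - m ) );
    }
    devs.sort( function ascending( a, b ) {
        return a - b;
    });
    return medianOfSorted( devs );
}

// The outlier (100.0) barely moves the result:
console.log( madfmNaive( 5, [ 1.0, 2.0, 3.0, 4.0, 100.0 ], 1, 0 ) );
// => 1
```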

This PR includes:

  • A JavaScript implementation following existing stats/strided/*sorted conventions
  • A corresponding C API implementation
  • Comprehensive JavaScript tests
  • C benchmarks to evaluate performance
  • Usage examples and documentation (README and REPL help)

Related Issues

Does this pull request have any related issues?

This pull request has no related issues.

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

The implementation assumes sorted input, consistent with other *sorted strided APIs.

Checklist

Please ensure the following tasks are completed before submitting this pull request.

AI Assistance

When authoring the changes proposed in this PR, did you use any kind of AI assistance?

  • Yes
  • No

@stdlib-js/reviewers

@stdlib-bot added the Statistics and Needs Review labels on Jan 25, 2026
@Om-A-osc changed the title from "feat: add stats/strided/dmadsorted" to "feat: add stats/strided/dmadsorted ( median absolute deviation )" on Jan 25, 2026
@stdlib-bot
Contributor

stdlib-bot commented Jan 25, 2026

Coverage Report

| Package | Statements | Branches | Functions | Lines |
| --- | --- | --- | --- | --- |
| stats/strided/dmadfmsorted | 366/372 (+98.39%) | 24/28 (+85.71%) | 4/4 (+100.00%) | 366/372 (+98.39%) |

The above coverage report was generated for the changes in this PR.

---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: passed
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: na
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: na
  - task: lint_javascript_benchmarks
    status: na
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---
@Om-A-osc
Contributor Author

Om-A-osc commented Jan 28, 2026

Hey @kgryte, I hope this doesn’t already exist (I checked and couldn’t find it). The median absolute deviation (MAD) would be a really valuable addition to stats/strided. Please take a look when you get a chance.

* var v = dmadsorted( x.length, x, 1, 0 );
* // returns 1.0
*/
function dmadsorted( N, x, strideX, offsetX ) {
Member

A few comments:

  1. Use of the abbreviation mad should be reserved for "mean absolute deviation", not median absolute deviation from the median. For the latter, the abbreviation should be madfm.
  2. Your algorithm below needs work. (a) If the array is already sorted, what does this tell us about the sort order of the deviations from the median? Are they not already sorted? (b) And if they are already sorted, is it actually necessary to compute the deviations across the entire array? (c) If not, how might you compute the median absolute deviation in O(1)?

Member

Okay. Re: 2. I see that madfm requires that the deviations be sorted based on their absolute differences. Even then, allocating a generic array to handle this is not desirable. Furthermore, you should be using blas/ext/base/dsort for sorting a double-precision floating-point strided array.
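
To make the point about sort order concrete, here is a small illustrative check. The helper names `absDeviations` and `isAscending` are hypothetical and not part of the PR. It shows that the absolute deviations of a sorted array are generally not sorted themselves, although they do decrease toward the median and increase away from it.

```javascript
'use strict';

// Hypothetical helper: absolute deviations of an ascending array from its median.
function absDeviations( x ) {
    var n = x.length;
    var h = n >> 1;
    var m = ( n % 2 ) ? x[ h ] : ( x[ h-1 ] + x[ h ] ) / 2.0;
    return x.map( function deviation( v ) {
        return Math.abs( v - m );
    });
}

// Hypothetical helper: test whether an array is in non-decreasing order.
function isAscending( v ) {
    var i;
    for ( i = 1; i < v.length; i++ ) {
        if ( v[ i ] < v[ i-1 ] ) {
            return false;
        }
    }
    return true;
}

var d = absDeviations( [ 1.0, 2.0, 3.0, 4.0, 100.0 ] );
console.log( d );
// => [ 2, 1, 0, 1, 97 ]

console.log( isAscending( d ) );
// => false
```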

Member

@kgryte left a comment

Left initial comments.

@kgryte added the Feature, difficulty: 3, and Needs Changes labels and removed the Needs Review label on Jan 30, 2026
@kgryte changed the title from "feat: add stats/strided/dmadsorted ( median absolute deviation )" to "feat: add stats/strided/dmadsorted" on Jan 30, 2026
diffs = (double *)malloc( N * sizeof(double) );
if ( diffs == NULL ) {
	// Handle memory allocation failure
	return 0.0 / 0.0; // NaN
Member

I don't think this is desirable, as it conflates invalid data points with an out-of-memory error.

}

// 4. Sort this temporary array
qsort( diffs, N, sizeof(double), compare_doubles );
Member

You should be using our blas/ext/base/dsort API.

* @param offsetX starting index for X
* @return output value (median absolute deviation)
*/
double API_SUFFIX(stdlib_strided_dmadsorted_ndarray)( const CBLAS_INT N, const double *X, const CBLAS_INT strideX, const CBLAS_INT offsetX ) {
Member

Instead of dynamic memory allocation, I would suggest pushing this to userland and updating the signature to take a workspace array:

Suggested change
- double API_SUFFIX(stdlib_strided_dmadsorted_ndarray)( const CBLAS_INT N, const double *X, const CBLAS_INT strideX, const CBLAS_INT offsetX ) {
+ double API_SUFFIX(stdlib_strided_dmadsorted_ndarray)( const CBLAS_INT N, const double *X, const CBLAS_INT strideX, const CBLAS_INT offsetX, double *Work, const CBLAS_INT strideW, const CBLAS_INT offsetW ) {

where Work is a provided workspace strided array. The same applies to stdlib_strided_dmadsorted above. This is similar to how BLAS/LAPACK APIs commonly accept a workspace array provided from userland.
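
As a rough illustration of the workspace pattern, here is a JavaScript analogue of the suggested C signature. Everything in it is an assumption made for illustration: the name `madfmWithWorkspace` is hypothetical, the insertion sort is only a stand-in for a dedicated strided sort such as blas/ext/base/dsort (whose signature is deliberately not reproduced here), and the input is assumed to be sorted ascending. The function allocates nothing, and the caller owns the scratch memory.

```javascript
'use strict';

// Hypothetical sketch: median absolute deviation from the median, using a
// caller-provided strided workspace of at least N elements for the deviations.
function madfmWithWorkspace( N, x, strideX, offsetX, work, strideW, offsetW ) {
    var h;
    var m;
    var v;
    var i;
    var j;

    if ( N <= 0 ) {
        return NaN;
    }
    // Median of the sorted input (O(1)):
    h = N >> 1;
    if ( N % 2 ) {
        m = x[ offsetX + ( h*strideX ) ];
    } else {
        m = ( x[ offsetX + ( (h-1)*strideX ) ] + x[ offsetX + ( h*strideX ) ] ) / 2.0;
    }
    // Copy the absolute deviations into the workspace:
    for ( i = 0; i < N; i++ ) {
        work[ offsetW + ( i*strideW ) ] = Math.abs( x[ offsetX + ( i*strideX ) ] - m );
    }
    // Sort the workspace in place (insertion sort as a simple stand-in):
    for ( i = 1; i < N; i++ ) {
        v = work[ offsetW + ( i*strideW ) ];
        j = i - 1;
        while ( j >= 0 && work[ offsetW + ( j*strideW ) ] > v ) {
            work[ offsetW + ( (j+1)*strideW ) ] = work[ offsetW + ( j*strideW ) ];
            j -= 1;
        }
        work[ offsetW + ( (j+1)*strideW ) ] = v;
    }
    // Median of the sorted deviations:
    if ( N % 2 ) {
        return work[ offsetW + ( h*strideW ) ];
    }
    return ( work[ offsetW + ( (h-1)*strideW ) ] + work[ offsetW + ( h*strideW ) ] ) / 2.0;
}

var x = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 100.0 ] );
var work = new Float64Array( x.length );
console.log( madfmWithWorkspace( x.length, x, 1, 0, work, 1, 0 ) );
// => 1
```

One consequence of this design is that an out-of-memory condition can no longer occur inside the routine, so a NaN return value can only reflect the data.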

}

// 2. Create a temporary array to store the absolute differences
diffs = (double *)malloc( N * sizeof(double) );
Member

If you are provided a workspace array, allocation is no longer necessary. However, you will need to copy values to the workspace array.

Comment on lines 81 to 83
for ( i = 0; i < N; i++ ) {
	diffs[ i ] = fabs( X[ offsetX + (i * strideX) ] - median );
}
Member

A manual for loop is not particularly desirable. If this is a common enough operation, we should create a dedicated strided API for it.

Furthermore, you shouldn't be using fabs. You should be using math/base/special/abs. Meaning, always prefer stdlib APIs if they are available.
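
For illustration, a dedicated strided kernel for this step might look like the following JavaScript sketch. The name `absdev` and its signature are hypothetical (no such stdlib package is implied), and `Math.abs` stands in for math/base/special/abs.

```javascript
'use strict';

// Hypothetical strided kernel: out[i] = |x[i] - c| for i = 0..N-1.
function absdev( N, c, x, strideX, offsetX, out, strideOut, offsetOut ) {
    var ix = offsetX;
    var io = offsetOut;
    var i;
    for ( i = 0; i < N; i++ ) {
        out[ io ] = Math.abs( x[ ix ] - c );
        ix += strideX;
        io += strideOut;
    }
    return out;
}

var out = new Float64Array( 3 );
absdev( 3, 3.0, [ 1.0, 0.0, 3.0, 0.0, 7.0 ], 2, 0, out, 1, 0 );
console.log( out );
// => Float64Array(3) [ 2, 0, 4 ]
```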


// 1. Calculate the median of the input array X
median = API_SUFFIX(stdlib_strided_dmediansorted_ndarray)( N, X, strideX, offsetX );
if ( isnan(median) ) {
Member

Same thing. Use stdlib APIs.

mad = API_SUFFIX(stdlib_strided_dmediansorted_ndarray)( N, diffs, 1, 0 );

// Free the allocated memory
free( diffs );
Member

This isn't necessary if we support a workspace array.

---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown
    status: passed
  - task: lint_package_json
    status: passed
  - task: lint_repl_help
    status: passed
  - task: lint_javascript_src
    status: passed
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: passed
  - task: lint_javascript_tests
    status: passed
  - task: lint_javascript_benchmarks
    status: passed
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: passed
  - task: lint_c_examples
    status: passed
  - task: lint_c_benchmarks
    status: passed
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: passed
  - task: lint_license_headers
    status: passed
---
@Om-A-osc changed the title from "feat: add stats/strided/dmadsorted" to "feat: add stats/strided/dmadfmsorted" on Jan 30, 2026
@Om-A-osc
Contributor Author

Hi @kgryte, thank you for the earlier review and feedback.

The previous implementation of the median absolute deviation from the median was intentionally written for simplicity, motivated by an earlier discussion
(#dev-questions > Issue #9361 — feat: add stats/strided/dminabssorted).
However, it did have O(N log N) time complexity because it explicitly materialized and sorted the deviations.

Based on your comments, I’ve now done a full refactor and updated the implementation to use an O(log N) algorithm, which is the optimal complexity for computing MADFM for already-sorted arrays. Along with this refactor, I’ve also renamed the API to dmadfmsorted to avoid ambiguity with mean absolute deviation, and I’ve tried to reuse existing stdlib APIs wherever possible (e.g., dmediansorted) instead of rolling custom logic.

At a high level, the new algorithm works as follows:

  • Since the input array is already sorted, we first compute the median in O(1).
  • The absolute deviations from the median naturally form two monotonic sequences:
    • one on the left of the median (when traversed backwards),
    • and one on the right of the median (when traversed forwards).
  • Instead of explicitly computing and sorting all deviations, we treat these two monotonic sequences as two virtual sorted arrays and apply a binary-search-based partitioning strategy (similar to the median-of-two-sorted-arrays approach) to directly find the median of the deviations in O(log N) time.
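
The steps above can be sketched as follows. This is an independent illustration of the two-virtual-sorted-arrays idea, not the code in this PR: the names `kthDeviation` and `madfmSortedSketch` are hypothetical, the input is assumed to be sorted ascending along the traversal direction, and NaN handling is limited to propagating a NaN median.

```javascript
'use strict';

// Hypothetical helper: k-th smallest (1-based) absolute deviation from `m`.
// Virtual array A holds deviations of the left half walked backwards, and B
// holds deviations of the right half walked forwards. Both are non-decreasing.
function kthDeviation( k, nA, nB, m, x, strideX, offsetX ) {
    var lo = Math.max( 0, k - nB );
    var hi = Math.min( k, nA );
    var i;
    var j;
    var v;

    function a( t ) {
        return m - x[ offsetX + ( (nA-1-t)*strideX ) ];
    }
    function b( t ) {
        return x[ offsetX + ( (nA+t)*strideX ) ] - m;
    }
    // Binary search for the smallest count `i` taken from A such that the
    // first element left out of A is at least the last element taken from B:
    while ( lo < hi ) {
        i = ( lo + hi ) >> 1;
        if ( a( i ) < b( k-i-1 ) ) {
            lo = i + 1;
        } else {
            hi = i;
        }
    }
    i = lo;
    j = k - i;

    // The k-th smallest is the largest element among those taken:
    v = -Infinity;
    if ( i > 0 ) {
        v = a( i-1 );
    }
    if ( j > 0 ) {
        v = Math.max( v, b( j-1 ) );
    }
    return v;
}

// Hypothetical sketch: median absolute deviation from the median in O(log N).
function madfmSortedSketch( N, x, strideX, offsetX ) {
    var nA;
    var nB;
    var m;
    if ( N <= 0 ) {
        return NaN;
    }
    nA = ( N + 1 ) >> 1; // size of the left part (includes the middle element when N is odd)
    nB = N - nA;         // size of the right part
    if ( N % 2 ) {
        m = x[ offsetX + ( (nA-1)*strideX ) ];
    } else {
        m = ( x[ offsetX + ( (nA-1)*strideX ) ] + x[ offsetX + ( nA*strideX ) ] ) / 2.0;
    }
    if ( m !== m ) {
        return NaN;
    }
    if ( N % 2 ) {
        return kthDeviation( nA, nA, nB, m, x, strideX, offsetX );
    }
    // Even N: average the two middle order statistics (two O(log N) searches):
    return ( kthDeviation( nA, nA, nB, m, x, strideX, offsetX ) + kthDeviation( nA+1, nA, nB, m, x, strideX, offsetX ) ) / 2.0;
}

console.log( madfmSortedSketch( 5, [ 1.0, 2.0, 3.0, 4.0, 100.0 ], 1, 0 ) );
// => 1
```

For the example input, the virtual arrays are A = [0, 1, 2] and B = [1, 97], and the third-smallest value of their union is 1. Neither array is ever materialized.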

Because this is a complete algorithmic refactor rather than an incremental change, some of the earlier review comments no longer apply in the current version. If you have time, I’d really appreciate another look at the updated implementation when convenient.

Thanks again for the guidance; it was very helpful in pushing this toward a more optimal and idiomatic solution.

@Om-A-osc requested a review from kgryte on February 2, 2026 16:50
@stdlib-bot added the Needs Review label on Feb 2, 2026
Labels

  • difficulty: 3 (Likely to be challenging but manageable.)
  • Feature (Issue or pull request for adding a new feature.)
  • Needs Changes (Pull request which needs changes before being merged.)
  • Needs Review (A pull request which needs code review.)
  • Statistics (Issue or pull request related to statistical functionality.)

3 participants