Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of a subroutine to compute the median of array elements #426

Merged
merged 33 commits into from
Jul 23, 2021
Merged
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
8dddff4
progress
jvdp1 Jun 3, 2021
dbc16af
median: add subroutine to compute median of elements of an array
jvdp1 Jun 4, 2021
5d38d82
median: mv to ord_sort + combine REAL and INTEGER procedures
jvdp1 Jun 5, 2021
935949e
median: mv to sort due to issue #428
jvdp1 Jun 5, 2021
d506f4a
median: some cleaning
jvdp1 Jun 6, 2021
2bc1a8b
median: remove trailing whitespaces
jvdp1 Jun 6, 2021
ac1a2a2
median: add check on shapes between mask and x
jvdp1 Jun 11, 2021
fdfd150
median: add pure statement
jvdp1 Jun 11, 2021
b19c537
Update src/stdlib_stats_median.fypp
jvdp1 Jun 11, 2021
8ed99fe
Update src/stdlib_stats_median.fypp
jvdp1 Jun 11, 2021
0d46361
median: update test
jvdp1 Jun 11, 2021
bad19f8
Merge branch 'median' of https://github.com/jvdp1/stdlib into median
jvdp1 Jun 11, 2021
3342c6a
median: add comments to common.fypp
jvdp1 Jun 11, 2021
929d5c1
Merge remote-tracking branch 'upstream/master' into median
jvdp1 Jun 16, 2021
7e3111e
median: replace sort to ord_sort
jvdp1 Jun 16, 2021
dfea79d
update specs
jvdp1 Jul 4, 2021
8f4a57f
modif reduce_subvector + select_subvector
jvdp1 Jul 4, 2021
4f43ce5
add comment regarding ord_sort
jvdp1 Jul 4, 2021
6644140
add comment + combined fypp loops
jvdp1 Jul 4, 2021
2ca833a
update hyperlinks
jvdp1 Jul 4, 2021
e0f68ed
add link in stdlib_stats.md
jvdp1 Jul 4, 2021
3eb4e4a
formatting
jvdp1 Jul 4, 2021
de2c419
Apply suggestions from code review
jvdp1 Jul 21, 2021
9bbcb74
median: reorder fypp variable
jvdp1 Jul 21, 2021
f17b890
median: replace _ by numbers
jvdp1 Jul 21, 2021
391c658
median: add in common.fypp where it is used for median case
jvdp1 Jul 21, 2021
afc92a2
median: add comment in test median
jvdp1 Jul 21, 2021
4d328dc
median: add warning about naive implementation
jvdp1 Jul 21, 2021
bdb47b7
median: return NaN when real array contain NaN
jvdp1 Jul 21, 2021
227f021
@ivan-pi suggestions from code review
jvdp1 Jul 23, 2021
9d38d9d
median: rename fypp RName by name
jvdp1 Jul 23, 2021
cbdc4ac
median: replace median_mask_all by median_all_mask
jvdp1 Jul 23, 2021
a13c700
Merge branch 'median' of https://github.com/jvdp1/stdlib into median
jvdp1 Jul 23, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
89 changes: 89 additions & 0 deletions doc/specs/stdlib_stats.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,10 @@ The Pearson correlation between two rows (or columns), say `x` and `y`, of `arra

`result = [[stdlib_stats(module):corr(interface)]](array, dim [, mask])`

### Class

Generic subroutine

### Arguments

`array`: Shall be a rank-1 or a rank-2 array of type `integer`, `real`, or `complex`.
Expand Down Expand Up @@ -83,6 +87,10 @@ The scaling can be changed with the logical argument `corrected`. If `corrected`

`result = [[stdlib_stats(module):cov(interface)]](array, dim [, mask [, corrected]])`

### Class

Generic subroutine

### Arguments

`array`: Shall be a rank-1 or a rank-2 array of type `integer`, `real`, or `complex`.
Expand Down Expand Up @@ -134,6 +142,10 @@ Returns the mean of all the elements of `array`, or of the elements of `array` a

`result = [[stdlib_stats(module):mean(interface)]](array, dim [, mask])`

### Class

Generic subroutine

### Arguments

`array`: Shall be an array of type `integer`, `real`, or `complex`.
Expand Down Expand Up @@ -166,6 +178,75 @@ program demo_mean
end program demo_mean
```

## `median` - median of array elements

### Status

Experimental

### Description

Returns the median of all the elements of `array`, or of the elements of `array` along dimension `dim` if provided, and if the corresponding element in `mask` is `true`.

After that the elements are sorted in an increasing order, e.g. `array_sorted =
sort(array)`, the median of the elements of `array` is defined as the "middle"
element. If `n = size(array)` is an even number, the median is:
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved

```
median(array) = array_sorted( floor( (n + 1) / 2.))
```

and if `n` is an odd number, the median is:

```
median(array) = mean( array_sorted( floor( (n + 1) / 2.):floor( (n + 1) / 2.) + 1 ) )
```

The array is sorted using the subroutine `[[stdlib_sorting(module):ord_sort(interface)]]`
provided by the `[[stdlib_sorting(module)]]` module.

### Syntax

`result = [[stdlib_stats(module):median(interface)]](array [, mask])`

`result = [[stdlib_stats(module):median(interface)]](array, dim [, mask])`

### Class

Generic subroutine

### Arguments

`array`: Shall be an array of type `integer` or `real`.

`dim`: Shall be a scalar of type `integer` with a value in the range from 1 to `n`, where `n` is the rank of `array`.

`mask` (optional): Shall be of type `logical` and either a scalar or an array of the same shape as `array`.

### Return value

If `array` is of type `real`, the result is of type `real` with the same kind as `array`.
If `array` is of type `integer`, the result is of type `real(dp)`.

If `dim` is absent, a scalar with the median of all elements in `array` is returned. Otherwise, an array of rank `n-1`, where `n` equals the rank of `array`, and a shape similar to that of `array` with dimension `dim` dropped is returned.

If `mask` is specified, the result is the median of all elements of `array` corresponding to `true` elements of `mask`. If every element of `mask` is `false`, the result is IEEE `NaN`.

### Example

```fortran
program demo_median
use stdlib_stats, only: median
implicit none
real :: x(1:6) = [ 1., 2., 3., 4., 5., 6. ]
real :: y(1:2, 1:3) = reshape([ 1., 2., 3., 4., 5., 6. ], [ 2, 3])
print *, median(x) !returns 3.5
print *, median(y) !returns 3.5
print *, median(y, 1) !returns [ 1.5, 3.5, 5.5 ]
print *, median(y, 1,y > 3.) !returns [ NaN, 4.0, 5.5 ]
end program demo_median
```

## `moment` - central moments of array elements

### Status
Expand Down Expand Up @@ -199,6 +280,10 @@ The _k_-th order moment about `center` is defined as :

`result = [[stdlib_stats(module):moment(interface)]](array, order, dim [, center [, mask]])`

### Class

Generic subroutine

### Arguments

`array`: Shall be an array of type `integer`, `real`, or `complex`.
Expand Down Expand Up @@ -264,6 +349,10 @@ The use of the term `n-1` for scaling is called Bessel 's correction. The scalin

`result = [[stdlib_stats(module):var(interface)]](array, dim [, mask [, corrected]])`

### Class

Generic subroutine

### Arguments

`array`: Shall be an array of type `integer`, `real`, or `complex`.
Expand Down
1 change: 1 addition & 0 deletions src/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ set(fppFiles
stdlib_stats_corr.fypp
stdlib_stats_cov.fypp
stdlib_stats_mean.fypp
stdlib_stats_median.fypp
stdlib_stats_moment.fypp
stdlib_stats_moment_all.fypp
stdlib_stats_moment_mask.fypp
Expand Down
6 changes: 6 additions & 0 deletions src/Makefile.manual
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ SRCFYPP =\
stdlib_stats_corr.fypp \
stdlib_stats_cov.fypp \
stdlib_stats_mean.fypp \
stdlib_stats_median.fypp \
stdlib_stats_moment.fypp \
stdlib_stats_moment_all.fypp \
stdlib_stats_moment_mask.fypp \
Expand Down Expand Up @@ -108,6 +109,11 @@ stdlib_stats_mean.o: \
stdlib_optval.o \
stdlib_kinds.o \
stdlib_stats.o
stdlib_stats_median.o: \
stdlib_optval.o \
stdlib_kinds.o \
stdlib_sorting.o \
stdlib_stats.o
stdlib_stats_moment.o: \
stdlib_optval.o \
stdlib_kinds.o \
Expand Down
57 changes: 57 additions & 0 deletions src/common.fypp
Original file line number Diff line number Diff line change
Expand Up @@ -156,4 +156,61 @@ ${prefix + joinstr.join([line.strip() for line in txt.split("\n")]) + suffix}$
#:endcall
#:enddef

#!
#! Generates an array rank suffix for subarrays along a dimension
#!
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved
#! Args:
#! varname (str): Name of the variable to be used as origin
#! rank (int): Rank of the original variable
#! dim (int): Dimension of the variable
#!
#! Returns:
#! Array rank suffix string enclosed in braces
#!
#! E.g.,
#! select_subvector('j', 5, 2)
#! -> (j_, :, j___, j____, j_____)
#!
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved
#:def select_subvector(varname, rank, idim)
#:assert rank > 0
#:call join_lines(joinstr=", ", prefix="(", suffix=")")
#:for i in range(1, idim)
${varname}$${ "_" * (i) }$
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved
#:endfor
:
#:for i in range(idim + 1, rank + 1)
${varname}$${ "_" * (i) }$
#:endfor
#:endcall
#:enddef

#!
#! Generates an array rank suffix for arrays
#!
#! Args:
#! varname (str): Name of the variable to be used as origin
#! rank (int): Rank of the original variable
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved
#! idim (int): Dimension of the variable
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved
#!
#! Returns:
#! Array rank suffix string enclosed in braces
#!
#! E.g.,
#! reduce_subvector('j', 5, 2)
#! -> (j_, j___, j____, j_____)
#!
#:def reduce_subvector(varname, rank, idim)
#:assert rank > 0
#:if rank > 1
#:call join_lines(joinstr=", ", prefix="(", suffix=")")
#:for i in range(1, idim)
${varname}$${ "_" * (i) }$
#:endfor
#:for i in range(idim + 1, rank + 1)
${varname}$${ "_" * (i) }$
#:endfor
#:endcall
#:endif
#:enddef

#:endmute
67 changes: 61 additions & 6 deletions src/stdlib_stats.fypp
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
#:set RANKS = range(1, MAXRANK + 1)
#:set REDRANKS = range(2, MAXRANK + 1)
#:set RC_KINDS_TYPES = REAL_KINDS_TYPES + CMPLX_KINDS_TYPES
#:set IR_KINDS_TYPES_OUTPUT = list(zip(REAL_KINDS, REAL_TYPES, REAL_KINDS)) + list(zip(INT_KINDS,INT_TYPES, ['dp']*len(INT_KINDS)))
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved
module stdlib_stats
!! Provides support for various statistical methods. This includes currently
!! descriptive statistics
Expand All @@ -11,14 +12,14 @@ module stdlib_stats
implicit none
private
! Public API
public :: corr, cov, mean, moment, var
public :: corr, cov, mean, median, moment, var


interface corr
!! version: experimental
!!
!! Pearson correlation of array elements
!! ([Specification](../page/specs/stdlib_stats.html#description))
!! ([Specification](../page/specs/stdlib_stats.html#corr-pearson-correlation-of-array-elements))
#:for k1, t1 in RC_KINDS_TYPES
#:set RName = rname("corr",1, t1, k1)
module function ${RName}$(x, dim, mask) result(res)
Expand Down Expand Up @@ -110,7 +111,7 @@ module stdlib_stats
!! version: experimental
!!
!! Covariance of array elements
!! ([Specification](../page/specs/stdlib_stats.html#description_1))
!! ([Specification](../page/specs/stdlib_stats.html#cov-covariance-of-array-elements))
#:for k1, t1 in RC_KINDS_TYPES
#:set RName = rname("cov",1, t1, k1)
module function ${RName}$(x, dim, mask, corrected) result(res)
Expand Down Expand Up @@ -209,7 +210,7 @@ module stdlib_stats
!! version: experimental
!!
!! Mean of array elements
!! ([Specification](../page/specs/stdlib_stats.html#description_2))
!! ([Specification](../page/specs/stdlib_stats.html#mean-mean-of-array-elements))
#:for k1, t1 in RC_KINDS_TYPES
#:for rank in RANKS
#:set RName = rname("mean_all",rank, t1, k1)
Expand Down Expand Up @@ -305,11 +306,65 @@ module stdlib_stats
end interface mean


interface median
!! version: experimental
!!
!! Median of array elements
!! ([Specification](../page/specs/stdlib_stats.html#median-median-of-array-elements))
#:for k1, t1, o1 in IR_KINDS_TYPES_OUTPUT
#:for rank in RANKS
#:set RName = rname("median_all",rank, t1, k1, o1)
module function ${RName}$ (x, mask) result(res)
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved
${t1}$, intent(in) :: x${ranksuffix(rank)}$
logical, intent(in), optional :: mask
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a question. I couldn't really grasp the purpose of median with the scalar optional mask? The false case will always give NaN, and the true case is the same as having no mask, right?

Is the use case supposed to be something like:

x = [...]
y = median(x, all(x > 0))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I'm confused about this as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to be in agreement with the API of the intrinsic sum, and the already implemented mean, var,...
Such API will indeed cover cases as proposed by @ivan-pi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is indeed the case for SUM. It would be interesting to learn the output of sum(array,.false.). Is it also NaN?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sum(array, .false.) will return 0, as following the standard.

The functions mean, var, .... will return NaN, because of mean(array, .false.) = sum(array, .false.) / 0

For median, if all elements correspond to .false., then there are no elements for median. Hence, it makes sense to return NaN IMO.

real(${o1}$) :: res
end function ${RName}$
#:endfor
#:endfor

#:for k1, t1, o1 in IR_KINDS_TYPES_OUTPUT
#:for rank in RANKS
#:set RName = rname("median",rank, t1, k1, o1)
module function ${RName}$(x, dim, mask) result(res)
${t1}$, intent(in) :: x${ranksuffix(rank)}$
integer, intent(in) :: dim
logical, intent(in), optional :: mask
real(${o1}$) :: res${reduced_shape('x', rank, 'dim')}$
end function ${RName}$
#:endfor
#:endfor

#:for k1, t1, o1 in IR_KINDS_TYPES_OUTPUT
#:for rank in RANKS
#:set RName = rname('median_mask_all',rank, t1, k1, o1)
jvdp1 marked this conversation as resolved.
Show resolved Hide resolved
module function ${RName}$(x, mask) result(res)
${t1}$, intent(in) :: x${ranksuffix(rank)}$
logical, intent(in) :: mask${ranksuffix(rank)}$
real(${o1}$) :: res
end function ${RName}$
#:endfor
#:endfor

#:for k1, t1, o1 in IR_KINDS_TYPES_OUTPUT
#:for rank in RANKS
#:set RName = rname('median_mask',rank, t1, k1, o1)
module function ${RName}$(x, dim, mask) result(res)
${t1}$, intent(in) :: x${ranksuffix(rank)}$
integer, intent(in) :: dim
logical, intent(in) :: mask${ranksuffix(rank)}$
real(${o1}$) :: res${reduced_shape('x', rank, 'dim')}$
end function ${RName}$
#:endfor
#:endfor

end interface


interface var
!! version: experimental
!!
!! Variance of array elements
!! ([Specification](../page/specs/stdlib_stats.html#description_4))
!! ([Specification](../page/specs/stdlib_stats.html#var-variance-of-array-elements))

#:for k1, t1 in RC_KINDS_TYPES
#:for rank in RANKS
Expand Down Expand Up @@ -418,7 +473,7 @@ module stdlib_stats
!! version: experimental
!!
!! Central moment of array elements
!! ([Specification](../page/specs/stdlib_stats.html#description_3))
!! ([Specification](../page/specs/stdlib_stats.html#moment-central-moments-of-array-elements))
#:for k1, t1 in RC_KINDS_TYPES
#:for rank in RANKS
#:set RName = rname("moment_all",rank, t1, k1)
Expand Down
Loading