Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complex numbers #321

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from
Draft

Complex numbers #321

wants to merge 10 commits into from

Conversation

mjskay
Copy link
Collaborator

@mjskay mjskay commented Nov 20, 2023

Summary

This is a draft PR to support complex numbers in draws and rvar data types. It closes #319. The main changes, plus some points of conversation:

  • Adjust two checks in as_draws.R to allow complex values (this is the main thing that buys us support in draws formats)
  • Provide an implementation of sd(), variance(), and var() for complex values. This fixes printing issues, especially with rvars. The implementation of variance() follows the definition of the variance of a complex random variable as the sum of the variances of its real and imaginary components; and its sd is taken as the square root of this.
  • Ensure other rvar functions work with complex numbers, or (in cases of functions that should not work with complex numbers), raise an error. Functions that do not work with complex numbers include:
    • [rvar_]min/max/range - since complex numbers are not orderable, these are not defined
    • [rvar_]median - base R will compute a median of a complex number by taking the medians of each component; however, I'm not sure sure if that's correct (or if something like a geometric median would make more sense). One might argue that if someone just wanted the point estimates of the medians for each component, as in summarise_draws(), the base R approach could make sense. Conservatively I've let this throw an error, but would be curious thoughts.
    • Base R quantile() on complex numbers seems (like median) to just treat each component separately, which I think is wrong, but again might be okay for us (in rvar_quantile and [rvar_]quantile2) if people just want the separate point estimates for the two components. I'm not sure.
    • [rvar_]density, [rvar_]cdf - if implemented, these should (I think) be joint densities/cdfs for the two components, i.e. using 2d ecdfs / 2d density estimators. Absent that implementation, for now, these throw an error.
  • I'm not sure what to do about convergence metrics. At the moment, most fail because of a check in is_constant() that requires min / max, which are not defined on complex numbers. However, some succeed but I think give incorrect answers: e.g. rhat() appears to work, but I think returns incorrect values, as it relies on rank() in z_scale(), but base R's rank() ranks complex numbers in lexicographic order, when in reality it should probably return an error. I'm not sure if there's a good definition of any of these convergence metrics for complex numbers (maybe something like max rhat of both components?) or if these should all just return errors.

The main motivating example for this, inspired by @WardBrian's comment, does work now:

> x = cbind(real = rvar_rng(rnorm, 5), imag = rvar_rng(rnorm, 5, 1))
> x[,"real"] + x[,"imag"] * 1i
rvar<4000>[5,1] mean ± sd:
     real              
[1,]  0.00+0.98i ± 1.4 
[2,] -0.01+0.98i ± 1.4 
[3,]  0.01+0.97i ± 1.4 
[4,]  0.00+1.01i ± 1.4 
[5,]  0.01+0.99i ± 1.4

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

@codecov-commenter
Copy link

codecov-commenter commented Nov 20, 2023

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (9059111) 95.52% compared to head (8a371ac) 95.48%.
Report is 20 commits behind head on master.

❗ Current head 8a371ac differs from pull request most recent head 31f82f8. Consider uploading reports for the commit 31f82f8 to get more accurate results

Files Patch % Lines
R/pareto_smooth.R 82.60% 8 Missing ⚠️
R/weight_draws.R 81.81% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #321      +/-   ##
==========================================
- Coverage   95.52%   95.48%   -0.05%     
==========================================
  Files          47       47              
  Lines        3691     3789      +98     
==========================================
+ Hits         3526     3618      +92     
- Misses        165      171       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if c97b1de is merged into master:

  •   :ballot_box_with_check:as_draws_array: 101ms -> 101ms [-0.68%, +1.65%]
  •   :rocket:as_draws_df: 33ms -> 31.6ms [-6.25%, -2.21%]
  •   :rocket:as_draws_list: 182ms -> 180ms [-2.26%, -0.14%]
  •   :ballot_box_with_check:as_draws_matrix: 29.3ms -> 29.1ms [-3.99%, +2.73%]
  •   :ballot_box_with_check:as_draws_rvars: 153ms -> 154ms [-0.26%, +1.94%]
  •   :ballot_box_with_check:summarise_draws_100_variables: 708ms -> 710ms [-0.42%, +1.11%]
  •   :ballot_box_with_check:summarise_draws_10_variables: 78.4ms -> 78.4ms [-0.39%, +0.38%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f70d6b9 is merged into master:

  •   :rocket:as_draws_array: 103ms -> 101ms [-2.07%, -0.74%]
  •   :ballot_box_with_check:as_draws_df: 31.9ms -> 32.2ms [-0.34%, +2.33%]
  •   :ballot_box_with_check:as_draws_list: 184ms -> 185ms [-1.92%, +2.06%]
  • ❗🐌as_draws_matrix: 29.1ms -> 32.1ms [+4.52%, +16%]
  •   :ballot_box_with_check:as_draws_rvars: 156ms -> 158ms [-1.16%, +3.85%]
  •   :ballot_box_with_check:summarise_draws_100_variables: 720ms -> 719ms [-1.75%, +1.52%]
  •   :ballot_box_with_check:summarise_draws_10_variables: 79ms -> 79.4ms [-0.08%, +1.25%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if a76290c is merged into master:

  •   :ballot_box_with_check:as_draws_array: 100ms -> 105ms [-5.1%, +14.36%]
  • ❗🐌as_draws_df: 32.8ms -> 34ms [+2.3%, +5.21%]
  •   :ballot_box_with_check:as_draws_list: 185ms -> 184ms [-1.28%, +0.49%]
  •   :ballot_box_with_check:as_draws_matrix: 29.3ms -> 29.2ms [-2.04%, +1.58%]
  •   :ballot_box_with_check:as_draws_rvars: 153ms -> 155ms [-0.15%, +2.04%]
  • ❗🐌summarise_draws_100_variables: 713ms -> 719ms [+0.08%, +1.64%]
  •   :ballot_box_with_check:summarise_draws_10_variables: 78.7ms -> 79.1ms [-0.24%, +1.43%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@WardBrian
Copy link
Member

Does this handle the automatic re-shaping if I pass something that has colnames like x.1.real,...? Note that it is slightly subtle to do this correctly, since you end up with essentially strided draws in most ways of arranging these

@mjskay
Copy link
Collaborator Author

mjskay commented Nov 21, 2023

If I understand correctly, you're asking if it matters where in the order of dimensions the dimension indexing the real and imaginary parts is? I don't think it should make a difference; array slicing should handle that.

e.g. here's an rvar where the second dimension indexes the real/imaginary parts:

x = rvar(array(
  1:96, dim = c(4,4,2,3), 
  dimnames = list(NULL, NULL, c("real", "imag"), NULL)
))
x
#> rvar<4>[4,2,3] mean ± sd:
#> , , 1
#> 
#>      real        imag       
#> [1,]  2.5 ± 1.3  18.5 ± 1.3 
#> [2,]  6.5 ± 1.3  22.5 ± 1.3 
#> [3,] 10.5 ± 1.3  26.5 ± 1.3 
#> [4,] 14.5 ± 1.3  30.5 ± 1.3 
#> 
#> , , 2
#> 
#>      real        imag       
#> [1,] 34.5 ± 1.3  50.5 ± 1.3 
#> [2,] 38.5 ± 1.3  54.5 ± 1.3 
#> [3,] 42.5 ± 1.3  58.5 ± 1.3 
#> [4,] 46.5 ± 1.3  62.5 ± 1.3 
#> 
#> , , 3
#> 
#>      real        imag       
#> [1,] 66.5 ± 1.3  82.5 ± 1.3 
#> [2,] 70.5 ± 1.3  86.5 ± 1.3 
#> [3,] 74.5 ± 1.3  90.5 ± 1.3 
#> [4,] 78.5 ± 1.3  94.5 ± 1.3

Showing it as a draws_df shows the corresponding variable names in most formats:

as_draws_df(x)
#> # A draws_df: 4 iterations, 1 chains, and 24 variables
#>   x[1,real,1] x[2,real,1] x[3,real,1] x[4,real,1] x[1,imag,1] x[2,imag,1]
#> 1           1           5           9          13          17          21
#> 2           2           6          10          14          18          22
#> 3           3           7          11          15          19          23
#> 4           4           8          12          16          20          24
#>   x[3,imag,1] x[4,imag,1]
#> 1          25          29
#> 2          26          30
#> 3          27          31
#> 4          28          32
#> # ... with 16 more variables
#> # ... hidden reserved variables {'.chain', '.iteration', '.draw'}

We can still just slice the two components and sum them. We could also do x[,"real",] + x[,"imag",] * 1i (note the extra commas), but it's not necessary as rvar fills in missing dimensions on the end during a slice:

x[,"real"] + x[,"imag"] * 1i
#> rvar<4>[4,1,3] mean ± sd:
#> , , 1
#> 
#>      real         
#> [1,]  2+18i ± 1.8 
#> [2,]  6+22i ± 1.8 
#> [3,] 10+26i ± 1.8 
#> [4,] 14+30i ± 1.8 
#> 
#> , , 2
#> 
#>      real         
#> [1,] 34+50i ± 1.8 
#> [2,] 38+54i ± 1.8 
#> [3,] 42+58i ± 1.8 
#> [4,] 46+62i ± 1.8 
#> 
#> , , 3
#> 
#>      real         
#> [1,] 66+82i ± 1.8 
#> [2,] 70+86i ± 1.8 
#> [3,] 74+90i ± 1.8 
#> [4,] 78+94i ± 1.8

If you don't want the (now-useless) "real" dim, you can drop it:

drop(x[,"real"] + x[,"imag"] * 1i)
#> rvar<4>[4,3] mean ± sd:
#>      [,1]          [,2]          [,3]         
#> [1,]  2+18i ± 1.8  34+50i ± 1.8  66+82i ± 1.8 
#> [2,]  6+22i ± 1.8  38+54i ± 1.8  70+86i ± 1.8 
#> [3,] 10+26i ± 1.8  42+58i ± 1.8  74+90i ± 1.8 
#> [4,] 14+30i ± 1.8  46+62i ± 1.8  78+94i ± 1.8

Which also looks right if we again convert to a format like draws_df:

as_draws_df(drop(x[,"real"] + x[,"imag"] * 1i))
#> # A draws_df: 4 iterations, 1 chains, and 12 variables
#>   x[1,1] x[2,1] x[3,1] x[4,1] x[1,2] x[2,2] x[3,2] x[4,2]
#> 1  1+17i  5+21i  9+25i 13+29i 33+49i 37+53i 41+57i 45+61i
#> 2  2+18i  6+22i 10+26i 14+30i 34+50i 38+54i 42+58i 46+62i
#> 3  3+19i  7+23i 11+27i 15+31i 35+51i 39+55i 43+59i 47+63i
#> 4  4+20i  8+24i 12+28i 16+32i 36+52i 40+56i 44+60i 48+64i
#> # ... with 4 more variables
#> # ... hidden reserved variables {'.chain', '.iteration', '.draw'}

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 8a371ac is merged into master:

  •   :ballot_box_with_check:as_draws_array: 102ms -> 101ms [-2.4%, +0.52%]
  •   :ballot_box_with_check:as_draws_df: 34ms -> 31.9ms [-19.2%, +6.81%]
  •   :ballot_box_with_check:as_draws_list: 187ms -> 186ms [-2.78%, +2.06%]
  •   :ballot_box_with_check:as_draws_matrix: 29.7ms -> 29.3ms [-2.98%, +0.79%]
  •   :ballot_box_with_check:as_draws_rvars: 158ms -> 158ms [-2.47%, +2.2%]
  •   :ballot_box_with_check:summarise_draws_100_variables: 711ms -> 712ms [-0.64%, +0.86%]
  •   :ballot_box_with_check:summarise_draws_10_variables: 78.4ms -> 78.5ms [-0.22%, +0.49%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for complex numbers
3 participants