Conversation

@heltonmc
Contributor

This PR avoids the fallback to numerical integration in the transition region nu ~ x for struveh and struvek. It instead uses the expansion in terms of a Bessel function series, http://dlmf.nist.gov/11.4.E19.

This formula is somewhat tricky because it involves computing the Bessel function within the loop, and it requires a large number of terms to converge (50-150). So if we implement it naively, the function will not be fast at all. The other challenge is that the besselj provided by SpecialFunctions.jl actually allocates, so if we use it within the loop we will get many allocations.

One thing we can exploit is that we only ever need orders k and k+1 of the Bessel function, so we can use recurrence. The problem is that forward recurrence for besselj is unstable when nu > x, which is the dominant region where we will use this method. So to make the recurrence always work we need backward recurrence. But if we work backward through the loop, we need to know where to start. I picked three regions that determine how many terms to use to reach good convergence and went from there. Here is a plot of the relative errors of this method.
[Screenshot (2022-08-26): relative errors of the Bessel series method]

So as long as x is not much larger than nu (that region is covered by the large argument expansions), this gives us good convergence.
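To illustrate the backward recurrence idea (a sketch only, not the PR's actual code; it seeds the top two orders with direct besselj calls from SpecialFunctions.jl, and besselj_down is just an illustrative name):

```julia
using SpecialFunctions  # besselj used only to seed the recurrence

# Fill J_k(x) for k = 0..n by the downward three-term recurrence
#   J_{k-1}(x) = (2k / x) * J_k(x) - J_{k+1}(x),
# which stays stable in the nu > x regime where forward recurrence does not.
function besselj_down(n::Int, x::Float64)
    J = zeros(n + 1)               # J[k + 1] stores J_k(x)
    J[n + 1] = besselj(n, x)       # seed: the two highest orders
    J[n]     = besselj(n - 1, x)
    for k in (n - 1):-1:1
        J[k] = (2k / x) * J[k + 1] - J[k + 2]
    end
    return J
end

J = besselj_down(20, 5.0)          # nu > x: the downward direction is the stable one
J[1] ≈ besselj(0, 5.0)             # recurred J_0 matches a direct evaluation
```

Running the loop downward means errors in the seed values are damped rather than amplified, which is exactly why the backward direction is needed when nu > x.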
One thing is that we probably want to use Bessels.jl here, whose besselj is much faster and doesn't allocate.
Here is a benchmark:

julia> @benchmark struveh_bessel_series(50.0, 50.0)
BenchmarkTools.Trial: 10000 samples with 198 evaluations.
 Range (min … max):  442.672 ns … 508.288 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     443.960 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   444.678 ns ±   2.727 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂▃▄▆██▇▅▃▁                                 ▁▂▃▂▂▁           ▂
  ▅▇███████████▇▇▇▅▇▇▆▆▅▄▅▅▅▃▄▄▁▁▁▃▁▃▁▁▁▁▁▃▄▅▆████████▇▇▇▇▆▆▆▆▆ █
  443 ns        Histogram: log(frequency) by time        453 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@gwater
Owner

gwater commented Aug 29, 2022

Interesting. This approach seems a bit delicate — as long as it works I am okay with it though. The speed up is of course impressive. Do you have any idea if this is more or less robust / efficient than (non-adaptive) Gaussian quadrature?

Also, I am fine with depending on Bessels.jl instead of SpecialFunctions.jl—as far as I can see SpecialFunctions.jl mostly provides backwards compatibility for Base. (And please kindly restore the newline at the end of struveh.jl.)

@heltonmc
Contributor Author

heltonmc commented Aug 30, 2022

I think non-adaptive Gauss quadrature could use some further investigation, but direct integration of special functions is most of the time going to be the slowest approach, as it requires evaluating transcendental functions within a hot loop (unless the integrand converges rapidly). I set the problem up using Gauss-Laguerre integration with 250 points.

using LinearAlgebra, SpecialFunctions
using FastGaussQuadrature  # provides gausslaguerre

# Gauss-Laguerre carries the weight e^(-t), so the factor e^(-x*t) in the
# integrand is folded in as e^(t*(1 - x))
h(t, v, x) = exp(t*(1-x)) * (1 + t^2)^(v - 1/2)
#h(t, v, x) = exp((t*(1-x)) + (v-1/2)*log(1 + t^2))
function struvek_gauss(v, x; gauss_points = 150)
    t, w = gausslaguerre(gauss_points)
    return 2*dot(w, h.(t, v, x))*(x/2)^v / sqrt(pi) / gamma(v+1/2)
end
struveh_gauss(v, x; gauss_points = 250) = struvek_gauss(v, x; gauss_points = gauss_points) + bessely(v, x)

Here is a plot of the relative error...
[Screenshot (2022-08-30): relative error of the Gauss-Laguerre approach]

So, not very good at all. Even just benchmarking the dot product within the loop:

t, w = gausslaguerre(250);
v, x = 10.0, 10.0
julia> @benchmark dot($w, h.($t, $v, $x))
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.703 μs … 743.531 μs  ┊ GC (min … max): 0.00% … 99.28%
 Time  (median):     3.734 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.125 μs ±  11.758 μs  ┊ GC (mean ± σ):  4.91% ±  1.72%

  █▆▃▃▂▁ ▁▂▂▁  ▁  ▁▂▃▄▃▃▂▁▁                                   ▁
  ██████▇████████████████████▇▇▇▆▇▇▇█▇▆▆▅▅▅▅▄▄▃▃▃▅▅▄▄▃▃▃▃▃▂▃▃ █
  3.7 μs       Histogram: log(frequency) by time      5.38 μs <

 Memory estimate: 2.06 KiB, allocs estimate: 1.

This is very slow. It might be that we can set this problem up better, but it will always have a high floor because the integrand is expensive. Even with just 50 points it still takes over 700 ns:

julia> @benchmark dot($w, h.($t, $v, $x))
BenchmarkTools.Trial: 10000 samples with 104 evaluations.
 Range (min … max):  782.856 ns …  10.159 μs  ┊ GC (min … max): 0.00% … 92.08%
 Time  (median):     786.856 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   797.936 ns ± 273.259 ns  ┊ GC (mean ± σ):  1.02% ±  2.75%

   ▅▇█▇▆▇▅▄▅▄▄▄▃▂▁                  ▁▂▂▁ ▁                      ▂
  ████████████████▇█▇▆▅▆▅▃▄▃▁▅▁▁▅▁▁▇██████▇▆▇▆▆▇▄▅▃▄▄▆▃▄▅▆▅▆▅▆▆ █
  783 ns        Histogram: log(frequency) by time        832 ns <

 Memory estimate: 496 bytes, allocs estimate: 1.

Surprisingly, the Bessel series is more robust than any of these other methods (power series, asymptotic expansion, fixed-order integration), as it has the largest range of validity. The trick is setting it up so it isn't super slow, which just requires determining the number of terms in the sum ahead of time.

I think switching over to Bessels.jl would be best done in a separate PR, as there are several other things to consider (gamma, loggamma, ...) that I've been working on, so it might be best to wait until that is complete.

Also - is tagbot working now? I noticed there still wasn't a release on GitHub yet; it probably needs a retrigger after the edits.

@codecov-commenter

codecov-commenter commented Aug 30, 2022

Codecov Report

Base: 64.03% // Head: 77.46% // Increases project coverage by +13.43% 🎉

Coverage data is based on head (dac389d) compared to base (f8319fd).
Patch coverage: 78.57% of modified lines in pull request are covered.

❗ Current head dac389d differs from pull request most recent head dd0bbfd. Consider uploading reports for the commit dd0bbfd to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           master      #13       +/-   ##
===========================================
+ Coverage   64.03%   77.46%   +13.43%     
===========================================
  Files           4        4               
  Lines         228      324       +96     
===========================================
+ Hits          146      251      +105     
+ Misses         82       73        -9     
Impacted Files Coverage Δ
src/struveh.jl 87.16% <78.57%> (+16.39%) ⬆️



@heltonmc heltonmc marked this pull request as draft August 30, 2022 21:58
@heltonmc
Contributor Author

heltonmc commented Aug 30, 2022

Let me also add the large order asymptotic expansions to this. The Bessel series works very well, though for orders above 1,000 it needs more terms (>300); that is fine, we would just have to add the log series for it, but at that point the large order expansion should be very convergent, so we should use that there instead. (We should really, really favor that expansion over the Bessel series for large orders in general.)

Edit: The large order expansion also requires x to be large, so we can employ it when nu < evalpoly(x, (-3.0, 0.04, 0.00255)). That also limits the range where the Bessel series expansion is used, which allows for better control of the backward summation. I'll spend some time optimizing the large order expansion and add that soon.
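For readers not used to evalpoly: the tuple holds polynomial coefficients in ascending order, so the cutoff above reads nu < -3.0 + 0.04*x + 0.00255*x^2. A quick sanity check (cutoff is just an illustrative name, not a function in this PR):

```julia
# The cutoff condition quoted above, written out: the large order expansion
# is employed when nu < -3.0 + 0.04x + 0.00255x^2.
cutoff(x) = evalpoly(x, (-3.0, 0.04, 0.00255))

cutoff(100.0)  # -3.0 + 0.04 * 100 + 0.00255 * 100^2 ≈ 26.5
```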

@heltonmc
Contributor Author

Ok, I've added the large order and large argument expansions. This expression is slightly different from the large argument expansion for struvek, as it also works for large orders and when x < nu. I've set this up to be by far the fastest algorithm we employ.

julia> @benchmark struveh_expansion(100.0, x) setup=(x=rand()*10)
BenchmarkTools.Trial: 10000 samples with 976 evaluations.
 Range (min … max):  68.007 ns … 116.376 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     69.117 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   69.335 ns ±   1.241 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃▂    ▄██▇▆▆▃▂▁                      ▂▂▁                  ▂
  ▄▁▄██▆▅▆▄██████████▇▅▃▄▃▄▅▅▃▅▄▃▄▅▆▃▅▇▅▇██████▆▅▄▄▄▁▁▃▅▃▃▄▅▅▆ █
  68 ns         Histogram: log(frequency) by time      73.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

but of course it is only used for large orders. This is not auto-vectorizing, which cuts the speed in half; hopefully I can figure out why in the future and we can make those changes. Not really sure how often any user hits this range, and it's already much faster than anything else... but it takes care of the region where the Bessel series is slow to converge and puts a nice upper bound on where we employ it, so the number of terms we use there is more robust.

@heltonmc
Contributor Author

heltonmc commented Sep 1, 2022

I've now fixed a few issues close to the subnormal range where struveh goes to zero.
[Screenshot (2022-09-01): relative errors across the domain]

Here is a screenshot of the relative errors of the function. There is only one region where we are not getting at least 13 digits of accuracy: close to where struveh goes to zero, when x is between 200 and 400 and v is large. The issue is that the Bessel series expansion doesn't converge fast enough there, and besselj goes to zero faster than the Struve function, which makes it not very useful. I tried running the recurrence in the forward direction, but that didn't produce good results either. I've fallen back to just using the large order expansion in this region. Unfortunately, I only have about 10 terms of the large order expansion derived, and this region might require more. So in the worst case it gives about 5 digits of accuracy, but usually around 8. The absolute errors are well below 1e-270 in this region.

@gwater
Owner

gwater commented Sep 2, 2022

straight integration of special functions are most of the time going to be the slowest as it requires evaluation of transcendental functions within a hot loop

Of course, that makes perfect sense. Thanks for pointing that out.

@gwater
Owner

gwater commented Sep 2, 2022

Also - is tagbot working now? I noticed that there still wasn't a release on github yet but probably needs a retrigger after the edits.

Yeah, I don't know what's up with that. There is a release in the General registry, but tagbot is not reacting.

@gwater
Owner

gwater commented Sep 2, 2022

I think switching over to Bessels.jl would be best in a separate PR

alright

@heltonmc heltonmc changed the title Use expansion in terms of Bessel series Use expansion in terms of Bessel series and large order expansion Sep 2, 2022
@heltonmc heltonmc marked this pull request as ready for review September 2, 2022 17:40
@heltonmc
Contributor Author

heltonmc commented Sep 2, 2022

Ya, CI is finicky, but I think it looks ok now, at least on the General registry? I've added a few more tests to fix the coverage issues so that all branches are hit, and updated the docs.

But ya - there's some movement on moving the gamma functions out of Bessels.jl and SpecialFunctions.jl, so I'll make the PR here when that settles down a little bit! It'll also let me test those changes better.

@gwater
Owner

gwater commented Nov 11, 2022

@heltonmc Are you okay to merge this? (Sorry, I was distracted for a while)

@heltonmc
Contributor Author

Not a problem! I still wasn't super happy with the errors in the one region I showed above. To alleviate that, I derived more terms in the expansion, which helped shrink the range where a slight loss of accuracy occurs. Here is the updated relative error.

[Screenshot (2022-11-11): updated relative error]

So there is a small region, where the function goes to zero, where we only get about 5 digits... The issue is that the Bessel series can't converge there because the Bessel function underflows too quickly, so we can't use enough terms. This could be solved by having a scaled version of the Bessel function for large orders that doesn't underflow.

I would probably need access to the internals of Bessels.jl to make this feature work well... there is already a lot of code overlap that I've copied over here. So at this point I don't know if it would be easier to just move the Struve functions over to Bessels.jl.

@gwater
Owner

gwater commented Jan 26, 2023

For the record, I'm perfectly fine with merging this functionality into a larger package where code sharing is possible. I only started this package because, back then, there wasn't an effort like Bessels.jl.

@heltonmc
Contributor Author

heltonmc commented Jan 26, 2023

Sorry, this has really slipped by while I worked on my dissertation.

I used to be more supportive of having separate packages because of package load and compilation times, so each could be a lightweight dependency. But with v1.9 dropping soon and bringing pkgimages, this kind of math code (with only a few types) can be generated and precompiled. Load times become negligible and these special functions can share already-compiled code, so it might make sense to have this under one repo where we can better control package load time, precompilation, and of course maintenance (invalidations etc.).

I would be in favor of moving this code over as well, to take advantage of the Bessel subroutines needed to move this forward. I did think about this problem some more, and I think the best way to do it is a Miller-type scheme with downward recurrence and a normalization condition at the end. That way we normalize at the lowest order, which avoids overflow, instead of starting from terms that could potentially underflow. It will need some care, but I think that is the best way.
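A minimal sketch of the Miller-type idea, under stated assumptions (arbitrary tiny seed at a starting order N comfortably above n; normalization via the identity J_0(x) + 2*sum_{j>=1} J_{2j}(x) = 1; besselj_miller is an illustrative name, not Bessels.jl API):

```julia
# Downward recurrence from an arbitrary seed, then normalize at the end.
# Because every value is rescaled at once, the absolute size of the seed
# never matters and nothing has to start from a value that underflows.
function besselj_miller(n::Int, x::Float64; N::Int = n + 30)
    f = zeros(N + 2)               # f[k + 1] stores the unnormalized order-k value
    f[N + 1] = 1e-30               # tiny seed at order N (order N + 1 stays 0)
    for k in N:-1:1
        f[k] = (2k / x) * f[k + 1] - f[k + 2]
    end
    s = f[1] + 2 * sum(f[2j + 1] for j in 1:(N ÷ 2))   # ≈ J_0 + 2Σ J_{2j} = 1
    return f[1:n + 1] ./ s         # ≈ [J_0(x), J_1(x), ..., J_n(x)]
end

besselj_miller(5, 2.0)[1]          # ≈ J_0(2) ≈ 0.2239
```

The normalization sum is dominated by the low orders, so the scheme naturally anchors the result where the Bessel function is largest, which is exactly the property needed near the underflow region discussed above.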

@heltonmc heltonmc mentioned this pull request Apr 13, 2023
This was referenced Aug 2, 2023