Improve the speed #2
Comments
The first thing I would check is how much time the code is spending in each
function, and especially how the SED is passed around. I suspect it is
reading the SED in anew each time. Better would be to pass around an SED object
of some kind that is already in memory. Can you check the lenspop code and
figure out a workaround?
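Something along these lines might do it, as a rough sketch only: the load_sed helper, the template path, and the file format below are placeholders, not the actual lenspop API.

```python
import functools

@functools.lru_cache(maxsize=None)
def load_sed(sed_name):
    """Read an SED template from disk only the first time it is requested."""
    # Placeholder loader: the path and file format are assumptions.
    with open("templates/{}.sed".format(sed_name)) as f:
        return f.read()

# db.paint() could then call load_sed(...) freely: repeated calls with the
# same name hit the in-memory cache instead of re-reading the file.
```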
|
Hi guys,

I ran a profile (timing each of the subcalls invoked by a piece of code, to find the rate-limiting step) of the db.paint() code; see the relevant parts below. The first column shows the number of times a subroutine was called, and the second shows the total amount of time spent in that subroutine by db.paint().

I believe it shows that @jennykim1016 was right to point out the line she did above, but it's not the call to tools.ABFilterMagnitude() that takes up that much time (0.189 s). It looks like most of the time is spent calculating distance-related quantities (~2 s).

I think we could speed this up using a lookup table, that is, by pre-computing the distance modulus for a set of finely spaced redshift values, and then, for each iteration of the lens-painting, rather than going through all that again, we just look up the appropriate pre-computed value!

It's also possible that astropy.cosmology already implements a scheme like this, since its distance computations seem faster than the ones Tom implemented by hand in distances.py.

Ordered by: standard name

ncalls   tottime  percall  cumtime  percall  filename:lineno(function)
1        0.000    0.000    5.393    5.393    db.py:280(calculate_rest_frame_r_magnitude)
1        0.000    0.000    5.510    5.510    db.py:322(paint)
405      0.003    0.000    0.033    0.000    distances.py:109(distance_modulus)
2        0.000    0.000    0.000    0.000    distances.py:16(__init__)
73620    0.550    0.000    3.817    0.000    distances.py:50(comoving_distance)
2190216  1.904    0.000    1.904    0.000    distances.py:66(<lambda>)
20905    0.049    0.000    0.993    0.000    distances.py:73(comoving_transverse_distance)
20501    0.030    0.000    0.995    0.000    distances.py:87(angular_diameter_distance)
404      0.001    0.000    0.029    0.000    distances.py:92(luminosity_distance)
401      0.004    0.000    3.333    0.008    distances.py:95(comoving_volume)
52715    0.201    0.000    3.073    0.000    distances.py:99(<lambda>)
5257     0.147    0.000    0.257    0.000    fitpack.py:318(splrep)
3017     0.028    0.000    0.084    0.000    fitpack.py:533(splev)
5034     0.024    0.000    0.037    0.000    fitpack.py:616(splint)
2        0.026    0.013    0.026    0.013    fitpack2.py:1137(__init__)
73620    0.106    0.000    3.103    0.000    quadpack.py:358(_quad)
73620    0.151    0.000    3.267    0.000    quadpack.py:42(quad)
3504     0.100    0.000    3.195    0.001    quadrature.py:111(vfunc)
3103     0.031    0.000    3.270    0.001    quadrature.py:537(_difftrap)
10674    0.011    0.000    0.011    0.000    quadrature.py:563(_romberg_diff)
401      0.040    0.000    3.329    0.008    quadrature.py:589(romberg)
401      0.000    0.000    0.000    0.000    quadrature.py:82(vectorize1)
2416     0.189    0.000    0.501    0.000    tools.py:90(ABFilterMagnitude)
73620    1.093    0.000    2.997    0.000    {scipy.integrate._quadpack._qagse}
3017     0.030    0.000    0.030    0.000    {scipy.interpolate._fitpack._spl_}
5034     0.013    0.000    0.013    0.000    {scipy.interpolate._fitpack._splint}
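A rough sketch of what such a lookup table could look like, assuming astropy is available; the cosmology parameters, grid bounds, and the fast_distance_modulus name are placeholders, not the existing distances.py API.

```python
import numpy as np
from astropy.cosmology import FlatLambdaCDM

# One-off cost: evaluate the distance modulus on a fine redshift grid.
cosmo = FlatLambdaCDM(H0=70.0, Om0=0.3)   # placeholder cosmology
z_grid = np.linspace(0.001, 4.0, 4000)    # placeholder grid bounds
mu_grid = cosmo.distmod(z_grid).value     # distance modulus [mag]

def fast_distance_modulus(z):
    """Cheap linear interpolation on the pre-computed grid."""
    return np.interp(z, z_grid, mu_grid)

# Inside the paint loop, each distance modulus is then an array lookup
# instead of a fresh Romberg/quad integration.
```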
|
Thanks Mike! Tom almost certainly implemented some sort of lookup table
solution in his lenspop script. Jenny, can you see if you can find it
please?
|
Thank you so much @mbaumer and @drphilmarshall! I was just about to close this issue when I realized I had not replied to the messages.

I was not able to find lenspop's lookup table, so I made some mock dictionaries to emulate one. Speed did not increase significantly. Then I realized the decomposition itself might be the issue: every extra function call costs additional time and memory. Indeed, the slowness turned out to be caused by the function decomposition.

The only problem is that if we do not decompose the function, it looks really ugly. The screenshot below contains only a quarter of the function, and it already looks intimidating. It is up to us to choose between functionality and style. My rule of thumb is that a function longer than about 30 lines should be decomposed, but here decomposing it significantly lowers the speed. Currently, the non-decomposed […]. In addition to that, as @mbaumer pointed out, I made […].

Let me know whether we would be okay with the long, non-decomposed code. I am going to sleep right now, so it would be great if we could decide tomorrow. Thanks!
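For reference, a toy comparison of the per-call overhead; this is not the actual paint code, and the timings will vary by machine.

```python
import timeit

def helper(x):
    return x * x + 1.0

def decomposed(n):
    total = 0.0
    for i in range(n):
        total += helper(i)        # one extra function call per iteration
    return total

def inlined(n):
    total = 0.0
    for i in range(n):
        total += i * i + 1.0      # same arithmetic, no call overhead
    return total

if __name__ == "__main__":
    n = 1_000_000
    print("decomposed:", timeit.timeit(lambda: decomposed(n), number=3))
    print("inlined:   ", timeit.timeit(lambda: inlined(n), number=3))
```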
|
Excellent analysis, Jenny. I share your concerns about the lack of modularity,
but that speed-up is too good to forgo! Let's stick with the fast, long-form
method, open an issue "Improve modularity of paint method" with milestone
"Someday", and keep going. We can chat about *why* the function takes so much
longer when decomposed when we meet. Nice work!
|
The speed of the paint method in db.py is too slow; my Mac does not seem to handle it well.

Algorithm-wise, maybe I could try reducing the big-O cost by using a more efficient data structure. I am not sure how much further it can be reduced, though, because the code has to run at least N times.

The other thing I could try is multi-threading, but creating N threads would not be optimal. Maybe I could split the data into groups of fewer than 100 elements each and run the threading over those?
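A rough sketch of that chunking idea; paint_one, the chunk size, and the worker count are placeholders, and because CPU-bound Python code is limited by the GIL, a process pool is shown rather than threads.

```python
from concurrent.futures import ProcessPoolExecutor

def paint_one(item):
    # Placeholder for the per-object work done inside paint().
    return item ** 2

def paint_chunk(chunk):
    # Each worker handles one small group of objects.
    return [paint_one(x) for x in chunk]

def chunks(seq, size=100):
    # Split the input into consecutive groups of at most `size` elements.
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def paint_parallel(items, workers=4):
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for chunk_result in pool.map(paint_chunk, chunks(items)):
            results.extend(chunk_result)
    return results

if __name__ == "__main__":
    print(paint_parallel(list(range(10))))
```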