Writing many small HDUs gets progressively slower #32
Comments
I think this is due to calling update_hdu_list after each write. It might make more sense to just append to the list rather than regenerate it from scratch, but only if it can be guaranteed there are no side effects on other extensions.
Thanks for the pointer. I see that each call to
This is significantly faster but still has the same linear increase in time per call (so quadratic growth in the total time required) with a smaller slope. I suspect this is due to the fact that you close and reopen the file after writing each HDU with a call to Line 1272 in a4b9a50
Is this always necessary? Would it be possible to provide an option to
For comparison, this test program using
The final
Right, I should have been more clear. I plan to replace the call to update_hdu_list with something that just appends to the hdu_list. This should result in a constant time for adding a new extension. I'm out sick today, so I might not get to it until tomorrow.
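To illustrate why appending should give constant time per extension, here is a toy model in pure Python. It is only a sketch and not fitsio's actual code: the function names `write_with_rebuild` and `write_with_append` are hypothetical, and "visiting" an HDU stands in for whatever work is done per existing extension when the list is regenerated.

```python
def write_with_rebuild(n_hdus):
    """Toy model: after every write, regenerate the HDU list from scratch."""
    hdu_list, visits = [], 0
    for i in range(n_hdus):
        n_in_file = i + 1          # HDUs present after this write
        hdu_list = []              # throw the old list away
        for j in range(n_in_file): # rescan every HDU in the file
            hdu_list.append(j)
            visits += 1
    return hdu_list, visits


def write_with_append(n_hdus):
    """Toy model: after every write, append only the new HDU to the list."""
    hdu_list, visits = [], 0
    for i in range(n_hdus):
        hdu_list.append(i)         # one unit of work per write
        visits += 1
    return hdu_list, visits


if __name__ == "__main__":
    for n in (500, 1000, 2000):
        _, rebuilt = write_with_rebuild(n)
        _, appended = write_with_append(n)
        print(n, rebuilt, appended)
```

In this model both strategies end with the same list, but the rebuild does n(n+1)/2 visits in total while the append does n. The caveat about side effects on other extensions still applies: appending is only safe if a write cannot change HDUs that are already in the list.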
OK, I pushed a change to master that makes writing a new image extension constant time. Currently I'm only using this to write images. If your tests show it works for that, I'll look into tables as well.
Your changes are working with my test program and fitsio is now about 10x faster for writing 2K HDUs than the astropy version above. Thanks!
I am writing a FITS file with many small HDUs and have noticed that the time to write each HDU increases ~linearly with the number of HDUs already written. The following test program shows the behavior:
I find that writing the first HDU takes ~0.02ms but that this slows down to ~1.0ms for the 2000th HDU. If you set clobber=False above, you can check that what matters is the number of HDUs in the file, not the number written by the current process.
Any idea if this is something peculiar about my system, a feature of the cfitsio library, or possibly due to the python wrapping of cfitsio? Any suggestions for avoiding this slowdown? I would like to be able to write ~100K HDUs.
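For a rough sense of what this trend would mean at ~100K HDUs, the sketch below extrapolates the total write time from the two figures quoted above (~0.02 ms for the first HDU, ~1.0 ms for the 2000th). It assumes the per-HDU cost keeps growing linearly, which is an assumption rather than something measured here; the constants and the function name `total_write_time_ms` are illustrative.

```python
# Figures quoted in the report (approximate).
T_FIRST_MS = 0.02   # time to write the 1st HDU
T_LAST_MS = 1.0     # time to write the 2000th HDU
N_MEASURED = 2000

# Assumed linear model: cost of the k-th write = T_FIRST_MS + SLOPE_MS * (k - 1).
SLOPE_MS = (T_LAST_MS - T_FIRST_MS) / (N_MEASURED - 1)


def total_write_time_ms(n_hdus):
    """Sum of the assumed per-write costs for k = 1..n_hdus (closed form)."""
    return n_hdus * T_FIRST_MS + SLOPE_MS * n_hdus * (n_hdus - 1) / 2


if __name__ == "__main__":
    for n in (2000, 100_000):
        seconds = total_write_time_ms(n) / 1000.0
        print(f"{n} HDUs: about {seconds:.0f} s in total")
```

Under this model, 2000 HDUs take about one second in total, while 100K HDUs would take on the order of 40 minutes, because the total grows quadratically with the number of HDUs.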