SLaM script: incorrect ResultInterferometer and ResultImaging adapt_image calls #252
Thanks for the issue :). I did a lot of updates to the code recently. It sounds like you landed on the fix yourself, but post here if you have any other issues, or if the search which started appears to be stuck.
Yeah, I'm unsure on the error above, but it seems reasonable that the results folder had a few incorrect files output there and rerunning it from scratch (by renaming a few folders) would sort it. If errors persist I can do an end-to-end interferometer run myself and check that it all works smoothly.
I have pushed a fix to the repository. However, you may wish to edit your source code with the fix manually, just to make sure the code does not lose backwards compatibility with your old results (I don't think it would, but better safe than sorry). The fix requires you to go to this script in your virtual environment: https://github.com/Jammy2211/PyAutoArray/blob/main/autoarray/inversion/plot/inversion_plotters.py and go to line 282:
And replace it with the following code:
Thank you for the fast fix! Works like a charm now.
Hello again. I have noticed that my pixelized run actually started from scratch instead of continuing/finishing the run and producing all the required files, i.e. ".completed", "model.results", "search.summary", etc. So I think that, in the step of updating the final results where it produced the error last time, it automatically removed the checkpoint.hdf5 file which would have enabled it to resume from there. Now it always wants to start from the beginning, even though the sampling is supposed to be finished. The other files (see screenshots) are still there, and "samples_info.json" shows that it has all 26200 samples. Is there a way to create the checkpoint.hdf5 file from what I have in the json and csv files?
Can you try putting a ".completed" file in that search's output folder? Autolens checks if this file exists, and if it does it should bypass the search and load the result for doing stuff like visualization from the samples files on disk.
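Something like this (the path below is just a placeholder for wherever your "samples.csv" / "samples_info.json" live):

```python
from pathlib import Path

# Placeholder path: point this at the search's output folder (the one holding
# samples.csv, samples_info.json, search.summary, etc.).
output_path = Path("output/slam/source_pix[1]/<search_folder>")

# An empty ".completed" file marks the search as finished, so the result should
# be loaded from the samples on disk instead of the sampling starting over.
(output_path / ".completed").touch()
```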
That's a great idea! It works. Out of curiosity, should the second analysis for the second run in source_pix.py have a positions_likelihood from the parametric run, as the first pixelized run did? Speaking about interferometer data, of course.
You are defo gonna wanna make sure your positions_likelihood is passed into that second analysis as well.
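Roughly how this gets wired up in the SLaM-style scripts (the factor / minimum_threshold values below are placeholders, and dataset / source_lp_results are assumed to already exist in your script):

```python
import autolens as al

# Build the positions likelihood from the parametric (source_lp) result.
# The factor / minimum_threshold values here are illustrative only.
positions_likelihood = source_lp_results.last.positions_likelihood_from(
    factor=3.0, minimum_threshold=0.2
)

# The second pixelized analysis then receives it the same way the first one did.
analysis = al.AnalysisInterferometer(
    dataset=dataset,
    positions_likelihood=positions_likelihood,
)
```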
Strange, are you able to input one manually? No idea why that exception is being raised, I'll note it's a bug. Alternatively, you can probably also disable it by changing the following setting in the config files:
Two ways worked for me. One is:
And the alternative is setting the config option mentioned above. While it is running now, it is only using a single core and only spawns one thread to generate the initial samples of the sampler.
I already have set the number of cores. Edit: Nvm, now it did spawn the other threads. However, it is only using a single core. Before, I've only seen this when the multiprocessing didn't work. I tested DynestyStatic on my laptop and the multiprocessing works fine (for "Imaging" data); all 4 cores I set up were being used.
Not currently, I think; I can raise an issue on it. I'm unsure why the threading isn't quite doing what you'd expect for your run.
Edit 2: This might be related to the code running on an HPC. I did the same test that I did on the laptop on an HPC and still had the same issue. On the HPC, changing the settings made no difference. It's a bit strange that you can set the number of cores and still end up using only one.
I think that would be great for the future, for using ALMA datasets with millions of visibilities. But compared to total computation time it's probably only a minor time saver; great for testing though.
Can I just check: have you set up all this stuff correctly?
I will raise an issue on this. I think part of the problem is that the code is not displaying its setup to you, and you're having to do annoying things to check if the parallelization is actually working.
Thanks! In the sh file for starting the run I write:
So that should be fine.
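For completeness, the thread-limiting setup usually recommended for runs like this goes at the top of the Python runner script, before numpy is imported; the exact variables below are the usual suspects rather than anything confirmed in this thread:

```python
import os

# Limit the threads spawned by BLAS / OpenMP libraries so that parallelism is
# controlled only by the search's number of cores. Set these before importing numpy.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
```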
Small update:
I don't think we ever implemented a parallel initializer; got an issue up now:
How many visibilities are you modeling? Can you send the full information of the run? With a time per sample of 2 mins there probably isn't much you can do to speed things up by changing Dynesty. We are working on a faster visibilities fitting method but... it's not an easy problem lol.
If you input an inaccurate lens model into a fit, it can often lead to a linear algebra error. This wouldn't happen if you input an "accurate" lens model. The input thing at the start obviously relies on the random model you put in working... I forgot this could happen when I wrote the docs.
I'm a bit confused, why is this giving a fit time of 1.45 if your actual runs are > 2 mins? Are you inputting the same data? Can you send the output?
I just realised this today and was able to run the script to get the fit time. That check is not part of the settings for the analysis object that is input into the run_time calculation, and since it doesn't check now, I got the fit time without the error. I was inputting the same data, however I used slightly different settings. From the run_times output:
Can you send the SLaM script you are running plz?
I basically want to check if your SLaM run is using these settings, which seemingly gave you the fit time of 1.45 seconds:
Main script:
source_lp2Lenses script:
And lastly source_pix2Lenses script:
Finally, can you send me the script where you get a Fit Time of 1.45?
I have overwritten it by now but I should be able to recreate it relatively quickly.
Great. Your scripts look correct (e.g. the settings are what I'd expect). I am now wondering if when you did the profiling test to get 1.45 seconds, your settings were different from those used in the SLaM run. Your real space mask is very high resolution (e.g. it has a lot of image pixels), so the slow run times are not that surprising to me.
Ok, I was able to recreate it:
From the following code:
I suspect the fit time being 1.45 seconds is due to an input error, albeit tracing the exact root cause would be a bit of a nightmare. Are you able to simply input your data and mask into this function with your settings to get the fit time? It does not matter if things like the lens model are not exactly right. Obviously if you get a linear algebra error then this is no good.
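Something along these lines (not the exact run_times script; dataset is assumed to be the Interferometer object already built with your real-space mask, and the lens model / pixelization values are placeholders):

```python
import time
import autolens as al

# Placeholder pixelization: an Overlay image mesh does not need an adapt image,
# so it is convenient for a one-off timing test.
pixelization = al.Pixelization(
    image_mesh=al.image_mesh.Overlay(shape=(30, 30)),
    mesh=al.mesh.Delaunay(),
    regularization=al.reg.Constant(coefficient=1.0),
)

# Any reasonable lens model works; an inaccurate one may trigger the linear
# algebra error mentioned above. (Older PyAutoLens versions use
# al.Tracer.from_galaxies instead of al.Tracer.)
tracer = al.Tracer(
    galaxies=[
        al.Galaxy(redshift=0.5, mass=al.mp.Isothermal(einstein_radius=1.6)),
        al.Galaxy(redshift=1.0, pixelization=pixelization),
    ]
)

start = time.time()
fit = al.FitInterferometer(dataset=dataset, tracer=tracer)
fit.figure_of_merit  # forces the full likelihood evaluation
print(f"Fit time: {time.time() - start:.2f} s")
```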
Not sure if I can directly translate this. For the Hilbert image mesh, I need an adapt image. I can give you the results for an Overlay with shape (34,34) for now.
Yep, the Overlay is fine for this check.
Results from the test:
Fit time is long, as it should be, I suppose.
Ok, at least we have confirmed things are behaving as expected now. The fit times of 1.45 were due to an input bug; I sort of know the problem (it's to do with the settings that were input). I anticipate your run times basically scale directly with the number of visibilities (albeit you may as well quickly check that changing the real space mask doesn't change them much). At this point, I think your options are to either reduce the number of visibilities to do the fit, or bite the bullet and accept these extremely long run times. (I don't know what your science case is, but you can probably devise a strategy which fits with a reduced number of visibilities and then reconstructs the data and source with the full amount, using the max LH lens model.)
Thank you for the support and explanations.
Good to know, should I encounter this again sometime. I'll play around and see where I can optimize.
This sounds like a possible way forward, yes. That would mean doing a parametric and the first pixelization run with the reduced number of visibilities and then using that lens model for the second pixelization with the "full" number of visibilities, correct? Basically, I need to resolve as much as possible and follow the differential magnification of the image back to the spatial distribution of the source. Most likely, I will be able to use up to 32 cores and do a bunch of runs in parallel on different nodes on the HPC that I am using. This will be necessary since I want to do this for four galaxies with multiple continuum datasets and for a couple of emission/absorption line bins. Though for most of them, I can use the max LH lens model from the previous fits.
I think you need things to run fast all the way through to the end of the pipeline with the reduced number of visibilities. I would then, at the end, output the quantities you need using the full visibilities dataset, using the max LH lens model. You can then get on with figuring out your source analysis scripts using these reconstructions, and make sure everything is looking like it'll work fine. In the meantime, whilst doing that, you can set off some runs with the full number of visibilities.
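A minimal sketch of that final step, assuming result is the finished reduced-visibility search and dataset_full is the un-averaged Interferometer dataset:

```python
import autolens as al

# Take the maximum-likelihood lens model from the fast, reduced-visibility fit...
tracer = result.max_log_likelihood_tracer

# ...and evaluate it once against the full dataset to produce the final
# reconstructions and quantities, without ever searching over the full data.
fit = al.FitInterferometer(dataset=dataset_full, tracer=tracer)
print(fit.figure_of_merit)
```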
Thank you for the advice! Seems like a good plan to move forward with for now.
So this comment has occupied me for a little while. The shape_native parameter in the real space mask certainly does impact the fit time a bit. But it's definitely the visibilities that dominate:
And the polarization-averaged version with ~2.5 mil visibilities:
~5mil visibilities
~2.5mil visibilities
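For context, this is roughly where the two knobs enter when the dataset is built (the shape, pixel scale and radius below are placeholders, and the visibility, noise-map and uv-wavelength arrays are assumed to be loaded already):

```python
import autolens as al

# The real-space mask sets the image-plane grid the NUFFT maps onto; its
# shape_native / pixel_scales affect the fit time, but the number of
# visibilities in the dataset dominates.
real_space_mask = al.Mask2D.circular(
    shape_native=(400, 400), pixel_scales=0.025, radius=4.0
)

dataset = al.Interferometer(
    data=visibilities,            # assumed loaded, e.g. from .fits files
    noise_map=noise_map,
    uv_wavelengths=uv_wavelengths,
    real_space_mask=real_space_mask,
    transformer_class=al.TransformerNUFFT,
)
```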
Another question I had was related to the PyNUFFT paper, as they use very detailed masking for their image. I can already implement quite complex masks.
Is this the correct behavior, or is the mask ignored somehow?
Yeah, so now that I've had a chance to think it over, your profiling all looks correct: at high numbers of visibilities (e.g. > 1 million) everything is fast compared to the NUFFT, which is called many times for a pixelized source. We are working on a solution to this (see JAX lol) but it's a while off.
I would expect that it does speed things up (surprising it didn't for you), but as you've shown above, in terms of run times it's secondary to calculations which scale with the number of visibilities. So I'm not sure you gain much using a tighter mask. Visibility-associated calculations are performed N times, where N is the number of source pixels. So reducing this might speed things up, but again I doubt it'll move you from the world of "really really slow".
Thank you for your answer. I am still testing how much quicker it runs on the HPC when reducing the visibilities. After quite a bit of testing, I was able to comfortably reduce the dataset to ~675000 visibilities without losing very important information. Amazingly, with the JAX fix I can finally just run the new reduced dataset with and without the advanced masking in parallel on different nodes. Having reduced the visibilities again, I got down to fit times of ~90s from the run_times.py test. So at the very least the fit time was cut in half.
So from additional testing, I have found that most of the time saved can be achieved through reducing the image mesh pixels. I wasn't sure if this is what you meant when talking about the number of source pixels. Going from a shape of 32x32 to 16x16 did cut the total time in half. I will see how this applies to the different datasets as well.
It looks like the better the mask is, the lower I can make the image mesh shape and still reconstruct the source nicely.
Are you using that option? Ah, it requires a circular mask, so it would require you to change to that. The flip side is the run time benefit it gives you.
Yes, I am using it.
I would stick to a single mask; whilst it may be possible, things like the regularization parameters depend on the mask, and trying to swap masks during a fit sounds error prone.
To summarize: since I cannot further average the visibilities without losing important information for my science, the only other factors significantly affecting run times are the real space mask (its shape_native and pixel_scales) and the image mesh pixels. These three I have to balance somehow and still get good source reconstructions and run times, correct? Also, the fit time from run_times.py seems to track this.
I think what you are saying is correct. The one problem I can imagine happening is that you compute an adapt image with one setup and then reuse it with a different one, which could cause issues.
Yes, this sounds like a good summary. We really need to speed up the interferometer code 🤣 I would be inclined to suggest you go down the route of reducing the image mesh (Hilbert) pixels.
Is this for different mass models? I would guess that run time can change a lot depending on what the mass model does.
Can you help me understand how adapt_image works for interferometric datasets? I thought that the adapt_image is the image plane best fit of the previous search that is then inverted again to visibilities, which are then used in the analysis object for the current run. So it should produce visibilities that we want to fit similarly to. However, once you invert these visibilities again we should automatically lose all information about the mask, since the resulting image has to be continuous (due to the overlapping fringes), no?
I wish it wasn't so, because this will be very difficult to balance. Here's hoping that there is a solution for this somewhere 🙏 lol
Unfortunately, I won't get enough speed out of just reducing the Hilbert pixels. source_lp was quick at ~5h, source_pix[1] took ~20 days and source_pix[2] 22 days with the standard settings, with mass_total still running. Even if this is sped up by a factor of 3, we're still talking about more than a month. And there will be ~4-12 datasets that will have to go through this full pipeline. So at least the real_space_mask also needs adjusting (in shape and pixel_size).
Honestly, it might have been human error lol. It might have been related to using a fixed mass model versus a free one, but I did so many tests and didn't document them in enough detail for this. Let's disregard this for now.
I finally bit the bullet and reduced the circular mask. Did you already find a way to set the image_mesh that optimizes run_time and has the source be well enough sampled in the pixelated grid?
I think this sounds correct; I'm not the interferometry expert so can't give you a full answer! I don't think larger masks will mean more over-fitting of noise; the mask essentially defines in real space where you are modeling the lens.
Yep, it's not ideal, but interferometry is just very slow :(. At least you have a clear assessment of the run time options now; I will keep trying to think of a way for you to speed it up.
Sounds good. It depends on your lens, but 5.5" still sounds very large to me; I'm used to ALMA lenses being < 3.0" and therefore masks of that size being appropriate. What are the Einstein radii of your lenses?
This isn't really possible directly; the mesh adapts via the adapt image rather than a setting you tune by hand. Make sure you are using these settings in your SLaM pipeline, as they ensure the Hilbert mesh adapts better:
Thank you for the extensive answers!
Thank you for the explanation. Is the adapt image then only used for creating the pixelization grid, i.e. to not pick up any spurious signal or noise we might get from using the previous dirty image? Wouldn't it be better to do this in uv space then, since the dirty image, model or data, always has correlated noise? Or is there an S/N cut you apply to not pick up the noise somehow?
Good to know!
No worries, I am also still learning about it. Usually, when going from uv space to image space, both image size and pixel size are related to the size of the uv grid and its cell size, i.e. how the data is sampled. So is it correct that the real space grid's shape_native and pixel_scales can be chosen independently of how the data is sampled?
The Einstein radii are 2.4868" and 0.1686". And the image is a little bit off center, with a 4.0" circular mask at the center being sufficient in picking up all important flux. However, it was not good in run time to use this mask. Somehow a 5" or 5.5" mask worked better than a 4" or 4.5" mask. Even 8" was pretty fast. In the end, I guess the run time then depends on the image mesh size.
Good! This is what I assumed was the case. With your quoted settings, I should get at least a factor of ~5 more pixels for the bright regions, which should be plenty for a magnification of factor ~20. Since I didn't mention it yet (didn't want to tempt fate lol), with the above changes I am looking at a factor of 5 improvement for source_lp, and even a factor of ~10 improvement in run time for source_pix[1]. That's already fantastic, as 2 days run time is so much better than 20 lol. But it isn't finished yet.
The adapt image is used to:

- adapt the image mesh (e.g. the Hilbert mesh), so that source pixels congregate in the brightest regions of the reconstructed source;
- adapt the regularization, so that less smoothing is applied to those bright regions.

Both of the tasks above are controlled by parameters that are fitted for in the pixelized searches.
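As a rough illustration of where those pieces live (the pixel count below is a placeholder):

```python
import autolens as al

# The Hilbert image mesh distributes its pixels according to the adapt image, so
# bright source regions get more pixels; AdaptiveBrightnessSplit likewise lowers
# the regularization there. Both have parameters that are fitted for in the
# pixelized searches.
pixelization = al.Pixelization(
    image_mesh=al.image_mesh.Hilbert(pixels=1000),
    mesh=al.mesh.Delaunay(),
    regularization=al.reg.AdaptiveBrightnessSplit(),
)

source = al.Galaxy(redshift=1.0, pixelization=pixelization)
```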
This sounds correct.
Strange, I guess FFT run times depend on crazy things like whether certain arrays are powers of 2 lol. You should probably try a smaller mask for the 0.1686" lens; it feels like you're making the calculation slower than necessary.
Yep, this sounds realistic. We have done a lot of careful crafting of our settings in the past, and this took us from months to days... so it's defo doable! Apologies it's such a nightmare though, and the docs are still a bit lacking.
Thanks again for the quick reply! It's so helpful!
Eventually, when I get to my lower S/N data this might come up again, but I hope for now this will be the case throughout.
This is great! Because then we can just push shape_native down to the FOV of the antenna at that frequency for a pixel_scale with 5 pixels/beam, use the smallest circular mask possible that captures the emission, and then adapt the image mesh.
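A tiny sketch of that sizing arithmetic (the numbers are made up):

```python
# Placeholder values: synthesized-beam FWHM and primary-beam FOV in arcsec.
beam_fwhm = 0.15
fov = 20.0

pixel_scale = beam_fwhm / 5.0                 # ~5 pixels per beam
shape_native = int(round(fov / pixel_scale))  # pixels across the real-space grid

print(pixel_scale, shape_native)  # 0.03 arcsec/pixel, ~667 pixels
```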
Oops, my mistake, this is just the other lens galaxy in the same image. There are two with different Einstein radii in the same image with some offset. The other Einstein rings for the other datasets are all either smaller or minimally larger.
Thank you so much for your support! This is so much better than all the other lensing codes I tried before. Your code already has good documentation and I really appreciate your help! Doing this on my own would take many more months without your input.
Running the SLaM script source_pixelized.py on my own data, I found that line 77 in source_pix.py in slam/ produces the following error:
(screenshot of the error traceback)
The issue may be that it calls the adapt_images from ResultInterferometer objects incorrectly. Looking into ResultInterferometer, I found that this class has the "adapt_images_from" function. Replacing any instance of "source_lp_results.last.adapt_images" in source_pix.py (and equivalent calls further down the pipeline!) with "source_lp_results.last.adapt_images_from(use_model_images=True)" appears to be a fix. However, I only see that the source_pix search is running and don't have results yet.
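Concretely, the replacement looks like this:

```python
# Before (raises the error shown in the screenshot):
adapt_images = source_lp_results.last.adapt_images

# After (workaround: build the adapt images explicitly from the model images):
adapt_images = source_lp_results.last.adapt_images_from(use_model_images=True)
```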
I tested whether the adapt_image problem also appears for the imaging test data "simple__no_lens_light", and it produced the same error:
(screenshot of the same error for the imaging dataset)
However, I don't have a workaround in this case.