Script to evaluate Celeste on a single frame #95
You'll also want to include a local linearized world coordinate system. I think we may need to treat large galaxies differently in more ways than subsampling. Take Andromeda, for example, and imagine hypothetically that we had enough computing power to actually fit the ELBO jointly for it and all the celestial objects in front of it. Our galaxy model is pretty primitive, and would not capture most of the variation in light from Andromeda. The ELBO would then try to explain the deviations between our primitive model of Andromeda and the actual light partly with parameters from the objects in front of it. In other words, by trying to fit the residuals from Andromeda, which will be highly structured, Celeste will bias the objects in front of it. For truly large galaxies like this, we might be better off treating the large galaxy as background noise, which we would have to estimate ourselves, possibly iteratively. One way to think of this is that our "model" of Andromeda is just a rasterized pixel map, which we would then fit jointly with all the objects in front of it. Andromeda's catalog entry would then be derived from this pixel map, rather than fit jointly with everything else. I'd be interested to hear other ideas, though. How does the current catalog handle this problem?
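The iterative background-estimation idea above could be sketched as alternating updates: hold a rasterized background map fixed while fitting the foreground objects, then re-estimate the map from the residuals. A minimal sketch, not Celeste's actual API; `fit_foreground` and `smooth` are hypothetical placeholders:

```python
import numpy as np

def fit_with_background(image, fit_foreground, n_iters=5, smooth=None):
    """Hypothetical sketch: alternate between fitting foreground objects
    and re-estimating a rasterized background map for a large galaxy."""
    background = np.zeros_like(image)
    foreground_model = np.zeros_like(image)
    for _ in range(n_iters):
        # Fit foreground objects against the background-subtracted image.
        foreground_model = fit_foreground(image - background)
        # Re-estimate the background as the residual after removing the
        # foreground model; optionally smooth it so it stays low-frequency
        # and does not absorb point sources.
        background = image - foreground_model
        if smooth is not None:
            background = smooth(background)
    return foreground_model, background
```

The smoothing step is what keeps the pixel map acting like "background" rather than swallowing the objects in front of it; how aggressively to smooth would be a modeling choice.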
I added linearized world coordinates to the list. It's hard to predict exactly how large galaxies are going to give us trouble. How about we first try to optimize each object independently, with the data encoded as described in this issue, to get familiar with running at scale and to see where the model makes mistakes? @rcthomas, is encoding the data this way something you might be up for working on, along with me and Ryan?
Oops, sorry I missed the mention. Yeah, I think I can help with this; let's talk about it at the meeting tomorrow.
I put the current version of relevant files in |
Hi @kbarbary -- I've been thinking a bit more about this issue. You might start by consolidating a lot of the scripts in the
Stages might include "preprocess" (generates a JLD file of Task objects), "infer" (runs OptimizeElbo for each Task and outputs a FITS catalog), and "score" (compares the output catalog to a catalog built from "coadd" images). Each stage reads the previous stage's output from disk and writes its own output to disk, in a file named after its stage, run, camcol, and field. Operating on run-camcol-field triples, rather than bricks/bulkans, should get us started generating big catalogs more quickly.
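The stage layout above might map to a filename convention like the following sketch; the exact names, zero-padding widths, and extensions here are assumptions for illustration, not an established Celeste convention:

```python
import os

STAGES = ("preprocess", "infer", "score")

def stage_filename(stage, run, camcol, field, outdir="."):
    """Hypothetical naming scheme: one output file per stage and
    run-camcol-field triple, so each stage can locate its predecessor's
    output on disk by name alone."""
    assert stage in STAGES
    # Assumed extensions: "infer" emits a FITS catalog, the others JLD files.
    ext = "fits" if stage == "infer" else "jld"
    return os.path.join(outdir, f"{stage}-{run:06d}-{camcol}-{field:04d}.{ext}")
```

Deterministic names like these let "infer" find the "preprocess" output, and "score" find the "infer" output, without any shared state beyond the file system.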
To reiterate a point that Rollin made now that Kyle is in the room, running
The FITS files are mostly (all?) already in the cosmo repository at
I have mapped the files that you are getting via download_fits_files.py to their locations on the NERSC global file system. See, e.g., from Cori: /global/projecta/projectdirs/sdss/data/sdss/dr8/boss/photoObj/301/1000/1. This corresponds to looking here:
Note also that data.sdss3.org is sdss3data.lbl.gov, so that is the host actually serving the data you are fetching with download_fits_files.py.
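The correspondence between download URLs and NERSC paths could be captured with a simple prefix substitution. The URL prefix below is an assumption for illustration (the exact layout under data.sdss3.org is not shown in this thread); the NERSC prefix is the one quoted above:

```python
# Assumed URL prefix -- not confirmed against the actual server layout.
URL_PREFIX = "https://data.sdss3.org/sas/dr8"
# NERSC prefix quoted earlier in this thread.
NERSC_PREFIX = "/global/projecta/projectdirs/sdss/data/sdss/dr8"

def nersc_path(url):
    """Translate a hypothetical download URL into the corresponding path
    on the NERSC global file system by swapping the prefix."""
    if not url.startswith(URL_PREFIX):
        raise ValueError(f"unexpected URL: {url}")
    return NERSC_PREFIX + url[len(URL_PREFIX):]
```

If the mapping really is a pure prefix swap like this, download_fits_files.py could be bypassed entirely when running at NERSC.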
Thanks for the additional pointers. Off-topic: it's not the highest priority, but I started working on a Julia replacement for the tractor bits, based on the Python astroquery.sdss module. That module is BSD-licensed, so it is vastly preferable to porting tractor, which is GPL (and therefore incompatible with the Celeste and SloanDigitalSkySurvey licenses). It's not much code, so it should be pretty quick.
Ryan's point is that tractor doesn't have to be a "dependency" at all. You generate the inputs once as CSV and you are done. That simplifies the problem a lot. There is a setup that works on Cori under hpcports; you run the setup once and store the result as file data.
Were we using tractor for something other than downloading FITS files?
It's a bit more than wget, but it covers only a limited subset of tractor's functionality.
Thanks, those smaller, more detailed issues are helpful. Now that the WCS dust has settled, I can move on to working on them.
Through pre-processing, transform the SDSS dataset into "tasks", one per known astronomical object. The tasks are bundled into JLD files; each bulkan/brick gets its own JLD file. For each astronomical object, the corresponding task contains everything necessary for processing that object, including
For large galaxies, we probably want downsampled pixels rather than the original pixels.
Neighboring astronomical objects are those that may contribute photons to any of the included tiles.
In the future, we may want to include just identifiers for the neighboring astronomical objects, rather than the SDSS catalog entries for those neighbors. That would allow for an iterative procedure, where better estimates of an object's neighbors' parameters also improve estimates of the object's parameters.
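Putting the pieces of this issue together, a task record might look something like the sketch below. Every field name here is hypothetical, inferred from the description above; Celeste's actual Task type would be a Julia struct serialized to JLD, so this Python version is illustration only:

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

@dataclass
class Task:
    """Hypothetical per-object inference task, sketched from this issue.
    All field names are assumptions; the real (Julia) type may differ."""
    object_id: str                  # identifier of the target object
    tiles: List[Any]                # image tiles that may contain the object's photons
    wcs_origin: Tuple[float, float] = (0.0, 0.0)  # anchor of the local linearized world coordinates
    neighbors: List[Any] = field(default_factory=list)  # neighbors' catalog entries
    # (or, per the note above, eventually just the neighbors' identifiers,
    # enabling iterative refinement across objects)
```

Storing full neighbor catalog entries makes each task self-contained for a single independent pass; switching to identifiers would trade that self-containment for the iterative procedure described above.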