Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revised color model #941

Merged
merged 10 commits into from
Oct 26, 2023
Merged

revised color model #941

merged 10 commits into from
Oct 26, 2023

Conversation

jeff-regier
Copy link
Contributor

While working on a case study, I noticed that we're generating some crazy fluxes. I saw cases where the z-band flux was 10,000,000 times as large as the r-band flux. In real data, I don't think you'll find many (any?) objects where fluxes for consecutive bands vary by more than 1 or 2 orders of magnitude, so our color prior seems off here.

This PR partially fixes the problem. I retrained the color model, and revised it to be more canonical: I fit the GMM to log flux ratios for adjacent bands (rather than comparing everything to the r band). Then, in the prior, I clip flux ratios to avoid the most extreme cases. I'm not entirely satisfied with this solution but it seems like an improvement.

@sawanp813 Please check out my changes to load_ratios.ipynb notebook.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@codecov
Copy link

codecov bot commented Oct 26, 2023

Codecov Report

Merging #941 (978bc34) into master (7d93f51) will increase coverage by 0.00%.
The diff coverage is 97.82%.

@@           Coverage Diff           @@
##           master     #941   +/-   ##
=======================================
  Coverage   96.24%   96.25%           
=======================================
  Files          25       25           
  Lines        3252     3254    +2     
=======================================
+ Hits         3130     3132    +2     
  Misses        122      122           
Flag Coverage Δ
unittests 96.25% <97.82%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
bliss/convnet.py 100.00% <100.00%> (ø)
bliss/encoder.py 98.07% <100.00%> (+<0.01%) ⬆️
bliss/simulator/decoder.py 100.00% <100.00%> (ø)
bliss/simulator/prior.py 100.00% <100.00%> (ø)
bliss/simulator/simulated_dataset.py 97.87% <ø> (ø)
bliss/surveys/dc2.py 100.00% <ø> (ø)
bliss/surveys/sdss.py 95.42% <ø> (ø)
bliss/image_normalizer.py 93.42% <92.30%> (+0.17%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@jeff-regier jeff-regier merged commit 893d664 into master Oct 26, 2023
3 checks passed
@jeff-regier jeff-regier deleted the jr/color-model branch October 26, 2023 15:18
@jeff-regier
Copy link
Contributor Author

@sawanp813 Near the end of load_ratios.ipynb I may have caught/corrected a bug. It appears that the star color GMM was being saved to gal_gmm_nmgy.pkl while the galaxy color GMM was being saved to star_gmm_nmgy.pkl.

@sawanp813
Copy link
Collaborator

Changes look good - I actually was aware of that issue (seeing crazy ratios being sampled in the other bands) but didn't get around to correcting/clipping when we populate each source's flux in simulator. Thanks for catching that!

Is there a particular reason why we take the ratios relative to adjacent bands? It definitely is a reasonable method - is it just to avoid outliers solely with respect to when we take the ratios against the r-band?

@sawanp813
Copy link
Collaborator

@sawanp813 Near the end of load_ratios.ipynb I may have caught/corrected a bug. It appears that the star color GMM was being saved to gal_gmm_nmgy.pkl while the galaxy color GMM was being saved to star_gmm_nmgy.pkl.

I'm guessing I had the order of the models switched when saving via pickle - thanks for noticing that.

@jeff-regier
Copy link
Contributor Author

I'm not sure whether taking flux ratios with respect to the r band was the problem. I did it this way instead because it's more common/idiomatic to model log ratios of fluxes in adjacent bands: these are what are called "colors". The new GMM seems to be working better though. With the old one, once I started clipping unreasonable samples, a lot were being clipped. With the new GMM, it's much less frequent.

@sawanp813
Copy link
Collaborator

Right - it makes sense to consider adjacent bands since that's how we are defining colors. And if the new GMM doesn't have as many crazy samples, that's also a positive!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants