
DM-30254: Fix jointcal crash when doing outlier rejection on only the model #182

Merged
merged 9 commits into master from tickets/DM-30254 on Jun 10, 2021

parejkoj (Collaborator)

The new code here is not exercised by any existing tests, and it's not something I can readily mock up with a new test. It does allow us to try out an alternate fitting approach in DM-30252, and I hope that ticket produces a test case that uses this code.

cmsaunders (Contributor) left a comment

This mostly looks good, but I have a few concerns about the new variable names and the refStar treatment, as described in my inline comments. It might make sense to run cleanFittedStars automatically at the end of _iterate_fit, so that the higher-level functions don't have to keep track of whether outlier rejection was turned on in _iterate_fit.

include/lsst/jointcal/FitterBase.h (outdated, resolved)
```diff
     return chi2;
 }
 
 std::size_t FitterBase::findOutliers(double nSigmaCut, MeasuredStarList &msOutliers,
                                      FittedStarList &fsOutliers) const {
     // collect chi2 contributions
     Chi2List chi2List;
-    chi2List.reserve(_nMeasuredStars + _associations->refStarList.size());
+    chi2List.reserve(_associations->refStarList.size());  // TODO: do we need to reserve anything?
```
Contributor
I don't have a great sense of whether anything needs to be reserved here. However, I don't understand this change. Isn't the final size of chi2List determined by accumulateStatImageList and accumulateStatRefStars below? Neither of those functions changes here, so why change the reserve line? I assume that _nMeasuredStars is much bigger than _associations->refStarList.size(), so I'm not sure the reserve call will be useful without including _nMeasuredStars.

parejkoj (Collaborator Author) commented May 27, 2021

_nMeasuredStars was always zero (it must have been a placeholder for something, but was never implemented), so it wasn't doing anything here.

I've added Associations._maxMeasuredStars, calculated at initialization, so we can reserve a maximally sized vector. It'll be bigger than necessary after outlier removal, but I don't think that's a big problem, since this chi2 list is a tiny part of the memory footprint (it holds a pointer and a chi2 value for each star). Reserving space here means that neither of the accumulate methods ever needs to resize the vector.
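A minimal sketch of the reserve-once idea, using hypothetical stand-in types (`Star`, `Chi2Entry`) rather than jointcal's real `Chi2List`. Reserving an upper bound up front means later appends never reallocate, as long as the number of entries stays at or below that bound. `maxContributions` stands in for something like `Associations._maxMeasuredStars` plus the refStar count; that pairing is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins: jointcal's real Chi2List and star types differ.
struct Star {
    int id;
};
using Chi2Entry = std::pair<const Star *, double>;  // (star, chi2 contribution)

// Fill chi2List with one entry per star after reserving `maxContributions`
// slots once. `maxContributions` plays the role of a bound computed at
// initialization (assumed: max measuredStars plus refStars).
// Returns true if the vector never had to reallocate while being filled.
bool fillWithoutRealloc(const std::vector<Star> &stars, std::size_t maxContributions,
                        std::vector<Chi2Entry> &chi2List) {
    chi2List.clear();                   // keeps any existing capacity
    chi2List.reserve(maxContributions);  // no-op if capacity is already large enough
    const std::size_t capacityBefore = chi2List.capacity();
    for (const Star &star : stars) {
        chi2List.emplace_back(&star, 1.0);  // placeholder chi2 value
    }
    // Capacity only changes when an append exceeds it, i.e. on reallocation.
    return chi2List.capacity() == capacityBefore;
}
```

Because outlier rejection only ever shrinks the set of contributions, a bound computed once at initialization stays valid for every later iteration; the cost is some unused capacity after stars are removed.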

```cpp
        LOGLS_TRACE(_log, "Outlier refStar not removed (not fitting fittedStar/refStar component) "
                                  << *(fittedStar->getRefStar()) << " chi2: " << chi2->chi2);
        continue;
    }
```
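For context, here is a hypothetical, simplified version of the selection loop this snippet appears to come from. It visits contributions in decreasing chi2 order, stops at the cut, and, when the fittedStar/refStar component is not being fit, logs refStar outliers instead of removing them. The `Contribution` struct and `selectOutliers` function are illustrative names, not jointcal's API, and the real loop has more bookkeeping than this.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical chi2 contribution: a label, its chi2, and whether it is a refStar term.
struct Contribution {
    std::string name;
    double chi2;
    bool isRefStar;
};

// Return the names of contributions to reject as outliers.
// `fittingPositions` says whether the fittedStar/refStar component is being fit.
std::vector<std::string> selectOutliers(std::vector<Contribution> contributions, double cut,
                                        bool fittingPositions) {
    // Visit the largest chi2 values first.
    std::sort(contributions.begin(), contributions.end(),
              [](const Contribution &a, const Contribution &b) { return a.chi2 > b.chi2; });
    std::vector<std::string> outliers;
    for (const Contribution &c : contributions) {
        if (c.chi2 <= c.chi2 * 0.0 + cut) break;  // sorted: everything after is below the cut
        if (c.isRefStar && !fittingPositions) {
            // Mirrors the snippet: log the refStar outlier and leave it in place.
            std::clog << "refStar is outlier but not removed: " << c.name << " chi2: " << c.chi2
                      << "\n";
            continue;
        }
        outliers.push_back(c.name);
    }
    return outliers;
}
```

With the positions component frozen, a refStar above the cut survives while measuredStar outliers are still rejected.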
Contributor
How about something like "refStar is outlier but not removed when not fitting Values parameters"?

Contributor
On a deeper level, though, I think that if refStars are not removed here, they probably shouldn't go into the calculation of the cut level either. Otherwise, keeping these outlier refStars in will skew the results of later iterations, because they raise the mean and sigma of the chi2 distribution.

If it's not too difficult, it would also be interesting to see histograms of the refStar and measuredStar chi2 values, to check whether they have similar distributions and whether it makes sense to use one cut level for both.
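The skew described in the cut-level comment can be checked numerically. This sketch assumes the cut is the mean plus `nSigmaCut` times the standard deviation of the chi2 contributions; `Chi2Contribution` and `cutLevel` are hypothetical names, not jointcal's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical chi2 entry: the value plus whether it comes from a refStar.
struct Chi2Contribution {
    double chi2;
    bool isRefStar;
};

// Assumed cut rule: mean + nSigmaCut * standard deviation (population form)
// of the chi2 values. With includeRefStars == false, refStar entries are left
// out of the statistics, as suggested for refStars that are not removable.
double cutLevel(const std::vector<Chi2Contribution> &contributions, double nSigmaCut,
                bool includeRefStars) {
    double sum = 0.0;
    double sumSq = 0.0;
    std::size_t n = 0;
    for (const Chi2Contribution &c : contributions) {
        if (c.isRefStar && !includeRefStars) continue;
        sum += c.chi2;
        sumSq += c.chi2 * c.chi2;
        ++n;
    }
    if (n == 0) return 0.0;  // no contributions: no meaningful cut
    const double mean = sum / static_cast<double>(n);
    // Clamp tiny negative values caused by floating-point rounding.
    const double variance = std::max(sumSq / static_cast<double>(n) - mean * mean, 0.0);
    return mean + nSigmaCut * std::sqrt(variance);
}
```

With four measuredStar contributions at chi2 = 1 and two refStars at chi2 = 20, a 3-sigma cut is 1.0 without the refStars but roughly 34 with them, so the refStars (and any measuredStar outliers of similar size) no longer fall above the cut.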

parejkoj (Collaborator Author)

Those are good thoughts, but I think we should investigate them separately, on some larger HSC datasets. I haven't looked at the ref vs. measured chi2 distributions much (I think I did a few years ago); I'm sure they depend strongly on which refcat is used.

If we move to an approach that involves only fitting the model, we'll want to make a deeper investigation of this question. I made a note on DM-30252.

This relates to DM-8046, unifying the astrometry/photometry fitted component names.

Add parameter number debug log output to astrometry.

Include FittedStar chi2 in "no measuredStars" log message.

This method only makes sense after calling `_iterate_fit` with only "Distortions", before re-calling it with "DistortionsPositions" to "finalize" the positions, but it prevents a non-positive-definite matrix in that case.

Update test values to reflect more correct chi2 calculations now that zero-information stars are being removed.
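A small illustration of the non-positive-definite case, assuming the normal matrix is factored with a Cholesky-type decomposition. A parameter that receives no chi2 contribution (for example, the position of a fittedStar whose measuredStars were all rejected) gives an all-zero row and column, so the factorization hits a zero pivot; dropping that parameter restores positive definiteness. The dense `choleskySucceeds` and `dropUnconstrained` helpers are hypothetical and only suitable for tiny matrices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Dense Cholesky factorization A = L L^T of a small symmetric matrix.
// Returns false as soon as a pivot is not strictly positive, i.e. when A is
// not positive definite.
bool choleskySucceeds(const Matrix &a) {
    const std::size_t n = a.size();
    Matrix l(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        assert(a[j].size() == n);  // square matrix expected
        double diag = a[j][j];
        for (std::size_t k = 0; k < j; ++k) diag -= l[j][k] * l[j][k];
        if (diag <= 0.0) return false;
        l[j][j] = std::sqrt(diag);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= l[i][k] * l[j][k];
            l[i][j] = s / l[j][j];
        }
    }
    return true;
}

// Drop parameters whose diagonal entry is zero. In a J^T J normal matrix a
// zero diagonal means the whole row and column are zero: no information.
Matrix dropUnconstrained(const Matrix &a) {
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i][i] != 0.0) keep.push_back(i);
    }
    Matrix out(keep.size(), std::vector<double>(keep.size()));
    for (std::size_t r = 0; r < keep.size(); ++r) {
        for (std::size_t c = 0; c < keep.size(); ++c) out[r][c] = a[keep[r]][keep[c]];
    }
    return out;
}
```

For example, `{{4, 2, 0}, {2, 3, 0}, {0, 0, 0}}` fails at the third pivot, while the reduced `{{4, 2}, {2, 3}}` factors cleanly.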
parejkoj (Collaborator Author) commented Jun 7, 2021

I've updated it based on our comments, including running cleanFittedStars at the end of minimize(), which altered the chi2 in a few tests. I think this actually makes them more correct, since those stars were not actually contributing any information.

Please take another look. It might be worth looking at it commit-by-commit, since the commits are pretty independent.
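A minimal sketch of what a `cleanFittedStars`-style step could look like, assuming the removal criterion is simply "no remaining measuredStars". The `FittedStar` struct here is a hypothetical stand-in holding only a measurement count; jointcal's real class and its exact removal rule may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

// Hypothetical simplified FittedStar: only the count of associated measuredStars.
struct FittedStar {
    int measurementCount = 0;
};
using FittedStarList = std::list<std::shared_ptr<FittedStar>>;

// Remove fittedStars that no longer have any measuredStars: their position
// parameters receive no chi2 contribution. Returns the number removed.
std::size_t cleanFittedStars(FittedStarList &fittedStars) {
    const std::size_t before = fittedStars.size();
    fittedStars.remove_if(
        [](const std::shared_ptr<FittedStar> &fs) { return fs->measurementCount == 0; });
    return before - fittedStars.size();
}
```

Running a step like this once at the end of `minimize()`, as described in the update, means the higher-level code doesn't need to know whether outlier rejection ran inside `_iterate_fit`.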

parejkoj merged commit 3b2d1f0 into master on Jun 10, 2021
parejkoj deleted the tickets/DM-30254 branch on June 10, 2021, 21:08