New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-30254: Fix jointcal crash when doing outlier rejection on only the model #182
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly looks good, but I have a few concerns with the new variable names and with the refStar treatment, as described in my inline comments. It might make sense to run cleanFittedStars at the end of _iterate_fit automatically so that in the higher level functions one doesn't have to think about whether outlier rejection is turned on in _iterate_fit.
src/FitterBase.cc
Outdated
return chi2; | ||
} | ||
|
||
std::size_t FitterBase::findOutliers(double nSigmaCut, MeasuredStarList &msOutliers, | ||
FittedStarList &fsOutliers) const { | ||
// collect chi2 contributions | ||
Chi2List chi2List; | ||
chi2List.reserve(_nMeasuredStars + _associations->refStarList.size()); | ||
chi2List.reserve(_associations->refStarList.size()); // TODO: do we need to reserve anything? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a great sense of whether anything needs to be reserved here. However, I don't understand changing this. Isn't the final size of chi2List going to be determined by accumulateStatImageList
and accumulateStatRefStars
below? Neither of these functions is being changed here, so why change the reserve
line? I assume that _nMeasuredStars
is much bigger than _associations->refStarList.size()
, so I am not sure that the reserve
command will be useful without including _nMeasuredStars
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_nMeasuredStars
was always zero (it must have been a placeholder for something, but was never implemented), so it wasn't doing anything here.
I've added Associations._maxMeasuredStars
, calculated at initialization, so we can reserve a maximally-sized vector. It'll be bigger than necessary after outlier removal, but I think that's not a big problem, since this chi2 list is a tiny part of the memory footprint (it holds a pointer and a chi2 value for each star). Reserving space here means that neither of the accumulate
methods need to ever resize the vector.
LOGLS_TRACE(_log, "Outlier refStar not removed (not fitting fittedStar/refStar component) " | ||
<< *(fittedStar->getRefStar()) << " chi2: " << chi2->chi2); | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something like "refStar is outlier but not removed when not fitting Values parameters"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a deeper level though, I think that if refStars are not removed here, they also probably shouldn't go into the calculation of the cut level. Otherwise it is going to skew the results in future iterations to still have these outlier refStars in that are going to give the chi2 distribution a higher average and sigma.
If it is not too difficult to make, it would also be interesting to see the histograms of refStar and measuredStar chi2s to see if they have similar distributions and whether it makes sense to have one cut level for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those thoughts are good ones, but I think we should investigate them separately, on some larger HSC datasets. I haven't looked at the ref vs. measured chi2 distributions much (I think I did a few years ago): I'm sure it depends strongly on what refcat is used.
If we move to an approach that involves only fitting the model, we'll want to make a deeper investigation of this question. I made a note on DM-30252.
b6e367e
to
ccc2bc3
Compare
This relates to DM-8046, unifying the astrometry/photometry fitted component names. Add parameter number debug log output to astrometry.
Include FittedStar chi2 in "no measuredStars" log message.
This method only makes sense after calling `_iterate_fit` with only "Distortions", before re-calling it with "DistortionsPositions" to "finalize" the positions. But it prevents a non-positive definite matrix situation in that case. Update test values to reflect more correct chi2 calculations now that zero-information stars are being removed.
ccc2bc3
to
090b10a
Compare
I've updated it based on our comments, including running cleanFittedStars at the end of minimize(), which altered the chi2 in a few tests. I think this actually makes them more correct, since those stars were not actually contributing any information. Please take another look. It might be worth looking at it commit-by-commit, since the commits are pretty independent. |
The new code here is not exercised by any existing tests, and it's not something I can readily mock up with a new test. It does allow us to try out an alternate fitting approach in DM-30252, and I hope that ticket produces a test case that uses this code.