Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-8951: Add TE1 and TE2 KPMs to validate_drp #31

Merged
merged 42 commits into from Sep 8, 2017
Merged

Conversation

wmwv
Copy link
Contributor

@wmwv wmwv commented Feb 3, 2017

No description provided.

@wmwv wmwv force-pushed the tickets/DM-8951 branch 5 times, most recently from 0d8020a to d73699e Compare July 31, 2017 20:57
@wmwv wmwv force-pushed the tickets/DM-8951 branch 2 times, most recently from 769dea2 to 3b1c927 Compare August 5, 2017 00:37
Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. Mostly missing or unclear docstrings, and some tests that should be added.

One of the questions relates to execution speed, and you can feel free to tell me that it's irrelevant in the context of validate_drp.

etc/metrics.yaml Outdated
description: Separation distance
bin_range_operator:
value: "<="
description: Are we looking at correlations less than D or greater than D.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be phrased as a question?

etc/metrics.yaml Outdated
description: Separation distance
bin_range_operator:
value: ">="
description: Are we looking at correlations less than D or greater than D.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same phrasing question as above.

@@ -8,3 +8,4 @@
from .amx import AMxMeasurement # NOQA
from .afx import AFxMeasurement # NOQA
from .adx import ADxMeasurement # NOQA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need # NOQA here? I thought we just ignored the flake8 comments on __init__.py since it's a weird file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#NOQA for from .tex is in keeping with previous practice for these SRD metrics. Happy to consider overall #NOQA use in future ticket.

----------
metric : `lsst.validate.base.Metric`
An TE1 or TE2 `~lsst.validate.base.Metric` instance.
matchedDataset : lsst.validate.drp.matchreduce.MatchedMultiVisitDataset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matchedDataset of what?

filter_name : `str`
filter_name (filter name) used in this measurement (e.g., ``'r'``).
width : `float` or `astropy.units.Quantity`, optional
Width around fiducial distance to include. [arcmin]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

width doesn't appear in the argument list.

@@ -140,6 +148,32 @@ def sphDist(ra1, dec1, ra2, dec2):
return dist


def averageRaFromCat(cat):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though validate_drp isn't a particularly time-critical operation, I do worry that your use of these "only RA" , "only Dec" functions is slow, given you're throwing away the other average. I see you're using it in the aggregator for TEx: how many values are typically input?

return meanRa


def averageDecFromCat(cat):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least a one-line docstring for all of these new functions.

outputPrefix=outputPrefix)
except RuntimeError as e:
print(e)
print('\tSkipped plot{}'.format(texName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be using the lsst.log system to log these errors?

self.assertFloatsAlmostEqual(exp_all_avg_xip, obs_all_avg_xip, rtol=1e-7)
self.assertFloatsAlmostEqual(exp_all_avg_xip_err, obs_all_avg_xip_err, rtol=1e-7)

exp_rlt2_avg_xip, exp_rlt2_avg_xip_err = 2.8171429e-05, 1.2857143e-06
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a note about how these numbers were arrived at.

select_bin_from_corr(r, xip, xip_err, radius=1, operator=operator.ge)
self.assertFloatsAlmostEqual(exp_rge1_avg_xip, obs_rge1_avg_xip, rtol=1e-7)
self.assertFloatsAlmostEqual(exp_rge1_avg_xip_err, obs_rge1_avg_xip_err, rtol=1e-7)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also want some tests for your correlation_function_ellipticity* functions, and probably for the added utils functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Tests added for correlation_function_ellipticity.

The utility functions that require a MatchedMultiVisitDataset object, a catalog, or ashape object are much more complicated to mock up than seems justified given the simplicity of the functions.

print("EXP_xip_err: ", repr(exp_xip_err))
print("OBS_xip_err: ", repr(obs_xip_err))
self.assertFloatsAlmostEqual(exp_r, obs_r, atol=1e-4 * u.arcmin)
self.assertFloatsAlmostEqual(exp_xip, obs_xip)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, you're expecting the correlation function to be identically 0 everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sorry. The tests are a work in progress that I started after submitting the ticket for review and realizing that I needed some tests. I'm hoping to have them fully implemented by Monday.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry! Carry on then. I won't look at things until you tell me to.

@wmwv wmwv force-pushed the tickets/DM-8951 branch 2 times, most recently from 403218b to 281eb19 Compare August 9, 2017 13:31
Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments on the new code. I like your approach to the ellipticity tests.

self.assertFloatsAlmostEqual(exp_rlt2_avg_xip, obs_rlt2_avg_xip, rtol=1e-7)
self.assertFloatsAlmostEqual(exp_rlt2_avg_xip_err, obs_rlt2_avg_xip_err, rtol=1e-7)

exp_rge1_avg_xip, exp_rge1_avg_xip_err = 4.3400000e-05, 1.8000000e-06
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note about where these numbers come from.

Which in turn references
http://adsabs.harvard.edu/abs/2002A%26A...389..729S
"""
rand_seed = 1238625876
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this choice for random seed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arbitrary.

random.seed(rand_seed)
# Yes, a million. N this high and L this large
# gets xip within 1e-7 absolute of the analytic limit.
# Takes 25 seconds to run on an early-2015 MacBook Air 2.2 GHz Intel Core i7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful comment.

"""
# Translate to 'verbose_level' here to refer to the integer levels in TreeCorr
# While 'verbose' is more generically what is being passed around
# for verbosity within 'validate_drp'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the extra spaces at the start here?

xip = gg.xip * u.Unit('')
xip_err = np.sqrt(gg.varxi) * u.Unit('')

return (r, xip, xip_err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to return a namedtuple, so the user doesn't have to keep track of which index is what thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree, but I've been generically discouraged from using namedtuple in favor of pipe_base struct. But I don't want to invoke pipe_base struct here.

@@ -552,3 +552,62 @@ def plotAMx(amx, afx, filterName, amxSpecName='design', outputPrefix=""):
plt.tight_layout() # fix padding
plt.savefig(plotPath, dpi=300)
plt.close(fig)


def plotTEx(tex, filterName, texSpecName='design', outputPrefix=''):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't this be plot_TEx?


ax1.legend(loc='upper right', fontsize=16)

pathFormat = '{prefix}{metric}_D_{D:d}_{Dunits}.{ext}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. A comment to that effect would be good (prefix suggests directory to me).

This is based on Bob Armstrong's ellipticity calculation.
done in DM-3040
Remove PLOTting hardcoded stuff from tex.py
Comment TE plot lines.
Minor whitespace edits.
This matches the log radius binning that was used when computing the
correlation function in the first place.
Refactoring initially motivated by desire to make it easier to test the key algorithm
without having to construct a MatchedMultiVisitDataset object.
Also may allow for future external use from more flexible/direct interface.
Use example test case from TreeCorr to verify wrapper function.
TreeCorr itself has much more extensive tests.
The test here in validate_drp is to make sure that we're calling TreeCorr
correctly in correlate_function_ellipticity and, e.g., getting units correct.

Also adds a simple test of select_bin_from_corr.
Make nbins, min_sep, max_sep, verbose keyword arguments
passed through to treecorr.GGCorrelation
There's a significant performance penalty in using str keys
when looking things up in AFW tables.
Translate verbose=True to Treecorr verbose=2.
Factor of 50 (yes, fifty) increase in performance.
Previously was 'for i, s in unmerate(cat)'
Now use extracts entire column at a time.
'calculate_' wasn't a particularly informative prefix.
Most functions that return something calculate something.
Had had these reversed from the start of this branch.
Quantities within machine precision of 0 can trigger NaN in numpy.sqrt.
Use numpy.lib.scimath.sqrt in complex ellipticity calculation to
handle this cleanly.
Also clarify that a random seed in the tests was arbitrarily chosen.
@wmwv wmwv merged commit 6e851d0 into master Sep 8, 2017
@ktlim ktlim deleted the tickets/DM-8951 branch August 25, 2018 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants