DM-13851: Speed up ap_verify unit tests #34

Merged
merged 10 commits into from
Jun 28, 2018
2 changes: 1 addition & 1 deletion SConstruct
@@ -1,4 +1,4 @@
# -*- python -*-
from lsst.sconsUtils import scripts
scripts.BasicSConstruct("ap_verify")
scripts.BasicSConstruct("ap_verify", disableCc=True)

6 changes: 3 additions & 3 deletions python/lsst/ap/verify/ingestion.py
@@ -195,12 +195,12 @@ def _ingestRaws(self, dataset, workspace):
self.log.info("Ingesting raw images...")
dataFiles = _findMatchingFiles(dataset.rawLocation, self.config.dataFiles)
if dataFiles:
self._doIngest(workspace.dataRepo, dataFiles, self.config.dataBadFiles)
self._doIngestRaws(workspace.dataRepo, dataFiles, self.config.dataBadFiles)
self.log.info("Images are now ingested in {0}".format(workspace.dataRepo))
else:
raise RuntimeError("No raw files found at %s." % dataset.rawLocation)

def _doIngest(self, repo, dataFiles, badFiles):
def _doIngestRaws(self, repo, dataFiles, badFiles):
"""Ingest raw images into a repository.

``repo`` shall be populated with *links* to ``dataFiles``.
@@ -350,7 +350,7 @@ def _doIngestDefects(self, repo, calibRepo, defectTarball):
# TODO: clean up implementation after DM-5467 resolved
defectDir = os.path.join(calibRepo, "defects")
with tarfile.open(defectTarball, "r") as opened:
if opened.getNames():
if opened.getnames():
pathlib.Path(defectDir).mkdir(parents=True, exist_ok=True)
opened.extractall(defectDir)
else:
Expand Down
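The `getNames()` to `getnames()` fix above matters because `tarfile.TarFile` only provides the lowercase method. A minimal sketch of the emptiness check, assuming an in-memory archive; `make_tarball` and `has_members` are hypothetical helpers, not part of ap_verify:

```python
import io
import tarfile


def make_tarball(members):
    """Build an in-memory, uncompressed tar archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def has_members(fileobj):
    """Return True if the archive lists at least one member."""
    with tarfile.open(fileobj=fileobj, mode="r") as opened:
        # TarFile.getnames() is all lowercase; TarFile has no getNames().
        return bool(opened.getnames())
```

Because the misspelled call only fails when the line is actually executed, a test that opens a real tarball (such as the `defects.tar.gz` added in this PR) is what exposes it.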
62 changes: 62 additions & 0 deletions python/lsst/ap/verify/testUtils.py
@@ -0,0 +1,62 @@
#
# This file is part of ap_verify.
#
# Developed for the LSST Data Management System.
# This product includes software developed by the LSST Project
# (http://www.lsst.org).
# See the COPYRIGHT file at the top-level directory of this distribution
# for details of code ownership.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#

"""Common code for ap_verify unit tests.
"""

import unittest

import lsst.utils.tests

import lsst.pex.exceptions as pexExcept
from lsst.ap.verify.config import Config


class DataTestCase(lsst.utils.tests.TestCase):
Contributor

Don't have this derive from TestCase unless you want it to potentially be picked up as a test. Subclasses that are tests would then derive from this and TestCase.

Member Author

Are you sure? We used this pattern a lot in afw and didn't run into any problems. The fact that it's in python/ (necessary to ensure it's on the path) should make it immune to that kind of bug.

Member

I assume the test classes themselves inherit from DataTestCase so I don't think there is a problem. The problem is when you put a class in a test file that looks like a class of tests but isn't really.

Contributor

I'm surprised that pattern is in afw, I'd have pushed back on it in review. DataTestCase is not a TestCase, it's essentially a mixin for things that are TestCases.

Member Author

I don't agree that it's a mixin (though maybe that word means something different in Python than in Java); it's a specific type of test case rather than extra functionality.

Anyway, I propose we revisit this later (and maybe the aforementioned afw test utilities at the same time), since this bit of code sharing is not relevant to the actual speed-up work.

Contributor

Ok. Maybe I should make a Community post to discuss it?

Member Author

Maybe discuss with Russell first, because lsst.afw.geom.testUtils.TransformTestBaseClass (the "lots" of uses in afw I thought I remembered 😰) was his idea.
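For reference, the two patterns debated in this thread can be sketched side by side. This is an illustration only; all class names below are hypothetical and nothing here is taken from afw or ap_verify:

```python
import unittest


class SharedSetupCase(unittest.TestCase):
    """Pattern 1: shared fixture as a TestCase subclass with no test* methods."""

    def setUp(self):
        self.value = 42


class SharedSetupMixin:
    """Pattern 2: shared fixture as a plain mixin that is not itself a TestCase."""

    def setUp(self):
        super().setUp()
        self.value = 42


class FromBaseTest(SharedSetupCase):
    def testValue(self):
        self.assertEqual(self.value, 42)


class FromMixinTest(SharedSetupMixin, unittest.TestCase):
    def testValue(self):
        self.assertEqual(self.value, 42)


def count_tests(case_class):
    """Count the tests the default loader collects from one TestCase class."""
    loader = unittest.defaultTestLoader
    return loader.loadTestsFromTestCase(case_class).countTestCases()
```

With the standard `unittest` loader, a `TestCase` subclass that defines no `test*` methods contributes no tests of its own, so both patterns collect the same tests from the concrete subclasses. Other test runners may apply different collection rules.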

"""Unit test class for tests that need to use the Dataset framework.

Unit tests based on this class will search for a designated dataset
(`testDataset`), and skip all tests if the dataset is not available.

Subclasses must call `DataTestCase.setUpClass()` if they override
``setUpClass`` themselves.
Contributor

"Subclasses must call super().setUpClass() ..."

Member Author

EDIT: never mind; the encapsulation violations I was worried about are a consequence of how classes work in Python (specifically, the distinction between initialization and construction), not of super() or the MRO. After thinking about it some more, I realized that they will also happen with explicitly named bases.
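A minimal sketch of the `super().setUpClass()` convention discussed here, assuming a base class that skips all of its tests when a resource is missing. `DataCase`, `DerivedCase`, and `resourceAvailable` are hypothetical stand-ins for `DataTestCase` and its dataset lookup:

```python
import unittest


class DataCase(unittest.TestCase):
    """Hypothetical base class that skips every test when a resource is missing."""

    resourceAvailable = False  # stand-in for "is the dataset package set up?"

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        if not cls.resourceAvailable:
            raise unittest.SkipTest("resource not set up")


class DerivedCase(DataCase):
    @classmethod
    def setUpClass(cls):
        # Run the base-class check first; it raises SkipTest before any
        # subclass-specific setup is attempted.
        super().setUpClass()
        cls.extra = "configured"

    def testExtra(self):
        self.assertEqual(self.extra, "configured")


class AvailableCase(DerivedCase):
    resourceAvailable = True


def run_case(case_class):
    """Run one TestCase class and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case_class)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Raising `unittest.SkipTest` from `setUpClass` records a single class-level skip and runs none of the test methods, which is the behavior the docstring describes.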

"""

testDataset = 'ap_verify_testdata'
"""The EUPS package name of the dataset to use for testing (`str`).
"""
datasetKey = 'test'
"""The ap_verify dataset name that would be used on the command line (`str`).
"""

@classmethod
def setUpClass(cls):
try:
lsst.utils.getPackageDir(cls.testDataset)
except pexExcept.NotFoundError:
raise unittest.SkipTest(cls.testDataset + ' not set up')

# Hack the config for testing purposes
# Note that Config.instance is supposed to be immutable, so, depending on initialization order,
# this modification may cause other tests to see inconsistent config values
Config.instance._allInfo['datasets.' + cls.datasetKey] = cls.testDataset
Contributor

This worries me. Why do you have to do it this way? Can't you set the config values in an actual Config instance in setUp()?

Member Author

Are you saying that instead of a Config singleton, each class/function in ap_verify should take a Config object for testing convenience?

Contributor

The thing that worries me is the "depending on initialization order... see inconsistent config values" bit. It also just feels hacky ("hack the config" after all).

I absolutely wouldn't advocate the "config object for testing convenience" suggestion you propose. That's definitely clumsy.

Looking at it more, I thought Config was a pex_config type of thing. Instead, it's a manager for those. I'm not entirely sure what its real purpose is, but I guess the above is the way to do what you want with a singleton.
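One possible way to limit the initialization-order concern, sketched under the assumption that the singleton's state is an ordinary `dict`: `unittest.mock.patch.dict` adds the key for the duration of a test and restores the original contents afterwards. `FakeConfig` is a hypothetical stand-in; whether `Config.instance._allInfo` can be patched this way is not confirmed by this PR.

```python
import unittest
from unittest.mock import patch


class FakeConfig:
    """Hypothetical stand-in for a singleton whose state lives in a private dict."""

    def __init__(self):
        self._allInfo = {"datasets.real": "real_package"}


FakeConfig.instance = FakeConfig()


class PatchedConfigTest(unittest.TestCase):
    def setUp(self):
        # patch.dict restores the original contents when the patcher stops,
        # so tests that run later do not see the injected key.
        patcher = patch.dict(FakeConfig.instance._allInfo,
                             {"datasets.test": "test_package"})
        patcher.start()
        self.addCleanup(patcher.stop)

    def testInjectedKey(self):
        self.assertEqual(FakeConfig.instance._allInfo["datasets.test"],
                         "test_package")
```

The trade-off is that the patch applies per test rather than once per class, so it does not cover code that reads the config at import time.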

Binary file added tests/ingestion/defects.tar.gz
Binary file not shown.
25 changes: 15 additions & 10 deletions tests/test_args.py
@@ -26,9 +26,11 @@

import lsst.utils.tests
import lsst.ap.verify.ap_verify as ap_verify
import lsst.ap.verify.testUtils


class CommandLineTestSuite(lsst.utils.tests.TestCase):
class CommandLineTestSuite(lsst.ap.verify.testUtils.DataTestCase):
# DataTestCase's test dataset is needed for successful parsing of the --dataset argument

def _parseString(self, commandLine, parser=None):
"""Tokenize and parse a command line string.
@@ -54,21 +56,21 @@ def _parseString(self, commandLine, parser=None):
def testMissingMain(self):
"""Verify that a command line consisting missing required arguments is rejected.
"""
args = '--dataset HiTS2015 --output tests/output/foo'
args = '--dataset %s --output tests/output/foo' % CommandLineTestSuite.datasetKey
with self.assertRaises(SystemExit):
self._parseString(args)

def testMissingIngest(self):
"""Verify that a command line consisting missing required arguments is rejected.
"""
args = '--dataset HiTS2015'
args = '--dataset %s' % CommandLineTestSuite.datasetKey
with self.assertRaises(SystemExit):
self._parseString(args, ap_verify._IngestOnlyParser())

def testMinimumMain(self):
"""Verify that a command line consisting only of required arguments parses correctly.
"""
args = '--dataset HiTS2015 --output tests/output/foo --id "visit=54123"'
args = '--dataset %s --output tests/output/foo --id "visit=54123"' % CommandLineTestSuite.datasetKey
parsed = self._parseString(args)
self.assertIn('dataset', dir(parsed))
self.assertIn('output', dir(parsed))
@@ -77,30 +79,31 @@ def testMinimumMain(self):
def testMinimumIngest(self):
"""Verify that a command line consisting only of required arguments parses correctly.
"""
args = '--dataset HiTS2015 --output tests/output/foo'
args = '--dataset %s --output tests/output/foo' % CommandLineTestSuite.datasetKey
parsed = self._parseString(args, ap_verify._IngestOnlyParser())
self.assertIn('dataset', dir(parsed))
self.assertIn('output', dir(parsed))

def testRerun(self):
"""Verify that a command line with reruns is handled correctly.
"""
args = '--dataset HiTS2015 --rerun me --id "visit=54123"'
args = '--dataset %s --rerun me --id "visit=54123"' % CommandLineTestSuite.datasetKey
parsed = self._parseString(args)
out = ap_verify._getOutputDir('non_lsst_repo/', parsed.output, parsed.rerun)
self.assertEqual(out, 'non_lsst_repo/rerun/me')

def testRerunInput(self):
"""Verify that a command line trying to redirect input is rejected.
"""
args = '--dataset HiTS2015 --rerun from:to --id "visit=54123"'
args = '--dataset %s --rerun from:to --id "visit=54123"' % CommandLineTestSuite.datasetKey
with self.assertRaises(SystemExit):
self._parseString(args)

def testTwoOutputs(self):
"""Verify that a command line with both --output and --rerun is rejected.
"""
args = '--dataset HiTS2015 --output tests/output/foo --rerun me --id "visit=54123"'
args = '--dataset %s --output tests/output/foo --rerun me --id "visit=54123"' \
% CommandLineTestSuite.datasetKey
with self.assertRaises(SystemExit):
self._parseString(args)

@@ -114,14 +117,16 @@ def testBadDataset(self):
def testBadKeyMain(self):
"""Verify that a command line with unsupported arguments is rejected.
"""
args = '--dataset HiTS2015 --output tests/output/foo --id "visit=54123" --clobber'
args = '--dataset %s --output tests/output/foo --id "visit=54123" --clobber' \
% CommandLineTestSuite.datasetKey
with self.assertRaises(SystemExit):
self._parseString(args)

def testBadKeyIngest(self):
"""Verify that a command line with unsupported arguments is rejected.
"""
args = '--dataset HiTS2015 --output tests/output/foo --id "visit=54123"'
args = '--dataset %s --output tests/output/foo --id "visit=54123"' \
% CommandLineTestSuite.datasetKey
with self.assertRaises(SystemExit):
self._parseString(args, ap_verify._IngestOnlyParser())

Expand Down
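The tests in this file follow one pattern: build a command line from a dataset key, tokenize it, and expect `SystemExit` when the parser rejects it. A self-contained sketch of that pattern, assuming a plain `argparse` parser; `make_parser`, `parse_string`, and `DATASET_KEY` are hypothetical and do not reproduce ap_verify's actual parsers:

```python
import argparse
import shlex


DATASET_KEY = "test"  # plays the role of a class-level dataset key


def make_parser():
    """Hypothetical parser with two required arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", choices=[DATASET_KEY], required=True)
    parser.add_argument("--output", required=True)
    return parser


def parse_string(command_line, parser=None):
    """Tokenize a command-line string shell-style, then parse it."""
    if parser is None:
        parser = make_parser()
    # shlex.split honors quoting, so quoted values stay a single token.
    return parser.parse_args(shlex.split(command_line))
```

`argparse` reports a missing or invalid argument by calling `sys.exit`, which is why the tests wrap the call in `assertRaises(SystemExit)`. Taking the dataset name from a shared key, rather than hard-coding `HiTS2015`, keeps the tests independent of which datasets are installed.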
81 changes: 26 additions & 55 deletions tests/test_association.py
@@ -22,24 +22,20 @@
#

import unittest
from unittest.mock import NonCallableMock

import astropy.units as u
import numpy as np
import os
import shutil
import sqlite3
import tempfile

import lsst.daf.persistence as dafPersist
import lsst.afw.geom as afwGeom
import lsst.afw.table as afwTable
from lsst.ap.association import \
make_minimal_dia_source_schema, \
make_minimal_dia_object_schema, \
AssociationDBSqliteTask, \
AssociationDBSqliteConfig, \
AssociationTask
import lsst.obs.test as obsTest
import lsst.pipe.base as pipeBase
import lsst.utils.tests
from lsst.verify import Measurement
@@ -119,14 +115,6 @@ def setUp(self):
self.assocTask = AssociationTask()

# Create a empty butler repository and put data in it.
self.testDir = tempfile.mkdtemp(
dir=ROOT, prefix="TestAssocMeasurements-")
outputRepoArgs = dafPersist.RepositoryArgs(
root=os.path.join(self.testDir, 'repoA'),
mapper=obsTest.TestMapper,
mode='rw')
self.butler = dafPersist.Butler(
outputs=outputRepoArgs)
self.numTestSciSources = 10
self.numTestDiaSources = 5
testSources = createTestPoints(
@@ -135,42 +123,34 @@ def setUp(self):
testDiaSources = createTestPoints(
pointLocsDeg=[[idx, idx] for idx in
range(self.numTestDiaSources)])
self.butler.put(obj=testSources,
datasetType='src',
dataId=dataIdDict)
self.butler.put(obj=testDiaSources,
datasetType='deepDiff_diaSrc',
dataId=dataIdDict)

(self.tmpFile, self.dbFile) = tempfile.mkstemp(
dir=os.path.dirname(__file__))
assocDbConfig = AssociationDBSqliteConfig()
assocDbConfig.db_name = self.dbFile
assocDbConfig.filter_names = ['r']
assocDb = AssociationDBSqliteTask(config=assocDbConfig)
assocDb.create_tables()

# Fake Butler to avoid initialization and I/O overhead
def mockGet(datasetType, dataId=None):
"""An emulator for `lsst.daf.persistence.Butler.get` that can only handle test data.
"""
# Check whether dataIdDict is a subset of dataId
if dataIdDict.items() <= dataId.items():
if datasetType == 'src':
return testSources
elif datasetType == 'deepDiff_diaSrc':
return testDiaSources
raise dafPersist.NoResults("Dataset not found:", datasetType, dataId)
self.butler = NonCallableMock(spec=dafPersist.Butler, get=mockGet)

self.numTestDiaObjects = 5
diaObjects = createTestPoints(
self.diaObjects = createTestPoints(
pointLocsDeg=[[idx, idx] for idx in
range(self.numTestDiaObjects)],
schema=make_minimal_dia_object_schema(['r']))
for diaObject in diaObjects:
for diaObject in self.diaObjects:
diaObject['nDiaSources'] = 1
assocDb.store_dia_objects(diaObjects, True)
assocDb.close()

def tearDown(self):
del self.assocTask

if os.path.exists(self.testDir):
shutil.rmtree(self.testDir)
if hasattr(self, "butler"):
del self.butler

del self.tmpFile
os.remove(self.dbFile)

def testValidFromMetadata(self):
"""Verify that association information can be recovered from metadata.
"""
@@ -230,7 +210,7 @@ def testValidFromButler(self):
self.assertEqual(
meas.metric_name,
lsst.verify.Name(metric='ip_diffim.numSciSrc'))
self.assertEqual(meas.quantity, 10 * u.count)
self.assertEqual(meas.quantity, self.numTestSciSources * u.count)

meas = measureFractionDiaSourcesToSciSources(
self.butler,
@@ -240,12 +220,13 @@ def testValidFromButler(self):
self.assertEqual(
meas.metric_name,
lsst.verify.Name(metric='ip_diffim.fracDiaSrcToSciSrc'))
# We put in half the number of DIASources as detected sources.
self.assertEqual(meas.quantity, 0.5 * u.dimensionless_unscaled)
self.assertEqual(meas.quantity,
self.numTestDiaSources / self.numTestSciSources * u.dimensionless_unscaled)

def testValidFromSqlite(self):
conn = sqlite3.connect(self.dbFile)
cursor = conn.cursor()
# Fake DB handle to avoid DB initialization overhead
cursor = NonCallableMock(spec=sqlite3.Cursor)
cursor.fetchall.return_value = [(len(self.diaObjects),)]

meas = measureTotalUnassociatedDiaObjects(
cursor,
@@ -255,7 +236,7 @@ def testValidFromSqlite(self):
meas.metric_name,
lsst.verify.Name(
metric='association.numTotalUnassociatedDiaObjects'))
self.assertEqual(meas.quantity, 5 * u.count)
self.assertEqual(meas.quantity, self.numTestDiaObjects * u.count)

def testNoButlerData(self):
""" Test attempting to create a measurement with data that the butler
@@ -298,17 +279,6 @@ def testMetadataNotCreated(self):
"association.numNewDIAObjects")
self.assertIsNone(meas)

def testInvalidDb(self):
""" Test that the measurement raises the correct error when given an
improper database.
"""
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
with self.assertRaises(sqlite3.OperationalError):
measureTotalUnassociatedDiaObjects(
cursor,
metricName='association.numTotalUnassociatedDiaObjects')

def testNoMetric(self):
"""Verify that trying to measure a nonexistent metric fails.
"""
@@ -337,8 +307,9 @@ def testNoMetric(self):
self.butler, dataId=dataIdDict,
metricName='foo.bar.FooBar')

conn = sqlite3.connect(self.dbFile)
cursor = conn.cursor()
# Fake DB handle to avoid DB initialization overhead
cursor = NonCallableMock(spec=sqlite3.Cursor)
cursor.fetchall.return_value = [(0,)]
with self.assertRaises(TypeError):
measureTotalUnassociatedDiaObjects(
cursor, metricName='foo.bar.FooBar')
Expand Down
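The speed-up in this file comes from replacing a real Butler repository and SQLite database with `NonCallableMock` objects. A stripped-down sketch of both fakes, assuming stand-in data; `Repository`, `fake_get`, `count_rows`, `KNOWN_ID`, and `CATALOG` are hypothetical names used only for illustration:

```python
import sqlite3
from unittest.mock import NonCallableMock


class Repository:
    """Hypothetical stand-in for a data repository class with a ``get`` method."""

    def get(self, datasetType, dataId=None):
        raise NotImplementedError


KNOWN_ID = {"visit": 1, "filter": "r"}
CATALOG = ["source-a", "source-b"]


def fake_get(datasetType, dataId=None):
    """Return canned data when KNOWN_ID is a subset of the requested dataId."""
    # dict.items() views are set-like, so <= is a subset test on key/value pairs.
    if dataId is not None and KNOWN_ID.items() <= dataId.items():
        if datasetType == "src":
            return CATALOG
    raise LookupError("Dataset not found: %s %s" % (datasetType, dataId))


# spec= limits the mock to attributes the real class has; get= overrides one.
repo = NonCallableMock(spec=Repository, get=fake_get)

# A cursor that returns a canned result set without opening any database.
cursor = NonCallableMock(spec=sqlite3.Cursor)
cursor.fetchall.return_value = [(5,)]


def count_rows(cur):
    """Hypothetical query helper: run a COUNT query and unpack the single result."""
    cur.execute("SELECT COUNT(*) FROM objects")
    return cur.fetchall()[0][0]
```

Because the fake cursor accepts any SQL, tests written this way no longer exercise the query itself, which is presumably why `testInvalidDb` was dropped in this change rather than ported to the mock.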
18 changes: 4 additions & 14 deletions tests/test_dataset.py
@@ -27,28 +27,18 @@
import unittest

import lsst.utils.tests
import lsst.pex.exceptions as pexExcept
from lsst.ap.verify.config import Config
from lsst.ap.verify.dataset import Dataset
from lsst.ap.verify.testUtils import DataTestCase


class DatasetTestSuite(lsst.utils.tests.TestCase):
class DatasetTestSuite(DataTestCase):

@classmethod
def setUpClass(cls):
cls.testDataset = 'ap_verify_testdata'
cls.datasetKey = 'test'
super().setUpClass()

cls.obsPackage = 'obs_test'
cls.camera = 'test'
try:
lsst.utils.getPackageDir(cls.testDataset)
except pexExcept.NotFoundError:
raise unittest.SkipTest(cls.testDataset + ' not set up')

# Hack the config for testing purposes
# Note that Config.instance is supposed to be immutable, so, depending on initialization order,
# this modification may cause other tests to see inconsistent config values
Config.instance._allInfo['datasets.' + cls.datasetKey] = cls.testDataset

def setUp(self):
self._testbed = Dataset(DatasetTestSuite.datasetKey)
Expand Down