New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-11040: Implement persistence of DIAObjects and DIASources #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see a bunch of changes, ranging from better use of the DB (most important) to typos (least important). In addition to the specific comments below, I've a few general remarks.
- Please think about which methods of
AssociationDBSqliteTask
should be public (i.e., intended for use byAssociationTask
or another class) and which should be private. At the moment nearly everything is public, giving a very confusing API where it's not clear what's the correct way to do something. - I don't see a clear policy on how transactions are handled -- there's a call to
_db_connection.commit
increate_tables
,store
, andstore_updated
, but notload
or any of the other methods (which may or may not be intended to be called externally; see my previous remark), nor do I see anything that begins a transaction. Organizing updates into coherent commits will make it much easier to parallelize source association (and may also give a performance boost).
|
||
afw_to_db_types = { | ||
"L": "INTEGER", | ||
"Angle": "REAL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conversion could be potentially dangerous, especially if you need to give Angle
s special treatment on input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this and the converse of afw SourceRecord objects into it's own class to better define the conversions.
import lsst.pex.config as pexConfig | ||
import lsst.pipe.base as pipeBase | ||
from .dia_collection import * | ||
from .dia_object import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import *
is discouraged. How much stuff in these two modules does AssociationDBSqliteTask
need to know about?
\anchor AssociationDBSqliteConfig_ | ||
|
||
\brief Configuration parameters for the AssociationDBSqliteTask | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the dev guide when writing docstrings. Use of Doxygen (and \anchor
, of all things?) is not encouraged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed all docstrings to python docs.
functions as a testing ground for the L1 database and should mimic this | ||
database's eventual functionaly. This specific database implementation is | ||
useful for the verification packages which may not be run with access to | ||
L1 database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, doesn't this sort of thing belong in RST files in doc/
? Sphinx can do things like tables of contents for you, too.
indexer = IndexerRegistry.makeField( | ||
doc='Select the spatial indexer to use within the database.', | ||
default='HTM' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an odd thing to make configurable. Can you explain why you'd want the indexer to be user-specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm copying a bit of the configuration from Ingest/LoadIndexedReferenceTask which does allow the indexer to be a configurable. I think there is no need to hard code a specific indexer here especially since the standard in the rest of the stack seems to make this a configurable. One thing that I may want this it is a configurable, though, is force a check on the task init that the indexer specified in the config is the same as the one used on creation of the database (and also remove all the references to HTM). I'll work on this for the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does sound like a reasonable approach if you can make the queries indexer-agnostic.
(I guess I'm used to spatial indices having more infrastructure than just an indexer_id
column, so that caught me off guard. Please disregard my earlier question about whether your DB uses spatial indexing.)
tests/test_association_db_sqlite.py
Outdated
dia_collection = DIAObjectCollection(dia_objects) | ||
assoc_db = AssociationDBSqliteTask() | ||
assoc_db.create_tables() | ||
assoc_db.store(dia_collection, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test cases all have a lot of common setup; factor it into either setUp()
or a method to be called explicitly.
tests/test_association_db_sqlite.py
Outdated
dia_collection.dia_objects[obj_idx].n_dia_sources, 1) | ||
for src_id, src_record in enumerate( | ||
dia_collection.dia_objects[obj_idx].dia_source_catalog): | ||
self.assertEqual(src_record.getId(), src_id + obj_idx * 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test that the records themselves are valid (e.g., value of some statistic field)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a method in the unittests for doing comparisons between any given source record on a Field by Field basis.
tests/test_association_db_sqlite.py
Outdated
for obj_id in range(2): | ||
for src_id in range(5): | ||
assoc_db.store_dia_object_source_pair( | ||
obj_id, src_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please verify that the object and source are actually associated with each other.
tests/test_association_db_sqlite.py
Outdated
assoc_db.create_tables() | ||
|
||
assoc_db.store_dia_object(dia_objects[0]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please verify that the object's info can be recovered.
tests/test_association_db_sqlite.py
Outdated
|
||
assoc_db.store_dia_source(dia_sources[0]) | ||
|
||
assoc_db.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please verify that the source's info can be recovered.
be734d4
to
3bcd683
Compare
@kfindeisen wrote (on a commit that has disappeared):
The transition is as follows: replace Wcs with SkyWcs on DM-10765. That is my highest priority right now, but may take another 2 weeks or so. Then replace all |
@r-owen Okay, then right now it may be best for me to keep the Coord around as I rely on the methods from another package that still uses Coord. I've made ticket DM-11868 to reflect this work. |
tests/test_association_db_sqlite.py
Outdated
output_dia_collection = assoc_db.load( | ||
ctr_point, afwGeom.Angle(0.2)) | ||
|
||
for obj_idx in xrange(5): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you run this on Python 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet. Sorry about the xrange, Tim. Force of habit.
tests/test_association_db_sqlite.py
Outdated
|
||
output_dia_objects = assoc_db.get_dia_objects(indexer_ids) | ||
|
||
for obj_idx in xrange(2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix.
tests/test_association_db_sqlite.py
Outdated
|
||
assoc_db.get_dia_object_records(indexer_ids) | ||
|
||
for obj_idx in xrange(2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix.
tests/test_association_db_sqlite.py
Outdated
|
||
class TestAssociationDBSqlite(unittest.TestCase): | ||
|
||
def setup(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really want empty methods here. They serve no purpose other than adding cruft, and this one in particular is no good because it should be setUp()
not setup()
.
Finished responses to first review pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better. I still have some concerns about the test coverage and some SQL issues, but most of the comments are typos or style issues. (Speaking of which, do you have something that can selectively spell-check comments and docstrings?)
|
||
class SqliteDBConverter(object): | ||
""" Class for defining conversions to and from an sqlite database and | ||
afw SourceReocrd objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: SourceRecord
""" | ||
return self._schema | ||
|
||
def make_table_from_afw_shcema(self, table_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: schema
if sub_schema.getField().getTypeString() == 'Angle': | ||
output_source_record.set( | ||
sub_schema.getKey(), | ||
afwGeom.Angle(value * afwGeom.degrees)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Department of redundancy department: value * afwGeom.degrees
is already an Angle
.
|
||
Parameters | ||
---------- | ||
soruce_record : afw.table.SourceRecord |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: source
useful for the verification packages which may not be run with access to | ||
L1 database. | ||
|
||
Attriburtes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Attributes
tests/test_association_db_sqlite.py
Outdated
n_objects=1, n_sources=1, start_id=1) | ||
dia_collection.append(new_dia_object[0]) | ||
dia_collection.update_dia_objects() | ||
dia_collection.update_spatial_tree() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider testing what happens if you update an existing object.
tests/test_association_db_sqlite.py
Outdated
dia_collection.update_dia_objects() | ||
dia_collection.update_spatial_tree() | ||
|
||
self.assoc_db.store_updated(dia_collection, [1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check that the final DB state is correct.
tests/test_association_db_sqlite.py
Outdated
|
||
for record_a, record_b in zip( | ||
output_dia_objects[obj_idx].dia_source_catalog, | ||
dia_collection.dia_objects[obj_idx].dia_source_catalog): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, can you safely assume order is preserved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changedthis to a similar sort on id as before.
tests/test_association_db_sqlite.py
Outdated
def _store_and_retrieve_source_record(self, | ||
source_record, | ||
converter): | ||
""" Convinience method for round tripping a soruce |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
self._db_cursor.execute( | ||
"CREATE TABLE dia_objects_to_dia_sources (" | ||
"src_id INTEGER, " | ||
"obj_id INTEGER, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed you got rid of the primary key entirely. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an oversight on my part when I added the FOREIGN KEY constraint. I have added the PRIMARY KEY back.
dia_collection.py: Finished first pass implimentation Added association.py Fixed bugs in dia_collection and unittest. Passes most of unittests. Need to edit score and match both within test and class. Finished unittest and matching implementation. Added initial pass at Association and database tasks. Added comments throughout. Started unittests. Added unittest skeleton for association_db_sqlite.py Changed indexing to meas_algorithms indexRegistry from sphgeom. Implemented init and create_db unittests Added PRIMARY KEYs to create_tables method. Created intial tests of the database storage. Debugged unittest for store methods.
Finished first working version of AssociationDB Added get_dia_object_records method.
Changed sqlite queries from individual to "IN". Need to account for queries that could require greater than 127 variables in the query. Added initial batch db queries. Need to limit queries to 127 variables at most. Implemented batch queries to get around this limit. Completed linting pass. Responded to some reviewer comments Stashing current commits to rebase to master. Implemented a subset of reviewer suggested changes.
Still need to confirm new API and unittests pass. Cleaned up unittests. Unittests now run to completion. Need to finialize some tests and remove rest of "magic numbers" Removed more magic numbers from unittest Cleaned up usage of _compare_source_records Finished reponses to reviewer. Committing pre-unittested responses to reviewer. Debuged unittests and finalized review respones.
b2ab9e0
to
48a7b02
Compare
No description provided.