DM-11040: Implement persistence of DIAObjects and DIASources #4

Merged: 4 commits on Sep 21, 2017

morriscb
Contributor

No description provided.

Member

@kfindeisen left a comment

I'd like to see a bunch of changes, ranging from better use of the DB (most important) to typos (least important). In addition to the specific comments below, I've a few general remarks.

  1. Please think about which methods of AssociationDBSqliteTask should be public (i.e., intended for use by AssociationTask or another class) and which should be private. At the moment nearly everything is public, giving a very confusing API where it's not clear what's the correct way to do something.
  2. I don't see a clear policy on how transactions are handled -- there's a call to _db_connection.commit in create_tables, store, and store_updated, but not load or any of the other methods (which may or may not be intended to be called externally; see my previous remark), nor do I see anything that begins a transaction. Organizing updates into coherent commits will make it much easier to parallelize source association (and may also give a performance boost).
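
One way to make the transaction policy in the second remark explicit is to let each public store method own exactly one transaction. The sketch below assumes Python's standard `sqlite3` module; the `store_pairs` helper and the two-column table (with `src_id` as primary key) are hypothetical stand-ins, not the schema from this pull request.

```python
import sqlite3


def store_pairs(connection, pairs):
    """Insert all (obj_id, src_id) pairs in a single transaction."""
    # Used as a context manager, a sqlite3 connection commits if the block
    # succeeds and rolls back if it raises, so the insert is all-or-nothing.
    with connection:
        connection.executemany(
            "INSERT INTO dia_objects_to_dia_sources (src_id, obj_id) "
            "VALUES (?, ?)",
            [(src_id, obj_id) for obj_id, src_id in pairs])


connection = sqlite3.connect(":memory:")
# Hypothetical table layout, for illustration only.
connection.execute(
    "CREATE TABLE dia_objects_to_dia_sources ("
    "src_id INTEGER PRIMARY KEY, obj_id INTEGER)")
store_pairs(connection, [(0, 0), (0, 1)])

try:
    # The second row repeats src_id 1, so the whole batch is rolled back.
    store_pairs(connection, [(1, 2), (1, 1)])
except sqlite3.IntegrityError:
    pass

n_rows = connection.execute(
    "SELECT COUNT(*) FROM dia_objects_to_dia_sources").fetchone()[0]
```

Keeping the commit inside the one method that callers are meant to use also helps with the first remark: helpers that assume an open transaction can be made private.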


afw_to_db_types = {
    "L": "INTEGER",
    "Angle": "REAL"

Member

This conversion could be potentially dangerous, especially if you need to give Angles special treatment on input.

Contributor Author

Moved this, and the converse conversion to afw SourceRecord objects, into its own class to better define the conversions.

import lsst.pex.config as pexConfig
import lsst.pipe.base as pipeBase
from .dia_collection import *
from .dia_object import *

Member

import * is discouraged. How much stuff in these two modules does AssociationDBSqliteTask need to know about?

\anchor AssociationDBSqliteConfig_

\brief Configuration parameters for the AssociationDBSqliteTask
"""

Member

Please follow the dev guide when writing docstrings. Use of Doxygen (and \anchor, of all things?) is not encouraged.

Contributor Author

Changed all docstrings to python docs.
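
For reference, a docstring with plain section headings, rather than Doxygen commands such as `\anchor` and `\brief`, might look like the sketch below. It assumes the dev guide's preferred layout is the numpydoc-style `Parameters` section that appears elsewhere in this review; `ExampleConfig` and its `db_name` parameter are hypothetical and are not part of this package.

```python
class ExampleConfig(object):
    """Configuration parameters for a database task.

    Parameters
    ----------
    db_name : `str`
        Location of the sqlite database file (hypothetical parameter).
    """

    def __init__(self, db_name=":memory:"):
        self.db_name = db_name
```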

functions as a testing ground for the L1 database and should mimic this
database's eventual functionaly. This specific database implementation is
useful for the verification packages which may not be run with access to
L1 database.

Member

Again, doesn't this sort of thing belong in RST files in doc/? Sphinx can do things like tables of contents for you, too.

indexer = IndexerRegistry.makeField(
    doc='Select the spatial indexer to use within the database.',
    default='HTM'
)

Member

This seems like an odd thing to make configurable. Can you explain why you'd want the indexer to be user-specified?

Contributor Author

I'm copying a bit of the configuration from Ingest/LoadIndexedReferenceTask, which does allow the indexer to be configurable. I think there is no need to hard-code a specific indexer here, especially since the standard in the rest of the stack seems to be to make this configurable. One thing that I may want if it is configurable, though, is to force a check at task init that the indexer specified in the config is the same as the one used on creation of the database (and also to remove all the references to HTM). I'll work on this for the next commit.

Member

@kfindeisen Sep 7, 2017

That does sound like a reasonable approach if you can make the queries indexer-agnostic.

(I guess I'm used to spatial indices having more infrastructure than just an indexer_id column, so that caught me off guard. Please disregard my earlier question about whether your DB uses spatial indexing.)
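
The init-time check discussed in this thread could be sketched as follows. This is only an illustration using the standard `sqlite3` module: the `db_metadata` table, its columns, and the `check_indexer` function are assumptions made for the example and do not come from the pull request.

```python
import sqlite3


def check_indexer(connection, indexer_name):
    """Record the indexer on first use; afterwards require the same one."""
    # Hypothetical metadata table holding one row per named setting.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS db_metadata ("
        "name TEXT PRIMARY KEY, value TEXT)")
    row = connection.execute(
        "SELECT value FROM db_metadata WHERE name = 'indexer'").fetchone()
    if row is None:
        # First use of this database: remember which indexer built it.
        with connection:
            connection.execute(
                "INSERT INTO db_metadata (name, value) "
                "VALUES ('indexer', ?)",
                (indexer_name,))
    elif row[0] != indexer_name:
        raise ValueError(
            "Database was created with indexer %r, but config requests %r"
            % (row[0], indexer_name))


connection = sqlite3.connect(":memory:")
check_indexer(connection, "HTM")   # first call records the indexer
check_indexer(connection, "HTM")   # same indexer again: accepted

try:
    check_indexer(connection, "other")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```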

dia_collection = DIAObjectCollection(dia_objects)
assoc_db = AssociationDBSqliteTask()
assoc_db.create_tables()
assoc_db.store(dia_collection, True)

Member

These test cases all have a lot of common setup; factor it into either setUp() or a method to be called explicitly.
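
As a rough illustration of that refactoring, shared database setup can live in `setUp()` so that every test method starts from the same fresh state. The sketch uses the standard `sqlite3` and `unittest` modules with a hypothetical table, not the task classes from this package.

```python
import sqlite3
import unittest


class TestPairStorage(unittest.TestCase):

    def setUp(self):
        # Runs before every test method, so each test gets a fresh database.
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE dia_objects_to_dia_sources ("
            "src_id INTEGER PRIMARY KEY, obj_id INTEGER)")

    def tearDown(self):
        # Runs after every test method, even if the test failed.
        self.connection.close()

    def _count_rows(self):
        return self.connection.execute(
            "SELECT COUNT(*) FROM dia_objects_to_dia_sources").fetchone()[0]

    def test_table_starts_empty(self):
        self.assertEqual(self._count_rows(), 0)

    def test_store_pair(self):
        with self.connection:
            self.connection.execute(
                "INSERT INTO dia_objects_to_dia_sources (src_id, obj_id) "
                "VALUES (?, ?)", (0, 0))
        self.assertEqual(self._count_rows(), 1)
```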

dia_collection.dia_objects[obj_idx].n_dia_sources, 1)
for src_id, src_record in enumerate(
dia_collection.dia_objects[obj_idx].dia_source_catalog):
self.assertEqual(src_record.getId(), src_id + obj_idx * 1)

Member

Can you test that the records themselves are valid (e.g., value of some statistic field)?

Contributor Author

Added a method in the unittests for comparing any two given source records on a field-by-field basis.
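
A field-by-field comparison of this kind might look like the sketch below. Plain dictionaries stand in for afw records here, and `find_mismatched_fields` is a hypothetical helper, not the method added in the pull request.

```python
import math


def find_mismatched_fields(record_a, record_b, rel_tol=1e-9):
    """Return the sorted names of fields whose values differ."""
    mismatched = []
    for name in sorted(set(record_a) | set(record_b)):
        if name not in record_a or name not in record_b:
            # A field present in only one record counts as a mismatch.
            mismatched.append(name)
            continue
        value_a, value_b = record_a[name], record_b[name]
        if isinstance(value_a, float) or isinstance(value_b, float):
            # Compare floating-point values with a tolerance.
            equal = math.isclose(value_a, value_b, rel_tol=rel_tol)
        else:
            equal = value_a == value_b
        if not equal:
            mismatched.append(name)
    return mismatched


stored = {"id": 1, "ra": 10.5, "flux": 2.0}
loaded = {"id": 1, "ra": 10.5, "flux": 2.5}
differences = find_mismatched_fields(stored, loaded)
```

Returning the list of differing fields, rather than a single boolean, makes a failing test report which field did not survive the round trip.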

for obj_id in range(2):
    for src_id in range(5):
        assoc_db.store_dia_object_source_pair(
            obj_id, src_id)

Member

Please verify that the object and source are actually associated with each other.
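
A check along those lines could read the pairs back and compare them with what was stored. The sketch below talks to `sqlite3` directly; the table layout is assumed, and the source ids are made unique per object (`src_id + 5 * obj_id`) so that they can serve as a key, which differs from the loop in the excerpt above.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Hypothetical table layout, for illustration only.
connection.execute(
    "CREATE TABLE dia_objects_to_dia_sources ("
    "src_id INTEGER PRIMARY KEY, obj_id INTEGER)")

# Store five sources for each of two objects, remembering what was stored.
expected = {}
with connection:
    for obj_id in range(2):
        for src_idx in range(5):
            src_id = src_idx + 5 * obj_id
            connection.execute(
                "INSERT INTO dia_objects_to_dia_sources (src_id, obj_id) "
                "VALUES (?, ?)", (src_id, obj_id))
            expected.setdefault(obj_id, []).append(src_id)

# Read the associations back, one object at a time.
recovered = {}
for obj_id in expected:
    rows = connection.execute(
        "SELECT src_id FROM dia_objects_to_dia_sources "
        "WHERE obj_id = ? ORDER BY src_id", (obj_id,))
    recovered[obj_id] = [row[0] for row in rows]
```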

assoc_db.create_tables()

assoc_db.store_dia_object(dia_objects[0])

Member

Please verify that the object's info can be recovered.


assoc_db.store_dia_source(dia_sources[0])

assoc_db.commit()

Member

Please verify that the source's info can be recovered.

@r-owen
Contributor

r-owen commented Sep 8, 2017

@kfindeisen wrote (on a commit that has disappeared):

Wcs is being replaced by SkyWcs, which will return SpherePoint (though it currently does not). The names are a bit different: separation instead of angularSeparation.

Maybe @r-owen can comment on how the transition will look?

The transition is as follows: replace Wcs with SkyWcs on DM-10765. That is my highest priority right now, but may take another 2 weeks or so. Then replace all Coord classes with SpherePoint on DM-11162. I expect that to go fairly quickly.

@morriscb
Contributor Author

morriscb commented Sep 8, 2017

@r-owen Okay, then right now it may be best for me to keep the Coord around as I rely on the methods from another package that still uses Coord. I've made ticket DM-11868 to reflect this work.

output_dia_collection = assoc_db.load(
    ctr_point, afwGeom.Angle(0.2))

for obj_idx in xrange(5):

Member

Have you run this on Python 3?

Contributor Author

Not yet. Sorry about the xrange, Tim. Force of habit.
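
For context, `xrange` does not exist in Python 3, while `range` is available in both major versions (it builds a list on Python 2 and is lazy on Python 3). A minimal sketch of the portable loop, with a hypothetical `visited` list standing in for the test body:

```python
# range works on both Python 2 and Python 3; xrange raises NameError on 3.
visited = []
for obj_idx in range(5):
    visited.append(obj_idx)
```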


output_dia_objects = assoc_db.get_dia_objects(indexer_ids)

for obj_idx in xrange(2):

Member

Please fix.


assoc_db.get_dia_object_records(indexer_ids)

for obj_idx in xrange(2):

Member

Please fix.


class TestAssociationDBSqlite(unittest.TestCase):

    def setup(self):

Member

I don't really want empty methods here. They serve no purpose other than adding cruft, and this one in particular is no good because it should be setUp() not setup().
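
The spelling matters because `unittest` only calls the hook named `setUp`; a method called `setup` is an ordinary method that the framework never invokes. The two hypothetical test cases below illustrate the difference.

```python
import unittest


class MisspelledFixture(unittest.TestCase):

    def setup(self):
        # Never called by unittest: the hook name is case-sensitive.
        self.value = 1

    def test_value_missing(self):
        self.assertFalse(hasattr(self, "value"))


class CorrectFixture(unittest.TestCase):

    def setUp(self):
        # Called automatically before each test method.
        self.value = 1

    def test_value_present(self):
        self.assertEqual(self.value, 1)
```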

@morriscb
Contributor Author

Finished responses to first review pass.

Member

@kfindeisen left a comment

Looks much better. I still have some concerns about the test coverage and some SQL issues, but most of the comments are typos or style issues. (Speaking of which, do you have something that can selectively spell-check comments and docstrings?)


class SqliteDBConverter(object):
""" Class for defining conversions to and from an sqlite database and
afw SourceReocrd objects.

Member

Typo: SourceRecord

"""
return self._schema

def make_table_from_afw_shcema(self, table_name):

Member

Typo: schema

if sub_schema.getField().getTypeString() == 'Angle':
    output_source_record.set(
        sub_schema.getKey(),
        afwGeom.Angle(value * afwGeom.degrees))

Member

Department of redundancy department: value * afwGeom.degrees is already an Angle.


Parameters
----------
soruce_record : afw.table.SourceRecord

Member

Typo: source

useful for the verification packages which may not be run with access to
L1 database.

Attriburtes

Member

Typo: Attributes

n_objects=1, n_sources=1, start_id=1)
dia_collection.append(new_dia_object[0])
dia_collection.update_dia_objects()
dia_collection.update_spatial_tree()

Member

Consider testing what happens if you update an existing object.

dia_collection.update_dia_objects()
dia_collection.update_spatial_tree()

self.assoc_db.store_updated(dia_collection, [1])

Member

Please check that the final DB state is correct.


for record_a, record_b in zip(
        output_dia_objects[obj_idx].dia_source_catalog,
        dia_collection.dia_objects[obj_idx].dia_source_catalog):

Member

Again, can you safely assume order is preserved?

Contributor Author

@morriscb Sep 20, 2017

Changed this to a similar sort on id as before.
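
Sorting both sequences on the id before pairing them removes any dependence on the order in which the database returns rows. A minimal sketch, with plain dictionaries standing in for source records and hypothetical field names:

```python
from operator import itemgetter

stored = [
    {"id": 0, "flux": 1.5},
    {"id": 1, "flux": 2.5},
    {"id": 2, "flux": 3.5},
]
# Simulate a database that returns the same records in a different order.
loaded = [dict(stored[2]), dict(stored[0]), dict(stored[1])]

by_id = itemgetter("id")
pairs = list(zip(sorted(stored, key=by_id), sorted(loaded, key=by_id)))
all_match = all(record_a == record_b for record_a, record_b in pairs)
```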

def _store_and_retrieve_source_record(self,
source_record,
converter):
""" Convinience method for round tripping a soruce

Member

Typo

self._db_cursor.execute(
    "CREATE TABLE dia_objects_to_dia_sources ("
    "src_id INTEGER, "
    "obj_id INTEGER, "

Member

I noticed you got rid of the primary key entirely. Why?

Contributor Author

This was an oversight on my part when I added the FOREIGN KEY constraint. I have added the PRIMARY KEY back.
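
A table that carries both constraints might be declared as in the sketch below. The parent table `dia_objects`, its `id` column, and the choice of `src_id` as the primary key are assumptions for illustration; the excerpt above does not show the final schema. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Foreign-key enforcement is off by default in SQLite.
connection.execute("PRAGMA foreign_keys = ON")
connection.execute("CREATE TABLE dia_objects (id INTEGER PRIMARY KEY)")
connection.execute(
    "CREATE TABLE dia_objects_to_dia_sources ("
    "src_id INTEGER PRIMARY KEY, "
    "obj_id INTEGER, "
    "FOREIGN KEY(obj_id) REFERENCES dia_objects(id))")


def try_insert_pair(src_id, obj_id):
    """Return True if the pair was stored, False if a constraint failed."""
    try:
        with connection:
            connection.execute(
                "INSERT INTO dia_objects_to_dia_sources (src_id, obj_id) "
                "VALUES (?, ?)", (src_id, obj_id))
    except sqlite3.IntegrityError:
        return False
    return True


with connection:
    connection.execute("INSERT INTO dia_objects (id) VALUES (0)")

first_ok = try_insert_pair(0, 0)       # valid pair
duplicate_ok = try_insert_pair(0, 0)   # repeats the primary key
orphan_ok = try_insert_pair(1, 99)     # refers to a missing object
```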

dia_collection.py: Finished first pass implementation
Added association.py

Fixed bugs in dia_collection and unittest.

Passes most of unittests. Need to edit score and match both within
test and class.

Finished unittest and matching implementation.

Added initial pass at Association and database tasks.

Added comments throughout. Started unittests.

Added unittest skeleton for association_db_sqlite.py

Changed indexing to meas_algorithms indexRegistry from sphgeom.

Implemented init and create_db unittests

Added PRIMARY KEYs to create_tables method.

Created initial tests of the database storage.

Debugged unittest for store methods.
Finished first working version of AssociationDB

Added get_dia_object_records method.
Changed sqlite queries from individual to "IN".

Need to account for queries that could require greater than
127 variables in the query.

Added initial batch db queries.

Need to limit queries to 127 variables at most.
Implemented batch queries to get around this limit.

Completed linting pass.

Responded to some reviewer comments

Stashing current commits to rebase to master.

Implemented a subset of reviewer suggested changes.
Still need to confirm new API and unittests pass.

Cleaned up unittests.

Unittests now run to completion.
Need to finalize some tests and remove rest of "magic numbers"

Removed more magic numbers from unittest

Cleaned up usage of _compare_source_records

Finished responses to reviewer.

Committing pre-unittested responses to reviewer.

Debugged unittests and finalized review responses.
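
The batched "IN" queries mentioned in the commit messages above could be sketched as follows. The batch size of 127 is taken from those messages rather than verified against SQLite's limits, and the `dia_objects` table and `select_in_batches` function are hypothetical.

```python
import sqlite3


def select_in_batches(connection, ids, batch_size=127):
    """Select rows by id, using at most batch_size variables per query."""
    ids = list(ids)
    rows = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # One "?" placeholder per id in this batch.
        placeholders = ", ".join("?" * len(batch))
        rows.extend(connection.execute(
            "SELECT id FROM dia_objects WHERE id IN (%s)" % placeholders,
            batch))
    return rows


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE dia_objects (id INTEGER PRIMARY KEY)")
with connection:
    connection.executemany(
        "INSERT INTO dia_objects (id) VALUES (?)",
        [(obj_id,) for obj_id in range(300)])

# 300 ids are fetched in three queries of 127, 127, and 46 variables.
found = select_in_batches(connection, range(300))
```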
6 participants