New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-14428: Spatial joins tables and views #55
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TallJimbo, I'm not completely done reviewing yet, but have some comments you may want to start looking at while I'm finishing my part.
@@ -69,7 +69,8 @@ def setConfigRoot(cls, root, config, full): | |||
from defaults when Butler instances are constructed | |||
should be copied from `full` to `Config`. | |||
""" | |||
raise NotImplementedError() | |||
for key in ("registry.skypix.name", "registry.skypix.level"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be registry.skypix.cls
instead of registry.skypix.name
?
config/schema.yaml
Outdated
type: int | ||
primary_key: true | ||
nullable: false | ||
doc: visit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sensor ID?
config/schema.yaml
Outdated
doc: visit | ||
- | ||
name: region | ||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type is string
in this table but blob
in Tract/Patch tables. Should we instead introduce special region
column type and use type magic name rather than column magic name in Python code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made it a blob here too. Is there a way to use magic Python conversions in a way that is truly RDBMS-independent? (Note I'd like to avoid even exposing that we're using SQLAlchemy publicly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need magic here, we are reading this schema ourselves and transform it into backed-specific schema so we have complete freedom. My worry is that using low-level (physical) type here instead of logical (region
) we limit ourselves to a particular choice of storage type. I can imagine for some backends BLOB may be too inefficient and you may want to store regions (which should be relatively short strings of bytes) in more efficient storage type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I just misunderstood what you were suggesting. I'll go ahead and make that change (while still mapping region
columns to sqlalchemy.LargeBinary
for now).
this Visit+Sensor combination. This is expected to be more | ||
precise than the SkyPixels associated with the Visit+Sensor (see | ||
VisitSensorSkyPixJoin), but may still be larger than the true | ||
region as long as it fully covers it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we describe string representation for regions anywhere or do we depend on sphgeom for it? If latter them maybe add couple of words here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do rely on sphgeom for this. I'll add some docs.
config/schema.yaml
Outdated
foreignKeys: | ||
- | ||
src: camera | ||
tgt: Camera.camera |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this foreign key is useful at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed redundant with other foreign key constraints. I'll remove it.
@@ -270,7 +312,7 @@ def _initDataUnitNames(self, config): | |||
Parameters | |||
---------- | |||
config : `SchemaConfig` | |||
Schema configuration describing `DataUnit` relations. | |||
The `DataUnit` component of a `SchemaConfig`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataUnits (or dataUnits)?
config : `SchemaConfig` | ||
Schema configuration describing `DataUnit` relations. | ||
config : `Config` | ||
The `DataUnit` component of a `SchemaConfig`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too.
if 'tables' in dataUnitJoinDescription and builder is not None: | ||
for tableName, tableDescription in dataUnitJoinDescription['tables'].items(): | ||
table = builder.addTable(tableName, tableDescription) | ||
isView = "sql" in tableDescription |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could "sql" in YAML potentially mean something else than view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, yes (see ExposureRangeJoin
), inside a tables
section it always means a view.
@@ -115,5 +116,7 @@ def fromConfig(registryConfig, schemaConfig=None, create=False): | |||
def __init__(self, registryConfig, schemaConfig=None, create=False): | |||
assert isinstance(registryConfig, RegistryConfig) | |||
self.config = registryConfig | |||
pixelizationCls = doImport(self.config["skypix.cls"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make self.pixelization
a property so that import is deferred until it's actually needed?
@@ -788,6 +788,12 @@ def addDataUnitEntry(self, dataUnitName, values): | |||
values : `dict` | |||
Dictionary of ``columnName, columnValue`` pairs. | |||
|
|||
If ``values`` includes a "region" key, `setDataUnitRegion` will | |||
automatically be called to set it any any associated spatial join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm done with review, it looks OK but I think there may be couple of thing that need change, please see my comments.
config/schema.yaml
Outdated
@@ -769,7 +790,8 @@ schema: | |||
lhs: | |||
- Visit | |||
- Sensor | |||
rhs: SkyPix | |||
rhs: | |||
- SkyPix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For simple cases like this could we possibly use more compact inline style?
lhs: [Visit, Sensor]
rhs: [SkyPix]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure (I don't have a preference - it's a question of whether compactness or consistency is more valuable).
fileDescriptor : `FileDescriptor` | ||
Identifies the file to read, type to read it into and parameters | ||
to be used for reading. | ||
comp : `str`, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you spell this parameter name completely? comp
is such a common prefix, hard to tell if it means compare/computer/compost/compassion 😉
Also, base class read()
method does not seem to have this parameter, I can't figure out how it can be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it's my fault because I call it comp
in fileFormatter
. posixDatastore does use that form of formatter.read()
. You are right I forgot to change the abstract base class. The change here is to allow Exposure components to be read without requiring the entire Exposure to be read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed this to component
and added it to the Formatter
base class.
# Just pass parameters into kwargs for constructor, but check that we recognize them. | ||
kwds.update(fileDescriptor.parameters) | ||
if not self.parameters.issuperset(kwds.keys()): | ||
raise KeyError("Unrecognized parameter key(s): {}".format(self.parameters - kwds.keys())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be kwds.keys() - self.parameters
instead?
# inserted previously and this is an improved region for them, or | ||
# because the region is associated with a single DataUnit instance and | ||
# is hence part of that DataUnit's main table. | ||
if not new or (len(dataUnitNames) == 1 and table == dataUnit.table): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess dataUnit
here originates from the loop above? Not illegal but potentially fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right; it looks a bit weird to my C++-trained eyes, too, but I think this is an example of exactly why Python allows loop variables to live beyond their loops.
import os.path | ||
|
||
from lsst.daf.butler.formatters.fileFormatter import FileFormatter | ||
from ..core.formatter import Formatter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was deliberately using the public formatter interface rather than doing a relative import, since this is what external formatters would look like and it seemed wrong to use internals for them.
ups/daf_butler.table
Outdated
@@ -9,6 +9,7 @@ setupRequired(utils) | |||
setupRequired(daf_persistence) | |||
setupRequired(afw) | |||
|
|||
envPrepend(PATH, ${PRODUCT_DIR}/bin) | |||
envPrepend(LD_LIBRARY_PATH, ${PRODUCT_DIR}/lib) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whereby I notice that we are setting library search paths when we should not be since this is pure python package.
@@ -65,6 +65,7 @@ def setConfigRoot(cls, root, config, full): | |||
from defaults when Butler instances are constructed | |||
should be copied from `full` to `Config`. | |||
""" | |||
super(SqliteRegistry, cls).setConfigRoot(root, config, full) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If super().setConfigRoot()
is not going to work it might be worth having a comment here explaining why you are using the explicit form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I thought super()
wouldn't work in a classmethod
, but it seemed I was wrong about that.
(e.g. VisitSensorRegion)
9da91bb
to
342b357
Compare
@andy-slac, I believe I've addressed all comments. Please take another look and approve if you agree. |
Approved, github says "andy-slac approved these changes 17 seconds from now" but I approved it 20 seconds ago 😄 |
No description provided.