Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-14428: Spatial joins tables and views #55

Merged
merged 18 commits into from Jun 28, 2018
Merged

Conversation

TallJimbo
Copy link
Member

No description provided.

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TallJimbo, I'm not completely done reviewing yet, but have some comments you may want to start looking at while I'm finishing my part.

@@ -69,7 +69,8 @@ def setConfigRoot(cls, root, config, full):
from defaults when Butler instances are constructed
should be copied from `full` to `Config`.
"""
raise NotImplementedError()
for key in ("registry.skypix.name", "registry.skypix.level"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be registry.skypix.cls instead of registry.skypix.name?

type: int
primary_key: true
nullable: false
doc: visit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sensor ID?

doc: visit
-
name: region
type: string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type is string in this table but blob in Tract/Patch tables. Should we instead introduce special region column type and use type magic name rather than column magic name in Python code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it a blob here too. Is there a way to use magic Python conversions in a way that is truly RDBMS-independent? (Note I'd like to avoid even exposing that we're using SQLAlchemy publicly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need magic here, we are reading this schema ourselves and transform it into backed-specific schema so we have complete freedom. My worry is that using low-level (physical) type here instead of logical (region) we limit ourselves to a particular choice of storage type. I can imagine for some backends BLOB may be too inefficient and you may want to store regions (which should be relatively short strings of bytes) in more efficient storage type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just misunderstood what you were suggesting. I'll go ahead and make that change (while still mapping region columns to sqlalchemy.LargeBinary for now).

this Visit+Sensor combination. This is expected to be more
precise than the SkyPixels associated with the Visit+Sensor (see
VisitSensorSkyPixJoin), but may still be larger than the true
region as long as it fully covers it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we describe string representation for regions anywhere or do we depend on sphgeom for it? If latter them maybe add couple of words here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do rely on sphgeom for this. I'll add some docs.

foreignKeys:
-
src: camera
tgt: Camera.camera
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this foreign key is useful at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed redundant with other foreign key constraints. I'll remove it.

@@ -270,7 +312,7 @@ def _initDataUnitNames(self, config):
Parameters
----------
config : `SchemaConfig`
Schema configuration describing `DataUnit` relations.
The `DataUnit` component of a `SchemaConfig`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataUnits (or dataUnits)?

config : `SchemaConfig`
Schema configuration describing `DataUnit` relations.
config : `Config`
The `DataUnit` component of a `SchemaConfig`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

if 'tables' in dataUnitJoinDescription and builder is not None:
for tableName, tableDescription in dataUnitJoinDescription['tables'].items():
table = builder.addTable(tableName, tableDescription)
isView = "sql" in tableDescription
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could "sql" in YAML potentially mean something else than view?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, yes (see ExposureRangeJoin), inside a tables section it always means a view.

@@ -115,5 +116,7 @@ def fromConfig(registryConfig, schemaConfig=None, create=False):
def __init__(self, registryConfig, schemaConfig=None, create=False):
assert isinstance(registryConfig, RegistryConfig)
self.config = registryConfig
pixelizationCls = doImport(self.config["skypix.cls"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make self.pixelization a property so that import is deferred until it's actually needed?

@@ -788,6 +788,12 @@ def addDataUnitEntry(self, dataUnitName, values):
values : `dict`
Dictionary of ``columnName, columnValue`` pairs.

If ``values`` includes a "region" key, `setDataUnitRegion` will
automatically be called to set it any any associated spatial join
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any any

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm done with review, it looks OK but I think there may be couple of thing that need change, please see my comments.

@@ -769,7 +790,8 @@ schema:
lhs:
- Visit
- Sensor
rhs: SkyPix
rhs:
- SkyPix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simple cases like this could we possibly use more compact inline style?

lhs: [Visit, Sensor]
rhs: [SkyPix]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure (I don't have a preference - it's a question of whether compactness or consistency is more valuable).

fileDescriptor : `FileDescriptor`
Identifies the file to read, type to read it into and parameters
to be used for reading.
comp : `str`, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you spell this parameter name completely? comp is such a common prefix, hard to tell if it means compare/computer/compost/compassion 😉
Also, base class read() method does not seem to have this parameter, I can't figure out how it can be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's my fault because I call it comp in fileFormatter. posixDatastore does use that form of formatter.read(). You are right I forgot to change the abstract base class. The change here is to allow Exposure components to be read without requiring the entire Exposure to be read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed this to component and added it to the Formatter base class.

# Just pass parameters into kwargs for constructor, but check that we recognize them.
kwds.update(fileDescriptor.parameters)
if not self.parameters.issuperset(kwds.keys()):
raise KeyError("Unrecognized parameter key(s): {}".format(self.parameters - kwds.keys()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be kwds.keys() - self.parameters instead?

# inserted previously and this is an improved region for them, or
# because the region is associated with a single DataUnit instance and
# is hence part of that DataUnit's main table.
if not new or (len(dataUnitNames) == 1 and table == dataUnit.table):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess dataUnit here originates from the loop above? Not illegal but potentially fragile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right; it looks a bit weird to my C++-trained eyes, too, but I think this is an example of exactly why Python allows loop variables to live beyond their loops.

import os.path

from lsst.daf.butler.formatters.fileFormatter import FileFormatter
from ..core.formatter import Formatter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was deliberately using the public formatter interface rather than doing a relative import, since this is what external formatters would look like and it seemed wrong to use internals for them.

@@ -9,6 +9,7 @@ setupRequired(utils)
setupRequired(daf_persistence)
setupRequired(afw)

envPrepend(PATH, ${PRODUCT_DIR}/bin)
envPrepend(LD_LIBRARY_PATH, ${PRODUCT_DIR}/lib)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whereby I notice that we are setting library search paths when we should not be since this is pure python package.

@@ -65,6 +65,7 @@ def setConfigRoot(cls, root, config, full):
from defaults when Butler instances are constructed
should be copied from `full` to `Config`.
"""
super(SqliteRegistry, cls).setConfigRoot(root, config, full)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If super().setConfigRoot() is not going to work it might be worth having a comment here explaining why you are using the explicit form.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought super() wouldn't work in a classmethod, but it seemed I was wrong about that.

@TallJimbo
Copy link
Member Author

@andy-slac, I believe I've addressed all comments. Please take another look and approve if you agree.

@andy-slac
Copy link
Contributor

Approved, github says "andy-slac approved these changes 17 seconds from now" but I approved it 20 seconds ago 😄

@TallJimbo TallJimbo merged commit 05a3be9 into master Jun 28, 2018
@ktlim ktlim deleted the tickets/DM-14428 branch August 25, 2018 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants