
DM-15588: Remove home-brewed SQLite PPDB #32

Merged: 4 commits, Nov 30, 2018

Conversation

morriscb (Contributor)

No description provided.

@kfindeisen (Member) left a comment:

Looks good, just some style comments (in particular, code duplication in the test).

Please try to clean up the commit history by e.g. squashing the "debug" and "fix" commits, though. I find it hard to tell what the changes were from the current list.

- name: nDiaSources
  type: INT
  nullable: false
  description: Total number of DiaSources associated with this DiaObject.
kfindeisen (Member):

I'm a bit confused by this. You add the same thing to dax_ppdb/data/ppdb-schema-extra.yaml; why do you need two more copies here? Which file is actually being used?

morriscb (Contributor, Author), Nov 27, 2018:

The one being used in ap_pipe is from ap_association. The duplication exists because the ap_association schema has extra columns for the observed DiaSource filter. It also doesn't have the DiaObjectLast table, as we don't currently use the Ppdb concept of "dailyJob". The current default config is baseline.

doc='Calibrated scatter in flux in %s band.' %
filter_name)

# Generated automatically from ppdb-schema.yaml in dax_ppdb/data.
kfindeisen (Member):

Please tell me the source code for afwUtils.py isn't auto-generated...

Why not use one of the .yaml files to populate schema, instead of having it (pseudo-?)hard-coded like this?

morriscb (Contributor, Author):

My understanding is that the underlying cat schema format (which the Ppdb schema is auto-generated from) is changing in the near future. I didn't want to completely auto-generate the afw schema until the new cat schema format is finalized.

kfindeisen (Member):

Ok, but what do you mean by the comment? Is afwUtils.py being auto-generated somehow?

morriscb (Contributor, Author):

The full file isn't, but the afw columns are.
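
A minimal sketch of what the suggestion above (populating the afw schema directly from one of the .yaml files) might look like, assuming PyYAML is available and that the schema file is a list of table entries each with a columns list, as the nDiaSources snippet suggests; the function name and type mapping are illustrative, not the actual afwUtils.py code:

import yaml
import lsst.afw.table as afwTable

# Illustrative mapping from schema-file column types to afw field types.
_TYPE_MAP = {"INT": "I", "BIGINT": "L", "FLOAT": "F", "DOUBLE": "D"}


def make_dia_object_schema_from_yaml(yaml_path):
    """Sketch: build the DiaObject afw schema from a ppdb-schema-style file."""
    schema = afwTable.SourceTable.makeMinimalSchema()
    with open(yaml_path) as stream:
        tables = yaml.safe_load(stream)
    for table in tables:
        if table.get("table") != "DiaObject":
            continue
        for column in table.get("columns", []):
            schema.addField(column["name"],
                            type=_TYPE_MAP.get(column["type"], "D"),
                            doc=column.get("description", ""))
    return schema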

# Mapping of arrays and BLOBs is not currently supported by the PPDB, so we
# comment these columns out.
"""
schema.addField('uLcPeriodic', type='ArrayF',
kfindeisen (Member):

Okay, that's a clever way to emulate block comments in Python. I'm surprised none of the style-checking software complains, though.

morriscb (Contributor, Author):

Same. I'll change these to normal comments.

morriscb (Contributor, Author):

It was a quick and dirty solution to comment out the block because Ppdb doesn't currently support array types or BLOBs. Forgot to change it for the PR.
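
For reference, the pattern under discussion looks roughly like this: a bare triple-quoted string silently disables the block, while ordinary hash comments make the intent visible. The field doc text here is illustrative.

# Before: a bare string literal "comments out" the block.
"""
schema.addField('uLcPeriodic', type='ArrayF',
                doc='Periodic features for the u-band light curve.')
"""

# After: plain hash comments, as agreed above.
# schema.addField('uLcPeriodic', type='ArrayF',
#                 doc='Periodic features for the u-band light curve.')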

@@ -130,22 +143,91 @@ def run(self, dia_sources, exposure):
            Input exposure representing the region of the sky the dia_sources
            were detected on. Should contain both the solved WCS and a bounding
            box of the ccd.
        ppdb : `lsst.dax.ppdb.Ppdb`
            Ppdb connection object to retrieve DIASources/Objects from and
            write to.
kfindeisen (Member):

I think this means DM-13602 can be marked as obsolete.

morriscb (Contributor, Author):

Seems so? ap_association doesn't set the db location anymore but instead receives a Ppdb object. The settings for the Ppdb are a PexConfig object though and currently ap_verify and ap_pipe set these configs. Not sure if that completely answers the question.

output_dia_objects.append(cov_dia_object)

# Return deep copy to enforce contiguity.
return output_dia_objects.copy(deep=True)
kfindeisen (Member):

If contiguity is something you need to enforce, consider documenting that the return value is guaranteed contiguous.
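
One way to record that guarantee is a numpydoc Returns entry along these lines; the catalog type is assumed from the copy(deep=True) call and the wording is illustrative:

    Returns
    -------
    output_dia_objects : `lsst.afw.table.SourceCatalog`
        Updated DiaObjects. Returned as a deep copy, so the catalog is
        guaranteed to be contiguous in memory.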

dia_source["filterId"] = self.exposure.getFilter().getId()
dia_source["x"] = 0
dia_source["y"] = 0
dia_source["snr"] = 10
kfindeisen (Member):

Are these all mandatory columns? 😨 I'm a bit worried about code duplication with the code you're supposed to be testing...

morriscb (Contributor, Author):

"x" and "y" are non-Nullable in the Ppdb thus I set them here. The association step takes place in RA/DEC and these columns are not touched.

dia_object['%sPSFluxSigma' % filter_name] = 1
dia_object['%sPSFluxNdata' % filter_name] = 1

dateTime = dafBase.DateTime(nsecs=1400000000 * 10**9 - 1000)
kfindeisen (Member):

Suggested change:
-    dateTime = dafBase.DateTime(nsecs=1400000000 * 10**9 - 1000)
+    dateTime = dafBase.DateTime(nsecs=1400000000 * 1e9 - 1000)

This is kind of unusual notation anyway; why not just write out the number?

morriscb (Contributor, Author):

Changed to a string input.
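
A sketch of what the string form might look like, assuming dafBase.DateTime accepts an ISO8601 string with an explicit timescale; the timestamp and scale are illustrative and not claimed to be the exact instant encoded by the nanosecond value above:

import lsst.daf.base as dafBase

# Readable ISO8601 input instead of a hand-computed nanosecond count.
dateTime = dafBase.DateTime("2014-05-13T17:00:00.000000000",
                            dafBase.DateTime.TAI)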

dia_source["apFlux"] = 10000 / self.flux0
dia_source["apFluxErr"] = \
    np.sqrt((100 / self.flux0) ** 2 + (10000 * self.flux0_err / self.flux0 ** 2) ** 2)
dia_source["snr"] = 10
kfindeisen (Member):

Some code duplication with _run_association_and_retrieve_objects and test_update_dia_objects; can you combine how the three methods initialize sources?
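
A rough sketch of the kind of helper being asked for, along the lines of the _set_dia_source_values function that appears in the commit history below; the argument names and defaults are illustrative, and it assumes the test fixture's numpy import and self.exposure / self.flux0 / self.flux0_err attributes:

def _set_dia_source_values(self, dia_source, flux=10000., flux_err=100.):
    """Fill the fields every test DiaSource needs, so individual tests
    only override the values they actually care about (sketch)."""
    dia_source["filterId"] = self.exposure.getFilter().getId()
    dia_source["x"] = 0
    dia_source["y"] = 0
    dia_source["apFlux"] = flux / self.flux0
    dia_source["apFluxErr"] = np.sqrt(
        (flux_err / self.flux0) ** 2
        + (flux * self.flux0_err / self.flux0 ** 2) ** 2)
    dia_source["snr"] = 10
    return dia_source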

assoc_db.create_tables()
ppdb = Ppdb(config=self.ppdbConfig,
            afw_schemas=dict(DiaObject=make_dia_object_schema(),
                             DiaSource=make_dia_source_schema()))
kfindeisen (Member):

Code duplication with _store_dia_objects_and_sources; maybe have a makePpdb function?

-        'psFluxMean_g', 'psFluxMeanErr_g', 'psFluxSigma_g',
-        'psFluxMean_r', 'psFluxMeanErr_r', 'psFluxSigma_r']
+        'id', 'gPSFluxMean', 'gPSFluxMeanErr', 'gPSFluxSigma',
+        'rPSFluxMean', 'rPSFluxMeanErr', 'rPSFluxSigma']
kfindeisen (Member):

Does this depend on the content of the schema files or some other external source? I'm worried about it getting out of sync.

kfindeisen (Member):

Actually, why not just use test_dia_object_values[0].keys()? You'd have one fewer copy to keep up to date.

morriscb (Contributor, Author):

Realized I can delete this specific list and just use the keys from the dictionary. Does that simplification satisfy?

kfindeisen (Member):

Sounds good.
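
The agreed simplification reduces to something like the following; the variable name is illustrative:

# Derive the expected DiaObject columns from the test values themselves,
# instead of maintaining a second hard-coded list.
expected_dia_object_columns = list(test_dia_object_values[0].keys())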

ppdb = Ppdb(config=self.ppdbConfig,
            afw_schemas=dict(DiaObject=make_dia_object_schema(),
                             DiaSource=make_dia_source_schema()))
ppdb._schema.makeSchema()
kfindeisen (Member):

Given the discussion on lsst/dax_ppdb#5, this line should be ppdb.makeSchema(drop=True) but otherwise doesn't need to be changed.

morriscb (Contributor, Author):

Does the new make_schema function satisfy this?

kfindeisen (Member):

If you really meant _make_ppdb, then mostly yes. Did you mean to leave off the drop=True bit?

morriscb (Contributor, Author):

I did. I end up using the function not just to create a database but to connect to an existing one.
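
A sketch of the helper shape this thread converges on: a single function that either creates a fresh set of tables or simply connects to an existing database. The name and flag are illustrative, not the function actually added on the ticket:

def _make_ppdb(self, create_tables=False):
    """Build a Ppdb from the test config; optionally (re)create the schema."""
    ppdb = Ppdb(config=self.ppdbConfig,
                afw_schemas=dict(DiaObject=make_dia_object_schema(),
                                 DiaSource=make_dia_source_schema()))
    if create_tables:
        # Mirrors the ppdb.makeSchema(drop=True) call suggested above.
        ppdb.makeSchema(drop=True)
    return ppdb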

ppdb = Ppdb(config=self.ppdbConfig,
            afw_schemas=dict(DiaObject=make_dia_object_schema(),
                             DiaSource=make_dia_source_schema()))
ppdb._schema.makeSchema()
kfindeisen (Member):

Given the discussion on lsst/dax_ppdb#5, this line should be ppdb.makeSchema(drop=True) but otherwise doesn't need to be changed (or it could be merged with the similar code in the test case just above it).

morriscb (Contributor, Author):

Ditto with above.

Commits

Rename yaml schemas.
Simplify afwUtils and add DPDD columns.
Remove tests for previous DB wrappers.
Add Ppdb as input to AssociationTask API.

Debug imports and column names in unittests.

Debug initial problems with test_update_dia_object.

Add afw mappings.
Comment out columns with unsupported datatypes.
Fix column names in association flux calculations.
Add radecTai to DiaObject values.
Add values to DiaSources and Objects.

Fixed unittests for new API.

Added column mappings.
Fixed bugs in AssociationTask regarding flux MeanErrors.
Fixed values and bugs in test_association.
Remove unnecessary afw mappings.

Add nDiaSources back to DiaObject schemas.

Remove deprecated afw->sql code.
Fix schema function names and change Sigma to Err

Remove PPDB internal "Validity" columns

Change psFlux and Err to type D.

Add nDiaSources to afw schema.

Change afw float types to double

Bug fixes for use in ap_verify.

Change schema error fields to float.
Compute and store total diaSources.
Add centroid mapping.
Add filter name and id to MapDiaSource
Reduce default MapDiaSource calibration fields.
Add make_schema method.

Debug unittest.

Change date input to string.

Remove association config class.

Change to getPackageDir

Remove schemaMapper.

Fix docstring

Add docstring on contiguity.

Change doc comment to hash comment.

Add nDiaSources to test_run.

Add nDiaSource value set.

Create _set_dia_source_values function.

Debug _set_source_values