New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-15588: Remove home-brewed SQLite PPDB #32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some style comments (in particular, code duplication in the test).
Please try to clean up the commit history by e.g. squashing the "debug" and "fix" commits, though. I find it hard to tell what the changes were from the current list.
- name: nDiaSources | ||
type: INT | ||
nullable: false | ||
description: Total number of DiaSources associated with this DiaObject. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this. You add the same thing to dax_ppdb/data/ppdb-schema-extra.yaml
; why do you need two more copies here? Which file is actually being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one being used in ap_pipe is from ap_association. The reason the duplication exists is because the ap_association one as extra columns for the observed DiaSource filter. It also doesn't have the DiaObjectLast table as we don't currently use the Ppdb concept of "dailyJob". The current default config is baseline.
doc='Calibrated scatter in flux in %s band.' % | ||
filter_name) | ||
|
||
# Generated automatically from ppdb-schema.yaml in dax_ppdb/data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please tell me the source code for afwUtils.py
isn't auto-generated...
Why not use one of the .yaml
files to populate schema
, instead of having it (pseudo-?)hard-coded like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the underlying cat schema format (which the Ppdb schema is auto generated from) is changing in the near future. I didn't want completely auto generate the afw schema until the new cat schema format is finalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but what do you mean by the comment? Is afwUtils.py
being auto-generated somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full file isn't, but the afw columns are.
# Mapping of arrays and BLOBs is not currently supported by the PPDB so we | ||
# these columns out. | ||
""" | ||
schema.addField('uLcPeriodic', type='ArrayF', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's a clever way to emulate block comments in Python. I'm surprised none of the style-checking software complains, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. I'll change these to normal comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a quick and dirty solution to comment out the block because Ppdb doesn't currently support array types or BLOBs. Forgot to change it for the PR.
@@ -130,22 +143,91 @@ def run(self, dia_sources, exposure): | |||
Input exposure representing the region of the sky the dia_sources | |||
were detected on. Should contain both the solved WCS and a bounding | |||
box of the ccd. | |||
ppdb : `lsst.dax.ppdb.Ppdb` | |||
Ppdb connection object to retrieve DIASources/Objects from and | |||
write to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this means DM-13602 can be marked as obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems so? ap_association doesn't set the db location anymore but instead receives a Ppdb object. The settings for the Ppdb are a PexConfig object though and currently ap_verify and ap_pipe set these configs. Not sure if that completely answers the question.
output_dia_objects.append(cov_dia_object) | ||
|
||
# Return deep copy to enforce contiguity. | ||
return output_dia_objects.copy(deep=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If contiguity is something you need to enforce, consider documenting that the return value is guaranteed contiguous.
tests/test_association_task.py
Outdated
dia_source["filterId"] = self.exposure.getFilter().getId() | ||
dia_source["x"] = 0 | ||
dia_source["y"] = 0 | ||
dia_source["snr"] = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these all mandatory columns? 😨 I'm a bit worried about code duplication with the code you're supposed to be testing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"x" and "y" are non-Nullable in the Ppdb thus I set them here. The association step takes place in RA/DEC and these columns are not touched.
tests/test_association_task.py
Outdated
dia_object['%sPSFluxSigma' % filter_name] = 1 | ||
dia_object['%sPSFluxNdata' % filter_name] = 1 | ||
|
||
dateTime = dafBase.DateTime(nsecs=1400000000 * 10**9 - 1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dateTime = dafBase.DateTime(nsecs=1400000000 * 10**9 - 1000) | |
dateTime = dafBase.DateTime(nsecs=1400000000 * 1e9 - 1000) |
This is kind of unusual notation anyway; why not just write out the number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to a string input.
tests/test_association_task.py
Outdated
dia_source["apFlux"] = 10000 / self.flux0 | ||
dia_source["apFluxErr"] = \ | ||
np.sqrt((100 / self.flux0) ** 2 + (10000 * self.flux0_err / self.flux0 ** 2) ** 2) | ||
dia_source["snr"] = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some code duplication with _run_association_and_retrieve_objects
and test_update_dia_objects
; can you combine how the three methods initialize sources?
tests/test_association_task.py
Outdated
assoc_db.create_tables() | ||
ppdb = Ppdb(config=self.ppdbConfig, | ||
afw_schemas=dict(DiaObject=make_dia_object_schema(), | ||
DiaSource=make_dia_source_schema())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code duplication with _store_dia_objects_and_sources
; maybe have a makePpdb
function?
tests/test_association_task.py
Outdated
'psFluxMean_g', 'psFluxMeanErr_g', 'psFluxSigma_g', | ||
'psFluxMean_r', 'psFluxMeanErr_r', 'psFluxSigma_r'] | ||
'id', 'gPSFluxMean', 'gPSFluxMeanErr', 'gPSFluxSigma', | ||
'rPSFluxMean', 'rPSFluxMeanErr', 'rPSFluxSigma'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this depend on the content of the schema files or some other external source? I'm worried about it getting out of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, why not just use test_dia_object_values[0].keys()
? You'd have one fewer copy to keep up to date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realized I can delete this specific list and just use the keys from the dictionary. Does that simplification satisfy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
tests/test_association_task.py
Outdated
ppdb = Ppdb(config=self.ppdbConfig, | ||
afw_schemas=dict(DiaObject=make_dia_object_schema(), | ||
DiaSource=make_dia_source_schema())) | ||
ppdb._schema.makeSchema() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the discussion on lsst/dax_ppdb#5, this line should be ppdb.makeSchema(drop=True)
but otherwise doesn't need to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the new make_schema function satisfy this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you really meant _make_ppdb
, then mostly yes. Did you mean to leave off the drop=True
bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did. I end up using the function not just to create a database but to connect to an existing one.
tests/test_association_task.py
Outdated
ppdb = Ppdb(config=self.ppdbConfig, | ||
afw_schemas=dict(DiaObject=make_dia_object_schema(), | ||
DiaSource=make_dia_source_schema())) | ||
ppdb._schema.makeSchema() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the discussion on lsst/dax_ppdb#5, this line should be ppdb.makeSchema(drop=True)
but otherwise doesn't need to be changed (or it could be merged with the similar code in the test case just above it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto with above.
Rename yaml schemas. Simplify afwUtils and add DPDD columns. Remove tests for previous DB wrappers. Add Ppdb as input to AssociationTask API. Debug imports and column names in unittests. Debug initial problems with test_update_dia_object. Add afw mappings. Comment out columns with unsupport datatypes. Fix column names in association flux calculations. Add radecTai to DiaObject values. Add values to DiaSources and Objects. Fixed unittests for new API. Added column mappings. Fixed bugs in AssociationTask regarding flux MeanErrors. Fixed values and bugs in test_association.
Remove unnecessary afw mappings. Add nDiaSources back to DiaObject schemas. Remove deprecated afw->sql code.
Fix schema function names and change Sigma to Err Remove PPDB internal "Validity" columns Change psFlux and Err to type D. Add nDiaSources to afw schema. Change afw float types to double Bug fixes for use in ap_verify. Change schema error fields to float. Compute and store total diaSources. Add centroid mapping. Add filter name and id to MapDiaSource Reduce default MapDiaSource calibration fields.
Add make_schema method. Debug unittest. Change date input to string. Remove association config class. Change to getPackageDir Remove schemaMapper. Fix docstring Add docstring on continuity. Change doc comment to hash comment. Add nDiaSources to test_run. Add nDiaSource value set. Create _set_dia_source_values function. Debug _set_source_values
2849f4c
to
4c9e5e0
Compare
No description provided.