New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-29147: Switch from using mapApData to new pipeline task and functors. #110
Conversation
Debug tests. Fix linting. Debug pipeline.
Add comment to drop columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some wording docs suggestions, but the IxxPsf
and df.drop()
questions are important ones, I think.
@@ -86,8 +93,42 @@ funcs: | |||
# trailLnL not implemented | |||
# trailChi2 not implemented | |||
# trailNdata not implemented | |||
# dipMeanFlux needs functor DM- | |||
# dipFluxDiff needs functor DM- | |||
dipMeanFlux: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"dip" is really the name of it? Not dipole
? That's very confusing to me: I saw "dip" and thought this was about a source whose flux had gone down or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep: http://ls.st/dpdd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is very unfortunate, but not a problem for this ticket.
@@ -86,8 +93,42 @@ funcs: | |||
# trailLnL not implemented | |||
# trailChi2 not implemented | |||
# trailNdata not implemented | |||
# dipMeanFlux needs functor DM- | |||
# dipFluxDiff needs functor DM- | |||
dipMeanFlux: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could you please put a note at the top of the file referencing or linking to where these fields are defined? They don't have docstrings in here, and I assume they are formally defined elsewhere. Not knowing the package, I don't know where to look that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added reference to http://ls.st/dpdd
data/DiaSource.yaml
Outdated
args: slot_PsfShape_xx | ||
functor: ConvertPixelSqToArcsecondsSq | ||
args: | ||
- slot_Shape_xx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it would be identical in value to Ixx
: Shouldn't the IxxPsf
use an explicitly PSF-like shape? slot_Shape_xx
is not necessarily PSF-like. I see you used slot_PsfShape_xy
for IxyPsf
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch. Was that way due to a bad copy paste job. Now fixed.
name="{fakesType}{coaddName}Diff_diaSrc", | ||
storageClass="SourceCatalog", | ||
diaSourceTable = connTypes.Input( | ||
doc=".", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a doc.
@@ -169,11 +162,6 @@ class DiaPipelineConfig(pipeBase.PipelineTaskConfig, | |||
"band not on this list, the appropriate band specific columns " | |||
"must be added to the Apdb schema in dax_apdb.", | |||
) | |||
diaSourceDpddifier = pexConfig.ConfigurableField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to note that I am glad to see "Dpddifier" removed as a thing.
@@ -182,8 +183,11 @@ def run(self, | |||
ParquetTable(dataFrame=diaSourceDf), | |||
self.funcs, | |||
dataId=None).df | |||
# The Ra/DecColumn functors preserve the coord_ra/dec original columns. | |||
# Since we don't need these and keeping them causes a DB insert crash | |||
# we drop them from the DataFrame before returning out value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"returning out value"?
return pipeBase.Struct( | ||
diaSourceTable=df | ||
diaSourceTable=df.drop(columns=["coord_ra", "coord_dec"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the docs, df.drop()
defaults to returning the modified object. That results in a new copy of the data frame. I think what you want here is to make the df.drop()
call before return
, and pass inplace=True
so that it modifies the dataframe in place without making a copy. See:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that's what happening. What is your concern here of returning the copy of the dataframe that is computed in this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general avoidance of creating unnecessary copies.
Though I've now spent some time on stackoverflow and some pandas docs, and it looks like an extra copy is typically created in this case even with inplace=True
(there's a proposal to deprecate inplace
: pandas-dev/pandas#16529 ). This is surprising behavior, coming from the C++ side of the stack where "don't make unnecessary copies" was a big design goal.
So, I guess leave the code here as-is, and now I wonder whether we shouldn't mention this somewhere in the dev guide since we're starting to use pandas in more places?
Add reference to DPDD to start of DiaSource.yaml file. Change Psf moments calculations to correct columns. Fixup missing doc and typo in comment.
No description provided.