Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-29147: Switch from using mapApData to new pipeline task and functors. #110

Merged
merged 3 commits into from May 1, 2021

Conversation

morriscb
Copy link
Contributor

No description provided.

Debug tests.

Fix linting.

Debug pipeline.
Add comment to drop columns.
Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some wording docs suggestions, but the IxxPsf and df.drop() questions are important ones, I think.

@@ -86,8 +93,42 @@ funcs:
# trailLnL not implemented
# trailChi2 not implemented
# trailNdata not implemented
# dipMeanFlux needs functor DM-
# dipFluxDiff needs functor DM-
dipMeanFlux:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"dip" is really the name of it? Not dipole? That's very confusing to me: I saw "dip" and thought this was about a source whose flux had gone down or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is very unfortunate, but not a problem for this ticket.

@@ -86,8 +93,42 @@ funcs:
# trailLnL not implemented
# trailChi2 not implemented
# trailNdata not implemented
# dipMeanFlux needs functor DM-
# dipFluxDiff needs functor DM-
dipMeanFlux:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could you please put a note at the top of the file referencing or linking to where these fields are defined? They don't have docstrings in here, and I assume they are formally defined elsewhere. Not knowing the package, I don't know where to look that up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added reference to http://ls.st/dpdd

args: slot_PsfShape_xx
functor: ConvertPixelSqToArcsecondsSq
args:
- slot_Shape_xx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it would be identical in value to Ixx: Shouldn't the IxxPsf use an explicitly PSF-like shape? slot_Shape_xx is not necessarily PSF-like. I see you used slot_PsfShape_xy for IxyPsf below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. Was that way due to a bad copy paste job. Now fixed.

name="{fakesType}{coaddName}Diff_diaSrc",
storageClass="SourceCatalog",
diaSourceTable = connTypes.Input(
doc=".",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a doc.

@@ -169,11 +162,6 @@ class DiaPipelineConfig(pipeBase.PipelineTaskConfig,
"band not on this list, the appropriate band specific columns "
"must be added to the Apdb schema in dax_apdb.",
)
diaSourceDpddifier = pexConfig.ConfigurableField(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to note that I am glad to see "Dpddifier" removed as a thing.

@@ -182,8 +183,11 @@ def run(self,
ParquetTable(dataFrame=diaSourceDf),
self.funcs,
dataId=None).df
# The Ra/DecColumn functors preserve the coord_ra/dec original columns.
# Since we don't need these and keeping them causes a DB insert crash
# we drop them from the DataFrame before returning out value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"returning out value"?

return pipeBase.Struct(
diaSourceTable=df
diaSourceTable=df.drop(columns=["coord_ra", "coord_dec"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the docs, df.drop() defaults to returning the modified object. That results in a new copy of the data frame. I think what you want here is to make the df.drop() call before return, and pass inplace=True so that it modifies the dataframe in place without making a copy. See:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that's what happening. What is your concern here of returning the copy of the dataframe that is computed in this method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general avoidance of creating unnecessary copies.

Though I've now spent some time on stackoverflow and some pandas docs, and it looks like an extra copy is typically created in this case even with inplace=True (there's a proposal to deprecate inplace: pandas-dev/pandas#16529 ). This is surprising behavior, coming from the C++ side of the stack where "don't make unnecessary copies" was a big design goal.

So, I guess leave the code here as-is, and now I wonder whether we shouldn't mention this somewhere in the dev guide since we're starting to use pandas in more places?

Add reference to DPDD to start of DiaSource.yaml file.
Change Psf moments calculations to correct columns.
Fixup missing doc and typo in comment.
@morriscb morriscb merged commit 83b8c68 into master May 1, 2021
@morriscb morriscb deleted the tickets/DM-29147 branch May 1, 2021 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants