DM-16234: Move transform and object catalog tasks to pipe_tasks #318
Conversation
Looks good. Just a few questions in the comments and a few specific comments/requests.
- Docs don't seem to be complete, and sometimes when they exist they are not fully in numpydoc format.
- It would be helpful for our sprint to identify specific places where things need to be modified/expanded for use in Alert Production.
- More than a few TODOs and questions are sitting around in the code. Would it be possible to spawn tickets to address them?
python/lsst/pipe/tasks/functors.py
While not currently implemented, it would be relatively straightforward to generalize the base `Functor` class to be able to accept arbitrary `ParquetTable` formats (other than that of `deepCoadd_obj`).
How much work is this going to be? We need it, or some other behavior, to use this in AP.
Looks like you tried it in https://jira.lsstcorp.org/browse/DM-17999 and it wasn't terrible
Takes a `calib` argument, which returns the flux at mag=0 as `calib.getFluxMag0()`. If not provided, then the default `fluxMag0` is 63095734448.0194, which is the default for HSC.
Likely worth making a ticket to update all of the flux functors to use the newly stored localCalibration value.
test = (x < 0.5).astype(int)
test = test.mask(mask, 2)

# are these backwards?
Any word on this?
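For context on the "are these backwards?" question: `pandas.Series.mask(cond, value)` replaces entries *where `cond` is True* (the opposite of `where`). A toy illustration with made-up data, not taken from the PR:

```python
import pandas as pd

# Toy inputs to show what the quoted snippet computes
x = pd.Series([0.2, 0.7, 0.4])
mask = pd.Series([True, False, False])

test = (x < 0.5).astype(int)  # 1 where x < 0.5, else 0  -> [1, 0, 1]
test = test.mask(mask, 2)     # where mask is True, overwrite with 2 -> [2, 0, 1]
```

So masked entries end up as 2 regardless of the threshold result, which is the behavior worth double-checking against the intent.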
Parameters
----------
df : `pandas.DataFrame`
    Dataframe to write to Parquet file.
filename : `str`
    Path to which to write.
Parameters doesn't match the input?
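For what it's worth, a numpydoc Parameters section is expected to mirror the function signature name-for-name. A hypothetical sketch (the signature shown here is assumed for illustration, not taken from the PR):

```python
def write(df, filename):
    """Write a DataFrame to a Parquet file.

    Parameters
    ----------
    df : `pandas.DataFrame`
        Dataframe to write to Parquet file.
    filename : `str`
        Path to which to write.
    """
```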
except AttributeError:
    columns = self._sanitizeColumns(columns)
    df = self._pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
Worth adding a warning for the column(s) that failed?
When testing, I found that in the latest version of pyarrow, `pyarrow.parquet.ParquetFile` no longer throws an `AttributeError` if the requested column name is missing. I'm going to remove the `except AttributeError` and instead check by hand that the column lengths match, warning about which columns aren't in the result if they don't.
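The check-and-warn approach described above could look something like this minimal sketch (the helper name and its placement are assumptions, not the PR's actual code):

```python
import logging

def warnOnMissingColumns(requested, result_columns):
    # Compare the requested column names against what the read actually
    # returned, warn on the difference, and hand it back to the caller.
    missing = [c for c in requested if c not in set(result_columns)]
    if missing:
        logging.warning("Requested columns not in Parquet file: %s", missing)
    return missing
```

For example, `warnOnMissingColumns(["a", "b"], df.columns)` would log and return `["b"]` if only column `"a"` came back from the read.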
Changed my mind on changing behavior on this ticket. This behavior was intentionally specified in one of @timothydmorton's unit tests. I added some more unit tests to complete coverage of Parquet and MultilevelParquet for the cases where no good columns are requested and only some good columns are requested. Deferring behavior change to DM-21976: Decide on behavior of ParquetTable if one requested column does not exist.
Cool, as long as there's a ticket to decide I'm happy.
Oh, also I remember seeing
- Bring up to standards
- Remove Subaru-specific references
- Rename classes
- Replace naked asserts
- Adhere to standards guide
- Create simulated dataframes instead of reading from disk
- Remove test_postprocess
- Store test multilevel file as csv.gz, because the overhead for Parquet files is MB-scale, which is too big for a test dataset
- Fail with NaNs if input columns do not exist (necessary for the case where input data for one of the filters doesn't exist)
- Change default ra/dec names to coord_ra/dec
- Reformat documentation
Will squash, obviously, after review.