
DM-16234: Move transform and object catalog tasks to pipe_tasks #318

Merged
merged 5 commits on Nov 19, 2019

Conversation

yalsayyad (Contributor)

Will squash obviously after review

morriscb (Contributor) left a comment

Looks good. Just a few questions in the comments and a few specific comments/requests.
- Docs don't seem to be complete, and sometimes when they do exist they are not fully in the numpydoc format.
- It would be helpful to our sprint to identify the specific places that need to be modified/expanded for use in Alert Production.
- More than a few TODOs and open questions are sitting around in the code. Would it be possible to spawn tickets to address them?

Comment on lines 87 to 91
While not currently implemented, it would be
relatively straightforward to generalize the base `Functor` class to be able to
accept arbitrary `ParquetTable` formats (other than that of `deepCoadd_obj`).
Contributor

How much work is this going to be? We need it, or some other behavior like it, for use in AP.

Contributor Author

Looks like you tried it in https://jira.lsstcorp.org/browse/DM-17999 and it wasn't terrible
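
For readers following along, a minimal sketch of the shape such a generalization might take; the toDf hook and the Sum subclass are hypothetical illustrations, not the actual pipe_tasks API:

class Functor:
    """Sketch of a generalized base functor: it assumes only that the
    table object exposes toDf(columns=...) returning a pandas.DataFrame,
    rather than the multilevel deepCoadd_obj layout specifically."""
    _columns = []  # names of the columns this functor needs

    def __call__(self, parq):
        df = parq.toDf(columns=self._columns)  # any ParquetTable-like object
        return self._func(df)

    def _func(self, df):
        raise NotImplementedError

class Sum(Functor):
    """Example subclass: add two columns."""
    _columns = ['a', 'b']

    def _func(self, df):
        return df['a'] + df['b']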

Comment on lines +498 to +504
Takes a `calib` argument, which returns the flux at mag=0
as `calib.getFluxMag0()`. If not provided, then the default
`fluxMag0` is 63095734448.0194, which is the default for HSC.
Contributor

Likely worth making a ticket to update all of the flux functors to use the newly stored localCalibration value.
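
For reference, the conversion the docstring describes boils down to the standard mag = -2.5 log10(flux / fluxMag0) relation. A minimal sketch, where the function name is illustrative and the [0] indexing assumes afw's getFluxMag0() returns a (value, error) pair:

import numpy as np

HSC_FLUX_MAG0 = 63095734448.0194  # 10**(0.4 * 27), i.e. a zeropoint of 27 mag

def fluxToMag(flux, calib=None):
    # Take the flux at mag=0 from the calib when supplied; otherwise
    # fall back on the HSC default above.
    fluxMag0 = HSC_FLUX_MAG0 if calib is None else calib.getFluxMag0()[0]
    return -2.5 * np.log10(flux / fluxMag0)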

test = (x < 0.5).astype(int)
test = test.mask(mask, 2)

# are these backwards?
Contributor

Any word on this?
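
For anyone skimming this thread: Series.mask(cond, other) replaces values where cond is True (the opposite of Series.where), so a quick check of the semantics in question, with hypothetical stand-ins for x and mask:

import pandas as pd

x = pd.Series([0.2, 0.7, 0.4])
mask = pd.Series([True, False, True])

test = (x < 0.5).astype(int)  # -> [1, 0, 1]
test = test.mask(mask, 2)     # replaces entries where mask is True
print(test.tolist())          # [2, 0, 2]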

Comment on lines 71 to 74
Parameters
----------
df : `pandas.DataFrame`
Dataframe to write to Parquet file.

filename : str
Path to which to write.
Contributor

The Parameters section doesn't match the inputs?
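
For comparison, a writer that actually matches the documented parameters might look like this minimal sketch (assuming pyarrow; the function name is illustrative):

import pyarrow as pa
import pyarrow.parquet as pq

def write(df, filename):
    # Convert the DataFrame to an Arrow table, then write it as Parquet.
    table = pa.Table.from_pandas(df)
    pq.write_table(table, filename)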

Comment on lines 148 to 150
except AttributeError:
columns = self._sanitizeColumns(columns)
df = self._pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
Contributor

Worth adding a warning for the column(s) that failed?

Contributor Author

When testing, I found that in the latest version of pyarrow, pyarrow.parquet.ParquetFile no longer throws an AttributeError if a requested column name is missing. I'm going to remove the except AttributeError, check by hand that the column counts match, and warn about which columns aren't in the result when they don't.
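
Something along these lines, presumably; a sketch of the check-and-warn approach described above, not the final code:

import logging

def _read(pf, columns):
    # pf is a pyarrow.parquet.ParquetFile; newer pyarrow silently drops
    # missing columns instead of raising, so compare the request against
    # the result and warn about any columns that are absent.
    df = pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
    missing = set(columns) - set(df.columns)
    if missing:
        logging.warning("Requested columns not in ParquetFile: %s", sorted(missing))
    return df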

Contributor Author

Changed my mind on changing the behavior on this ticket. This behavior was intentionally specified in one of @timothydmorton's unit tests. I added more unit tests to complete coverage of Parquet and MultilevelParquet for the cases where no good columns are requested and where only some good columns are requested. Deferring the behavior change to DM-21976: Decide on behavior of ParquetTable if one requested column does not exist.

Contributor

Cool, as long as there's a ticket to decide I'm happy.

morriscb (Contributor) commented Oct 9, 2019

Oh, also, I remember seeing lsst.qa.explorer in a few places around the code. Those need to be changed.

* Bring up to standards
* Remove Subaru-specific references
* Rename classes
* Replace naked asserts.
* Adhere to standards guide
* Create simulated dataframes instead of reading from disk
* Remove test_postprocess
* Store test multilevel file as csv.gz, because the overhead for a parquet file is on the order of MBs, which is too big for a test dataset.
* Fill with NaNs if input columns do not exist. Necessary for the case where input data for one of the filters doesn't exist (see the sketch after this list).
* Change default ra/dec names to coord_ra/dec
* Reformat documentation
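
A minimal illustration of the NaN-fill behavior from that commit, using pandas reindex with hypothetical column names:

import pandas as pd

df = pd.DataFrame({"g_flux": [1.0, 2.0]})       # hypothetical input table
out = df.reindex(columns=["g_flux", "r_flux"])  # r_flux has no input data
print(out["r_flux"].tolist())                   # [nan, nan], not an error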
yalsayyad force-pushed the tickets/DM-16234 branch 3 times, most recently from 49f365a to cb2d1bb on November 19, 2019 at 04:58.