
DM-16234: Move transform and object catalog tasks to pipe_tasks #318

Merged
merged 5 commits on Nov 19, 2019

Conversation

yalsayyad (Contributor)

Will squash obviously after review

morriscb (Contributor) left a comment

Looks good. Just a few questions in the comments and a few specific comments/requests.
- Docs don't seem to be complete, and sometimes when they do exist they are not fully in the numpydoc format.
- It would be helpful to our sprint to identify the specific places that need to be modified/expanded for use in Alert Production.
- More than a few TODOs and open questions are sitting around in the code. Would it be possible to spawn tickets to address them?

Comment on lines 87 to 91
While not currently implemented, it would be
relatively straightforward to generalize the base `Functor` class to be able to
accept arbitrary `ParquetTable` formats (other than that of `deepCoadd_obj`).
Contributor

How much work is this going to be? We need it, or some other behavior like it, for use in AP.

Contributor Author

Looks like you tried it in https://jira.lsstcorp.org/browse/DM-17999 and it wasn't terrible
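
For readers following along, a minimal sketch of the shape such a generalization might take; the toDf hook and the Sum subclass are hypothetical illustrations, not the actual pipe_tasks API:

class Functor:
    """Sketch of a generalized base functor: it assumes only that the
    table object exposes toDf(columns=...) returning a pandas.DataFrame,
    rather than the multilevel deepCoadd_obj layout specifically."""
    _columns = []  # names of the columns this functor needs

    def __call__(self, parq):
        df = parq.toDf(columns=self._columns)  # any ParquetTable-like object
        return self._func(df)

    def _func(self, df):
        raise NotImplementedError

class Sum(Functor):
    """Example subclass: add two columns."""
    _columns = ['a', 'b']

    def _func(self, df):
        return df['a'] + df['b']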

Comment on lines +498 to +504
Takes a `calib` argument, which returns the flux at mag=0
as `calib.getFluxMag0()`. If not provided, then the default
`fluxMag0` is 63095734448.0194, which is the default for HSC.
Contributor

Likely worth making a ticket to update all of the flux functors to use the newly stored localCalibration value.
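
For reference, the conversion the docstring describes boils down to the standard mag = -2.5 log10(flux / fluxMag0) relation. A minimal sketch, where the function name is illustrative and the [0] indexing assumes afw's getFluxMag0() returns a (value, error) pair:

import numpy as np

HSC_FLUX_MAG0 = 63095734448.0194  # 10**(0.4 * 27), i.e. a zeropoint of 27 mag

def fluxToMag(flux, calib=None):
    # Take the flux at mag=0 from the calib when supplied; otherwise
    # fall back on the HSC default above.
    fluxMag0 = HSC_FLUX_MAG0 if calib is None else calib.getFluxMag0()[0]
    return -2.5 * np.log10(flux / fluxMag0)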

test = (x < 0.5).astype(int)
test = test.mask(mask, 2)

# are these backwards?
Contributor

Any word on this?
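
For anyone skimming this thread: Series.mask(cond, other) replaces values where cond is True (the opposite of Series.where), so a quick check of the semantics in question, with hypothetical stand-ins for x and mask:

import pandas as pd

x = pd.Series([0.2, 0.7, 0.4])
mask = pd.Series([True, False, True])

test = (x < 0.5).astype(int)  # -> [1, 0, 1]
test = test.mask(mask, 2)     # replaces entries where mask is True
print(test.tolist())          # [2, 0, 2]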

Comment on lines 71 to 74
Parameters
----------
df : `pandas.DataFrame`
Dataframe to write to Parquet file.

filename : str
Path to which to write.
Contributor

The Parameters section doesn't match the inputs?
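
For comparison, a writer that actually matches the documented parameters might look like this minimal sketch (assuming pyarrow; the function name is illustrative):

import pyarrow as pa
import pyarrow.parquet as pq

def write(df, filename):
    # Convert the DataFrame to an Arrow table, then write it as Parquet.
    table = pa.Table.from_pandas(df)
    pq.write_table(table, filename)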

Comment on lines 148 to 150
except AttributeError:
columns = self._sanitizeColumns(columns)
df = self._pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
Contributor

Worth adding a warning for the column(s) that failed?

Contributor Author

When testing, I found that in the latest version of pyarrow, pyarrow.parquet.ParquetFile no longer throws an AttributeError if a requested column name is missing. I'm going to remove the except AttributeError, check by hand that the column counts match, and warn about which columns aren't in the result when they don't.
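
Something along these lines, presumably; a sketch of the check-and-warn approach described above, not the final code:

import logging

def _read(pf, columns):
    # pf is a pyarrow.parquet.ParquetFile; newer pyarrow silently drops
    # missing columns instead of raising, so compare the request against
    # the result and warn about any columns that are absent.
    df = pf.read(columns=columns, use_pandas_metadata=True).to_pandas()
    missing = set(columns) - set(df.columns)
    if missing:
        logging.warning("Requested columns not in ParquetFile: %s", sorted(missing))
    return df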

Contributor Author

Changed my mind on changing the behavior on this ticket. This behavior was intentionally specified in one of @timothydmorton's unit tests. I added more unit tests to complete coverage of Parquet and MultilevelParquet for the cases where no good columns are requested and where only some good columns are requested. Deferring the behavior change to DM-21976: Decide on behavior of ParquetTable if one requested column does not exist.

Contributor

Cool, as long as there's a ticket to decide I'm happy.

morriscb (Contributor) commented Oct 9, 2019

Oh, also, I remember seeing lsst.qa.explorer in a few places around the code. Those need to be changed.

* Bring up to standards
* Remove Subaru-specific references
* Rename classes
* Replace naked asserts.
* Adhere to standards guide
* Create simulated dataframes instead of reading from disk
* Remove test_postprocess
* Store test multilevel file as csv.gz, because the overhead for a parquet file is on the order of MBs, which is too big for a test dataset.
* Fill with NaNs if input columns do not exist. Necessary for the case where input data for one of the filters doesn't exist (see the sketch after this list).
* Change default ra/dec names to coord_ra/dec
* Reformat documentation
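
A minimal illustration of the NaN-fill behavior from that commit, using pandas reindex with hypothetical column names:

import pandas as pd

df = pd.DataFrame({"g_flux": [1.0, 2.0]})       # hypothetical input table
out = df.reindex(columns=["g_flux", "r_flux"])  # r_flux has no input data
print(out["r_flux"].tolist())                   # [nan, nan], not an error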
yalsayyad force-pushed the tickets/DM-16234 branch 3 times, most recently from 49f365a to cb2d1bb on November 19, 2019 at 04:58.