Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Astropy view support to afw.table (DM-5641) #64

Merged
merged 2 commits into from
May 17, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
14 changes: 14 additions & 0 deletions python/lsst/afw/table/Base.i
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,8 @@ template <> struct NumpyTraits<lsst::afw::geom::Angle> : public NumpyTraits<doub
%}

%declareNumPyConverters(ndarray::Array<bool const,1>);
%declareNumPyConverters(ndarray::Array<bool const,1,1>);
%declareNumPyConverters(ndarray::Array<bool,1,1>);
%declareNumPyConverters(ndarray::Array<lsst::afw::table::RecordId const,1>);
%declareNumPyConverters(ndarray::Array<boost::uint16_t const,1>);
%declareNumPyConverters(ndarray::Array<boost::int32_t const,1>);
Expand Down Expand Up @@ -684,4 +686,16 @@ namespace lsst { namespace afw { namespace table {

%declareCatalog(CatalogT, Base)

// This needs to be here, not Catalog.i, to prevent it from being picked up in afw.detection, where _syntax.py is not available.
%extend CatalogT<BaseRecord> {
%pythoncode %{
asAstropy = _syntax.BaseCatalog_asAstropy

def __astropy_table__(self, cls, copy, **kwds):
"""Implement interface called by Astropy table constructors to construct a view or copy.
"""
return _syntax.BaseCatalog_asAstropy(cls=cls, copy=copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the kwargs have been forwarded on here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought I did that, but apparently not. I'll create a mini-ticket to fix.

%}
}

}}} // namespace lsst::afw::table
67 changes: 67 additions & 0 deletions python/lsst/afw/table/_syntax.py
Original file line number Diff line number Diff line change
Expand Up @@ -202,3 +202,70 @@ def processArray(a):
else:
d[name] = processArray(self.get(schemaItem.key))
return d

def BaseCatalog_asAstropy(self, cls=None, copy=False, unviewable="copy"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like asAstropy. I'd prefer as_astropy_table but I understand that camel case is king. I don't really care if it's an internal method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly ambivalent about underscores vs. CamelCase here; overall I have a slight preference for switching everything to underscores, but I don't want to switch just some interfaces because then it becomes hard to remember which convention to use where. But I also don't want people to be stuck trying to remember whether it's asAstropy or asAstroPy. Maybe let's see if we get any more opinions?

In any case, I'd like to avoid appending "table"; I think that's pretty strongly implied by the fact that this is a Catalog object, and I do think this is something people may end up typing a fair amount in interactive use, so fewer characters matters.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's "import astropy" so I don't think that people will think "asAstroPy". I vote for asAstropy for consistency with the rest of our camelCaseCode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my worry about confusion comes from the fact that I write NumPy in CamelCase code even though it's import numpy.

But the NumPy and Astropy documentation and web pages seem to be quite internally consistent on the capitalization of the "p" while being unfortunately inconsistent with each other, so I maintain it's not my fault.

"""!
Return an astropy.table.Table (or subclass thereof) view into this catalog.

@param[in] cls Table subclass to use; None implies astropy.table.Table itself.
Use astropy.table.QTable to get Quantity columns.

@param[in] copy Whether to copy data from the LSST catalog to the astropy table.
Not copying is usually faster, but can keep memory from being
freed if columns are later removed from the Astropy view.

@param[in] unviewable One of the following options, indicating how to handle field types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method exist because you can't get the unviewable keyword argument into the __astropy_table__ call? Could they forward some keyword arguments through the astropy.table constructor to this method? Then you could get rid of the entire trampoline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. This argument is indeed the only one we can't get through the __astropy_table__ call. But I also think it's not obvious that the astropy table constructor interface is the more natural one even if it did kwarg forwarding. We had some discussion of this on the current Astropy PR's predecessor (astropy/astropy/pull/4864, but it's a long thread); it's sort of unfortunate that Python users expect overloaded constructors, even though the language doesn't actually support overloading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So we have both.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If copy=False actually might leak shouldn't copy=True be the default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not actually a leak; it's just that you're keeping a large array alive via a view into a subset of it. It's exactly the same as what happens here:

a = numpy.zeros(100, dtype=float)
b = a[0:5]
del a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the first case of using __astropy_table__ and it looks like passing through **kwargs would be useful, this might indicate it will be useful in other cases. This is already officially part of the __astropy_table__ API (except there is nothing to populate **kwargs). We would need to have some discussion, but it isn't entirely unreasonable to have **kwargs in the main Table API for this purpose. I would be much happier to have Table(lsst_table, unviewable='raise') be the one true way to convert, rather than one mostly-works method and another fully-works method.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the philosophical side, when I see Table(lsst_table) I almost view as a casting operation and it seems obvious to me that this should work. So one just needs to provide the machinery to cast from one kind of table to other.

Right now Table has methods to_pandas() and from_pandas() to convert back and forth to pandas DataFrames. I think this is really inelegant compared to DataFrame(astropy_table) and Table(dataframe). Part of the reason is that Table(dataframe) can actually take all the other constructor args, for instance I could change the column names or types at that time. With from_pandas() currently you don't have the full constructor API. It is certainly possible to put the full API into from_pandas() but that just gets ugly and repetitious for each new compatible table type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there was no precedent here, I think I'd feel pretty strongly that to_X() methods are the better interface, because it puts the responsibility in the right place (if object A is viewable as a B, it's A that actually has to know how to do that), and it lets constructors stay simple. And there's nothing stopping you from adding pandas constructor args to to_pandas(), and in fact you can limit it to just the pandas constructor arguments that make sense for this conversion. That's potentially an advantage over the constructor, which has to have all possible arguments for all possible construction patterns and thus pretty complex documentation to specify how those options interact. You can use **kwargs to get around some of that, but I think it's always preferable to avoid that when you can.

All that said, using constructors for type casting is very much an established pattern in Python, even way down in the standard library. I think that may have started largely because it's familiar to people coming from C++, Java, or other strongly-typed languages with overloading (where this pattern very much makes sense), but it's here now, and it's not something I can wish away.

I don't have a well-formed opinion about from_X vs. a constructor; generally I'd prefer the former to reduce overloading, but your point about replicating the full constructor API is a good one.

Anyhow, I think the current interface should accept **kwargs correctly in __astropy_table__ already, so if Table's constructor does allow those to be forwarded from the user, the type-casting constructor will indeed be as fully functional as the asAstropy method, and I'd be happy to eventually deprecate the latter if it ultimately proves unpopular in comparison.

(string and Flag) for which views cannot be constructed:
- 'copy' (default): copy only the unviewable fields.
- 'raise': raise ValueError if unviewable fields are present.
- 'skip': do not include unviewable fields in the Astropy Table.
This option is ignored if copy=True.
"""
import astropy.table
if cls is None:
cls = astropy.table.Table
if unviewable not in ("copy", "raise", "skip"):
raise ValueError("'unviewable' must be one of 'copy', 'raise', or 'skip'")
ps = self.getMetadata()
meta = ps.toOrderedDict() if ps is not None else None
columns = []
items = self.schema.extract("*", ordered=True)
for name, item in items.iteritems():
key = item.key
unit = item.field.getUnits() or None # use None instead of "" when empty
if key.getTypeString() == "String":
if not copy:
if unviewable == "raise":
raise ValueError("Cannot extract string unless copy=True or unviewable='copy' or 'skip'.")
elif unviewable == "skip":
continue
data = numpy.zeros(len(self), dtype=numpy.dtype((str, key.getSize())))
for i, record in enumerate(self):
data[i] = record.get(key)
elif key.getTypeString() == "Flag":
if not copy:
if unviewable == "raise":
raise ValueError(
"Cannot extract packed bit columns unless copy=True or unviewable='copy' or 'skip'."
)
elif unviewable == "skip":
continue
data = self.columns.get_bool_array(key)
elif key.getTypeString() == "Angle":
data = self.columns.get(key)
unit = "radian"
if copy:
data = data.copy()
else:
data = self.columns.get(key)
if copy:
data = data.copy()
columns.append(
astropy.table.Column(
data,
name=item.field.getName(),
unit=unit,
description=item.field.getDoc()
)
)
return cls(columns, meta=meta, copy=False)
7 changes: 6 additions & 1 deletion python/lsst/afw/table/specializations.i
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,12 @@
}

%extend lsst::afw::table::BaseColumnView {
ndarray::Array<bool const,1> __getitem__(
ndarray::Array<bool const,1,1> __getitem__(
lsst::afw::table::Key< lsst::afw::table::Flag > const & key
) const {
return ndarray::copy((*self)[key]);
}
ndarray::Array<bool,1,1> get_bool_array(
lsst::afw::table::Key< lsst::afw::table::Flag > const & key
) const {
return ndarray::copy((*self)[key]);
Expand Down
197 changes: 197 additions & 0 deletions tests/testAstropyTableViews.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,197 @@
#!/usr/bin/env python2
from __future__ import absolute_import, division

#
# LSST Data Management System
# Copyright 2016 AURA/LSST
#
# This product includes software developed by the
# LSST Project (http://www.lsst.org/).
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the LSST License Statement and
# the GNU General Public License along with this program. If not,
# see <http://www.lsstcorp.org/LegalNotices/>.
#

"""
Tests for Astropy views into afw.table Catalogs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in thinking that these tests are not testing the new functionality in Astropy but are testing that the asAstropy method does the right thing. ie we are not (yet) depending on Astropy v1.2 for this to work. When v1.2 is released shouldn't we be able to instantiate an astropy.table directly with asAstropy being a private method? Or did I misunderstand?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically correct. The tests do not exercise any new Astropy functionality. We do have a method that will hopefully work with v1.2, but that's untested.

But even once v1.2 is out, I'd like to keep the asAstropy (or whatever we call it) method public, as it provides useful customization options that will likely never be available through the Astropy table constructors (since they'd require API changes to those constructors).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I say above, I would prefer it if we just said "you pass this afw table to the astropy table constructor" rather than "you call this special exporter routine because it has an extra feature". Getting some extra **kwargs passed in to __astropy_table__ would seem feasible if we asked about it quickly. We'd then have a transitional APIs that would allow people to use table views before 1.2 astropy is released but with the expectation that in the future we'd use the astropy constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and my main comment here was that I'd like a comment in this test explaining that it's not actually testing the Astropy native constructor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added.


Run with:
./testAstropyTableViews.py
or
python
>>> import testAstropyTableViews; testAstropyTableViews.run()
"""

import unittest
import operator

import numpy
import astropy.table
import astropy.units

import lsst.utils.tests
import lsst.afw.table


class AstropyTableViewTestCase(lsst.utils.tests.TestCase):
"""Test that we can construct Astropy views to afw.table Catalog objects.

This test case does not yet test the syntax

table = astropy.table.Table(lsst_catalog)

which is made available by BaseCatalog.__astropy_table__, as this will not
be available until Astropy 1.2 is released. However, this simply
delegates to BaseCatalog.asAstropy, which can also be called directly.
"""

def setUp(self):
schema = lsst.afw.table.Schema()
self.k1 = schema.addField("a1", type=float, units="meter", doc="a1 (meter)")
self.k2 = schema.addField("a2", type=int, doc="a2 (unitless)")
self.k3 = schema.addField("a3", type="ArrayF", size=3, units="count", doc="a3 (array, counts)")
self.k4 = schema.addField("a4", type="Flag", doc="a4 (flag)")
self.k5 = lsst.afw.table.CoordKey.addFields(schema, "a5", "a5 coordinate")
self.k6 = schema.addField("a6", type=str, size=8, doc="a6 (str)")
self.catalog = lsst.afw.table.BaseCatalog(schema)
self.data = [
{
"a1": 5.0, "a2": 3, "a3": numpy.array([0.5, 0.0, -0.5], dtype=numpy.float32),
"a4": True, "a5_ra": 45.0*lsst.afw.geom.degrees, "a5_dec": 30.0*lsst.afw.geom.degrees,
"a6": "bubbles"
},
{
"a1": 2.5, "a2": 7, "a3": numpy.array([1.0, 0.5, -1.5], dtype=numpy.float32),
"a4": False, "a5_ra": 25.0*lsst.afw.geom.degrees, "a5_dec": -60.0*lsst.afw.geom.degrees,
"a6": "pingpong"
},
]
for d in self.data:
record = self.catalog.addNew()
for k, v in d.iteritems():
record.set(k, v)

def tearDown(self):
del self.k1
del self.k2
del self.k3
del self.k4
del self.k5
del self.k6
del self.catalog
del self.data

def testQuantityColumn(self):
"""Test that a column with units is handled as expected by Table and QTable.
"""
v1 = self.catalog.asAstropy(cls=astropy.table.Table, unviewable="skip")
self.assertEqual(v1["a1"].unit, astropy.units.Unit("m"))
self.assertClose(v1["a1"], self.catalog["a1"])
self.assertNotIsInstance(v1["a1"], astropy.units.Quantity)
v2 = self.catalog.asAstropy(cls=astropy.table.QTable, unviewable="skip")
self.assertEqual(v2["a1"].unit, astropy.units.Unit("m"))
self.assertClose(v2["a1"]/astropy.units.Quantity(self.catalog["a1"]*100, "cm"), 1.0)
self.assertIsInstance(v2["a1"], astropy.units.Quantity)

def testUnitlessColumn(self):
"""Test that a column without units is handled as expected by Table and QTable.
"""
v1 = self.catalog.asAstropy(cls=astropy.table.Table, unviewable="skip")
self.assertEqual(v1["a2"].unit, None)
self.assertClose(v1["a2"], self.catalog["a2"]) # use assertClose just because it handles arrays
v2 = self.catalog.asAstropy(cls=astropy.table.QTable, unviewable="skip")
self.assertEqual(v2["a2"].unit, None)
self.assertClose(v2["a2"], self.catalog["a2"])

def testArrayColumn(self):
"""Test that an array column appears as a 2-d array with the expected shape.
"""
v = self.catalog.asAstropy(unviewable="skip")
self.assertClose(v["a3"], self.catalog["a3"])

def testFlagColumn(self):
"""Test that Flag columns can be viewed if copy=True or unviewable="copy".
"""
v1 = self.catalog.asAstropy(unviewable="copy")
self.assertClose(v1["a4"], self.catalog["a4"])
v2 = self.catalog.asAstropy(copy=True)
self.assertClose(v2["a4"], self.catalog["a4"])

def testCoordColumn(self):
"""Test that Coord columns appears as a pair of columns with correct angle units.
"""
v1 = self.catalog.asAstropy(cls=astropy.table.Table, unviewable="skip")
self.assertClose(v1["a5_ra"], self.catalog["a5_ra"])
self.assertEqual(v1["a5_ra"].unit, astropy.units.Unit("rad"))
self.assertNotIsInstance(v1["a5_ra"], astropy.units.Quantity)
self.assertClose(v1["a5_dec"], self.catalog["a5_dec"])
self.assertEqual(v1["a5_dec"].unit, astropy.units.Unit("rad"))
self.assertNotIsInstance(v1["a5_dec"], astropy.units.Quantity)
v2 = self.catalog.asAstropy(cls=astropy.table.QTable, unviewable="skip")
self.assertClose(v2["a5_ra"]/astropy.units.Quantity(self.catalog["a5_ra"], unit="rad"), 1.0)
self.assertEqual(v2["a5_ra"].unit, astropy.units.Unit("rad"))
self.assertIsInstance(v2["a5_ra"], astropy.units.Quantity)
self.assertClose(v2["a5_dec"]/astropy.units.Quantity(self.catalog["a5_dec"], unit="rad"), 1.0)
self.assertEqual(v2["a5_dec"].unit, astropy.units.Unit("rad"))
self.assertIsInstance(v2["a5_dec"], astropy.units.Quantity)

def testStringColumn(self):
"""Test that string columns can be viewed if copy=True or unviewable='copy'.
"""
v1 = self.catalog.asAstropy(unviewable="copy")
self.assertEqual(v1["a6"][0], self.data[0]["a6"])
self.assertEqual(v1["a6"][1], self.data[1]["a6"])
v2 = self.catalog.asAstropy(copy=True)
self.assertEqual(v2["a6"][0], self.data[0]["a6"])
self.assertEqual(v2["a6"][1], self.data[1]["a6"])

def testRaiseOnUnviewable(self):
"""Test that we can't view this table without copying, since it has Flag and String columns.
"""
self.assertRaises(ValueError, self.catalog.asAstropy, copy=False, unviewable="raise")

def testNoUnnecessaryCopies(self):
"""Test that fields that aren't Flag or String are not copied when copy=False (the default).
"""
v1 = self.catalog.asAstropy(unviewable="copy")
v1["a2"][0] = 4
self.assertEqual(self.catalog[0]["a2"], 4)
v2 = self.catalog.asAstropy(unviewable="skip")
v2["a2"][1] = 10
self.assertEqual(self.catalog[1]["a2"], 10)

def testUnviewableSkip(self):
"""Test that we can skip unviewable columns.
"""
v1 = self.catalog.asAstropy(unviewable="skip")
self.assertRaises(KeyError, operator.getitem, v1, "a4")
self.assertRaises(KeyError, operator.getitem, v1, "a6")


def suite():
"""Returns a suite containing all the test cases in this module."""

lsst.utils.tests.init()

suites = []
suites += unittest.makeSuite(AstropyTableViewTestCase)
suites += unittest.makeSuite(lsst.utils.tests.MemoryTestCase)
return unittest.TestSuite(suites)

def run(shouldExit = False):
"""Run the tests"""
lsst.utils.tests.run(suite(), shouldExit)

if __name__ == "__main__":
run(True)