Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-7199: afwTable's .getX()/.getY() do not appear in dir() #238

Merged
merged 2 commits into from Jun 8, 2017

Conversation

mrawls
Copy link
Contributor

@mrawls mrawls commented May 25, 2017

No description provided.

Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a unittest for this? I'm not sure at what level to put it (since this is the base class for so many things), but it seems like this could be a bit fragile (I don't have any evidence for that, just vague feelings)?

Also, has anyone tried benchmarking it for one of our classes a ways down the hierarchy? set is pretty fast, but I wonder if we have anything with enough __bases__ to matter?

@@ -283,6 +283,24 @@ def asAstropy(self, cls=None, copy=False, unviewable="copy"):
)
return cls(columns, meta=meta, copy=False)

def __dir__(self):
# This custom dir is necessary due to the custom getattr below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a docstring instead of a comment?

result |= recursive_get_class_dir(subcls)
result |= set(cls.__dict__.keys())
return result
return sorted(set(dir(self.columns)) | set(dir(self.table)) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about the set operators (pipe for union, in this case). That's cool.

Separately, I think this return should be separated from the above by a line, since it's not part of that inner function.

result |= set(cls.__dict__.keys())
return result
return sorted(set(dir(self.columns)) | set(dir(self.table)) |
recursive_get_class_dir(type(self)) | set(self.__dict__.keys()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just do set(self.__dict__) at the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed so, at least in python 3. I'm not sure if this would work in python 2 though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Burned by py2 again.

'''
result = set()
if cls.__bases__:
for subcls in cls.__bases__:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this works, but the use of __bases__ feels exceptionally clever.

Does the MRO matter here at all? Probably not, since it's all going into a set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the... MRO?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I don't think this matters, because it's all going into a set, and ultimately lands in a sorted list that excludes duplicates. And I choose to take "exceptionally clever" as a compliment, largely on Russell's behalf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you are often much more clever writing code than trying to understand it later. That was the nature of my comment.

@mrawls
Copy link
Contributor Author

mrawls commented May 26, 2017

I'm willing to write a unittest, but I also have no idea where it should live, or even what exactly it would test. Suggestions welcome.
I have not tried timing anything that has a bazillion subclasses to see if it slows things down. Unless I'm mistaken, though, the common use case for calling dir is in an interactive environment, not when a command line task is run; I can't imagine it taking a frustrating amount of time, e.g., anywhere near as long as it does to import afw in the first place.
Also tagging @r-owen in case he has insights (we worked together on this fix).

@mrawls
Copy link
Contributor Author

mrawls commented May 26, 2017

Meant to also say - Jenkins passed for py2, but not for py3, because of test_obs.py? I will need to address this, it seems, hopefully with help from @r-owen :) https://ci.lsst.codes/job/stack-os-matrix/label=centos-7,python=py3/24053/console#console-section-3

@parejkoj
Copy link
Contributor

Re-run jenkins. That looks like one of the sporadic failures we've been getting.

@mrawls
Copy link
Contributor Author

mrawls commented May 26, 2017

Oh good, Jenkins passed this time!
Unless you have a specific suggestion for a test that would make sense to go along with this change and/or a benchmark I should run, I'd like to go ahead and wrap this up.

@r-owen
Copy link
Contributor

r-owen commented May 27, 2017

I agree that a unit test is a good idea. I suggest using the test script you already have as the basis for a new test method in an existing test of SourceCatalog. Something like this:

def testDir(self):
    ...make a source catalog named "cat" or "srcCat" with a minimal schema
    attrNames = dir(cat)
    desiredNames = set("_columns", "getX", ...)  # at least one from each class and subclass
    self.assertTrue(desiredNames.issubset(attrNames))  # I could not find a more specific assert in unittest, but I hope I overlooked something

Also, I would be tempted to move the special function definition up to the module level, to make sure it only gets declared once (I am not sure if Python is smart enough to realize that nothing in the function definition depends on variables local to the __dir__ method, and thus can cache the definition).

As to cleverness: the fact that Catalog overrides __getattr__ to delegate unrecognized methods makes it hard to show what methods are available. I hope someday we can find a cleaner way to delegate that does not require overriding __getattr__, at which point we can stop overriding __dir__.

@mrawls
Copy link
Contributor Author

mrawls commented May 28, 2017

OK, that makes sense, I'm happy to add this test.

When you say "move the special function definition up to the module level," do you mean just the custom __dir__ or the magic __getattr__ too? Right now they live in afw/table/base; do you mean move one or both of these override functions into a standalone script to apply to all of afw?

I'm headed off on vacation for a week tonight and won't be able to get to this until I'm back. If you want to finish it without me, @r-owen, you are welcome to, but you certainly don't have to.

@r-owen
Copy link
Contributor

r-owen commented May 30, 2017

The function I think might want to be moved is recursive_get_class_dir; right now it is defined inside of __dir__ and I lean towards defining it in baseContinued.py (as a function visible everywhere in that file) merely to make sure it is only defined once. It probably doesn't matter, and I would not suggest making it visible outside that file (if such a function is ever wanted in more than one place then I'd move it to utils). It probably doesn't matter enough to bother.

@mrawls mrawls force-pushed the tickets/DM-7199 branch 2 times, most recently from bf2cb27 to 6505879 Compare June 7, 2017 20:50
@mrawls
Copy link
Contributor Author

mrawls commented Jun 7, 2017

I think this should be ready now. It turns out I ran Jenkins before the new test was actually doing anything, but it passes when I run it locally with pytest. I can run Jenkins again if you think it's necessary.
Additionally, if you or @r-owen have a different subset of attributes you'd like to see in desiredNames I'm happy to switch it up, but I think I got something from each class and subclass.

Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See further comments.

The thing I was worried about re:speed was exactly in an interactive environment: ipython/jupyter use dir() to populate the auto-completion list. Might be worth copying some of your test code into a jupyter notebook, and typing catalog.[TAB] and see if the list pops up quickly.

'''
result = set()
if cls.__bases__:
for subcls in cls.__bases__:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you are often much more clever writing code than trying to understand it later. That was the nature of my comment.

result |= set(cls.__dict__.keys())
return result
return sorted(set(dir(self.columns)) | set(dir(self.table)) |
recursive_get_class_dir(type(self)) | set(self.__dict__.keys()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Burned by py2 again.

tests/testDir.py Outdated
attrNames = dir(catalog)
desiredNames = set(['_columns', 'getX', 'asAstropy', 'hasPsfFluxSlot'])
self.assertTrue(desiredNames.issubset(attrNames))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have the MemoryTestCase here.

tests/testDir.py Outdated
# Compare catalog attributes with those from various catalog subclasses
attrNames = dir(catalog)
desiredNames = set(['_columns', 'getX', 'asAstropy', 'hasPsfFluxSlot'])
self.assertTrue(desiredNames.issubset(attrNames))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable way to test that dir(catalog) contains those things. I'd suggest we also include getY in the list. Anything else we should also include? It doesn't hurt to be comprehensive in this test.

@mrawls
Copy link
Contributor Author

mrawls commented Jun 7, 2017

  • IPython test (my jupyter notebook install is giving me python version trouble at present) -> catalog.[TAB] pops up a full list nice and quickly as I would expect 👍
  • Added getY to the list. There are 160 total attributes at present(!). Happy to add more but I don't have any particular favorites...
  • Added all the test boilerplate that should have been there before (thanks for DMing me the link)

@mrawls mrawls merged commit f16258e into master Jun 8, 2017
@ktlim ktlim deleted the tickets/DM-7199 branch August 25, 2018 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants