Confusing and cluttered namespace #99

Closed
ayshih opened this Issue Nov 21, 2011 · 12 comments

Comments

Projects
None yet
5 participants
Contributor

ayshih commented Nov 21, 2011

There are two related issues with SunPy's organization that should be tackled sooner rather than later.

Confusing hierarchy

SunPy's package/module hierarchy is currently rather confusing. Let's take sunpy.util as an example. Importing sunpy.util means that one accesses the function anytim as

import sunpy.util
sunpy.util.util.anytim(blah)

There are of course ways to map down sunpy.util.util as something more tractable, but I suggest that the various __init__.py files should be set up to pull the functions out from the extra layer. In this case, util\util.py would have the line:

__all__ = ["anytim", etc.]

and util\__init__.py would have the line:

from . util import *

Thus, following an import sunpy.util, the function would now just be sunpy.util.anytim. As an example from NumPy, the function meshgrid has been pulled out from numpy.lib.function_base.meshgrid to be simply numpy.meshgrid.

Cluttered namespace

Regarding the cluttered namespace, I will also use sunpy.util to illustrate. Because util.py has the line import numpy as np at the global level, sunpy.util.util.np now exists. These "extra" packages then show up under tab completion, which seems undesirable. I see two remedies that should both be used, as appropriate:

  • One can avoid imports at the global level, and instead import only within function definitions. The namespace is cleaner, but at the expense of repeated imports if multiple functions in the same module use the same packages.
  • If the aforementioned "pulling out" of functions is implemented, then sunpy.util.util will still be cluttered, but sunpy.util can be kept clean.
Member

khughitt commented Nov 22, 2011

Hey Albert,

Great suggestions.

Confusing hierarchy
Regarding the hierarchy I am all for simplifying things for the user. When I started structuring SunPy early on, I spent some time looking at NumPy, SciPy, Matplotlib, etc. and tried to see how they approached the issue. There didn't seem to be any single convention for handling hierarchies though so except for a couple exception for very useful classes (e.g. sunpy.map.Map) I didn't mess with the structure too much.

The approach you suggest seems like a viable one, but I'll spend a little time look at some other widely-adopted libraries to get a better feel for what other libraries are doing.

Cluttered namespace
Hmm. It looks like different people have differing opinions on this issue. It seems like by convention, imports tend to be placed at the top of the module, however, in terms of practicality, it might not make that much of a difference in terms of speed unless the function is called a large number of times. I see at least two approaches we could take, but neither are perfect:

  1. We could encourage people to import modules inside of functions that they suspect will not be used in large loops, and put it at the module level otherwise. The main problem with this is that it would be hard to get everyone to consistently adopt this approach, and even if they did, people may do unexpected things and call functions in large loops that the author didn't originally anticipate.
  2. We could also just address the first issue as you suggested and keep the commonly used namespaces clean. This is probably the better approach imo since it solves two of our problems in a fairly simple manner. The main issue here though is that it would require a bit more manual effort to structure the namespace and make it "appear" to be a certain way.

Ultimately thoguh I think both issues should be addressed, and it sounds like the solutions you suggested are have been adopted by other large and successfull projects whom, hopefully, also put some serious thought into the issue :)

Finally, somewhat unrelated, but I just want to point that I'm a bit wary about having a "util" module. It is useful, and we probably should have it, but we also need to be very careful about what goes into it. Having something like "helper" or "util" is pretty much like having a "misc" module which encourages people to add anything that doesn't seem to have a better place.

Instead, I would encourage people to find a more meaningful module to put the code in (e.g. sunpy.time.anytime instead of sunpy.util.anytim), or if it is just some helpful function that is only used in one place, just include it in that one place. If it ever becomes more used in other modules, it could be moved into util or another location at that time.

Member

segfaulthunter commented Jan 12, 2012

We can solve the issue of the cluttered namespace by using all.

Member

khughitt commented Jan 12, 2012

Can you elaborate?

On Thu, Jan 12, 2012 at 10:11 AM, Florian Mayer <
reply@reply.github.com

wrote:

We can solve the issue of the cluttered namespace by using all.


Reply to this email directly or view it on GitHub:
#99 (comment)

Member

segfaulthunter commented Jan 12, 2012

__all__ defines the public API of a module. Quote. The public names defined by a module are determined by checking the module's namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module. The names given in __all__ are all considered public and are required to exist. If __all__ is not defined, the set of public names includes all names found in the module's namespace which do not begin with an underscore character ("_"). __all__ should contain the entire public API. It is intended to avoid accidentally exporting items that are not part of the API (such as library modules which were imported and used within the module).

Member

segfaulthunter commented Jan 12, 2012

Also, imports in functions are generally a bad idea because they are executed every time the function is called (this does not mean the module is imported again, though).

Contributor

ayshih commented Jan 12, 2012

I may be mistaken, but my experience is that __all__ does not automatically keep the namespace clean; it only defines which names are public (and implicitly which ones are not). It is only when a wildcard import is used that the __all__ is used to manage the namespace. That is why my proposed solution uses an extra layer so that we can do the wildcard import for namespace cleaniness but still keep the names within a module.

I still advocate for placing imports in functions, except for maybe the heaviest hitters like Numpy. The performance hit of Python checking whether the module is already imported each time the function is called should be largely insignificant in all but the simplest functions. Here are two examples of problems with global imports:

  • In a module with multiple functions, let's say you reformulate a function and realize you no longer need some imported module for that function. However, it is then far from clear whether you should delete the global import without checking through the rest of the module, including possibly all the code that imports your module (if you don't enforce a clean namespace).
  • It can be frustrating to copy a function to a different module, not only because you have to track down which global imports you need to also copy, but also because you need to be careful about what global imports are made in the destination module. In pathological cases, incorrect module imports may not be even caught on compilation.
Member

segfaulthunter commented Jan 12, 2012

-1 on this. The only case where it's really a problem is the wildcard import, plus, that's just the way Python works, we aren't going to change that, quote PEP8

- Imports are always put at the top of the file, just after any module
  comments and docstrings, and before module globals and constants.

  Imports should be grouped in the following order:

  1. standard library imports
  2. related third party imports
  3. local application/library specific imports

  You should put a blank line between each group of imports.

  Put any relevant __all__ specification after the imports.
Contributor

ayshih commented Jan 12, 2012

Well, okay, that's a better argument than the performance one. If it's explicitly the Pythonic way, I guess I'll just have to throw up my hands. I would still argue that there are far more minuses than pluses with that approach.

Member

segfaulthunter commented Jan 12, 2012

Also, IMO, importing modules over and over in every function that needs them also leads to significant code clutter.

Member

khughitt commented Jan 12, 2012

Albert, from what I remember of our earlier discussion, you main annoyance was that with using tab completion for IPython (or dir() in the regular Python console), you would get a bunch of things like "sunpy.*.np" for every module where we have "import numpy as np", which distracts from the more useful parts of the module, right?

Florian - do you know if there is any way to avoid this? or is this just an annoyance users will have to live with?

Member

segfaulthunter commented Jan 12, 2012

As at least IPythonX on Windows seems to list everything, I don't think there is. We could import numpy as _np so that it at least is grouped with irrelevant things.

@ghost ghost assigned rhewett Dec 10, 2012

Owner

Cadair commented Apr 9, 2013

I presume this to be closed by the excellent work of @rhewett.

@Cadair Cadair closed this Apr 9, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment