Ignore entire package / directory? #99

Closed
adrian-chang opened this issue Aug 24, 2019 · 16 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Milestone

Comments

@adrian-chang

Hello there,

I actually download my dependencies locally into a subfolder within my package similar to something like this.

package/
     __init__.py
     code.py
     downloaded_modules/
           requests/
           ...

Is there any way to effectively ignore downloaded_modules?

Thanks

@kernc kernc added question Not a bug, but a FAQ entry duplicate This issue or pull request already exists labels Aug 25, 2019
@kernc
Member

kernc commented Aug 25, 2019

Exclude items using __pdoc__ dict.

Duplicate of #12.

@kernc kernc closed this as completed Aug 25, 2019
@adrian-chang
Author

adrian-chang commented Aug 25, 2019

Hello, I've tried using __pdoc__["module"] = False multiple times.

For example, in the structure above I would put something like __pdoc__["package.downloaded_modules"] = False in __init__.py. However, the CLI tool still seems to traverse the "downloaded_modules" directory, which causes a problem because not everything in that folder is Python code.

Is there something I'm missing?

Here is a more concrete package structure where the tool fails. In this case, I explicitly put __pdoc__ = {"cli.tools.downloaded_modules": False} in tools/__init__.py.

cli/
  __init__.py
  tools/
    __init__.py
    tool_one.py
    downloaded_modules/
           requests-1.8.0.pkg/
           boostrap1.0.0.pkg/
          ....

@kernc kernc added bug Something isn't working and removed duplicate This issue or pull request already exists question Not a bug, but a FAQ entry labels Aug 26, 2019
@kernc kernc reopened this Aug 26, 2019
@kernc
Member

kernc commented Aug 28, 2019

Yeah, I'm thinking pdoc shouldn't even traverse ignored modules lest havoc. 😄

It should probably also follow .gitignore so that pointless in-tree directories needn't even be listed in __pdoc__.

@kernc kernc added the help wanted Extra attention is needed label Aug 28, 2019
@kernc kernc added enhancement New feature or request good first issue Good for newcomers labels Sep 22, 2019
kernc added a commit that referenced this issue Sep 22, 2019
@kernc kernc removed the bug Something isn't working label Sep 22, 2019
poke1024 added a commit to poke1024/pdoc that referenced this issue Dec 23, 2019
poke1024 added a commit to poke1024/pdoc that referenced this issue Dec 23, 2019
poke1024 added a commit to poke1024/pdoc that referenced this issue Dec 24, 2019
also move _iter_modules out of method scope
@faerics

faerics commented Mar 2, 2020

I have an idea of skipping a directory containing a specially named magic file, like .no-pdoc or .pdocignore. This would solve all the problems of deciding whether a directory is a package, and give a user a clear opportunity to drop a directory from docs, even if it is a valid package.

Note that this does not give an opportunity to exclude a single file from a directory, but here we can discuss some file-based (not import-based) approach, e.g. parsing the content of the .pdocignore file.
Generally, if a user does not need documentation for a package / module / directory, it should not be imported.

@kernc

@faerics

faerics commented Mar 5, 2020

@kernc @poke1024
So, is anybody working on this? I am about to take the issue, narrowing it to the points below:

  1. Every package containing an empty .no-pdoc file must be ignored, with a notification to the user.
  2. Every package containing a non-empty .no-pdoc file must be imported as usual, and the content of .no-pdoc must be saved line by line in a Module object.
  3. Every package must skip a module if it is listed in the package's .no-pdoc by file path, by full Python module path, or by name. The user must be notified about this as well.

Am I missing something?

@kernc
Member

kernc commented Mar 5, 2020

like .no-pdoc or .pdocignore

What are some other ignore files besides .gitignore?

  3. Every package must skip a module if it is listed in the package's .no-pdoc by file path, by full Python module path, or by name. The user must be notified about this as well.

Can you show an example of 3.?


That's one solution. How about if __pdoc__["package.downloaded_modules"] = False in fact prevented traversing of said directory?

@faerics

faerics commented Mar 9, 2020

like .no-pdoc or .pdocignore

What are some other ignore files besides .gitignore?

How does .gitignore concern this? Well, it manages what will be pushed to your VCS, but consider the following:

  • One can push (and thus not list in .gitignore) files or packages that are not ready. The documentation for such a package builds with an error, so the whole project's documentation cannot be built, yet the code still needs to be pushed. This is the case that led me to this issue.
  • The .gitignore syntax is too complex for excluding a single module or package. Do you really want to support all the features of this file syntax?
  • Strongest one: if you don't use git, you don't have a .gitignore. Are you planning to add support for .hgignore? Another ignore file for a VCS named X?

Better to stay with pdoc3's own ignore file, if talking about files.

Of course, the last was my own opinion, don't blame.

  3. Every package must skip a module if it is listed in the package's .no-pdoc by file path, by full Python module path, or by name. The user must be notified about this as well.

Can you show an example of 3.?

The example

Let's assume we have the following structure:

package/
     __init__.py
     code.py
    subpackage/
        __init__.py
        module.py
        broken_module.py
     downloaded_modules/
           requests/
           ...

...and we do not want to import downloaded_modules and broken_module.py, because they are unimportable. Well, the code for reference.

To achieve this, if the approach is accepted:

  1. Put an empty .no-pdoc into downloaded_modules/
  2. Put into subpackage/ a .no-pdoc containing one of:
    • broken_module
    • package.subpackage.broken_module
    • <PATH>/package/subpackage/broken_module.py
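
The two steps above could be sketched roughly as follows. This is only an illustration of the proposal, not pdoc code: pdoc has no `.no-pdoc` support, the helper names `read_no_pdoc` and `iter_documentable` are made up here, and only matching by bare module name is shown (not by file path or dotted path). The file `requests/api.py` is a placeholder for whatever the downloaded directory contains.

```python
import os
import tempfile

MARKER = ".no-pdoc"  # hypothetical marker file name taken from the proposal


def read_no_pdoc(directory):
    """Return None if there is no marker file, else the set of listed names.

    An empty set means "ignore this whole directory".
    """
    path = os.path.join(directory, MARKER)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def iter_documentable(root):
    """Yield paths of .py files under root, honouring marker files."""
    for dirpath, dirnames, filenames in os.walk(root):
        excluded = read_no_pdoc(dirpath)
        if excluded is not None and not excluded:
            dirnames[:] = []  # empty marker: do not descend any further
            continue
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            if ext != ".py":
                continue
            if excluded and stem in excluded:
                continue  # listed by bare module name in the marker file
            yield os.path.join(dirpath, name)


# Rebuild the example tree in a temporary directory and walk it.
with tempfile.TemporaryDirectory() as root:
    files = [
        "package/__init__.py",
        "package/code.py",
        "package/subpackage/__init__.py",
        "package/subpackage/module.py",
        "package/subpackage/broken_module.py",
        "package/downloaded_modules/requests/api.py",  # placeholder content
    ]
    for rel in files:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    # Step 1: empty marker, skip the whole directory
    open(os.path.join(root, "package", "downloaded_modules", MARKER), "w").close()
    # Step 2: non-empty marker, skip only the listed module
    with open(os.path.join(root, "package", "subpackage", MARKER), "w") as f:
        f.write("broken_module\n")

    found = sorted(
        os.path.relpath(p, root).replace(os.sep, "/")
        for p in iter_documentable(root)
    )

print(found)
```

Nothing is imported in this sketch; the decision is made purely from the file system, which is the point of the file-based approach.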

That's one solution. How about if __pdoc__["package.downloaded_modules"] = False in fact prevented traversing of said directory?

That's another solution and it is good for its purpose: exclude a package or module from the package which has been imported. If you wonder what problems I see there, please, read below:

The assignments to __pdoc__ need to be placed where they'll be executed when the module is imported.

So, we need to import the module to even know what the __pdoc__ is, right? This is what I understood. An import is not always what is wanted.

And what if one module says __pdoc__[X] = False, while another one says __pdoc__[X] = True? I mean the order of importing matters, even the order of code.py and subpackage/__init__.py in my example. This is an opportunity for a mistake.

Last, the __pdoc__ entries are spread among all the modules, and you need to read the sources to find them. This might be a pain in a big project.

But, let me say it a second time: __pdoc__ is good.


So, again, my opinion is: you -- or we, as a product -- need both, and the file approach should be given a priority.

Btw, how does pdoc traverse namespaces? That is, a directory which is not a package but contains something we need to document?
@kernc

@kernc
Member

kernc commented Mar 9, 2020

What are some other ignore files besides .gitignore?

I mean, what are some other examples of ignore files, such as .gitignore, .hgignore ...? Do you know of any other similar files? Just as a reference.

That's one solution. How about if __pdoc__["package.downloaded_modules"] = False in fact prevented traversing of said directory?

That's another solution and it is good for its purpose: exclude a package or module from the package which has been imported.

Before importing. In your example, __init__.py should contain:

# package/subpackage/__init__.py:

__pdoc__ = {
    'broken_module': False,
    'downloaded_modules': False,
}

and then simply the file traversal routine should be adapted to look in self.obj.__pdoc__ for exclusions:

pdoc/pdoc/__init__.py

Lines 574 to 611 in 65b31fd

# If the module is a package, scan the directory for submodules
if self.is_package:

    def iter_modules(paths):
        """
        Custom implementation of `pkgutil.iter_modules()`
        because that one doesn't play well with namespace packages.
        See: https://github.com/pypa/setuptools/issues/83
        """
        from os.path import isdir, join
        for pth in paths:
            for file in os.listdir(pth):
                if file.startswith(('.', '__pycache__', '__init__.py')):
                    continue
                module_name = inspect.getmodulename(file)
                if module_name:
                    yield module_name
                if isdir(join(pth, file)) and '.' not in file:
                    yield file

    for root in iter_modules(self.obj.__path__):
        # Ignore if this module was already doc'd.
        if root in self.doc:
            continue

        # Ignore if it isn't exported
        if not _is_public(root):
            continue

        assert self.refname == self.name
        fullname = "%s.%s" % (self.name, root)
        self.doc[root] = m = Module(import_module(fullname),
                                    docfilter=docfilter, supermodule=self,
                                    context=self._context)
        # Skip empty namespace packages because they may
        # as well be other auxiliary directories
        if m.is_namespace and not m.doc:
            del self.doc[root]
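
As a rough sketch of that adaptation (not the change that was eventually made in pdoc), the directory scan could take the package's `__pdoc__` dict and drop excluded names before anything is imported. The extra parameters `pdoc_overrides` and `package_name` are assumptions introduced for this standalone example, and the scan is simplified compared to the excerpt above.

```python
import inspect
import os
import tempfile


def iter_modules(paths, pdoc_overrides, package_name):
    """Yield submodule names found in `paths`, skipping excluded ones.

    `pdoc_overrides` stands in for the package's `__pdoc__` dict. A name is
    skipped when its relative or its fully qualified name maps to False.
    """
    for pth in paths:
        for file in sorted(os.listdir(pth)):
            if file.startswith(('.', '__pycache__', '__init__.py')):
                continue
            if os.path.isdir(os.path.join(pth, file)):
                # Directories with a dot in the name cannot be packages
                name = file if '.' not in file else None
            else:
                name = inspect.getmodulename(file)
            if not name:
                continue
            fullname = f"{package_name}.{name}"
            if (pdoc_overrides.get(name) is False
                    or pdoc_overrides.get(fullname) is False):
                continue  # excluded: never imported, never traversed
            yield name


# Small demonstration on a throwaway directory standing in for a package.
with tempfile.TemporaryDirectory() as pkg_dir:
    for fname in ("__init__.py", "code.py", "broken_module.py", "notes.txt"):
        open(os.path.join(pkg_dir, fname), "w").close()
    os.mkdir(os.path.join(pkg_dir, "downloaded_modules"))
    overrides = {"broken_module": False, "package.downloaded_modules": False}
    kept = list(iter_modules([pkg_dir], overrides, "package"))

print(kept)
```

Because the check happens inside the scan, an excluded directory is never handed to `import_module`, so scripts inside it would not be executed.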

the __pdoc__ entries are spread among all the modules and you need to see the sources to find them. This might be the pain in a big project.

A simple question of grep -IR __pdoc__, not really that painful. ✨
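
Where grep is not at hand, roughly the same search can be sketched in Python. The helper name `find_pdoc_entries` is made up for this illustration and is not part of pdoc; it only reports where the name `__pdoc__` appears in parseable .py files.

```python
import ast
import os
import tempfile


def find_pdoc_entries(root):
    """Return sorted (relative path, line number) pairs where `__pdoc__` is named."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8") as f:
                try:
                    tree = ast.parse(f.read(), filename=path)
                except SyntaxError:
                    continue  # skip files that do not parse
            for node in ast.walk(tree):
                if isinstance(node, ast.Name) and node.id == "__pdoc__":
                    hits.append((os.path.relpath(path, root), node.lineno))
    return sorted(hits)


# Demonstration on two throwaway files.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.py"), "w", encoding="utf-8") as f:
        f.write("__pdoc__ = {}\n__pdoc__['x'] = False\n")
    with open(os.path.join(root, "b.py"), "w", encoding="utf-8") as f:
        f.write("x = 1\n")
    hits = find_pdoc_entries(root)

print(hits)
```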

Btw, how pdoc traverses namespaces? The directory which is not a package but contains something we need to document?

Non-empty namespaces should work.

pdoc/pdoc/__init__.py

Lines 714 to 719 in 1709915

@property
def is_namespace(self):
    """
    `True` if this module is a namespace package.
    """
    return self.obj.__spec__.origin in (None, 'namespace')  # None in Py3.7+
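
The same check can be tried outside pdoc. The sketch below assumes Python 3.7+ and uses two made-up package names, `ns_demo_pkg` (a directory without `__init__.py`) and `regular_demo_pkg`; the free function `is_namespace` mirrors the property above but takes a plain module object.

```python
import importlib
import os
import sys
import tempfile


def is_namespace(module):
    # Namespace packages have no __init__.py, so their spec has no origin file
    return module.__spec__.origin in (None, 'namespace')


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "ns_demo_pkg"))  # no __init__.py on purpose
    os.makedirs(os.path.join(tmp, "regular_demo_pkg"))
    open(os.path.join(tmp, "regular_demo_pkg", "__init__.py"), "w").close()

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        ns = importlib.import_module("ns_demo_pkg")
        regular = importlib.import_module("regular_demo_pkg")
    finally:
        sys.path.remove(tmp)

print(is_namespace(ns), is_namespace(regular))
```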

pdoc/pdoc/test/__init__.py

Lines 414 to 422 in 1709915

def test_namespace(self):
    # Test the three namespace types
    # https://packaging.python.org/guides/packaging-namespace-packages/#creating-a-namespace-package
    for i in range(1, 4):
        path = os.path.join(TESTS_BASEDIR, EXAMPLE_MODULE, '_namespace', str(i))
        with patch.object(sys, 'path', [os.path.join(path, 'a'),
                                        os.path.join(path, 'b')]):
            mod = pdoc.Module('a.main')
            self.assertIn('D', mod.doc)

@ges0909

ges0909 commented Dec 23, 2020

Exclude items using __pdoc__ dict.

Duplicate of #12.

But then you get a global variable __pdoc__ generated at least for the html documentation.

@kernc
Member

kernc commented Dec 23, 2020

But then you get a global variable __pdoc__ generated at least for the html documentation.

@ges0909 Variables should not be auto-documented, unless annotated with PEP-224 docstrings? 🤨

@ges0909

ges0909 commented Dec 23, 2020

But then you get a global variable __pdoc__ generated at least for the html documentation.

@ges0909 Variables should not be auto-documented, unless annotated with PEP-224 docstrings? 🤨

pdoc

@kernc
Member

kernc commented Dec 23, 2020

@ges0909 Right. Would you please open a new issue with this?

@ges0909

ges0909 commented Jan 28, 2021

Workaround: Avoid usage of __pdoc__. Instead prefix items not to be shown in generated doc with '_' (underscore).

@tbrodbeck

Hey, I tried to follow the conversation but I still did not understand: How can I make pdoc ignore the files in .gitignore?

@aflugge

aflugge commented Jun 26, 2024

Is there any solution to this problem now?

I put

__pdoc__ = {
    'subfolder': False
}

in my __init__.py, but pdoc still runs scripts in that subfolder. I don't care about documentation not being created, but I don't want pdoc to run python scripts in the folder I want it to ignore.

@kernc
Member

kernc commented Jun 26, 2024

Apparently, you should use full reference names for submodule keys, i.e. __pdoc__ = {'project.subfolder': False}.

pdoc/pdoc/__init__.py

Lines 754 to 764 in 2a66eb2

for root in iter_modules(self.obj.__path__):
    # Ignore if this module was already doc'd.
    if root in self.doc:
        continue

    # Ignore if it isn't exported
    if not _is_public(root) and not _is_whitelisted(root, self):
        continue
    if _is_blacklisted(root, self):
        self._skipped_submodules.add(root)
        continue

pdoc/pdoc/__init__.py

Lines 390 to 402 in 2a66eb2

def _is_blacklisted(name: str, doc_obj: Union['Module', 'Class']):
    """
    Returns `True` if `name` (relative or absolute refname) is
    contained in some module's __pdoc__ with value False.
    """
    refname = f'{doc_obj.refname}.{name}'
    module: Optional[Module] = doc_obj.module
    while module:
        qualname = refname[len(module.refname) + 1:]
        if module.__pdoc__.get(qualname) is False or module.__pdoc__.get(refname) is False:
            return True
        module = module.supermodule
    return False
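
To see which keys this lookup accepts, the function can be exercised in isolation. The sketch below copies its logic and replaces `pdoc.Module` with a minimal stand-in, `FakeModule`, which is an assumption made for this example and carries only the attributes the lookup reads.

```python
class FakeModule:
    """Minimal stand-in for pdoc.Module; only what the lookup needs."""

    def __init__(self, refname, pdoc=None, supermodule=None):
        self.refname = refname
        self.__pdoc__ = pdoc or {}
        self.supermodule = supermodule
        self.module = self  # a module is its own containing module


def is_blacklisted(name, doc_obj):
    # Same logic as the excerpt above, without the type annotations
    refname = f'{doc_obj.refname}.{name}'
    module = doc_obj.module
    while module:
        # Name relative to the module currently being inspected
        qualname = refname[len(module.refname) + 1:]
        if (module.__pdoc__.get(qualname) is False
                or module.__pdoc__.get(refname) is False):
            return True
        module = module.supermodule
    return False


# Full reference name as key, set in the package's own __pdoc__
project = FakeModule('project', {'project.subfolder': False})
full_key = is_blacklisted('subfolder', project)

# Relative key set in a parent package, checked from a subpackage
cli = FakeModule('cli', {'tools.downloaded_modules': False})
tools = FakeModule('cli.tools', supermodule=cli)
from_parent = is_blacklisted('downloaded_modules', tools)

# A name that no __pdoc__ mentions
unlisted = is_blacklisted('tool_one', tools)

print(full_key, from_parent, unlisted)
```

In this isolated copy both the relative key and the full reference name match. That says nothing about other code paths in pdoc, so the advice above to use full reference names still stands as the safer choice.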

6 participants