Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested dot folders copied upon build #1981

Closed
squidfunk opened this issue Feb 10, 2020 · 10 comments · Fixed by #1986
Closed

Nested dot folders copied upon build #1981

squidfunk opened this issue Feb 10, 2020 · 10 comments · Fixed by #1986
Milestone

Comments

@squidfunk
Copy link
Sponsor Contributor

squidfunk commented Feb 10, 2020

Sometimes it's necessary to include files in a theme which MkDocs should conditionally include in the templates, but not copy to the site directory when the documentation is built. As discussed in #1980, MkDocs will ignore files that start with a ., but this only applies to top-level directories. It would be great if MkDocs could exclude nested dot directories from the build.

A concrete example: the next version of the Material theme will bundle the FontAwesome files as SVGs for inlining during build time, but MkDocs copies all of them to the site directory. The folder structure is:

assets/images/icons/fontawesome/...

The current workaround is to add a *.html to the files bundles with the theme, resulting in a *.svg.html extension for each of the 1.500 icons. While this works, it's semantically incorrect.

If MkDocs would ignore nested dot directories, we could just use:

assets/images/icons/.fontawesome/...

As pointed out by @waylan, the code in question is here:

patterns = ['.*', '*.py', '*.pyc', '*.html', '*readme*', 'mkdocs_theme.yml']
patterns.extend('*{}'.format(x) for x in utils.markdown_extensions)
patterns.extend(config['theme'].static_templates)

I'm not proficient in Python, but since this looks like a glob implementation, I would guess that **/.* would solve the problem

@waylan
Copy link
Member

waylan commented Feb 10, 2020

since this looks like a glob implementation, I would guess that **/.* would solve the problem

That is correct on both counts. At least I'm assuming **/.* would work, although I fear it may be more complicated than that.

@facelessuser
Copy link
Contributor

This will not work like you expect. I will do a write up. Give me a minute.

@facelessuser
Copy link
Contributor

facelessuser commented Feb 10, 2020

@squidfunk it isn't really a glob pattern. It uses Python's fnmatch which is file name centric. Fnmatch doesn't really consider directory slashes:

>>> fnmatch.fnmatch('.test', '.*')
True
>>> fnmatch.fnmatch('.test/something/else', '.*')
True

But since we are comparing directories, this breaks:

>>> fnmatch.fnmatch('test/something/.else', '.*')
False

Fnmatch doesn't really look at **. That's not a thing for Python's fnmatch as ** is for directories, but maybe this is okay if all paths have their slashes normalized to /:

>>> fnmatch.fnmatch('test/something/.else', '*/.*')
True

But will fail if not:

>>> fnmatch.fnmatch('test\\something\\.else', '*/.*')
False

But I think maybe that all gets normalized based on the OS, so maybe you are okay, but if not, you'd have to account for that:

>>> fnmatch.fnmatch('test/something/.else', '*[/\\].*')
True

And maybe that is good enough.

To really use real glob patterns, you need something like a globmatch. I've actually written a library that does that, but maybe you don't really need something like that if the above is sufficient.

from wcmatch import glob
>>> glob.globmatch('test/something/.else', '**/.*', flags=glob.GLOBSTAR)
True

@squidfunk
Copy link
Sponsor Contributor Author

Thanks for the rundown @facelessuser. I think any solution that ignores nested dot-directories would be sufficient 😊

@facelessuser
Copy link
Contributor

Agreed, what is suggested by @waylan may work just fine in this case, depending on how the paths actually get normalized on various OS, it just needs to be tested. But since it is fnmatch ** is no different than just using *.

If */.* isn't sufficient, you could probably just basename(path) and compare in the filter function instead of pulling in another dependency (like a library with globmatch). I would only suggest that if you needed more complex glob logic. But it looks like they just want to really look at the file name. At least from my quick look.

@facelessuser
Copy link
Contributor

Yeah, it looks like Windows will normalize things so that fnmatch.fnmatch('test\\.test', '*/.*') gets normalized and works fine. But you no longer will match things like .dotfile as now it requires the file to lead with /.

It may just be best to change filter to use basename:

        def filter(name):
            patterns = ['.*', '*.py', '*.pyc', '*.html', '*readme*', 'mkdocs_theme.yml']
            patterns.extend('*{}'.format(x) for x in utils.markdown_extensions)
            patterns.extend(config['theme'].static_templates)
            for pattern in patterns:
                if fnmatch.fnmatch(os.path.basename(name).lower(), pattern):
                    return False
            return True

@waylan
Copy link
Member

waylan commented Feb 10, 2020

>>> fnmatch.fnmatch('test\\something\\.else', '*/.*')
False

But I think maybe that all gets normalized based on the OS, so maybe you are okay

I just checked and that works correctly on Windows:

>>> fnmatch.fnmatch('test\\something\\.else', '*/.*')
True
>>> fnmatch.fnmatch('test/something/.else', '*/.*')
True

@facelessuser
Copy link
Contributor

facelessuser commented Feb 10, 2020

So the real question is, do you also get files that have no parent directories?

>>> fnmatch.fnmatch('.dotfile', '*/.*')
False

Or can we expect they will always have a leading /?

If not, I'd use basename to match just what you actually care about.

@waylan
Copy link
Member

waylan commented Feb 10, 2020

Using basename only works on the filename, but we also want to match dir names at any nesting level.

I think the solution is to use two patterns: '.*', '*/.*'. The first pattern checks the root level segment and the second checks all nested segments of a path.

@facelessuser
Copy link
Contributor

facelessuser commented Feb 10, 2020

Ah, so it's looking at parent folders with dots too, then two patterns sounds like the way to go. Yeah, that's the simplest solution I think.

@waylan waylan added this to the 1.1 milestone Feb 10, 2020
waylan added a commit to waylan/mkdocs that referenced this issue Feb 12, 2020
Also, document how all files within themes are treated by MkDocs.
Fixes mkdocs#1981.
waylan added a commit to waylan/mkdocs that referenced this issue Feb 12, 2020
Also, document how all files within themes are treated by MkDocs.
Fixes mkdocs#1981.
waylan added a commit that referenced this issue Feb 12, 2020
Also, document how all files within themes are treated by MkDocs.
Fixes #1981.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants