Support compound file extensions. Fix #2780 #2816
Before this commit, the file extension matcher could not match compound file extensions; e.g., a file ending in `.md.html` would be matched only as `.html`.
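To illustrate the behavior described above, here is a minimal sketch that assumes the matcher only looks at the last extension, as `os.path.splitext` does. The helper name `last_extension` is hypothetical and is not taken from Pelican's code.

```python
import os.path


def last_extension(filename):
    """Return only the final extension, without the leading dot.

    Hypothetical helper: it mimics a matcher that considers nothing
    but the last dot-separated component of the file name.
    """
    _, ext = os.path.splitext(filename)
    return ext[1:]


# The compound part ("md") stays in the base name, so a reader
# registered for "md.html" could never be selected this way.
print(os.path.splitext("article.md.html"))
print(last_extension("article.md.html"))
```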
Some problems:
I see. I'll admit, and this is probably obvious, but I'm kind of new to open source submissions and all that, so I probably jumped the gun by making this change without a better holistic view of how Pelican works. But I'm very determined to contribute, so I will spend more time getting acquainted, maybe write a plugin, and then revisit this topic. Thanks! |
That's all right and we appreciate the contribution :). If you need any help along the way, feel free to ask. |
Hi @holden-nelson. Just wanted to check in regarding the status of this PR. Do you intend to keep working on it? |
Hello @justinmayer. I had kind of put this on the backburner as other projects and real life happened, but ultimately I would like to do this. I'm finally wrapping up some of that other stuff, but I wouldn't expect to see any progress on this, on my end, for another couple of months. Let me know how that works on your end, I understand if you'd like to close it and move on. Thanks! |
Hello @justinmayer and @avaris. I'm ready to make this contribution, but I need to know how to handle the situation regarding a filename with multiple dots, like Right now, extensions are stripped out of file names with I imagine it would come down to either
Or is there some other way to allow arbitrary compound file extensions and dots in file names? What are your thoughts on this? |
File with dots can be handled if we use |
Forgive me if this comment shows ignorance of the Pelican software. I see how
If we are going to support this, it should be generic and not pinned down to a specific extension. Also I don't see the particular issue with
Now the problem is which of these combinations is compatible with the available readers. The challenging part would be selecting the longest matching extension among the available readers.

One possible (and simple) option is "ranking the extensions and their readers" (say, based on the number of dots) and going through them one by one, starting from the highest, until a match is found; that would be the preferred reader. But this would generally be slower than the current approach, which is to extract the last extension and check whether it is in a dict...

Another approach is building a "tree of readers" based on the extensions, where each level of a compound extension makes the tree one level deeper. Something like:

    readers = {
        '.html': {
            '': HTMLReader,
            '.md': MDHTMLReader,
        },
        # ...
    }

So, once |
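The "tree of readers" idea above could be walked roughly as follows. This is only a sketch under stated assumptions: `HTMLReader` and `MDHTMLReader` are empty placeholder classes, and `find_reader` is a hypothetical function name, not Pelican's API. The lookup goes from the last extension component inward and remembers the deepest reader it has seen, so the longest matching compound extension wins.

```python
class HTMLReader:
    """Placeholder standing in for a reader of plain .html files."""


class MDHTMLReader:
    """Placeholder standing in for a reader of .md.html files."""


# Same shape as the dict sketched in the comment above: the '' key
# holds the reader for the extension matched so far.
READERS = {
    ".html": {
        "": HTMLReader,
        ".md": MDHTMLReader,
    },
}


def find_reader(filename, tree):
    """Return the reader for the longest matching extension, or None."""
    # "post.md.html" -> [".md", ".html"]; the first part is the base name.
    components = ["." + part for part in filename.split(".")[1:]]
    node, best = tree, None
    for component in reversed(components):
        child = node.get(component)
        if child is None:
            break  # no deeper match; keep the best reader found so far
        if not isinstance(child, dict):
            best = child  # a leaf holds the reader directly
            break
        node = child
        best = node.get("", best)
    return best


print(find_reader("post.md.html", READERS).__name__)
print(find_reader("my.notes.html", READERS).__name__)
```

With this shape, a name such as `my.notes.html` still resolves to the plain `.html` reader, because `.notes` is not a registered level of the tree.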
Ahh. Okay, now I see how that would work. Thanks for taking the time to explain. I'll continue working on this feature and submit a PR soon. |
Created a new class ReaderTree that is an infinitely nested defaultdict containing components of the extension. See comments on PR getpelican#2816.
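The commit's actual `ReaderTree` class is not shown on this page. As a rough sketch of what "an infinitely nested defaultdict" keyed by extension components might look like, the functions `reader_tree`, `register`, and `lookup` below are hypothetical names, and the readers are represented by plain strings for brevity.

```python
from collections import defaultdict


def reader_tree():
    """Create a defaultdict whose missing keys are new nested trees."""
    return defaultdict(reader_tree)


def register(tree, extension, reader):
    """Store a reader under an extension such as 'html' or 'md.html'."""
    node = tree
    # Walk from the last component inward: 'md.html' -> 'html', then 'md'.
    for component in reversed(extension.split(".")):
        node = node[component]  # defaultdict creates levels on demand
    node[""] = reader  # the empty key marks "a reader ends here"


def lookup(tree, filename):
    """Return the reader for the longest registered extension, or None."""
    node, best = tree, None
    for component in reversed(filename.split(".")[1:]):
        # Test membership first so a lookup never creates new nodes.
        if not component or component not in node:
            break
        node = node[component]
        if "" in node:
            best = node[""]
    return best


tree = reader_tree()
register(tree, "html", "HTMLReader")
register(tree, "md.html", "MDHTMLReader")
print(lookup(tree, "post.md.html"))
print(lookup(tree, "my.notes.html"))
```

Checking membership with `in` before indexing matters here, since indexing a `defaultdict` with a missing key would silently insert an empty subtree during lookups.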
Pull Request Checklist
Resolves: #2780