Binary items with umlauts from custom data source are created and deleted right away #837
Comments
Yikes! A wild guess, but this might be caused by Unicode normalisation being done differently in different places. Do you have a test case for me that I can reproduce locally? If not, it’d be helpful if you could do some digging on your side and see whether you can isolate the issue. If my hunch is correct, pruner.rb:43 would show that
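To illustrate the normalisation hunch above, here is a minimal sketch, assuming filenames arrive as UTF-8 strings. The helper name `same_name?` is hypothetical and not part of Nanoc; it only shows how two visually identical names can compare as different until both are normalised.

```ruby
# Hypothetical helper: compares two filenames after normalising both to NFC.
def same_name?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

precomposed = "\u00FC"   # u-umlaut as a single code point
decomposed  = "u\u0308"  # "u" followed by a combining diaeresis

p precomposed == decomposed            # plain comparison sees different code points
p same_name?(precomposed, decomposed)  # normalised comparison treats them as equal
```

As the rest of the thread shows, the actual cause here turned out to be the encoding label rather than normalisation.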
I added
Scratch that.
p present_files.find { |e| e =~ /team-alex/ }.encoding
# => #<Encoding:UTF-8>
p compiled_files.find { |e| e =~ /team-alex/ }.encoding
# => #<Encoding:IBM437>
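Why the differing encodings matter can be sketched as follows. This is an illustration only: the filename and the helper `same_bytes_but_unequal?` are made up, and the array difference merely stands in for whatever comparison the pruner performs.

```ruby
# Hypothetical helper: true when two strings hold identical bytes
# but Ruby still considers them different because of their encoding labels.
def same_bytes_but_unequal?(a, b)
  a.bytes == b.bytes && a != b
end

utf8   = "team-alex-\u00DF.png"               # hypothetical filename with a sharp s
ibm437 = utf8.dup.force_encoding('IBM437')    # same bytes, different label

p utf8.encoding
p ibm437.encoding
p same_bytes_but_unequal?(utf8, ibm437)
p ([utf8] - [ibm437]).length  # the difference still contains the UTF-8 entry
```

Ruby only treats strings with different encodings as equal when both are ASCII-only, so a single non-ASCII character is enough to make an otherwise identical path look like an unknown file.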
It seems this But even after changing the line to
Might be related to this bug: https://bugs.ruby-lang.org/issues/9713
Copying the code from the issue above into something in lib/ and into my spec_helper.rb, I see there's a slight difference between
Does the problem disappear when you replace Find.find(site.config[:output_dir] + '/') do |f| in pruner.rb with Find.find(site.config[:output_dir] + '/').map { |f| f.encode('UTF-8') }.each do |f|? If so, it looks like re-encoding all filenames obtained from Dir.glob to be UTF-8 would be the way to go.
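One distinction worth keeping in mind when trying this (the thread does not confirm it played a role here): `String#encode` converts the bytes, while `String#force_encoding` only changes the label. A minimal sketch, with hypothetical helper names and a made-up filename:

```ruby
# Hypothetical helpers contrasting two ways of "making a string UTF-8".
def transcode_to_utf8(str)
  str.encode('UTF-8')               # converts the bytes from str.encoding to UTF-8
end

def relabel_as_utf8(str)
  str.dup.force_encoding('UTF-8')   # keeps the bytes, changes only the label
end

original   = "team-alex-\u00DF.png"                 # valid UTF-8
mislabeled = original.dup.force_encoding('IBM437')  # same bytes, wrong label

p transcode_to_utf8(mislabeled) == original  # false: the bytes were reinterpreted
p relabel_as_utf8(mislabeled) == original    # true: the bytes were already UTF-8
```

If a path's bytes are already UTF-8 and only the label is wrong, transcoding makes things worse, whereas relabeling restores the expected string.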
Thanks for the suggestion! Unfortunately it didn't work as it seems the files returned by
Perhaps it's even better to enforce the encoding at a more central place, like all_raw_paths = site.compiler.reps.flat_map { |r| r.raw_paths.values.map { |f| f.encode('UTF-8') } }
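The idea of normalising paths in one central place could look roughly like the sketch below. Everything in it is an assumption for illustration: `utf8_path` and `stray_files` are hypothetical names, not Nanoc APIs, and the relabel-or-transcode rule is a heuristic rather than the fix that landed in #852.

```ruby
# Hypothetical helper, not part of Nanoc: returns a UTF-8 copy of a path.
def utf8_path(path)
  return path if path.encoding == Encoding::UTF_8

  relabeled = path.dup.force_encoding('UTF-8')
  # If the bytes are already valid UTF-8, only the label was wrong;
  # otherwise fall back to transcoding from the declared encoding.
  relabeled.valid_encoding? ? relabeled : path.encode('UTF-8')
end

# Hypothetical stand-in for the pruner's comparison: files on disk that
# no item rep claims to have written.
def stray_files(present_files, compiled_files)
  present_files.map { |f| utf8_path(f) } - compiled_files.map { |f| utf8_path(f) }
end

present  = ["output/team-alex-\u00DF.png"]
compiled = [present.first.dup.force_encoding('IBM437')]

p stray_files(present, compiled)
```

With both lists passed through the same helper, the mislabeled path no longer shows up as a stray file. The heuristic would misread a genuinely IBM437-encoded name whose bytes happen to form valid UTF-8, so it is a trade-off, not a guarantee.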
Yup, I’d argue that all strings (including filenames) constructed by Nanoc should be in UTF-8. Will fix and ensure that encodings are correct everywhere. (Hard to fix/test, because the default encoding is sadly part of the global state.)
Fix in #852.
Fixed in #852, and will be part of the 4.1.6 release.
I'm currently migrating an old CMS to nanoc. We load most of the CMS content from an XML file. Some items (binary items) are loaded from the file system.
There is one binary file containing an "ß" character. The item rep (default) for that item gets created and deleted right away. All binary items are handled by a passthrough rule.