Blog category accent problem #6069

Closed
4 tasks done
tamasgt opened this issue Sep 24, 2023 · 24 comments
Labels
bug (Issue reports a bug), resolved (Issue is resolved, yet unreleased if open)

Comments

@tamasgt

tamasgt commented Sep 24, 2023

Context

Something has changed; this was still working 1-2 months ago. I installed the latest version.

Bug description

If the category name contains an accented character, then:

ERROR   -  Encoding error reading file: blog\category\általános.md
ERROR   -  Error reading page 'blog/category/általános.md': 'utf-8' codec can't decode byte 0xc1 in position 2: invalid start byte
Traceback (most recent call last):
...

Related links

categories

Reproduction

9.4.1+insiders.4.42.0-accent.zip

Steps to reproduce

Just uncomment the #- Általános line in the sample-post-1.md file:

---
date: 2022-01-01
categories:
  - Category 1
  #- Általános
...

Then run mkdocs serve.

Browser

Chrome

Before submitting

@squidfunk
Owner

Thanks for reporting and providing the minimal reproduction. Unfortunately, I'm not able to observe the error – I'm on macOS and you're using Windows. It looks like you did not save your file in UTF-8, but in some other encoding. Could you please check if this also happens if you create a file with the same name, not using the blog plugin? In that case, this error is not exclusively related to the blog plugin, but to MkDocs itself. Here's what I get – a perfectly working site:

screenshot-localhost-8000-blog-category-C3-A1ltal-C3-A1nos-1695569295122

@squidfunk squidfunk added the needs reproduction Issue lacks a minimal reproduction .zip file label Sep 24, 2023
@alexvoss
Sponsor Collaborator

Ok, I had to change to the public version of mkdocs-material because my Windows laptop is not fully set up but with that I was able to reproduce the issue. One thing I just realized is that the file that cannot be read is not the original file containing the blog post but blog\category\általános.md, which I assume is an intermediate file produced by the blog plugin? Perhaps that helps?

@tamasgt
Author

tamasgt commented Sep 25, 2023

Thanks for the confirmation! I'll be back as soon as I'm done updating my VMs.

@squidfunk
Owner

I wasn't able to reproduce this with generated files either. It's likely a problem that only occurs on Windows, so if you can have a look at it @alexvoss, that would be great. As suggested in my last comment, try the following:

  • Create a file and reproduce it with stock Material without the blog plugin
  • If that reproduces the error, try to switch to the mkdocs theme

We need to know what causes this, and whether the fix belongs in Material or in MkDocs.

@alexvoss
Sponsor Collaborator

alexvoss commented Sep 25, 2023

Right, made the offending blog post a normal page and the site builds fine. Also changed the text to something with German ÄÖÜ and this causes a different error:

Traceback (most recent call last):
  File "C:\Users\Alex Voss\src\mkdocs-material\reproduce\accent\venv\Lib\site-packages\mkdocs\livereload\__init__.py", line 193, in _build_loop
    func()
  File "C:\Users\Alex Voss\src\mkdocs-material\reproduce\accent\venv\Lib\site-packages\mkdocs\commands\serve.py", line 67, in builder
    build(config, live_server=None if is_clean else server, dirty=is_dirty)
  File "C:\Users\Alex Voss\src\mkdocs-material\reproduce\accent\venv\Lib\site-packages\mkdocs\commands\build.py", line 304, in build
    files = config.plugins.on_files(files, config=config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Alex Voss\src\mkdocs-material\reproduce\accent\venv\Lib\site-packages\mkdocs\plugins.py", line 533, in on_files
    return self.run_event('files', files, config=config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Alex Voss\src\mkdocs-material\reproduce\accent\venv\Lib\site-packages\mkdocs\plugins.py", line 507, in run_event
    result = method(item, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Alex Voss\src\mkdocs-material\reproduce\accent\venv\Lib\site-packages\material\plugins\blog\plugin.py", line 145, in on_files
    self.blog.views.extend(views)
  File "C:\Users\Alex Voss\src\mkdocs-material\reproduce\accent\venv\Lib\site-packages\material\plugins\blog\plugin.py", line 603, in _generate_categories
    self._save_to_file(file.abs_src_path, f"# {name}")
  File "C:\Users\Alex Voss\src\mkdocs-material\reproduce\accent\venv\Lib\site-packages\material\plugins\blog\plugin.py", line 876, in _save_to_file
    f.write(content)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-4: character maps to <undefined>

I will try to set up a development environment on Windows so I can have a look.

@squidfunk
Owner

Perfect, thanks for looking into it! Did you use UTF-8 as a file encoding?

@alexvoss
Sponsor Collaborator

The original file was utf-8 encoded (had a look with a hex editor) and I assume vim has not changed that when I edited the file to put the umlauts in.

@tamasgt
Author

tamasgt commented Sep 25, 2023

The answer to your first question is yes, UTF-8.

Snag_8f94d

I'm about to do the tests.

@tamasgt
Author

tamasgt commented Sep 25, 2023

Without blog, theme Material --> ok

Snag_f019d

Without blog, theme MkDocs --> ok
Snag_1109ff

általános.md

Snag_151314

With blog again, and uncommented Általános category:

Snag_18bca4

Error reading page 'blog/category/altalanos.md': 'utf-8' codec can't decode
           byte 0xc1 in position 2: invalid start byte
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Scripts\mkdocs.exe\__main__.py", line 7, in <module>
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\mkdocs\__main__.py", line 270, in serve_command
    serve.serve(**kwargs)
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\mkdocs\commands\serve.py", line 86, in serve
    builder(config)
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\mkdocs\commands\serve.py", line 67, in builder
    build(config, live_server=None if is_clean else server, dirty=is_dirty)
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\mkdocs\commands\build.py", line 322, in build
    _populate_page(file.page, config, files, dirty)
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\mkdocs\commands\build.py", line 167, in _populate_page
    page.read_source(config)
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\material\plugins\blog\structure\__init__.py", line 229, in read_source
    super().read_source(config)
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\site-packages\mkdocs\structure\pages.py", line 203, in read_source
    source = f.read()
             ^^^^^^^^
  File "<frozen codecs>", line 322, in decode
  File "C:\Users\yyyy\AppData\Local\Programs\Python\Python311\Lib\encodings\utf_8_sig.py", line 69, in _buffer_decode
    return codecs.utf_8_decode(input, errors, final)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 2: invalid start byte

Do you need anything else?
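
For illustration, the byte and position in the error above can be reproduced with a minimal sketch. It assumes the generated category page contains "# Általános" (the traceback earlier in the thread shows the plugin writing f"# {name}") and that the file was written with a legacy Windows code page. cp1252 is used here only because it appears in that earlier traceback; the actual code page on the reporter's machine is not known. The helper name try_decode is hypothetical.

```python
# Assumption: the generated page holds "# Általános" and was written
# with a legacy Windows code page (cp1252 is used as an example).
text = "# Általános"
raw = text.encode("cp1252")


def try_decode(data: bytes, encoding: str):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# The MkDocs traceback shows the page being read via utf_8_sig.py,
# so the same codec is used for the read side here.
decoded, error = try_decode(raw, "utf-8-sig")
if error is not None:
    print(f"byte {raw[error.start]:#x} at position {error.start}: {error.reason}")
```

Positions 0 and 1 are the "#" and the space, so the first byte that is not valid UTF-8 is the one that encodes "Á", at position 2.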

@alexvoss
Sponsor Collaborator

Thanks, this is very useful. I am able to reproduce the problem and am setting up a development environment on a Windows box to try and debug. Might be tomorrow before I can really have a look at this, though.

@tamasgt
Author

tamasgt commented Sep 25, 2023

Ok, thank you!

@squidfunk
Owner

I think I know where this might be coming from: we're missing the encoding here:

with open(path, "w") as f:

Could you try changing to one of the following, and check if the error is gone?

with open(path, "w" encoding = "utf-8") as f: 
with open(path, "w" encoding = "utf-8-sig") as f: 
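
As a rough sketch of the suggestion above (not the plugin's actual code), a write helper with an explicit encoding could look like the following. Note that the two lines as posted lack the comma between "w" and encoding; the sketch includes it. The helper name save_to_file and the temporary directory layout are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def save_to_file(path: Path, content: str) -> None:
    """Hypothetical helper: write text with an explicit encoding.

    Without encoding=..., open() falls back to the locale's preferred
    encoding, which on Windows is often a legacy code page.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "category" / "altalanos.md"
    save_to_file(page, "# Általános")
    written = page.read_bytes()
    # Read back with the codec seen in the MkDocs traceback.
    round_trip = page.read_text(encoding="utf-8-sig")

print(round_trip)
```

With the encoding pinned on the write side, the bytes on disk no longer depend on the machine's locale, so the UTF-8 reader sees the same text on every platform.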

@alexvoss
Sponsor Collaborator

alexvoss commented Sep 25, 2023

Yes, that does fix the problem for me. Well spotted!

Sorry, to be precise: I used the second line and also added the missing comma before encoding. Let me check the first once I am back at the Windows PC.

@squidfunk squidfunk added bug Issue reports a bug and removed needs reproduction Issue lacks a minimal reproduction .zip file labels Sep 25, 2023
@alexvoss
Sponsor Collaborator

Ok, sorry, that took longer than I would have liked it to but I can report that utf-8 works as well.

@squidfunk
Owner

@alexvoss great! Could you create a PR that adds the encoding to the call? We're going for utf-8 for now, because I'm not sure why we would need utf-8-sig then.
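
For context on the two candidates, here is a small sketch of how utf-8 and utf-8-sig differ on the write side. The only difference is that utf-8-sig prepends a byte order mark (BOM). The reader codec used below (utf-8-sig) is taken from the utf_8_sig.py frame in the MkDocs traceback earlier in the thread; the sample text is an assumption.

```python
import codecs

text = "# Általános"

plain = text.encode("utf-8")         # no BOM
with_bom = text.encode("utf-8-sig")  # same bytes, prefixed with a BOM

starts_with_bom = with_bom.startswith(codecs.BOM_UTF8)

# A utf-8-sig reader strips the BOM if present and accepts its absence,
# so both variants decode to the same text.
same_text = plain.decode("utf-8-sig") == with_bom.decode("utf-8-sig") == text

print(starts_with_bom, same_text, len(with_bom) - len(plain))
```

Since a utf-8-sig reader handles both forms, either write encoding resolves the error, and plain utf-8 avoids adding a BOM to the generated files.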

@alexvoss
Sponsor Collaborator

@squidfunk am on it but am running into issues setting up the development environment on Windows, so may take a little longer. I do not want to just edit the file and assume that this fixes it, so need to do this on my favorite game launcher ;o)

@squidfunk
Owner

No hurry ☺️ You're generating very valuable insights regarding the dev environment on Windows, that will directly go into our customization guide 🚀

@alexvoss
Sponsor Collaborator

On that note. I got to the point where I needed to install libcairo and so came across MSYS2. I am not a great fan of Powershell (much less CMD), so am tempted to first write instructions for doing all this in a Unix-y environment. So, so much easier. Also, btw., I had no joy so far installing Windows in a VM, not on my Mac nor on my Linux box. Might try again on the Windows machine, so at least I can undo stuff there more easily. However, I will carry a max. of two machines with me when traveling...

@squidfunk
Owner

I've had success with VirtualBox and one of those free IE Testing machines from Microsoft a while ago, but don't really know what the current situation is. I'm also desperately in need of a Windows env, but I also don't want to carry around two laptops all the time. Meh meh meh.

@squidfunk
Owner

@alexvoss would you like to create a PR for this or should I go ahead and fix it?

@alexvoss
Sponsor Collaborator

@squidfunk will do a pull request.

@squidfunk
Owner

Fixed in c3aafb7 by @alexvoss.

@squidfunk squidfunk added the resolved Issue is resolved, yet unreleased if open label Sep 29, 2023
@tamasgt
Author

tamasgt commented Oct 1, 2023

Thanks guys! 😃

Snag_5981131

@squidfunk
Owner

Released as part of 9.4.3.
