Invalid UTF-8 (Core Dump) #191

Closed
samuelwatsonofficial opened this issue Feb 5, 2024 · 4 comments

@samuelwatsonofficial

I was trying to compress a very large folder with mkdwarfs (v0.8.0) on Arch Linux when I got the error "terminate called after throwing an instance of 'utf8::invalid_utf8' what(): Invalid UTF-8" midway through compression. The verbose output of the error is here: https://pastebin.com/LZaNWgeT . I'm very new to debugging, but at a glance it could be that one of the files has a strange name.
Or possibly the chunk size was too small:
"appending 512 bytes to block 726 @ 16,695,296 from chunkable offset 1,048,109"
The chunkable offset 1,048,109 is 467 away from 2^20 (1,048,576), so appending 512 bytes may be what is causing the issue if it is higher than the block size.
Again, I'm not familiar with this codebase and this is all purely speculation.
I hope this can be fixed soon and easily.

@mhx
Owner

mhx commented Feb 5, 2024

Hi and thanks for your report.

It's most definitely a weird file name and for some reason nobody has encountered this before. Congrats! :)

I reckon some of the files originate from Windows or from a system using a non-UTF-8 locale. Given that it's just 170 files, it should be relatively easy to spot.
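
For anyone who wants to spot such names without eyeballing a listing, here is a minimal sketch in Python. It is not part of dwarfs; the helper names `is_valid_utf8` and `find_invalid_names` are made up for illustration. It walks a directory using raw byte paths and reports every entry whose name does not decode as strict UTF-8.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_invalid_names(root: str) -> list[bytes]:
    """Return the full paths (as bytes) of entries whose name is not valid UTF-8."""
    bad = []
    # Passing a bytes path makes os.walk yield raw bytes names,
    # so nothing is re-encoded before we get to inspect it.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad
```

Calling `find_invalid_names("/path/to/folder")` and printing each result with `repr()` keeps the offending bytes visible instead of letting the terminal mangle them.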

I can easily reproduce this here with a specially crafted file name:

$ ls tmp   
''$'\334''berraschung'
$ ./mkdwarfs -i tmp -o /dev/null --force
I 23:11:49.632683 scanning "/home/mhx/git/github/dwarfs/build-clang/tmp"
I 23:11:49.632794 assigning directory and link inodes...
I 23:11:49.632806 waiting for background scanners...
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

0 dirs, 0/0 soft/hard links, 0/0 files, 0 other
original size: 0 B, hashed: 0 B (0 files, 0 B/s)
scanned: 0 B (0 files, 0 B/s), categorizing: 0 B/s
saved by deduplication: 0 B (0 files), saved by segmenting: 0 B
filesystem: 0 B in 0 blocks (0 chunks, 0 fragments, 0/0 inodes)
compressed filesystem: 0 blocks/0 B written
▏                                                                                                                                                               ▏  0% 🌑
terminate called after throwing an instance of 'utf8::invalid_utf8'
  what():  Invalid UTF-8
*** Aborted at 1707174709 (Unix time, try 'date -d @1707174709') ***
*** Signal 6 (SIGABRT) (0x3e8000057ce) received by PID 22478 (pthread TID 0x7fb19d5ff6c0) (linux TID 22479) (maybe from PID 22478, UID 1000) (code: -6), stack trace: ***
    @ 0000000000526622 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       /home/mhx/git/github/dwarfs/folly/folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 00000000000397cf (unknown)
    @ 00000000000881fc (unknown)
    @ 0000000000039731 raise
    @ 00000000000224ec abort
    @ 00000000000a0c8f (unknown)
    @ 00000000000b3145 (unknown)
    @ 00000000000b31b0 std::terminate()
    @ 00000000000b33f2 __cxa_throw
    @ 0000000000519cc0 __cxa_throw
                       /home/mhx/git/github/dwarfs/folly/folly/experimental/exception_tracer/ExceptionTracerLib.cpp:159
    @ 0000000000357642 unsigned int utf8::next<char const*>(char const*&, char const*)
    @ 00000000003567ea dwarfs::utf8_display_width(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    @ 00000000001a8d2d dwarfs::console_writer::update(dwarfs::progress&, bool)
    @ 00000000002ab25b std::thread::_State_impl<std::thread::_Invoker<std::tuple<dwarfs::progress::progress(folly::Function<void (dwarfs::progress&, bool)>&&, unsigned int)::$_0> > >::_M_run()

I'm not entirely sure about the best way to fix this yet, but it's certainly not going to take long.
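
The trace above shows the exception coming out of `utf8::next` inside `dwarfs::utf8_display_width`, called from `dwarfs::console_writer::update`, i.e. from the progress display rather than from the compression code. As an illustration only, and not necessarily how the fix in dwarfs works, one general way to make a width calculation tolerant of bad input is to substitute a replacement character for undecodable bytes before measuring. The Python sketch below uses a hypothetical `display_width` helper to show the idea.

```python
import unicodedata


def display_width(raw: bytes) -> int:
    """Approximate terminal column width of a possibly invalid UTF-8 byte string."""
    # errors="replace" turns each undecodable sequence into U+FFFD
    # instead of raising, so a bad file name cannot abort the caller.
    text = raw.decode("utf-8", errors="replace")
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        # Wide and fullwidth East Asian characters occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```

With this approach the Latin-1 name from the reproduction is measured as if its first byte were a single placeholder character, which is good enough for laying out a progress line.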

@mhx mhx added the bug Something isn't working label Feb 5, 2024
@mhx mhx self-assigned this Feb 5, 2024
@mhx mhx modified the milestones: v0.8.1, v0.9.1 Feb 5, 2024
@mhx mhx added the fixready label Feb 6, 2024
@mhx
Owner

mhx commented Feb 6, 2024

In the meantime, a workaround is to use the --no-progress flag. This should prevent the crash.

@mhx
Owner

mhx commented Feb 6, 2024

Fixed in v0.9.1.

@mhx mhx closed this as completed Feb 6, 2024
@pfactum

pfactum commented Feb 6, 2024

> It's most definitely a weird file name and for some reason nobody has encountered this before. Congrats! :)

FWIW, I did, but I thought this was my problem, so I ran convmv on an affected folder, and this fixed it for me. I should have reported this earlier, yes :(.
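
For readers without convmv at hand, the same kind of cleanup can be sketched in Python. This is not convmv's implementation; the function names are hypothetical, and the source encoding is an assumption (Latin-1 here, which matches the `\334` byte in the reproduction above but may not match your files). The sketch only builds a rename plan; applying it is a separate, explicit step.

```python
import os


def reencode_name(name: bytes, source_encoding: str = "latin-1") -> bytes:
    """Return `name` as UTF-8, converting from `source_encoding` only if needed."""
    try:
        name.decode("utf-8")
        return name  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        return name.decode(source_encoding).encode("utf-8")


def plan_renames(root: str, source_encoding: str = "latin-1") -> list[tuple[bytes, bytes]]:
    """List (old_path, new_path) pairs for entries whose names are not valid UTF-8."""
    plan = []
    # topdown=False visits children before their parent directory, so
    # renaming a directory later does not invalidate paths already planned.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root), topdown=False):
        for name in dirnames + filenames:
            fixed = reencode_name(name, source_encoding)
            if fixed != name:
                plan.append((os.path.join(dirpath, name), os.path.join(dirpath, fixed)))
    return plan


def apply_renames(plan: list[tuple[bytes, bytes]]) -> None:
    """Perform the planned renames, refusing to overwrite existing entries."""
    for old, new in plan:
        if os.path.exists(new):
            raise FileExistsError(os.fsdecode(new))
        os.rename(old, new)
```

Reviewing the output of `plan_renames` before calling `apply_renames` plays the same role as a dry run: a wrong guess at the source encoding shows up as garbled target names before anything on disk changes.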
