Error when using ReplaceString with unicode character #12

Closed
taosxx opened this issue Aug 23, 2018 · 5 comments
Labels: bug, invalid

Comments

@taosxx

taosxx commented Aug 23, 2018

Instead of "_" (\x5F), I would like to use "»" (\xBB) to replace "\x20" in the libnickel patch "Allow searches on Extra dictionaries". This works with patch32lsb, or if I use a hex editor (where I can even go up to \xFF). Unfortunately, kobopatch seems to allow only ASCII characters: the last printable character that works is "~" (\x7E), and the very last is the control character \x7F. Starting from \x80, kobopatch shows the following error in log.txt:

applying patchAllow searches on Extra dictionaries
looping over instructions
skipping non-instruction Enabled(), PatchGroup() or Description()
FindBaseAddressString("\x00Extra: ") | hex:0045787472613a20
ReplaceString(7, " ", "\u0080")
could not apply patch: ReplaceString: length mismatch in byte replacement
Fatal: Could not apply patch file src/libnickel.so.1.0.0.yaml: ReplaceString: length mismatch in byte replacement
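The log shows the YAML value "\x80" arriving as "\u0080", i.e. the code point U+0080, which UTF-8 encodes as two bytes (C2 80), while the Find string " " is one byte. A minimal Go sketch of an equal-length check, assuming (hypothetically) that ReplaceString compares byte lengths before patching; `replaceAt` is an illustrative name, not kobopatch's actual function:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// replaceAt is a hypothetical stand-in for a patcher's string replacement:
// it overwrites find with repl at offset off, but only if both have the same
// length in bytes, so nothing else in the binary shifts position.
func replaceAt(buf []byte, off int, find, repl string) error {
	if len(find) != len(repl) {
		return errors.New("length mismatch in byte replacement")
	}
	if off < 0 || off+len(find) > len(buf) || !bytes.Equal(buf[off:off+len(find)], []byte(find)) {
		return errors.New("find string not present at offset")
	}
	copy(buf[off:], repl)
	return nil
}

func main() {
	buf := []byte("\x00Extra:\x20\x00")
	// The YAML "\x80" escape reaches Go as the rune U+0080, i.e. "\u0080".
	fmt.Println(len(" "), len("\u0080")) // 1 2
	fmt.Println(replaceAt(buf, 7, " ", "\u0080"))

	// An ASCII replacement has the same byte length and succeeds.
	err := replaceAt(buf, 7, " ", "_")
	fmt.Printf("%v %q\n", err, buf)
}
```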

@pgaskin
Owner

pgaskin commented Aug 24, 2018

This is partly by design (albeit with a cryptic error message). To replace byte values, you need to use ReplaceBytes and FindBaseAddressBytes. Try changing that (see the other patches for examples), and if it doesn't work, show me the whole patch.

@taosxx
Author

taosxx commented Aug 24, 2018

Are you sure that there is a "FindBaseAddressBytes"?

I don't get it:
len("\x20") = 1 = len("\xBB")
len([]byte("\x20")) = 1 = len([]byte("\xBB"))

The default entry in libnickel.so.1.0.0.yaml (v10) looks like this:

Allow searches on Extra dictionaries:
  - Enabled: no
    ## To allow searches on Extra dictionaries change space character at end of
    ## "Extra: " to another char (ex: "Extra:_")
    ## The space char causes a non-desired "English - English" when searching on
    ## Extra dictionary from main menu.
  - FindBaseAddressString: "\0Extra:\x20"
  - ReplaceString: {Offset: 7, Find: "\x20", Replace: "_"}

If I replace the last two lines with

  - ReplaceBytes: {Offset: 0x00CC184F, FindH: 00 45 78 74 72 61 3A 20, ReplaceH: 00 45 78 74 72 61 3A BB}

I get a libnickel.so.1.0.0 where the space \x20 is correctly replaced by "»" (\xBB). This is not very convenient, because I would have to use a hex editor to find the correct offset after every firmware update.
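The hex-editor search can be automated: look for the byte pattern and derive the offset from where it is found. A minimal Go sketch; `findOffset` is a hypothetical helper and the byte slice is a toy stand-in for libnickel.so.1.0.0, whose real surrounding bytes are not shown here:

```go
package main

import (
	"bytes"
	"fmt"
)

// findOffset returns the absolute offset of pattern in buf, or -1 if it is
// absent or occurs more than once (an ambiguous match should not be patched).
func findOffset(buf, pattern []byte) int {
	i := bytes.Index(buf, pattern)
	if i < 0 || bytes.Index(buf[i+1:], pattern) >= 0 {
		return -1
	}
	return i
}

func main() {
	// Toy stand-in for the library; a real tool would read the file from disk.
	lib := []byte("\x00Other\x00Extra:\x20\x00More\x00")
	pattern := []byte{0x00, 0x45, 0x78, 0x74, 0x72, 0x61, 0x3A, 0x20} // "\0Extra: "

	off := findOffset(lib, pattern)
	if off < 0 {
		fmt.Println("pattern not found or not unique")
		return
	}
	lib[off+7] = 0xBB // the same single-byte change as the ReplaceBytes line above
	fmt.Printf("pattern at 0x%X: % X\n", off, lib[off:off+9])
}
```

Rejecting a pattern that occurs more than once is a choice made in this sketch: failing loudly is safer than patching the wrong copy of a string.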

If I instead use:

  - FindBaseAddressString: "\0Extra:\x20\0"
  - ReplaceString: {Offset: 7, Find: "\x20\0", Replace: "\xBB"}

For whatever reason, an additional byte is inserted: \x20\0 is replaced by "»" encoded as \xC2\xBB. Why?

In a hex editor, the relevant part looks like this:

00 45 78 74 72 61 3A 20 00 (unpatched libnickel)
00 45 78 74 72 61 3A BB 00 (correctly patched libnickel by replacing bytes)
00 45 78 74 72 61 3A C2 BB (incorrectly patched libnickel by replacing string)

@pgaskin
Owner

pgaskin commented Aug 24, 2018

Oops, I meant FindBaseAddressHex, not FindBaseAddressBytes.

What I would suggest is:

Allow searches on Extra dictionaries:
  - Enabled: no
    ## To allow searches on Extra dictionaries change space character at end of
    ## "Extra: " to another char (ex: "Extra:_")
    ## The space char causes a non-desired "English - English" when searching on
    ## Extra dictionary from main menu.
  - FindBaseAddressHex: 00 45 78 74 72 61 3A 20 # Hex of \0Extra:\x20
  - ReplaceBytes: {Offset: 7, FindH: 20, ReplaceH: BB}

This way, you don't have to update offsets, but it still works.
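Roughly, the base-address approach means: find the pattern, then apply Offset relative to where it was found, so the patch survives the string moving between firmware versions. A Go sketch of that idea; `findBaseAddressHex` and `replaceBytes` are hypothetical names modelled on the YAML instructions, not kobopatch's internal functions, and the byte slice is a toy stand-in for the library:

```go
package main

import (
	"bytes"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// parseHex turns "00 45 78" style notation (as used by FindH/ReplaceH) into bytes.
func parseHex(s string) ([]byte, error) {
	return hex.DecodeString(strings.ReplaceAll(s, " ", ""))
}

// findBaseAddressHex returns the position of the first match of the hex
// pattern; later offsets are interpreted relative to this position.
func findBaseAddressHex(buf []byte, h string) (int, error) {
	pat, err := parseHex(h)
	if err != nil {
		return 0, err
	}
	if i := bytes.Index(buf, pat); i >= 0 {
		return i, nil
	}
	return 0, errors.New("base address pattern not found")
}

// replaceBytes overwrites findH with replH at base+off, requiring equal
// lengths and that the expected bytes are actually present there.
func replaceBytes(buf []byte, base, off int, findH, replH string) error {
	find, err := parseHex(findH)
	if err != nil {
		return err
	}
	repl, err := parseHex(replH)
	if err != nil {
		return err
	}
	if len(find) != len(repl) {
		return errors.New("length mismatch")
	}
	at := base + off
	if at < 0 || at+len(find) > len(buf) || !bytes.Equal(buf[at:at+len(find)], find) {
		return errors.New("find bytes not present at offset")
	}
	copy(buf[at:], repl)
	return nil
}

func main() {
	lib := []byte("junk\x00Extra:\x20\x00more")
	base, err := findBaseAddressHex(lib, "00 45 78 74 72 61 3A 20")
	if err != nil {
		fmt.Println(err)
		return
	}
	if err := replaceBytes(lib, base, 7, "20", "BB"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("base=%d % X\n", base, lib[base:base+9]) // base=4 00 45 78 74 72 61 3A BB 00
}
```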

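The issue is due to the way strings are encoded as bytes in Go. Go encodes text as UTF-8, a variable-width encoding that uses one to four bytes per character; » (U+00BB) takes two. \xBB is the ISO-8859-1 encoding of », and decoding that lone byte as UTF-8 gives an invalid character. \xC2\xBB is the UTF-8 encoding, which is what Go produces. (In a YAML double-quoted string, "\xBB" denotes the character U+00BB, not the raw byte 0xBB, which is also why the log shows "\x80" as "\u0080".)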

@pgaskin pgaskin added the question label Aug 24, 2018
@pgaskin pgaskin changed the title regression of kobopatch vs. patch32lsb: can't replace \x20 with non-ASCII character Error when using ReplaceString with unicode character Aug 24, 2018
@pgaskin
Owner

pgaskin commented Aug 24, 2018

Also, are you sure this displays properly on the Kobo? AFAIK, Qt uses the UTF-8 encoding by default.


EDIT: Yep, I checked. Replacing it with \xBB will not work (it shows a square box), as it needs \xC2\xBB (the UTF-8 encoding). So kobopatch was right from the start to count » as two bytes, because that is its length in UTF-8. I've added this to the FAQ on the kobopatch thread.

Here is what you should use:

Allow searches on Extra dictionaries:
  - Enabled: no
    ## To allow searches on Extra dictionaries change space character at end of
    ## "Extra: " to another char (ex: "Extra:_")
    ## The space char causes a non-desired "English - English" when searching on
    ## Extra dictionary from main menu.
  - FindBaseAddressString: "\0Extra:\x20"
  - ReplaceString: {Offset: 7, Find: "\x20\0", Replace: "»"}
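One caveat worth checking with this patch: Find: "\x20\0" includes the string's NUL terminator, which the second byte of » (BB) overwrites. The patched string therefore only ends where it should if the byte that follows in the library is also \0 (for example alignment padding). The thread does not show those following bytes, so this Go sketch contrasts two hypothetical layouts; `cString` is an illustrative helper that reads up to the first NUL, as C code would:

```go
package main

import (
	"bytes"
	"fmt"
)

// cString mimics how C code reads a NUL-terminated string starting at off.
func cString(buf []byte, off int) string {
	end := bytes.IndexByte(buf[off:], 0)
	if end < 0 {
		return string(buf[off:])
	}
	return string(buf[off : off+end])
}

func main() {
	// Hypothetical layouts: the real bytes after "Extra: \0" are not known here.
	padded := []byte("\x00Extra:\x20\x00\x00next")
	unpadded := []byte("\x00Extra:\x20\x00next")
	for _, lib := range [][]byte{padded, unpadded} {
		// Overwrite "\x20\0" at offset 7 with "»" (C2 BB); assumes the Find
		// bytes were already verified, as a patcher would do first.
		copy(lib[7:], "»")
		fmt.Printf("%q\n", cString(lib, 1))
	}
}
```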

@pgaskin pgaskin added the bug and invalid labels and removed the question label Aug 24, 2018
@pgaskin pgaskin closed this as completed Aug 24, 2018
@taosxx
Author

taosxx commented Aug 24, 2018

Thanks for the suggestion.

BTW, where do you see the squares in the GUI? I've attached two pictures that show how it looks for me (20 replaced by BB: 00 45 78 74 72 61 3A BB 00, with no \xC2\xBB). One picture is from the dictionary lookup, the other from the language settings.

[Attached images: img_7846, img_7847]

I don't have to work around UTF-8 in libnickel.so, but I do in KoboReader.sqlite and Kobo eReader.conf.
