
Filename‐Trouble with Unicode

toraxmalu edited this page Apr 16, 2024 · 4 revisions

Problem

The Latin alphabet also includes letters with diacritics, e.g. á, ç, ä, ñ. Unicode offers two normalization forms to represent these characters:

  1. NFC (Normalization Form C, canonical composition): the character with its diacritic is encoded as a single code point.
  2. NFD (Normalization Form D, canonical decomposition): the base character and the combining diacritic are stored as separate code points.

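Examples, with the 1st (composed) form on the left and the 2nd (decomposed) form on the right:

  • á (U+00E1) => a + combining acute accent (U+0301)
  • ç (U+00E7) => c + combining cedilla (U+0327)
  • ä (U+00E4) => a + combining diaeresis (U+0308)
  • ñ (U+00F1) => n + combining tilde (U+0303)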
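The decomposition listed above can be reproduced with Python's standard unicodedata module. This is only a small sketch; the helper name code_points is made up for illustration.

```python
import unicodedata

# Composed (NFC) spellings of the four example characters: á ç ä ñ
composed = ["\u00e1", "\u00e7", "\u00e4", "\u00f1"]

def code_points(text):
    """Return the code points of text as 'U+XXXX' strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

for ch in composed:
    nfd = unicodedata.normalize("NFD", ch)   # base letter + combining mark
    nfc = unicodedata.normalize("NFC", nfd)  # back to a single code point
    print(ch, code_points(ch), "=>", code_points(nfd), "round trip:", nfc == ch)
```

Both spellings look identical on screen, which is why the difference usually goes unnoticed until a program compares them.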

On iOS, which uses APFS, such characters end up stored in the 2nd, decomposed form.

Impact

In normal operation, file names are read as simple byte sequences. Trouble starts where they are interpreted…
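To see why a byte-wise comparison fails, here is a minimal sketch that encodes both forms of "ü" as UTF-8. It assumes the names are stored as UTF-8; the variable names are illustrative.

```python
import unicodedata

name_nfc = unicodedata.normalize("NFC", "\u00fc")  # ü as one code point
name_nfd = unicodedata.normalize("NFD", "\u00fc")  # u + combining diaeresis

bytes_nfc = name_nfc.encode("utf-8")  # two bytes: C3 BC
bytes_nfd = name_nfd.encode("utf-8")  # three bytes: 75 CC 88

print(bytes_nfc.hex(" "), "vs", bytes_nfd.hex(" "))
print("same bytes:", bytes_nfc == bytes_nfd)
```

A tool that compares the raw bytes therefore treats the two names as different, even though they render the same.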

My situation:

  1. A folder was mounted via mount -f ios . workCopy
  2. A checkout via svn was made into the iOS folder
  3. A directory name contained an "ü"
  4. SVN worked without any hiccup during the checkout

The trouble began with svn status and svn commit, which reported the directory in two variants:

  1. "original no longer exists"
  2. "not versioned"

SVN searched for the directory name with the composed "ü" but found only the decomposed variant "u" + "¨" in the iOS folder.
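The mismatch can be illustrated with a lookup that normalizes both sides before comparing. This is a sketch of the general idea, not of what SVN does internally; the function name and the directory name "Bücher" are hypothetical.

```python
import unicodedata

def find_by_normalized_name(entries, wanted):
    """Return the entries that equal `wanted` once both are normalized to NFC."""
    key = unicodedata.normalize("NFC", wanted)
    return [e for e in entries if unicodedata.normalize("NFC", e) == key]

# Hypothetical listing: the folder holds the decomposed spelling.
on_disk = ["Bu\u0308cher", "docs"]
# The name recorded at checkout uses the composed spelling.
recorded = "B\u00fccher"

print("exact match:     ", recorded in on_disk)
print("normalized match:", find_by_normalized_name(on_disk, recorded))
```

The exact comparison misses the directory, which matches the two reports above: the recorded name looks missing, and the name found on disk looks unversioned.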

And before I forget: neither the file manager nor cp converts the file name while copying or renaming. Don't use diacritics in names for folders or files! And there is more trouble lurking in the shadows: file content, if the app / command is not converting…
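To find the names that are at risk before they cause trouble, a small scan like the following may help. It is a sketch under the assumption that any name whose NFC and NFD spellings differ is a candidate for the problem; both function names are made up for illustration.

```python
import os
import unicodedata

def is_normalization_sensitive(name):
    """True if the NFC and NFD spellings of the name differ."""
    return unicodedata.normalize("NFC", name) != unicodedata.normalize("NFD", name)

def find_sensitive_names(root):
    """Walk root and return the paths of entries whose name depends on the normalization form."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if is_normalization_sensitive(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling find_sensitive_names(".") in a working copy lists the files and folders that should be renamed to plain ASCII.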
