(FAT) Relaxed restriction on Unicode codepoints to support glyphs up to index 255.
sp193 committed Oct 27, 2018
1 parent 4333a06 commit 0df7cdd
Showing 2 changed files with 14 additions and 12 deletions.
13 changes: 7 additions & 6 deletions iop/fs/bdmfs_vfat/src/fat_driver.c
@@ -1,5 +1,6 @@
#include <errno.h>
#include <stdio.h>
+#include <limits.h>

#include <sysclib.h>
//#include <sys/stat.h>
@@ -398,8 +399,8 @@ int fat_getDirentry(unsigned char fatType, fat_direntry* dir_entry, fat_direntry
dir->name[offset] = 0; //terminate
cont = 0; //stop
} else {
-// Handle non-ASCII characters
-dir->name[offset] = character < 128 ? dir_entry->lfn.name1[i] : '?';
+// Handle characters that we don't support.
+dir->name[offset] = character <= UCHAR_MAX ? dir_entry->lfn.name1[i] : '?';
offset++;
}
}
@@ -411,8 +412,8 @@ int fat_getDirentry(unsigned char fatType, fat_direntry* dir_entry, fat_direntry
dir->name[offset] = 0; //terminate
cont = 0; //stop
} else {
-// Handle non-ASCII characters
-dir->name[offset] = character < 128 ? dir_entry->lfn.name2[i] : '?';
+// Handle characters that we don't support.
+dir->name[offset] = character <= UCHAR_MAX ? dir_entry->lfn.name2[i] : '?';
offset++;
}
}
@@ -424,8 +425,8 @@ int fat_getDirentry(unsigned char fatType, fat_direntry* dir_entry, fat_direntry
dir->name[offset] = 0; //terminate
cont = 0; //stop
} else {
-// Handle non-ASCII characters
-dir->name[offset] = character < 128 ? dir_entry->lfn.name3[i] : '?';
+// Handle characters that we don't support.
+dir->name[offset] = character <= UCHAR_MAX ? dir_entry->lfn.name3[i] : '?';
offset++;
}
}
13 changes: 7 additions & 6 deletions iop/usb/usbhdfsd/src/fat_driver.c
@@ -3,6 +3,7 @@
//---------------------------------------------------------------------------
#include <stdio.h>
#include <errno.h>
+#include <limits.h>

#ifdef WIN32
#include <malloc.h>
@@ -371,8 +372,8 @@ int fat_getDirentry(unsigned char fatType, fat_direntry* dir_entry, fat_direntry
dir->name[offset] = 0; //terminate
cont = 0; //stop
} else {
-// Handle non-ASCII characters
-dir->name[offset] = character < 128 ? dir_entry->lfn.name1[i] : '?';
+// Handle characters that we don't support.
+dir->name[offset] = character <= UCHAR_MAX ? dir_entry->lfn.name1[i] : '?';
offset++;
}
}
@@ -384,8 +385,8 @@ int fat_getDirentry(unsigned char fatType, fat_direntry* dir_entry, fat_direntry
dir->name[offset] = 0; //terminate
cont = 0; //stop
} else {
-// Handle non-ASCII characters
-dir->name[offset] = character < 128 ? dir_entry->lfn.name2[i] : '?';
+// Handle characters that we don't support.
+dir->name[offset] = character <= UCHAR_MAX ? dir_entry->lfn.name2[i] : '?';
offset++;
}
}
@@ -397,8 +398,8 @@ int fat_getDirentry(unsigned char fatType, fat_direntry* dir_entry, fat_direntry
dir->name[offset] = 0; //terminate
cont = 0; //stop
} else {
-// Handle non-ASCII characters
-dir->name[offset] = character < 128 ? dir_entry->lfn.name3[i] : '?';
+// Handle characters that we don't support.
+dir->name[offset] = character <= UCHAR_MAX ? dir_entry->lfn.name3[i] : '?';
offset++;
}
}

2 comments on commit 0df7cdd

sp193 (Member, Author) commented on 0df7cdd, Oct 27, 2018

I did this because somebody requested it. Some software, such as LaunchELF, was written to support code points up to 255, which include some Spanish characters.

I also made a new attempt at implementing proper Unicode conversion functions, to convert UCS-2 (the Unicode encoding used in FAT) to UTF-8, but I remembered why I gave up previously: the LFN entries are stored back to front, and because UTF-8 characters vary in length, it is difficult to insert them in non-sequential order. As a result, filenames get jumbled up, and there doesn't seem to be an easy way to solve this, short of identifying and storing all LFN entries in memory before converting the whole filename to UTF-8.
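
The "store all LFN entries first, then convert" idea could look roughly like the sketch below. It is not code from this commit: `lfn_buffer`, `lfn_store`, `ucs2_to_utf8` and `lfn_to_utf8` are hypothetical names, and it assumes the usual LFN layout of 13 UCS-2 code units per entry, a 1-based sequence number, a 0x0000 terminator and 0xFFFF padding. Surrogates are not handled, in line with the UCS-2 assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LFN_UNITS_PER_ENTRY 13 /* name1 (5) + name2 (6) + name3 (2) */
#define LFN_MAX_ENTRIES     20 /* 20 * 13 = 260 code units at most */

/* Hypothetical staging buffer: code units in logical (front-to-back) order. */
typedef struct {
    uint16_t units[LFN_UNITS_PER_ENTRY * LFN_MAX_ENTRIES];
} lfn_buffer;

/* Copy one entry's 13 code units into the slot chosen by its 1-based sequence
   number, so the on-disk (reversed) order of the entries no longer matters. */
static int lfn_store(lfn_buffer *buf, unsigned seq, const uint16_t units[LFN_UNITS_PER_ENTRY])
{
    if (seq < 1 || seq > LFN_MAX_ENTRIES)
        return -1;
    memcpy(&buf->units[(seq - 1) * LFN_UNITS_PER_ENTRY], units,
           LFN_UNITS_PER_ENTRY * sizeof(uint16_t));
    return 0;
}

/* Encode one UCS-2 code unit as UTF-8; returns the number of bytes (1 to 3). */
static size_t ucs2_to_utf8(uint16_t c, unsigned char *out)
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}

/* Convert the assembled name in one sequential pass. Returns the UTF-8 length
   in bytes (excluding the terminator), or -1 on bad input or a full buffer. */
static int lfn_to_utf8(const lfn_buffer *buf, unsigned entries, char *out, size_t out_size)
{
    size_t n = 0, i, total;

    if (entries < 1 || entries > LFN_MAX_ENTRIES || out_size == 0)
        return -1;
    total = (size_t)entries * LFN_UNITS_PER_ENTRY;
    for (i = 0; i < total; i++) {
        unsigned char tmp[3];
        size_t len;
        uint16_t c = buf->units[i];

        if (c == 0x0000 || c == 0xFFFF) /* terminator or padding */
            break;
        len = ucs2_to_utf8(c, tmp);
        if (n + len + 1 > out_size)
            return -1;
        memcpy(out + n, tmp, len);
        n += len;
    }
    out[n] = '\0';
    return (int)n;
}
```

The cost is the staging buffer itself (520 bytes in this sketch), which may matter on the IOP; that trade-off is not evaluated here.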

sp193 (Member, Author) commented on 0df7cdd, Oct 27, 2018

In case somebody else wonders like I once did: the encoding used for the filenames was not specified, but it was likely UCS-2, since FAT32 came with Windows 95B and Microsoft only switched over from UCS-2 to UTF-16 with Windows 2000. UCS-2 is a fixed-width 16-bit encoding, which makes it simpler to deal with than UTF-16 since it has no concept of surrogate pairs. Although the encoding was "not specified", we're constrained to using what Microsoft used, for forward and backward compatibility. The first 128 Unicode code points are compatible with ASCII, which is why we can simply ignore the upper byte of UCS-2 and things still work.
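
As a rough illustration of the narrowing described above, the check repeated in the diff can be written as a single helper. `ucs2_to_byte` is a hypothetical name and is not part of the commit. The first 256 Unicode code points coincide with ISO-8859-1 (Latin-1), so the bytes 128 to 255 only display correctly if the consumer treats them as Latin-1, which is assumed here.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: keep a UCS-2 code unit if it fits in one byte
   (code points 0 to UCHAR_MAX), otherwise substitute '?'. */
static unsigned char ucs2_to_byte(uint16_t character)
{
    return character <= UCHAR_MAX ? (unsigned char)character : (unsigned char)'?';
}
```

With the usual 8-bit `unsigned char`, U+00F1 (ñ) comes through as the byte 0xF1, while anything from U+0100 upwards still becomes `?`.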

UTF-8, UTF-16 and UCS-2 are encodings, not character sets, so you can use them to encode (represent) the same Unicode characters. UCS-2 cannot represent every Unicode character, as it only has 16 bits per character. UTF-16 is free from this limitation, since a surrogate pair (two 16-bit units carrying 20 bits between them) can represent many more characters.
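
To make the "20 bits" concrete, here is a small sketch of how a UTF-16 surrogate pair maps to a code point. The function names are made up for illustration, and none of this is needed by the driver as long as filenames are treated as UCS-2.

```c
#include <stdint.h>

/* High (lead) surrogates occupy 0xD800..0xDBFF, low (trail) 0xDC00..0xDFFF. */
static int utf16_is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int utf16_is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Each surrogate carries 10 payload bits; the combined 20-bit value is offset
   by 0x10000, giving code points U+10000 to U+10FFFF. The caller must check
   that both units are valid surrogates first. */
static uint32_t utf16_combine(uint16_t high, uint16_t low)
{
    return 0x10000u + (((uint32_t)(high - 0xD800) << 10) | (uint32_t)(low - 0xDC00));
}
```

For example, the pair 0xD83D 0xDE00 combines to U+1F600, a code point that UCS-2 has no way to express.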
