Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File names case-sensitive (Ada-related) #101

Closed
simonjwright opened this issue Dec 7, 2022 · 28 comments
Closed

File names case-sensitive (Ada-related) #101

simonjwright opened this issue Dec 7, 2022 · 28 comments

Comments

@simonjwright
Copy link
Contributor

At gcc/ada/adaint.c:611 we have

	  /* By default, we suppose filesystems aren't case sensitive on
	     Windows and Darwin (but they are on arm-darwin).  */
#if defined (WINNT) || defined (__DJGPP__) \
  || (defined (__APPLE__) && !(defined (__arm__) || defined (__arm64__)))
	  file_names_case_sensitive_cache = 0;
#else
	  file_names_case_sensitive_cache = 1;
#endif

The change (b54d1d3 of 2015-10-20) was made because "file names are case sensitive on aarch64-ios".

But they aren’t on macOS (unless a user has made the unwise decision to change the default for their disk).

I can’t see any defines which would allow macOS to be detected vs iOS (or *OS, I suppose), except perhaps __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ which doesn’t look reliable.

The environment variable GNAT_FILE_NAME_CASE_SENSITIVE, see line 603, is for use in a built compiler, not while building the compiler.

@iains
Copy link
Owner

iains commented Dec 7, 2022

Typically, my build and sources disks are case-sensitive and I set GNAT_FILE_NAME_CASE_SENSITIVE in my .profile and I've not seen issues in building the compiler on either x86_64 or aarch64 macOS (once in a while I have built on a non CS disk again without issue).

Actually, since Catalina or so I do not see (the previously unwise) issue with making the OS disk CS either - since the OS is stored on a non-CS-read-only partition so that it is only the User's dynamic additions that are on a CS writable partition (although this is all 'under the hood' from the perspective of the end user).

There are defines that allow detection of macOS c.f. iOS / watchOS / tvOS .. PLATFORM_xx or something similar (from memory). grep should find them in the SDK.

@simonjwright
Copy link
Contributor Author

I had no trouble building the compiler and tools on aarch64-apple-darwin, but then none of the sources use upper-case (or international characters) in filenames so I wouldn’t expect any.

I found this because one of my users, against advice, or perhaps in ignorance of it, used mixed case for file names, and the compiler (or possibly the build tool, gprbuild) couldn’t find the file it needed.

I thought the problem was that -- if say /Applications is on a CS filesystem -- an applications’ code may assume non-CS and fail when it encounters CS?

I couldn’t see the SDK defines you mention, so my thoughts on how to progress this are

  • issue warnings in release notes to my users,
  • (as a last resort) patch out the problematic code in adaint.c.

@iains
Copy link
Owner

iains commented Dec 10, 2022

I had no trouble building the compiler and tools on aarch64-apple-darwin, but then none of the sources use upper-case (or international characters) in filenames so I wouldn’t expect any.

FWIW ( not a justification more an observation ) I started to build all my OSS projects on CS systems because some did assume that they were on CS - not specifically GCC. I have around 120 OSS projects on my sources and build some/all of them with GCC at times.

edit : although the sources and builds are on CS - until Catalina, I did keep my OS disk as case-preserving / non-CS.

I found this because one of my users, against advice, or perhaps in ignorance of it, used mixed case for file names, and the compiler (or possibly the build tool, gprbuild) couldn’t find the file it needed.

Ah.. of course, we cannot speak for tools outside of the builds that we do, but a failure to handle CS/non-CS is a bug, I'd say (CS is usually handled as the default for OSS projects, at least).

I thought the problem was that -- if say /Applications is on a CS filesystem -- an applications’ code may assume non-CS and fail when it encounters CS?

There were notable huge vendors who released applications that were not safe on CS OS X systems, indeed - but that was 15 years ago .. I'd hope they have the idea by now.

However, I think you are technically correct - post-delivery installations into / will be directed to the CS writeable partition.

FWIW, at this time, I have not seen problems (I've been using CS system since hmm... Catalina, I think) but then I do not install many 3rd-party applications.

I couldn’t see the SDK defines you mention,
nevertheless, the information is present - see for example
MacOSX12.sdk/usr/include//AvailabilityInternal.h

#define __API_AVAILABLE_PLATFORM_macos(x) macos,introduced=x
#define __API_AVAILABLE_PLATFORM_macosx(x) macosx,introduced=x
#define __API_AVAILABLE_PLATFORM_ios(x) ios,introduced=x
#define __API_AVAILABLE_PLATFORM_watchos(x) watchos,introduced=x

So figuring out how these get their information should work?

NOTE : GCC at this stage does NOT support iOS/tvOS (and has not got the 32b port to support watchOS) so that you could currently assume macOS.

so my thoughts on how to progress this

  • issue warnings in release notes to my users,
  • (as a last resort) patch out the problematic code in adaint.c.

unless this appears as a prevalent issue, I'd suggest the former .. for most people thing "just work".

@iains
Copy link
Owner

iains commented Dec 10, 2022

looking at clang's output, I'd say you can also use:

__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__
__ENVIRONMENT_IPHONE_OS_VERSION_MIN_REQUIRED__
__ENVIRONMENT_TV_OS_VERSION_MIN_REQUIRED__

etc.

(edit : GCC will only emit ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED thus far)

@fxcoudert
Copy link
Contributor

fxcoudert commented Dec 13, 2022

I would report this to the Ada maintainers: whether the system is case-sensitive is not a property of the OS at all, it will depend on the filesystem. That code makes no sense.

@iains
Copy link
Owner

iains commented May 27, 2023

I guess the only 'tactical' difference is, that with iOS, the end user has no influence over the FS type (so if you know the target is iOS you can decide), but with macOS this no longer holds.

As of now, GCC (as supported here, and upstream) does not support iOS - therefore we have to assume that the user could change the FS kind (I certainly do). The Ada components honour GNAT_FILE_NAME_CASE_SENSITIVE, so AFAICT that is usable. Can we close this issue, since it appears to be either something that should be raised upstream (it would manifest on x86_64 I presume)?

@cooljeanius
Copy link
Contributor

Related to upstream bug 81114 I think? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81114

@iains
Copy link
Owner

iains commented Oct 15, 2023

perhaps, but the phenomenon we are discussing here is not with older OS revisions - but actually with the "latest and greatest" .. my speculation is that HFS+ does some translation between the call and the disk - but APFS does not (and so if the supplied string is incompatible, it just punts instead of converting).

I'd say that this is a job for the runtime librar(ies) to sort out.

@simonjwright
Copy link
Contributor Author

I would report this to the Ada maintainers: whether the system is case-sensitive is not a property of the OS at all, it will depend on the filesystem. That code makes no sense.

I think all the Ada maintainers are AdaCore people; AdaCore have said they don’t support macOS.

I’ve written a small demonstrator (attached) that could be included in adaint.c, and invoked if GNAT_FILE_NAME_CASE_SENSITIVE isn’t set. Tested on Catalina (the earliest version I have), Monterey (x86_64), Sonoma (aarch64), against ExFAT, HFS, HFS CS, APFS, APFS CS (the HFS and both CS tests were against disk images).

Issues I see:

  • getattrlist() first appeared in Yosemite (10.10, darwin14). Could check for OS release and default to case-insensitive if < darwin14.
  • adaint.c has a lot of OS sensitivity, would make it tricky to review and very difficult to test

check_case_sensitivity.zip

@iains
Copy link
Owner

iains commented Oct 20, 2023

I would report this to the Ada maintainers: whether the system is case-sensitive is not a property of the OS at all, it will depend on the filesystem. That code makes no sense.

I think all the Ada maintainers are AdaCore people; AdaCore have said they don’t support macOS.

Well, they no longer actively develop for macOS, but I think that they will review macOS / Darwin patches for Ada (i.e. I do not think that I can approve them as Darwin maintainer - although perhaps I should clarify that with them).

I’ve written a small demonstrator (attached) that could be included in adaint.c, and invoked if GNAT_FILE_NAME_CASE_SENSITIVE isn’t set. Tested on Catalina (the earliest version I have), Monterey (x86_64), Sonoma (aarch64), against ExFAT, HFS, HFS CS, APFS, APFS CS (the HFS and both CS tests were against disk images).

Issues I see:

  • getattrlist() first appeared in Yosemite (10.10, darwin14). Could check for OS release and default to case-insensitive if < darwin14.

We can use min version in adainit.c - it's already done.

  • adaint.c has a lot of OS sensitivity, would make it tricky to review and very difficult to test

Indeed it is the "kitchen sink" for interface stuff.

check_case_sensitivity.zip

I will try to take a look at this sometime after Stage1 is finished (19th Nov) - I think the bug is a regression and therefore can be fixed in Stage 3. Until then, I am snowed under with Stage1 stuff :)

@iains
Copy link
Owner

iains commented Oct 20, 2023

As an aside: "// getattrlist() first appeared in Yosemite, 10.10 (darwin14). There
// are notes in the man page under Compatbility:"

This does not seem correct, I see getattlist() in /usr/include/sys/attr.h back to Darwin8 (MacOSX 10.4) [ not that this matters too much to the actual issue, since APFS is not available until 10.13 or so]

@simonjwright
Copy link
Contributor Author

As an aside: "// getattrlist() first appeared in Yosemite, 10.10 (darwin14). There // are notes in the man page under Compatbility:"

This does not seem correct, I see getattlist() in /usr/include/sys/attr.h back to Darwin8 (MacOSX 10.4) [ not that this matters too much to the actual issue, since APFS is not available until 10.13 or so]

I saw it in a Compatibility man page section somewhere on the web, sorry. Looking at current unistd.h, it seems to be saying that 10.6 is the start point. There were some changes in attr.h between Catalina and Sonoma, the one that affected me was that ATTRIBUTE_SET_INIT wasn’t in the earlier version.

I don’t believe that APFS affects the issue ("File names case sensitive"), this arises on HFS just as much as on APFS. The difference with APFS is that it will fail any attempt to create a file with an upper latin-1 character in its name, e.g. lower-case-a-acute, 0xe1; whereas HFS converts the character to the three-character string %E1.

If you try to run the 2.6 ACATS, as included in GCC (e.g. make -C gcc check-acats or check-ada) on an APFS volume, test C250002 will fail because it involves files with names containing latin-1 characters; whereas on an HFS volume, the test succeeds because the conversions noted above work for write and read.

If you run the current (4.1) ACATS on a case-insensitive volume, C250002 will fail because it involves files with names containing UTF-8 characters that GCC tries to convert to lower-case by treating each octet as if it were latin-1 (if it looks like an upper-case latin-1 character, set bit 5!). On a CS volume, GCC leaves well alone and all works.

Why does GNAT do this? because with Foo; translates to a dependence on file foo.ads (as does with foo;). This is all quite deep in the compiler, and I’m not at all sure it’s a library issue. All the same, recognising case-sensitivity properly can only be a good thing.

@iains
Copy link
Owner

iains commented Oct 20, 2023

Making the process automatic (rather than relying on GNAT_FILE_NAME_CASE_SENSITIVE) seems like a good objective.

However, it seems that GNAT_FILE_NAME_CASE_SENSITIVE is not sufficient on APFS (C250002 fails on a case-sensitive APFS volume even when that env is set), where it does work reliably on HFS+ (as you say some conversion is done).

I'll try to give this some better attention after stage 1 is done.

@simonjwright
Copy link
Contributor Author

simonjwright commented Oct 21, 2023

However, it seems that GNAT_FILE_NAME_CASE_SENSITIVE is not sufficient on APFS (C250002 fails on a case-sensitive APFS volume even when that env is set), where it does work reliably on HFS+ (as you say some conversion is done).

The attached is a C program that demonstrates this behaviour. On an HFS disk,

$ latin
attempting to open 'l?tin.txt':
ok
$ ls
l%E1tin.txt

On an HFS CS disk,

$ latin
attempting to open 'l?tin.txt':
ok
$ ls
l%E1tin.txt

On an APFS disk,

$ latin
attempting to open 'l?tin.txt':
call failed: Illegal byte sequence

On an APFS CS disk,

$ latin
attempting to open 'l?tin.txt':
call failed: Illegal byte sequence

I’ll raise a Bugzilla PR on the ACATS testsuite, as I should have done already: likewise on this issue.

latin.zip

@cooljeanius
Copy link
Contributor

I’ll raise a Bugzilla PR on the ACATS testsuite, as I should have done already: likewise on this issue.

I'm assuming that that's this? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111909

@simonjwright
Copy link
Contributor Author

I'm assuming that that's this? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111909

Yes (this issue).

@simonjwright
Copy link
Contributor Author

See this patch. It’ll take a while for AdaCore to comment one way or another; of course it’s very Darwin-specific. I don’t have a Linux box to test it on.

@iains
Copy link
Owner

iains commented Oct 30, 2023

See this patch. It’ll take a while for AdaCore to comment one way or another; of course it’s very Darwin-specific. I don’t have a Linux box to test it on.

I also want to check it on earlier Darwin - just in case (and then will comment on list)...

@iains
Copy link
Owner

iains commented Oct 30, 2023

FAOD, with this patch, I should no longer need to set GNAT_FILE_NAME_CASE_SENSITIVE (when using a CS filesystem)? If all goes smoothly, can also test on Linux.

@iains
Copy link
Owner

iains commented Oct 31, 2023

OK.. so initial testing on Darwin19 (OK) Darwin17 (not OK) and Darwin9 (not OK)

According to the manual page for getattrlist and the test for a case-sensitive volume should exist on Darwin9 and 17, so there is some debugging needed.

edit: this might relate to file systems mounted onto paths in the root dir (which seems to report correctly on later OS versions but with 'invalid parameter' on earlier). Not sure when I'll be able to debug further.

@cooljeanius
Copy link
Contributor

See this patch. It’ll take a while for AdaCore to comment one way or another; of course it’s very Darwin-specific. I don’t have a Linux box to test it on.

Following the thread up across the month-boundary that exists in the mailing list archives:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635104.html

@simonjwright
Copy link
Contributor Author

simonjwright commented Nov 19, 2023 via email

@simonjwright
Copy link
Contributor Author

simonjwright commented Nov 19, 2023 via email

@iains
Copy link
Owner

iains commented Nov 21, 2023

By the time we’d got to the last message in the patch thread, https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637004.html, I’d rather lost interest (in fact, I wrote a reply which I seem not to have posted saying that I thought it was really up to Darwin people as to whether Arno’s proposal - disregarding the possibility of iWatch, for instance - was OK).

We are a long long way from having time to extend the Arm64 port to cover 32b (not arguing against the idea, but being realistic in resourcing)

If we go ahead with this minimal patch, macOS computers will be consistent, which is a good thing. It might make it more difficult to get the original patch in (fixed up for the older OS issues that Iain found, of course) at a later date.

I will reply to the list thread, IMO it's a step forward; I'd prefer an automatic solution, but there's more work to do to achieve that and I do not have cycles spare to explore what's going wrong (the calls are a thin wrapper on a syscall, so the underlying difference is how the kernel is responding to the input path). I am not concerned that a short-term improvement would alter the acceptability of a better patch when we have one.

On further thought - Iain, have you actually encountered Ada case-related issues? I’ve not had any reports of aarch64-apple-darwin users having trouble in this area, in spite of the compiler thinking the FS is case-sensitive.

I have not seen reports of problems, there are none (other than the current discussed cases) in BZ AFAIK. I use the GNAT_FILE_NAME_CASE_SENSITIVE=1 for local testing (and, yes, it's irritating to repeat the testsuite run if I forget to do it - but not a show-stopper).

This means that ACATS 4.1’s C250002 actually works on aarch64! (PR81114 refers).

What has changed between the version that we have in GCC and 4.1?

@simonjwright
Copy link
Contributor Author

This means that ACATS 4.1’s C250002 actually works on aarch64! (PR81114 refers).

What has changed between the version that we have in GCC and 4.1?

The test involves a file whose name (c250002_á.ads) includes an international character; the text of that file includes its own name (C250002_Á), upper-cased as you see.

The in-tree test implements this using Latin-1 characters; ACATS 4.1 uses UTF-8.

I’ve only investigated the UTF-8 implementation. The compiler realises that it needs to lower-case the text name (because Ada), and manages this successfully.

If file names are case-insensitive, a later part of the compiler (for reasons) decides to lower-case this already-lower-case name by treating each byte as if it were Latin-1, which mangles the á (0xc3a1) into 0xe3a1, which isn’t a valid character.

I don’t know why APFS allows this, when it doesn’t allow upper-half Latin-1 characters!

@iains
Copy link
Owner

iains commented Feb 11, 2024

If I understand correctly, there's an upstream bug to track this (and the problem is not specific to this port) - if both those are tre please close the issue (the BZ PR is open).

@simonjwright
Copy link
Contributor Author

BZ? but, closing.

@iains
Copy link
Owner

iains commented Feb 12, 2024

BZ? but, closing.

Bugzilla .. (PR now gets confused with "pull request" 🙄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants