Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WindwowsString/WindowsFilePath support wrt AFPP #198

Merged
merged 7 commits into from
Jul 29, 2022

Conversation

hasufell
Copy link
Member

@hasufell hasufell commented Mar 24, 2022

Depends on: haskell/filepath#103

For testing file operations, you can use https://github.com/hasufell/file-io (also please review the windows module)

cabal.project Outdated Show resolved Hide resolved
Win32.cabal Outdated Show resolved Hide resolved
@Mistuke
Copy link
Contributor

Mistuke commented Mar 27, 2022

Thanks for the PR! Can I ask you to split it into two commits? One for the refactoring for the extraction to the internal functions and one for the new functions. They can still be in the same PR, I can just more easily filter away the refactoring changes.

I'm away till Friday so will do a more indepth review then, on first glance it looks great though!

@Mistuke
Copy link
Contributor

Mistuke commented Mar 27, 2022

Also, WindowsString == CWString and WindowsFilePath == ByteString if I'm reading this correctly?

@Mistuke
Copy link
Contributor

Mistuke commented Mar 27, 2022

Also for your create flags here https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L20 you'll want to add FILE_FLAG_OVERLAPPED for WinIO otherwise it'll force the handle to behave synchronously (and so the I/O manager will be forced sequentially while this handle is in use).

See https://github.com/ghc/ghc/blob/378c0bba7d132a89dd9c35374b7b4bb5a4730bf7/libraries/base/GHC/IO/Windows/Handle.hsc#L904

You can also conditionally add the flag depending on which I/O manager is in use, see https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0#io-manager-winio-related-changes

@Mistuke
Copy link
Contributor

Mistuke commented Mar 27, 2022

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

@Mistuke
Copy link
Contributor

Mistuke commented Mar 27, 2022

Also there's a small bug with WriteMode https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L27

WriteMode expects the pointer to be at the start of the file, so opening a file in write mode expects the file to be overwritten (where-as append mode appends to the end).

So you have to reset the file size after opening it https://github.com/ghc/ghc/blob/378c0bba7d132a89dd9c35374b7b4bb5a4730bf7/libraries/base/GHC/IO/Windows/Handle.hsc#L856 otherwise the file won't be truncated on write.

for AppendMode you also probably want GENERIC_WRITE mode so the user can move the pointer around.

@hasufell
Copy link
Member Author

hasufell commented Mar 28, 2022

Also, WindowsString == CWString and WindowsFilePath == ByteString if I'm reading this correctly?

newtype WindowsString = WS { unWFP :: BS.ShortByteString }

It's a ShortByteString (unpinned bytearray), which is expected to be in UTF-16 format. That's why there is a specialized module System.AbstractFilePath.Data.ByteString.Short.Word16 for operations such as:

unfoldr :: (a -> Maybe (Word16, a)) -> a -> ShortByteString

...so we don't do any conversion whatsoever.

WindowsFilePath is just a type synonym to WindowsString.

We then use the specialized functions:

useAsCWString :: ShortByteString -> (Ptr Word16 -> IO a) -> IO a

The endgoal then is for the user to use AbstractFilePath (as demonstrated in the file-io package), where:

#if defined(mingw32_HOST_OS) || defined(__MINGW32__)
type PlatformString = WindowsString
#else
type PlatformString = PosixString
#endif

newtype AbstractFilePath = AbstractFilePath PlatformString

which has this nice trick that you can't unpack AbstractFilePath to the wrong platform, because WindowsString and PosixString have different constructors. And still allows you to write platform specific code via those.

@hasufell
Copy link
Member Author

Thanks for the PR! Can I ask you to split it into two commits? One for the refactoring for the extraction to the internal functions and one for the new functions. They can still be in the same PR, I can just more easily filter away the refactoring changes.

I also want to note that another way to do this PR is heavy use of CPP and then include with the appropriate functions imported and maybe ifdefs in function bodies. filepath relies on this heavily, but I wouldn't claim that it makes it more maintainable. Up to you.

@hasufell
Copy link
Member Author

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

The format is not specified. Does createFile expect a device path? My tests seem to indicate that regular shorthand paths work as well.

@Mistuke
Copy link
Contributor

Mistuke commented Mar 28, 2022

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

The format is not specified. Does createFile expect a device path? My tests seem to indicate that regular shorthand paths work as well.

It does not expect it, but it accepts it. Without using a device path you'll regress the toolchain's long path support. GHC has internally mapped to device paths when the API supports it for many releases now.

@hasufell
Copy link
Member Author

hasufell commented Mar 28, 2022

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

The format is not specified. Does createFile expect a device path? My tests seem to indicate that regular shorthand paths work as well.

It does not expect it, but it accepts it. Without using a device path you'll regress the toolchain's long path support. GHC has internally mapped to device paths when the API supports it for many releases now.

Doesn't that cause problems, because we're disabling filepath interpretation? E.g. C:/foo/bar now means something different. So I'm not really sure what is expected here. My gut feeling is that I don't want to mess with user filepaths at all. That was one of the reasons for AFPP, although more about leaving encoding alone (which is not really a problem on windows, because we know the encoding).

Or put another way: is there safe way to convert arbitrary paths to \\?\ paths, emulating the entire windows filepath interpretation layer?


Edit: \\?\C:foo.txt is also different from C:foo.txt. Resolving it to an absolute path will not preserve semantics?

@Mistuke
Copy link
Contributor

Mistuke commented Mar 28, 2022

Also which format will fp be here? https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L15 I assume a device path? i.e.\\?\?

The format is not specified. Does createFile expect a device path? My tests seem to indicate that regular shorthand paths work as well.

It does not expect it, but it accepts it. Without using a device path you'll regress the toolchain's long path support. GHC has internally mapped to device paths when the API supports it for many releases now.

Doesn't that cause problems, because we're disabling filepath interpretation? E.g. C:/foo/bar now means something different. So I'm not really sure what is expected here.

No, because when users use C:/foo/bar they expect it to mean the same thing as C:\foo\bar and we take care of that during the conversion. and on the win32 API layer / and \ mean the exact same thing. Should the user want / to not mean folder then they have to explicitly disable path processing in their input by using \\?\ in which case we don't touch the PATH.

My gut feeling is that I don't want to mess with user filepaths at all. That was one of the reasons for AFPP, although more about leaving encoding alone (which is not really a problem on windows, because we know the encoding).

The reason we do this is so that users can give ANY path to the compiler and have it work. The user shouldn't have to do \\?\ when they want a long path because 99% of users don't know what this is. Neither do scripts or other programs that can produce paths GHC consumes.

We in GHC decided to make this work out of the box because of things like cabal new-build that produce very long normalc:\ paths.

This isn't about AFPP or whatever. You're overriding openFile in an external module and not adhering to what the expectations are. If the user has to input \\?\ to get a long path it also means every tool down the pipeline must understand and interpret this correctly. Configure scripts and friends don't for instance. So we made an implicit decision to convert paths internally but not expose them externally unless the user has explicitly used a \\?\ path.

I don't particularly care what the library does or does not do with it's PATHs. I care about the fact that you're affecting the behavior of the I/O system in an external module that presumably will be part of a boot library.

Or put another way: is there safe way to convert arbitrary paths to \\?\ paths, emulating the entire windows filepath interpretation layer?

Edit: \\?\C:foo.txt is also different from C:foo.txt. Resolving it to an absolute path will not preserve semantics.

Correct. This path is almost non-existent in the real world, Very few people know what it does. Supporting this very arcane path does not warrant regressing the compiler's long path support for the vast majority of users. I'm pretty sure the conversion would fail for that path anyway so we'd use it as is.

@hasufell
Copy link
Member Author

Can you point me to the code where the conversion is done?

@Mistuke
Copy link
Contributor

Mistuke commented Mar 28, 2022

Can you point me to the code where the conversion is done?

https://github.com/ghc/ghc/blob/0619fb0fb14a98f04aac5f031f6566419fd27495/utils/fs/fs.c#L48

@hasufell
Copy link
Member Author

hasufell commented Mar 29, 2022

Also there's a small bug with WriteMode https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L27

WriteMode expects the pointer to be at the start of the file, so opening a file in write mode expects the file to be overwritten (where-as append mode appends to the end).

So you have to reset the file size after opening it https://github.com/ghc/ghc/blob/378c0bba7d132a89dd9c35374b7b4bb5a4730bf7/libraries/base/GHC/IO/Windows/Handle.hsc#L856 otherwise the file won't be truncated on write.

I think this is not needed, because we use Win32.cREATE_ALWAYS instead of Win32.oPEN_ALWAYS for WriteMode.

The documentation says:

Creates a new file, always.
If the specified file exists and is writable, the function overwrites the file, the function succeeds, and last-error code is set to ERROR_ALREADY_EXISTS (183).

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea

From my tests, truncation seems to work.

for AppendMode you also probably want GENERIC_WRITE mode so the user can move the pointer around.

I tried this and it seems to lead to the file pointer moving back to the start of the file, so I don't get append behavior anymore.

@Mistuke
Copy link
Contributor

Mistuke commented Mar 29, 2022

Also there's a small bug with WriteMode https://github.com/hasufell/file-io/blob/master/System/File/AbstractFilePath/Windows.hs#L27
WriteMode expects the pointer to be at the start of the file, so opening a file in write mode expects the file to be overwritten (where-as append mode appends to the end).
So you have to reset the file size after opening it https://github.com/ghc/ghc/blob/378c0bba7d132a89dd9c35374b7b4bb5a4730bf7/libraries/base/GHC/IO/Windows/Handle.hsc#L856 otherwise the file won't be truncated on write.

I think this is not needed, because we use Win32.cREATE_ALWAYS instead of Win32.oPEN_ALWAYS for WriteMode.

The documentation says:

Creates a new file, always.
If the specified file exists and is writable, the function overwrites the file, the function succeeds, and last-error code is set to ERROR_ALREADY_EXISTS (183).

Yes but this means that errno is different in your implementation than what Base returns. It may not matter for most use cases but It's not the expected result value for GHC programs. Anything checking GetLastError () will diverge with your implementation.

I also now notice an additional problem... with mio you get the in-process locking code because hANDLEtoHandle eventually calls mkFD which registers the locks, but for winio the helper was added for console handles not files.

Since you're relying on the very latest win32 I can just add file support to hANDLEtoHandle which should work..

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea

From my tests, truncation seems to work.

for AppendMode you also probably want GENERIC_WRITE mode so the user can move the pointer around.

I tried this and it seems to lead to the file pointer moving back to the start of the file, so I don't get append behavior anymore.

Hmmm.. are you sure you tested the right thing? access attributes don't influence the disposition. See [2]. The problem here is that the differing attributes change what happens during subsequent calls to CreateFile.

For AppendMode you use FILE_APPEND_DATA, while GENERIC_WRITE is a combination of ALL write attributes including FILE_WRITE_ATTRIBUTES and FILE_WRITE_DATA.

[1] https://docs.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants
[2] https://docs.microsoft.com/en-us/windows/win32/fileio/creating-and-opening-files

@hasufell
Copy link
Member Author

hasufell commented Apr 7, 2022

Since you're relying on the very latest win32 I can just add file support to hANDLEtoHandle which should work..

Sounds good.

Hmmm.. are you sure you tested the right thing? access attributes don't influence the disposition. See [2]. The problem here is that the differing attributes change what happens during subsequent calls to CreateFile.

@@ -21,5 +21,5 @@
   accessMode = case iomode of
     ReadMode      -> Win32.gENERIC_READ
     WriteMode     -> Win32.gENERIC_WRITE
-    AppendMode    -> Win32.fILE_APPEND_DATA
+    AppendMode    -> Win32.gENERIC_WRITE .|. Win32.fILE_APPEND_DATA
     ReadWriteMode -> Win32.gENERIC_READ .|. Win32.gENERIC_WRITE

What it now does is write to the file from the start, without truncating. If I revert the patch, appending works again.

*System.File.AbstractFilePath> readFile "lol.txt"
"tzaaakk"
*System.File.AbstractFilePath> appendFile "lol.txt" "123"
*System.File.AbstractFilePath> readFile "lol.txt"
"123aakk"

@hasufell
Copy link
Member Author

hasufell commented Apr 7, 2022

So, I think Win32.gENERIC_READ .|. Win32.fILE_APPEND_DATA works and https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfilepointer indicates that gENERIC_READ might be enough to set the file pointer.


The alternative is:

--- a/System/File/AbstractFilePath/Windows.hs
+++ b/System/File/AbstractFilePath/Windows.hs
@@ -8,6 +8,7 @@ import System.AbstractFilePath.Windows ( WindowsFilePath )

 import qualified System.Win32 as Win32
 import qualified System.Win32.WindowsString.File as WS
+import Control.Monad (when)
 #if defined(__IO_MANAGER_WINIO__)
 import GHC.IO.SubSystem
 #endif
@@ -31,12 +32,18 @@ openFile fp iomode = bracketOnError
 #endif
       Nothing)
     Win32.closeHandle
-    Win32.hANDLEToHandle
+    toHandle
  where
+  toHandle h = do
+    when (iomode == AppendMode ) $ do
+       r <- Win32.setFilePointerEx  h 0 Win32.fILE_END
+       err <- Win32.getLastError
+       when (r == 4294967295 && err /= 0) $ Win32.failWith "openFile" err
+    Win32.hANDLEToHandle  h
   accessMode = case iomode of
     ReadMode	   -> Win32.gENERIC_READ
     WriteMode     -> Win32.gENERIC_WRITE
-    AppendMode    -> Win32.fILE_APPEND_DATA
+    AppendMode    -> Win32.gENERIC_WRITE .|. Win32.fILE_APPEND_DATA
     ReadWriteMode -> Win32.gENERIC_READ .|. Win32.gENERIC_WRITE

   createMode = case iomode of

@hasufell
Copy link
Member Author

ping

@Mistuke
Copy link
Contributor

Mistuke commented Apr 27, 2022

Ah sorry,

The alternative is:

I prefer the alternative as that's what we currently do, I put up #202 to fix the API so you can clean up the code. setFilePointerEx like the other Win32 calls throws an exception, so you should just catch it and close the handle.

I will merge #202 after work today.

@Mistuke Mistuke mentioned this pull request Apr 27, 2022
@Mistuke
Copy link
Contributor

Mistuke commented Apr 27, 2022

+  toHandle h = do
+    when (iomode == AppendMode ) $ do
+       r <- Win32.setFilePointerEx  h 0 Win32.fILE_END
+       err <- Win32.getLastError
+       when (r == 4294967295 && err /= 0) $ Win32.failWith "openFile" err
+    Win32.hANDLEToHandle  h

Just realized that the API was already correct, you don't need this, on failure the API raises an exception. You just need a finally here to always close the handle.

@hasufell
Copy link
Member Author

hasufell/file-io@b33d31c

@Mistuke
Copy link
Contributor

Mistuke commented Apr 30, 2022

Cheers, did the dependent changes on this get merged get so the CI passes? If so I can merge this one.

@hasufell
Copy link
Member Author

Cheers, did the dependent changes on this get merged get so the CI passes? If so I can merge this one.

I'll add one more test suite to filepath that proves that old and new versions are equivalent. Maybe it's overkill, but I like to sleep well. Then this can be merged.

I don't know which GHC version will be the first to depend on filepath-2.0 (maybe 9.6 or later), so you maybe want to maintain two branches for Win32?

@Mistuke Mistuke reopened this Jul 13, 2022
@Mistuke
Copy link
Contributor

Mistuke commented Jul 13, 2022

alright, the external CI now shows the same error. (see Appveyor)

@hasufell hasufell force-pushed the AFPP branch 2 times, most recently from aa97962 to 0f06662 Compare July 13, 2022 16:19
@hasufell hasufell force-pushed the AFPP branch 2 times, most recently from c351218 to 995f3cc Compare July 13, 2022 20:11
@hasufell
Copy link
Member Author

hasufell commented Jul 13, 2022

The CI fails, because both filepath and bytestring have a lower bound on base >=4.9 and that's unlikely to change. ghc-8.0.2 is really the oldest we care about. I can't even install ghc-7.6.3 on linux.

@Mistuke
Copy link
Contributor

Mistuke commented Jul 13, 2022

The CI fails, because both filepath and bytestring have a lower bound on base >=4.9 and that's unlikely to change. ghc-8.0.2 is really the oldest we care about. I can't even install ghc-7.6.3 on linux.

Well, Win32 doesn't need to run on Linux :) Please put the exports for the FPP stuff behind an

 if impl(ghc >= 8.0)

as I don't want to lose compatibility with older GHCs, we've worked very hard to maintain it.

@hasufell
Copy link
Member Author

hasufell commented Jul 14, 2022

The CI fails here, because cabal build also pulls filepath into the resolution, even if it's not needed, because it's listed in cabal.project as source-repository package.

There are two known workarounds:

  1. use cabal pre-release which allos ghc conditionals in cabal.project
  2. use multiple cabal.project files and select them manually in command line by checking ghc --version or so

I'm not sure how to fix the private CI.

@Mistuke
Copy link
Contributor

Mistuke commented Jul 14, 2022

We already had a dependency on filepath before, could we just add the constrain on the version of filepath needee under the FFp path?

@hasufell
Copy link
Member Author

We already had a dependency on filepath before, could we just add the constrain on the version of filepath needee under the FFp path?

This is not about Win32.cabal. It's about cabal.project. The cabal ticket is here haskell/cabal#5444

Whether you use source-repository-package or packages results in the same issue.

@Mistuke
Copy link
Contributor

Mistuke commented Jul 14, 2022 via email

@hasufell
Copy link
Member Author

So should it go green once it is?

That depends on whether this is the only issue with this PR. I can't tell, because I don't know how to install these ancient GHCs.

@Mistuke
Copy link
Contributor

Mistuke commented Jul 14, 2022 via email

@hasufell
Copy link
Member Author

Can't you add --allow-newer to allow cabal to ignore the upper bounds to test on the CI?

Doubt so, because base is not reinstallable.

@hasufell
Copy link
Member Author

hasufell commented Jul 25, 2022

@Mistuke the private CI fails and I think this is not a sustainable workflow for contributors. I can't see the logs and don't wanna invest more time in blind-folded CI fixing. The public CI succeeds, feel free to adjust this PR.

@Mistuke Mistuke merged commit 931497f into haskell:master Jul 29, 2022
@hasufell
Copy link
Member Author

@Mistuke is this making a hackage release?

@Mistuke
Copy link
Contributor

Mistuke commented Jul 29, 2022

@Mistuke is this making a hackage release?

Yes https://hackage.haskell.org/package/Win32 I'll fix the docs later when I feel a bit better.

@hasufell
Copy link
Member Author

Thanks. Did you catch something in the amazonas?

@Mistuke
Copy link
Contributor

Mistuke commented Aug 2, 2022

Thanks. Did you catch something in the amazonas?

haha, luckily no, just exhausted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants