ProjFS API does not work in Windows container #268

Closed
hach-que opened this issue Sep 1, 2022 · 21 comments
Assignees
Labels
enhancement New feature or request

Comments

@hach-que

hach-que commented Sep 1, 2022

I'm trying to run a console application that uses the Win32 ProjFS API to project into the filesystem. However, the application is getting a 0x80070002 HRESULT (File Not Found) from the PrjStartVirtualizing function, which isn't a documented return code. The same console app works on the host.

The Dockerfile for my container does install the required optional component with:

RUN powershell.exe -Command `
    Set-ExecutionPolicy -Force Bypass; `
    Enable-WindowsOptionalFeature -Online -FeatureName Client-ProjFS -NoRestart

and it reports that no restart is required. Nevertheless, the API fails, which makes me suspect something is missing in the Docker images. I've confirmed this issue with the following images, even though they should definitely support this API (it's been supported since 1809).

  • mcr.microsoft.com/windows:20H2-KB5014699-amd64
  • mcr.microsoft.com/windows:20H2

This is when running in Hyper-V isolation mode (which should mean the containers have their own kernel). Any ideas?
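A note for anyone hitting the same code: although 0x80070002 isn't a documented return code for PrjStartVirtualizing, it isn't opaque. An HRESULT whose facility field is 7 (FACILITY_WIN32) wraps a plain Win32 error in its low 16 bits, so this is ERROR_FILE_NOT_FOUND (2) surfaced as an HRESULT. A minimal Python sketch of the decoding follows; the helper names are hypothetical (`hresult_from_win32` only mirrors the `HRESULT_FROM_WIN32` macro for illustration, it is not a real Python API), and the error-name table is a tiny hand-picked subset.

```python
# Sketch: decode the HRESULT returned by PrjStartVirtualizing.
# Helper names are hypothetical; the bit layout and FACILITY_WIN32 value
# follow the standard Windows error-code conventions.

FACILITY_WIN32 = 7

# Tiny, hand-picked subset of Win32 error names, for illustration only.
WIN32_ERROR_NAMES = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}


def hresult_from_win32(err: int) -> int:
    """Python mirror of HRESULT_FROM_WIN32 for a Win32 code in 0..0xFFFF."""
    if err == 0:
        return 0  # ERROR_SUCCESS maps to S_OK
    return (err & 0xFFFF) | (FACILITY_WIN32 << 16) | 0x80000000


def decode_hresult(hr: int) -> dict:
    """Split an HRESULT into its severity bit, 11-bit facility and 16-bit code."""
    hr &= 0xFFFFFFFF
    return {
        "failed": bool(hr >> 31),
        "facility": (hr >> 16) & 0x7FF,
        "code": hr & 0xFFFF,
    }


if __name__ == "__main__":
    fields = decode_hresult(0x80070002)
    print(fields)  # {'failed': True, 'facility': 7, 'code': 2}
    if fields["facility"] == FACILITY_WIN32:
        print(WIN32_ERROR_NAMES.get(fields["code"], "unknown"))  # ERROR_FILE_NOT_FOUND
```

So the API is reporting that something it tried to open was not found, which is a hint to look at file or device opens in a syscall trace.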

@hach-que
Author

hach-que commented Sep 1, 2022

This also doesn't work with mcr.microsoft.com/windows/server:ltsc2022.

@hach-que
Author

hach-que commented Sep 1, 2022

I've updated the source code repository for the console application using ProjFS so that Microsoft folks can try to reproduce this. If you clone the repository and then in the root of the repository run:

test\test.bat

It will build the app and the container image and then try to run it. The source code is currently targeting mcr.microsoft.com/windows:20H2-KB5014699-amd64 but that can be changed in test.Dockerfile. You obviously need Docker Engine running on the machine for this to work.

@vrapolinario
Contributor

Just out of curiosity, have you tried the Server image? https://mcr.microsoft.com/en-us/product/windows/server/about
Also, I don't think it would make any difference, but have you tried process isolation?

@hach-que
Author

hach-que commented Sep 1, 2022

I have tried the server image (see above re: ltsc2022). It gives the same error.

Unfortunately I'm on Windows 11 build 22000, so I don't believe there are any process-isolation-compatible images (at the very least, the Windows 20H2 image doesn't work that way).

Here's a Dr. Memory trace of all the syscalls: drmemorytrace.txt. I couldn't really parse out which syscall in particular is causing the failure, but maybe it will be more useful to Microsoft folks who can view the implementation of the ProjectedFSLib DLL.

@vrapolinario
Contributor

Alright, sorry I missed the note on the server image. I saw that and my mind read servercore.
You are correct that older images, such as Windows, need Hyper-V isolation on Windows 11. However, the Server image is based on the Server 2022 wave, so process isolation should work for that image. Again, not that it should make much difference, just wondering what the result is.

@hach-que
Author

hach-que commented Sep 1, 2022

Same issue with process isolation on 2022 image:

virtualizing 'C:\test.uep' into 'C:\ProgramData\projected'...
_PrjStartVirtualizing failed
failed to start virtualization: FileNotFound

@hach-que
Author

hach-que commented Sep 2, 2022

I also get the 0x80070002 HRESULT (file not found) when testing the ProjFS example project that Microsoft provides: https://github.com/Microsoft/Windows-classic-samples/tree/main/Samples/ProjectedFileSystem. That is when it's built in Release mode inside mcr.microsoft.com/windows:20H2-KB5014699-amd64. That rules out .NET Core as the cause, since the sample is pure VC++.

Again, works totally fine on the host.

@hach-que
Author

hach-que commented Sep 2, 2022

Here's the syscall trace for regfs under Server 2022 container, in both Hyper-V and Process isolation modes. These are much shorter than the other trace, so are hopefully more useful in diagnosing the issue.

drmemorytrace 2022 hyperv.txt
drmemorytrace 2022 process.txt

@hach-que
Author

hach-que commented Sep 2, 2022

I suspect this syscall is the one that's failing. Inside the container, trying to create \Global??\FltMgrMsg fails. On the host it succeeds.

Container:

NtCreateFile
	arg 0: 0x00000025cf9ff6b0 (type=HANDLE*, size=0x8)
	arg 1: 0x100003 (type=unsigned int, size=0x4)
	arg 2: len=0x30, root=0x0, name=38/40 "\Global??\FltMgrMsg", att=0x40, sd=0x0000000000000000, sqos=0x0000000000000000 (type=OBJECT_ATTRIBUTES*, size=0x8)
	arg 3: 0x00000025cf9ff720 (type=IO_STATUS_BLOCK*, size=0x8)
	arg 4: <null> (type=LARGE_INTEGER*, size=0x8)
	arg 5: 0x0 (type=named constant, size=0x4)
	arg 6: 0x0 (type=named constant, size=0x4)
	arg 7: FILE_OPEN_IF (type=named constant, value=0x3, size=0x4)
	arg 8: 0x0 (type=named constant, size=0x4)
	arg 9: <NYI> (type=<struct>*, size=0x8)
	arg 10: 0x2b (type=unsigned int, size=0x4)
    failed (error=0xc0000034) =>
	arg 0: 0x00000025cf9ff6b0 => 0xffffffffffffffff (type=HANDLE*, size=0x8)
	arg 3: status=0x10, info=0x1 (type=IO_STATUS_BLOCK*, size=0x8)
	retval: 0xc0000034 (type=NTSTATUS, size=0x4)

Host:

NtCreateFile
	arg 0: 0x00000092e133f518 (type=HANDLE*, size=0x8)
	arg 1: 0x100003 (type=unsigned int, size=0x4)
	arg 2: len=0x30, root=0x0, name=38/40 "\Global??\FltMgrMsg", att=0x40, sd=0x0000000000000000, sqos=0x0000000000000000 (type=OBJECT_ATTRIBUTES*, size=0x8)
	arg 3: 0x00000092e133f580 (type=IO_STATUS_BLOCK*, size=0x8)
	arg 4: <null> (type=LARGE_INTEGER*, size=0x8)
	arg 5: 0x0 (type=named constant, size=0x4)
	arg 6: 0x0 (type=named constant, size=0x4)
	arg 7: FILE_OPEN_IF (type=named constant, value=0x3, size=0x4)
	arg 8: 0x0 (type=named constant, size=0x4)
	arg 9: <NYI> (type=<struct>*, size=0x8)
	arg 10: 0x2b (type=unsigned int, size=0x4)
    succeeded =>
	arg 0: 0x00000092e133f518 => 0x198 (type=HANDLE*, size=0x8)
	arg 3: status=0x0, info=0x0 (type=IO_STATUS_BLOCK*, size=0x8)
	retval: 0x0 (type=NTSTATUS, size=0x4)

This is also the last syscall the process makes in the container before the syscalls start to substantially diverge. I couldn't find much information on FltMgrMsg other than that it has something to do with minifilter drivers and communication ports; beyond that, it seems to be essentially undocumented.
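Finding this call by eye in long traces is tedious. A minimal Python sketch for comparing two Dr. Memory traces follows; it assumes the layout of the excerpts above (each call starts with an unindented syscall name line, its argument/result lines are indented, and the result line begins with "failed" or "succeeded"). The function names are hypothetical, and the real trace files may contain extra header lines this ignores.

```python
# Sketch: find where two Dr. Memory syscall traces first diverge.
# Assumes the layout shown in the excerpts above: an unindented syscall name
# line per call, followed by indented argument/result lines, with a result
# line beginning "failed" or "succeeded".
from itertools import zip_longest
from typing import List, Optional, Tuple

Call = Tuple[str, Optional[str]]  # (syscall name, "failed"/"succeeded"/None)


def parse_calls(trace_text: str) -> List[Call]:
    calls: List[List[Optional[str]]] = []
    for line in trace_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # New syscall entry, e.g. "NtCreateFile".
            calls.append([line.strip(), None])
        elif calls and calls[-1][1] is None:
            text = line.strip()
            if text.startswith("failed"):
                calls[-1][1] = "failed"
            elif text.startswith("succeeded"):
                calls[-1][1] = "succeeded"
    return [(name, outcome) for name, outcome in calls]


def first_divergence(a: List[Call], b: List[Call]) -> Optional[int]:
    """Index of the first call whose name or outcome differs, or None if identical."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i
    return None
```

Load both traces (for example with `open(path, encoding="utf-8", errors="replace").read()`; the files' actual encoding is an assumption) and call `first_divergence(parse_calls(host), parse_calls(container))`. Comparing outcomes as well as names makes the failing `NtCreateFile` on `\Global??\FltMgrMsg` show up as the divergence point itself rather than the call after it. Unrelated calls can legitimately differ between host and container, so the result is a lead, not proof.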

@hach-que
Author

hach-que commented Sep 2, 2022

Of note, NTSTATUS 0xc0000034 is STATUS_OBJECT_NAME_NOT_FOUND. Maybe the Docker engine isn't mapping this device from the host?

I've tried mapping it with -v "\Global??\FltMgrMsg:\Global??\FltMgrMsg" but Docker doesn't seem to like it. If there's another way to map this device on a Windows container let me know.
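The failing NTSTATUS also lines up with the HRESULT from the first comment: STATUS_OBJECT_NAME_NOT_FOUND maps to the Win32 error ERROR_FILE_NOT_FOUND (2), and wrapping that as an HRESULT gives exactly 0x80070002. That is consistent with (though not proof of) the `\Global??\FltMgrMsg` open being the failure PrjStartVirtualizing reports. A minimal Python sketch, with hypothetical helper names; the NT-to-Win32 table is a hand-copied two-entry subset of what `RtlNtStatusToDosError` does, not the real mapping table.

```python
# Sketch: decode an NTSTATUS and show how STATUS_OBJECT_NAME_NOT_FOUND can
# end up as the HRESULT 0x80070002 that PrjStartVirtualizing returned.
# NTSTATUS_TO_WIN32 is a hand-copied subset, not RtlNtStatusToDosError itself.

SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}

NTSTATUS_TO_WIN32 = {
    0xC0000034: 2,  # STATUS_OBJECT_NAME_NOT_FOUND -> ERROR_FILE_NOT_FOUND
    0xC000003A: 3,  # STATUS_OBJECT_PATH_NOT_FOUND -> ERROR_PATH_NOT_FOUND
}


def decode_ntstatus(status: int) -> dict:
    """Split an NTSTATUS into severity (bits 30-31), customer bit (29),
    12-bit facility (bits 16-27) and 16-bit code."""
    status &= 0xFFFFFFFF
    return {
        "severity": SEVERITY[status >> 30],
        "customer": bool((status >> 29) & 1),
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }


def ntstatus_to_hresult(status: int) -> int:
    """NTSTATUS -> Win32 error (subset table) -> HRESULT_FROM_WIN32 layout."""
    win32 = NTSTATUS_TO_WIN32[status & 0xFFFFFFFF]
    return (win32 & 0xFFFF) | (7 << 16) | 0x80000000  # 7 = FACILITY_WIN32


if __name__ == "__main__":
    print(decode_ntstatus(0xC0000034))  # {'severity': 'error', 'customer': False, 'facility': 0, 'code': 52}
    print(hex(ntstatus_to_hresult(0xC0000034)))  # 0x80070002
```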

@hach-que
Author

hach-que commented Sep 2, 2022

This is likely the cause, though I haven't figured out how to get the driver to load yet:

PS C:\> fltmc
The FltMgr.sys driver is not currently loaded.

@hach-que
Author

hach-que commented Sep 2, 2022

Hmm, what's weird is that sc query fltmgr shows the driver as loaded, but fltmc tells us that it's not. This is with Hyper-V isolation:

C:\>fltmc
The FltMgr.sys driver is not currently loaded.

C:\>sc query fltmgr

SERVICE_NAME: fltmgr
        TYPE               : 2  FILE_SYSTEM_DRIVER
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

@hach-que
Author

hach-que commented Sep 2, 2022

Some further testing, this time trying to load fltmgr.sys manually.

If you try to manually create the service inside the container and start it, you get error 1077 which is "No attempts to start the service have been made since the last boot.":

C:\>sc.exe create customflt type=filesys binPath=C:\Windows\system32\drivers\fltmgr.sys
[SC] CreateService SUCCESS

C:\>sc.exe start customflt
SERVICE_NAME: customflt
        TYPE               : 2  FILE_SYSTEM_DRIVER
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 1077  (0x435)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
        PID                : 0
        FLAGS              :

If you try to create the service in the Dockerfile:

RUN sc create customflt type=filesys binPath=C:\Windows\system32\drivers\fltmgr.sys start=boot

and then check its status after the container starts, you get error 31, which is "A device attached to the system is not functioning.":

C:\>sc query customflt

SERVICE_NAME: customflt 
        TYPE               : 2  FILE_SYSTEM_DRIVER  
        STATE              : 1  STOPPED 
        WIN32_EXIT_CODE    : 31  (0x1f)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

It's just super weird that the original fltmgr service is started according to sc.exe but it's not usable in the container.
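To re-check this mismatch quickly across images and isolation modes, the `sc query` output can be parsed from a script. A minimal Python sketch, assuming the English "KEY : value" layout shown above (sc.exe output may be localized on other images); the function names are hypothetical, and only the exit codes seen in this thread are named.

```python
# Sketch: parse the "KEY : value" layout printed by `sc query` so the
# service state and exit code can be checked from a script.
# Assumes English output as shown above (sc.exe text may be localized).
import re
from typing import Dict

_FIELD = re.compile(r"^\s*([A-Z][A-Z0-9_]+)\s*:\s*(.*?)\s*$")

STATES = {1: "STOPPED", 2: "START_PENDING", 3: "STOP_PENDING", 4: "RUNNING"}
WIN32_ERRORS = {
    0: "ERROR_SUCCESS",
    31: "ERROR_GEN_FAILURE",              # "A device attached to the system is not functioning."
    1077: "ERROR_SERVICE_NEVER_STARTED",  # "No attempts to start the service have been made..."
}


def parse_sc_query(text: str) -> Dict[str, str]:
    """Collect KEY -> value pairs; prompts and continuation lines don't match."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        m = _FIELD.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def summarize(fields: Dict[str, str]) -> str:
    # The first token of STATE / WIN32_EXIT_CODE is the numeric value.
    state = int(fields["STATE"].split()[0])
    exit_code = int(fields["WIN32_EXIT_CODE"].split()[0])
    return "{} {}: {}".format(
        fields.get("SERVICE_NAME", "?"),
        STATES.get(state, str(state)),
        WIN32_ERRORS.get(exit_code, str(exit_code)),
    )
```

Inside the container, the text could come from `subprocess.run(["sc", "query", "fltmgr"], capture_output=True, text=True).stdout`; pairing that with the `fltmc` output would flag the "RUNNING but not loaded" state automatically.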

@hach-que
Author

hach-que commented Sep 3, 2022

So this applies to process-isolation containers, but after further experimentation with implementing a Docker volume plugin (i.e. running ProjFS on the host so that the container sees the projected directory as a mounted volume), I've discovered that containers seem to bypass the ProjFS layer on the host and only see the hydrated state. They don't trigger the ProjFS callbacks.

This means that if you use ProjFS and map it to a Docker container, you can't see any subdirectories or read any files. But if you open the mapped folder on the host in e.g. explorer.exe or open the files inside the projected area in something like Notepad, then the container will be able to see the files as it will see them in their hydrated state. This is likely to be a bug in either the Windows Kernel, the filesystem filter drivers, ProjFS or all of the above.

This means that we can't even bind mount directories from the host to work around ProjFS not being available in the container.

@vrapolinario ProjFS is critical to our workloads for performance reasons. Unreal Engine consists of over 250,000 individual small files, which neither NTFS nor the Docker image commit and extraction processes handle well. It can take hours to create a container image and then extract it on other machines. By using ProjFS we could significantly reduce the time expended on this, since any given container will only need to access a small subset of the 250,000 files.
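Given the observation above that files opened on the host become visible to the container in their hydrated state, one stopgap is to pre-hydrate the subset a container needs before starting it. A minimal Python sketch, assuming that enumerating directories and reading files on the host hydrates them as described; `hydrate_tree` and its `include` filter are hypothetical, and hydrating the whole tree would give up the lazy loading that makes ProjFS attractive here.

```python
# Sketch: pre-hydrate (part of) a ProjFS-projected tree on the host by
# enumerating every directory and reading selected files end to end.
# Assumption: as observed above, content that has been opened/read on the
# host is then visible to the container. A workaround idea, not a fix.
import os
from typing import Callable, Optional, Tuple


def hydrate_tree(
    root: str,
    include: Optional[Callable[[str], bool]] = None,
    chunk_size: int = 1 << 20,
) -> Tuple[int, int]:
    """Return (files_read, bytes_read). `include` (hypothetical filter) gets the
    path relative to `root` and decides whether to read that file."""
    files_read = 0
    bytes_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk lists each directory; on a ProjFS root this should drive
        # the provider's enumeration callbacks.
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if include is not None and not include(rel):
                continue
            with open(path, "rb") as f:
                # Reading the content is what should request the file data.
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    bytes_read += len(chunk)
            files_read += 1
    return files_read, bytes_read
```

For example, `hydrate_tree(r"C:\ProgramData\projected", include=lambda rel: rel.startswith("Engine"))` would hydrate only paths under a hypothetical `Engine` folder.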

@vrapolinario
Contributor

Thanks for all the details. I'll have to hand it over to someone else. @fady-azmy-msft can you please check on this?

@ghost

ghost commented Oct 9, 2022

This issue has been open for 30 days with no updates.
Please provide an update or close this issue.

1 similar comment
@ghost

ghost commented Nov 9, 2022

This issue has been open for 30 days with no updates.
Please provide an update or close this issue.

@fady-azmy-msft
Contributor

Apologies for the late reply, I overlooked this issue. I've reached out to an internal team to look at this and created an internal ticket (#41474371) to track this.

@fady-azmy-msft
Contributor

Hey @hach-que , I heard back from the team. ProjFS currently isn't supported in Windows containers, but this is something we'll consider for the future.

@fady-azmy-msft fady-azmy-msft self-assigned this Nov 29, 2022
@fady-azmy-msft fady-azmy-msft added the enhancement New feature or request label Nov 30, 2022
@hach-que
Author

@fady-azmy-msft Can this please be re-opened?

I've been trying to use WinFsp as an alternative, but as it uses minifilters it runs into the same problem. I believe this is because HCS CreateComputeSystem incorrectly tries to apply a minifilter to a volume that isn't actually a block storage device. Until this is fixed, it's impossible to attach any volume in Windows that is not simply backed by a block storage device (presumably VHD and cloud block storage work because they attach as disk controllers). This bug prevents a whole class of Kubernetes/CSI volume drivers that are useful for applications running in Windows containers.

@hach-que
Author

Actually, I just realised that this issue and my current issue are slightly different (I'm not trying to mount the filesystem from within the container; it is a volume on the host that CreateComputeSystem refuses to work with), so I will open a new issue.
