Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data corruption issue on Sonoma 14.1.1 with apple silicon #45

Open
dhilung opened this issue Dec 22, 2023 · 12 comments
Open

Data corruption issue on Sonoma 14.1.1 with apple silicon #45

dhilung opened this issue Dec 22, 2023 · 12 comments

Comments

@dhilung
Copy link

dhilung commented Dec 22, 2023

Data gets corrupted (different md5s) or the fuse filesystem hangs with the following script when run inside a fuse mounted dir on Sonoma 14.1.1 with apple silicon (M1 Ultra)

dd if=/dev/random of=data.bin bs=1M count=64
for i in {1..10}; do echo /bin/cp data.bin data$i.bin \&; done | bash
md5 *

Tested with sshfs, encfs, and ifuse. I had to ln -s /usr/local/lib/libfuse-t.dylib /usr/local/lib/libfuse.2.dylib post install for these tools to work with fuse-t.

When the filesystem hangs, killing the mounting process, trying to unmount with umount or sudo diskutil unmount force, and killing go-nfsv4 process does not help. Requires reboot.

My use-case was using rsync over ifuse mount, which resulted data corruption for relatively large files (>20MB).

Thanks.

@macos-fuse-t
Copy link
Owner

Hmm, I've just tried with sshfs from (brew install macos-fuse-t/homebrew-cask/fuse-t-sshfs) and it worked.
Could you try running native tools compiled with fuse-t without hacking shared libraries and let me know?
Please also attach fuse-t log file found in ~/Library/Application\ Support/fuse-t/logs/nfsv4-go.log. It would be helpful if you enable debug option first in /Library/Application\ Support/fuse-t/cfg/fuse-t.ini

@donald30018
Copy link

donald30018 commented Dec 27, 2023

Ditto this on Intel MacOS.
Would be the same situation though - hacky access (drop in lol) not native.

EDIT: Okay I modified the makefile for ifuse to use your dylib instead of just renaming. Same issue, corrupted file even without touching rwsize.
nfsv4-go.log output:

time="2023-12-27T15:58:49-05:00" level=info msg="comm socket: 5"
time="2023-12-27T15:58:49-05:00" level=info msg="montor socket: 7"
time="2023-12-27T15:58:49-05:00" level=info msg="NFS server version 1.0.33 (ecba4ef6829d6ef5a14a49952d44d9e5b0dbbcbb) running at 127.0.0.1:52100"
time="2023-12-27T15:58:49-05:00" level=info msg="Mounting: /Users/donaldgreen/ios"
time="2023-12-27T15:58:49-05:00" level=info msg="mount [-o port=52100,mountport=52100,vers=4,namedattr -t nfs fuse-t:/ /Users/donaldgreen/ios]"
time="2023-12-27T15:58:49-05:00" level=warning msg="fuse listxattr returned error: -78"
time="2023-12-27T15:58:49-05:00" level=warning msg="fuse listxattr returned error: -78"
time="2023-12-27T15:58:50-05:00" level=warning msg="fuse listxattr returned error: -78"
time="2023-12-27T15:58:52-05:00" level=error msg="Lookup() of {1 0}/.com.apple.mobile_container_manager.metadata.plist failed"
time="2023-12-27T15:58:54-05:00" level=warning msg="fuse listxattr returned error: -78"

End up with a file roughly the correct size but corrupted.

@donald30018
Copy link

donald30018 commented Dec 27, 2023

Okay didn't send debug as an option. Here is a big ass log file:
fuset.log

I didn't pass any options besides debug. Not 100% sure if ifuse does

@macos-fuse-t
Copy link
Owner

macos-fuse-t commented Dec 27, 2023

Is it for sshfs with dd script? What should I run to reproduce.

@donald30018
Copy link

donald30018 commented Dec 27, 2023

(Sorry if this doesn't answer your question).
Disk utility show APFS. The fuse is initiated with ifuse which is part of libimobiledevice, which handles the io with the iphone
Used rsync and a copy using the gui in Finder.

Recreating would require an iphone and installing libimobiledevice and ifuse, then mounting the phone (ifuse --container com.some.appname -o debug /dir/to/mount/to), then copying a file over. The app has to support File Sharing.

And the corruption issue does not happen with MacFuse, only fust-t. That said macfuse is considerably slower (~3Mb/s)

@dhilung
Copy link
Author

dhilung commented Dec 31, 2023

I tested with brew install macos-fuse-t/homebrew-cask/fuse-t and brew install macos-fuse-t/homebrew-cask/fuse-t-sshfs and the issue persists.

Please note, I tested after clearing previous link at /usr/local/lib/.

I see a lot of fuse listxattr returned error: -78 and opLookup failed with err: fuse lookup returned error -2 errors after enabling debug=true.

nfsv4-go_1.log

My test script was:

sysctl -a|grep -P "kern.version|brand_string"
brew install macos-fuse-t/homebrew-cask/fuse-t
brew install macos-fuse-t/homebrew-cask/fuse-t-sshfs
cat /Library/Application\ Support/fuse-t/cfg/fuse-t.ini
rm ~/Library/Application\ Support/fuse-t/logs/nfsv4-go.log

mkdir /tmp/a
mkdir /tmp/b
sshfs localhost:/tmp/a /tmp/b
sleep 1
cd /tmp/b

dd if=/dev/random of=data.bin bs=1M count=64
for i in {1..10}; do echo /bin/cp data.bin data$i.bin \&; done | bash

cp ~/Library/Application\ Support/fuse-t/logs/nfsv4-go.log /tmp/nfsv4-go_1.log
md5 /tmp/a/*
md5 *
cp ~/Library/Application\ Support/fuse-t/logs/nfsv4-go.log /tmp/nfsv4-go_2.log
echo done

And an instance of a test output:

$ bash script.sh
kern.version: Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000
machdep.cpu.brand_string: Apple M1 Ultra
Warning: Not upgrading fuse-t, the latest version is already installed
Warning: Not upgrading fuse-t-sshfs, the latest version is already installed
[Default]
debug=true
;nfc=false
;readonly=false
;namedattr=true
;attrcache=true
;nobrowse=false
;rwsize=0
;noatime=false
;nomtime=false
;location=fuse-t
64+0 records in
64+0 records out
67108864 bytes transferred in 0.516410 secs (129952681 bytes/sec)
MD5 (/tmp/a/data.bin) = 61a0fe6bccbd9dd6a94828e770107d63
MD5 (/tmp/a/data2.bin) = 00a4b91ec9d7a00800355ffb05e9dd53
MD5 (data.bin) = 61a0fe6bccbd9dd6a94828e770107d63
^C^C^Z
[1]+  Stopped                 bash script.sh
$ kill %1
[1]+  Terminated: 15          bash script.sh

$ md5 /tmp/a/*
MD5 (/tmp/a/data.bin) = 61a0fe6bccbd9dd6a94828e770107d63
MD5 (/tmp/a/data1.bin) = cce1390c41798529f404362322b1c764
MD5 (/tmp/a/data10.bin) = 9cd5d7fa729d3b1d57e6f34e464826c9
MD5 (/tmp/a/data2.bin) = d166089a727d8b61fc71f1c1170e3328
MD5 (/tmp/a/data3.bin) = 44b64ac1cfcda068f16e6fb48212fab1
MD5 (/tmp/a/data4.bin) = 5bbe93605a5865be7a1e10244ea243f3
MD5 (/tmp/a/data5.bin) = 8ca59b4a58b8b96d23417f4b6c5df048
MD5 (/tmp/a/data6.bin) = 3d2289ff39b40cb3f86c464b938be7a1
MD5 (/tmp/a/data7.bin) = ad325d1d490a62f44c53ea9156ce7347
MD5 (/tmp/a/data8.bin) = 7f2da2c8ce96cd41b432f5f0e239022a
MD5 (/tmp/a/data9.bin) = 74dc8bb29f98a983baf1a15def0cc904

$ ls b
ls: fts_read: Permission denied
$ umount b
umount: unmount(/private/tmp/b): Invalid argument
$ sudo diskutil umount b
Password:
Volume Data on disk3s5 failed to unmount: dissented by PID 1 (/sbin/launchd)
Dissenter parent PPID 0 (kernel_task)

$ grep error /tmp/nfsv4-go_1.log
time="2023-12-30T19:49:33-05:00" level=warning msg="fuse listxattr returned error: -78"
time="2023-12-30T19:49:33-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:33-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:33-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:33-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:33-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:33-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:33-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=warning msg="fuse listxattr returned error: -78"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=warning msg="fuse listxattr returned error: -78"
time="2023-12-30T19:49:34-05:00" level=warning msg="fuse listxattr returned error: -78"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
time="2023-12-30T19:49:34-05:00" level=debug msg="opLookup failed with err: fuse lookup returned error -2"
$

As you can see, I had to terminate the script as it hangs forever.

@macos-fuse-t
Copy link
Owner

Could you test again with the latest sshfs package
https://drive.google.com/file/d/1QVTyysvi3jsWctc4LLu0InbHmBBMmQzI/view?usp=drive_link
Errors -78 and -2 can be safely ignored. As for now I'm unable to reproduce this issue on my end, still trying to figure out what causes it.

@macos-fuse-t
Copy link
Owner

Update: I reproduced the issue.
What happens is that cp command returns before the actual copy is completed. If you put a sleep between the copy command and md5, it returns correct results. I suspect there's weird IO caching somewhere in place. I'm looking for a mitigation

@macos-fuse-t
Copy link
Owner

Update 2:
I reproduced similar lockups just with the plain NFS mount. Unfortunately it seems to be a macos kernel bug related to simultaneous writing and reading same files on NFS shares. I'm looking for a possible workaround.

@donald30018
Copy link

Update 2: I reproduced similar lockups just with the plain NFS mount. Unfortunately it seems to be a macos kernel bug related to simultaneous writing and reading same files on NFS shares. I'm looking for a possible workaround.

Appreciate you looking into it :)
lmk if I can help with any testing or anything.

@macos-fuse-t
Copy link
Owner

Update 3: For now I don't see any possible workaround since the lockups occur in the macos NFS client code before it reaches the server. I opened another report to apple but I won't be holding too much hope for a quick fix. As a developer it really frustrates me working with such a crappy OS, reminds me of Windows 95 if you're old enough to experience it.

@donald30018
Copy link

Only to play Roller Coaster Tycoon 🎢

Thanks for looking into it. All of my woes with the project I'm working on are MacOS/iOS related so I feel the pain.
For now (potentially helpful for you @dhilung), Libimobiledevice added an AFC Client (afcclient) that lets you pull stuff directly from the App Sandbox using, well, AFC.

Being able to mount as a volume is better for many reasons, but it's something. Appreciate all of your work @macos-fuse-t

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants