Latest build on OS X yosemite causes disk errors #1859

Closed
andrewsawczyn opened this issue Dec 20, 2014 · 48 comments

andrewsawczyn commented Dec 20, 2014

I originally thought this was an issue with Homebrew, but I can reproduce it when building directly from git and when downloading directly from fishshell.com.

After building and installing fish 2.1.1, I receive the following error in the Yosemite Disk Utility: "Incorrect number of file hard links". It requires a reboot into single-user mode to repair, and it returns whenever fish is used. Another user has confirmed the issue. If I build and install from MacPorts this does not occur, but I've noticed that MacPorts is at version 2.1.0, not the latest 2.1.1.

Thoughts? If I download the pre-built app from fishshell.com, the same thing happens. I am happy to research and debug the problem further if anyone has tips on how to tackle it. I am a developer, but it's been a while since I've worked in the C world.

FYI: this is a clean install of Yosemite with the latest Xcode. I have isolated the problem to fish.

afarrell commented Dec 21, 2014

I wouldn't normally leave a "+1" type post but since this one is pretty obscure and people might doubt a userspace tool could cause filesystem errors:

I can reproduce this using similar methodology:
Install Yosemite, Xcode, Homebrew, etc., but not fish: no disk errors
brew install fish (2.1.1): no disk errors
chsh -s /usr/local/bin/fish and reboot: disk errors (mostly "invalid hard link count").

Following the same process with fish 2.1.0 (also installed via homebrew) does not trigger the errors.

If 2.1.1 is installed, rebuilding the directory structure with DiskWarrior works until the system is booted. Then the errors return.
If 2.1.0 is installed but 2.1.1 is not installed then rebuilding the directory structure works and the errors do not return (yet).

zanchey (Member) commented Dec 21, 2014

I don't have OS X, but my money would be on the hard link that fish creates in /tmp for the old socket path so that new versions keep on working. We don't remove it on shutdown (I can't remember the semantics right now, but it may not be safe to do so). I wonder if there is something that HFS+ doesn't like about that behaviour. Could you confirm that your /tmp is located on an HFS or HFS+ filesystem?
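To check whether the legacy socket path is really a second hard link to fishd's socket, the link count and inode number are enough. The snippet below is a minimal sketch; `link_count` is a hypothetical helper, not part of fish. It reads the link count from the second column of `ls -ld`, which both BSD (OS X) and GNU `ls` print, so it avoids the platform-specific flags of `stat`.

```shell
# Hypothetical helper: print the hard link count of a path.
# The second field of `ls -ld` is the link count on both BSD (OS X)
# and GNU ls, so this works without platform-specific stat options.
link_count() {
    ls -ld -- "$1" | awk '{ print $2 }'
}
```

For example, `ls -li /tmp/fishd.socket.$USER /tmp/fish.$USER/fishd.socket` shows whether both paths share one inode, and `link_count /tmp/fish.$USER/fishd.socket` printing `2` suggests the extra link is present (both paths come from later comments in this thread).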

andrewsawczyn (Author) commented Dec 21, 2014

"I wouldn't normally leave a "+1" type post but since this one is pretty obscure and people might doubt a userspace tool could cause filesystem errors:"

Right there with you. Thought I had hardware issues, replaced my SSD, ran memory tests but everything checked out.

While researching this, I did find mention of a similar issue with Time Machine and sockets and/or named pipes with similar end results. I think zanchey may be on to something.

zanchey (Member) commented Dec 22, 2014

You could try removing /tmp/fishd.socket.$USER before shutting down and see if that helps.

ridiculousfish added this to the next-minor milestone Dec 22, 2014

ridiculousfish (Member) commented Dec 22, 2014

This sounds very serious

jefferai commented Dec 25, 2014

It is. Among other things, it prevents you from performing many common disk-modifying operations, such as installing Boot Camp or resizing partitions. Some of these can be performed within system recovery, but importantly, some can't (if, for instance, you are using CoreStorage volumes), because as soon as you fix the errors and reboot into normal mode, the errors return.

blomma commented Dec 25, 2014

I experienced this hard link error on my MacBook Air. At the time I didn't tie it to fish, and since I repaired it, it has shown up again. The only way to repair it was to drop into single-user mode and run the good old fsck.

I'm typing this on a brand-new iMac with Homebrew and fish installed; /tmp/fishd.socket.$USER exists and I have zero hard link problems. Both machines have SSDs, so if fish is responsible for this, it doesn't happen every time. I'm also running 2.1.1.

ghost commented Dec 27, 2014

I too have isolated this to fish version 2.1.1, using a clean install of Yosemite including the Command Line Developer Tools, Homebrew and bash. The results are the same with Core Storage enabled and after reverting the volume to its native type.

The disk error appears when running fish for the first time.

$ fish -c 'exit'
$ diskutil verifyVolume Macintosh\ HD

Started file system verification on disk0s2 Macintosh HD
Verifying file system
Using live mode
Performing live verification
Checking Journaled HFS Plus volume
Checking extents overflow file
Checking catalog file
Checking multi-linked files
Incorrect number of file hard links
Checking catalog hierarchy
Checking extended attributes file
Checking volume bitmap
Checking volume information
The volume Macintosh HD was found corrupt and needs to be repaired
File system check exit code is 8
Error: -69845: File system verify or repair failed
Underlying error: 8: POSIX reports: Exec format error

And it disappears when the fishd socket is removed.

$ rm /tmp/fish.$USER/fishd.socket
$ diskutil verifyVolume Macintosh\ HD

Started file system verification on disk0s2 Macintosh HD
Verifying file system
Using live mode
Performing live verification
Checking Journaled HFS Plus volume
Checking extents overflow file
Checking multi-linked files
Checking catalog hierarchy
Checking extended attributes file
Checking volume bitmap
The volume Macintosh HD appears to be OK
File system check exit code is 0
Finished file system verification on disk0s2 Macintosh HD
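The before/after check in this transcript can be scripted. This is a minimal sketch, assuming the `Incorrect number of file hard links` message shown above and the volume name `Macintosh HD`; both function names are hypothetical. The matcher reads verifier output on standard input, so it can be tried on saved output without a Mac.

```shell
# Hypothetical helper: succeed (exit 0) if verifier output read from
# stdin contains the HFS+ hard link error reported in this issue.
has_hardlink_error() {
    grep -q 'Incorrect number of file hard links'
}

# OS X only: verify the named volume (default "Macintosh HD") and report
# whether the hard link error appears. 2>&1 also captures anything
# diskutil writes to stderr.
verify_boot_volume() {
    if diskutil verifyVolume "${1:-Macintosh HD}" 2>&1 | has_hardlink_error; then
        echo "hard link error present"
        return 1
    fi
    echo "no hard link error"
}
```

Mirroring the transcript: `fish -c 'exit'; verify_boot_volume; rm /tmp/fish.$USER/fishd.socket; verify_boot_volume`.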

On a side note, having this in config.fish should help to temporarily circumvent the issue.

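function on_exit --on-process-exit %self
  # Runs when this fish instance exits: delete the fishd socket so the
  # file no longer has a second hard link in /tmp (-f ignores a missing file).
  rm -f /tmp/fish.$USER/fishd.socket
end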

ridiculousfish (Member) commented Dec 28, 2014

Thanks for this investigation, niclasgelin. My laptop died; I should be able to take a closer look next week when the replacement arrives.

ghost commented Dec 28, 2014

No problem, ridiculousfish. After spending a couple of hours suspecting my SSD was due for early retirement, I wanted to at least share my findings.

TimSoethout commented Jan 17, 2015

I am experiencing the same issue.

diskutil verifyvolume <volume-guid>
Started file system verification on disk1 Macintosh HD
Verifying storage system
Checking volume
disk0s2: Scan for Volume Headers
disk0s2: Scan for Disk Labels
Logical Volume Group <guid> on 1 device
disk0s2: Scan for Metadata Volume
Logical Volume Group has a 17 MB Metadata Volume with double redundancy
Start scanning metadata for a valid checkpoint
Load and verify Segment Headers
Load and verify Checkpoint Payload
Load and verify Transaction Segment
Incorporate 0 newer non-checkpoint transactions
Load and verify Virtual Address Table
Load and verify Segment Usage Table
Load and verify Metadata Superblock
Load and verify Logical Volumes B-Trees
Logical Volume Group contains 1 Logical Volume
Load and verify <guid>
Load and verify <guid>
Load and verify Freespace Summary
Load and verify Block Accounting
Load and verify Live Virtual Addresses
Newest transaction commit checkpoint is valid
Load and verify Segment Cleaning
The volume <guid> appears to be OK
Storage system check exit code is 0
Verifying file system
Using live mode
Performing live verification
Checking Journaled HFS Plus volume
Checking extents overflow file
Checking catalog file
Incorrect number of file hard links
Checking catalog hierarchy
Checking extended attributes file
Checking volume information
File system check exit code is 8
Error: -69845: File system verify or repair failed
Underlying error: 8: POSIX reports: Exec format error

But it keeps coming back even after removing fish. It's driving me crazy.
This happens on a FileVault-enabled disk running Yosemite.

ridiculousfish (Member) commented Jan 17, 2015

My new Mac is up and running. I'll take a crack at this today.

vaxo commented Feb 1, 2015

I have this problem too. When will the next minor version be released?
I can't live without fish :-)

Stapelzeiger commented Feb 1, 2015

@vaxo Downgrading solved the problem for me. You could do the same while waiting for a fix. (I'm running version 2.0.0; I haven't tried any others except 2.1.1.)

ridiculousfish (Member) commented Feb 2, 2015

Still reproduces on 10.10.2

ridiculousfish (Member) commented Feb 2, 2015

@zanchey What would you think about using a symlink instead of a hard link for the socket file in /tmp? Any downside?

zanchey (Member) commented Feb 2, 2015

It would need testing on Ubuntu, as the Yama security module does its best to block it - I did try that initially and I'm pretty sure it doesn't work.

I'm really quite impressed that it's possible to corrupt the filesystem from userland, although most Linux systems long ago went down the path of using a RAM-backed FS for /tmp.

zanchey (Member) commented Feb 2, 2015

I wonder if nc -Ul /tmp/foo &; ln /tmp/foo /tmp/bar causes the same problems.

ridiculousfish (Member) commented Feb 2, 2015

Good idea, zanchey. I'd love to find some simple repro steps for filing a bug against OS X. It looks like that particular command doesn't trigger it, but a similar one might.

ridiculousfish (Member) commented Feb 2, 2015

I take it back, that totally works! But you have to restart afterwards

So the steps are nc -Ul /tmp/foo &; ln /tmp/foo /tmp/bar then restart. I speculate that whatever clears the contents of /tmp does it in a way that messes up hard link counts. I'll file a bug.
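A scripted version of these steps might look like the sketch below. It assumes an `nc` that supports `-U` (Unix domain sockets) together with `-l`, as the netcat shipped with OS X does, and it polls until the socket file exists before running `ln`, because the backgrounded `nc` creates it asynchronously. The function name and the 5-second timeout are hypothetical, `sleep 0.1` relies on a `sleep` that accepts fractions (OS X and GNU both do), and the restart still has to be done by hand.

```shell
# Hypothetical repro helper: start a listening Unix domain socket,
# wait until its file exists, then give it a second hard link.
make_linked_socket() {
    sock=$1 link=$2
    nc -Ul "$sock" &
    nc_pid=$!
    # Poll for up to ~5 seconds (50 x 0.1 s) for nc to create the socket.
    i=0
    while [ ! -S "$sock" ] && [ "$i" -lt 50 ]; do
        sleep 0.1
        i=$((i + 1))
    done
    if [ ! -S "$sock" ]; then
        kill "$nc_pid" 2>/dev/null
        return 1
    fi
    ln "$sock" "$link"
}
```

For example, `make_linked_socket /tmp/foo /tmp/bar`, then restart and verify the volume.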

One nasty workaround would be to use symlinks only on OS X, and hard links on Linux.
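That platform split could look like this in shell. It is only an illustration with a hypothetical function name; fish creates the link in its own code, not via a script. The symlink branch follows ridiculousfish's later confirmation that a symlink on OS X keeps fish 2.0 instances working without disk errors, and the hard link branch reflects zanchey's report that Yama on Ubuntu blocks the symlink approach.

```shell
# Illustrative sketch: expose the new socket at the legacy path so older
# fish instances can still find it. On OS X (Darwin) use a symlink, since
# a hard-linked socket in /tmp led to "Incorrect number of file hard links"
# after a restart; elsewhere keep the hard link, since zanchey reports
# that Yama on Ubuntu blocks the symlink.
link_legacy_socket() {
    new_path=$1 legacy_path=$2
    rm -f "$legacy_path"    # ln fails if the legacy path already exists
    case "$(uname -s)" in
        Darwin) ln -s "$new_path" "$legacy_path" ;;
        *)      ln "$new_path" "$legacy_path" ;;
    esac
}
```

Usage, with the paths from this thread: `link_legacy_socket "/tmp/fish.$USER/fishd.socket" "/tmp/fishd.socket.$USER"`.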

ridiculousfish (Member) commented Feb 2, 2015

This doesn't seem to reproduce with ordinary files, e.g. touch /tmp/foo ; ln /tmp/foo /tmp/bar

ridiculousfish (Member) commented Feb 2, 2015

Filed at bugreporter.apple.com and http://openradar.appspot.com/radar?id=6387155723091968

zanchey (Member) commented Feb 3, 2015

I'd be happy to roll a 2.1.2 just for OS X and nuke the link entirely, on the proviso that "reboot your machine" goes in the release notes. If there are no old versions installed, the link never gets touched.

zanchey (Member) commented Feb 4, 2015

(OTOH is solved in master :-)

ridiculousfish (Member) commented Feb 4, 2015

I hate the idea of leaving a minor version with a serious bug like this - it means that when we regress issues we'll corrupt our filesystem. Maybe the thing to do is to commit a simple fix to 2.1.2 (perhaps as simple as disabling the link on OS X) but not release it, instead releasing a 2.2.0 from master.

nmalzieu commented Feb 6, 2015

@ridiculousfish @zanchey I get the same problem. I'm not sure I understand correctly, though:
the bug happens in 2.1.1, but not in 2.0.0.
Should I downgrade to 2.0.0? Or is it fixed in master, so I should build from master?
Thanks!

DomT4 commented Feb 6, 2015

(OTOH is solved in master :-)

The problem is solved in the master branch? Is there a specific set of commits with the fixes? It might be worthwhile to get OS X package managers to backport the fix, if a new release with a fix isn't imminent.

ridiculousfish (Member) commented Feb 6, 2015

This is fixed in master by virtue of removing fishd entirely. This was an involved process with a lot of changes.

Probably the simplest thing to do would be to use a symlink on OS X, assuming it works.

ridiculousfish (Member) commented Feb 8, 2015

I confirmed that a symlink on OS X allows existing fish 2.0 instances to continue working, and does not produce the disk errors.

DomT4 commented Feb 8, 2015

Are there plans for a new minor release or is the advice for people to use the git build for now?

ridiculousfish (Member) commented Feb 9, 2015

Let's do a release with just this fix. It can be OS X only (the Linux code isn't changing).

DomT4 commented Feb 9, 2015

Alright, sounds good. Cheers! Will keep an eye out for that.

ridiculousfish (Member) commented Feb 10, 2015

New branch Integration_2.1.2 contains the fix

zanchey (Member) commented Feb 11, 2015

As the code is unchanged, is there any utility in doing new builds for Linux?

ridiculousfish (Member) commented Feb 11, 2015

We should test on Linux but I don't think there's value in doing a release on Linux.

zanchey modified the milestones: 2.1.2, next-minor Feb 11, 2015
MerelyAPseudonym pushed a commit to MerelyAPseudonym/homebrew that referenced this issue Feb 21, 2015
This is just a bugfix release for OS X.
For more information, see:
* fish-shell/fish-shell@b3aa187
* fish-shell/fish-shell#1859

MerelyAPseudonym commented Feb 21, 2015

@DomT4 @ridiculousfish I created a Homebrew PR to use version 2.1.2: Homebrew/legacy-homebrew#37009

MikeMcQuaid added a commit to Homebrew/legacy-homebrew that referenced this issue Feb 21, 2015
This is just a bugfix release for OS X.
For more information, see:
* fish-shell/fish-shell@b3aa187
* fish-shell/fish-shell#1859

Closes #37009.

Signed-off-by: Mike McQuaid <mike@mikemcquaid.com>

zanchey (Member) commented Feb 22, 2015

Although I appreciate the enthusiasm, after the mess with 2.1.1 I think it would have been better if the Homebrew update had waited for the actual release (i.e. the tarball and release announcement). Otherwise we get issues filed like #1953 (prevented by using the tarball).

We ended up moving the tag for 2.1.1 about six times in the end due to showstoppers discovered just before the release.

DomT4 commented Feb 22, 2015

Apologies, from the discussion here I got the impression there wasn't going to be a full release for this as only OS X users needed the minor update, so I didn't say anything in the Homebrew/Homebrew PR on that front. I'll tee up a PR to use the official tarball when one is available.

ridiculousfish (Member) commented Feb 25, 2015

2.1.2 is finally released on the main site fishshell.com

ridiculousfish (Member) commented Feb 25, 2015

Homebrew PR here: Homebrew/legacy-homebrew#37177

DomT4 commented Feb 25, 2015

Thanks for the PR! 👍

zanchey closed this Feb 27, 2015

ekristen commented Apr 7, 2015

I'm still seeing this issue on 2.1.2 (installed via Homebrew) with FileVault enabled on 10.10.2

andrewsawczyn (Author) commented Apr 7, 2015

I am not, although I'm not using FileVault.

I uninstalled fish, rebooted, installed fish, launched iTerm, played around, and rebooted. I ran Disk Utility a few times during that process and I am not seeing multi-linked file errors. I'll keep running Disk Utility over the next few days and see what happens.

alec-c4 commented Jul 7, 2015

Hi guys, is this issue completely fixed on Yosemite or not?

ridiculousfish (Member) commented Jul 7, 2015

Yes, fish 2.1.2 will no longer cause filesystem corruption, though it cannot repair corruption that has already occurred (which AFAIK is harmless)

DomT4 commented Jul 7, 2015

We haven't had any more issues around this reported to Homebrew.

FrankZhang002 commented Aug 30, 2015

I'm using OS X 10.10.5 and have the same issue, but it looks like Disk Utility can fix it?

faho (Member) commented Aug 30, 2015

@FrankZhang002: Caused by a fish version of at least 2.1.2?
