New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more KBFS health checks #42

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
3 participants
@songgao
Copy link

songgao commented Oct 24, 2017

Hey there! Thanks for the awesome project!

We've seen several bug reports where kbsecret users had process written into a dir that's supposed to be the KBFS mount point, while it's not mounted. It seems it's because kbsecret only checks for existence of /keybase, but doesn't make sure it's actually mounted. To mitigate that, this PR checks for /keybase/.kbfs_status existence, which is a (hidden) special file that's very unlikely to get written into. In addition (suggested by @strib), check content of /keybase/.kbfs_status to make sure user is logged in.

I know close to nothing about ruby, so please bear with me for any stupid mistakes, and feel free to throw it away if there's a more idiomatic/cleaner way to do this!

Thank you for contributing to KBSecret! Please fill out the items below to help us merge
your work more quickly.

  • Have you run make test locally and ensured that all tests pass?
  • Have you run make coverage locally and ensured that coverage did not drop?

If you're changing something in the library:

  • Have you introduced unit tests for your changes?

If you're changing something in the CLI:

  • Have you updated the manual pages and shell completion (if necessary)?
  • Have you introduced CLI unit tests for your changes and ensured that make test-cli passes?

Please describe your changes briefly here. Thank you!

more KBFS health checks
Hey there! Thanks for the awesome project!

We've seen several bug reports where `kbsecret` users had process written into a dir that's supposed to be the KBFS mount point, while it's not mounted. It seems it's because `kbsecret` only checks for existence of `/keybase`, but doesn't make sure it's actually mounted. To mitigate that, this PR checks for `/keybase/.kbfs_status` existence, which is a (hidden) special file that's very unlikely to get written into. In addition (suggested by @strib), check content of `/keybase/.kbfs_status` to make sure user is logged in.

I know close to nothing about ruby, so please bear with me for any stupid mistakes, and feel free to throw it away if there's a more idiomatic/cleaner way to do this!
@woodruffw

This comment has been minimized.

Copy link
Member

woodruffw commented Oct 24, 2017

Hey there! Thanks for the awesome project!

Thanks! Same to you; Keybase is an amazing program 😄

I know close to nothing about ruby, so please bear with me for any stupid mistakes, and feel free to throw it away if there's a more idiomatic/cleaner way to do this!

This approach looks good to me, but I have Ruby support for Keybase broken out into a few auxiliary libraries so I'll probably implement these checks there. Thanks for the tip about /keybase/.kbfs_status!

check content of /keybase/.kbfs_status to make sure user is logged in.

Do you happen to know if this information will ever differ from current_user in ~/.config/keybase/config.json? That's where I'm currently getting the logged-in user from.

@woodruffw woodruffw added the bug label Oct 24, 2017

woodruffw added a commit to kbsecret/keybase-unofficial-local that referenced this pull request Oct 24, 2017

kbfs: Fill in Keybase::Local::KBFS
Implements `KBFS.mounted?` and `KBFS.status`. `mounted?` tests whether
or not KBFS is currently mounted, and `status` returns an
`OpenStruct` mapping of the status file used to detect KBFS's state.

See kbsecret/kbsecret#42

@woodruffw woodruffw closed this in 99df935 Oct 24, 2017

@woodruffw

This comment has been minimized.

Copy link
Member

woodruffw commented Oct 25, 2017

99df935 should fix this, based on my local testing.

I noticed, though, that keybase ctl stop doesn't actually stop kbfsfuse. When Keybase isn't running but kbfsfuse is, any reads under /keybase hang until Keybase is started again.

Do you know if that's intentional? If it is, I probably need to add an extra test for whether kbfsfuse is running, to prevent hangs.

@strib

This comment has been minimized.

Copy link

strib commented Oct 25, 2017

@woodruffw: keybase ctl commands only affect the background keybase service, but not kbfsfuse. That is intentional.

Just because kbfsfuse is running, though, doesn't mean it's successfully mounted at /keybase. (For example, if there was pre-existing data there.)

@woodruffw

This comment has been minimized.

Copy link
Member

woodruffw commented Oct 25, 2017

Thanks for the clarification @strib!

Since kbfsfuse can run without being successfully mounted, is there some way to test for a successful mount that doesn't involve stat(2)/access(2)-ing /keybase/.kbfs_status? I'd like to prevent those hangs, if possible.

@songgao

This comment has been minimized.

Copy link
Author

songgao commented Oct 25, 2017

@woodruffw Thanks for implementing the checks in the right place!

Did you notice significant hang on /keybase/.kbfs_status? I think typically stat on that file should be fairly quick. If it hangs, stat on /keybase/public is likely gonna hang too. Checking output of mount or its equivalent wouldn't work because if kbfsfuse process crashes or is killed with SIGKILL, /keybase would be left there broken.

Is there a way to have a timeout on that stat in ruby, and raise an exception if it's longer than e.g. 1s?

@woodruffw

This comment has been minimized.

Copy link
Member

woodruffw commented Oct 25, 2017

Did you notice significant hang on /keybase/.kbfs_status?

Yeah, but only when kbfsfuse was running and keybase was not running, e.g. after calling keybase ctl stop. In every other circumstance, stat-ing the file was pretty quick.

Is there a way to have a timeout on that stat in ruby, and raise an exception if it's longer than e.g. 1s?

I think I could use the Timeout module for that. I'll do some testing 😄

@woodruffw

This comment has been minimized.

Copy link
Member

woodruffw commented Oct 25, 2017

Looks like Timeout can't interrupt a syscall like stat or access, which makes some sense (I'd have to use a special external library like terminator).

Would it be possible to introduce something like keybase fs status to distinguish between "mounted", "mounted but not talking to Keybase", and "not mounted"? It's only the second state in that list that causes hangs.

@songgao

This comment has been minimized.

Copy link
Author

songgao commented Oct 25, 2017

Ugh; sorry for the wrong suggestion and thanks for trying it out! I think the reason why stat hangs sometimes is because for dynamically generated content, we actually "read" the content to get the size: https://github.com/keybase/kbfs/blob/39534e4a6ffe95f84a5480d60b3606112fa93206/libfuse/special_read_file.go#L33-L35

I'll see tomorrow if we can put something together.

@strib

This comment has been minimized.

Copy link

strib commented Oct 25, 2017

/keybase/.kbfs_metrics is an existing file that would work as a quick solution. It doesn't require network or service connectivity.

We can't use keybase fs commands to relay that second state, since it implies that keybase can already talk to kbfs and vice versa. I think checking a special .kbfs file in the mounted directory, plus a check to make sure the background keybase service is running, should be enough.

@woodruffw

This comment has been minimized.

Copy link
Member

woodruffw commented Oct 25, 2017

Ugh; sorry for the wrong suggestion and thanks for trying it out!

Not a problem at all! I'd love it if I could use just plain old stat to do this, but I'll go with what @strib suggested for now.

@songgao

This comment has been minimized.

Copy link
Author

songgao commented Oct 25, 2017

Sounds good!

woodruffw added a commit to kbsecret/keybase-unofficial-local that referenced this pull request Oct 25, 2017

kbfs: Refine `mounted?` conditions
`Keybase::Local::KBFS.mounted?` now tests whether both Keybase
and KBFS are running before testing for a file on the mountpoint.

This prevents `stat(2)` hangs whenever KBFS itself is running
but Keybase isn't.

See kbsecret/kbsecret#42

@songgao songgao deleted the songgao:patch-1 branch Oct 31, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment