Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Solaris FEN support #263

Open
wants to merge 9 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@isaacdavis
Copy link

commented Aug 28, 2018

This pull request fixes #12.

I'm an employee of Joyent, Inc., and have completed the Google corporate CLA.

What does this pull request do?

This pull request adds fsnotify support for Solaris and illumos via Solaris FEN. It is heavily based on #196, authored by @gfrey. As such, and with his approval, I have added @gfrey in the AUTHORS file in addition to myself.

One major revision to @gfrey's work concerns liveness when the watcher is closed when there is an unconsumed event in the channel - the old pull request would sometimes hang on the TestWatcherClose test. The new concurrency model is borrowed from fsnotify's inotify implementation.

Another major revision concerns proper translation of FEN events to fsnotify events, in particular the FILE_RENAME_TO FEN event.

Where should the reviewer start?

These pages are useful as an introduction to FEN:

The illumos man pages for port_create and related functions are also helpful for understanding how to interface with FEN.

The discussion on #196 is also worth reading. Some points I'll address here:

  • The use of syscall appears necessary here because golang.org/x/sys/unix does not (to my knowledge) provide a way to get atime and ctime.
  • cgo is used because, for illumos, the officially supported interface to the OS is libc rather than syscalls, and thus user-level code should refrain from making syscalls directly.

How should this be manually tested?

I ran all tests on a SmartOS host. On SmartOS, Go can be installed from Joyent's pkgsrc repository. fsnotify can then be built and tested as normal.

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Aug 30, 2018

The Travis CI build is failing only on MacOS on go 1.8. I'm not sure why this is, given that the code I changed affects only Solaris. I'm additionally unable to reproduce the failure on my own Mac. If anyone can offer insight, I would much appreciate it...

@gfrey gfrey referenced this pull request Aug 30, 2018

Open

MacOS CI-Tests failing #264

@gfrey

This comment has been minimized.

Copy link

commented Aug 30, 2018

I created issue #264. Seems to be a flake (it was locally and last PR builds seem to have worked).

@nathany

This comment has been minimized.

Copy link
Member

commented Aug 30, 2018

#264 is now fixed on master. Thanks @gfrey. Please rebase.

Is there a CI solution for Solaris?

Is this sufficiently reviewed by people familiar with Solaris? I'll also do a review before it is merged.

@gdey

This comment has been minimized.

Copy link
Contributor

commented Aug 31, 2018

kick integrations.

@gdey gdey closed this Aug 31, 2018

@gdey gdey reopened this Aug 31, 2018

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Aug 31, 2018

Looks like macOS builds are still failing even with the fix.

@nathany Joyent would be happy to offer a Triton container on the Joyent Public Cloud in which to build and test fsnotify. Under-the-hood this is a SmartOS zone that we'd administer and give you access to. I could assist with installing Go and setting up the build. You could then integrate this with the CI system of your choice. Please let me know if you're interested or have questions - feel free to email isaac.davis@joyent.com.

@nathany

This comment has been minimized.

Copy link
Member

commented Aug 31, 2018

@isaacdavis A CI solution would be great, though I can't say I have any idea how to integrate a Triton container with a hosted solution like Travis CI.

Also, before adding Solaris support, I want to be sure there is a point person (or people would be better) for Solaris support that are handling any issues. /cc @gfrey

We also want to be sure that Solaris support is documented, especially if there are behaviour differences between operating systems that users need to know about.

One other note: at GopherCon we have been discussing standalone packages for each OS that fsnotify relies on. This would likely look like github.com/fsnotify/fsnotify/fen. @gdey is starting on a shared test helper package to reuse across packages.

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Aug 31, 2018

@nathany It's a little hacky, but I suppose in Travis you could write a script that ssh'ed in to the container and ran the tests there, sending back the results. This would also work with any other hosted CI solution, I think.

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Sep 5, 2018

I've discovered a bug in the current implementation under review:

The port_associate function accepts a pointer to an object describing the entity being watched. port_get, when returning from an event, returns the same pointer as the portev_object member of the port_event_t struct returned from port_get. Thus, if the event-handling code uses this pointer to retrieve information about the file affected (as we currently do), it must still point to valid memory.

In associateFile, we construct a file_obj and pass a pointer to it to port_associate. The Go runtime can relocate or garbage-collect this file_obj, rendering the pointer invalid. We thus may get garbage data back when we retrieve the object upon returning from port_get.

This bug does not manifest in the test suite, as there is not enough memory churn to trigger an overwrite of the file_obj memory. I discovered it when running Prometheus on SmartOS using fsnotify to watch config files.

I'll have a fix committed by tomorrow (2018-09-06) at noon PST.

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Sep 27, 2018

I'd like to check up on the status of this - what work needs to be done to get this merged? Please let me know what I can do to help.

@nathany
Copy link
Member

left a comment

Thanks for your work on this Isaac.

For this review, please keep in mind that I'm not familiar with FEN, nor have I exercised this code or even ensured that it compiles (I don't have a Solaris environment, nor am I personally planning to set one up).

Structurally, I would prefer if the cgo code were more isolated as we have done for fsevents, but I'm not sure if that can even be done to the same extent.

It appears that FEN requires that a malloced file_obj_t remain allocated for the lifetime of the watch, is that correct?

Show resolved Hide resolved fen.go Outdated
Show resolved Hide resolved fen.go Outdated
Show resolved Hide resolved fen.go Outdated
fen.go Outdated

// Handle all children of the directory.
for _, finfo := range files {
err := handler(filepath.Join(path, finfo.Name()), finfo)

This comment has been minimized.

Copy link
@nathany

nathany Sep 27, 2018

Member

This list of of files includes subdirectories.

Is this the behaviour you want? To call associateFile/disassociateFile on a subdirectory?

(keeping it non-recursive for now is probably best, as the other platforms are not recursive yet)

This comment has been minimized.

Copy link
@isaacdavis

isaacdavis Oct 8, 2018

Author

My understanding is that, if we add a watch to a directory, we want to watch all of its immediate children for changes - this would include children that are directories. This isn't recursive in that we don't descend into those subdirectories and watch the files/directories therein, walking down the entire tree - we just watch the subdirectory vnode itself and stop there.

Is this incorrect? Should children that are directories be explicitly omitted?

This comment has been minimized.

Copy link
@nathany

nathany Oct 8, 2018

Member

That's correct.

This comment has been minimized.

Copy link
@isaacdavis
Show resolved Hide resolved fen.go Outdated
Show resolved Hide resolved fen.go Outdated
@nathany

This comment has been minimized.

Copy link
Member

commented Sep 27, 2018

@isaacdavis Apologies for the delay in reviewing your work. Here are some considerations before merging this.

I'm not personally willing to take on the responsibility of supporting Solaris and FEN myself, setting up CI, or any of that. What I'd like to see is yourself, but preferably multiple people, taking ownership of the FEN support.

The Solaris group would work with myself and hopefully some others on cross-platform API considerations (rename to, recursive watching, filtering, etc.). If there are bugs in the FEN support, I need to know there are people familiar with FEN that will triage issues and provide fixes. If/when we do a v2 API, I need to know there are people familiar with FEN that will help us move fsnotify forward on all platforms.

Honestly, I'd like to "recruit" small teams for each platform, but Solaris is a good place to start.

@nathany

This comment has been minimized.

Copy link
Member

commented Sep 27, 2018

@isaacdavis In the near term I'd like to break fsnotify up into a few packages within the same repository. I've already begun some work in that direction with kqueue, by isolating the syscall wrappers.

This would allow someone to import "github.com/fsnotify/fsnotify/fen" and use it directly if they really want. My idea with the wrapper is to provide a very low-level platform-specific Go-ish API that provides documentation and testing (improving our means of testing fsnotify will be part of this effort). Also, it could provide all the things for the platform, whereas fsnotify is lowest common denominator.

Not saying this needs to happen before the PR is merged, but letting you know the direction I'm thinking of taking things.

The big question is how low-level is very low level. For example, the kqueue API takes file handles, not names of files to watch, so the kqueue wrapper may operate at that low level with very little in the way of amenities.

Part of my goal with this is to associate ownership to each subpackage. A small team managing each of fen, inotify, kqueue, fsevents -- possibly some people on multiple teams, and everyone participating in the common API design as well.

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Sep 27, 2018

@nathany Thanks for your thorough responses. I'm happy to take ownership of FEN support and get involved in this project!

Regarding CI: After talking with some others at Joyent, we think the easiest solution would be to set up a Jenkins instance on a VM in the Joyent Public Cloud, and run the build and tests there. This would integrate well with Github. I would be your point of contact for administering this. Does that sound reasonable to you?

One other note: I only have the resources to provide illumos support - the build and tests would run on illumos, not Oracle Solaris. illumos diverged from Oracle Solaris in 2010, and there is no guarantee that the two operating systems share a common FEN API.

If this all sounds reasonable, let me know and I'll go ahead with addressing the code review and setting up CI. Thanks!

@nathany

This comment has been minimized.

Copy link
Member

commented Sep 27, 2018

@isaacdavis Thank you. That all sounds good to me. Use whatever CI works for you [though I will mention Drone.io may be worth looking into, as an open source alternative to Jenkins written in Go]

I think we'll document in the README who the teams are for each platform with respective GitHub handles, and perhaps figure out how to do some @ lists [I can look into these things].

Hopefully we will see some other people commit to support FEN, and maybe even Solaris specifically, but we can move ahead either way if it's okay with you. /cc @gfrey @4ad @amitkris

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Sep 27, 2018

@nathany Sounds good. I'll get started setting everything up.

fsnotify.go Outdated
@@ -62,5 +62,5 @@ func (e Event) String() string {
return fmt.Sprintf("%q: %s", e.Name, e.Op.String())
}

// Common errors that can be reported by a watcher
// ErrEventOverflow: Common errors that can be reported by a watcher

This comment has been minimized.

Copy link
@houndci-bot

houndci-bot Oct 4, 2018

comment on exported var ErrEventOverflow should be of the form "ErrEventOverflow ..."

This comment has been minimized.

Copy link
@nathany

nathany Mar 8, 2019

Member

note: if you rebase on master, this was resolved

@isaacdavis isaacdavis force-pushed the isaacdavis:solaris-fen branch from abfda93 to 62cfb90 Oct 7, 2018

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Oct 9, 2018

Hi @nathany - In addition to addressing your comments, I've added a Jenkinsfile. I have a Jenkins server running at fsnotify-jenkins.englab.joyent.us that is configured to work with this Jenkinsfile - I've tested it with my fork of the repo. In order to integrate Jenkins with Github, the Jenkins server will need to authenticate as a user with write access to this repo. I can set you up a Jenkins account (which should be done anyway) and we can add your credentials, or, alternatively, if I'm given write access to this repo, I can set up auth using my own account. Let me know how best to move forward.

EDIT 2019-01-07: A third option would be to create a Github user (named e.g. "fsnotify automation") for the purpose of automating Jenkins integration. This makes more sense than explicitly linking Jenkins to someone's personal account.

@moorereason moorereason referenced this pull request Oct 12, 2018

Open

Add Solaris build #3500

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Oct 18, 2018

@nathany circling back to this again - let me know what needs to be done to move forward. The Jenkins server is ready to go - see my above comment. Thanks!

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Jan 7, 2019

Hi @nathany - happy 2019! Just checking up on this. Joyent is still eager to get FEN support merged. I've updated my above comment about Jenkins slightly.

@nathany

This comment has been minimized.

Copy link
Member

commented Mar 8, 2019

@isaacdavis An "fsnotify automation" account sounds good. I've also sent you an invite to grant write access and can bump your permissions further once accepted.

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Mar 8, 2019

@nathany great - I've accepted the invite.

@isaacdavis

This comment has been minimized.

Copy link
Author

commented Mar 11, 2019

@nathany ok, I've created an automation account. This account will need to be given write access to the repo -- once that's done, I can set up the Jenkins webhook and build job.

@jubalskaggs

This comment has been minimized.

Copy link

commented May 31, 2019

@isaacdavis thank you for your work on this - I'm just chiming in to say that we're testing this pr on solaris 11 (also illumos) and it's working just fine. I'd love to see solaris-fen merged into mainline fsnotify :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.