Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate using forkProcess for running snippets #195

Closed
dcoutts opened this issue May 27, 2014 · 24 comments
Closed

Investigate using forkProcess for running snippets #195

dcoutts opened this issue May 27, 2014 · 24 comments

Comments

@dcoutts
Copy link
Collaborator

dcoutts commented May 27, 2014

The IdeSession's API for running snippets puts the session into a "running" mode where further updates to sources and data files are not allowed. (Of course underneath the reason is that the snippet blocks the use of the GHC session.) It would be preferable for using the IdeSession if there were no "running" mode and if a running snippet were mostly independent of the session it was started from. It would allow further updates to the code and info queries etc. It would allow isolation-runner to avoid keeping an extra copy of the session just for the purpose of running snippets, and this would simplify isolation-runner and use fewer resources.

So, the task is to investigate if we could fork the whole ide-backend-server process when we run a snippet, so that that process can continue mostly independently, and the parent can be used to continue to do session updates.

This would involve:

  • using the forkProcess action, which is known to be somewhat hairy and not brilliantly tested. (We know of at least one deadlock during shutown)
  • looking into how/when we could use it in the already-rather-hairy code to run and control snippets
  • changing our use of our RPC infrastructure, probably to set up a new RPC connection with the new process.

And anything else that we think of or run into as we investigate this.

@snoyberg has allocated 2 days initially to the investigation. If it looks like it might pan out, it will certainly need plenty of time reserved for testing to make sure forkProcess does not bite us.

@edsko edsko self-assigned this May 27, 2014
edsko added a commit that referenced this issue May 28, 2014
This refactoring introduces two changes:

1. IdeSessionUpdate is more structured, so that we can inspect them and
   potentially allow them to be executed while the session is in Running mode.

2. Removed various fields from IdeIdleState and moved them to a new
   PendingRemoteChanges datatype, which furthermore is an argument to a _new_
   IDE state IdePendingChanges -- which happens only after the initial session
   start or a session restart. This makes it clear that there can not be any
   pending changes in idle mode, normally -- and (this was the original
   purpose) definitely no pending changes in running mode.

All this sets things up for #176, but a full implementation of #176 will be
delayed until #195.
@edsko
Copy link
Collaborator

edsko commented Jun 9, 2014

@snoyberg Quick question for you: do you guys use registerTerminationCallback at all? Or can we potentially remove it from the API? (It was meant originally for internal use, but I'm not sure if you are using it, too.)

@snoyberg
Copy link
Member

snoyberg commented Jun 9, 2014

@edsko Looks like we're not using it at all, feel free to remove it.

@edsko
Copy link
Collaborator

edsko commented Jun 9, 2014

@snoyberg Ok, great, thanks for checking!

@dcoutts
Copy link
Collaborator Author

dcoutts commented Jun 17, 2014

@snoyberg So we think it's less than a day's work to get it ready for you to try out. So we'll do that and also start on our own further tests. We should be able to squeeze that in within the next few days.

Obviously because we've not done extensive tests ourselves we're not expecting flawless results from your tests. In particular we're currently aware that cancelling of snippets is not always working reliably, though we can change it to kill the whole process. Also, as mentioned the debugging API is not covered, and we currently make no attempt to track and shut down the running snippet processes when the IdeSession is terminated. Depending on your point of view this could be a feature or a bug: it certainly means that you have to always use the RunActions to the end (either normal or forced termination).

@snoyberg
Copy link
Member

Thanks for the update. I'm not worried about the lack of tracking of running snippet processes, but thank you for pointing it out. I'll look forward to hearing when I can start testing this branch.

@edsko
Copy link
Collaborator

edsko commented Jun 20, 2014

@dcoutts @snoyberg Branch fork-snippets is now ready for testing. Some points to be aware of:

  • The session does not keep track of running snippets, just like it does not keep track of running executables. Hence, restarting a session and session shutdown does not affect any snippets you may have running. You need to interrupt or forceCancel these manually. I have not fully updated the test suite to take this into account, so running ghc-errors will leave some servers running at the moment.
  • For some reason some snippets become non-interruptible when running after forkProcess. A blackhole (direct, unconditional recursion) is one of them, which makes a certain amount of sense, but there are also some other examples (forever $ loop 1 appears to be one of them) for which I cannot see any reason at all. Either way, be aware that you might need to use forceCancel rather than interrupt more often (forceCancel will send SIGKILL to the process). On the other hand you can force cancel a snippet without affecting the rest of the session.
  • I haven't tested concurrent snippets very exhaustively yet; ghc-errors contains two tests, one that runs 9 instances of the same snippet and interacts with all of them concurrently, and one that spawns a snippet, makes a change to it, recompiles, starts it, etc. to have 9 different snippets running and then interacts with those concurrently. Both of these tests work, and the remainder of the test suite works as well with the exception of a handful of expected errors -- but of course these still exercise the session in a strictly sequential fashion.
  • We are using forkProcess rather than forkIO. This is necessary because running a snippet requires changes to the global process state: in particular, we need to redirect stdin, stdout and stderr (and, Haskell side, we use withArgs to change the value returned by getArgs). While this doesn't affect the API as such, it is worth bearing in mind that these forks are not as cheap as forkIO. Each new process communicates with the client through a new triplet of named pipes, which are created in the session temp directory.

This affects both the client and the server to be sure to use both (a mismatch should be detected).

@snoyberg
Copy link
Member

I kicked off our test suite against this branch. Overall, this is looking very good. A few minor issues I noticed:

  • I got the following output from one of our tests, though the test itself passed:

ide-backend-server: /tmp/isolation-runner-workdir/projects/9101/session.17083/rpc.26512/stdin: openFile: does not exist (No such file or directory)

I believe it came from the test "downloads binary which embeds static files"

  • We had one test that depended on the directory layout inside the folders ide-backend creates. It looks like you've added an extra level to the hierarchy, so I needed to replace ../repo with ../../repo. This is not a problem at all, I'm just noting it down for myself for later.
  • There is one test which seems to be legitimately failing: "RunnerSpec, interactive evaluation, interrupt with a catch-all". This sounds like it could be your second bullet point. I need to investigate further, though I may not have time for the next few days.

@snoyberg
Copy link
Member

Sure enough, replacing a call to restartSession with forceCancel fixes that test, which makes perfect sense.

I'm pushing this codebase to our QA environment (https://learning-site-qa.fpcomplete.com/), and will try to do some testing on it next week (I'm out of the office on Sunday and Monday).

Pinging @mgsloan as well, please look over this issue to get a basic idea of what's going on, and if you have a chance, do some testing on QA once this is built.

@edsko
Copy link
Collaborator

edsko commented Jun 20, 2014

I'm glad that things seem to be working reasonably okay. Mind that while it's great to be testing this now, I would certainly recommend against using it in production just yet.. Also, can you confirm if that stdin test failure is deterministic? Because it's rather odd (stdin is one of the named pipes it creates for communication with the server).

@edsko
Copy link
Collaborator

edsko commented Jun 20, 2014

Oh, hah, I just ran this against the 7.8 server (rather than the 7.4 server that I had been testing with and I got that stdin problem on one of the tests too.

@snoyberg
Copy link
Member

It happens fairly reliably if I run the following two tests:

  • launching a web process twice results in ResponseTimeout (#1146)
  • downloads binary which embeds static files

I'm not planning on putting this in production immediately, but I would like to get an idea in the next few days whether this will ultimately be production quality, so I can make the necessary changes on our side. I've held up implementing our optimized runs feature for word on this branch, and I do need to either implement it based on this code, or figure out another way to make this work.

@snoyberg
Copy link
Member

One more note to myself: if we move to this branch, I need to add code to call interrupt/forceCancel in the event of the project being terminated, since killing IdeSession is no longer sufficient.

(Yes @edsko, your notes above make this clear, but putting it as an explicit note to myself like this makes it less likely that I'll forget to do so.)

@edsko
Copy link
Collaborator

edsko commented Jun 20, 2014

yes, that does look like that is feasible. I have found no problems with forkProcess so far that cannot be overcome (that doesn't mean that there aren't any, of course). I'd like like to test this more thoroughly and smooth out the problems that still exist, because although this affects only a small part of the code base, it does involve some rather low level changes so I want to be sure we've tested it thoroughly.

@edsko
Copy link
Collaborator

edsko commented Jun 23, 2014

I think the problem with the stdin: openFile: not found is that each snippet should (but currently doesn't) run it its own temp directory, which is completely independent of the temp directory of the main server (so that deleting the server temp directory when the session is terminated does not affect the snippets). I guess that snippets should run in independent directories anyway, we may want to update files for one snippet independent from another? This should be done client side, however, so that we can forceCancel the snippets client-side (and remove the corresponding directory after sending SIGKILL). So this is one open issue currently.

There is an independent problem however, of which I have not yet tracked down the cause. I will look at that first.

@edsko
Copy link
Collaborator

edsko commented Jun 23, 2014

Note to self: another issue is that terminated snippets currently are zombies until the main server terminates.

@mgsloan
Copy link
Contributor

mgsloan commented Jun 23, 2014

Seems to work well for me on QA!

Beyond needing to change the global process state, another reason to use forkProcess is that you can't reliably kill all the threads that a given program starts. For example, with our own ghci-runners for development (yesod devel, or my own turbo-devel), a tricky issue is that cancelling a thread in ghci (via Ctrl-C) doesn't kill the threads that it's spawned. So, the server will continue running, and hold the resources that are needed by subsequent sessions. The workaround is to use a fresh ghci for every reboot. Do these issues affect ide-backend process running? Maybe forkProcess is also the solution for doing faster yesod devel / turbo-devel?!

@edsko
Copy link
Collaborator

edsko commented Jun 23, 2014

There appears to be an issue that the last message sent by the server (final outcome of the snippet) does not always arrive at the client (hGetSome reporting end-of-input prematurely). I need to track this down further.

@edsko
Copy link
Collaborator

edsko commented Jun 23, 2014

@mgsloan Indeed, that's true. That problem is now implicitly gone.

@snoyberg
Copy link
Member

Actually, we don't want each snippet running in its own temporary directory. One of the things we're trying to get for free out of this change is allowing us to update data files while a snippet is running.

@edsko
Copy link
Collaborator

edsko commented Jun 24, 2014

@snoyberg Yes, but presumably updating data files for one snippet should not affect data files for all other snippets running concurrently?

@snoyberg
Copy link
Member

In theory, yes. In practice, what we want is that when the user updates a data file for the project, all running snippets immediately see the updated data file. We also don't intend to let users actually run snippets concurrently. So I think the simplest solution is to just have the project itself and all running snippets share a data file directory. I'm open to other ideas.

@edsko
Copy link
Collaborator

edsko commented Jun 24, 2014

Ah, okay. Well that's easier for me, so I won't object :)

@edsko
Copy link
Collaborator

edsko commented Jun 24, 2014

@snoyberg I have just pushed a commit to the fork-snippets branch which fixes a bug in the RPC infrastructure which could cause the last (few) messages by a server to be lost when the server terminated. We never noticed this before because server termination was very rare, but now that we spawn a new server for each snippet it is much more important. I recommend you update to this new commit as soon as possible to avoid spurious deadlocks/exceptions.

@Mikolaj I am now no longer seeing this problem with the #58 test. Can you please verify on Linux?

@mgsloan
Copy link
Contributor

mgsloan commented Jun 24, 2014

I'm now using forkProcess for turbo-devel as well - it used to need to switch between two ghci processes. This made the code much cleaner + a bit faster, definitely less CPU / memory usage! Good stuff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants