Skip to content
This repository has been archived by the owner on Mar 22, 2021. It is now read-only.

Fix non terminating tests in Windows #44

Merged
merged 30 commits into from Aug 6, 2019
Merged

Fix non terminating tests in Windows #44

merged 30 commits into from Aug 6, 2019

Conversation

jneira
Copy link
Contributor

@jneira jneira commented Jul 9, 2019

  • The pr basically ensures the Exit message is sent to server process before killing its listener thread to shutdown it gracefully.
  • As commented in a previour pr in windows you cant kill a thread that is blocked reading from a Handler (like the server listener is) so the failing tests were blocked in the killThread call.
  • I had to convert a raw error decoding an empty response from server in a SessionException to avoid throwing it when server had been exited previously.
    • Using the IORef exitOk hack previously in LSP.Test
    • Ideally we should have a nicer way to know if a server is running or had been exited, saving its state in SessionContext or SessionState or using some ack mechanism with messages, but it could be done in another pr

Hopefully this would fix haskell/haskell-ide-engine#713

@jneira jneira changed the title Fix win tests [WIP] Fix win tests Jul 10, 2019
@jneira
Copy link
Contributor Author

jneira commented Jul 10, 2019

I have to take a look, the tests are failing or showing a stacktrace intermittently

@jneira
Copy link
Contributor Author

jneira commented Jul 10, 2019

It seems in windows 7 (my original test env) it works fine, but in windows 10 doesnt.

@jneira
Copy link
Contributor Author

jneira commented Jul 10, 2019

Ok, i had to use the shutdown step with its ack to break the server listener loop (sorry for the imperative jargon 😆 )
I am afraid that if the server becomes unresponsive this will not work and the hie process will not terminate in windows. To ensure that the raw killThread will continue being called in non-windows systems i've used finally and timeout.

@jneira
Copy link
Contributor Author

jneira commented Jul 11, 2019

  • Another take: to make sure we can kill the server listener thread i've forced the kill of server process before
  • The lsp test-suit pass with that change
  • The func-test suite of hie terminates for windows (see output)
  • I' ve skipping the manual javascript session passes a test test in travis for now to make pass the windows ci build

@@ -104,7 +104,7 @@ main = hspec $ do

it "don't throw when no time out" $ runSessionWithConfig (def {messageTimeout = 5}) "hie" fullCaps "test/data/renamePass" $ do
loggingNotification
liftIO $ threadDelay 10
liftIO $ threadDelay $ 10 * 1000000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bubba Not sure it it was done intencionally, but test did not terminate in my windows 7 with 10

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I presume I meant to write 10s not 10us!

@lukel97
Copy link
Owner

lukel97 commented Jul 11, 2019

Thanks so much for helping out with this. I’ll take a closer look at this later this week

@jneira
Copy link
Contributor Author

jneira commented Jul 11, 2019

The test of manual js session is failing for linux cause i am trying now to decode effectively the response to examine if it is a RspShutdown and the js server is returning:

{"jsonrpc":"2.0", "id":0,"result":null}

instead the valids for the Aeson encoding

{"jsonrpc":"2.0", "id":0} -> ResponseMessage { _jsonrpc ="2.0", _id =1, _result = Nothing}

or

{"jsonrpc":"2.0", "id":0, "result"=""} -> ResponseMessage { _jsonrpc ="2.0", _id =1, _result = ""}

@@ -120,6 +120,7 @@ matchResponseMsgType req = case req of
where decoded x = fromMaybe (error $ "Couldn't decode response for the request type: "
++ show req ++ "\n" ++ show x)
(decode x)
removeNullResult x = maybe x (<> "}") (B.stripSuffix ",\"result\":null}" x)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An ugly hack to sanitize the response from the manual javascript session
Maybe we should have a generic way to remove fields with null values or change the json encoding to make field:null = _field=Nothing

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry about the javascript test too much. This looks like its caused by how haskell-lsp doesn't parse responses with null results (which should probably be fixed at some point). You can just revert this and let the javascript test fail for now


-- | Exit the server after request its shutdown
exitServer :: Session()
exitServer = request_ Shutdown (Nothing :: Maybe Value) >> sendNotification Exit ExitParams
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be exposed to the API? The server should automatically be exited once the withSession closure finishes

Copy link
Contributor Author

@jneira jneira Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to expose it cause it is used in Language.Haskell.LSP.Test.Replay.
I think it should be in Session or in a module upstream, like the server listener handler (their clean up is closely related), and reduce the number of params of runSessionWithHandles but it would need move quite code (about requests/reponses) from Test to another module on top of Session, due to circular dependencies.

@@ -120,6 +120,7 @@ matchResponseMsgType req = case req of
where decoded x = fromMaybe (error $ "Couldn't decode response for the request type: "
++ show req ++ "\n" ++ show x)
(decode x)
removeNullResult x = maybe x (<> "}") (B.stripSuffix ",\"result\":null}" x)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry about the javascript test too much. This looks like its caused by how haskell-lsp doesn't parse responses with null results (which should probably be fixed at some point). You can just revert this and let the javascript test fail for now

@lukel97
Copy link
Owner

lukel97 commented Jul 11, 2019

I just cleared the windows cache on travis, I've seen that error pop up some times when using the wrong version of hie. Once there's a successful build of windows it should (hopefully) run a lot quicker

@jneira jneira changed the title [WIP] Fix win tests Fix non terminating tests in Windows Jul 11, 2019
@jneira
Copy link
Contributor Author

jneira commented Jul 12, 2019

I bumped up the version of hie to 0.11.0.0 but the same tests are failing in windows.
I swear it works on my machine! 🤣

@jneira
Copy link
Contributor Author

jneira commented Jul 12, 2019

I've executed the hie tests using this pr version of lsp-test in the azure ci and i am afraid the user doesn't have permissions to terminate proccesses 😟

Warming up HIE cache...
HIE cache is warmed up
func-test.exe: terminateProcess: permission denied (Permission denied)

@jneira
Copy link
Contributor Author

jneira commented Jul 12, 2019

Otoh i've set up an appveyor build for lsp-test itself using stack (like i am using in local) and the test suite passed: https://ci.appveyor.com/project/jneira/lsp-test/builds/25932605

In a second build i've got the same erorrs than travis one: https://ci.appveyor.com/project/jneira/lsp-test/builds/25934347

@jneira
Copy link
Contributor Author

jneira commented Jul 15, 2019

Although i am able to run the test suite of lsp-test and hie succesfully using this pr i am worried by the ci results

  • The builds of hie in azure (the next official ci) are not stable for windows, some of them terminate as expected (with 16 legitimate errors) but others doesnt terminate or exits with terminateProcess: permission denied (see https://dev.azure.com/jneira/haskell-ide-engine/_build/results?buildId=99 in which we can see the three cases in the same build). In linux and macos it fails as expected, like before this pr.
  • I cant get to run a new succesfull build of lsp-test in appveyor or travis since this one. I ve tried to clear the cache, different hie commits...

@jneira
Copy link
Contributor Author

jneira commented Jul 16, 2019

The tests are a little bit more clear:

  • I've trigged a travis build with the original hie commit and it seems it will go better (linux build is ok, mac os failed by timeout at the end and windows continue failing with 4 fails)
  • The different errors trying the hie func-test were because i forgot to use the pr in all stack-*.yaml. In fact there is only one:

@lukel97
Copy link
Owner

lukel97 commented Jul 20, 2019

I've tidied up the exitServer stuff, it shouldn't be needed by the replay module since the captured file should be replaying back the exit request and notification anyway. This PR passes tests for me locally (except for the javascript one, but I've created a PR to fix that haskell/lsp#181) so I'm happy to merge this if you are?

@jneira
Copy link
Contributor Author

jneira commented Jul 20, 2019

@bubba well, i am little bit worried for the hie tests in azure-ci (https://dev.azure.com/jneira/haskell-ide-engine/_build/results?buildId=167):

  • some of them are failing at compile time due to a error in cpp conditions of Compat (currently fixing)
  • other continues failing with terminateProcess: permission denied: seems to affects all ghc versions with process < 1.6.4.0
    • i've just reproduced in local with cabal v2-test -w "D:\bin\stack\x86_64-windows\ghc-8.4.2\bin\ghc"
  • Sometimes the case withTimeout > times out fails. I still have to reproduce the error and find the cause.

The first one is easy to fix but not sure about the second, the fix in process involved changing c code.
I would like try to make a workaround a little more time but we `ll have to decide what to do if we do not get it.

@jneira
Copy link
Contributor Author

jneira commented Jul 22, 2019

I have some problems with cpp conditionals in Compat. It turns out that cleanupProcess is in the public api of process since 1.6.4.0 but the 1.6.3.0 version shipped with ghc 8.6.1 and 8.6.2 has it in the public api (see haskell-ide-engine build in https://dev.azure.com/jneira/haskell-ide-engine/_build/results?buildId=178)
So i am afraid i have to hide it for 1.6.3.0 although it causes warnings about dodgy imports with the official 1.6.3.0 release of hackage and the versions shipped with ghc 8.4.*.

@jneira
Copy link
Contributor Author

jneira commented Jul 22, 2019

I've need to add ignoresigPipe to the Compat version of cleanupProcess or terminate process kept hanging.
@bubba I've added two stack.yaml files for the lastest minor versions of ghc used in hie, cause they have helped in fixing issues with different version of the process package. Maybe it should belong to another pr, let me know if it is the case and i will delete them.

@jneira
Copy link
Contributor Author

jneira commented Jul 24, 2019

Ok, i think it could be the last needed change after a bit of yak shaving with process.
To avoid the issue with permission denied i've decided to copy withServer for process <= 1.6.4.0 to make it use the local version of cleanupProcess with the low level IOError ignored.

I think it is not the correct solution in the general case, cause we are ignoring legitimate permission denied errors in windows and other os's but i think it could work just enough for our case.

Running the hie func-test in azure using this version of lsp-test: https://dev.azure.com/jneira/haskell-ide-engine/_build/results?buildId=185

@jneira
Copy link
Contributor Author

jneira commented Jul 24, 2019

@bubba ok, i think this is done, tests in azure build have failed correctly 😆
The build on travis has failed for linux and macos due to timeout.

@jneira
Copy link
Contributor Author

jneira commented Jul 24, 2019

As process-1.6.4.0 was not shipped with any ghc and stack doesnt let use it, i've tested locally cabal v2-run lsp-test:tests:test --constraint process==1.6.4.0 with success.

Copy link
Owner

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this out locally and it works great on mac, and the window stuff seems to be in a lot better of a state than it was previously. Thanks for the work on this!

@lukel97 lukel97 merged commit d126623 into lukel97:master Aug 6, 2019
@cocreature
Copy link
Contributor

@jneira Thanks a lot for cleaning up the mess that I produced and fixing it!

@jneira jneira deleted the fix-win-tests branch August 9, 2019 12:33
lukel97 added a commit to haskell/lsp that referenced this pull request Feb 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get tests working on windows
3 participants