Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build: windows gomote push failed to delete dist.exe #10706

Closed
bradfitz opened this issue May 5, 2015 · 12 comments

Comments

Projects
None yet
4 participants
@bradfitz
Copy link
Member

commented May 5, 2015

Interrupting a Windows build with gomote seems to keep dist.exe running somewhere, preventing it from being deleted later:

ante:go $ gomote run windows-amd64-gce go/src/make.bat  
##### Building Go bootstrap tool.  
cmd/dist  

##### Building Go toolchain using C:\workdir\go1.4.  
bootstrap/internal/gc/big  
bootstrap/internal/obj  
bootstrap/internal/obj/arm  
bootstrap/internal/gc  
bootstrap/internal/ld  
bootstrap/5l  
bootstrap/internal/obj/x86  
bootstrap/6l  
bootstrap/internal/obj/arm64  
bootstrap/7l  
bootstrap/5g  
bootstrap/6g  
bootstrap/7g  
bootstrap/8g  
bootstrap/8l  
bootstrap/internal/obj/ppc64  
bootstrap/9l  
bootstrap/9g  
bootstrap/asm/internal/arch  
bootstrap/asm/internal/flags  
bootstrap/asm/internal/lex  
bootstrap/internal/asm  
bootstrap/asm/internal/asm  
bootstrap/old5a  
bootstrap/asm  
bootstrap/old6a  
bootstrap/old8a  
bootstrap/old9a  
^C

ante:go $ gomote push windows-amd64-gce  
2015/05/05 13:45:18 Deleting remote files: ["go/src/cmd/dist/dist.exe"]  
Error running push: Deleting remote unwanted files: 500 Internal Server Error; body: remove C:\workdir\go\src\cmd\dist\dist.exe: Access is denied.  

Or maybe we need to delete harder somehow?

/cc @alexbrainman @adg @crawshaw

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented May 5, 2015

I've noticed that after a couple minutes, this resolves itself. Why? Is there something we could be doing after the command is interrupted (in our http.CloseNotifier-receiving code) to more aggressive kill processes or delete things?

@adg

This comment has been minimized.

Copy link
Contributor

commented May 5, 2015

When the HTTP connection hangs up, the buildlet calls cmd.Process.Kill()
but the process may not have quit by the time you issued the next "push".

Hard to say what we could do beyond this. Maybe do the cmd.Wait in a
separate goroutine and block future "pushes" until the process actually
exits? Seems a bit gross.

On 6 May 2015 at 07:07, Brad Fitzpatrick notifications@github.com wrote:

I've noticed that after a couple minutes, this resolves itself. Why? Is
there something we could be doing after the command is interrupted (in our
http.CloseNotifier-receiving code) to more aggressive kill processes or
delete things?


Reply to this email directly or view it on GitHub
#10706 (comment).

@alexbrainman

This comment has been minimized.

Copy link
Member

commented May 6, 2015

I am trying to debug this, so I followed steps in golang.org/x/build/cmd/buildlet/README and I got stuck on:

# curl -k --user :foo -d "cmd=src/make.bat" https://alexpc:5936
/exec                                                                                
##### Building C bootstrap tool.                                                     
cmd/dist                                                                             
go tool dist: FAILED: git rev-parse --abbrev-ref HEAD                                

The repo from https://go.googlesource.com/go/+archive/3b76b017cabb.tar.gz does not have .git directory. What do I need to do to get past this message?

As to the reason of

remove C:\workdir\go\src\cmd\dist\dist.exe: Access is denied.

(just a guess) you kill process when connection is disconnected, but I suspect you don't disconnect properly. Does killing gomote process clears remote side of its TCP connection? Maybe it takes a while. You also don't check cmd.Process.Kill() returned error, perhaps, if you add more error checking / printing, we'll know more.

But if you help me with error above maybe I can reproduce it here. Thank you.

Alex

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented May 6, 2015

The coordinator and gomote binaries push a VERSION file to the buildlets first, so git isn't required to build.

The gomote binary accepts a "host:port" string as a name, to use it against localhost.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented May 7, 2015

Thank you Brad. I caught him (C:\workdir\go\src\cmd\dist\dist.exe) in the act. More details coming.

Alex

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented May 7, 2015

Exciting. :)

@alexbrainman

This comment has been minimized.

Copy link
Member

commented May 7, 2015

You have used your jokes quota today. Please stop.

Alex

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented May 7, 2015

No, I'm actually excited. I love a good debugging mystery, and clues!

Thanks for investigating this.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented May 7, 2015

I can see make.bat process gets killed, but none of its children are. cmd/dist.exe process runs to completion even after gomote is disconnected. That is why you saw "remove C:\workdir\go\src\cmd\dist\dist.exe: Access is denied." message, because you tried to rebuild executable for process that is still running - this is not allowed on Windows. But then cmd/dist.exe completes (it takes minute or so), and you can rebuild it then.

Our mistake. We all (mainly me) assumed that killing process kills all its children. But that is not true on Windows. Not by default. I can come up with 2 ways to implement process tree killing.

We can enumerate all processes on the system and kill all children of a process. See syscall.Getppid for something similar.

Or we can use "recommended" way - associate our process with a job object https://msdn.microsoft.com/en-us/library/ms684161(VS.85).aspx We already used job objects before to gather performance stats on windows (golang.org/x/benchmarks/driver/driver_windows.go).

I don't know how important it is to kill all children here in buildlet. All these computers are rebooted often anyway. Brad?

Alex

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented May 7, 2015

I want to fix this. The purpose of gomote is interactive development. If you can't stop a build and restart it quickly, it's very annoying.

Rather than change Go for now, we can just change the buildlet to enumerate all the processes on the children.

Thanks for the tips and debugging.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented May 8, 2015

@gopherbot

This comment has been minimized.

Copy link

commented May 8, 2015

CL https://golang.org/cl/9837 mentions this issue.

@golang golang locked and limited conversation to collaborators Jun 25, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.