x/build/cmd/buildlet: remaining processes not killed #29319

@Helflym

When buildlet times out, some processes remain in the background and have to be killed manually.
It seems that buildlet kills only its direct children and not all of their descendants.

This occurs on the aix/ppc64 builder, but I think it's a more general issue.

Here are some processes that remained after builds had failed on aix/ppc64.
cmd/buildlet seems to kill only "go tool dist test ...":

    root 15008262        1   0 14:43:03      -  0:00 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/pkg/tool/aix_ppc64/dist test --no-rebuild --banner=XXXBANNERXXX: test:1_10
    root  7930506 15008262   0 14:43:06      -  0:01 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/test/runtest.exe --shard=1 --shards=10
    root 15860332  7930506   0 14:43:15      -  0:00 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/bin/go run -gcflags= fixedbugs/issue4667.go
    root 16449854 15860332 117 14:43:16      - 244:33 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/pkg/tool/aix_ppc64/link -o /ramdisk8GB/workdir-host-aix-ppc64-osuosl/tmp/go-build817241358/b001/exe/issue4667 -importcfg /ramdisk8GB/workdir-host-aix-ppc64-osuosl/tmp/go-build817241358/b001/importcfg.link -s -w -buildmode=exe -buildid=0SI_pU16NxBjYDL8rkvo/L9t_iML1T1ahnSRDhIh8/1p0X_Ta401ybIH1jFEpb/0SI_pU16NxBjYDL8rkvo -extld=gcc /ramdisk8GB/workdir-host-aix-ppc64-osuosl/tmp/go-build817241358/b001/_pkg_.a

    root  4391522        1   0 03:47:21      -  0:00 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/pkg/tool/aix_ppc64/dist test --no-rebuild --banner=XXXBANNERXXX: test:3_10
    root 10158466  4391522   0 03:47:24      -  0:01 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/test/runtest.exe --shard=3 --shards=10
    root 10748370 10158466 356 03:47:34      - 1877:52 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/bin/go build -gcflags -S=2 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/test/codegen/memcombine.go


    root 12583384        1   0 14:07:27      -  0:00 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/pkg/tool/aix_ppc64/dist test --no-rebuild --banner=XXXBANNERXXX: test:2_10
    root 20709728 12583384   0 14:07:29      -  0:01 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/test/runtest.exe --shard=2 --shards=10
    root 14418236 20709728   0 14:07:35      -  0:00 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/bin/go tool compile -e /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/test/fixedbugs/bug151.go
    root 17760658 14418236 469 14:07:35      - 1051:28 /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/pkg/tool/aix_ppc64/compile -e /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/test/fixedbugs/bug151.go

Looking at the code, killProcessTree is called when the builder is killed by the coordinator (right?).
However, as far as I understand, it kills only one process and not the whole process tree (cf. https://github.com/golang/build/blob/master/cmd/buildlet/buildlet.go#L1590).
I think it should call syscall.Kill with -p.Pid in order to kill the whole process group.
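
To illustrate the suggestion, here is a minimal Unix-only sketch, not the actual buildlet code. It assumes the child is started as the leader of its own process group (Setpgid in syscall.SysProcAttr), so that a negative PID passed to syscall.Kill addresses the child and every descendant that has not moved to another group. The helper names startInOwnGroup and killGroup are hypothetical, as is the sh command used for the demo.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startInOwnGroup (hypothetical helper) starts the command as the leader of
// a new process group: its pgid equals its pid, and its descendants inherit
// that pgid unless they explicitly change group.
func startInOwnGroup(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// killGroup (hypothetical helper) sends SIGKILL to every process in the
// group led by pid. The negative argument is what selects the group rather
// than the single process.
func killGroup(pid int) error {
	return syscall.Kill(-pid, syscall.SIGKILL)
}

func main() {
	// The shell spawns two background children, standing in for the
	// dist -> runtest.exe -> go -> link chain shown in the ps output.
	cmd, err := startInOwnGroup("sh", "-c", "sleep 60 & sleep 60 & wait")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	if err := killGroup(cmd.Process.Pid); err != nil {
		fmt.Println("kill failed:", err)
	}
	// Wait reaps the shell; it reports an error because the shell was
	// terminated by a signal instead of exiting normally.
	fmt.Println("wait:", cmd.Wait())
}
```

This only reaches processes that stay in the group. A descendant that calls setsid or setpgid itself would escape it, so this is not a complete process-tree kill.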

Maybe it's a duplicate of #15778.

Labels: Builders (x/build issues: builders, bots, dashboards)
