rkt: rkt stop with ssh-based kvm stop #2438

Merged
merged 11 commits into from Jul 7, 2016

@jjlakis
Contributor

jjlakis commented Apr 15, 2016

This implements stopping pods of the KVM flavor, based on the discussion in #1959 (I reused most of the commits from that PR):
The SSH functionality moves from enter_kvm.go to networking/kvm.go so that it can be reused, and stop_kvm uses it to shut the pod down properly (systemctl halt).
@iaguis @jellonek
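
For readers who have not seen the approach, here is a minimal sketch of what "stop over SSH" can look like: build an ssh command line that runs systemctl halt inside the VM. It is not the code from this PR. The key path, the pod IP, the root user and the ssh options are all assumptions made for illustration, and sshHaltArgs is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os/exec"
)

// sshHaltArgs builds the arguments for an `ssh` call that asks systemd
// inside the VM to halt. keyPath and podIP are hypothetical inputs; how rkt
// locates the key and the pod address is not shown here.
func sshHaltArgs(keyPath, podIP string) []string {
	return []string{
		"-i", keyPath, // private key used to log in to the VM
		"-o", "StrictHostKeyChecking=no", // the VM's host key is not known in advance
		"-o", "UserKnownHostsFile=/dev/null", // do not record it either
		"root@" + podIP, // assumption: log in as root
		"systemctl", "halt",
	}
}

func main() {
	cmd := exec.Command("ssh", sshHaltArgs("/path/to/ssh-key", "172.16.28.2")...)
	// Only print the command; running it needs a reachable VM.
	fmt.Println(cmd.Args)
}
```

Keeping the argument construction in its own function makes it reusable by both an enter and a stop entry point, which is the reuse the description is after.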


@jonboulle jonboulle added this to the v1.5.0 milestone Apr 15, 2016

@jellonek

Contributor

jellonek commented Apr 15, 2016

@iaguis once @jjlakis has reorganized the imports, please take a look at this proposal; it adds commits covering kvm on top of #1959

@jjlakis

Contributor

jjlakis commented Apr 18, 2016

@iaguis @jellonek Docs fixed. Could you take another look at this?

@iaguis

Member

iaguis commented Apr 19, 2016

I don't think the ssh code belongs in networking; that package is concerned with setting up a pod's network, not with executing commands over ssh. I think it should live somewhere in stage1.

Actually, I'd really like it if we could have a sane directory structure for the stage1 images. Something like:

.
└── stage1
    ├── common
    ├── fly
    │   ├── enter
    │   ├── gc
    │   ├── run
    │   └── stop
    ├── kvm
    │   ├── common
    │   ├── enter
    │   ├── gc
    │   ├── run
    │   └── stop
    └── nspawn
        ├── enter
        ├── gc
        ├── run
        └── stop

And then the ssh code would live in stage1/kvm/common.

But the kvm and nspawn stage1 images' code is very much intertwined, so for now I guess we could put it in stage1/common/ssh. Ideas?

cc @krnowak

@s-urbaniak

Contributor

s-urbaniak commented Apr 20, 2016

👍 for stage1/common/ssh. Once it lives there, it will be easy to move it somewhere else.

@jjlakis

Contributor

jjlakis commented Apr 27, 2016

@iaguis FYI:
The --force check fails while trying to remove the netns (file or resource is busy):
https://semaphoreci.com/coreos/rkt/branches/pull-request-2438/builds/26
When I added the MNT_FORCE flag to the syscall that unmounts the netns, that problem went away, but another one appeared: file or resource busy on stage1/rootfs/proc:
https://semaphoreci.com/coreos/rkt/branches/pull-request-2438/builds/31
It looks like some descriptors or processes using the container's files are still around after SIGKILL.

@iaguis

Member

iaguis commented Apr 28, 2016

That's weird, does it also happen on your local machine?

@jjlakis

Contributor

jjlakis commented Apr 28, 2016

No, I can't reproduce it locally.

@s-urbaniak s-urbaniak modified the milestones: v1.6.0, v1.5.0 Apr 28, 2016

@iaguis

Member

iaguis commented Apr 28, 2016

Sounds a bit like #2232

@iaguis

Member

iaguis commented Apr 28, 2016

On Semaphore:

Semaphore: ~/rkt $ sudo ./builds/build-rkt-coreos/build-rkt-1.4.0+git/bin/rkt run coreos.com/etcd:v2.0.10
image: using image from file /home/runner/rkt/builds/build-rkt-coreos/build-rkt-1.4.0+git/bin/stage1-coreos.aci
image: searching for app image coreos.com/etcd
image: remote fetching from URL "https://github.com/coreos/etcd/releases/download/v2.0.10/etcd-v2.0.10-linux-amd64.aci"
pubkey: prefix: "coreos.com/etcd"
key: "https://coreos.com/dist/pubkeys/aci-pubkeys.gpg"
gpg key fingerprint is: 8B86 DE38 890D DB72 9186  7B02 5210 BD88 8818 2190
    CoreOS ACI Builder <release@coreos.com>
Are you sure you want to trust this key (yes/no)?
yes
Trusting "https://coreos.com/dist/pubkeys/aci-pubkeys.gpg" for prefix "coreos.com/etcd" after fingerprint review.
Added key for prefix "coreos.com/etcd" at "/etc/rkt/trustedkeys/prefix.d/coreos.com/etcd/8b86de38890ddb7291867b025210bd8888182190"
pubkey: prefix: "coreos.com/etcd"
key: "https://coreos.com/dist/pubkeys/app-signing-pubkey.gpg"
gpg key fingerprint is: 18AD 5014 C99E F7E3 BA5F  6CE9 50BD D3E0 FC8A 365E
    Subkey fingerprint: 5B10 53CE 38EA 2E0F EB95  6C05 95BC 5E3F 3F1B 2C87
    Subkey fingerprint: 55DB DA91 BBE1 849E A27F  E733 A6F7 1EE5 BEDD BA18
    Subkey fingerprint: B261 4119 157B E592 32DF  D2AA F804 F413 7EF4 8FD3
    CoreOS Application Signing Key <security@coreos.com>
Are you sure you want to trust this key (yes/no)?
yes
Trusting "https://coreos.com/dist/pubkeys/app-signing-pubkey.gpg" for prefix "coreos.com/etcd" after fingerprint review.
Added key for prefix "coreos.com/etcd" at "/etc/rkt/trustedkeys/prefix.d/coreos.com/etcd/18ad5014c99ef7e3ba5f6ce950bdd3e0fc8a365e"
image: downloading signature from https://github.com/coreos/etcd/releases/download/v2.0.10/etcd-v2.0.10-linux-amd64.aci.asc
Downloading signature: [=======================================] 819 B/819 B
Downloading ACI: [=============================================] 3.79 MB/3.79 MB
image: signature verified:
  CoreOS ACI Builder <release@coreos.com>
networking: loading networks from /etc/rkt/net.d
networking: loading network default with type ptp
[ 2162.775149] etcd[5]: 2016/04/28 13:09:40 etcd: no data-dir provided, using default data-dir ./default.etcd
[ 2162.775427] etcd[5]: 2016/04/28 13:09:40 etcd: listening for peers on http://localhost:2380
[ 2162.775647] etcd[5]: 2016/04/28 13:09:40 etcd: listening for peers on http://localhost:7001
[ 2162.775897] etcd[5]: 2016/04/28 13:09:40 etcd: listening for client requests on http://localhost:2379
[ 2162.776580] etcd[5]: 2016/04/28 13:09:40 etcd: listening for client requests on http://localhost:4001
[ 2162.776777] etcd[5]: 2016/04/28 13:09:40 etcdserver: datadir is valid for the 2.0.1 format
[ 2162.776979] etcd[5]: 2016/04/28 13:09:40 etcdserver: name = default
[ 2162.777152] etcd[5]: 2016/04/28 13:09:40 etcdserver: data dir = default.etcd
[ 2162.777272] etcd[5]: 2016/04/28 13:09:40 etcdserver: member dir = default.etcd/member
[ 2162.777389] etcd[5]: 2016/04/28 13:09:40 etcdserver: heartbeat = 100ms
[ 2162.777507] etcd[5]: 2016/04/28 13:09:40 etcdserver: election = 1000ms
[ 2162.777626] etcd[5]: 2016/04/28 13:09:40 etcdserver: snapshot count = 10000
[ 2162.777739] etcd[5]: 2016/04/28 13:09:40 etcdserver: advertise client URLs = http://localhost:2379,http://localhost:4001
[ 2162.777898] etcd[5]: 2016/04/28 13:09:40 etcdserver: initial advertise peer URLs = http://localhost:2380,http://localhost:7001
[ 2162.778021] etcd[5]: 2016/04/28 13:09:40 etcdserver: initial cluster = default=http://localhost:2380,default=http://localhost:7001
[ 2162.778146] etcd[5]: 2016/04/28 13:09:40 etcdserver: start member ce2a822cea30bfca in cluster 7e27652122e8b2ae
[ 2162.778261] etcd[5]: 2016/04/28 13:09:40 raft: ce2a822cea30bfca became follower at term 0
[ 2162.778376] etcd[5]: 2016/04/28 13:09:40 raft: newRaft ce2a822cea30bfca [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
[ 2162.778491] etcd[5]: 2016/04/28 13:09:40 raft: ce2a822cea30bfca became follower at term 1
[ 2162.778609] etcd[5]: 2016/04/28 13:09:40 etcdserver: added local member ce2a822cea30bfca [http://localhost:2380 http://localhost:7001] to cluster 7e27652122e8b2ae
[ 2164.076913] etcd[5]: 2016/04/28 13:09:41 raft: ce2a822cea30bfca is starting a new election at term 1
[ 2164.077917] etcd[5]: 2016/04/28 13:09:41 raft: ce2a822cea30bfca became candidate at term 2
[ 2164.079089] etcd[5]: 2016/04/28 13:09:41 raft: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 2
[ 2164.079569] etcd[5]: 2016/04/28 13:09:41 raft: ce2a822cea30bfca became leader at term 2
[ 2164.080080] etcd[5]: 2016/04/28 13:09:41 raft.node: ce2a822cea30bfca elected leader ce2a822cea30bfca at term 2
[ 2164.080540] etcd[5]: 2016/04/28 13:09:41 etcdserver: published {Name:default ClientURLs:[http://localhost:2379 http://localhost:4001]} to cluster 7e27652122e8b2ae

On another terminal:

Semaphore: ~/rkt $ sudo ./builds/build-rkt-coreos/build-rkt-1.4.0+git/bin/rkt list
UUID        APP IMAGE NAME      STATE   CREATED STARTED NETWORKS
9990566e    etcd    coreos.com/etcd:v2.0.10 running now now default:ip4=172.16.28.2
Semaphore: ~/rkt $ sudo ./builds/build-rkt-coreos/build-rkt-1.4.0+git/bin/rkt stop --force 9990566e
"9990566e-7a4c-46c0-83c6-8ab4c931d5f7"
Semaphore: ~/rkt $ ps aux | grep systemd
root       378  0.0  0.0  51540  1740 ?        Ss   12:33   0:00 /lib/systemd/systemd-udevd --daemon
root       540  0.0  0.0  43448  1848 ?        Ss   12:33   0:00 /lib/systemd/systemd-logind
root     17817  0.0  0.0  36320  2696 ?        Ss   13:09   0:00 /usr/lib/systemd/systemd --default-standard-output=tty --log-target=null --show-status=0
root     17819  0.0  0.1  30824  5076 ?        Ss   13:09   0:00 /usr/lib/systemd/systemd-journald
root     17858  0.0  0.0  11744   936 pts/4    S+   13:09   0:00 grep --color=auto systemd

So the processes are still there after we kill systemd-nspawn.

@squall0gd

Contributor

squall0gd commented Apr 29, 2016

It looks like #2427

@alban

Member

alban commented May 3, 2016

https://semaphoreci.com/coreos/rkt/branches/pull-request-2438/builds/31:

gc: unable to remove pod "02be039b-75ac-4173-9574-b005f151d97c": remove /tmp/datadir-730628398/pods/exited-garbage/02be039b-75ac-4173-9574-b005f151d97c/stage1/rootfs/proc: device or resource busy

This sounds like #1922: Semaphore has Linux 3.13 so it does not have the patch torvalds/linux@8ed936b from Linux 3.18 (#1922 (comment)). If you cannot reproduce it locally, I bet it is because you have Linux >= 3.18.
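
If a test needs to account for this difference, one option is to compare the running kernel's release against 3.18. A minimal sketch, assuming the release string has the usual form printed by uname -r; kernelAtLeast is a hypothetical helper, and the release strings in main are made-up examples.

```go
package main

import (
	"fmt"
)

// kernelAtLeast parses a release string such as "3.13.0-85-generic" and
// reports whether it is at least major.minor. Everything after the second
// number is ignored.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	var gotMajor, gotMinor int
	if _, err := fmt.Sscanf(release, "%d.%d", &gotMajor, &gotMinor); err != nil {
		return false, fmt.Errorf("cannot parse kernel release %q: %w", release, err)
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor >= minor, nil
}

func main() {
	for _, release := range []string{"3.13.0-85-generic", "4.4.0"} {
		ok, err := kernelAtLeast(release, 3, 18)
		fmt.Println(release, ok, err)
	}
}
```

On Linux the release string can be read from /proc/sys/kernel/osrelease.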

@jjlakis

Contributor

jjlakis commented Jun 27, 2016

@alban @steveeJ @krnowak Comments applied. I left stopping for fly as it is, since sh catching SIGTERM is unrelated to the stop implementation.

@alban

Member

alban commented Jul 5, 2016

@jjlakis I added a few comments and it needs to be rebased on the new origin/master but I think it will be good after that 👍

"io/ioutil"
"os"
"github.com/shirou/gopsutil/process"

@alban

alban Jul 6, 2016

Member

This import does not seem to work on arm

/cc @lucab @jjlakis

@jjlakis

jjlakis Jul 6, 2016

Contributor

Bumping gopsutil to 2.1 solves the problem.

@alban

alban Jul 6, 2016

Member

Fixed in #2876


@alban alban referenced this pull request Jul 6, 2016

Merged

glide: bump gopsutils to 2.1 #2876

@alban

Member

alban commented Jul 6, 2016

@jjlakis could you rebase on master?

iaguis added some commits Jan 11, 2016

*: add stop for nspawn flavor
The stage1 implementation sends SIGTERM to systemd-nspawn, which causes
an orderly shutdown of stage1.
*: add stop for fly flavor
The stage1 implementation sends SIGTERM to the process started by rkt
fly.
*: add a pod uuid argument to stage1's stop and --force
Makes getting the lkvm VM name less ugly.

Add --force to forcibly stop a pod.
@alban

Member

alban commented Jul 6, 2016

$ rkt stop ff69f44f
kill: sending signal to 28584 failed: Operation not permitted
unable to terminate process 28584: <nil>
stop: stop: error stopping "ff69f44f-b29c-4929-a6b4-86718323a69a": exit status 1
stop: stop: failed to stop 1 pod(s)

Should we check for root? I guess it's fine as it is because with "fly", we could start a process running as non-root without systemd-nspawn, so that process could be stopped without being root.

But the message <nil> above is something to check.

@alban

Member

alban commented Jul 6, 2016

rkt_stop_test.go:26: ImageID redeclared in this block

Since this merely uses ImageID for imgID.path as a constant, the Sprintf on line 61 could use image directly, and ImageID/imgID could be removed completely.

@alban

Member

alban commented Jul 7, 2016

TestRktStop fails on the "fly" flavor on fedora-24:

08:19:16 --- FAIL: TestRktStop (6.55s)
08:19:16    rkt_tests.go:115: Running command: /home/fedora/workspace/rkt-github-ci/os_type/fedora-24/stage1_flavor/fly/builds/build-rkt-fly/build-rkt-1.9.1+git/tmp/functional/rkt --dir=/tmp/datadir-038408751 --local-config=/tmp/localdir-751961538 --system-config=/tmp/systemdir-617708089 --user-config=/tmp/userdir-543922244 --insecure-options=image prepare /home/fedora/workspace/rkt-github-ci/os_type/fedora-24/stage1_flavor/fly/builds/build-rkt-fly/build-rkt-1.9.1+git/tmp/functional/test-tmp/rkt-stop-test.aci
08:19:16    rkt_tests.go:115: Running command: /home/fedora/workspace/rkt-github-ci/os_type/fedora-24/stage1_flavor/fly/builds/build-rkt-fly/build-rkt-1.9.1+git/tmp/functional/rkt --dir=/tmp/datadir-038408751 --local-config=/tmp/localdir-751961538 --system-config=/tmp/systemdir-617708089 --user-config=/tmp/userdir-543922244 --insecure-options=image run-prepared --interactive 1dc49fe8-4cb4-49b1-aed6-48a4883b7daf
08:19:16    rkt_stop_test.go:66: Running test #0, /home/fedora/workspace/rkt-github-ci/os_type/fedora-24/stage1_flavor/fly/builds/build-rkt-fly/build-rkt-1.9.1+git/tmp/functional/rkt --dir=/tmp/datadir-038408751 --local-config=/tmp/localdir-751961538 --system-config=/tmp/systemdir-617708089 --user-config=/tmp/userdir-543922244 stop 1dc49fe8-4cb4-49b1-aed6-48a4883b7daf
08:19:16    rkt_tests.go:115: Running command: /home/fedora/workspace/rkt-github-ci/os_type/fedora-24/stage1_flavor/fly/builds/build-rkt-fly/build-rkt-1.9.1+git/tmp/functional/rkt --dir=/tmp/datadir-038408751 --local-config=/tmp/localdir-751961538 --system-config=/tmp/systemdir-617708089 --user-config=/tmp/userdir-543922244 stop 1dc49fe8-4cb4-49b1-aed6-48a4883b7daf
08:19:16    rkt_tests.go:140: rkt terminated with unexpected status -1, expected 0
08:19:16        Output:
08:19:16        Enter text:
08:19:16 FAIL
@iaguis

Member

iaguis commented Jul 7, 2016

TestRktStop fails on the "fly" flavor on fedora-24:

We need to add --silent-sigterm to the inspect options so it returns 0. At least for the fly flavor.

@alban

Member

alban commented Jul 7, 2016

@jjlakis thanks!

The "fly" test was added on Jenkins yesterday, which is why the failure only shows up today. @iaguis is fixing the test on fly in a separate PR.

@alban alban merged commit 1f80a08 into rkt:master Jul 7, 2016

6 of 8 checks passed

Jenkins Build finished.
Jenkins (fedora-24, fly flavor) Failure :(
Jenkins (debian-8, coreos flavor) Success!
Jenkins (fedora-22, coreos flavor) Success!
Jenkins (fedora-23, coreos flavor) Success!
Jenkins (fedora-24, coreos flavor) Success!
continuous-integration/travis-ci/pr The Travis CI build passed
semaphoreci The build passed on Semaphore.