Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubexit does not shutdown child when tombstone is written before it is started #8

Open
RemcodM opened this issue Aug 10, 2020 · 1 comment · May be fixed by #10 or #11
Open

Kubexit does not shutdown child when tombstone is written before it is started #8

RemcodM opened this issue Aug 10, 2020 · 1 comment · May be fixed by #10 or #11

Comments

@RemcodM
Copy link
Contributor

RemcodM commented Aug 10, 2020

Hi again!

I'm back with a new race condition that we discovered while running jobs with kubexit on a very busy cluster. We have a job with 2 containers. Let's call them A and B. B is a sidecar for A. That means B is shutdown by kubexit whenever A exits. In the same way, A waits for B to be started before it starts up (birth dependency).

Most of the time, B is correctly shutdown. However, we have encoutered a case in which the cluster is very busy and the sidecar container is not executed directly. This causes A to timeout while waiting for B and a tombstone is written. However, after some time B is still started and this tombstone seems to be ignored. See the logs below.

Log for A:

2020-08-07T08:52:44.293468982Z Name: a
2020-08-07T08:52:44.293489226Z Graveyard: /tools
2020-08-07T08:52:44.293493754Z Tombstone: /tools/a
2020-08-07T08:52:44.293496807Z Birth Deps: b
2020-08-07T08:52:44.293596327Z Death Deps: N/A
2020-08-07T08:52:44.293607134Z Birth Timeout: 30s
2020-08-07T08:52:44.293668456Z Grace Period: 30s
2020-08-07T08:52:44.293674509Z Pod Name: <job-pod-name-redacted>
2020-08-07T08:52:44.293678070Z Namespace: default
2020-08-07T08:52:44.293827152Z Watching pod updates...
2020-08-07T08:52:44.313168690Z Event Type: ADDED
2020-08-07T08:53:14.293834919Z Error: timed out waiting for birth deps to be ready: 30s
2020-08-07T08:53:14.293863713Z Skipping ShutdownNow: child process not running
2020-08-07T08:53:14.293868155Z Waiting for child process to exit...
2020-08-07T08:53:14.293871619Z Pod Watch(<job-pod-name-redacted>): done
2020-08-07T08:53:14.295046706Z Child Exited(-1): exec: not started
2020-08-07T08:53:14.295057380Z Updating tombstone: /tools/a

Log for B:

2020-08-07T08:53:45.421230593Z Name: b
2020-08-07T08:53:45.421250095Z Graveyard: /tools
2020-08-07T08:53:45.421253775Z Tombstone: /tools/b
2020-08-07T08:53:45.421257170Z Birth Deps: N/A
2020-08-07T08:53:45.421260596Z Death Deps: a
2020-08-07T08:53:45.421276535Z Birth Timeout: 30s
2020-08-07T08:53:45.421280010Z Grace Period: 30s
2020-08-07T08:53:45.421283380Z Pod Name: N/A
2020-08-07T08:53:45.421338269Z Namespace: N/A
2020-08-07T08:53:45.421349512Z Watching graveyard...
2020-08-07T08:53:45.421353055Z Starting: <child-command-redacted>
2020-08-07T08:53:45.423142247Z Creating tombstone: /tools/b
2020-08-07T08:53:45.423483854Z Tombstone modified: b
2020-08-07T08:53:45.423494806Z Tombstone modified: b
2020-08-07T08:53:45.423498733Z Waiting for child process to exit...
2020-08-07T08:53:45.464698537Z <some logs for child here>

Notice the timestamps: B is started at 08:53:45, while A was already finished at 08:53:14. I think kubexit ignores existing tombstones when launching. When I have some spare time, I will look into it.

RemcodM added a commit to RemcodM/kubexit that referenced this issue Aug 10, 2020
This commit changes the `Watch` function inside the `tombstone` package to
also emit an initial event besides the `fsnotify` events. This initial event
is called immediatly when `Watch` is called and the watcher has been setup.

This change allows kubexit to detect tombstones written before kubexit was
started. This prevents possible race conditions as described by karlkfi#8.

In order for this change to work, the `tombstone.EventHandler` type was
changed. It now requires a function with 3 arguments: The graveyard, the
tombstone and the operation instead of an `fsnotify.Event`. Reason being
that the initial event is not an `fsnotify.Event`. The functions implementing
an `tombstone.EventHandler` are changed accordingly.

This change on its own introduces a new bug where the tombstone is written
as part of an initial event, but the child process will still start because
`child.Start()` is being called after the watcher has been setup. To overcome
this issue, the shutdown state of the child is tracked in a new flag, which is
set if `ShutdownNow()` or `ShutdownWithTimeout()` is executed.
RemcodM added a commit to RemcodM/kubexit that referenced this issue Aug 10, 2020
This commit changes the `Watch` function inside the `tombstone` package to
also emit an fake 'Created' event besides the real `fsnotify` events. This
initial event is send out immediatly when `Watch` is called and the watcher
has been setup.

This change allows kubexit to detect tombstones written before kubexit was
started. This prevents possible race conditions as described by karlkfi#8.

This change on its own introduces a new bug where the tombstone is written
as part of an initial event, but the child process will still start because
`child.Start()` is being called after the watcher has been setup. To overcome
this issue, the shutdown state of the child is tracked in a new flag, which is
set if `ShutdownNow()` or `ShutdownWithTimeout()` is executed.
stephpalis referenced this issue in acquia/fn-kubexit Jul 13, 2021
This commit changes the `Watch` function inside the `tombstone` package to
also emit an initial event besides the `fsnotify` events. This initial event
is called immediatly when `Watch` is called and the watcher has been setup.

This change allows kubexit to detect tombstones written before kubexit was
started. This prevents possible race conditions as described by #8.

In order for this change to work, the `tombstone.EventHandler` type was
changed. It now requires a function with 3 arguments: The graveyard, the
tombstone and the operation instead of an `fsnotify.Event`. Reason being
that the initial event is not an `fsnotify.Event`. The functions implementing
an `tombstone.EventHandler` are changed accordingly.

This change on its own introduces a new bug where the tombstone is written
as part of an initial event, but the child process will still start because
`child.Start()` is being called after the watcher has been setup. To overcome
this issue, the shutdown state of the child is tracked in a new flag, which is
set if `ShutdownNow()` or `ShutdownWithTimeout()` is executed.
stephpalis referenced this issue in acquia/fn-kubexit Jul 13, 2021
* Ensure tombstones created before kubexit started are read

This commit changes the `Watch` function inside the `tombstone` package to
also emit an initial event besides the `fsnotify` events. This initial event
is called immediatly when `Watch` is called and the watcher has been setup.

This change allows kubexit to detect tombstones written before kubexit was
started. This prevents possible race conditions as described by #8.

In order for this change to work, the `tombstone.EventHandler` type was
changed. It now requires a function with 3 arguments: The graveyard, the
tombstone and the operation instead of an `fsnotify.Event`. Reason being
that the initial event is not an `fsnotify.Event`. The functions implementing
an `tombstone.EventHandler` are changed accordingly.

This change on its own introduces a new bug where the tombstone is written
as part of an initial event, but the child process will still start because
`child.Start()` is being called after the watcher has been setup. To overcome
this issue, the shutdown state of the child is tracked in a new flag, which is
set if `ShutdownNow()` or `ShutdownWithTimeout()` is executed.
@jackblk
Copy link

jackblk commented Sep 21, 2022

For future reader (and me), there are some working forks:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants