Skip to content
This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

clean ${PAI_WORK_DIR} before mv content to this folder #3695

Merged
merged 5 commits into from Oct 15, 2019

Conversation

Binyang2014
Copy link
Contributor

In prod env, we saw when node down and up. The pod running on this node
may be restarted by kubelet. Even though the restartPolicy set to
Never. If that happens, the init container script will failed when
executing mv command. To prevent this, we clean the ${PAI_WORK_DIR}
before mv content to this folder.

In prod env, we saw when node down and up. The pod running on this node
may be restarted by kubelet. Even though the `restartPolicy` set to
`Never`. If that happens, the init container script will failed when
executing `mv` command. To prevent this, we clean the `${PAI_WORK_DIR}`
before mv content to this folder.
@yqwang-ms
Copy link
Member

yqwang-ms commented Oct 9, 2019

We should make our runtime Idempotent, since k8s restartPolicy=Never seems to be at least once execution like FC. So please confirm it and refine other places in runtime code.

Seems at least for below code, after create container, kubelet does not record the container id for recovery, so if kubelet is killed just after StartContainer, kubelet will create another container even if restartPolicy=Never, because kubelet does not know it has already created before.

https://github.com/kubernetes/kubernetes/blob/e62ed95ecd300d0dc56d5d19dcbae93e3794d1bc/pkg/kubelet/kuberuntime/kuberuntime_container.go#L126-L148

https://github.com/kubernetes/kubernetes/blob/e62ed95ecd300d0dc56d5d19dcbae93e3794d1bc/pkg/kubelet/container/helpers.go#L64-L71

@@ -78,6 +79,10 @@ PAI_RUNTIME_DIR=${PAI_WORK_DIR}/runtime.d

PAI_LOG_DIR=${PAI_WORK_DIR}/logs/${FC_POD_UID}

# Clean ${PAI_WORK_DIR} and ${PAI_LOG_DIR} since they may contain last execution content. (rarely happen, but seen in real world)
rm -rf ${PAI_WORK_DIR}/*
rm -rf ${PAI_LOG_DIR}/*
Copy link
Member

@abuccts abuccts Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to clean log dir? keep the previous log may be helpful #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will change the code


In reply to: 332808965 [](ancestors = 332808965)

# Move previous logs to another folder. Notice: for init.log, some part will append to previous log file
LOG_FILES=$(find $PAI_LOG_DIR -maxdepth 1 -type f)
if [[ ! -z $LOG_FILES ]]; then
RANDOM=$$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we print warning log here?

Copy link
Member

@yqwang-ms yqwang-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Binyang2014 Binyang2014 merged commit 5512170 into master Oct 15, 2019
@Binyang2014 Binyang2014 deleted the binyli/mv_failed branch October 16, 2019 02:03
@hzy46 hzy46 mentioned this pull request Apr 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants