Objective
The goal of this feature is to export the git-sync errors to a file which can be shared by other sidecar containers.
Motivation
git-sync currently writes error information to standard output, which other containers in the pod cannot read. To see the details of a sync error, users have to dump the logs with kubectl logs. This proposal lets users read the most recent error directly from other sidecar containers.
Goals
- Share the most recent git-sync error with other sidecar containers.
Non-Goals
- This proposal does not intend to redirect all logging information to a file.
- The error file only keeps the most recent error. It is not supposed to be used as the error history.
High-level Design
There will be a new flag to indicate whether to generate the error file. If the flag is turned on, the error file may or may not exist depending on the sync status.
If sync succeeds, the file will be removed. Otherwise, the file will be created or overwritten with the most recent error message.
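To make updates to the error file atomic, we always write to a temporary file first and then rename it to the error file. The temporary file has to be on the same filesystem as the error file, because a rename across filesystems fails. Since each rename replaces the file, its modification time changes, which lets the client know the file has been updated even if the contents are the same.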
The client will poll on the error file to get the error message on failures.
Implementation details
Add a flag for --error-file
var flErrorFile = flag.String("error-file", envString("GIT_SYNC_ERROR_FILE", ""), "the path to the error file where to dump the most recent error details")
When to export the error information
1. Before calling os.Exit() with a non-zero code
There are some common steps if the process exits abnormally with a non-zero code: output the error to standard error, print the usage information (optional), export the error to the error file and exit the process. Therefore, we extract these steps into a function.
// handleError prints the error to the standard error, prints the usage if the `printUsage` flag is true,
// exports the error to the error file and exits the process with the exit code.
func handleError(exitCode int, printUsage bool, format string, a ...interface{}) {
	fmt.Fprintf(os.Stderr, format, a...)
	if printUsage {
		flag.Usage()
	}
	exportError(fmt.Sprintf(format, a...))
	os.Exit(exitCode)
}
Below are the cases when the process exits with a non-zero code:
- Flag initialization errors: if any flag is invalid, export the error before exiting by calling
  handleError(1, true|false, "ERROR: can't ....: %v\n", err)
- Unhandled pid1 errors: exit via
  handleError(127, false, "ERROR: unhandled pid1 error: %v\n", err)
- Errors in syncRepo:
  log.Error(err, "too many failures, aborting", "failCount", failCount)
  exportError(err.Error())
  os.Exit(1)
- Errors in revIsHash:
  log.Error(err, "can't tell if rev is a git hash, exiting", "rev", *flRev)
  exportError(err.Error())
  os.Exit(1)
2. Before retry
log.Error(err, "unexpected error syncing repo, will retry")
exportError(err.Error())
log.V(0).Info("waiting before retrying", "waitTime", waitTime(*flWait))
cancel()
When to clean up the error file
If the sync is successful, we need to clean up the error file generated from the previous sync if it exists. There are three cases to clean up the file:
1. If it is a one-time sync
if *flOneTime {
	deleteErrorFile()
	os.Exit(0)
}
2. If --rev is a commit hash
else if isHash {
	log.V(0).Info("rev appears to be a git hash, no further sync needed", "rev", *flRev)
	deleteErrorFile()
	sleepForever()
}
3. Before the next sync
failCount = 0
deleteErrorFile()
log.V(1).Info("next sync", "wait_time", waitTime(*flWait))
Validation, export and delete functions
The validation function for the new flag:
// validateErrorPath checks that the parent directory of `--error-file` exists and creates it if it does not.
// It also checks that the parent directory is readable and writable.
// unix is golang.org/x/sys/unix.
func validateErrorPath(errPath string) {
	errDir := filepath.Dir(errPath)
	if errDirInfo, err := os.Stat(errDir); err != nil {
		fmt.Printf("The error parent path %s doesn't exist. Attempting to create it\n", errDir)
		if err = os.MkdirAll(errDir, 0755); err != nil {
			fmt.Fprintf(os.Stderr, "ERROR: can't create directory: %v\n", err)
			os.Exit(1)
		}
	} else if !errDirInfo.IsDir() {
		fmt.Fprintf(os.Stderr, "ERROR: The error parent path %s is not a directory\n", errDir)
		os.Exit(1)
	} else if err = unix.Access(errDir, unix.R_OK|unix.W_OK); err != nil {
		// Access takes R_OK/W_OK mode bits; O_RDWR is an open(2) flag and
		// would only test for write permission here.
		fmt.Fprintf(os.Stderr, "ERROR: can't access %s: %v\n", errDir, err)
		os.Exit(1)
	}
}
The export function:
// exportError writes the error content to the error file.
func exportError(content string) {
	if *flErrorFile != "" {
		// Create the temporary file next to the error file so that the rename
		// stays on one filesystem; os.TempDir() may be on a different one.
		tmpFile, err := ioutil.TempFile(filepath.Dir(*flErrorFile), "err-")
		if err != nil {
			fmt.Fprintf(os.Stderr, "ERROR: can't create temporary file: %v\n", err)
			os.Exit(1)
		}
		if _, err = tmpFile.WriteString(content); err != nil {
			fmt.Fprintf(os.Stderr, "ERROR: can't write to temporary file: %v\n", err)
			os.Exit(1)
		}
		// Close before renaming so that a close failure is caught before the
		// file becomes visible to clients.
		if err := tmpFile.Close(); err != nil {
			fmt.Fprintf(os.Stderr, "ERROR: can't close temporary file: %v\n", err)
			os.Exit(1)
		}
		if err := os.Rename(tmpFile.Name(), *flErrorFile); err != nil {
			fmt.Fprintf(os.Stderr, "ERROR: can't rename to error file: %v\n", err)
			os.Exit(1)
		}
	}
}
The delete function to clean up the error file:
// deleteErrorFile deletes the error file.
func deleteErrorFile() {
	if *flErrorFile != "" {
		if _, err := os.Stat(*flErrorFile); err != nil {
			if os.IsNotExist(err) {
				return
			}
			fmt.Fprintf(os.Stderr, "ERROR: can't check the status of the error file: %v\n", err)
			os.Exit(1)
		}
		if err := os.Remove(*flErrorFile); err != nil {
			fmt.Fprintf(os.Stderr, "ERROR: can't delete the error file: %v\n", err)
			os.Exit(1)
		}
	}
}