Skip to content

Conversation

@wallrj-cyberark
Copy link
Member

@wallrj-cyberark wallrj-cyberark commented Sep 2, 2025

The use of backoff.Retry in LoginUsernamePassword was hiding authentication errors, because the response.Success=false was not being wrapped in a backoff.Permanent error.
This caused the identity client to hang when supplied with bad credentials.

The simplest solution would have been to wrap this error:

if !startAuthResponse.Success {
return response, fmt.Errorf("got a failure response from request to start authentication: message=%q, error=%q", startAuthResponse.Message, startAuthResponse.ErrorID)
}

I have chosen to remove the backoff.Retry altogether, because the agent already retries failed "pushes" higher up the stack.

  • Eliminated the use of the backoff library for retry logic in authentication methods
  • Simplified error handling by directly returning errors without wrapping them in backoff.Permanent.
  • Removed unnecessary retry logic for 4xx errors in doStartAuthentication and doAdvanceAuthentication.
  • Improved code readability by streamlining the LoginUsernamePassword function.

… real APIs

- Introduced `identity_test.go` to validate `LoginUsernamePassword` behavior.
- Added mock server support for `failureUser` to simulate bad user scenarios.
- Included new test data for `failureUser` in `start_authentication_bad_user_session_id.json`.
- Implemented `SkipIfNoEnv` utility to conditionally skip tests based on environment variables.

Signed-off-by: Richard Wall <richard.wall@cyberark.com>
- Eliminated the use of the backoff library for retry logic in authentication methods
- Simplified error handling by directly returning errors without wrapping them in backoff.Permanent.
- Removed unnecessary retry logic for 4xx errors in `doStartAuthentication` and `doAdvanceAuthentication`.
- Improved code readability by streamlining the `LoginUsernamePassword` function.

Signed-off-by: Richard Wall <richard.wall@cyberark.com>
"EndpointAuthenticationEnabled": false
},
"Version": "1.0",
"SessionId": "bad-user-session-id",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SessionId value is important but a bit of a hack.
We want this SessionId to be passed on to the "AdvanceAuthentication" step, and causes that step to fail.
This simulates how the real identity API fails when you use an unknown username.

This change isn't related to the bad-password-hanging problem, but it allows me to write better integration tests for the LoginUsernamePassword function.


backoffPolicy := backoff.NewConstantBackOff(10 * time.Second)

_, err := backoff.Retry(ctx, operation, backoff.WithBackOff(backoffPolicy))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the root cause of the bad-password-hanging problem.
The client was retrying the start and advance authentication steps indefinitely every 10 seconds.


// If we got a 4xx error, we shouldn't retry
return backoff.Permanent(err)
return fmt.Errorf("got unexpected status code %s from request to advance authentication in CyberArk Identity API", httpResponse.Status)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all these backoff.Permanet wrappers because they were intended to break out of the backoff retry in LoginUsernamePassword...it was not meant to signal permanent failures to functions higher up.

},
expectedError: `^got a failure response from request to advance authentication: ` +
`message="Authentication \(login or challenge\) has failed\. ` +
`Please try again or contact your system administrator\."`,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are error prefixes. The suffix contains error IDs which I didn't have time to look into. I'm not sure if the error IDs are constants or whether they are unique to each request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran these tests before committing the fix and observed the tests hang until I killed them:

$ make test-unit
...
    mock.go:79: POST /Security/StartAuthentication
    identity.go:303: I0903 15:49:03.346458] made successful request to StartAuthentication source="Identity.doStartAuthentication" summary="NewPackage"
    mock.go:79: POST /Security/AdvanceAuthentication
    mock.go:79: POST /Security/StartAuthentication
    identity.go:303: I0903 15:49:13.463032] made successful request to StartAuthentication source="Identity.doStartAuthentication" summary="NewPackage"
    mock.go:79: POST /Security/AdvanceAuthentication
    mock.go:79: POST /Security/StartAuthentication
    identity.go:303: I0903 15:49:23.471064] made successful request to StartAuthentication source="Identity.doStartAuthentication" summary="NewPackage"
    mock.go:79: POST /Security/AdvanceAuthentication

...

signal: interrupt

DONE 206 tests, 2 failures in 261.042s
make: *** [make/test-unit.mk:5: test-unit] Error 130

you can see the 10s intervals between retry attempts.

@wallrj-cyberark wallrj-cyberark marked this pull request as ready for review September 3, 2025 15:53
@wallrj-cyberark wallrj-cyberark merged commit 90f92d7 into master Sep 3, 2025
2 checks passed
@wallrj-cyberark wallrj-cyberark deleted the handle-auth-failures branch September 3, 2025 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants