Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PDS adapter steps resilient to network problems #2280

Closed
de-jcup opened this issue May 31, 2023 · 0 comments · Fixed by #2286
Closed

Make PDS adapter steps resilient to network problems #2280

de-jcup opened this issue May 31, 2023 · 0 comments · Fixed by #2286

Comments

@de-jcup
Copy link
Member

de-jcup commented May 31, 2023

Situation

After #2273 we are able to deploy continously new PDS versions. That works fine when we have K8s and use the continous deployment there.

But when we have only ONE instance for the PDS defined, and we directly kill the PDS process we have a downtime and there is no response to SecHub server while executing a SecHub job via PDS adapter:

None of the consultants (1) gave any proposal, so rethrow exception
 org.springframework.web.client.ResourceAccessException:
I/O error on GET request for
"https://single-instance-pds-scancode:8123/api/job/33d5b1ce-f558-4da5-834c-771ab97246ba/status": 
Connect to single-instance-pds-scancode:8123 [single-instance-pds-scancode/192.138.120.134] failed: 
Connection refused; nested exception is org.apache.http.conn.HttpHostConnectException: 
Connect to single-instance-pds-scancode:8123 [pds-scancode/192.138.120.134] failed: Connection refused
...

The same when there would be any network connection problems.

Wanted

When there is temporary no communication available to PDS server, the PDS adapter call shall be resilient and retry the current step

Solution

Provide a resilience consultant for PDS adapter which is configurable (by executor configuration parameters) and has some nice defaults.
It shall be able handle per default 3 resource access exceptions and to wait 10 seconds before retry.

Details

Normally the resilience consultants are used by product executors, but it makes more sense to use them inside PDS adapter directly for every single step to avoid breaking the complete running PDS execution (and retry by PDSResilienceConsultant)

For example: when the upload is not possible because of network problems, try it multiple times, if the job creation does not work, try it multiple times, if the job wait fails, try ONE wait multiple times before final exit...

Existing PDS resilience part in executor

After investigating and fixing #2281 there is the question if the PDS resilient consultant is really still helpful here or if the resilience logic shall all be done inside the adapter because of treating steps better.

PDSResilienceConsultant does handle HttpClientErrorException errors, means: There is at least connection available and those errors should normally not happen because of network problems.

Because this another type of error, we keep PDSResilienceConsultant as is at the moment (maybe later we could move this error handling also into the adapter, but this is not part of this issue).

@de-jcup de-jcup changed the title Make PDS adapter more resilient to resource exceptions Make PDS adapter resilient to resource exceptions May 31, 2023
@de-jcup de-jcup self-assigned this May 31, 2023
@de-jcup de-jcup added this to the Server 0.43.0 milestone May 31, 2023
de-jcup added a commit that referenced this issue Jun 1, 2023
- changed unit tests to junit5
- adopted unit tests to work without spring
@de-jcup de-jcup changed the title Make PDS adapter resilient to resource exceptions Make PDS adapter steps resilient to network problems Jun 1, 2023
de-jcup added a commit that referenced this issue Jun 2, 2023
- ResilientActionExecutor infinite loop problem handled
- renamed our FailableRunnable to RunOrFail, avoids name clashes
  with FailableRunnable from apache commons lang
- added unit tests
de-jcup added a commit that referenced this issue Jun 2, 2023
- ResilientActionExecutor infinite loop problem handled
- renamed our FailableRunnable to RunOrFail, avoids name clashes
  with FailableRunnable from apache commons lang
- added unit tests
- added missing SPDX headers
de-jcup added a commit that referenced this issue Jun 2, 2023
@sven-dmlr sven-dmlr modified the milestones: Server 0.43.0, Server 1.0.0 Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants