CRITICAL: if first tsl.refresh() fails, then library gets into invalid state and not able to verify any signatures #109
As a side note, from talking to developers, it seems that everybody has hit this problem in production at some point. Everybody had to spend time investigating why OCSP suddenly starts failing after working properly with the same config (pointing to a useless wiki page in this case), and then resorted to hourly or even more frequent TSL refreshes as a workaround.
Thank you for your valuable input on debugging this problem. The fix has been added to the scope of the next release, which is currently planned for the second part of May.
With regard to getting information about the TSL state after the refresh: in the current Digidoc4j (version 4.3.0), the summary of the TSL validation can be queried via

For the next release of Digidoc4j, we are proposing an option to configure a callback:

public interface TSLRefreshCallback {
    boolean ensureTSLState(TLValidationJobSummary summary);
}

The callback will enable the execution of the same TSL health checking logic regardless of whether the TSL refresh was triggered manually via

The callback will be called after the TSL refresh has been run but before the TSL refresh time is updated. Depending on the state of the TSL, the callback can throw an exception in order to completely stop further processing, or it can return either

A default implementation of the callback will be provided by Digidoc4j, but creating custom implementations of the callback will also be possible in case some specific behaviour is necessary for deciding whether a TSL refresh has been successful or not.

Any feedback on the proposal is welcome.
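To make the proposal concrete, here is a minimal sketch of what a custom callback could look like. It is not Digidoc4j code: `RefreshSummary` is a hypothetical stand-in for `TLValidationJobSummary`, the territory codes are only examples, and the meaning given to the boolean return value is an assumption of this sketch, since the proposal text above does not spell it out.

```java
import java.util.List;

public class TslCallbackSketch {

    /** Hypothetical stand-in for the refresh summary; NOT the real TLValidationJobSummary. */
    public static final class RefreshSummary {
        public final boolean lotlLoaded;
        public final List<String> failedTerritories;

        public RefreshSummary(boolean lotlLoaded, List<String> failedTerritories) {
            this.lotlLoaded = lotlLoaded;
            this.failedTerritories = failedTerritories;
        }
    }

    /** Same shape as the proposed interface, but taking the stand-in summary type. */
    public interface RefreshCallback {
        boolean ensureTSLState(RefreshSummary summary);
    }

    /**
     * Builds a callback that throws when the LOTL or any required trusted list
     * failed to load. Otherwise it returns true if every list loaded and false
     * if only non-required lists failed (an assumption of this sketch).
     */
    public static RefreshCallback requiring(List<String> requiredTerritories) {
        return summary -> {
            if (!summary.lotlLoaded) {
                throw new IllegalStateException("LOTL failed to load");
            }
            for (String territory : requiredTerritories) {
                if (summary.failedTerritories.contains(territory)) {
                    throw new IllegalStateException("Required trusted list failed: " + territory);
                }
            }
            return summary.failedTerritories.isEmpty();
        };
    }
}
```

With `requiring(Arrays.asList("EE"))`, a summary listing only "LV" as failed yields `false`, while a summary listing "EE" as failed throws.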
We have nothing against a callback, but the default behaviour SHOULD NOT replace the current TSL state with an empty one when loading fails. The library should continue using the old TSL until it gets a working new one, whether the refresh was triggered manually or automatically during signing. It should never be left in an invalid state where it doesn't accept any signatures just because the europa.int website did not respond at the time of refresh.
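The "keep the old TSL until a new one loads" behaviour can be sketched generically as below. This is an illustration only, not how Digidoc4j is implemented: `LastGoodTrustStore` and the `loader` supplier are hypothetical names, and the trusted-list type is left as a type parameter.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Holds the last successfully loaded value and never replaces it with a failed load. */
public class LastGoodTrustStore<T> {

    private final AtomicReference<T> current = new AtomicReference<>();

    /**
     * Runs the loader and swaps in its result only when loading succeeded.
     * Returns true if the stored value was replaced, false if the old one was kept.
     */
    public boolean refresh(Supplier<T> loader) {
        try {
            T fresh = loader.get();
            if (fresh == null) {
                return false; // nothing usable was loaded, keep the old value
            }
            current.set(fresh);
            return true;
        } catch (RuntimeException e) {
            return false; // loading failed, keep the old value
        }
    }

    /** Returns the last good value, or empty if no load has ever succeeded. */
    public Optional<T> get() {
        return Optional.ofNullable(current.get());
    }
}
```

The caller can still see that a refresh failed (the method returns `false`) and decide whether the age of the retained value is acceptable.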
If the LOTL or any required trusted lists (or all trusted lists) fail to refresh, the new default handling will throw an exception, which stops the process that triggered the TSL refresh and marks the TSL in Digidoc4j as expired. This means that it is not possible to continue with validation or signing until the refresh succeeds. This is done to ensure that the LOTL and TSL are refreshed at regular intervals (by default 24h). The TSL and LOTL themselves expire after no longer than 6 months, but the TSL provider is obliged to publish all changes in trust services within 24h of receiving the change request. This means that any change to a trust service (the compromise of a CA certificate, for example) will be present in the TSL within 24h. Using a TSL for a longer period increases the possibility of using outdated information. It is possible to override this behaviour with a custom callback function if the risks of using old TSL information are acceptable for your business case.
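The expiry rule described here amounts to comparing the age of the last successful refresh with the refresh interval. A minimal sketch, assuming a hypothetical helper `TslFreshness` (the 24h figure comes from the comment above; everything else is illustrative and not a Digidoc4j API):

```java
import java.time.Duration;
import java.time.Instant;

public class TslFreshness {

    /** Default refresh interval mentioned in the discussion. */
    public static final Duration DEFAULT_MAX_AGE = Duration.ofHours(24);

    /**
     * Returns true when there has never been a successful refresh, or when the
     * last successful refresh is older than maxAge at the given moment.
     */
    public static boolean isExpired(Instant lastSuccessfulRefresh, Instant now, Duration maxAge) {
        if (lastSuccessfulRefresh == null) {
            return true;
        }
        return Duration.between(lastSuccessfulRefresh, now).compareTo(maxAge) > 0;
    }
}
```

A custom policy that accepts the risk of older data could pass a longer `maxAge` instead of the default.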
It happened again on our production system today. We retried automatically (as a workaround for this bug), and it succeeded the second time:
I think that a much better default would be the following in case of refresh failure:
Such behaviour would not need tweaking by every developer. That would be a sane default behaviour. I am pretty sure that most developers want to have a stable system by default. Currently, people discover this problem in production and have to implement workarounds. It would be nice if workarounds were not needed for most projects. BTW, as a side note,
Thank you for your input on this topic. LOTL/TSL handling is a complicated process handled by multiple software modules. Since we are already in the release process for version 5.0.0, it's not feasible to introduce any more changes right now. However, we are open to all suggestions and are gathering feedback in order to have more information when considering improvements to TSL handling for the next release.
@naare what is the new behaviour in case refresh() is not called manually? Will it just keep failing after the first failure?
Digidoc4j 5.0.0 has now been released with changes in TSL loading. The original problem, where the refresh was not automatically retried when the first load failed (the cache was still marked as updated), should be fixed. Please take a look at the release notes, where you can also find references to the relevant wiki chapters.
I will close the issue, as 5.0.0 has been out for a year and no additional feedback has been given.
As recommended, we call tsl.refresh() manually on app start.

However, if there is no cache yet and the first request fails, the library does not remember that it has been left in an invalid state and doesn't try to reload the TSL on demand, as it would if tsl.refresh() had not been called manually.

If this happens, tsl.refresh() logs some errors from other threads, but completes without exceptions.

All subsequent signature verifications result in OCSP errors until tsl.refresh() is called manually again and doesn't fail internally.

We have experienced this problem in several different applications in production. This is a critical issue.
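Because the refresh completes without exceptions even when loading fails, a retry workaround needs some separate health check. The sketch below shows one way to structure it. Both arguments are hypothetical: `refresh` would wrap the tsl.refresh() call, and `isHealthy` stands for whatever check the application can perform; this issue does not name a specific one.

```java
import java.util.function.BooleanSupplier;

public class RefreshRetry {

    /**
     * Runs the refresh action up to maxAttempts times, checking health after
     * each attempt. Returns true as soon as the health check passes, false if
     * all attempts are used up or the thread is interrupted while waiting.
     */
    public static boolean refreshUntilHealthy(Runnable refresh, BooleanSupplier isHealthy,
                                              int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                refresh.run();
            } catch (RuntimeException e) {
                // A thrown exception counts as a failed attempt; fall through to the check.
            }
            if (isHealthy.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }
}
```

If this returns `false` at startup, the application can refuse to start or report itself as unhealthy instead of silently failing every verification later.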
Log of TSL loading failure:
Logs when signature verification fails (due to previous TSL loading failure):