-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support/historyarchive: Provide mechanism for retrying XDR stream errors #1899
support/historyarchive: Provide mechanism for retrying XDR stream errors #1899
Conversation
4ab8df8
to
1df1212
Compare
If a SingleLedgerStateReader fails to read a single bucket entry the reading process is aborted. This commit provides a retry mechanism which creates new XDR streams in case of XDR stream errors.
1df1212
to
7f08075
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inability to seek is unfortunate, looks good and sounds like a good idea to support retries 👏.
7f08075
to
28d1396
Compare
28d1396
to
f50b1e0
Compare
"github.com/stellar/go/xdr" | ||
) | ||
|
||
var log = logpkg.DefaultLogger.WithField("service", "expingest") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't use logger because it can interfere with other log package used by ingest
package user. That's why we added #1510. I think that returning an error after maxStreamRetries
is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it to log = logpkg.DefaultLogger.WithField("component", "SingleLedgerStateReader")
I would like to keep the log statements because it could be useful to know which type of errors cause the stream to disconnect.
Also, we only log whenever we have to retry the xdr stream and that condition shouldn't happen so often. So I don't think it will end up bloating our log stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about our log stream but the future users that use exp/ingest
in their apps. Personally, I wouldn't want any of packages in my my app to log anything to the output without my consent. @leighmcculloch what's your opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Go have an equivalent of SLF4J? I don't think logging from a library is unreasonable, as long as it's not imposing a logger on the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to leave logging, I think the best way would be to pass an object that implements Logger
interface with Info
, Error
, ..., methods. Unfortunately, there is no standard library interface like this (see: golang/go#13182).
break | ||
} | ||
|
||
stream.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should handle err
here. Can lead to undetected memory/fd leak otherwise.
Co-Authored-By: Bartek Nowotarski <bartek@nowotarski.info>
@bartekn could you take another look? thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except logging but I don't want to block merging because of this. Great we have retry code finally, less error notifications expected 😄.
53a4294
to
a14b17a
Compare
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
If a SingleLedgerStateReader fails to read a single bucket entry the reading process is aborted.
This commit provides a retry mechanism which creates new XDR streams in case of XDR stream errors.
Close #1649
Why
XdrStream.ReadOne
method returns this error from time to time:This happens when the stream connection times out. To make ingestion pipelines more robust we should provide a way to retry when encountering these errors.
Known limitations
Assume we have an XDR stream for a very large bucket and we encounter a transient error halfway through. Unfortunately, when we create the new XDR stream, we need to consume half of the bucket file so that we resume from the same point of failure. This means that we waste time and bandwidth re-downloading the part of the bucket file we have already successfully processed. We cannot avoid re-downloading because the history archive buckets are GZIP compressed and it is not possible to seek to a random position in a gizp compressed file (see https://stackoverflow.com/questions/429987/compression-formats-with-good-support-for-random-access-within-archives ).
If it were possible to seek to any position within a gzip file we would be able to start the new XDR stream from the point of failure without any wasted bandwidth using an HTTP range request (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests )