feat: archive jobs to object storage #3721
Conversation
Codecov Report
Patch coverage:
@@ Coverage Diff @@
## master #3721 +/- ##
==========================================
+ Coverage 68.55% 68.82% +0.27%
==========================================
Files 344 347 +3
Lines 51565 51989 +424
==========================================
+ Hits 35349 35783 +434
+ Misses 13935 13921 -14
- Partials 2281 2285 +4
for _, job := range jobs {
    j, err := marshalJob(job)
You might want to use json.Encoder instead and write directly to the gzWriter:
https://pkg.go.dev/encoding/json#NewEncoder.
Fewer lines of code and possibly fewer memory allocations.
archiver/worker.go (outdated)
fileUploader, err := w.storageProvider.GetFileManager(workspaceID)
if err != nil {
    log.Errorw("Skipping storing errors since no file manager is found", "error", err)
Under which conditions is this error possible? Will we get error logs if the customer has not configured backups?
Under which conditions is this error possible?
This particular scenario can only happen when the workspace is not part of the config this server instance serves.
The logs will contain the workspaceID, if that's what you're asking about.
If it reached this point, these jobs will be marked aborted in the next iteration.
Can we use a sentinel error to make this error path more straightforward?
The only blocking request is checking the file.Close() error:
https://www.joeshaw.org/dont-defer-close-on-writable-files/
if err != nil {
    return "", fmt.Errorf("failed to open file %s: %w", path, err)
}
defer func() { _ = file.Close() }()
[blocking] It is not a good idea to ignore the error from file.Close(): during close we might need to flush data to disk, and if that fails data will be lost. Given the criticality of these backups, we need to be extra careful.
We are only opening the file for reading and sending its contents to the object storage.
If this operation fails, the jobs will not be marked as succeeded and archiver will retry, i.e. a new file will be created containing the same jobs. What should we do in case closing the file that we opened for reading fails here?
My bad, due to the various comments the order of events was unclear. Not a blocking comment, but if marshalJob fails we don't close the gzWriter.
jobsdb.WithClearDB(options.ClearDB),
jobsdb.WithDSLimit(&a.config.processorDSLimit),
jobsdb.WithSkipMaintenanceErr(config.GetBool("Processor.jobsDB.skipMaintenanceError", false)),
jobsdb.WithJobMaxAge(
Note for future improvement: the jobsdb.HandleT#doCleanup method picks up jobs in executing state as well.
This means that both of the following are theoretically possible:
- Duplicate terminal states are recorded for a job: one by the cleanup goroutine and another by the goroutine which marked the job as executing.
- The cleanup goroutine marks a job as aborted while the other goroutine marks it as failed.
Thus we might need to reconsider in the future, keeping the following in mind:
- Dealing with jobs in executing state
- Keeping pending events consistent
Description
Processor writes the events to a new jobsdb: archivalJobsDB.
Archiver reads jobs from the archival jobsdb and uploads them to the user-configured object storage (if eligible).
Enabled for gateway jobs (archived at a source level, and further in an hourly folder within the source prefix).
Linear Ticket
Notion Link
feed the archival
Security