-
-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry hangfire jobs #197
base: master
Are you sure you want to change the base?
Retry hangfire jobs #197
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, I would like to generalize the transactional outbox. There are other places where I would like to use it.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs
line 270 at r1 (raw file):
) ); o.AddRepository<PlatformMessage>("platform_message_outbox");
The naming convention for collections is a plural noun, so if we change the name of the doc to OutboxMessage
, it would be outbox_messages
.
src/SIL.Machine.AspNetCore/Models/PlatformMessage.cs
line 3 at r1 (raw file):
namespace SIL.Machine.AspNetCore.Models; public record PlatformMessage : IEntity
We might want to use this for other messages in the future, so I would just call this Message
or OutboxMessage
.
src/SIL.Machine.AspNetCore/Models/PlatformMessage.cs
line 7 at r1 (raw file):
public string Id { get; set; } = ""; public int Revision { get; set; } = 1; public required string Method { get; init; }
I think an enum would be better.
src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs
line 5 at r1 (raw file):
namespace SIL.Machine.AspNetCore.Services; public class PlatformMessageOutboxService(
You should separate the hosted service from the service that enqueues messages. The hosted service should only run on a single server (the engine server). Messages can be enqueued from any server. You won't be able to use the _messageInOutbox
flag. You also won't need to lock in the ProcessMessagesAsync
method.
src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs
line 67 at r1 (raw file):
bool allMessagesSuccessfullySent = true; IReadOnlyList<PlatformMessage> messages = await _messages.GetAllAsync();
Optimally, we would watch the collection for changes using the SubscribeAsync
method. If we use polling, it might be more efficient to use ExistsAsync
to check if the outbox is empty.
src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs
line 88 at r1 (raw file):
switch (message.Method) { case "BuildStartedAsync":
You can drop the Async
from the names.
src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs
line 205 at r1 (raw file):
// log error message.Attempts++; await _messages.UpdateAsync(m => m.Id == message.Id, b => b.Set(m => m.Attempts, message.Attempts));
You can use the Inc
operator.
Is there a good way to? I couldn't think of a way to convert function calls to strings (or even binary data) that could be stored in a mongoDB. Unless there is some package that can auto-grab the GRPC calls and make them into an outbox, I don't have a good solution. I couldn't find one in my quick search. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Ok - I'll subscribe to changes in the MongoDB instead to have a quick turn time. |
Previously, ddaspit (Damien Daspit) wrote…
Switched to a subscribe - and checking first with Exists. |
Previously, ddaspit (Damien Daspit) wrote…
Ok - I see that Async is not relevant when we are queueing up the message to be sent. |
Previously, ddaspit (Damien Daspit) wrote…
Done |
Here is what is likely the issue: https://discuss.hangfire.io/t/recurring-jobs-do-not-automatically-get-retried-after-application-crash-net-core-service/9160/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some ideas on how we could generalize it. You don't need to worry about it right now.
Also, inserting a message to the outbox and the corresponding model update need to occur in the same transaction. For example, the IBuildJobService.BuildJobStartedAsync
call and the IPlatformService.BuildStartedAsync
call should occur in the same transaction.
Reviewed 7 of 8 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Services/IMessageOutboxService.cs
line 5 at r3 (raw file):
public interface IMessageOutboxService { public Task EnqueueMessageAsync(
We need to ensure that messages are delivered in order for a particular build. I would pass in a groupId
to this method that is used to ensure that all messages with the same group id are delivered in order. In the case of a build, the groupId
would be the build id.
src/SIL.Machine.AspNetCore/Services/ServalPlatformService.cs
line 118 at r3 (raw file):
); } await _outboxService.EnqueueMessageAsync(
I'm concerned about the size of the MongoDB document. This could potentially get very big and there is a size limit on a MongoDB document. Maybe we could keep the pretranslate.trg.json
file until we have successfully sent the pretranslations to Serval.
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs
line 39 at r3 (raw file):
if (message.Attempts > 3) // will fail the 5th time { await PermanentlyFailedMessage(message, e);
It could potentially fail 5 times very quickly. An expiration time would be better.
Ok fixed. |
Previously, ddaspit (Damien Daspit) wrote…
Major updates. |
Previously, ddaspit (Damien Daspit) wrote…
We should talk about this - what is the desired behavior if a message is unable to be sent and why? Why would we be retrying multiple times and what might resolve itself - and how we should respond? |
From MongoDB:
Since we only have one process, it will be the seconds and the counter that determine the order, which should work. In contrast, UtcNow relies on the underlying system timer, which may be up to 15ms - potentially creating a conflict. |
Previously, johnml1135 (John Lambert) wrote…
4 days - basically, how long could the Serval server conceivably be down? |
f41b5a1
to
3e19b1d
Compare
Previously, ddaspit (Damien Daspit) wrote…
Saved the large files to hard drive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 24 files reviewed, 5 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs
line 38 at r4 (raw file):
This seems like a relevant quote from the documentation:
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
Unfortunately, this means we can't rely on the ObjectId
. This means we need to use a sequence number to guarantee the ordering. We have two options:
- Create an outbox collection that stores the current outbox state. The outbox state would contain the next sequence number. Use the
UpdateAsync
method to increment the sequence number and return the new value. The MongoDB update should be atomic. - Create a unique index on the sequence number field in the message collection. In a loop, query the database for the message with the max sequence number, increment the number, and attempt to insert the message with the new sequence number.
I think I would prefer option 1, since it doesn't require a loop and could be used to store other metadata about the outbox. It would also allow us to have multiple outboxes, which could be useful.
Previously, ddaspit (Damien Daspit) wrote…
Since we only have one client generating these (that is, for one build or engine ID, which is what we are concerned about) I believe the current solution should be acceptable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 24 files reviewed, 5 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs
line 38 at r4 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Since we only have one client generating these (that is, for one build or engine ID, which is what we are concerned about) I believe the current solution should be acceptable.
The engine server and the build server can generate messages. Even if we had only one client generating messages, I don't want our design to limit our ability to scale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 25 files reviewed, 4 unresolved discussions (waiting on @ddaspit)
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs
line 38 at r4 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The engine server and the build server can generate messages. Even if we had only one client generating messages, I don't want our design to limit our ability to scale.
1 - done.
f174807
to
2aa8fe8
Compare
2aa8fe8
to
ba69059
Compare
* Fixed auto-retry as per this forum post: https://discuss.hangfire.io/t/recurring-jobs-do-not-automatically-get-retried-after-application-crash-net-core-service/9160 * MongoDB can't handle documents greater than 16MB * Treat messages from one id as a group * Kill failing messages over 4 days old * Universal message incrementer * Add testing
ba69059
to
1b274be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 16 files at r4, 7 of 9 files at r5, 3 of 4 files at r6, 16 of 16 files at r8, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @johnml1135)
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs
line 267 at r8 (raw file):
); o.AddRepository<OutboxMessage>("outbox_messages"); o.AddRepository<Sequence>("outbox_message_index");
The collection should be named message_outboxes
to be consistent with the collection naming conventions that we currently use.
src/SIL.Machine.AspNetCore/Models/Sequence.cs
line 3 at r8 (raw file):
namespace SIL.Machine.AspNetCore.Models; public record Sequence : IEntity
This should be named MessageOutbox
.
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs
line 234 at r8 (raw file):
async (ct) => { await using (await @lock.WriterLockAsync(cancellationToken: ct))
I don't think that the lock will work properly inside of a transaction. The lock needs to be acquired and released outside of the transaction.
src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs
line 30 at r8 (raw file):
) )!; string id = Sequence.IndexToObjectIdString(outboxIndex.CurrentIndex);
The order index and the message id should be separate properties.
src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs
line 38 at r8 (raw file):
RequestContent = requestContent }; if (requestContent.Length > MaxDocumentSize)
I would prefer to always save the pretranslations to disk, so that the behavior is consistent and easier to debug. This would also allow us to avoid deserializing and serializing the pretranslations an extra time. The RequestContent
could be the path to the pretranslations json file.
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs
line 5 at r8 (raw file):
namespace SIL.Machine.AspNetCore.Services; public class MessageOutboxHandlerService(
MessageOutboxDeliveryService
is a clearer name.
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs
line 17 at r8 (raw file):
private readonly ILogger<MessageOutboxHandlerService> _logger = logger; protected TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(10); protected TimeSpan MessageExpiration { get; set; } = TimeSpan.FromDays(4);
This should be configurable.
Previously, ddaspit (Damien Daspit) wrote…
So the message index should be named |
Previously, ddaspit (Damien Daspit) wrote…
Done |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done |
Previously, ddaspit (Damien Daspit) wrote…
As per our previous conversations, Since we only have one client generating these (that is, for one build or engine ID, which is what we are concerned about) I believe the current solution should be acceptable. I don't see a realistic case where there will be two separate clients updating the same build synchronously (at least for any plans for the next 2 years). Therefore, I don't see a need for the added complexity. |
Previously, ddaspit (Damien Daspit) wrote…
As per the other comment, I believe that message_outbox and outbox_message could be confusing - one is really an index... |
#158
This change is