-
We got hit by the issue with failing cluster transactions on client failover (reported and fixed here: #15700). It still occurs when multiple threads are processing documents. When the database node in use is brought down, some threads fail with a message like

I can't provide a neat unit test, but the test program below reproduces the issue pretty reliably. Steps to reproduce:
Also, a task sometimes fails with the exception "A task was canceled". This looks like the failover didn't work properly. Is this expected?

Reproduction code:

using NLog;
using Raven.Client.Documents;
using Raven.Client.Documents.Session;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace RavenDB5Tests.Tryouts
{
    internal class ClusterWideTransactionsConcurrent
    {
        private static Logger _logger = LogManager.GetCurrentClassLogger();

        public static void ClusterWideTransactionsConcurrentTest()
        {
            using (var store = new DocumentStore
            {
                Urls = new[] { "http://localhost:8080", "http://localhost:8081", "http://localhost:8082" },
                Database = "ClusterWideTransactionsConcurrentTest",
            }.Initialize())
            {
                // Start five workers; each one updates its own document.
                var tasks = new Dictionary<string, Task>();
                for (int i = 0; i < 5; i++)
                {
                    var taskIdent = "task-" + i;
                    tasks[taskIdent] = Task.Run(() => ProcessDocument(store, taskIdent))
                        .ContinueWith(task =>
                        {
                            _logger.Warn($"{taskIdent} finished.");
                            if (task.Exception != null)
                            {
                                _logger.Error(task.Exception);
                            }
                        });
                }

                var tasksValues = tasks.Values.ToArray();
                _logger.Info("Waiting for tasks to finish.");
                Task.WaitAll(tasksValues);
                _logger.Info("Tasks finished.");
            }
        }

        private static async Task ProcessDocument(IDocumentStore store, string taskId)
        {
            var ident = Guid.NewGuid().ToString();
            _logger.Info($"[{taskId}] Creating doc {ident}.");

            // Create the document in its own cluster-wide transaction.
            using (var session = store.OpenAsyncSession(new SessionOptions { TransactionMode = TransactionMode.ClusterWide }))
            {
                var doc = new Doc
                {
                    Id = "doc-" + ident,
                };
                await session.StoreAsync(doc);
                await session.SaveChangesAsync();
            }

            // Update the document repeatedly, one cluster-wide transaction per iteration.
            var rnd = new Random();
            for (int i = 0; i < 1000; i++)
            {
                _logger.Info($"[{taskId}] {i} Processing doc {ident}.");
                using (var session = store.OpenAsyncSession(new SessionOptions { TransactionMode = TransactionMode.ClusterWide }))
                {
                    var doc = await session.LoadAsync<Doc>("doc-" + ident);
                    doc.Progress = i;
                    // Keep the session open for a while so a node can be brought down mid-transaction.
                    await Task.Delay(1000 + rnd.Next(0, 250));
                    await session.StoreAsync(doc);
                    await session.SaveChangesAsync();
                }
            }
        }

        public class Doc
        {
            public string Id { get; set; }
            public int Progress { get; set; }
        }
    }
}
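A side note on the "A task was canceled" case: as far as I know, a .NET task whose async body ends with an `OperationCanceledException` (which includes `TaskCanceledException`) finishes in the Canceled state, where `task.Exception` is null, so a continuation that only checks `task.Exception` can log it as finished without an error. Below is a minimal sketch, independent of RavenDB, of telling the three outcomes apart. `TaskOutcome`, `Describe`, and `RunAndDescribeAsync` are hypothetical names made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskOutcome
{
    // Hypothetical helper: returns a short label for an already completed task.
    public static string Describe(Task task)
    {
        if (task.IsCanceled)
            return "canceled";   // task.Exception is null in this state
        if (task.IsFaulted)
            return "faulted: " + task.Exception.GetBaseException().Message;
        return "completed";
    }

    // Runs the work on the thread pool, waits for it, and reports how it ended.
    public static async Task<string> RunAndDescribeAsync(Func<Task> work)
    {
        var task = Task.Run(work);
        try
        {
            await task;
        }
        catch
        {
            // Swallowed on purpose: the outcome is read from the task itself below.
        }
        return Describe(task);
    }
}
```

In the program above, the same check could go inside the `ContinueWith` callback, logging `task.IsCanceled` next to `task.Exception`.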
-
I assume the failure happens when the transaction has been accepted by the cluster but not yet confirmed by the client. The error you list here can only happen on the initial creation, which is very strange and unlikely. Here is a standalone test with which I was able to reproduce the error you are seeing.
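If the failure really falls in that window, the caller cannot tell whether the write landed. One possible client-side stopgap, sketched here purely under assumptions, is to re-run the whole unit of work (open a new session, load, modify, save) a bounded number of times. `Retry.ExecuteAsync` and its parameters are hypothetical and not part of the RavenDB client, and this only makes sense when the operation is idempotent, as setting `Progress = i` is in the reproduction program.

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Hypothetical helper, not part of any library: runs the operation up to
    // maxAttempts times, waiting for the given delay between attempts.
    // The exception from the last attempt is rethrown to the caller.
    public static async Task ExecuteAsync(Func<Task> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await operation();
                return;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Assumption: every failure is treated as transient. Real code
                // would filter on the exception types it actually sees.
                await Task.Delay(delay);
            }
        }
    }
}
```

The lambda passed in should open its own session on every attempt, so a retry never reuses state from a session whose `SaveChangesAsync` outcome is unknown.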
-
FYI, this is handled here: #16701
-
Also #17181 and possibly #17082. See #17490 (comment).
I didn't try using TestDriver. Basically, we have facilities in place to test clusters, etc., that make doing this a lot easier. The test itself doesn't fail consistently, so we'll need to investigate. I have reproduced the concurrency issue.
It will probably take a few days to resolve properly. Once we have that, I'll look into the canceled exception.