
JetStream publish performance #450

Closed · safayatborhan opened this issue Mar 22, 2024 · 17 comments

Comments

@safayatborhan

safayatborhan commented Mar 22, 2024

Observed behavior

I have a dotnet 6 application running on nats.net. When I migrate to nats.net.v2, performance degrades drastically. Here is the code I am running with the nats.net.v2 library:

using System.Diagnostics;
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

var natsOptions = NatsOpts.Default with { Url = "localhost" };
await using var nats = new NatsConnection(natsOptions);
var js = new NatsJSContext(nats);

// File-backed stream capturing every "events.>" subject
var config = new StreamConfig(name: "EVENTS", subjects: new[] { "events.>" });
config.Storage = StreamConfigStorage.File;
var stream = await js.CreateStreamAsync(config);

// Spin up 1,000 concurrent publishers; each one awaits the JetStream ack for every message.
var tasks = new List<Task>();
for (int i = 0; i < 1000; i++)
{
    var task = Task.Run(async () =>
    {
        while (true)
        {
            var sw = Stopwatch.StartNew();
            for (var j = 0; j < 2; j++)
            {
                await js.PublishAsync<object>(subject: "events.page_loaded", data: null);
                await js.PublishAsync<object>(subject: "events.mouse_clicked", data: null);
                await js.PublishAsync<object>(subject: "events.mouse_clicked", data: null);
                await js.PublishAsync<object>(subject: "events.page_loaded", data: null);
                await js.PublishAsync<object>(subject: "events.mouse_clicked", data: null);
                await js.PublishAsync<object>(subject: "events.input_focused", data: null);
            }
            Console.WriteLine($"Total time taken: {sw.Elapsed.TotalSeconds}");
        }
    });
    tasks.Add(task);
}

await Task.WhenAll(tasks);

Sample output:
[screenshot: console timing output]

Here is the same code I am running with nats.net library:

using System.Diagnostics;
using NATS.Client;
using NATS.Client.JetStream;

Options opts = ConnectionFactory.GetDefaultOptions("localhost");
ConnectionFactory connectionFactory = new ConnectionFactory();
var conn = connectionFactory.CreateConnection(opts);
IJetStream jetStream = conn.CreateJetStreamContext();
IJetStreamManagement jetStreamManagement = conn.CreateJetStreamManagementContext();

// Same file-backed "EVENTS" stream as in the v2 sample
jetStreamManagement.AddStream(StreamConfiguration.Builder()
    .WithName("EVENTS")
    .WithStorageType(StorageType.File)
    .WithSubjects("events.>")
    .Build());

// Same pattern with the v1 client: 1,000 concurrent publishers using the blocking Publish().
var tasks = new List<Task>();
for (int i = 0; i < 1000; i++)
{
    var task = Task.Run(() =>
    {
        while (true)
        {
            var sw = Stopwatch.StartNew();
            for (var j = 0; j < 2; j++)
            {
                jetStream.Publish(subject: "events.page_loaded", data: null);
                jetStream.Publish(subject: "events.mouse_clicked", data: null);
                jetStream.Publish(subject: "events.mouse_clicked", data: null);
                jetStream.Publish(subject: "events.page_loaded", data: null);
                jetStream.Publish(subject: "events.mouse_clicked", data: null);
                jetStream.Publish(subject: "events.input_focused", data: null);
            }

            Console.WriteLine($"Total time taken: {sw.Elapsed.TotalSeconds}");
        }
    });
    tasks.Add(task);
}

await Task.WhenAll(tasks);

Sample output:
[screenshot: console timing output]

Expected behavior

The latest library should be at least as fast as the previous one.

Server and client version

Nats.net 2.1.2

Host environment

No response

Steps to reproduce

Repo link: https://github.com/safayatborhan/Memory.Test, if you want to reproduce.

@mtmk
Collaborator

mtmk commented Mar 22, 2024

not sure if this is a good test. v1 doesn't actually seem to be publishing as fast:

nats stream ls
╭───────────────────────────────────────────────────────────────────────────────────╮
│                                      Streams                                      │
├──────────┬─────────────┬─────────────────────┬───────────┬─────────┬──────────────┤
│ Name     │ Description │ Created             │ Messages  │ Size    │ Last Message │
├──────────┼─────────────┼─────────────────────┼───────────┼─────────┼──────────────┤
│ EVENTSv1 │             │ 2024-03-22 12:13:22 │ 928,673   │ 45 MiB  │ 0s           │
│ EVENTSv2 │             │ 2024-03-22 12:13:22 │ 2,250,102 │ 108 MiB │ 0s           │
╰──────────┴─────────────┴─────────────────────┴───────────┴─────────┴──────────────╯

@mtmk
Collaborator

mtmk commented Mar 22, 2024

> I have a dotnet 6 application running on nats.net. When I migrate to nats.net.v2, performance degrades drastically. Here is the code I am running with the nats.net.v2 library:

@safayatborhan, where are you seeing the performance problem in your real application? Could you elaborate on your use case a little more?

In your test application v1's Publish() looks faster, but overall performance doesn't show that, so I'm a little confused about that tbh.

(Thank you for the example repo btw 💯 a lot easier for me to validate the issues)

@safayatborhan
Author

safayatborhan commented Mar 22, 2024

> > I have a dotnet 6 application running on nats.net. When I migrate to nats.net.v2, performance degrades drastically.
>
> @safayatborhan, where are you seeing the performance problem in your real application? Could you elaborate on your use case a little more?
>
> In your test application v1's Publish() looks faster, but overall performance doesn't show that, so I'm a little confused about that tbh.
>
> (Thank you for the example repo btw 💯 a lot easier for me to validate the issues)

Hi @mtmk,
I can see how overall performance is better for V2. In the example, we are awaiting each message to be processed; that's why the measured time is higher for V2. Sorry for the confusion.

In the actual scenario, we are trying to publish around 10,000 messages/sec, with each message around 1 KB. For processing each message, the legacy NATS client (STAN) took around 1.77E-05. As it is end of life, we migrated to JetStream, and here is the per-message data for the v1 and v2 JetStream clients:
V1: approximately 0.0000208
V2: approximately 0.0020981
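
As an aside on ack latency vs. throughput: each awaited js.PublishAsync measures a full request-reply round trip to the server, not how many messages per second the client can push. A minimal sketch of the difference, assuming the same NATS.Net v2 context and "EVENTS" stream as in the original snippet, which starts a batch of publishes and awaits the acks together:

using System.Diagnostics;
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection(NatsOpts.Default with { Url = "localhost" });
var js = new NatsJSContext(nats);

const int batchSize = 1000;
var sw = Stopwatch.StartNew();

// Start all publishes first, then await the acks in one go.
// Each publish still round-trips for its PubAck, but the round trips overlap.
var acks = new List<Task>(batchSize);
for (var i = 0; i < batchSize; i++)
{
    acks.Add(js.PublishAsync<object>(subject: "events.page_loaded", data: null).AsTask());
}
await Task.WhenAll(acks);

Console.WriteLine($"{batchSize} acked publishes in {sw.Elapsed.TotalSeconds:F3}s " +
                  $"({batchSize / sw.Elapsed.TotalSeconds:F0} msg/s)");

Measured this way, the numbers reflect throughput rather than per-message round-trip time, which is closer to what the stream message counters earlier in the thread show.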

@darkwatchuk
Contributor

darkwatchuk commented Mar 23, 2024

I see the same magnitude of difference too. Also, starting the v1 app seems to be instant, whereas the v2 app takes several seconds to start.

Edit: I believe I was testing this in the VS IDE, and whilst V1 is faster there, V2 is faster outside it.

@safayatborhan
Author

Hi @mtmk,

This time I am testing by awaiting each message to be processed. The latest code has been pushed to the repo. Look at the difference now:

[screenshot comparing V1 and V2 timings]

After running for around 5-7 minutes:
[screenshot comparing V1 and V2 timings]

@mtmk
Collaborator

mtmk commented Mar 23, 2024

I'm afraid your metric above doesn't make sense to me as a meaningful comparison. I'm seeing very different results on different machines with different numbers of concurrent tasks. But when tuned (for example, the number of tasks), overall I'm hardly seeing any difference, unfortunately. I assume you're interested in throughput, and I suggest having a look at this issue:

Having said that, I think there are improvements we can make in the request-reply pattern (which I will start investigating soon, #453), but I don't think it would make a material difference to how many messages you can send per second.

edit: I can reproduce results similar to the above when running on a single-core cloud machine, but when run on my desktop machine with multiple cores, the results are very different:

dotnet run -c release:
V1 Total time taken: 0.4129713    V2 Total time taken: 0.1883956
V1 Total time taken: 0.4114895    V2 Total time taken: 0.1772481
V1 Total time taken: 0.4122838    V2 Total time taken: 0.1846692

after running a few seconds:
│ EVENTSV1 │             │ 2024-03-23 23:05:38 │ 588,580   │ 604 MiB │ 1.86s        │
│ EVENTSV2 │             │ 2024-03-23 23:05:38 │ 1,302,142 │ 1.3 GiB │ 1.38s        │

when compiled AOT:

                                  V2 Total time taken: 0.1448319
                                  V2 Total time taken: 0.1446241
                                  V2 Total time taken: 0.1365826

(this is running code from your repo https://github.com/safayatborhan/Memory.Test with no changes)
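
To make the request-reply point above concrete: a core NATS publish is fire-and-forget, while a JetStream publish waits for a PubAck from the server, so its per-call time includes a full round trip. A rough sketch contrasting the two, assuming the NATS.Net v2 APIs used elsewhere in this thread and that the "EVENTS" stream from the original snippet already exists (the loop is illustrative, not a benchmark):

using System.Diagnostics;
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection(NatsOpts.Default with { Url = "localhost" });
var js = new NatsJSContext(nats);

var sw = Stopwatch.StartNew();
for (var i = 0; i < 1000; i++)
{
    // Core NATS publish: buffered and flushed in the background, no ack awaited.
    await nats.PublishAsync<object>(subject: "events.page_loaded", data: null);
}
Console.WriteLine($"core publish:      {sw.Elapsed.TotalSeconds:F3}s");

sw.Restart();
for (var i = 0; i < 1000; i++)
{
    // JetStream publish: each call is a request-reply that waits for the PubAck.
    await js.PublishAsync<object>(subject: "events.page_loaded", data: null);
}
Console.WriteLine($"jetstream publish: {sw.Elapsed.TotalSeconds:F3}s");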

mtmk changed the title "nats.net.v2 is slower than nats.net" → "JetStream Publish Perfromance" on Mar 23, 2024
mtmk changed the title "JetStream Publish Perfromance" → "JetStream publish performance" on Mar 23, 2024
@caleblloyd
Collaborator

@safayatborhan I also cannot recreate the latency that you are seeing. I forked your Memory.Test repo to caleblloyd/Nats.Net.V2.ConcurrencyTests and changed a few things. I tested with added latency between the program and the nats-server, and V2 ran much better than V1.

What OS and hardware are you getting those results on?

@to11mtm
Collaborator

to11mtm commented Apr 11, 2024

> What OS and hardware are you getting those results on?

Also, @safayatborhan, could you please confirm which version of the .NET runtime your .NET 6 tests were run against, especially if it is not 6.0.6 or newer? (Just to be safe; this was something AkkaDotNet encountered, but it was fixed in 6.0.6 and newer.)

Agreed that hardware could also make a huge difference here (and may still point to opportunities). I certainly have guesses as to what could happen with a low core count, etc., but I am working on being better about my tangents. :)
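
For anyone reproducing this, a quick way to confirm which runtime the test process actually ran on (a small sketch, for illustration):

using System.Runtime.InteropServices;

// Prints e.g. ".NET 6.0.x" plus the exact runtime version loaded by this process.
Console.WriteLine(RuntimeInformation.FrameworkDescription);
Console.WriteLine(Environment.Version);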

@safayatborhan
Author

@caleblloyd and @to11mtm,
Thanks for your good inputs. I can still confirm it's slower on my machine.
Here is the configuration:
JetStream server: nats-server-v2.10.10-windows-386
Hardware: 11th Gen Core i7 (3 GHz), 32 GB RAM

It is surprising to me that you are getting different results.

@caleblloyd
Collaborator

windows-386 is a 32-bit architecture; can you try running the windows-amd64 nats-server instead? It is a 64-bit build that will work with Intel CPUs.

@safayatborhan
Author

@caleblloyd
I am getting a similar result after changing the server to amd64 (nats-server-v2.10.14-windows-amd64). After running for a few minutes:

[screenshot: console timing output]

@mtmk
Collaborator

mtmk commented May 16, 2024

@safayatborhan did you make any progress? it'd be good to get on the same page on this 😅

@darkwatchuk
Contributor

darkwatchuk commented May 16, 2024

If it's any help, there are significant differences between running the tests inside the Visual Studio IDE vs. outside, even with Release x64 code.

Inside the IDE V1 seems to always win; outside of the IDE, V2 wins.

E.g. on my 24-core desktop, Windows 11:

10 Tasks

Release - Inside of IDE

V1 Avg: 0.014, Min: 0.003, Max: 0.058, Max Threads: 18
V2 Avg: 0.031, Min: 0.002, Max: 0.072, Max Threads: 18

Release - Outside of IDE

V1 Avg: 0.006, Min: 0.001, Max: 0.067, Max Threads: 19
V2 Avg: 0.002, Min: 0.001, Max: 0.016, Max Threads: 14

50 Tasks

Release - Inside of IDE

V1 Avg: 0.074, Min: 0.004, Max: 0.184, Max Threads: 37
V2 Avg: 0.192, Min: 0.100, Max: 0.233, Max Threads: 26

Release - Outside of IDE

V1 Avg: 0.024, Min: 0.002, Max: 0.114, Max Threads: 24
V2 Avg: 0.006, Min: 0.002, Max: 0.013, Max Threads: 24
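
For reference, figures like the Avg/Min/Max/Max Threads above can be gathered with a small shared collector around the timed loop. A rough sketch; the Record helper and its aggregation are assumptions about how such numbers could be collected, not the exact code used:

using System.Linq;

// Shared collectors; Record(...) is called at the end of each task's timed iteration.
var timings = new List<double>();
var maxThreads = 0;
var gate = new object();

void Record(TimeSpan elapsed)
{
    lock (gate)
    {
        timings.Add(elapsed.TotalSeconds);
        // ThreadPool.ThreadCount is the current thread-pool size (.NET Core 3.0+).
        maxThreads = Math.Max(maxThreads, ThreadPool.ThreadCount);
    }
}

// When the run is stopped (and at least one iteration has been recorded):
lock (gate)
{
    Console.WriteLine($"Avg: {timings.Average():F3}, Min: {timings.Min():F3}, " +
                      $"Max: {timings.Max():F3}, Max Threads: {maxThreads}");
}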

@mtmk
Collaborator

mtmk commented May 16, 2024

thanks @darkwatchuk it defo helps 💯 so the figures are ms per message? lower is better?

I know I'm jumping the gun and maybe going off on a tangent (sorry @to11mtm 😅), but I'm also not convinced the method of measuring performance suggested above produces helpful or practical results, unfortunately. What are your thoughts, and if you agree, how should we measure it?

@darkwatchuk
Contributor

Yes, lower is better. This was running the code from the provided repo. As @safayatborhan is running Windows, it could be that he's running the tests from Visual Studio and accidentally starting them under the debugger, even with the Release build. Just guessing. But certainly from what I can see, it produces drastically different results and can easily give the wrong impression. Clearly the VS debugging overhead for V2 is higher than for V1.
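
A cheap guard against exactly this trap when timing locally is to check whether a debugger is attached before trusting the numbers. A minimal sketch (the warning text is illustrative):

using System.Diagnostics;

if (Debugger.IsAttached)
{
    // F5 in Visual Studio attaches the debugger even for Release builds;
    // timings taken this way are not comparable to `dotnet run -c Release`.
    Console.WriteLine("WARNING: debugger attached; benchmark numbers will be misleading.");
}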

@safayatborhan
Author

@darkwatchuk
Thanks for your valuable insights. You are right: I was running those tests under the IDE. I am now getting similar results to yours.

@mtmk
Collaborator

mtmk commented May 22, 2024

Thanks so much, @darkwatchuk, for getting to the bottom of this. It highlighted a performance improvement we can make for request-reply, even though that probably won't help in this particular scenario. I will close this issue now. Thanks.

mtmk closed this as not planned on May 22, 2024