Performance hit in NEST 6.x #3236
Comments
Thanks for opening @kaaresylow. I'm keen to dive into this but I'm unlikely to get around to it until next week.
@kaaresylow I've put together a benchmark of NEST 5.6.2 against 6.1.0, using an `InMemoryConnection`. The test is pretty simple:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Configs;
    using BenchmarkDotNet.Diagnosers;
    using BenchmarkDotNet.Environments;
    using BenchmarkDotNet.Horology;
    using BenchmarkDotNet.Jobs;
    using Elasticsearch.Net562;

    namespace NestVersionBenchmark.Benchmarks
    {
        [Config(typeof(Config))]
        public class GithubIssue3236
        {
            // Runs every benchmark under both the default JIT and RyuJIT,
            // and records allocations with the memory diagnoser.
            private class Config : ManualConfig
            {
                public Config()
                {
                    Add(
                        Job.Default
                            .With(Platform.AnyCpu)
                            .With(Jit.Default));
                    Add(
                        Job.Default
                            .With(Platform.AnyCpu)
                            .With(Jit.RyuJit));
                    Add(MemoryDiagnoser.Default);
                }
            }

            private readonly Nest562.ElasticClient _client562;
            private readonly Nest610.ElasticClient _client610;

            public GithubIssue3236()
            {
                // Both clients return the same canned response bytes from an
                // in-memory connection, so no request reaches Elasticsearch.
                var bytes = GetResponseBytes();

                var settings562 = new Nest562.ConnectionSettings(
                    new SingleNodeConnectionPool(new Uri("http://localhost:9200")),
                    new InMemoryConnection(bytes));
                _client562 = new Nest562.ElasticClient(settings562);

                var settings610 = new Nest610.ConnectionSettings(
                    new Elasticsearch.Net610.SingleNodeConnectionPool(new Uri("http://localhost:9200")),
                    new Elasticsearch.Net610.InMemoryConnection(bytes));
                _client610 = new Nest610.ElasticClient(settings610);
            }

            [Benchmark]
            public Nest562.IGetResponse<Document> Get562()
            {
                return _client562.Get<Document>(1, idx => idx
                    .Index("documents")
                    .Type("documentmodel"));
            }

            [Benchmark]
            public Nest610.IGetResponse<Document> Get610()
            {
                return _client610.Get<Document>(1, idx => idx
                    .Index("documents")
                    .Type("documentmodel"));
            }

            public class Document
            {
                public IEnumerable<PlaceBoundary> Boundaries { get; set; }
            }

            public class PlaceBoundary
            {
                public string Type { get; set; }
                public GeoJsonBoundary Boundary { get; set; }
            }

            public class GeoJsonBoundary
            {
                public string Type { get; set; }
                public object Coordinates { get; set; }
                public object Bbox { get; set; }
                public object Crs { get; set; }
            }

            // Reads the sample response document from an embedded resource.
            private static byte[] GetResponseBytes()
            {
                using (var stream = typeof(GithubIssue3236).Assembly.GetManifestResourceStream("NestVersionBenchmark.Benchmarks.GithubIssue3236.json"))
                using (var memoryStream = new MemoryStream())
                {
                    stream.CopyTo(memoryStream);
                    return memoryStream.ToArray();
                }
            }
        }
    }

On the machine I'm currently on, the results are

BenchmarkDotNet=v0.10.14.585-nightly, OS=Windows 10.0.16299.371 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-2920XM CPU 2.50GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
Frequency=2433513 Hz, Resolution=410.9286 ns, Timer=TSC
[Host] : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2633.0
Job-HASSSB : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit LegacyJIT/clrjit-v4.7.2633.0;compatjit-v4.7.2633.0
Job-IYUISG : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2633.0
Platform=AnyCpu
5.6.2 and 6.1.0 benchmarks look very similar. Would you be able to share further information about
@russcam thanks for looking into this.
The only difference between the two tests is the version of the NEST library.
I have a chance to look at this again now, @kaaresylow. Would you be able to re-run your benchmarks with NEST 6.2.0? Several performance improvements were made that show it to be faster than 5.x.
We recently upgraded to 6.3.1 from 2.x and took a significant performance hit on the clients. I have been profiling our system and my findings might be relevant to the issue above.
Here is the part of the call tree from my profiling that covers the ReadJson method:
As I understand it, the Deserialize method does the real work; the remaining three are all about creating the stream to be passed into Deserialize. The JSON is parsed when the stream is created and parsed again in the Deserialize method, so it is actually parsed twice. If there were a way to pass the JsonReader directly into the Deserialize method, we could cut it down to one parse and reduce the cost of the ReadJson method by approx. 50%. I have made an experiment in a fork of NEST that confirms this. To do the experiment I added a new interface to NEST and extended the ReadJson method like this:
I then implemented the IDeserializeFromJsonReader interface in my SourceSerializer and ran a new profiling session. The ReadJson method then looked like this in the call tree:
From a performance point of view there is a lot to gain by adding an interface like IDeserializeFromJsonReader, but I'm unsure whether it will cause other problems. I'm happy to create a PR with the above changes plus an implementation in the InternalSerializer, if you're interested.
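The double parse described in this comment can be illustrated with plain Json.NET. This is only a sketch under stated assumptions: it is not NEST's converter code, and `DoubleParseSketch`, `Doc`, `ViaIntermediateStream` and `Direct` are hypothetical names made up for the illustration. It assumes the Newtonsoft.Json package is referenced.

```csharp
using System;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical names throughout; this is not NEST's converter code.
public static class DoubleParseSketch
{
    public class Doc
    {
        public string Name { get; set; }
    }

    // Two passes: the reader is first materialized into a JToken tree (parse 1),
    // written back out as bytes, and those bytes are then parsed again (parse 2).
    public static T ViaIntermediateStream<T>(JsonReader reader, JsonSerializer serializer)
    {
        JToken token = JToken.ReadFrom(reader);
        byte[] bytes = Encoding.UTF8.GetBytes(token.ToString(Formatting.None));
        using (var stream = new MemoryStream(bytes))
        using (var textReader = new StreamReader(stream))
        using (var jsonReader = new JsonTextReader(textReader))
        {
            return serializer.Deserialize<T>(jsonReader);
        }
    }

    // One pass: the serializer consumes the reader it was handed.
    public static T Direct<T>(JsonReader reader, JsonSerializer serializer)
    {
        return serializer.Deserialize<T>(reader);
    }

    public static void Main()
    {
        const string json = "{\"Name\":\"example\"}";
        var serializer = JsonSerializer.CreateDefault();

        using (var r1 = new JsonTextReader(new StringReader(json)))
        using (var r2 = new JsonTextReader(new StringReader(json)))
        {
            Doc twice = ViaIntermediateStream<Doc>(r1, serializer);
            Doc once = Direct<Doc>(r2, serializer);
            // Both paths should yield the same object; only the amount of work differs.
            Console.WriteLine(twice.Name == once.Name);
        }
    }
}
```

Both methods produce the same object; the first simply does the tokenizing work twice, which is the overhead the profiling above attributes to building the intermediate stream.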
@fritsduus Thank you for the detailed information. There is a cost associated with using One of the reasons for internalizing Json.NET in the client is so that it can be replaced with a faster serializer implementation going forward. As such, we wouldn't want to introduce APIs that expose Json.NET specific implementation details such as a We expect to start on the serializer work shortly, and have been looking at SpanJson and Utf8Json as potential candidates.
@russcam Thank you for your response. I think your long-term goal of replacing Json.NET with a faster serializer is very good. I was also looking at creating a SourceSerializer using Jil or Utf8Json, but then I realized that the internal cost of the SourceConverter couldn't really be removed by such a solution. I don't see a simple way to pass the Stream at the moment, and replacing the serializer looks like a larger task. I was aiming for a quick win with the suggested solution, as we need a fix for our problem before Black Friday.
We're using Utf8Json as the serializer/deserializer for the NEST client right now with great success (in our case, we saw up to a 10x improvement in deserialization performance for our workload).
Just a quick heads-up that we expect to switch to Utf8Json as the default internal serializer with 7.0. One nice thing that @russcam found is that the handoff of the stream to the user-defined serializer is now much, much faster.
@DSilence can you please provide an example of how you used Utf8Json as a serializer/deserializer for NEST?
@florin-nedelcu In our case, only a couple of heavy search requests required more performant deserialization, so for those I used the low-level client:

    var searchResult = await _elasticClient.LowLevel.SearchAsync<BytesResponse>(indexName, PostData.Serializable(searchDescriptor));
    var result = Utf8Json.JsonSerializer.Deserialize<ElasticPointResponse>(searchResult.Body);

If a full replacement is needed, I suggest referring to https://www.elastic.co/guide/en/elasticsearch/client/net-api/master/custom-serialization.html. Keep in mind that some internal types will require custom formatters for Utf8Json (e.g. geography).
@Mpdreamz when is 7.0 planned for release?
@SidHuntt As a policy, we don't give timelines for when releases will be out, I'm afraid. There's an open PR to switch over to Utf8Json, #3583, which we are looking to merge along with the other changes necessary for 7.0, so that we can publish a prerelease version of the 7.0 client with the new serializer and receive feedback on it. We're hoping to do that shortly.
7.0 now uses Utf8Json as the serialiser, as detailed on the GA release blog post: https://www.elastic.co/blog/nest-and-elasticsearch-net-7-0-now-ga
NEST/Elasticsearch.Net version:
5.6.2
6.1.0
Elasticsearch version:
5.5.1
6.2.3
Description of the problem including expected versus actual behavior:
In the process of upgrading our Elasticsearch setup from version 5.x to 6.x, we have noticed a performance impact that translates into slower response times in our service, which uses Elasticsearch as data storage. The slowdown is most visible when dealing with large documents with complex structures, e.g. GeoJSON.
The two Elasticsearch clusters perform equally when we test them directly via Apache ab, so I guess the performance hit is related to the NEST library.
I have done some performance testing using BenchmarkDotNet (on a simplified implementation described further down), where NEST v. 6 has a noticeably higher response time compared to v. 5.
I hope you can shed some light on why we are seeing this behaviour.
Thanks
Steps to reproduce:
Insert document https://gist.github.com/kaaresylow/ece0d2dd88175653fb4fa44710297d14