Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace internal serializer with utf8json #3583

Closed
wants to merge 179 commits into from

Conversation

russcam
Copy link
Contributor

@russcam russcam commented Mar 4, 2019

This PR replaces both the SimpleJson (Elasticsearch.Net) and JSON.Net (in NEST) based internal serializers with a serializer based on utf8json.

The serializer source has been included in Elasticsearch.Net to allow

  • further modification for the specific need of the client
  • mark all serializer types as internal

Research as part of #3493 demonstrates an immediate 2x performance improvement in serialization, and it is expected that this could be further improved by

  • using Span<T>, Memory<T> where possible
  • reusing ArrayPool<T>
  • cleaning and simplifying the interface between serializer and clients, now that the serializer source is included.

All unit tests pass in this PR. One test, ExtendingNestTypes, is currently skipped until a general serialization routine is in place. This PR is opened now because there are further changes coming related to 7.x, so the preference is to merge this work in now and fix the remaining failing integration tests which at point of opening are:

Failed collections:
 - ReadOnly: Tests.Aggregations.Bucket.Composite.CompositeAggregationUsageTests
 - ReadOnly: Tests.Aggregations.Metric.TopHits.TopHitsAggregationUsageTests
 - ReadOnly: Tests.Search.Request.NestedSortUsageTests
 - ReadOnly: Tests.Search.Request.SearchAfterUsageTests
 - ReadOnly: Tests.Search.Request.SortUsageTests
 - TimeSeries: Tests.XPack.Rollup.RollupJobCrudTests
 - XPack: Tests.XPack.License.GetTrialLicenseStatus.GetTrialLicenseStatusApiTests
 - XPack: Tests.XPack.License.StartTrialLicense.StartTrialLicenseApiTests
 - XPack: Tests.XPack.Migration.MigrationUpgrade.InvalidMigrationUpgradeApiTests
---Reproduce: -----
build.bat seed:51817 integrate 6.4.2 "timeseries,xpack,readonly" "compositeaggregation,tophitsaggregation,nestedsort,searchafter,sort,gettriallicensestatus,starttriallicense,invalidmigrationupgrade,rollupjobcrud"
--------
=== TEST EXECUTION SUMMARY ===
   Tests  Total: 1458, Errors: 0, Failed: 29, Skipped: 13, Time: 689.411s

russcam and others added 30 commits November 13, 2018 11:02
This commit adds an overload to MemoryStreamFactory that accepts an
index and count, to allow a stream to be created from an ArraySegment<byte>
This commit is a minimum viable prototype of using utf8json serializer to
deserializer an ISearchResponse<T>. Utf8Json unit test has been used to
test deserialization.

Notable points:

1. Hit<T> requires a custom formatter to be resolved at the
IJsonFormatterResolver level because it contains a generic type property
whose formatter, SourceFormatter<T> cannot be resolved using JsonFormatterAttribute. If
this can be resolved, then it would be possible to attribute Hit<T> with JsonFormatterAttribute
and typeof(HitFormatter<>), and have typeof(SourceFormatter<>) resolved for the _source field.

  For now instead, initialize the SourceFormatter<T> inside the HitFormatter<T> ctor.
2. Implementation does not handle different field casings
3. HitFormatter<T> avoids allocations for property names by using AutomataDictionary. This dictionary
lives outside of the generic HitFormatter<T> to avoid creating an instance for each T.
4. Both JsonReader and JsonWriter are structs passed by ref, so cannot be captured inside of local
functions or lambda expression bodies. Would need to be passed as a ref parameter. An example
is JoinFieldFormatter's Serialize method.
This commit is a minimum viable prototype of using utf8json serializer to
serialize an ISearchRequest. Utf8Json unit tests have been used to
test serialization of MatchQuery and BoolQuery.

Notable points:

1. utf8json does not have a similar concept to [JsonObject(MemberSerialization.OptIn)] to
only serialize those members that have been explicitly attributed with DataMemberAttribute.
This is something that would ideally be needed as it is cumbersome to set [IgnoreDataMember]
on all types that should be ignored.
2. ConnectionSettings is retrieved by casting IJsonFormatterResolver to a known concrete
implementation that exposes them a s a property. Not ideal, but it works.
3. utf8json does not make a distinction between an integer token and a float token as Json.NET
does. This is not much of a problem, since the bytes for the token can be inspected to determine
if they contain a decimal point, and use utf8json's internals to deserialize accordingly. Also, this
is needed only in cases where an integer/double distinction is necessary. See FuzzinessFormatter
for an example.
4. The equivalent to JsonConverter, IJsonFormatter<T> only has a generic variant. In several places
in the client, we may serialize using the an interface, but deserialize using the concrete implementation.
This is handled by ConcreteInterfaceFormatter<TConcrete, TInterface>, where the formatter
is IJsonFormatter<TInterface>. An interesting case is when the concrete type should be serialized
as the interface; in such scenarios, end up with two formatters, one for the concrete type and one
for the interface, where formatters reference the others' serialize/deserialize implementations. See
QueryContainerFormatter and QueryContainerInterfaceFormatter for examples.
This branch includes both the following PRS opened against upstream
neuecc/Utf8Json#130
neuecc/Utf8Json#131
Custom formatters are no longer required, since
neuecc/Utf8Json#130 allows a generic formatter
to be constructed for a property
Json.NET completely removed from NEST and replaced with Utf8Json with no
compilation errors.

First unit test run: Total: 3137, Errors: 0, Failed: 1155, Skipped: 12, Time: 9.655s
This commit fixes the serialization of property mappings.
IProperties is a dictionary of PropertyName -> IProperty,
where IProperty is a concrete instance of a derived property type
that implements an interface derived from IProperty. In such
a setup, each IProperty should be serialized with a generic formatter
typed to the derived interface type to ensure all properties marked with
DataMemberAttribute are serialized.
This commit updates SerializeUsingClientDefault to be generic.
When the type is boxed as an object, the catch-all
IJsonFormatter<object> will be resolved.
This commit fixes the geo unit tests, except GeoShapeQuery tests.
Doubles that are whole integer values are serialized without
the fractional value.
Remove superfluous ReadNext() call
Types are instantiated inside of an IL generated formatter,
therefore need to be made public, sicne the generated formatter
lives in a separate assembly.
Consume the JsonToken when it is a string,
as it represented auto fuzziness
This commit adds Utf8Json source to the project and removes
it as a git submodule
This commit moves Utf8Json source into the Elasticsearch.Net project and removes Utf8Json as a separate referenced project. Include license
header in every Utf8Json source file.

Remove StringEnumAttribute from Nest and use StringEnumAttribute from Elasticsearch.Net for all enums that must be serialized as strings.

Remove Utf8Json signing key.
This commit changes the lazy deserialization of LazyDocument to Hit<Document> to use
IJsonFormatterResolver, because Hit<> must be deserialized by the internal serializer
This commit internalizes all IJsonFormatter<T> types in Elasticsearch.Net and NEST.
Sign dynamically generated assemblies with the public key of Elasticsearch.Net, and
allow internals of both Elasticsearch.Net and NEST to be visible to generated assemblies.
This commit adds the package references of Utf8Json to Elasticsearch.Net nuspec file.
Also fix Elasticsearch.Net version dependency in Nest to a specific version.
This commit internalizes all the utf8json types in Elasticsearch.Net. Add InternalsVisibleToAttribute
to allow generated assemblies access to internal types.

}

[AttributeUsage(AttributeTargets.Interface, AllowMultiple = false, Inherited = false)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small summary would be good for this one since its used in our domain poco's quite a bit

Copy link
Member

@Mpdreamz Mpdreamz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, needs rebasing against 7.x.

Moving all the json converters over to formatters is a monumental effort. @russcam 🎉 🎈 🍰 💯

The changes needed against our domain POCO's and descriptors is minimal.

Several other items have been reviewed on our weekly syncs already.

It'd be good to create a follow up meta ticket post merge with known outstanding items and areas with room for improvements.

@russcam russcam changed the base branch from master to 7.x March 22, 2019 03:17
@russcam
Copy link
Contributor Author

russcam commented Mar 22, 2019

changed base branch from master to 7.x. Will resolve conflicts now

@Mpdreamz
Copy link
Member

Merged as part of #3608

@Mpdreamz Mpdreamz closed this Mar 25, 2019
@Mpdreamz Mpdreamz deleted the feature/utf8json-serializer branch March 25, 2019 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants