
Conversation


@wravery wravery commented Dec 14, 2020

Fixes #126

This branch incorporates most of the changes from PR #126. Rather than building a separate schema structure specifically for validation, it generates an optimized representation of the schema that is shared between Validation and Introspection. The schemagen tool still has a --no-introspection option, but that is now mostly for cases where you want Introspection disabled outright, rather than just for the performance benefit; the overhead of re-enabling Introspection on top of the shared schema is minimal.

There's more to do to optimize allocations in the peg::graphql grammar and parsing, but I think we'll target a separate PR for that.

@wravery wravery merged commit cde90d7 into microsoft:master Dec 14, 2020
@wravery wravery deleted the merge-cached-validation branch December 14, 2020 06:09
```cpp
MapData copy {};

copy.map.reserve(other.size());
for (const auto& entry : other)
```
Contributor:

Did you test whether this copy is triggered, and how frequently?

In my PR I did some tests and found that it wasn't needed if we copy-on-write. That's why I had a shared_ptr<> for the internal data: a copy then became just a reference, and during modification I checked unique() and, if it returned false, copied there on demand (search for willModify).

After I did that, it turned out that no copies were being done.

Contributor Author:

I suspect that switching to a result type eliminated all of the use cases for copying. Maybe it can be a move-only type now; I'll try deleting the copy constructor/assignment operator and see.

Contributor Author:

Ah, no. It does get copied. It's used by variables and directives, which both get passed around as a Map, but neither of those is used in the sample for the benchmark.

Contributor:

Do you think these need to be copied to the resolvers? Is there any use case where they would be changed?

Anyway, as you can see in my PR, the COW is pretty simple to do. Just use a shared_ptr and check unique() before doing any modification (e.g. emplace_back()); if it's shared, create a copy before modifying.

```cpp
ListType copy {};

copy.reserve(other.size());
for (size_t i = 0; i < other.size(); ++i)
```
Contributor:

Same here: I made this one a shared_ptr with COW, and in the end it was never copied.

```cpp
}

case Type::String:
	_data = { StringData { other.get<StringType>(), other.maybe_enum() } };
```
Contributor:

I also did the COW here.


```cpp
wrappedParams.errorPath.push(size_t { 0 });
children.reserve(wrappedResult.size());
wrappedParams.errorPath.push_back(size_t { 0 });
```
Contributor:

You should avoid this part: you modify the list, then force a copy of the modified version (ResolverParams(wrappedParams)), which is confusing and makes the code uglier.

Take a look at my PR: there I just keep a reference to the previous SelectionSetParams, so errorPath is not even a list/queue/vector anymore and it becomes copy-free. Each SelectionSetParams is a node in an implicit linked list. You only pay the price on errors, which we expect not to be the common case.

The code also becomes simpler: no need to push, set, and reset the index.

Contributor Author:

I have been experimenting with a version of this along with the mempool in https://github.com/wravery/cppgraphqlgen/tree/parser-mempool. The thing I don't particularly like about putting the link in the SelectionSetParams is that there are places in client code (e.g. the EdgeConstraints template in TodayMock) which use move semantics on just the errorPath, which then outlives the SelectionSetParams. So if it relies on a reference to the parent SelectionSetParams struct, or even a reference to another field_path, the memory might have been freed by the time it tries to dereference the moved copy. It's not trivially relocatable from a single node; it needs to do a deep copy or otherwise keep a separate allocation alive for the parent.

To get around this I tried wrapping the segments in a std::shared_ptr and building a linked list from each segment to its parent. That was a little slower, and it used more memory. Most recently I tried allocating them on the stack and only copying the parent into a shared_ptr if a child is allocated on the stack. That seems slightly better, but it's still hitting std::shared_ptr quite a bit.

Contributor:

I think we're paying a huge price to cope with this EdgeConstraints usage...

Actually, why do we need this to be handled by that class? Shouldn't the class just throw? I recall some parts of the code detected there was no path/location information and would properly enhance the errors before copying them.

shared_ptr doesn't seem like the way to go, in particular if we keep changing the errorPath as currently done by the list converter (something I'd also avoid).

Contributor Author:

I've been worried about breaking backwards compatibility, figuring that existing external code might have implemented this like the sample. But I went back and checked the history for the sample, and I already modified it to add this reference in a minor version update (https://github.com/microsoft/cppgraphqlgen/releases/tag/v3.1.0). So maybe that's enough of a precedent to change this and drop the new code from EdgeConstraints. As you said, it should be able to just throw the exception without decoration and let the shared resolver code fill it in.

Contributor Author:

It should noticeably break the build, though, if it's still used this way, since otherwise users will get a runtime error and crash.

```cpp
for (auto& error : value.errors)
{
	data.emplace_back(std::move(entry.second));
	document.errors.push_back(std::move(error));
```
Contributor:

Since you're changing the types, it's worth looking into making errors a list so you can splice(), which is simpler (one line) and cheaper (although errors are not the hot path).

Contributor Author:

For large enough nodes, a std::list may be worth it. But otherwise you pay a premium in memory usage per node to store the extra next/prev link pointers. It's also bad from a data-locality/cache perspective, since every node is allocated separately, so I generally avoid using them. Some more background here: https://www.stroustrup.com/bs_faq.html#list.

Contributor:

Yeah, I get the difference between list/vector, and even variations such as inlists (inline lists, where each element is a linked-list node per se).

Really, the wasted memory or locality does not matter for handling errors, even more so compared to the current situation where we're creating a new array and moving everything -- or before, where those were actually response::Value maps, with the map + array and strings getting copied. We already improved a lot; the move to list/splice would basically save a minimum while making the code simpler (no more reserve/loop/...).

Contributor Author:

That's true, propagating errors happens a lot more than the final packaging into a response::Value, and std::list::splice can do that in constant time.

```cpp
};

using ValidateTypeKinds = std::map<std::string_view, introspection::TypeKind>;
using ValidateTypes = std::map<std::string_view, ValidateType>;
```
Contributor:

Any reason to keep using map instead of unordered_map?

Contributor Author:

I'm actually thinking of replacing them all with sorted vectors: https://youtu.be/fHNmRkzxHWs?t=2818

Contributor Author:

The key with vectors is to avoid reallocating on append, by reserving or resizing to the exact size you need, and to avoid inserting/removing anywhere other than at the end. For most of these objects that should be doable, since they're static for the lifetime of the service.

Contributor:

If you're moving to sorted vectors and looking into static data... why not target std::array? There is a generator tool to help with that, and then we can try to get more stuff into the .rodata section of the binaries.

There would be no need for the vector helpers.

The binary search is not as fast as an unordered map, but I think it doesn't matter much for these use cases. If it does, we can look into integrating https://www.gnu.org/software/gperf/manual/gperf.html or an equivalent later.

Contributor Author:

> if you're moving to sorted and looking into static... why not target array? There is a generator tool to help that, then we can try to get more stuff into .ro-data section of the binaries.

Most of them still have a size which varies, so they couldn't be built into the graphqlservice library directly. The generated code would need to pass a span/range into graphqlservice to get dynamic sizing, backed by a std::array with a fixed size in the generated code.

C++20 adds span/range, so that would make sense whenever cppgraphqlgen moves to that language version (I'm tentatively planning on doing that for v4.0). There's also the GSL (C++ Core Guidelines Support Library), which adds lookalike definitions of std::span<>, so that's another option that makes C++20 adoption easier down the road. But I'm reluctant to add another build dependency unless/until it's absolutely worth it.

> The binary search is not as fast as an unordered map, but I think it doesn't matter much for these use cases. If so we can look into integrating https://www.gnu.org/software/gperf/manual/gperf.html or an equivalent later.

I'm not so worried about the quality of the std::hash implementation for std::string_view, and integers/pointers can just hash to their own value.

The multiple levels of indirection and poor data locality (cache misses) in std::unordered_map overwhelm the savings on lookup compared to a binary search, until you get to a very large number of items. Anecdotal benchmarks I've heard say you need at least 100 entries to make std::unordered_map break even, but really we should do some of our own benchmarking/profiling.

You mentioned you are dealing with large numbers of fields per object, so maybe you're more likely to hit that break even point than I expected. If you want to try comparing them for your scenario, take a look at fa772a1 where I already did this for the ResolverMap generated for Object. If you locally change where it declares the ResolverMap type as a std::vector<std::pair<>> back to std::unordered_map and change the lookup in SelectionVisitor::visitField back to a call to find, you can see if/what difference it makes.

@barbieri (Contributor):

@wravery thanks for merging this, it did improve a lot; however, my original PR was producing something reasonably better.

Massif comparing master (cde90d7) and my PR (a142c6c) shows:

| Code | Largest | Smallest |
| --- | --- | --- |
| cde90d7 | 156,328 | 145,808 |
| a142c6c | 122,600 | 106,664 |
| Difference | +27.5% | +36.7% |

DHAT also points out to big differences:

| Code | t-gmax | t-end |
| --- | --- | --- |
| cde90d7 | 7,377,431 | 699,095,217 |
| a142c6c | 6,307,022 | 312,019,366 |
| Difference | +16.9% | +124% |

Okay, the "t-end" is not that meaningful, but the total allocated at peak is.

DHAT points out that a lot of your memory allocation happens in field_path allocation; in particular, GraphQLService.h:139 leads to errorPath. Take a look at my commits https://github.com/profusion/cppgraphqlgen/commit/a142c6c9035ee46123127e9ad0764614952c9055, https://github.com/profusion/cppgraphqlgen/commit/7eb32e49c67afab87ab71b1c8f0d240da3a68b4e and https://github.com/profusion/cppgraphqlgen/commit/25afd869f8b15caf2ecc406861a5787cce745678

I'd also point out the conversion optimizations; they're clean and faster: https://github.com/profusion/cppgraphqlgen/commit/cb52a63bfcb7c73bdc000eed4e063d803e3a83f7 and https://github.com/profusion/cppgraphqlgen/commit/7d746b594f2696e9b215990c7c486e2475272f84 (this last one is more important to people using std::launch::async, as we're still doing).
