Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NullReferenceException #1950

Closed
STRATZ-Ken opened this issue Oct 26, 2020 · 29 comments
Closed

NullReferenceException #1950

STRATZ-Ken opened this issue Oct 26, 2020 · 29 comments

Comments

@STRATZ-Ken
Copy link

Description

I have a very large GraphQL API server. Upon fixing some issues with a Sentry reporting agent, I am finally getting my errors in Sentry. I recently found a NullReferenceException which is coming with little to know information.

System.NullReferenceException: Object reference not set to an instance of an object.
  File "/home/runner/work/graphql-dotnet/graphql-dotnet/src/GraphQL/Execution/ExecutionHelper.cs", line 68, col 17, in GetFieldDefinition
  File "/home/runner/work/graphql-dotnet/graphql-dotnet/src/GraphQL/Execution/ExecutionStrategy.cs", line 87, col 9, in SetSubFieldNodes
  File "/home/runner/work/graphql-dotnet/graphql-dotnet/src/GraphQL/Execution/ExecutionStrategy.cs", line 280, col 9, in CompleteNode

Steps to reproduce

I am not really sure. Due to my logging being broken for so long, I am not sure when this started. I was on the pre-release of 3.0.0, which it occurred. I just upgraded to 3.1.0 and it still occurs.

Environment

IIS
.Net Core 3.1
Entity Framework
WebSockets (GraphQL)
Windows Server 2012 R2.

@STRATZ-Ken
Copy link
Author

Please close this ticket, it was posted in the wrong GitHub Repo. I have moved this to the server side, where it belongs. You may delete as well. My mistake.

@sungam3r
Copy link
Member

From the stacktrace I see that the error belongs to this repo.

@STRATZ-Ken STRATZ-Ken reopened this Oct 27, 2020
@sungam3r
Copy link
Member

@STRATZ-Ken Could you please provide a minimum repro?

@sungam3r sungam3r added the needs investigation The described problem requires additional time to study label Oct 28, 2020
@STRATZ-Ken
Copy link
Author

@sungam3r Its really not possible. I cannot even replicate it. It seems to start happening after about an hour or two.

Some food for thought. I did make a pretty big change. I use Entity Framework with DBPoolingContext. This requires all the types to be Scoped if I do Dependency Injection. Which led me to #648 . After reading that performance could be hurt, I changed my services to Singleton, and then used the "hack" in #648 provided to get my DBContext inside each of the Types for GraphQL.

After about 12 hours, I would have a massive memory leak from using Scoped services, plus overall the performance was garbage. Since switching to Singleton about 18 hours ago, I have not had a single outage or missed API hit. My RAM is stead around 8-12GB and I am doing nearly 12,000 calls per minute.

I don't know if the above could be the issue, but it was worth noting.

@Shane32
Copy link
Member

Shane32 commented Oct 28, 2020

Just wondering:

  • which of the many "hacks" available are you using to get the db context?
  • Are you using a serial execution strategy now? Previously?
  • Did you make these changes with the upgrade to 3.0?
  • How did you access the dbcontext with the old version of graphql? No DI?
  • How many cores is used to attain this performance?
  • Are you utilizing any type of caching for document parsing?
  • Is this a published graphql API or for internal/private/controlled use (in which case the queries are probably always the same)?

@STRATZ-Ken
Copy link
Author

@Shane32

protected override IExecutionStrategy SelectExecutionStrategy(ExecutionContext context)
        {
            if (context.Operation.OperationType == OperationType.Query)
            {
                return new SerialExecutionStrategy();
            }
            return base.SelectExecutionStrategy(context);
        }
  • No, I was on 3.1 Pre-release, and the latest build, both had this issue. Even in 2.X I had this issue.
  • I used DI and fed the DBContext down the type change from the queries. Though, I was forced to have Scoped Types/Queries with the Scoped DBContext.
  • Our servers have 36 cores.
  • I am not sure exactly what this means. Can you elaborate?
  • You can find the documentation on my API here: https://api.stratz.com/graphiql/ It is public. The queries are pretty static, minus what ever parameters the user sends in.

I hope this helps.

@sungam3r
Copy link
Member

In case of memory leak you can try any memory profiler, for example JetBrains dotMemery. It has 10 or 15 days free trial period.

@STRATZ-Ken
Copy link
Author

Yeah, This is exactly what I did. I found that only calling GraphQL endpoints, there after a while there was 14million 'objects'. After diving deeper, the objects were based on GraphQL. Which led me to #648

@sungam3r
Copy link
Member

If you can already see what objects are in memory, then it will not be difficult to find the cause. If you can't, you can send a dump.

@Shane32
Copy link
Member

Shane32 commented Oct 28, 2020

Are you utilizing any type of caching for document parsing?

I am not sure exactly what this means. Can you elaborate?

The DocumentExecuter accepts a Query and/or a Document options to process a query against. If you do not provide a Document, then the Query is passed to the parser (separate repo) to be converted into an AST (Abstract Syntax Tree) Document. The Document is a read-only object which is not affected by variables or the schema, but simply and only the Query. So you can cache a query (string) and not have to convert it into a Document for similar queries. I have logic that runs like this:

  1. Create a static MemoryCache instance with a maximum size of, say, 50MB
  2. For each request, check if the Query is contained within the MemoryCache
  3. If it is not, use the parser to convert the Query into a Document. Then run a DocumentValidator against the Document with the default set of rules (CoreRules) to verify the query is valid. Store the Document in the cache with the Query as the key, the Size set to Query.Length, and a sliding expiration of an hour.
  4. Execute the query with the parsed Document, with no rules (Enumerable.Empty<IValidationRule>()).

This should save some CPU time per request if your queries are pretty repetitive, as it eliminates both parsing and validation of the incoming queries. It also reduces the number of allocations by quite a bit. If it's being hit by many different people's implementations, then you might find a wide variety of queries stored in the cache, even for similar queries. So you might need to allocate a large buffer -- but it should be effective.

Note that if you have any validation rules that inspect the incoming variables (none are included by default), then those would have to run for each document.

@STRATZ-Ken
Copy link
Author

2020-10-28 12:51:13.4817|ERROR|CaptainJackWebApi.Startup|System.NullReferenceException: Object reference not set to an instance of an object.
   at CaptainJackWebApi.Models.Helpers.ScopedObjectGraphType`1.GetService[A]() in C:\TeamCity\buildAgent\work\74a25ffdc2b87a63\CaptainJackWebApi\Models\Helpers\ScopedObjectGraphType.cs:line 18
   at CaptainJackWebApi.Models.Types.SteamAccountType.<>c__DisplayClass0_0.<.ctor>b__23(IEnumerable`1 ids) in C:\TeamCity\buildAgent\work\74a25ffdc2b87a63\CaptainJackWebApi\Models\Types\SteamAccountType.cs:line 38
   at GraphQL.DataLoader.DataLoaderContextExtensions.<>c__DisplayClass1_0`2.<WrapNonCancellableFunc>b__0(T arg, CancellationToken cancellationToken) in /home/runner/work/graphql-dotnet/graphql-dotnet/src/GraphQL/DataLoader/DataLoaderContextExtensions.cs:line 13
   at GraphQL.DataLoader.BatchDataLoader`2.FetchAsync(IEnumerable`1 list, CancellationToken cancellationToken) in /home/runner/work/graphql-dotnet/graphql-dotnet/src/GraphQL/DataLoader/BatchDataLoader.cs:line 65
   at GraphQL.DataLoader.DataLoaderPair`2.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken) in /home/runner/work/graphql-dotnet/graphql-dotnet/src/GraphQL/DataLoader/DataLoaderPair.cs:line 77
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /home/runner/work/graphql-dotnet/graphql-dotnet/src/GraphQL/Execution/ExecutionStrategy.cs:line 239|Object reference not set to an instance of an object.|CaptainJackWebApi.Startup+<>c__DisplayClass8_0.<ConfigureServices>b__25|359|<CompleteDataLoaderNodeAsync>d__8.MoveNext => ExecutionStrategy.ProcessNodeUnhandledException => <>c__DisplayClass8_0.<ConfigureServices>b__25

This is the latest error I am getting.

Line 18 is :

return _httpContextAccessor.HttpContext.RequestServices.GetRequiredService<A>();

Which is from the link above : #648 (comment)

@Shane32
Copy link
Member

Shane32 commented Oct 28, 2020

Probably because you are missing an await somewhere, or you are running an async method that does not return a Task (e.g. async void MyMethod(), and the context is now null. I had this happen a couple days ago.

@STRATZ-Ken
Copy link
Author

STRATZ-Ken commented Oct 28, 2020

var loader = accessor.Context.GetOrAddBatchLoader<long, ProSteamAccount>("ProSteamAccount", (ids) => GetService<CaptainJackContext>().FindProSteamAccount());

public static async Task<IDictionary<long, ProSteamAccount>> FindProSteamAccount(this CaptainJackContext context)
{
 // Do Work
}

@Shane32
Copy link
Member

Shane32 commented Oct 28, 2020

You do realize that with 3.0, you can just call context.RequestServices.GetRequiredService<A>() and don't need the patch you described? Further, you can write a couple-line extension method to put GetRequiredService directly on the context. This ignores the IHttpContextAccessor entirely -- however, if what I described is indeed the problem, then you'll have weird issues because Asp.Net Core sets the context properties back to null after it's dropped out of scope. It might even re-use the context, but I don't know about that. And, there's nothing wrong with the patch you're using now. It just isn't necessary anymore.

@STRATZ-Ken
Copy link
Author

@Shane32 Totally correct, but to me it looks cleaner. At least this stack trace gives a bit more information so I know exactly what call is having issues.

@Shane32
Copy link
Member

Shane32 commented Oct 28, 2020

Is the problem intermittent or reproducable?

@STRATZ-Ken
Copy link
Author

Intermittent. I cannot produce the issue locally. Though, it may be a specific command that someone globally is running, I tried 3 of 4 different types that called that object, none produced the error locally.

@Shane32
Copy link
Member

Shane32 commented Oct 28, 2020

Well, my primary suggestion is to look hard for an async that doesn't get awaited somewhere in the chain. The only other thing, which also happened to me, is if somehow the data loader accessors are singletons instead of scoped. I heavily use the dataloaders in my code and put in tons of time on them, but don't use the data loader accessors. So I could look at that again.
But my gut feeling is that it's not the problem, if you've followed the docs.

@sungam3r
Copy link
Member

If it is not, use the parser to convert the Query into a Document. Then run a DocumentValidator against the Document with the default set of rules (CoreRules) to verify the query is valid. Store the Document in the cache with the Query as the key, the Size set to Query.Length, and a sliding expiration of an hour.

Query as the key does not work itself. There are also variables and operation name so key is actually should be a tuple (query, vars, operation).

@Shane32
Copy link
Member

Shane32 commented Oct 28, 2020

Query as the key does not work itself.

Sure it does. I'm not caching results. I'm only caching the query-to-AST conversion and validation rule processing -- neither of which are based on the operation name or variables.

Note that validation rules can inspect variables, but none of the default validation rules do so. Any custom validation rule that inspects variables can be executed for each request, after the AST has been cached and it has passed the other validation rules.

@Shane32
Copy link
Member

Shane32 commented Oct 28, 2020

I have a separate caching mechanism that detects specific predefined queries that are requested without variables, and responds with a predefined response which has already been exectued, serialized and gzip/brotli compressed.

@sungam3r
Copy link
Member

Sure it does. I'm not caching results. I'm only caching the query-to-AST conversion and validation rule processing -- neither of which are based on the operation name or variable

OK. Good.

Note that validation rules can inspect variables, but none of the default validation rules do so.

Wow. Unexpectedly for me. Then it's clear.

@sungam3r
Copy link
Member

Note that validation rules can inspect variables, but none of the default validation rules do so.

@Shane32 Just a reminder from me to check this since I have an impession that it's not true.

@STRATZ-Ken Did you manage to solve your problem?

@sungam3r sungam3r added needs confirmation The problem is most likely resolved and requires verification by the author and removed needs investigation The described problem requires additional time to study labels Mar 11, 2021
@Shane32
Copy link
Member

Shane32 commented Mar 12, 2021

@Shane32 Just a reminder from me to check this since I have an impession that it's not true.

I just did a cross-reference search for ValidationContext.Inputs, which only gives a single result: in the sample InputFieldsAndArgumentsOfCorrectLength custom validation rule, which we knew about when we wrote it. Also, the same custom rule is the only implementation of IVariableVisitorProvider.

@Shane32
Copy link
Member

Shane32 commented Mar 12, 2021

Other rules may check the types of the variables, but that is stored within the document as VariableDefinitions. It is assume, from the validation rules perspective, that the variables are able to successfully parse to a valid value of the type specified within the document. Variable coercion may fail, of course, but if the document validation has passed, then any valid inputs would represent an executable and valid document.

@sungam3r
Copy link
Member

I seem to understand that I confused the validation of types with validation of values. InputFieldsAndArgumentsOfCorrectLength indeed the only rule that validates variable values. The remaining rules validates the document, and the actual values of variables can cause problems only when executing already validated document.

I think I understand my concern now. Look,

 if (saveInCache && validationResult.IsValid)
  {
      _documentCache[options.Query] = document;
  }

Validation rules can check variables so validationResult.IsValid may be false but GraphQL.Language.AST.Document does not contain variables so now DocumentExecuter does not cache the syntactically valid GraphQL document only due to the fact that the GraphQL request contains bad variables.

@sungam3r
Copy link
Member

Now tracked in #2390. Still waiting for comments from @STRATZ-Ken about initial problem.

@STRATZ-Ken
Copy link
Author

Currently, I am not having the issue. I am not exactly sure what made it go away. I do believe it has to do with how you call data through your Repo layers with Entity Framework and GraphQL. I really wish I could say more on the topic.

@sungam3r
Copy link
Member

OK. Closing for now.

@sungam3r sungam3r removed the needs confirmation The problem is most likely resolved and requires verification by the author label Mar 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants