Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"This operation is only allowed using a successfully authenticated context"" #2593

Open
barclayadam opened this issue Aug 29, 2019 · 3 comments

Comments

@barclayadam
Copy link

commented Aug 29, 2019

Hi,

We are trying to get to the bottom of what seems like an almost random occurrence of an exception. We are unable to, at the moment, get a reliable reproduction in our own code, nor on a small repro project.

The exception happens in NpgsqlWriteBuffer, on the underlying SslStream (note we use Marten to handle our interactions with npgsql):

Marten.Exceptions.MartenCommandException: Marten Command Failure:$
select d.data, d.id, d.mt_version from public.mt_doc_mailmerges as d where CAST(d.data ->> 'Processed' as boolean) != :arg0$
$
Exception while writing to stream ---> Npgsql.NpgsqlException: Exception while writing to stream ---> System.InvalidOperationException: This operation is only allowed using a successfully authenticated context.
   at void System.Net.Security.SslState.CheckThrow(bool authSuccessCheck, bool shutdownCheck)
   at _SslStream System.Net.Security.SslState.get_SecureStream()
   at IAsyncResult System.Net.Security.SslStream.BeginWrite(byte[] buffer, int offset, int count, AsyncCallback asyncCallback, object asyncState)
   at Task System.IO.Stream.BeginEndWriteAsync(byte[] buffer, int offset, int count)+(Stream stream, ReadWriteParameters args, AsyncCallback callback, object state) => { }
   at Task<TResult> System.Threading.Tasks.TaskFactory<TResult>.FromAsyncTrim<TInstance, TArgs>(TInstance thisRef, TArgs args, Func<TInstance, TArgs, AsyncCallback, object, IAsyncResult> beginMethod, Func<TInstance, IAsyncResult, TResult> endMethod)
   at Task System.IO.Stream.BeginEndWriteAsync(byte[] buffer, int offset, int count)
   at Task System.IO.Stream.WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
   at Task System.IO.Stream.WriteAsync(byte[] buffer, int offset, int count)
   at async Task Npgsql.NpgsqlWriteBuffer.Flush(bool async)
   --- End of inner exception stack trace ---
   at async Task Npgsql.NpgsqlWriteBuffer.Flush(bool async)
   at async Task Npgsql.NpgsqlCommand.SendExecute(bool async)
   at async ValueTask<DbDataReader> Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior, bool async, CancellationToken cancellationToken)
   at void Marten.Services.CommandRunnerExtensions+<>c__DisplayClass3_0<T>+<<FetchAsync>b__0>d.MoveNext()
   at void Marten.Services.ManagedConnection+<>c__DisplayClass41_0<T>+<<ExecuteAsync>b__0>d.MoveNext()
   at async Task<T> Marten.Services.ManagedConnection.ExecuteAsync<T>(NpgsqlCommand cmd, Func<NpgsqlCommand, CancellationToken, Task<T>> func, CancellationToken token)
   --- End of inner exception stack trace ---
   at void Marten.Services.ManagedConnection.handleCommandException(NpgsqlCommand cmd, Exception e)
   at async Task<T> Marten.Services.ManagedConnection.ExecuteAsync<T>(NpgsqlCommand cmd, Func<NpgsqlCommand, CancellationToken, Task<T>> func, CancellationToken token)
   at async Task<T> Marten.Services.CommandRunnerExtensions.FetchAsync<T>(IManagedConnection runner, IQueryHandler<T> handler, IIdentityMap map, QueryStatistics stats, ITenant tenant, CancellationToken token)
   at async Task<IEnumerable<MailMergeEntity>> 

The issue does not happen at any particular place in our code, it can happen at any point when a query or command is being executed against Postgres.

The only thing we have been able to determine is it will happen more often if one part the request we have been testing takes a long time when connecting to another 3rd party (i.e. disable that service and reduce overall latency of a single request by ~10s and this becomes much harder, if not impossible, to reproduce).

If we put a breakpoint logging point in Flush in NpgsqlWriteBuffer we can see that Underlying.IsAuthenticated turns from true to false and then fails. The underlying reason for the failure is the SslStream "loses" it's authentication, but I do not understand why.

Further technical details

Npgsql version: 4.0.9
PostgreSQL version: 10.9 (hosted in Azure)
Operating system: Windows 10

@roji

This comment has been minimized.

Copy link
Member

commented Aug 29, 2019

This is indeed odd.

First, have you looked at the server-side PostgreSQL logs to see if any messages are logged around the failure time? That might provide some clues.

One thought is that perhaps the connection is simply lost/broken, and this is manifesting as an SSL/TLS issue (although it really isn't). If so, depending on the root cause you might be able to make it go away by turning on keepalive.

Finally, the only thing this brings to mind is the infamous SSL renegotiation feature, where the SSL stack would periodically initiate renegotiations. This feature has been disabled in PostgreSQL for quite a while, but can you please post exactly which server of PostgreSQL you're connecting to?

@barclayadam

This comment has been minimized.

Copy link
Author

commented Sep 2, 2019

I've not looked at the PostgresSQL logs to help track this down, and TBH without looking not sure how much access Azure gives for the managed instances.

I do not have a reproduction, but I do believe I know what the issue was (at least sort of). We grabbed our connections from StructureMap, with most of the connections being managed by a nested container that meant at the end of each request it would be disposed.

We had one instance in an attribute that used a ServiceLocator, which was wired with the root container and therefore did not have the correct disposal semantics.

The "3rd party taking a long time" bit was a red herring. We disabled the component, which happened to also remove the problematic instantiation and non-explicit disposal of a connection.

So, in short, this seems to manifest if you have connections being opened and closed correctly + a handful NOT being disposed of correctly (although I believe they would generally have been cleared up after a period of time as they were IDLE).

So whilst this was caused by poor usage of the API, the manifestation of it is very odd and definitely does not lead you down the right path. Not sure if this is something that could be improved on in the library? Either handling the exception and throwing a more detailed one with possible reasons, or by handling the connections that are not disposed of explictly differently?

BTW the Postgres server is an Azure managed instance, with the version string of PostgresSQL 10.9, Compiled by Visual C++ build 1800, 64-bit

@roji

This comment has been minimized.

Copy link
Member

commented Sep 2, 2019

@barclayadam thanks for investigating and reporting back, good to hear it wasn't an issue in the driver itself.

Now that you understand exactly what happened, any chance you could post a small code sample which shows exactly how to reproduce this? Npgsql definitely does have some safeguards against incorrect user usage, but it's far from perfect and can indeed be improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.