Running multiple Bus tests causes error code 3 #151
Similar issue with req/rep taken in parallel. |
Ditto survey. |
So I'm not sure what "error code 3" is, as you're not running my verbatim test programs. (NNG error code 3 means "invalid", which can be many different things.) Having said that, some of these tests make use of "fixed" (as in hard-coded) addresses and will definitely not play well with other instances of the same test running. In some cases, even other instances of different tests won't play well, unless parameters are supplied to the test to select non-conflicting port numbers. (With IPC we don't have port numbers, but we use numeric values to give each test a unique address.) I'm fairly confident that any poor interaction between different test instances is just this address conflict. (Inproc should be 100% free of these though.) |
My tests are not using hard-coded addresses. Each individual unit test asks for a different address, with the following strategy:

```csharp
internal static class PortExtensionMethods
{
    private const int MinPort = 10000;
    private const int MaxPort = 10999;

    private static int? _port;

    private static int GetPort(int delta)
    {
        _port = _port ?? 0;
        _port += delta;
        // ReSharper disable once PossibleInvalidOperationException
        return (_port = Max(MinPort, Min(_port.Value, MaxPort))).Value;
    }

    public static string WithPort(this string addr, int delta = 1)
    {
        return $"{addr}:{GetPort(delta)}";
    }

    public static string BuildAddress(this SocketAddressFamily family)
    {
        var uuid = Guid.NewGuid();
        // ReSharper disable once SwitchStatementMissingSomeCases
        switch (family)
        {
            case InProcess:
                return $"inproc://{uuid}";
            case InterProcess:
                return $"ipc://pipe/{uuid}";
            case IPv4:
                return "tcp://127.0.0.1".WithPort();
            case IPv6:
                return "tcp://[::1]".WithPort();
            case Unspecified:
            default:
                throw new ArgumentException($"Address invalid for family '{family}'", nameof(family));
        }
    }
}
```

IPC and InProc both use a UUID for uniqueness, whereas TCP IPv4 and IPv6 each use a different port. The resulting address is then used for both the listen and dial protocol operations. |
At this point I don't have any ideas. I will write a test program showing multiple concurrent uses just to prove that it isn't nng. Failing that, I'd need your code -- I still feel pretty strongly that somehow or other address reuse is to blame. There are some relatively straightforward ways to verify this, but they do involve either adding printf's in the code, or looking at it with a debugger. |
I'm not buying it. You can read the code here for yourself. Since posting this draft addressing strategy, I updated it slightly to account for each test fixture, so each fixture gets its own range from a unique base, and so on. Even the InProcess stuff is unique, based on a UUID on every iteration.
Yet even the InProcess tests fall over when run concurrently with the IPv4 and IPv6 ones. All of the default tests are InProc, consistent with your C tests and my C++ tests. The only difference is the test address, from InProc to IPv4 to IPv6 Bus tests. So methinks there is something going on, perhaps SOCKADDR confusion. Of course, I can't say for sure, and I don't want to white-box test your code line for line. |
I doubt there's SOCKADDR confusion happening. Those aren't used for IPC, and unfortunately the error cases I've received from you are too "vague" from the point of view of nng for me to reasonably know what the heck is going on. I'm wondering if there is something going on that I don't understand -- for example, you indicated that inproc tests fail when run in parallel... that makes me think that maybe I'm misunderstanding something fairly fundamental in the platform. For example, are the tests actually run in separate processes, or do they share an address space? I'd have thought you had separate .exe files and were running each one separately, but if that's not the case then a bunch of other things may be called into doubt. (This would be reflective of my general ignorance about C# and .NET.) |
Generally speaking, on most platforms at least, nng tries to provide a stack backtrace when it aborts in panic. Having that information would be far less "vague" than just "error code 3." |
I'm not sure what else I can tell you there; I'm just copying over to the clipboard what the test runner reports to me. I could be wrong, but I seem to recall that the VS test runner is a multi-process runner. It can be "trained" to run sequentially, but I'm not sure what that would help. I pushed my repo this morning; take a look at your convenience. |
Respectfully, you need to drop this "verbatim" expectation. That's unrealistic, considering these efforts, and efforts like it, will be wrappers. Heck, if you don't want it wrapped, just say so. But I think NNG would be helped by the exposure. As long as the calling conventions are correct, and so far, I think I've demonstrated far and away that they are, or corrected when they weren't, what's left is usage, however awkward or clumsy it may be. |
@gdamore So I was curious whether it was multi-process. No, it is not. And I will prove it. However, the test runner is multi-threaded. You said it should be re-entrant, but obviously, it is not. I don't know what that means to you, or for NNG, but it is what it is as far as I'm concerned. I think it is unrealistic to expect the host application not to be at least threaded.
|
I will look at this more closely. Running in a single process will break certain expectations that the tests make. (nng is thread-safe, but the tests themselves were not designed with this understanding.)
For example, nng_fini is something that should only be called at program shutdown. It will disrupt everything else, by design. It is a global cleanup, and that function does have documented caveats. Most programs do not need to call it, but I do in my tests to facilitate memory-leak testing with valgrind.
Another example is the use of inproc addresses. I expect that I can freely avoid worrying about conflicts with them, so my tests are kind of cavalier. But in a single-process environment like you describe, much more care is required.
With respect to verbatim tests: during development (and let's face it, nng is quite young) I hope that bug reports can include information like stack backtraces that will help me diagnose.
Honestly maybe your efforts to develop these wrappers are premature.
Particularly given your apparent lack of comfort with the underlying C. I
would hope that most folks working to build language bindings would have
enough comfort to do a little whitebox analysis while doing wrapper
development. Of course this is harder given the lack of adequate nng docs
and the general state of flux in the code itself.
I do think these other language bindings are useful and I have tried to be
supportive of your work. But it is also exhausting somewhat to have to
write test programs to verify or disprove every hypothesis you put forward
and our experience together thus far has shown me that it is unwise to take
your statements at face value if they conflict with my own understanding.
This very issue of single process vs multiprocess test execution is a
perfect example.
While nng is in development it is important that bugs be backed by the
simplest reproducible test case. Those test cases should ideally be in C to
avoid possible confusion caused by runtimes or bugs in application or
wrapper code. Asking me to debug in runtimes and languages with which I have no experience is incredibly time-consuming and distracts from the work I need to do to finish nng proper.
…On Sun, Nov 5, 2017 at 4:32 AM Michael W Powell ***@***.***> wrote:
@gdamore <https://github.com/gdamore> So I was curious, was it multi
process. No, it is not. And I will prove it. However, the test runner *is*
multi threaded. You said it *should* be re-entrant, but obviously, it is
not. I don't know what that means to you, or for NNG, but it is what it is
as far as I'm concerned. I think it is unrealistic to expect the host
application not be at least threaded.
BusTests (3 tests) [0:07.035] Aborted
That_Bus1_delivers_message_to_both_Bus2_and_Bus3 [0:00.000] Aborted
That_Bus2_delivers_message_to_Bus1_and_Bus3_times_out [0:03.634] Inconclusive: Test not run
That_default_socket_correct [0:03.401] Inconclusive: Test not run
Current process Id: 10308
Managed thread Id: 13
Running protocol tests for address family 'IPv6'.
Testing using address 'tcp://[::1]:10352'.
Given three 'Nanomsg2.Sharp.Protocols.Bus.LatestBusSocket' instances.
IPv4BusTests (3 tests) [0:21.820] Aborted
That_Bus1_delivers_message_to_both_Bus2_and_Bus3 [0:00.000] Aborted
That_Bus2_delivers_message_to_Bus1_and_Bus3_times_out [0:11.026] Inconclusive: Test not run
That_default_socket_correct [0:10.793] Inconclusive: Test not run
Current process Id: 10308
Managed thread Id: 12
Running protocol tests for address family 'IPv4'.
Testing using address 'tcp://127.0.0.1:10368'.
Given three 'Nanomsg2.Sharp.Protocols.Bus.LatestBusSocket' instances.
IPv6BusTests (3 tests) [0:42.819] Aborted
That_Bus1_delivers_message_to_both_Bus2_and_Bus3 [0:00.000] Aborted
That_Bus2_delivers_message_to_Bus1_and_Bus3_times_out [0:21.409] Inconclusive: Test not run
That_default_socket_correct [0:21.410] Inconclusive: Test not run
Current process Id: 10308
Managed thread Id: 9
Running protocol tests for address family 'InProcess'.
Testing using address 'inproc://037efff8-2b92-417e-a14c-62736239e01d'.
Given three 'Nanomsg2.Sharp.Protocols.Bus.LatestBusSocket' instances.
|
As documented, it's definitely not multi-process. It is multi-threaded. |
As for the simplest repro case, that's up to you to figure out. My suggestion would be to include these elements: multiple transports involved, and definitely multi-threading involved. You may even be able to reproduce it using the same boilerplate unit testing. As a backdrop, I have not diverged into brand-new unit tests. I am using the same patterns for testing, the same sequence of operations, etc., in the C# code that you have used in the C code and that I modeled in the C++ code. So there shouldn't be any great mysteries there. |
You are not using the same patterns for testing. The very issue of being
concurrent in a single process is HUGELY different.
Given your unwillingness to work in C (at least to help with diagnosis and
building reproduction cases for your bugs in C) I am going to suggest that
maybe it would be better to put this work aside for now.
In general you are very quick to assume that the problem is in my code rather than in yours, and you have made many disparaging comments about things you simply do not understand (header file inclusion, endian-safe byte swaps, exposure of message headers to allow for zero-copy, and raw-mode sockets too, though I didn't get into that).
This has not been a great collaboration, and to be honest I simply do not
have the cycles to continue as we have been.
Later we can reassess once others have gained some more experience with
nng. I hope to have time at some point to look more closely at your work
anyway.
I will in the meantime write a multithreaded, multiprotocol, multi-transport test, because I think it will be useful to have that.
|
Like hell I am not using the same pattern. The boilerplate code is patterned after the same test. The ONLY difference is that I run 2+ concurrently. OF THE SAME BOILERPLATE TESTS. That is the ONLY difference. |
That is a HUGE difference.
|
You aren't hearing what I've said. The test itself hasn't changed. We're here because NNG apparently has a problem with at least one of two elements: 1) multi-threading, and/or 2) support for multiple concurrent transports. As long as the boilerplate tests run against a homogeneous transport, they run fine. But when heterogeneous transports are involved, and/or threading, there is a problem. Literally, instead of running just one boilerplate test fixture, I select 2+ to run; it falls over immediately. Which is why I'm asking whether NNG is re-entrant (read: thread-safe). |
Can I suggest that you refactor whatever static tables you are maintaining into a self-contained struct? You're already doing that, more or less, with the Message concept, and even with the AIO concept, which is seen opaquely. Along the same lines, whatever state is maintained would then travel with the caller's context. Perhaps an nng_start(...) to pair with nng_fini(). That sort of thing. I would personally have more confidence that NNG was re-entrant and thread-safe under the hood that way. |
There are reasons I did not do that - mostly around legacy nanomsg api
support. There are also some weird considerations around fork() on UNIX
systems.
Just don't call nng_fini in your program. That will NOT solve all of your
difficulties but it might solve many of them. nng_fini is explicitly NOT
thread safe. (It is the only function in nng for which this is true.) Most
programs should probably not bother since all it does is throw away global
state - like the list of open sockets and the global worker thread, winsock
state etc.
I repeat: I wrote my tests with various undocumented assumptions, and just blindly copying them into other scenarios may have unexpected results. Those tests were meant to serve my specific purposes for testing internals (and in some cases validating assumptions; the load-balance test, for example, verifies a behavior that I assume but which is NOT part of any formal promise in the API). My use of URLs is another example: I freely reuse URLs in inproc tests because those tests were designed to be confined to their own process.
Frankly, if you do not understand the API well enough to confidently write your own tests (with your own assumptions), you probably shouldn't be trying to build these bindings yourself. A good exercise before that would be to write some example programs against nng in C to familiarize yourself with the native API. If that feels like too high a hurdle, then I have to politely request that you defer this effort to someone else who can do that.
I will also reiterate that at this point I simply lack the bandwidth to
continue to try to support you in this endeavor.
|
To be clear, in this case I am not calling nng_fini. I understand that the steps in your tests are undocumented. There are no ivory towers here, though. I was able to reverse-engineer my focused message pipe unit tests, for example. |
I've done a bunch more analysis. I'm closing this bug, as the real issue is #150 -- there is a bug in the Windows IPC code. Stay tuned. |
I'm just not buying it. This has nothing whatsoever to do with IPC. Again, run individually, Bus and Pipeline tests, for instance, run successfully.
However, run together, different threads, they fall over immediately.
|
Please see the multistress test for a test in C that demonstrates bus, pair0, pub/sub, and req/rep all running simultaneously. This issue is not reproducible for me, and I have no time to debug your environment. If you want to submit a bug report demonstrating the problem in C, please feel free to do so in another bug. This issue is closed, however. |
Taken individually, the InProc, IPv4, or IPv6 tests all run just fine. But running them in parallel causes problems.
By contrast, the pipeline tests run just fine, individually or altogether in parallel (same set as above).
Generally, I am following the same approach as the C code, closer to the C++ code, probably even a bit "simpler" since there are no smart-pointer gymnastics going on. I am also not bothering to vet any of the protocol/peer stuff, just focusing on messaging, the socket protocols, etc.
I'll have my repo committed this weekend so you can take a look.