Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

missing translator for node_dtrace_http_*_request_t types #2667

Closed
wesolows opened this Issue Feb 2, 2012 · 8 comments

Comments

Projects
None yet
4 participants

wesolows commented Feb 2, 2012

Following the change in e142fe2, the node provider declares two separate types for HTTP client and server requests. However, the node.d library contains only a single translator, for the now obsolete node_dtrace_http_request_t type. This allows it to work correctly with older node binaries, but will generate errors of the form

dtrace: invalid probe specifier node57257:::http-client-request { trace(args[0]->url); }: in action list: failed to resolve native type for args[0]

when (and only when) one of the fields of a relevant translated structure is referenced in an enabling within a newer node binary.

The correct fix is to have four translators in the library: from the obsolete node_dtrace_http_request_t to each of the new node_http_request_{client,server}_t, and two "identity" translators from node_dtrace_http_{client,server}_request_t to node_http_{client_server}_request_t. This will also allow for future changes to the format of the internal client structure.

I'm picking this one up.

To elaborate a bit on what Keith explained:

The crux of the issue is that my original change added a field to the server probes' structures but not the client probes' structures. As a result, the C type that gets passed into these probes was split from node_dtrace_http_request_t into node_dtrace_http_client_request_t and node_dtrace_http_server_request_t. However, the names of the corresponding structures in the library file (node.d) were not changed, so when dtrace(1M) goes to translate a script that looks like this:

node57257:::http-client-request { trace(args[0]->url); }

the binary is providing node_dtrace_http_client_request_t, but dtrace(1M) doesn't know anything about that (because it's not in the node.d library file), and it spits out:

dtrace: invalid probe specifier node57257:::http-client-request { trace(args[0]->url); }: in action list: failed to resolve native type for args[0]

The suggested fix is to split the structures in the provider file and define four translators, one for each of the following cases:

  • node_dtrace_http_request_t → node_http_server_request_t (for server probes from binaries built prior to e142fe2)
  • node_dtrace_http_request_t → node_http_client_request_t (for client probes from binaries built prior to e142fe2)
  • node_dtrace_http_server_request_t → node_http_server_request_t (for server probes from binaries built after e142fe2)
  • node_dtrace_http_client_request_t → node_http_client_request_t (for client probes from binaries built after e142fe2)

The one consolation is that the only changes necessary are to the provider file, which can be overridden with the "-L" argument to dtrace(1M). Existing binaries will continue to work.

The test plan must include testing both client and server probes using both translated arguments without using "xlate", for each of:

  • old binaries with the new provider file
  • new binaries with the new provider file

The changes to the provider file will break existing D scripts. It would be possible to preserve compatibility with old scripts by defining translators from each of server_request and client_request to request_t, but It's unlikely that anyone is using this except Joyent's own Cloud Analytics, which will deliver the new provider file with a new version of the software that generates updated scripts.

I forgot to mention that the workaround is to use the xlate operator directly, casting the argument to the appropriate type so that dtrace(1M) doesn't need to know what kind it is. This explains why Cloud Analytics works fine after this change, and may explain why I didn't see this during testing.

This was a bit more subtle than we'd realized in a few ways:

  • If you trace 'node*:::http-client-request', this will works fine if there's another node process running from before e142fe2. The DTrace compiler appears to be satisfied by the presence of the types from the old process's provider, and for some reason it also uses the translator for probes fired from newer process (even though the source type of the translator doesn't match what the provider is sending).
  • There's only one node_http_request_t translated type, not a pair of server and client ones, so we only needed three translators.
  • I ran into what appears to be a dtrace compiler bug where it considers two native structures having the same size as the same for the purposes of determining whether a translator has been redeclared.

A working fix is available here:
davepacheco/node@4a92541

but it's still waiting for review.

Here are the scripts I used to test the fix: https://gist.github.com/1d99633ce16952cd4267

and here's what I tested:

Node v0.4.10:

  • node v0.4.10 binary, HTTP server, per-pid enabling: show all fields (done)
  • node v0.4.10 binary, HTTP server, global enabling: show all fields (done)
  • node v0.4.10 binary, HTTP client, per-pid enabling: show all fields (FAIL: wrong socket fields only)
  • node v0.4.10 binary, HTTP client, global enabling: show all fields (FAIL: wrong socket fields only)
  • node v0.4.10 binary, HTTP server, using 'xlate', all fields, (done)
    don't start process until after the enabling is created
  • node v0.4.10 binary, HTTP client, using 'xlate', all fields, (FAIL: wrong socket fields only)
    don't start process until after the enabling is created

Node v0.6.8:

  • node v0.6.8 binary, HTTP server, per-pid enabling: show all fields (done)
  • node v0.6.8 binary, HTTP server, global enabling: show all fields (done)
  • node v0.6.8 binary, HTTP client, per-pid enabling: show all fields (FAIL: wrong socket fields only)
  • node v0.6.8 binary, HTTP client, global enabling: show all fields (FAIL: wrong socket fields only)
  • node v0.6.8 binary, HTTP server, using 'xlate', all fields, (done)
    don't start process until after the enabling is created
  • node v0.6.8 binary, HTTP client, using 'xlate', all fields, (FAIL: wrong socket fields only)
    don't start process until after the enabling is created

Both:

  • both binaries, HTTP server, global enabling: show all fields (done)
  • both binaries, HTTP client, global enabling: show all fields (done)

All of the FAILs were because of an unrelated issue with the HTTP client probes, which I verified already existed (and also shouldn't be affected by my change). All of my tests were 32-bit, as I'm not sure 64-bit SmartOS is supported yet. (I also didn't change anything that should affect this.)

I managed to test both the client probes and that 64-bit (mostly) works using a fake node binary that just fires these probes with known valid input. This is available at https://github.com/davepacheco/fakenode.

Final fix is davepacheco/node@815419e (same as previous but with minor style fixes).

davepacheco added a commit to davepacheco/node that referenced this issue Feb 3, 2012

Fix #2667
missing translator for node_dtrace_http_*_request_t types

@bnoordhuis bnoordhuis closed this in 9fb088e Feb 3, 2012

I hit the same issue with node 0.8.9, and it was due to another reason: I had no /usr/lib/dtrace/node.d library on my SmartOS. While some SmartOS versions may have it included, others do not. If you miss the file, download latest version of node.d from https://raw.github.com/joyent/node/master/src/node.d, put it in /var/lib/dtrace/ and run dtrace using -L option: dtrace -L /var/lib/dtrace/ -n 'node*:::http-client-request { trace(args[0]->url); }'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment