Generate Transaction IDs in Nginx

License

Unknown (LICENSE), GPL-3.0 (COPYING)

synacor/nginx-module-txid120

txid120

If your application consists of many smaller parts, such as in a microservice architecture, it can be difficult to correlate requests made as the result of some end-user action. For example, if your website combines data provided from many other services behind the scenes, one request from the user's browser can result in many requests to other parts of your network as it collects the information it needs to render the page. It can be useful to know which of these requests are related to each other. To do this, you can assign each transaction (a request from an end user and all resulting internal requests) a Transaction ID.

Transaction IDs are unique strings that can be used to follow a chain of related requests or messages as they move through your network. This module generates these IDs.

Setup

This is an Nginx module, meant to run on an Nginx proxy at the edge of your datacenter. It fills a variable named $txid120 for use in an Nginx directive like proxy_set_header. Set a header (maybe Txid?) to the value of this variable on every request entering your network. If the same Nginx instance also handles requests between services already within your network, don't reassign the header for those requests: all requests related to the original end-user request should carry the same Transaction ID.

In the simplest case, you can do something like this:

proxy_set_header Txid $txid120;

The module can be compiled into your build of Nginx the same as any other Nginx module.

If you want to test the generation logic on your system before you build it into Nginx, the included test.pl script will compile a standalone version of the code that generates one Transaction ID and exits; it will then analyze the output of this program to ensure that the generated timestamp is valid and that the random entropy seems sane:

$ ./test.pl 
build    : 8778 bytes written 0.15 seconds ago [ OK ]
txid raw : 1H47Yf3DqfG8bW2frPAT [ OK ]
txid b64 : BTEHkrDP2rSIniCr3bMf
txid hex : 05310792b0cfdab4889e20abddb31f [ OK ]
txid thex: 05310792b0cfda
strt time: 1461283479.145378
txid time: 1461283479.146458 [ OK ]
end  time: 1461283479.147018
txid rand: b4889e20abddb31f [ OK ]

All 5/5 tests pass.

Usage

Once your datacenter-edge Nginx is assigning Transaction IDs to new transactions, your internal services need to be modified to pass any incoming Transaction IDs along to sub-requests they make. These are the general rules your services should follow:

  • If a service does receive a Transaction ID, it should pass that same Transaction ID in any sub-requests it makes. (Use the same header you chose in the setup step above.) It should also include the Transaction ID in any log messages it generates (so you can actually correlate events between services).
  • If a service does not receive a Transaction ID, it should not send one (not even an empty string) to other services. In its log messages, it should use an empty string (or "-", or some other useful indication that the ID was missing) where a Transaction ID would have gone.
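
The two rules above can be sketched in a few lines. This is only an illustration, not code shipped with the module: the header name Txid and the helper names incoming_txid, outgoing_headers, and log_line are assumptions made for the example.

```python
from typing import Dict, Mapping, Optional

TXID_HEADER = "Txid"  # assumed header name; use whatever you chose during setup


def incoming_txid(headers: Mapping[str, str]) -> Optional[str]:
    """Return the Transaction ID from an incoming request, or None if absent or empty."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive
        if name.lower() == TXID_HEADER.lower() and value:
            return value
    return None


def outgoing_headers(txid: Optional[str]) -> Dict[str, str]:
    """Headers to add to a sub-request: the same ID if one was received, nothing otherwise."""
    return {TXID_HEADER: txid} if txid else {}


def log_line(txid: Optional[str], message: str) -> str:
    """Prefix a log message with the Transaction ID, or "-" when none was received."""
    return f"txid={txid or '-'} {message}"
```

A service would call incoming_txid once per request, merge outgoing_headers(txid) into every sub-request it makes, and route its log messages through something like log_line.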

If some of your services speak something other than HTTP internally, you can arrange a way to pass Transaction IDs between them on that protocol as well. The goal is to be able to correlate all of the requests caused by a single initial request from an end user.

Extra Features

In addition to being highly likely to be unique within your network (without relying on coordination, indexing, MAC addresses, etc.), Transaction IDs generated by this module contain a timestamp and can be lexically sorted into chronological order. (See the "Technical Details" section below for how this works.)

To extract the timestamp from a Transaction ID generated by this module, you can also use this handy tool.

Because Transaction IDs contain a timestamp generated when the transaction started (that is, when the initial end-user request entered your network), you can also use them to calculate how much total time has elapsed since the start of that transaction, regardless of how many sub-requests deep into your microservices you are (assuming the clocks on your Nginx server and the local machine are reasonably synchronized).
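
As a rough sketch of the elapsed-time idea, the Python below decodes a Transaction ID and subtracts its timestamp from the current time. It assumes the sortable alphabet described under "Technical Details" and, judging from the sample test.pl output above, that the first 7 bytes hold a big-endian count of microseconds since the Unix epoch. The function names are made up for this example and are not part of the module.

```python
import time
from typing import Optional

# Sortable base64 alphabet described in this README: 0-9, :, @, A-Z, a-z
ALPHABET = (
    "0123456789:@"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
)


def txid_to_bytes(txid: str) -> bytes:
    """Decode a 20-character Transaction ID into its 15 raw bytes."""
    if len(txid) != 20:
        raise ValueError("expected a 20-character Transaction ID")
    value = 0
    for ch in txid:
        # str.index raises ValueError for characters outside the alphabet
        value = (value << 6) | ALPHABET.index(ch)
    return value.to_bytes(15, "big")


def txid_timestamp(txid: str) -> float:
    """Seconds since the Unix epoch, read from the first 7 bytes (assumed microseconds)."""
    micros = int.from_bytes(txid_to_bytes(txid)[:7], "big")
    return micros / 1_000_000


def elapsed_seconds(txid: str, now: Optional[float] = None) -> float:
    """Time elapsed since the transaction started, using the local clock by default."""
    now = time.time() if now is None else now
    return now - txid_timestamp(txid)
```

Decoding the sample ID from the test.pl run, 1H47Yf3DqfG8bW2frPAT, yields the same bytes as the "txid hex" line and the same timestamp as the "txid time" line.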

Technical Details

The module is called txid120 because it generates a 120-bit identifier. (120 bits divides nicely into bytes and base64 encoding blocks, and so it is an efficient use of space; see the diagram below.) Services should treat the identifier as an opaque string, but it has some useful properties.

The first 56 bits (7 bytes, or 9.33 characters in the encoded format) of the identifier represent a microsecond-granularity timestamp capable of representing times until the year 4147. The remaining bits are random. The identifier is encoded into 20 characters using a modified base64 alphabet which supports lexical sorting (0-9, :, @, A-Z, a-z). Because the timestamp is at the front, the identifiers can be easily sorted into the order they were generated (and therefore the order in which the requests initially entered your network) using something like LANG=C sort list_of_txids.log.
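
To make the layout concrete, here is a minimal Python sketch that packs a timestamp and 64 random bits into the 20-character form and shows that plain string sorting recovers chronological order. It is not the module's implementation: encode_txid and new_txid are hypothetical names, and treating the timestamp as a single 56-bit microsecond count is an assumption based on the sample test.pl output.

```python
import os
import time

# Alphabet in ascending ASCII order, so string order matches numeric order
ALPHABET = (
    "0123456789:@"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
)


def encode_txid(micros: int, random_bytes: bytes) -> str:
    """Pack a 56-bit timestamp and 8 random bytes into 20 sortable characters."""
    if not 0 <= micros < (1 << 56):
        raise ValueError("timestamp does not fit in 56 bits")
    if len(random_bytes) != 8:
        raise ValueError("expected exactly 8 random bytes")
    value = (micros << 64) | int.from_bytes(random_bytes, "big")
    # 120 bits = 20 groups of 6 bits, most significant group first
    return "".join(ALPHABET[(value >> shift) & 0x3F] for shift in range(114, -1, -6))


def new_txid() -> str:
    """Generate an ID from the current time and OS-provided randomness."""
    return encode_txid(time.time_ns() // 1000, os.urandom(8))


if __name__ == "__main__":
    ids = [new_txid() for _ in range(3)]
    # Python compares strings by code point, like LANG=C sort
    print(sorted(ids))
```

Feeding encode_txid the timestamp and random bytes from the sample test.pl output reproduces the "txid raw" value shown there.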

Format Diagram

Each bit maps to the following meaning (val), bytes (bin), base64 characters (b64), and encoding blocks (blk):

c = seconds (used starting in year 2106), s = seconds (used now/soon), u = microseconds, r = random
val cccsssssssssssssssssssssssssssssssssuuuuuuuuuuuuuuuuuuuurrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
bin 0.......1.......2.......3.......4.......5.......6.......7.......8.......9.......0.......1.......2.......3.......4.......
b64 0.....1.....2.....3.....0.....1.....2.....3.....0.....1.....2.....3.....0.....1.....2.....3.....0.....1.....2.....3.....
blk 0.......................1.......................2.......................3.......................4.......................

Entropy Analysis

The remaining 64 bits are random data, which means there is a very small probability of Transaction ID collisions within each microsecond. There can't be collisions across microseconds, since the microsecond-granularity time is included in the identifier.

At microsecond scale and assuming 10k reqs/sec (that is, 10000*(1/1000000) = 0.01 requests per microsecond), 64 bits of entropy gives a collision probability of:

bitprob(10000*(1/1000000), 64)
= 1 - 1 / Math.pow(Math.E, (Math.pow(10000*(1/1000000),2) / (2 * Math.pow(2,64))))
= 1 - 1 / ( e ^ (0.0001 / 3.69e19) )
= 1 - 1 / ( e ^ (2.711e-24) )
= 1 - 1 / ( 1 + 2.711e-24 )
= 2.711e-24

So, in any given microsecond, the probability of a collision is about 2.711e-24 (0.000000000000000000000002711). In other words, roughly one in every 3.69e23 microseconds will have a collision, which is roughly once every 11.7 billion years, or roughly 1.2 times the expected lifetime of the Sun. Even at 100 million requests per second (100 requests per microsecond), the same formula estimates a collision roughly once every 117 years.
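
The calculation above can be reproduced with a short Python sketch. The function names are invented for this example, and the year length of 365.25 days is an assumption. Note that evaluating 1 - 1/e^x directly in double precision rounds to zero for exponents this small, so the sketch uses math.expm1 instead.

```python
import math


def collision_probability(requests: float, bits: int) -> float:
    """Birthday approximation used above: 1 - e^(-n^2 / (2 * 2^bits))."""
    exponent = requests ** 2 / (2 * 2 ** bits)
    # expm1 keeps precision; 1 - math.exp(-exponent) would round to 0.0 here
    return -math.expm1(-exponent)


def years_between_collisions(requests_per_second: float, bits: int = 64) -> float:
    """Expected years between collisions at a steady request rate."""
    per_microsecond = requests_per_second / 1_000_000
    p = collision_probability(per_microsecond, bits)
    microseconds = 1 / p
    return microseconds / 1_000_000 / (365.25 * 24 * 3600)


if __name__ == "__main__":
    print(collision_probability(10_000 / 1_000_000, 64))
    print(years_between_collisions(10_000))
```

At 10k requests per second, the first value printed agrees with the 2.711e-24 figure derived above.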
