MotivationBehindNssCache

Jamie Wilkinson edited this page Mar 14, 2015 · 2 revisions

The POSIX API that allows applications to look up system databases -- the user database, the group database, and others -- can be backed either by files local to the system or by a remote directory service. There are some problems with this API that surface as noticeable delays in interactive behaviour, transient and unexpected failures, and a costly yet critical infrastructure to support the directory.

The Windows operating system, through its directory service Active Directory, does a better job of distributing and scaling this directory service than we (the community, with Open Source solutions) do. Our limits are mostly due to the factors that influenced the original UNIX design, as far back as the 1970s, and eventually codified into the POSIX spec in 1988.

We've changed how we do shared computing resources a lot since that first POSIX specification, and outgrown it; in particular as our machines have gotten smaller, the network has gotten larger, and the services required to support the system have gotten large, too.

The rest of this document will describe the status quo and the problems therein, set the requirements for a solution, and then outline the implementation provided here on this site.

Problems with NSS and Directory Services

At the heart of the issues are two quiet assumptions, in part thanks to our roots in /etc/passwd (Read more background information on NSS for a discussion on this):

  • NSS never fails -- There is no EAGAIN condition in the POSIX spec for these functions
  • NSS is fast -- it's in the codepath of all these processes, many interactive

But we have since introduced the network, and network services can both fail and be slow.
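To make the first assumption concrete, here is a minimal sketch using Python's standard `pwd` module, which goes through the system's NSS configuration. The helper name `lookup_user` is hypothetical. The point is only that the caller gets an entry or nothing; as far as the sketch can tell, there is no "try again later" outcome to act on.

```python
import pwd


def lookup_user(name):
    """Return the passwd entry for `name`, or None if the lookup yields nothing."""
    try:
        return pwd.getpwnam(name)
    except KeyError:
        # From the caller's side, "no such user" and "the backing service
        # gave no answer" can look the same: there is simply no entry,
        # and nothing here says whether a retry would help.
        return None
```

A caller such as `lookup_user("root")` blocks for as long as the underlying NSS module takes to answer, which is where a slow directory turns into a slow interactive command.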

Read more detailed descriptions of problems arising from the POSIX API and the introduction of the network.

Requirements for a Solution

Based on the problems elaborated on elsewhere, we can identify a few requirements for solving the problem.

  • 100% reliable.

The network is not reliable, therefore get the network out of the NSS codepath, between the user and the database.

  • Software is hard.

Let's minimise the amount of code between the user and the database.

Things like a dynamic (hit-and-miss) cache are hard, so let's not make it too complicated. Re-using simple, existing software that's been long-proven would be ideal.

  • Data persistence.

It'd be nice to keep data persistent -- a crashed daemon shouldn't result in the network coming back into our codepath.

Persistence across reboots is better yet -- don't require network access/sync on boot.

  • Administrative control

Detailed control over how things are refreshed allows us greater flexibility in choosing our supporting infrastructure.

Establish an SLA for data freshness (5 minutes, 1 hour, etc) based on your needs; a looser SLA is expected to need less supporting infrastructure (directory servers/caches/low latency networks).
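A freshness SLA of this kind can be checked with very little code. The sketch below assumes the cache is a single file whose modification time reflects the last successful update; the function names and the optional `now` parameter are illustrative, not part of any existing tool.

```python
import os
import time


def cache_age_seconds(path, now=None):
    """Seconds since the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


def is_stale(path, max_age_seconds, now=None):
    """True if the cache file is older than the freshness target."""
    return cache_age_seconds(path, now) > max_age_seconds
```

With a 5 minute SLA, a monitoring job might call `is_stale(cache_path, 300)` and alert when it returns true, leaving lookups themselves untouched.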

  • Easy to understand

Definitely has to be easy for me (reboot monkey, backup tape operator) to use and debug.

Solution: An asynchronously updated local file-based cache

Our hands are tied by the implicit assumptions made in the POSIX standard and in UNIX application behaviour; specifically, that these calls are expected to be fast, reliable, and consistent.

The best way to meet those assumptions is to just go back to /etc/passwd as a database! A full copy of the directory on local disk.

Yes, we're serious.

Local storage is well tested by libnss_files.so: all Linux installs use it, and it just works. We could use either libnss_files.so or libnss_db.so, an alternate module that uses Berkeley DB format files.

We shouldn't need to worry about in-memory buffers, because the kernel buffer cache already takes care of this for us.

The network is gone, so it's fast and reliable. We can update this cache asynchronously, so a slow network means slow updates, but not a slow user experience! Spammy hosts will be confined to their own machine, and not impact remote resources.
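One way an asynchronous updater could refresh such a file without lookups ever seeing a half-written copy is to write a temporary file and rename it into place. This is a sketch of that general technique, not a description of how any particular tool does it; the function names and the idea of passing pre-fetched entries as text lines are assumptions made for illustration.

```python
import os
import tempfile


def format_passwd_line(name, uid, gid, gecos, home, shell):
    """Format one entry in the colon-separated passwd(5) layout."""
    # "x" in the password field means the hash is stored elsewhere.
    return f"{name}:x:{uid}:{gid}:{gecos}:{home}:{shell}"


def write_cache_atomically(path, lines):
    """Replace `path` with `lines` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache-tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_path, 0o644)  # mkstemp creates files as 0600
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the old cache in place and clean up the temporary file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If fetching from the directory is slow or fails, the updater simply never reaches `os.replace`, and lookups keep reading the previous copy.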

A file on disk is persistent between reboots!

Plain text files are easy to understand!

You might worry that we're trying to backport DNS to /etc/hosts, but this isn't the case. We're not trying to replicate the entire host->IP mapping for the internet, only the directory for a specific organisation. This turns out to be small and manageable even for large companies. We only need to ensure that it is kept up to date -- defining an update frequency sets our freshness SLA.

Summary

The solution is simple and obvious; the interesting challenge was identifying the problem and defining the right set of requirements.

We're not the first to come up with this idea; Red Hat has been peddling libnss-db and nss-updatedb for a long time.

Hopefully the existence of these wiki pages will make this idea more popular in the community.

Read more about nsscache features as our choice of solution to these problems.