Name Resolution Process

kwrodarmer edited this page Dec 1, 2016 · 1 revision

VDB Name Resolution

All software built upon NCBI's VDB library can open and access VDB objects, and can also resolve accessions into paths, a process called name resolution.

Name resolution is the process of turning an object name into an object location, essentially URN to URL. VDB is capable of locating an object in your host filesystem, in your local cache, in a site-wide repository, or remotely within NCBI. This process occurs automatically, and allows our tools access to the entire SRA.

For example, let's assume you want to use fastq-dump to convert a small run:
$ fastq-dump SRR000123

Most people assume their first step is to download the run, and only then to extract fastq, but the command works as given above with just the simple accession. We can trace through the steps:

1. Detect path or accession

A path is detected as anything having a slash character ('/'), or a simple name that does not conform to NCBI accession patterns. Put another way, if it doesn't have a slash and might be an accession, we assume initially that it is an accession.

A path is used exactly as given, and is not subjected to name resolution.
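The detection rule above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` is hypothetical, and the regular expression is an assumed, simplified pattern covering run accessions such as SRR000123; the actual resolver recognizes more accession forms than this.

```python
import re

# Assumption: a simplified pattern for run accessions (SRR/ERR/DRR + digits).
# The real resolver's accession patterns are broader than this.
ACCESSION_PATTERN = re.compile(r"[SED]RR\d{6,}")


def classify(spec: str) -> str:
    """Return "path" or "accession" for an object specification."""
    # Anything containing a slash is a path.
    if "/" in spec:
        return "path"
    # A simple name that matches the accession pattern is assumed,
    # initially, to be an accession.
    if ACCESSION_PATTERN.fullmatch(spec):
        return "accession"
    # Any other simple name is treated as a path.
    return "path"


if __name__ == "__main__":
    for spec in ("SRR000123", "./SRR000123", "SRR000123.sra", "reads.fastq"):
        print(f"{spec}: {classify(spec)}")
```

Under this sketch, "SRR000123.sra" is classified as a path, because the extension keeps the name from matching the accession pattern.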

2. Search the host filesystem

VDB uses a hierarchical configuration system to allow each user to customize search paths. We provide a utility for managing these paths called "vdb-config" (and will provide additional tools in the future).

Using search paths obtained from VDB configuration, the resolver will try to find the object within the host filesystem. If found, it will be named accession.sra, or in the case of our example, "SRR000123.sra".

VERY IMPORTANT - the ".sra" extension is added automatically by the resolver and should not be added by the user. If the user does specify the ".sra" extension, the name no longer looks like an accession; it looks like a path, and is treated as a file in the current working directory.
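As a rough sketch of this step, the search amounts to probing each configured directory for a file named after the accession with ".sra" appended. The function `find_local` and the flat directory layout are assumptions made for illustration; the real search paths come from VDB configuration and may be organized differently.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_local(accession: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing <accession>.sra in search_dirs, else None."""
    for directory in search_dirs:
        # The ".sra" extension is appended here, not supplied by the user.
        candidate = Path(directory) / f"{accession}.sra"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Demonstrate with a throwaway directory standing in for a search path.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "SRR000123.sra").touch()
        print(find_local("SRR000123", [Path(tmp)]))
        print(find_local("SRR999999", [Path(tmp)]))
```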

3. Search the site repository

Most users will not have a site repository available, but some do. If so, the site admins will have configured VDB to find objects in a shared area of network storage, accessible by multiple users. When VDB finds this search path in configuration, it will search for the file, e.g. "SRR000123.sra".

4a. Search the remote repository at NCBI - the SRA

VDB will contact the name resolution service at NCBI over HTTPS to request resolution of an accession. If found, the name resolver will reply with a URL that can be used to retrieve the object on-demand over HTTPS.

4b. Determine where to cache partial downloads

By default, VDB will cache data as they are retrieved over HTTPS, storing them within the host filesystem in the user's cache area (set up by configuration). The cache serves not only to store data as they are downloaded, but also to satisfy later reads from previously retrieved data, reducing network traffic.
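The read-through behavior described above can be illustrated with a small in-memory sketch. Everything here is hypothetical: the `BlockCache` class, the fixed-size block model, and the `fetch_block` callable standing in for an HTTPS read are assumptions, not a description of how VDB stores its cache on disk.

```python
from typing import Callable, Dict


class BlockCache:
    """Read-through cache of numbered blocks (illustrative, in memory)."""

    def __init__(self, fetch_block: Callable[[int], bytes], total_blocks: int):
        self._fetch_block = fetch_block      # hypothetical remote reader
        self._total_blocks = total_blocks
        self._blocks: Dict[int, bytes] = {}  # block index -> cached bytes

    def read_block(self, index: int) -> bytes:
        if not 0 <= index < self._total_blocks:
            raise IndexError(f"block {index} out of range")
        # Serve from the cache when possible; otherwise fetch and remember.
        if index not in self._blocks:
            self._blocks[index] = self._fetch_block(index)
        return self._blocks[index]

    def is_complete(self) -> bool:
        # True once every block of the object has been retrieved.
        return len(self._blocks) == self._total_blocks
```

A repeated `read_block` call for the same index does not invoke `fetch_block` again, which is the sense in which the cache reduces network traffic.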


NOTE - objects that are cached are likely to start out life as partial files, but may eventually become complete. At that point, the object is promoted to a fully downloaded format and will be located by step #2 given above.
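Taken together, the steps form an ordered chain of lookups. The sketch below shows only that ordering. The `resolve` function, its parameters, the simplified accession pattern, and the `remote_lookup` callable (a stand-in for the name resolution service, whose actual protocol is not described here) are all assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence
import re

# Assumption: simplified run-accession pattern, for illustration only.
ACCESSION_PATTERN = re.compile(r"[SED]RR\d{6,}")


def resolve(
    spec: str,
    user_dirs: Sequence[Path],
    site_dirs: Sequence[Path],
    remote_lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Resolve spec to a location, trying each source in the order described."""
    # Step 1: paths bypass name resolution entirely.
    if "/" in spec or not ACCESSION_PATTERN.fullmatch(spec):
        return spec
    # Steps 2 and 3: user search paths first, then the site repository.
    for directory in (*user_dirs, *site_dirs):
        candidate = Path(directory) / f"{spec}.sra"
        if candidate.is_file():
            return str(candidate)
    # Step 4a: ask the remote service, which yields a URL or None.
    return remote_lookup(spec)


if __name__ == "__main__":
    # Placeholder remote lookup returning a made-up URL, not a real endpoint.
    def fake_remote(accession: str) -> Optional[str]:
        return f"https://example.invalid/{accession}"

    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "SRR000123.sra").touch()
        print(resolve("SRR000123", [Path(tmp)], [], fake_remote))
        print(resolve("SRR000456", [Path(tmp)], [], fake_remote))
        print(resolve("SRR000123.sra", [Path(tmp)], [], fake_remote))
```

Because the local and site searches run before the remote lookup, an object that has been fully downloaded is found on disk without contacting NCBI.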