File Formats

Niema Moshiri edited this page Sep 7, 2020 · 22 revisions

User Configuration File Format

Please refer to the Configuration Files page for information about the format of FAVITES configuration files.

Contact Network File Format

To remain robust to future development, FAVITES uses its own file format, similar to an edge list, which is required for the input Contact Network. The first portion of the file is a list of nodes, and the second portion is a list of edges.

  • "Node" lines have three tab-delimited sections:
    1. NODE (i.e., just the string NODE)
    2. This node's label
    3. Attributes of this node as comma-separated values, or a period (i.e., '.') if this node has no attributes
  • "Edge" lines have five tab-delimited sections:
    1. EDGE (i.e., just the string EDGE)
    2. The label of the node from which this edge leaves
    3. The label of the node to which this edge goes
    4. Attributes of this edge as comma-separated values, or a period (i.e., '.') if this edge has no attributes
    5. d (directed) or u (undirected) to denote whether this edge is directed (i.e., u -> v vs. u <-> v, where u and v are the node labels in sections 2 and 3)
  • Lines beginning with the pound symbol (i.e., '#') and empty lines are ignored

Below is an example of this file format. Note that <TAB> denotes a single tab character (i.e., '\t').

#NODE<TAB>label<TAB>attributes (csv or .)
#EDGE<TAB>u<TAB>v<TAB>attributes (csv or .)<TAB>(d)irected or (u)ndirected

NODE<TAB>Bill<TAB>USA,Mexico
NODE<TAB>Eric<TAB>USA
NODE<TAB>Curt<TAB>.
EDGE<TAB>Bill<TAB>Eric<TAB>.<TAB>d
EDGE<TAB>Curt<TAB>Eric<TAB>Friends<TAB>u
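
The rules above are enough to write a small reader. The following Python sketch is not part of FAVITES: the function name parse_contact_network and the returned data structures are assumptions made for illustration. It skips comment and empty lines, maps '.' to an empty attribute list, and rejects lines with an unexpected number of fields.

```python
def parse_contact_network(lines):
    """Parse contact network lines into (nodes, edges).

    Hypothetical helper for illustration; not part of FAVITES.
    nodes: dict mapping node label -> list of attribute strings
    edges: list of (u, v, attributes, directed) tuples
    """
    nodes = {}
    edges = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Empty lines and lines beginning with '#' are ignored
        # (treating whitespace-only lines as empty is an assumption)
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] == "NODE" and len(fields) == 3:
            _, label, attrs = fields
            nodes[label] = [] if attrs == "." else attrs.split(",")
        elif fields[0] == "EDGE" and len(fields) == 5:
            _, u, v, attrs, direction = fields
            if direction not in ("d", "u"):
                raise ValueError(f"Edge direction must be 'd' or 'u': {line!r}")
            edge_attrs = [] if attrs == "." else attrs.split(",")
            edges.append((u, v, edge_attrs, direction == "d"))
        else:
            raise ValueError(f"Malformed line: {line!r}")
    return nodes, edges


# The example file from above, with real tab characters
example = """#NODE\tlabel\tattributes (csv or .)

NODE\tBill\tUSA,Mexico
NODE\tEric\tUSA
NODE\tCurt\t.
EDGE\tBill\tEric\t.\td
EDGE\tCurt\tEric\tFriends\tu
"""
nodes, edges = parse_contact_network(example.splitlines())
```

Here nodes maps each label to its attribute list (empty for Curt), and each entry of edges carries a boolean that is True for directed edges.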

Transmission Network File Format

The transmission networks output by FAVITES are in the standard edge list format. Each line represents a single edge via three tab-delimited attributes:

  1. The label of the node from which this edge leaves
  2. The label of the node to which this edge goes
  3. The time at which this transmission event occurred

Self-edges (i.e., same node in columns 1 and 2) denote removal of infection, either via recovery or death. Edges with None in column 1 denote seed infections (i.e., infections from outside the population).

Below is an example of this file format. Note that <TAB> denotes a single tab character (i.e., '\t').

None<TAB>Eric<TAB>0
Eric<TAB>Bill<TAB>1
Eric<TAB>Curt<TAB>2
Eric<TAB>Curt<TAB>3
Curt<TAB>Bill<TAB>4
Curt<TAB>Bill<TAB>5
Curt<TAB>Curt<TAB>6
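
As a rough illustration of how the three kinds of lines can be told apart, the Python sketch below sorts each edge into seed infections, transmissions, and removals. The function name parse_transmission_network is hypothetical, and parsing the time as a float and skipping empty lines are assumptions rather than documented FAVITES behavior.

```python
def parse_transmission_network(lines):
    """Split transmission network lines into seeds, transmissions, removals.

    Hypothetical helper for illustration; not part of FAVITES.
    """
    seeds = []          # (node, time): edges with None in column 1
    transmissions = []  # (source, target, time)
    removals = []       # (node, time): self-edges
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # assumption: tolerate empty lines
            continue
        u, v, t = line.split("\t")
        time = float(t)  # assumption: times are numeric
        if u == "None":
            seeds.append((v, time))
        elif u == v:
            removals.append((v, time))
        else:
            transmissions.append((u, v, time))
    return seeds, transmissions, removals


# The example file from above, with real tab characters
example = [
    "None\tEric\t0",
    "Eric\tBill\t1",
    "Eric\tCurt\t2",
    "Eric\tCurt\t3",
    "Curt\tBill\t4",
    "Curt\tBill\t5",
    "Curt\tCurt\t6",
]
seeds, transmissions, removals = parse_transmission_network(example)
```

For the example, this yields one seed infection (Eric at time 0), five transmission events, and one removal (Curt at time 6).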

Sequence and Phylogenetic Tree Identifiers

When FAVITES outputs viral lineages in the sequence files and in the phylogenetic trees, the identifiers are in the format viral_lineage|contact_network_node|time, e.g. N19|67|4.118017.

  • viral_lineage: The unique identifier assigned to this viral lineage during the simulation
  • contact_network_node: The contact network individual from which viral_lineage was sampled
  • time: The time at which viral_lineage was sampled from contact_network_node
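
A minimal sketch of splitting such an identifier back into its three parts is shown below. The function name parse_identifier is hypothetical, and the code assumes that none of the three fields contains a '|' character itself.

```python
def parse_identifier(identifier):
    """Split 'viral_lineage|contact_network_node|time' into its parts.

    Hypothetical helper for illustration; not part of FAVITES.
    Assumes no field contains the '|' character.
    """
    parts = identifier.split("|")
    if len(parts) != 3:
        raise ValueError(f"Expected 3 '|'-delimited fields: {identifier!r}")
    viral_lineage, contact_network_node, time = parts
    # The node label is kept as a string; only the time is converted
    return viral_lineage, contact_network_node, float(time)


lineage, node, time = parse_identifier("N19|67|4.118017")
```

The node label is deliberately left as a string, since labels such as Eric or Bill are not numeric.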

Seed File Format

Some modules may require that you pass in the desired seed nodes by file (the seed_file parameter). This file should just contain the seed node names, delimited by newlines. Below is an example of this file format.

Eric
Bill
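
Reading this file takes only a few lines of code. The sketch below uses a hypothetical function read_seed_nodes, and skipping blank lines is an assumption rather than documented behavior. An in-memory io.StringIO stands in for an opened seed file.

```python
import io


def read_seed_nodes(handle):
    """Return the seed node names from an open seed file, one per line.

    Hypothetical helper for illustration; not part of FAVITES.
    """
    # Assumption: blank lines are skipped
    return [line.rstrip("\r\n") for line in handle if line.strip()]


# io.StringIO stands in for open("seeds.txt"), a hypothetical file name
seed_nodes = read_seed_nodes(io.StringIO("Eric\nBill\n"))
```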

Sample Time File Format

Sample times for use with FAVITES are specified in a simple tab-delimited format. Each line represents a single sample time via two tab-delimited attributes:

  1. The label of the node to be sampled
  2. The sample time

Multiple sample times can be specified per node by including multiple lines for that node. Below is an example of this file format. Note that <TAB> denotes a single tab character (i.e., '\t').

Eric<TAB>1
Eric<TAB>2
Bill<TAB>3
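
Because a node may appear on several lines, a natural in-memory form is a mapping from node label to a list of times. The Python sketch below illustrates this. The function name parse_sample_times is hypothetical, and parsing times as floats and skipping empty lines are assumptions.

```python
from collections import defaultdict


def parse_sample_times(lines):
    """Map each node label to the list of its sample times.

    Hypothetical helper for illustration; not part of FAVITES.
    """
    times = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # assumption: tolerate empty lines
            continue
        label, t = line.split("\t")
        times[label].append(float(t))  # assumption: times are numeric
    return dict(times)


# The example file from above, with real tab characters
sample_times = parse_sample_times(["Eric\t1", "Eric\t2", "Bill\t3"])
```

For the example, Eric maps to two sample times and Bill to one.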