ArjunBharioke edited this page Feb 22, 2015 · 25 revisions

Essential Connectome Data

The following describes the format of the three most essential datatypes available for the hackathon. Other relevant data structures are described in more detail in the DVID API wiki section, and the skeletal representation of neuronal shapes is described in the section Neuronal Skeletons. The first type provides the graph as a list of synapses and connecting bodies/neurons (the terms body and neuron will mostly be used interchangeably, though a body tends to include neuronal fragments that have not been identified). The second type provides meta information on the most important neurons. The third type is a more traditional graph representation of nodes and edges, but its edges represent contact area, not synapse count as in the first datatype. (The first two datatypes are likely to be refactored in the future, and the third type is specific to DVID.)

The datatypes are expressed using JSON. Datatypes 1 and 2 can be retrieved through this repo (as "synapses.json" and "neuronsinfo.json" respectively). All datatypes will also be available via the DVID API described later, but only at the hackathon.

The first and third datatype provide information on all bodies that intersect the entire segmented dataset (larger than the seven column region that was manually verified and revised). As such, there are many small body fragments and untraceable orphan processes. Attendees should concentrate on important bodies. What is important? The neurons described in datatype 2 should be included. One might also consider very big bodies (determined by the graph in datatype 3) or bodies with a lot of synapses (determined by the synapse info in datatype 1).
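
The selection heuristic above can be sketched as a simple set union. This is only an illustration: the function name, the input dictionaries, and both thresholds (`min_synapses`, `min_voxels`) are assumptions, not values recommended by the dataset authors. It assumes that per-body synapse counts and body sizes have already been extracted from datatypes 1 and 3.

```python
def important_bodies(named_ids, synapse_counts, body_sizes,
                     min_synapses=50, min_voxels=1_000_000):
    """Union of named neurons, synapse-rich bodies, and large bodies.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    selected = set(named_ids)
    selected |= {b for b, n in synapse_counts.items() if n >= min_synapses}
    selected |= {b for b, size in body_sizes.items() if size >= min_voxels}
    selected.discard(0)  # body ID 0 means "not on a body label"
    return selected


# Toy inputs (made-up numbers, for illustration only).
named_ids = {103, 10319}                          # keys of datatype 2
synapse_counts = {501: 120, 315: 3, 10319: 40}    # derived from datatype 1
body_sizes = {503: 2145667, 721: 10234}           # vertex weights in datatype 3

important = important_bodies(named_ids, synapse_counts, body_sizes)
print(sorted(important))
```

Appropriate thresholds depend on the analysis; inspecting the distribution of sizes and synapse counts first is probably a safer way to choose them.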

Synapse Graph (JSON file format):

(Synapses may be referred to as "T-bars". These are structures specific to the fly visual system, where they are presynaptic specializations at all synapses.)

As expected for a hackathon on connectomics, the most important datatype is the one that describes how the neurons are connected to each other. We provide this information as a list of synapses/T-bars giving the pre-synaptic body and post-synaptic partners. Each synapse can have multiple partners. Synapses and partners will henceforth be referred to as elements. Each element has a representative X,Y,Z coordinate (right-handed coordinate system) which is inside the correct body. A body ID is provided for each element (a body ID of 0 indicates that the element is not on a body label). There is also a confidence field for each element. When a proofreader is less sure about the annotation, he/she marks it as 0.5. This field was not used extensively or consistently in our process but may provide some useful information.

To anyone familiar with cortical synapses, it is important to note that the synapses in the fly visual system are almost always divergent: they have a single presynaptic specialization (a "T-bar") which contacts multiple postsynaptic partners (marked by postsynaptic densities). This is evident in the example of the data structure, below.

Example: (inlined comments are delimited by '#' and are included for clarity but are not considered legal JSON format)

{
"data": [ # list of synapses
     {
        "T-bar": { # i.e pre-synaptic site
            "confidence": 1.0,
            "body ID": 501,
            "location": [241, 452, 143] # x,y,z
        },
        "partners": [ # post-synaptic partners for synapse
            {
                 "confidence" : 0.5, # not very confident
                 "body ID" : 315,
                 "location": [245, 470, 150]
            },
            {
                 "confidence" : 1.0,
                 "body ID" : 435,
                 "location": [225, 420, 180]
            }
        ]
     },
     {
        "T-bar": {
            "confidence": 1.0,
            "body ID": 501,
            "location": [501, 152, 543]
        },
        "partners": [ # sometimes there are >10 partners
            {
                 "confidence" : 1.0,
                 "body ID" : 315,
                 "location": [511, 172, 562]
            },
            {
                 "confidence" : 1.0,
                 "body ID" : 130,
                 "location": [501, 132, 523]
            },
            {
                 "confidence" : 1.0,
                 "body ID" : 1023,
                 "location": [511, 116, 513]
            },
            {
                 "confidence" : 1.0,
                 "body ID" : 1432,
                 "location": [512, 123, 566]
            }
        ]
     }
]
}

In this example, body 501 has two T-bars and six total partners. One of the synapses/T-bars on body 501 connects to bodies 315 and 435. The other T-bar connects body 501 to bodies 315, 130, 1023, and 1432. In the provided synapses.json file, there are over 40,000 T-bars. Information on particular body IDs such as 501 can be retrieved from the next two datatypes.
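
As a minimal sketch of how this structure might be turned into a synapse-count graph, the snippet below counts postsynaptic contacts for every (pre-synaptic body, post-synaptic body) pair. The inline sample mirrors the example above with the '#' comments removed so that it parses as legal JSON. The function name `connection_counts` and its `min_confidence` parameter are illustrative choices, not part of any provided tooling.

```python
import json
from collections import Counter

# Inline sample mirroring the example above, without the '#' comments.
SAMPLE = """
{
  "data": [
    {
      "T-bar": {"confidence": 1.0, "body ID": 501, "location": [241, 452, 143]},
      "partners": [
        {"confidence": 0.5, "body ID": 315, "location": [245, 470, 150]},
        {"confidence": 1.0, "body ID": 435, "location": [225, 420, 180]}
      ]
    },
    {
      "T-bar": {"confidence": 1.0, "body ID": 501, "location": [501, 152, 543]},
      "partners": [
        {"confidence": 1.0, "body ID": 315, "location": [511, 172, 562]},
        {"confidence": 1.0, "body ID": 130, "location": [501, 132, 523]},
        {"confidence": 1.0, "body ID": 1023, "location": [511, 116, 513]},
        {"confidence": 1.0, "body ID": 1432, "location": [512, 123, 566]}
      ]
    }
  ]
}
"""


def connection_counts(synapses, min_confidence=0.0):
    """Count postsynaptic contacts per (pre body, post body) pair."""
    counts = Counter()
    for synapse in synapses["data"]:
        pre = synapse["T-bar"]["body ID"]
        for partner in synapse["partners"]:
            post = partner["body ID"]
            # Body ID 0 means the element is not on a body label; skip it.
            if pre == 0 or post == 0:
                continue
            # Optionally drop partners the proofreader was unsure about.
            if partner["confidence"] < min_confidence:
                continue
            counts[(pre, post)] += 1
    return counts


synapses = json.loads(SAMPLE)
counts = connection_counts(synapses)
tbars_per_body = Counter(s["T-bar"]["body ID"] for s in synapses["data"])

for (pre, post), n in sorted(counts.items()):
    print(f"{pre} -> {post}: {n}")
```

For the real file, `json.load(open("synapses.json"))` would replace the inline sample. Calling `connection_counts(synapses, min_confidence=1.0)` keeps only the fully confident partners, which may or may not be desirable given that the confidence field was not used consistently.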

Neuron Meta Info (JSON file format)

In our reconstruction, there are thousands of total body IDs. Only a few hundred of these refer to neurons that are large enough to identify by type (through comparison with other methods, e.g. through random silver staining of different cell types or genetic targeting of fluorescent proteins to different cell types).

For each of these larger bodies, we provide information (tagged to the numeric body ID) about the classification of the neuron, derived from the morphological properties of the reconstructed cell, as well as its connections within the connectome. This information is stored in the neuroninfo.json file.

First, each neuron can be classified into a cell type, by the shape of its projections. Different neurons of the same type are located at different spatial locations. Given the repetitive structure of the fly visual system (as detailed in the Introduction to the Fly Optic Lobe), cells that have arborizations in mostly a single column can be named by identifying this columnar identity (which defines their spatial position). In contrast, cells which are non-columnar are harder to name, and are simply described by a numeric identifier that iterates over the different cells.

Different types of cells can also be grouped together by having arborizations that are anatomically similar. We divide this grouping into two levels. The first we term a "class". Classes include groupings of cells such as photoreceptors (R) or lamina monopolar cells (L). The second, more inclusive grouping we term a "superclass". Superclasses group together, for example, cells that provide input to the medulla from the lamina (Lamina neuron), as well as cells with arborization patterns entirely within the medulla (e.g. Intrinsic Medulla neurons).

In addition, we include properties characterizing the spatial location of the cell within the array of columns of the medulla. A detailed description of the rules by which we characterized each cell is here. The document also includes the specific cases where the classification was chosen to differ from the rules. Below, we summarize only the logic underlying the rules.

As introduced above, some cells have arborizations that are primarily within a single column ROI. These are termed "Single Columnar" neurons. In contrast, other neurons (termed "Multi Columnar") arborize over multiple columns. For each neuron, we computed the fraction of the total reconstructed volume within each of the column ROIs. After normalizing these fractions by the total fraction within the 7 column ROIs, we termed any neuron with >90% of its volume within a single ROI "Single Columnar". This description is included in neuronsinfo.json, under the "Columnar Spread" key. In addition, the computed fraction of the total volume within each of the ROIs is also included in the JSON file, under the key "Column Volume Fraction". (If necessary, it is straightforward to recompute these values by querying the labelgraphs associated with each ROI, as detailed further here.)
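
The 90% rule can be sketched as follows. This is an illustration rather than the code used to produce the file: the function name and the handling of a neuron with no volume in any column ROI are assumptions. The sample fractions are copied from the two example records shown later on this page.

```python
COLUMNS = ("A", "B", "C", "D", "E", "F", "home")


def columnar_spread(volume_fraction, threshold=0.9):
    """Apply the >90% rule to a "Column Volume Fraction" mapping."""
    # Fractions are stored as strings in the JSON file, so convert first.
    fractions = {c: float(volume_fraction.get(c, 0)) for c in COLUMNS}
    total = sum(fractions.values())
    if total == 0:
        return None  # assumption: undefined when no volume lies in the 7 ROIs
    # Normalize by the total fraction inside the 7 column ROIs.
    largest_share = max(fractions.values()) / total
    return "Single Columnar" if largest_share > threshold else "Multi Columnar"


dm3 = {"A": "0.417286948", "B": "0.2645275129", "C": "0", "D": "0",
       "E": "0", "F": "0", "home": "0.1184828293"}
l1_home = {"A": "0", "B": "0", "C": "1.96928842624444E-005", "D": "0",
           "E": "0.0006044598", "F": "0.0250923941", "home": "0.9284273158"}

print(columnar_spread(dm3))      # Multi Columnar
print(columnar_spread(l1_home))  # Single Columnar
```

Because the stored value is assigned per cell type and has documented exceptions, a per-cell recomputation like this will not necessarily reproduce the "Columnar Spread" value in the file for every body.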

In addition to the columnar spread, each cell can be located relative to the locations of the columns. Specifically, a cell can be localized either interior or exterior to the column; this location is determined by the position of the bulk of the cell's volume (see the next paragraph for a quantitative definition). It is important to emphasize that this definition is not precisely the same as the single/multi columnar distinction (despite its obvious similarity). For example, a cell could send its arborizations entirely to a single column (and hence be classified as single columnar), but have its main volume exterior to the column, within the space between individual columns. One example of such a cell is L4 home (Body ID: 21945).

We include this additional classification in the JSON file under the key "Columnar Location". The value can be either "Interior" or "Exterior". For single columnar neurons, we identify the cell as interior if >30% of its volume can be localized to the column ROIs. This threshold was chosen because 30% of the total volume within the reconstruction is within the column ROIs. Therefore, a completely random cell that covers the entire region would be expected to have about 30% of its volume within the column ROIs. Since such a cell would reasonably be termed "Exterior", the threshold appears reasonable. For multi columnar neurons, since most such neurons cover several columns, all are assumed to be exterior. There are some exceptions to both rules (due to the structure of their anatomical connections). However, these cannot be identified through any specific quantitative rule and, hence, have been specifically defined as exceptions. They are detailed here, and a biological explanation of the reason for their classification is also provided.
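
A minimal sketch of the interior/exterior rule is given below. It assumes that the values under "Column Volume Fraction" are fractions of the cell's total volume, so that their sum is the fraction of the cell lying inside the column ROIs. The function name is hypothetical, and the hand-defined exceptions mentioned above are not modeled.

```python
def columnar_location(volume_fraction, spread, threshold=0.3):
    """Apply the 30% rule; multi columnar cells are always exterior."""
    if spread == "Multi Columnar":
        return "Exterior"
    # Assumed interpretation: the sum over all column ROIs is the share of
    # the cell's volume that lies inside the columns.
    in_columns = sum(float(v) for v in volume_fraction.values())
    return "Interior" if in_columns > threshold else "Exterior"


# Fractions for the example cell "L1 home" shown later on this page.
l1_home = {"A": "0", "B": "0", "C": "1.96928842624444E-005", "D": "0",
           "E": "0.0006044598", "F": "0.0250923941", "home": "0.9284273158"}
# A made-up single columnar cell whose volume lies mostly between columns.
between_columns = {"home": "0.2"}

print(columnar_location(l1_home, "Single Columnar"))          # Interior
print(columnar_location(between_columns, "Single Columnar"))  # Exterior
```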

Note: Both the Columnar Spread and Columnar Location keys are chosen to be the same for an entire cell type. Therefore, they depend on the identification of the cell, prior to any analysis - and are not entirely dependent only on the voxel reconstruction.

For neurons that are single columnar and interior, we can identify a single column in which the majority of the arborization is located. For neurons within the central 7 columns (which have been most completely reconstructed), we have included this column under the key "Column ID".

Importantly, none of the above classifications depend on the connectome (i.e. on the connections between cells). Instead, they are only a property of the morphology of the cell and thereby agree with more traditional forms of anatomical classification. In general, they provide users who may be less familiar with the biological structure of the data set with some intuition about the system's underlying anatomical structure. This may be useful in comparing the anatomical classifications with any classifications derived from the synaptic connectivity within the connectome.

In addition to the anatomical classifications, the JSON file contains a quantitative description of the location of the synaptic elements within each reconstructed neuron. The presynaptic (or Tbar) locations are the locations where the neuron sends synaptic output to other neurons. The fraction within each of the column ROIs is provided (key: "Column Tbar Fraction"), as well as the fraction within each medulla layer (introduced here) (key: "Layer Tbar Fraction"). Similarly, the postsynaptic (or PSD) locations - i.e. locations where the neuron receives synaptic input from other neurons - are also included, and are stored under the keys "Column PSD Fraction" and "Layer PSD Fraction", respectively. (If necessary, it is straightforward to recompute these values by querying the column or layer ROIs with the locations of all synapses, which are stored in synapses.json, described above. Details on the query can be found here.)

We now show the neuroninfo.json file structure for two identified cells within the reconstruction:

Example:

{
"103": {
    "Class": "Dm", 
    "Column ID": "", 
    "Column PSD Fraction": {
        "A": "0.3454545455", 
        "B": "0.3818181818", 
        "C": "0", 
        "D": "0", 
        "E": "0", 
        "F": "0", 
        "home": "0.1363636364"
    }, 
    "Column Tbar Fraction": {
        "A": "0.2857142857", 
        "B": "0.4285714286", 
        "C": "0", 
        "D": "0", 
        "E": "0", 
        "F": "0", 
        "home": "0.2857142857"
    }, 
    "Column Volume Fraction": {
        "A": "0.417286948", 
        "B": "0.2645275129", 
        "C": "0", 
        "D": "0", 
        "E": "0", 
        "F": "0", 
        "home": "0.1184828293"
    }, 
    "Columnar Location": "Exterior", 
    "Columnar Spread": "Multi Columnar", 
    "Layer PSD Fraction": {
        "m1": "0.2342342342", 
        "m10": "0", 
        "m2": "0.7657657658", 
        "m3": "0", 
        "m4": "0", 
        "m5": "0", 
        "m6": "0", 
        "m7": "0", 
        "m8": "0", 
        "m9": "0"
    }, 
    "Layer Tbar Fraction": {
        "m1": "0.25", 
        "m10": "0", 
        "m2": "0.75", 
        "m3": "0", 
        "m4": "0", 
        "m5": "0", 
        "m6": "0", 
        "m7": "0", 
        "m8": "0", 
        "m9": "0"
    }, 
    "Name": "Dm3-1", 
    "Superclass": "Distal Medulla", 
    "Type": "Dm3"
}, 
"10319": {
    "Class": "L", 
    "Column ID": "home", 
    "Column PSD Fraction": {
        "A": "0", 
        "B": "0", 
        "C": "0", 
        "D": "0", 
        "E": "0", 
        "F": "0.0172413793", 
        "home": "0.8620689655"
    }, 
    "Column Tbar Fraction": {
        "A": "0", 
        "B": "0", 
        "C": "0", 
        "D": "0", 
        "E": "0", 
        "F": "0.0427350427", 
        "home": "0.9487179487"
    }, 
    "Column Volume Fraction": {
        "A": "0", 
        "B": "0", 
        "C": "1.96928842624444E-005", 
        "D": "0", 
        "E": "0.0006044598", 
        "F": "0.0250923941", 
        "home": "0.9284273158"
    }, 
    "Columnar Location": "Interior", 
    "Columnar Spread": "Single Columnar", 
    "Layer PSD Fraction": {
        "m1": "0.7459016393", 
        "m10": "0", 
        "m2": "0.0901639344", 
        "m3": "0", 
        "m4": "0.0081967213", 
        "m5": "0.1557377049", 
        "m6": "0", 
        "m7": "0", 
        "m8": "0", 
        "m9": "0"
    }, 
    "Layer Tbar Fraction": {
        "m1": "0.5702479339", 
        "m10": "0", 
        "m2": "0.0082644628", 
        "m3": "0", 
        "m4": "0", 
        "m5": "0.3884297521", 
        "m6": "0.0330578512", 
        "m7": "0", 
        "m8": "0", 
        "m9": "0"
    }, 
    "Name": "L1 home", 
    "Superclass": "Lamina", 
    "Type": "L1"
}
}

The key in this data structure is the body ID (the unique numeric identifier of each body within the dataset). The first neuron, "Dm3-1", is a "Dm3" type of neuron, which is multi columnar and exterior to any single column. Therefore, we are unable to name a single column ID. "Dm3" is a member of the "Dm" class, which belongs to the "Distal Medulla" superclass.

The second neuron, "L1 home", is of type "L1". In contrast to the first cell, it is single columnar and interior to a single column - in this case, the home column. It is a member of the "L" class, which is classified within the "Lamina" superclass (including neurons which have significant arborization within the lamina neuropil of the fly optic lobe).
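
As an illustrative sketch, the snippet below loads records of this shape, converts the body-ID keys to integers and the fraction strings to floats, and groups bodies by one of the classification keys. The inline sample is an abbreviated copy of the two records above (most keys and zero-valued layers are omitted), and the helper names `load_neuron_info`, `bodies_by`, and `dominant` are assumptions made for this example.

```python
import json
from collections import defaultdict

# Abbreviated copies of the two records above; most keys are omitted.
SAMPLE = """
{
  "103": {
    "Class": "Dm", "Column ID": "", "Name": "Dm3-1",
    "Superclass": "Distal Medulla", "Type": "Dm3",
    "Layer Tbar Fraction": {"m1": "0.25", "m2": "0.75", "m3": "0"}
  },
  "10319": {
    "Class": "L", "Column ID": "home", "Name": "L1 home",
    "Superclass": "Lamina", "Type": "L1",
    "Layer Tbar Fraction": {"m1": "0.5702479339", "m2": "0.0082644628",
                            "m5": "0.3884297521", "m6": "0.0330578512"}
  }
}
"""


def load_neuron_info(text):
    """Parse the JSON text; body IDs become ints, fraction strings become floats."""
    info = {}
    for body_id, record in json.loads(text).items():
        parsed = {}
        for key, value in record.items():
            if isinstance(value, dict):
                parsed[key] = {k: float(v) for k, v in value.items()}
            else:
                parsed[key] = value
        info[int(body_id)] = parsed
    return info


def bodies_by(info, key):
    """Group body IDs by the value stored under `key` (e.g. "Type")."""
    groups = defaultdict(list)
    for body_id, record in info.items():
        groups[record[key]].append(body_id)
    return {value: sorted(ids) for value, ids in groups.items()}


def dominant(fractions):
    """Return the ROI name holding the largest fraction."""
    return max(fractions, key=fractions.get)


info = load_neuron_info(SAMPLE)
by_superclass = bodies_by(info, "Superclass")
print(by_superclass)
print(dominant(info[10319]["Layer Tbar Fraction"]))
```

Converting the keys to integers makes it easier to join this table with the body IDs in the synapse list, which are stored as numbers rather than strings.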

Note: We emphasize that the body information in this data structure is to be used only as a test of connectomics analyses, and not for publishable biological insight. We have randomly changed some of the names of the neurons within the dataset. The connections are unchanged, so the structure of the circuit is unaffected. However, any conclusions involving the names of specific cell types may be incorrect.

Body Overlap Graph (DVID Label Graph) (JSON file format)

This datastructure includes information on the size of a given body and all the bodies that physically touch it. The contact area between the bodies is recorded as an edge weight. The contact area is not a good proxy for connection strength in general, but there must be contact area for a connection to exist. Overlap between bodies and the body's size might be useful in some analyses, perhaps as a filter. The surface area for a body can be obtained by summing all of its edge weights.

The labelgraph can be obtained from DVID (it is not provided in the repo) as explained in the DVID API section. DVID contains the labelgraph for the entire reconstruction and labelgraph subsets restricted to specific ROIs (column a, layer1, etc). The user can extract the whole graph, a subgraph, or just query a particular body ID. In all cases, the data is returned with the format described in the next paragraph.

The graph is simply a list of Vertices and Edges. Each vertex has a weight that is its size in voxels (each voxel is 10nm x 10nm x 10nm). Each edge connects two vertices and has a weight that is the contact area between those two bodies.

Example:

{
 "Vertices" : [ # unordered array of vertices
     {
          "Id" : 503,
          "Weight" : 2145667
     },
     {
          "Id" : 721,
          "Weight" : 10234
     }
 ],
 "Edges" : [ # unordered array of edges
     {
          "Id1" : 503,
          "Id2" : 721,
          "Weight" : 321
     },
     { # all connecting edges for subgraph of vertices are specified
          "Id1" : 503,
          "Id2" : 1342,
          "Weight" : 854
     }
 ]
}

This example shows a subgraph for nodes 503 and 721. They have a contact area of approximately 321 units, where each unit is a voxel face (10nm x 10nm). When retrieving a subgraph, all edges connecting to 503 and 721 are returned, so there can be vertex IDs in the edge list that are not in the vertex list.
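
The following sketch shows one way such a response might be summarized per body: size in voxels, summed contact area (the surface-area estimate described earlier), and the set of touching bodies. The inline sample is the example above with the comments removed. The function name `summarize` and the conversion to cubic micrometres are illustrative additions, and no DVID call is made here.

```python
import json

# The example above, with the '#' comments removed so it is legal JSON.
SAMPLE = """
{
  "Vertices": [
    {"Id": 503, "Weight": 2145667},
    {"Id": 721, "Weight": 10234}
  ],
  "Edges": [
    {"Id1": 503, "Id2": 721, "Weight": 321},
    {"Id1": 503, "Id2": 1342, "Weight": 854}
  ]
}
"""

VOXEL_NM = 10  # voxel edge length in nanometres, as stated in the text


def summarize(graph):
    """Return {body ID: (size in voxels, summed contact area, neighbor IDs)}."""
    summary = {v["Id"]: [v["Weight"], 0, set()] for v in graph["Vertices"]}
    for edge in graph["Edges"]:
        pairs = ((edge["Id1"], edge["Id2"]), (edge["Id2"], edge["Id1"]))
        for body, other in pairs:
            # Endpoints missing from the vertex list (e.g. 1342) are skipped,
            # because a subgraph does not list all of their edges.
            if body in summary:
                summary[body][1] += edge["Weight"]
                summary[body][2].add(other)
    return {body: tuple(values) for body, values in summary.items()}


summary = summarize(json.loads(SAMPLE))
size, area, neighbors = summary[503]
volume_um3 = size * (VOXEL_NM / 1000) ** 3  # voxels -> cubic micrometres

print(size, area, sorted(neighbors))
```

Only bodies present in the vertex list are summarized, since the summed contact area is complete only for those bodies when a subgraph is retrieved.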
