# Accessing synapse and spine info of the MICrONS data

The MICrONS initiative provided a dense reconstruction of around a cubic millimeter of mouse brain tissue.

At OBI, we have converted that data into the SONATA format, which is often used to represent biophysically-detailed computational models of neuronal circuitry. We believe that this is a useful resource for the community for the following reasons:
 1. It allows direct comparison of models to the data, as both are in the same format. In the future it may even be possible to simulate the MICrONS circuitry as one simulates the computational models.
 2. There are many useful code libraries for analyzing SONATA-formatted circuits.
 3. It is a reduced representation of the data. While this discards a lot of information, what remains is still very useful for many purposes, and the reduced data is easier to handle and faster to analyze.
 4. During the conversion to SONATA, we added derived data, specifically high-quality morphology skeletons with extracted spines.


Here, we want to expand on point (4) above. We demonstrate some examples of how to access spine- and synapse-related data.

### Summary of the analysis

This is less of an analysis and more of a demonstration of how to access spine- and synapse-related data of the MICrONS data, represented in the SONATA format. It serves to teach you the basics of structural analyses of SONATA circuits.

However, as examples, we calculate the fractions and numbers of shaft vs. spine synapses of the neurons with available morphologies, and the proportions of presynaptic neuron types.

## Importing code libraries and loading the data

We import a number of standard packages, as well as _bluepysnap_ and _neurom_. These two packages provide (as we will see) useful functionality for accessing the data in the SONATA format.

In [None]:
import numpy
import pandas

import bluepysnap as snap
from matplotlib import pyplot as plt

circ_fn = "circuit_config.json"
circ = snap.Circuit(circ_fn)

## Node populations

The neurons in a SONATA circuit can be split into different _node populations_.
To represent the MICrONS data, we have decided to split it as follows:
  - An "intrinsic" population that contains the neurons with somata inside the reconstructed volume, except for a small fraction (~15%) of neurons in the very periphery of the volume. These peripheral neurons are likely to be severely affected by edge effects in terms of their connectivity and hence should be excluded from analyses.
  - A "virtual" population representing the ~15% of neurons in the periphery that were excluded.
  - An "extrinsic" population representing neurons outside the reconstructed volume innervating neurons inside it. As these neurons are outside the volume, we know nothing about them, except that they must exist. 

We display the names of the node populations.

In [None]:
display(list(circ.nodes))

The intrinsic node population ("microns_intrinsic") is the most interesting one. Each neuron in the population is associated with a number of "node properties" that represent, e.g., its location, neuron type, etc.

We display the available node (i.e., neuron) properties.

In [None]:
node_pop = circ.nodes["microns_intrinsic"]

display(node_pop.property_names)

For the intrinsic population, we load a number of available property values and display them.

In [None]:
node_properties_to_load = ["layer", "morphology", "mtype", "spine_info", "synapse_class", "x", "y", "z"]
nrn_props = node_pop.get(properties=node_properties_to_load)

display(nrn_props.head())

### Morphologies available for a fraction of neurons

As we saw above, the entry for "morphology" is "_NONE" for most neurons. This indicates that we have not yet skeletonized the morphology of that neuron. At the moment, morphologies are available for only 85 neurons, but that number is steadily growing.

Here, we create a DataFrame of neurons with available morphologies. Its index "node_ids" provides the identifiers of those neurons for future analyses.

In [None]:
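# Keep only neurons with an available morphology; .copy() avoids a SettingWithCopyWarning when columns are added later
nrn_props = nrn_props.loc[nrn_props["morphology"] != "_NONE"].copy()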
display(nrn_props.head())

## Edge populations

Just like neurons, synapses are split into separate _edge populations_ that represent synapses between different pairs of node populations.

We display the names of the edge populations.

In [None]:
edge_pop_names = list(circ.edges)
display(edge_pop_names)

We display the available edge (i.e., synapse) properties.

In [None]:
edge_properties_to_load = list(circ.edges[edge_pop_names[1]].property_names)

display(edge_properties_to_load)

We pick an example neuron with an available morphology.

Then we load its afferent synapses, with their properties, from all available edge populations. To that end, we define a helper function that iterates over the edge populations.

In [None]:
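nrn_id = nrn_props.index[10]

def synapses_from_all_edge_populations(nrn_id):
    """Afferent synapses of intrinsic neuron `nrn_id`, gathered from all edge populations targeting it."""
    syns, keys = [], []
    for edge_pop in edge_pop_names:
        # Only query populations that target the intrinsic neurons; otherwise nrn_id would refer to a different node
        if circ.edges[edge_pop].target.name != node_pop.name:
            continue
        pop_syns = circ.edges[edge_pop].afferent_edges(nrn_id, properties=edge_properties_to_load)
        if len(pop_syns) > 0:
            syns.append(pop_syns)
            keys.append(edge_pop)
    # Stack into one DataFrame with the edge population as the outer index level
    syns = pandas.concat(syns, axis=0, keys=keys, names=["edge_population"])
    return syns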

synapses = synapses_from_all_edge_populations(nrn_id)
display(synapses)

### Extrinsic vs. intrinsic innervation

With the loaded data, we can already calculate the numbers of extrinsic vs. intrinsic synapses, a theoretically important quantity.

Note that, at the moment, "extrinsic" synapses are only available for neurons with an available morphology. In the future, we will provide extrinsic information for other neurons as well.

In [None]:
synapses.reset_index()["edge_population"].value_counts()

We see that most synapses are, indeed, extrinsic!

We now write a quick widget that performs this analysis for each neuron with an available morphology.
Use the slider to iterate over the neurons.
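To make the structure of the combined table concrete, the following minimal sketch uses made-up toy tables (the population names `extrinsic_pop` and `intrinsic_pop` and all values are hypothetical) to show how `pandas.concat` with `keys` and `names` creates the `edge_population` index level used above, and how the per-population counts can be turned into fractions.

```python
import pandas

# Hypothetical toy synapse tables for two edge populations (made-up values)
pop_a = pandas.DataFrame({"@source_node": [3, 7, 7], "spine_id": [0, -1, 4]})
pop_b = pandas.DataFrame({"@source_node": [1, 2], "spine_id": [-1, 5]})

# As in synapses_from_all_edge_populations: the keys become the outer index level
syns = pandas.concat([pop_a, pop_b], axis=0,
                     keys=["extrinsic_pop", "intrinsic_pop"],
                     names=["edge_population"])

# Synapse count per edge population, and the corresponding fractions
counts = syns.reset_index()["edge_population"].value_counts()
fractions = counts / counts.sum()
print(counts)
print(fractions)
```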

In [None]:
from ipywidgets import widgets

wgt_nrn_id = widgets.IntSlider(min=0, max=len(nrn_props)-1, step=1, value=0, description="Neuron index")

def display_fun(idx):
    synapses = synapses_from_all_edge_populations(nrn_props.index[idx])
    per_population_count = synapses.reset_index()["edge_population"].value_counts().sort_index()

    _ = plt.pie(per_population_count, labels=per_population_count.index)

i = widgets.interactive(display_fun, idx=wgt_nrn_id)
display(i)

### Small tangent: Presynaptic neuron types

This is unrelated to spines, but we can also look up the neuron types of the presynaptic neurons.

To that end, we simply use the "get" function of the corresponding presynaptic node population to look up the "mtype" of the innervating neurons.

Note that this is NOT possible for extrinsic synapses: their sources are, by definition, outside the reconstructed volume, so we have no information about those neurons.

In [None]:
non_extrinsics = synapses.drop(index=["em_extrinsic__microns_intrinsic__chemical"], errors="ignore")

def lookup_presynaptic_property(df_in, property_names):
    edge_pop = df_in.index[0][0]
    source_name = circ.edges[edge_pop].source.name
    lo = circ.nodes[source_name].get(df_in["@source_node"], properties=property_names)
    return lo.reindex(df_in["@source_node"])

pre_mtypes = non_extrinsics.groupby("edge_population").apply(lookup_presynaptic_property, "mtype")
pre_mtype_counts = pre_mtypes.value_counts()

_ = plt.pie(pre_mtype_counts, labels=pre_mtype_counts.index)

The neuron-type classification used above uses "PTC" to denote inhibitory "proximally targeting cells", i.e., basket cells, and "DTC" to denote inhibitory "distally targeting cells", i.e., SST-positive neurons.
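The lookup above works in two steps: `get` returns one row per unique presynaptic neuron, and `reindex` then expands this to one row per synapse, repeating a neuron's mtype for every synapse it makes. A minimal sketch, with a hand-made `pandas.Series` standing in for the result of `get` (the node ids and mtype labels are hypothetical):

```python
import pandas

# Hypothetical stand-in for NodePopulation.get(..., properties="mtype"):
# one entry per unique presynaptic neuron, indexed by node id
node_mtypes = pandas.Series({3: "L23_PYR", 7: "PTC", 9: "DTC"}, name="mtype")

# Source node ids of a neuron's afferent synapses; neuron 7 makes two synapses
source_ids = pandas.Series([7, 3, 7, 9], name="@source_node")

# reindex expands the per-neuron lookup to one entry per synapse
per_synapse = node_mtypes.reindex(source_ids)
print(per_synapse.tolist())
print(per_synapse.value_counts())
```

Counting per synapse rather than per neuron is what makes the pie charts reflect synapse proportions.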

Again, we can also do this with a little interactive widget.

In [None]:
from ipywidgets import widgets

# Node ids are 0-based, so the largest valid id is size - 1
wgt_nrn_id = widgets.IntSlider(min=0, max=node_pop.size - 1, step=1, value=0, description="Neuron id")


def display_fun(nrn_id):
    synapses = synapses_from_all_edge_populations(nrn_id)
    non_extrinsics = synapses.drop(index=["em_extrinsic__microns_intrinsic__chemical"], errors="ignore")
    pre_mtypes = non_extrinsics.groupby("edge_population").apply(lookup_presynaptic_property, ["mtype"])    
    pre_mtype_counts = pre_mtypes["mtype"].value_counts().sort_index()

    ax = plt.figure().gca()
    _ = ax.pie(pre_mtype_counts, labels=pre_mtype_counts.index)

i = widgets.interactive(display_fun, nrn_id=wgt_nrn_id)
display(i)

## Finally: Accessing spine data

We write a small helper function to access and load the extracted spine data for a neuron.

We represent spines at three levels of detail:
  1. Surface meshes of spines
  2. Morphology-skeletons of spines, i.e., as line-segments with diameters
  3. As a simple line segment from the root of the spine to its tip.

At the moment and in this example, we only make (3) available; (2) and (1) will be released in the future.

In [None]:
import os, json

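def read_spine_info(node_pop, nrn_id):
    """Load the extracted spines of neuron `nrn_id` as a DataFrame (one row per spine)."""
    # Spine files are located relative to the "h5v1" alternate-morphologies path of the population
    spines_root = node_pop.config["alternate_morphologies"]["h5v1"]
    # The "spine_info" node property holds the file name without the ".json" extension
    fn = os.path.join(spines_root, nrn_props.loc[nrn_id, "spine_info"]) + ".json"
    with open(fn, "r") as fid:
        spines = json.load(fid)
    return pandas.DataFrame(spines)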

spine_info = read_spine_info(node_pop, nrn_id)
display(spine_info.head())

A quick explanation of the above:

Each row of the DataFrame represents a spine on the morphology of the neuron. The columns are as follows:
  - dendritic_sample_position: x,y,z coordinates of the location of the spine on the morphology skeleton, i.e., on the center line of the dendrite
  - surface_sample_position: x,y,z coordinates of the location of the root of the spine on the dendrite surface
  - direction_vector: The direction from surface_sample_position to dendritic_sample_position
  - orientation_vector: A vector pointing from surface_sample_position towards the tip of the spine
  - synaptic_radius: The distance of the tip of the spine from surface_sample_position
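Under this description, the tip of a spine is its root on the dendrite surface plus `synaptic_radius` times the normalized `orientation_vector`; the plotting code further below draws spines the same way. A minimal sketch on two made-up spines (all values and the helper name `spine_tips` are hypothetical; in the notebook, the table would come from `read_spine_info`):

```python
import numpy
import pandas

# Two made-up spines in the column layout described above
spine_info = pandas.DataFrame({
    "surface_sample_position": [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]],
    "orientation_vector": [[0.0, 2.0, 0.0], [3.0, 0.0, 4.0]],
    "synaptic_radius": [1.5, 2.0],
})

def spine_tips(spine_info):
    """Tip = surface root + synaptic_radius * unit orientation vector (one row per spine)."""
    roots = numpy.stack(spine_info["surface_sample_position"].to_numpy())
    dirs = numpy.stack(spine_info["orientation_vector"].to_numpy())
    dirs = dirs / numpy.linalg.norm(dirs, axis=1, keepdims=True)  # normalize to unit length
    return roots + spine_info["synaptic_radius"].to_numpy()[:, None] * dirs

tips = spine_tips(spine_info)
print(tips)
```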

## Fractions of shaft- and spine-synapses

We use the above to calculate, for each neuron with an available morphology, its number of spines and its numbers of spine and shaft synapses.

In [None]:
for nrn_id in nrn_props.index:
    spines = read_spine_info(node_pop, nrn_id)
    nrn_props.loc[nrn_id, "spine_count"] = len(spines)  # Each row of the DataFrame is a spine. Hence len is the spine count.
    
    syns = synapses_from_all_edge_populations(nrn_id)
    count_on_spines = (syns["spine_id"] > -1).sum()
    count_on_shafts = (syns["spine_id"] <= -1).sum()
    nrn_props.loc[nrn_id, "syn_count_on_spines"] = count_on_spines
    nrn_props.loc[nrn_id, "syn_count_on_shafts"] = count_on_shafts


For all exemplars with available morphologies and spines, we plot the numbers of spine and shaft synapses (upward bars) and the total number of spines (downward bars).

We see that in this dataset the number of spines is typically higher than the number of spine synapses, indicating that some spines are still unoccupied and "looking for" a synaptic partner.
But there are also instances of more spine synapses than spines, indicating spines with multiple synapses.
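Both interpretations can be checked per neuron by counting the distinct spines among the spine synapses. A minimal sketch on made-up values, assuming (this is not stated above) that each extracted spine of a neuron has a unique non-negative `spine_id`, with -1 marking shaft synapses as in the counting code above; `n_spines` is a hypothetical stand-in for `len(read_spine_info(node_pop, nrn_id))`:

```python
import pandas

# Made-up spine_id values for one neuron's afferent synapses; -1 marks a shaft synapse
spine_id = pandas.Series([-1, 0, 1, 1, 2, -1, 5, 5, 5])
n_spines = 8  # hypothetical total spine count of the neuron

on_spine = spine_id[spine_id > -1]
per_spine = on_spine.value_counts()  # number of synapses on each occupied spine

summary = {
    "syn_count_on_spines": int(len(on_spine)),
    "syn_count_on_shafts": int((spine_id <= -1).sum()),
    "occupied_spines": int(per_spine.size),
    "multi_synapse_spines": int((per_spine > 1).sum()),
    "unoccupied_spines": n_spines - int(per_spine.size),
}
print(summary)
```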

In [None]:
plt.bar(range(len(nrn_props)), nrn_props["syn_count_on_spines"],
        color="green", label="Spine syn. count")
plt.bar(range(len(nrn_props)), 
        nrn_props["syn_count_on_shafts"],
        bottom=nrn_props["syn_count_on_spines"],
        color="blue", label="Shaft syn. count")
plt.bar(range(len(nrn_props)), -nrn_props["spine_count"],
        color="teal", label="Spine count")
plt.plot(range(len(nrn_props)), -nrn_props["syn_count_on_spines"],
         ls="--", color="black", lw=0.5, label="Spine syn count (mirrored)")

ax = plt.gca()
plt.legend()
ax.set_frame_on(False)
ax.set_xlabel("Neuron #")
ax.set_ylabel("Count")
ax.set_yticks(ax.get_yticks())
ax.set_yticklabels(numpy.abs(ax.get_yticks()))

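Which presynaptic types place their synapses on spines vs. shafts? For each neuron, we calculate the fraction of synapses from each presynaptic mtype that are placed on spines, and then plot the mean (± standard error) over neurons. Synapses from the extrinsic population are labeled "extrinsic".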

In [None]:
def lookup_presynaptic_property2(df_in, property_names):
    edge_pop = df_in.index[0][0]
    if edge_pop == "em_extrinsic__microns_intrinsic__chemical":
        return pandas.DataFrame([["extrinsic"] * len(property_names)] * df_in.shape[0],
                             index=df_in["@source_node"],
                             columns=property_names)
    source_name = circ.edges[edge_pop].source.name
    lo = circ.nodes[source_name].get(df_in["@source_node"], properties=property_names)
    return lo.reindex(df_in["@source_node"])

per_pre_mtype_fractions = []
for nrn_id in nrn_props.index:
    syns = synapses_from_all_edge_populations(nrn_id)
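    # sort=False keeps the groups in the order of syns, so set_index(syns.index) aligns each row with its synapse
    pre_mtypes = syns.groupby("edge_population", sort=False).apply(lookup_presynaptic_property2, ["mtype"]).set_index(syns.index)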
    mtypes_and_is_on_spine = pandas.concat([pre_mtypes, syns["spine_id"] > -1], axis=1)
    res_for_neuron = mtypes_and_is_on_spine.value_counts().unstack("mtype", fill_value=0)
    res_for_neuron = res_for_neuron.loc[True] / res_for_neuron.sum()
    per_pre_mtype_fractions.append(res_for_neuron)

per_pre_mtype_fractions = pandas.concat(per_pre_mtype_fractions, axis=1)
mn_vals = per_pre_mtype_fractions.mean(axis=1)
smpl_counts = per_pre_mtype_fractions.notna().sum(axis=1)  # number of neurons contributing to each mtype
sem_vals = per_pre_mtype_fractions.std(axis=1) / numpy.sqrt(smpl_counts)

plt.bar(range(len(mn_vals)), mn_vals)
plt.errorbar(range(len(mn_vals)), mn_vals, yerr=sem_vals, ls="None")
plt.gca().set_xticks(range(len(mn_vals)))
plt.gca().set_xticklabels(mn_vals.index, rotation="vertical")
plt.gca().set_ylim([0.5, 1.0])
plt.gca().set_ylabel("Fraction syns. on spines")
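Not every neuron receives synapses from every presynaptic mtype, so the number of neurons behind each bar differs, and the standard error has to use a per-mtype sample count. A minimal sketch of that computation on a made-up table laid out like `per_pre_mtype_fractions` (rows: presynaptic mtypes, columns: neurons, NaN where a neuron has no synapses from that mtype; all values are hypothetical):

```python
import numpy
import pandas

# Made-up per-neuron fractions of synapses on spines
fractions = pandas.DataFrame({
    "nrn_a": [0.90, 0.60, numpy.nan],
    "nrn_b": [0.80, 0.70, 0.95],
    "nrn_c": [0.85, numpy.nan, numpy.nan],
}, index=["extrinsic", "PTC", "DTC"])

mean = fractions.mean(axis=1)                # NaNs are skipped
n = fractions.notna().sum(axis=1)            # contributing neurons per mtype
sem = fractions.std(axis=1) / numpy.sqrt(n)  # sample std (ddof=1) over sqrt(n)
print(pandas.DataFrame({"mean": mean, "n": n, "sem": sem}))
```

With a single contributing neuron (here "DTC"), the sample standard deviation and hence the SEM are undefined (NaN).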

## Plot exemplar morphology and its synapses

We plot an exemplar morphology and all its afferent synapses: shaft synapses in blue, spine synapses in green.

We see that shaft synapses are more prevalent around the soma than elsewhere. This is easier to see if you uncomment the last code line to zoom in on the soma.

In [None]:
import neurom
import neurom.view

morph = neurom.load_morphology(node_pop.morph.get(nrn_id, extension="swc", transform=True))
syns = synapses_from_all_edge_populations(nrn_id)

neurom.view.plot_morph(morph, diameter_scale=3)
ax = plt.gca()
ax.scatter(syns.afferent_synapse_x[syns.spine_id == -1],
           syns.afferent_synapse_y[syns.spine_id == -1], s=5, alpha=0.3, color="blue")
ax.scatter(syns.afferent_synapse_x[syns.spine_id != -1],
           syns.afferent_synapse_y[syns.spine_id != -1], s=2, alpha=0.3, color="green")

ax.set_ylim([600, 300]); ax.set_xlim([420, 800])
# To zoom into the soma
# ax.set_ylim([500, 450]); ax.set_xlim([550, 600])
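The visual impression that shaft synapses concentrate around the soma can be quantified by binning synapses by their distance from the soma and computing the fraction of shaft synapses per bin. A minimal sketch using Euclidean (not path) distance; `shaft_fraction_by_distance` is a hypothetical helper and the toy coordinates are made up. With the notebook's data, one could pass the synapse coordinates (assuming an `afferent_synapse_z` column next to the x and y columns used above), the neuron's "x", "y", "z" node properties (presumably its soma position), and `syns.spine_id == -1` as `is_shaft`.

```python
import numpy

def shaft_fraction_by_distance(xyz, soma_xyz, is_shaft, bin_edges):
    """Fraction of shaft synapses per bin of Euclidean distance from the soma (NaN for empty bins)."""
    d = numpy.linalg.norm(numpy.asarray(xyz, dtype=float) - numpy.asarray(soma_xyz, dtype=float), axis=1)
    is_shaft = numpy.asarray(is_shaft, dtype=bool)
    idx = numpy.digitize(d, bin_edges) - 1  # bin index; values outside the edges fall out of range
    out = numpy.full(len(bin_edges) - 1, numpy.nan)
    for i in range(len(out)):
        sel = idx == i
        if sel.any():
            out[i] = is_shaft[sel].mean()
    return out

# Made-up example: five synapses around a soma at the origin
xyz = [[1, 0, 0], [2, 0, 0], [0, 15, 0], [0, 0, 30], [0, 25, 0]]
is_shaft = [True, True, False, False, True]
fractions = shaft_fraction_by_distance(xyz, [0, 0, 0], is_shaft, [0, 10, 20, 40])
print(fractions)
```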

## Plot exemplar morphology, synapses -- and SPINES

We plot the same exemplar again, this time also with the extracted spines.

As mentioned above, at this point we only offer the very reduced representation of spines as lines. But soon more involved representations will be available. 

Still, we can see nicely how each spine projects outwards from its dendrite to "catch" its associated synapse.

In [None]:
import neurom
import neurom.view

morph = neurom.load_morphology(node_pop.morph.get(nrn_id, extension="swc", transform=True))
syns = synapses_from_all_edge_populations(nrn_id)
spines = read_spine_info(node_pop, nrn_id)

neurom.view.plot_morph(morph, diameter_scale=3)
ax = plt.gca()
ax.scatter(syns.afferent_synapse_x[syns.spine_id == -1],
           syns.afferent_synapse_y[syns.spine_id == -1], s=5, alpha=0.3, color="blue")
ax.scatter(syns.afferent_synapse_x[syns.spine_id != -1],
           syns.afferent_synapse_y[syns.spine_id != -1], s=2, alpha=0.3, color="green")

for _, spine in spines.iterrows():
    spine_root = numpy.array(spine.surface_sample_position)
    spine_dir = numpy.array(spine.orientation_vector)
    spine_dir = spine.synaptic_radius * spine_dir / numpy.linalg.norm(spine_dir)
    ax.plot([spine_root[0], spine_root[0] + spine_dir[0]],
            [spine_root[1], spine_root[1] + spine_dir[1]],
            color="black", lw=0.5)

ax.set_ylim([600, 300]); ax.set_xlim([400, 800])
# To zoom into the soma
ax.set_ylim([500, 450]); ax.set_xlim([550, 600])

## Calculating synapse path distances to the soma

The synapse properties "afferent_section_id", "afferent_segment_id", "afferent_segment_offset" map each synapse to a location on the morphology skeleton. 

Hence, that information can be used to rapidly calculate path distances between pairs of synapses, or the path distance of a synapse to the soma. This is useful, for example, for studying the dendritic clustering of synapses.

**NOTE**: What is calculated is the path distance from the root of the spine of a synapse, not including the length of the spine itself.

Here, we calculate for all synapses their path distance to the soma (represented by section and segment id 0) and create a histogram.

In [None]:
from conntility.subcellular import MorphologyPathDistanceCalculator

calc = MorphologyPathDistanceCalculator(morph.to_morphio())
relevant_cols = ["afferent_section_id", "afferent_segment_id", "afferent_segment_offset"]
soma = pandas.DataFrame({
    "afferent_section_id": [0], "afferent_segment_id": [0], "afferent_segment_offset": [0]
})

pds = calc.path_distances(soma, syns[relevant_cols])

H = numpy.histogram(pds, bins=50)
plt.bar(H[1][:-1], H[0], width=0.8*numpy.mean(numpy.diff(H[1])))
plt.gca().set_xlabel("Path distance (um)")
plt.gca().set_ylabel("Synapse count")
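To illustrate how a (section, segment, offset) triple determines a path distance, here is a simplified sketch on a made-up skeleton. It follows the convention above that section 0 represents the soma, assumes that `afferent_segment_offset` is measured in µm from the start of the segment, and ignores the soma radius. It is not conntility's implementation, and `path_distance_to_soma` is a hypothetical helper.

```python
# Toy skeleton (hypothetical): section 0 is the soma; every other section lists
# its parent section and the lengths (um) of its consecutive segments.
SOMA = 0
parent = {1: SOMA, 2: 1, 3: 1}
segment_lengths = {1: [5.0, 5.0], 2: [2.0, 3.0, 4.0], 3: [1.0, 1.0]}

def path_distance_to_soma(section_id, segment_id, segment_offset):
    """Path distance (um) from the soma to a location on the skeleton."""
    # Within the section: full lengths of the preceding segments plus the offset into this one
    dist = sum(segment_lengths[section_id][:segment_id]) + segment_offset
    # Add the full lengths of all ancestor sections up to the soma
    sec = parent[section_id]
    while sec != SOMA:
        dist += sum(segment_lengths[sec])
        sec = parent[sec]
    return dist

print(path_distance_to_soma(2, 2, 1.5))
print(path_distance_to_soma(3, 0, 0.25))
```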