### Compute Focused Bodies

This notebook demonstrates `compute_focused_bodies()`, which produces a table of all bodies in the volume that contain at least a minimum number of synapses or exceed a minimum size (in voxels).

The synapse/voxel counts are included in the table.

You must provide the synapses and supervoxel sizes as input files. As a result, the function needs no calls to DVID other than a single call to the `/mappings` endpoint.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import sys
import pydoc
import logging

import requests
from tqdm import tqdm_notebook
tqdm = tqdm_notebook

import numpy as np
import pandas as pd

from dvidutils import LabelMapper
from libdvid import DVIDNodeService

from neuclease.dvid import *
from neuclease.focused.ingest import compute_focused_bodies

In [3]:
handler = logging.StreamHandler(sys.stdout)
root_logger = logging.getLogger()
root_logger.handlers = []
root_logger.addHandler(handler)
root_logger.setLevel(logging.INFO)
logging.getLogger('kafka').setLevel(logging.WARNING)

In [4]:
pwd

'/nrs/flyem/bergs'

#### Parameters

In [5]:
# Node/segmentation
uuid = '38c6'
master_seg = ('emdata3:8900', uuid, 'segmentation')

# Synapses file
synapse_samples = '/nrs/flyem/bergs/complete-ffn-agglo/sampled-synapses-38c6-locked.csv'

# Root SV Size file
root_sv_sizes_dir = '/groups/flyem/data/scratchspace/copyseg-configs/labelmaps/hemibrain/8nm/compute-8nm-extended-fixed-STATS-ONLY-20180402.192015'
root_sv_sizes = f'{root_sv_sizes_dir}/supervoxel-sizes.h5'

# Classification file (from masking model)
sv_classifications = '/nrs/flyem/bergs/sv-classifications.h5'

# Optional: Manually listed "bad bodies" that should be avoided (massive glia, etc.)
marked_bad_bodies = '/nrs/flyem/bergs/complete-ffn-agglo/bad-bodies-2018-10-01.csv'

# Parameters -- which bodies should be included in the results?
# NOTE: A body is included if it satisfies ANY of these
#      (doesn't need to satisfy ALL of them)
min_tbars = 2
min_psds = 10
min_body_size = int(10e6)
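
The "ANY" rule noted in the comments above amounts to a logical OR over per-body counts. Here is a minimal sketch on a toy table. It assumes the column names `PreSyn` (tbars), `PostSyn` (psds), and `voxel_count` that appear in the result table, and it assumes an inclusive comparison (`>=`); the exact comparison used inside `compute_focused_bodies()` is not shown in this notebook. All values are made up.

```python
import pandas as pd

# Thresholds copied from the parameters cell.
min_tbars = 2
min_psds = 10
min_body_size = int(10e6)  # 10 million voxels

# Toy per-body table (made-up values, for illustration only).
toy = pd.DataFrame(
    {
        'PreSyn': [0, 3, 0, 1],
        'PostSyn': [0, 0, 12, 9],
        'voxel_count': [20_000_000, 500, 500, 500],
    },
    index=pd.Index([101, 102, 103, 104], name='body'),
)

# A body qualifies if it meets ANY one of the three thresholds.
# ASSUMPTION: thresholds are inclusive (>=).
keep = (
    (toy['PreSyn'] >= min_tbars)
    | (toy['PostSyn'] >= min_psds)
    | (toy['voxel_count'] >= min_body_size)
)
selected = toy[keep]
print(selected.index.tolist())
```

Body 101 qualifies on size alone, 102 on tbars alone, and 103 on psds alone; body 104 misses all three thresholds and is dropped.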

#### Compute! (Takes ~20 minutes and needs a LOT of RAM)

In [6]:
# This function does the following:
#
# 1. Apply synapse-based criteria
#   a. Load synapse CSV file
#   b. Map synapse SVs -> bodies (if needed)
#   c. Calculate synapses (tbars, psds) per body
#   d. Initialize set with bodies that have enough synapses

# 2. Apply size-based criteria
#   a. Calculate body sizes (based on supervoxel sizes and current mapping)
#   b. Add "big" bodies to the set

# 3. Apply "bad body" criteria
#   a. Read the list of "bad bodies"
#   b. Remove bad bodies from the set

focused_table = compute_focused_bodies( *master_seg,
                                        synapse_samples,
                                        min_tbars,
                                        min_psds,
                                        root_sv_sizes,
                                        min_body_size,
                                        sv_classifications,
                                        marked_bad_bodies,
                                        return_table=True )

Reading kafka messages from ['kafka.int.janelia.org:9092', 'kafka2.int.janelia.org:9092', 'kafka3.int.janelia.org:9092'] for emdata3:8900 / 38c6 / segmentation
Reading 1048499 kafka messages took 38.09354639053345 seconds
Fetching http://emdata3:8900/api/node/38c6/segmentation/mappings...
Fetching http://emdata3:8900/api/node/38c6/segmentation/mappings took 0:00:34.249898
Parsing mapping...
Parsing mapping took 0:00:06.726146
Constructing missing identity-mappings...
Constructing missing identity-mappings took 0:00:19.072801
Filtering for synapses...
*** Synapse table includes body 0 and was therefore probably generated from out-of-date data. ***
Filtering for synapses took 0:00:48.365104
Found 454774 with sufficient synapses
Filtering for body size...
Volume contains 188243164 supervoxels and 22.5 Teravoxels in total
Reading kafka messages from ['kafka.int.janelia.org:9092', 'kafka2.int.janelia.org:9092', 'kafka3.int.janelia.org:9092'] for emdata3:8900 / 38c6 / segmentation
Reading 10
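
The three steps listed in the comments of the cell above can be sketched with plain pandas on toy data. This is an illustration of the described procedure, not the library's implementation: the column names (`sv`, `kind`), the tiny thresholds, and all values are hypothetical, and the supervoxel classification input is left out.

```python
import pandas as pd

# Toy inputs (all values are made up).
# One row per synapse, labeled with the supervoxel it falls in.
synapses = pd.DataFrame({
    'sv':   [1, 1, 2, 3, 3, 4],
    'kind': ['PreSyn', 'PostSyn', 'PreSyn', 'PostSyn', 'PostSyn', 'PreSyn'],
})
mapping = pd.Series({1: 10, 2: 10, 3: 20, 4: 30, 5: 40})  # supervoxel -> body
sv_sizes = pd.Series({1: 5, 2: 5, 3: 2, 4: 1, 5: 100})    # voxels per supervoxel
bad_bodies = {20}

# Tiny thresholds so the toy data exercises every step.
min_tbars, min_psds, min_body_size = 2, 2, 50

# 1. Synapse-based criteria: map SVs -> bodies, count per body, threshold.
synapses['body'] = synapses['sv'].map(mapping)
counts = pd.crosstab(synapses['body'], synapses['kind'])
counts = counts.reindex(columns=['PreSyn', 'PostSyn'], fill_value=0)
enough = (counts['PreSyn'] >= min_tbars) | (counts['PostSyn'] >= min_psds)
focused = set(counts[enough].index)

# 2. Size-based criteria: sum supervoxel sizes per body, add "big" bodies.
sv_table = pd.DataFrame({'body': mapping, 'size': sv_sizes})
body_sizes = sv_table.groupby('body')['size'].sum()
focused |= set(body_sizes[body_sizes >= min_body_size].index)

# 3. "Bad body" criteria: remove explicitly listed bodies.
focused -= bad_bodies

# Assemble a result table with sizes and synapse counts.
table = (
    body_sizes.to_frame('voxel_count')
    .join(counts)
    .fillna(0)
    .astype(int)
    .loc[sorted(focused)]
)
print(table)
```

In this toy run, body 10 enters through its tbar count, body 40 through its size, and body 20 is admitted on psds but then removed because it is listed as bad.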

In [7]:
focused_table.head()

body,voxel_count,sv_count,PostSyn,PreSyn
1497973422,12393316998,8977,8485,67
5813070019,10958567746,8062,7408,100
262840563,10492068511,6347,764,647
263199096,10129123402,2867,72,50
947573616,9338261326,8228,19239,4512


#### Write to disk

In [10]:
output_name = f'focused-{uuid}-{min_tbars}tbars-{min_psds}psds-{min_body_size / 1e6:.1f}Mv'

# As npy:
np.save(f'{output_name}.npy', focused_table.to_records(index=True))

# OR as CSV:
#focused_table.to_csv(f'{output_name}.csv', index=True, header=True)
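
To read the `.npy` file back into a DataFrame later, one option is `np.load()` followed by `pd.DataFrame.from_records()`. The sketch below round-trips a small made-up table through a temporary directory. The toy values and the use of `tempfile` are assumptions for illustration; the column names follow the table shown above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same naming scheme as the cell above.
uuid, min_tbars, min_psds, min_body_size = '38c6', 2, 10, int(10e6)
output_name = f'focused-{uuid}-{min_tbars}tbars-{min_psds}psds-{min_body_size / 1e6:.1f}Mv'

# Toy stand-in for focused_table (made-up values).
toy_table = pd.DataFrame(
    {'voxel_count': [100, 50], 'sv_count': [3, 1], 'PostSyn': [7, 0], 'PreSyn': [2, 1]},
    index=pd.Index([11, 22], name='body'),
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, f'{output_name}.npy')

    # to_records(index=True) stores the 'body' index as an ordinary field.
    np.save(path, toy_table.to_records(index=True))

    # np.load returns a structured array; restore 'body' as the index.
    records = np.load(path)
    restored = pd.DataFrame.from_records(records, index='body')

print(output_name)
print(restored)
```

Because the index is saved as a regular field of the record array, it has to be restored explicitly with `index='body'` when loading.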

#### See docs for more details:

In [11]:
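# Use the plain-text renderer so that bold names are not printed with overstrike characters.
print(pydoc.render_doc(compute_focused_bodies, renderer=pydoc.plaintext))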

Python Library Documentation: function compute_focused_bodies in module neuclease.focused.ingest

ccoommppuuttee__ffooccuusseedd__bbooddiieess(server, uuid, instance, synapse_samples, min_tbars, min_psds, root_sv_sizes, min_body_size, sv_classifications=None, marked_bad_bodies=None, return_table=False)
    Compute the complete set of focused bodies, based on criteria for
    number of tbars, psds, or overall size, and excluding explicitly
    listed bad bodies.
    
    This function takes ~20 minutes to run on hemibrain inputs, with a ton of RAM.
    
    The procedure is:
    
    1. Apply synapse-based criteria
      a. Load synapse CSV file
      b. Map synapse SVs -> bodies (if needed)
      c. Calculate synapses (tbars, psds) per body
      d. Initialize set with bodies that have enough synapses
    
    2. Apply size-based criteria
      a. Calculate body sizes (based on supervoxel sizes and current mapping)
      b. Add "big" bodies to the set
    
    3. 