in-memory map should include "identity" entries #323

stuarteberg · 2019-07-10T20:27:44Z

Not ALL identities, just identities for supervoxels in bodies that contain more than one supervoxel.

See also: #297

stuarteberg · 2019-07-10T20:31:59Z

Here's a motivating use-case:

Given a list of, say, 300,000 bodies, what supervoxels do they contain? If the in-memory map contains "identity" entries for all bodies that contain two or more supervoxels, then the /mappings endpoint contains everything I need to find the answer.
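As a minimal sketch of this use-case, assume the mapping has already been loaded as a plain dict from supervoxel ID to body ID. The helper name `supervoxels_by_body` and the toy IDs below are made up for illustration and are not part of DVID or neuclease. With identity entries present, a single pass over the mapping answers the question:

```python
from collections import defaultdict

def supervoxels_by_body(mapping, bodies):
    """Group supervoxel IDs by body, for the requested bodies only.

    `mapping` is {supervoxel_id: body_id}. The answer is only complete
    if the mapping includes identity entries (sv == body) for every
    body that contains two or more supervoxels.
    """
    wanted = set(bodies)
    result = defaultdict(set)
    for sv, body in mapping.items():
        if body in wanted:
            result[body].add(sv)
    return dict(result)

# Toy data (hypothetical): body 10 contains supervoxels 10, 11 and 12.
with_identity = {10: 10, 11: 10, 12: 10, 20: 20, 21: 20}
without_identity = {11: 10, 12: 10, 21: 20}

complete = supervoxels_by_body(with_identity, [10])
partial = supervoxels_by_body(without_identity, [10])
```

Without the identity row, supervoxel 10 is silently absent from `partial`, and nothing in the mapping itself says whether that supervoxel still exists.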

Without identity entries, I must resort to one of the following options:

1. Read all 300k labelindexes for those bodies.
2. Scan all blocks of every index to find the set of supervoxels referenced in them.

OR

1. Start by assuming that all bodies are missing an identity row, and add it to the mapping.
2. Read the entire kafka log.
3. Remove any of the newly added rows that reference a "retired" supervoxel ID (due to a split).
4. Also determine which of the body IDs were introduced via a cleave operation, in which case there is no corresponding supervoxel ID. Remove those rows, too.

Right now, I'm using the second method, implemented here. But it's slow, complicated, and it depends on kafka.
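The second method can be sketched roughly as follows. This is not the linked implementation; it is a simplified illustration that assumes the kafka log has already been read and reduced to two sets, `retired_svs` and `cleave_bodies`. Those names, the function name, and the toy IDs are all hypothetical.

```python
def add_identity_rows(mapping, bodies, retired_svs, cleave_bodies):
    """Return a copy of `mapping` with identity rows added where valid.

    mapping:       {supervoxel_id: body_id}, served without identity rows
    bodies:        body IDs of interest
    retired_svs:   supervoxel IDs retired by splits (from the mutation log)
    cleave_bodies: body IDs created by cleaves, which have no supervoxel
                   of the same ID
    """
    completed = dict(mapping)
    for body in bodies:
        # Assume the identity row is missing, unless the ID cannot refer
        # to a live supervoxel (retired by a split, or minted by a cleave).
        if body in retired_svs or body in cleave_bodies:
            continue
        # Never overwrite an existing row for that supervoxel ID.
        completed.setdefault(body, body)
    return completed

# Toy data (hypothetical): 30 was retired by a split, 40 came from a cleave.
mapping = {11: 10, 12: 10, 31: 30, 41: 40}
completed = add_identity_rows(
    mapping, bodies=[10, 30, 40], retired_svs={30}, cleave_bodies={40}
)
```

Only body 10 gains an identity row here. The expensive part in practice is producing the two sets, since that requires reading the whole mutation log.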

stuarteberg · 2019-07-10T20:46:05Z

FWIW, the current in-memory mapping for the hemibrain contains 34M rows. If you include rows for identities (as requested here) and retired supervoxels (as specified in #297), then the mapping expands to 43M rows. (So, adding the extra rows I want incurs roughly a 26% overhead, assuming hemibrain-like segmentation characteristics.)

In [0]: from neuclease.dvid import *

In [1]: master_seg
Out[1]: ('emdata4:8900', '0b0b540131954edaac208c93b0355629', 'segmentation')

In [2]: master_seg = ('emdata4:8900', '0b0b540131954edaac208c93b0355629', 'segmentation')

In [3]: mapping = fetch_mappings(*master_seg)
[2019-07-10 16:34:39,521] INFO Fetching http://emdata4:8900/api/node/0b0b540131954edaac208c93b0355629/segmentation/mappings?format=binary...
[2019-07-10 16:35:42,564] INFO Fetching http://emdata4:8900/api/node/0b0b540131954edaac208c93b0355629/segmentation/mappings?format=binary took 0:01:03.043196

In [4]: complete_mapping = fetch_complete_mappings(*master_seg)
[2019-07-10 16:36:11,569] INFO Reading kafka messages for flatteneddvidrepo-28841c8277e044a7b187dda03e18da13-data-026ee697756443529a314ae15e7c6364 from ['kafka.int.janelia.org:9092', 'kafka2.int.janelia.org:9092', 'kafka3.int.janelia.org:9092']
[2019-07-10 16:37:05,144] INFO Reading 1054934 kafka messages took 53.574223041534424 seconds
[2019-07-10 16:37:12,512] INFO Fetching http://emdata4:8900/api/node/0b0b540131954edaac208c93b0355629/segmentation/mappings?format=binary...
[2019-07-10 16:38:13,794] INFO Fetching http://emdata4:8900/api/node/0b0b540131954edaac208c93b0355629/segmentation/mappings?format=binary took 0:01:01.282041
[2019-07-10 16:38:13,794] INFO Constructing missing identity-mappings...
[2019-07-10 16:38:48,129] INFO Constructing missing identity-mappings took 0:00:34.334269

In [5]: mapping.shape
Out[5]: (33728671,)

In [6]: complete_mapping.shape
Out[6]: (42628597,)

In [7]: pd.unique(mapping.values).shape
Out[7]: (8973324,)
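For reference, the overhead can be computed directly from the two row counts shown in the session above. This is plain arithmetic on the reported shapes, with variable names chosen only for illustration:

```python
base_rows = 33_728_671       # mapping.shape
complete_rows = 42_628_597   # complete_mapping.shape

extra_rows = complete_rows - base_rows
overhead = extra_rows / base_rows

print(f"{extra_rows:,} extra rows, {overhead:.1%} overhead")
```

That works out to about 8.9M extra rows, or a little over a quarter more than the current mapping.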

stuarteberg · 2020-05-20T01:59:29Z

Perhaps a more succinct way of stating what the in-memory map should hold is the following:

It should contain an entry for every supervoxel in the volume EXCEPT for supervoxels which:

are the only supervoxel in their body
AND
have the same ID as their body
AND
have never been involved in any mutation
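Stated as a predicate, the rule might look like the sketch below. The function and argument names are hypothetical, and it assumes the caller already knows each body's supervoxel count and whether the supervoxel has ever been touched by a mutation:

```python
def needs_map_entry(sv, body, body_sv_count, ever_mutated):
    """True if the in-memory map should hold an entry for supervoxel `sv`."""
    is_untouched_singleton = (
        body_sv_count == 1      # the only supervoxel in its body
        and sv == body          # same ID as its body
        and not ever_mutated    # never involved in any mutation
    )
    return not is_untouched_singleton
```

For example, `needs_map_entry(5, 5, 1, False)` is false (an untouched singleton needs no row), while `needs_map_entry(5, 5, 3, False)` is true, which is exactly the "identity" case this issue asks for.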

DocSavage · 2022-06-03T03:05:18Z

Obviated by resolution of issue #361

stuarteberg added the enhancement label Jul 10, 2019

stuarteberg mentioned this issue Jan 20, 2020

Denormalizations for supervoxels #343

Open

stuarteberg added this to the VNC-start milestone May 20, 2020

stuarteberg mentioned this issue Feb 2, 2022

Mapping updates in response to certain mutations #361

Closed

DocSavage added the urgent label Feb 2, 2022

DocSavage self-assigned this Apr 20, 2022

DocSavage removed this from the VNC-start milestone May 7, 2022

DocSavage closed this as completed Jun 3, 2022
