in-memory map should include "identity" entries #323

Closed
stuarteberg opened this issue Jul 10, 2019 · 4 comments

@stuarteberg
Member

stuarteberg commented Jul 10, 2019

Not ALL identities, just identities for supervoxels in bodies that contain more than one supervoxel.

See also: #297

@stuarteberg
Member Author

stuarteberg commented Jul 10, 2019

Here's a motivating use-case:

Given a list of, say, 300,000 bodies, what supervoxels do they contain? If the in-memory map contains "identity" entries for all bodies that contain two or more supervoxels, then the /mappings endpoint contains everything I need to find the answer.
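A minimal sketch of that lookup, assuming the mapping has already been loaded as a pandas Series indexed by supervoxel ID with body IDs as values. The helper name `supervoxels_for_bodies` and the toy IDs are hypothetical, and the fallback for bodies that are absent from the mapping assumes they really exist as untouched single-supervoxel bodies.

```python
import pandas as pd

def supervoxels_for_bodies(mapping: pd.Series, bodies) -> dict:
    """Return {body: [supervoxels]} for the requested bodies.

    `mapping` is indexed by supervoxel ID and holds body IDs as values.
    It is assumed to contain identity rows for every multi-supervoxel body.
    """
    mapping = mapping.rename("body").rename_axis("sv")
    bodies = pd.Index(bodies).unique()

    # Rows whose body is one of the requested bodies
    selected = mapping[mapping.isin(bodies)]
    result = selected.reset_index().groupby("body")["sv"].apply(list).to_dict()

    # Assumption: a requested body with no rows at all is a
    # single-supervoxel body whose supervoxel shares its ID.
    for body in bodies.difference(pd.Index(selected.unique())):
        result[body] = [body]
    return result

# Toy mapping that already includes the identity row 10 -> 10
toy_mapping = pd.Series({10: 10, 11: 10, 12: 10, 31: 30})
result = supervoxels_for_bodies(toy_mapping, [10, 30, 99])
print(result)
```

With identity rows present, a single filter over the mapping is enough; no labelindex reads or kafka access are involved.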

Without identity entries, I must resort to one of the following options:

  1. Read all 300k labelindexes for those bodies
  2. Scan all blocks of every index to find the set of supervoxels referenced in them.

OR

  1. Start by assuming that all bodies are missing an identity row, and add one for each body to the mapping.
  2. Read the entire kafka log.
  3. Remove any of the newly added rows that reference a "retired" supervoxel ID (due to a split).
  4. Also determine which of the body IDs were introduced via a cleave operation, in which case there is no corresponding supervoxel ID. Remove those rows, too.
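The second workaround can be sketched roughly as follows. This is not the linked implementation; it assumes the kafka log has already been read and reduced to two collections, here called `retired_svs` and `cleave_bodies` (both hypothetical names), and that the mapping is a pandas Series indexed by supervoxel ID with body IDs as values.

```python
import numpy as np
import pandas as pd

def add_identity_rows(mapping: pd.Series, retired_svs, cleave_bodies) -> pd.Series:
    """Append identity rows (sv == body) that the in-memory map omits.

    retired_svs:   supervoxel IDs retired by splits (assumed to come from the kafka log)
    cleave_bodies: body IDs created by cleaves, which have no matching supervoxel
    """
    bodies = pd.unique(mapping.values)

    # Step 1: assume every body lacks an identity row
    candidates = np.setdiff1d(bodies, mapping.index.values)

    # Step 3: drop candidates that refer to retired supervoxel IDs
    retired = np.asarray(list(retired_svs), dtype=bodies.dtype)
    candidates = np.setdiff1d(candidates, retired)

    # Step 4: drop body IDs introduced by a cleave
    cleaved = np.asarray(list(cleave_bodies), dtype=bodies.dtype)
    candidates = np.setdiff1d(candidates, cleaved)

    identity = pd.Series(candidates, index=candidates, name=mapping.name)
    return pd.concat([mapping, identity]).rename_axis(mapping.index.name)

# Toy data: body 10 needs an identity row, supervoxel 20 was split,
# and body 40 was created by a cleave.
toy_mapping = pd.Series({11: 10, 12: 10, 21: 20, 22: 20, 41: 40, 42: 40}, name="body")
toy_mapping.index.name = "sv"
complete = add_identity_rows(toy_mapping, retired_svs=[20], cleave_bodies=[40])
```

Step 2 (reading the whole log) is the expensive part and is deliberately left outside the sketch.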

Right now, I'm using the second method, implemented here. But it's slow, complicated, and it depends on kafka.

@stuarteberg
Member Author

stuarteberg commented Jul 10, 2019

FWIW, the current in-memory mapping for the hemibrain contains 34M rows. If you include rows for identities (as requested here) and retired supervoxels (as specified in #297), then the mapping expands to 43M rows. (So, adding the extra rows I want incurs an overhead of roughly 26%, assuming hemibrain-like segmentation characteristics.)

In [0]: from neuclease.dvid import *

In [1]: master_seg
Out[1]: ('emdata4:8900', '0b0b540131954edaac208c93b0355629', 'segmentation')

In [2]: master_seg = ('emdata4:8900', '0b0b540131954edaac208c93b0355629', 'segmentation')

In [3]: mapping = fetch_mappings(*master_seg)
[2019-07-10 16:34:39,521] INFO Fetching http://emdata4:8900/api/node/0b0b540131954edaac208c93b0355629/segmentation/mappings?format=binary...
[2019-07-10 16:35:42,564] INFO Fetching http://emdata4:8900/api/node/0b0b540131954edaac208c93b0355629/segmentation/mappings?format=binary took 0:01:03.043196

In [4]: complete_mapping = fetch_complete_mappings(*master_seg)
[2019-07-10 16:36:11,569] INFO Reading kafka messages for flatteneddvidrepo-28841c8277e044a7b187dda03e18da13-data-026ee697756443529a314ae15e7c6364 from ['kafka.int.janelia.org:9092', 'kafka2.int.janelia.org:9092', 'kafka3.int.janelia.org:9092']
[2019-07-10 16:37:05,144] INFO Reading 1054934 kafka messages took 53.574223041534424 seconds
[2019-07-10 16:37:12,512] INFO Fetching http://emdata4:8900/api/node/0b0b540131954edaac208c93b0355629/segmentation/mappings?format=binary...
[2019-07-10 16:38:13,794] INFO Fetching http://emdata4:8900/api/node/0b0b540131954edaac208c93b0355629/segmentation/mappings?format=binary took 0:01:01.282041
[2019-07-10 16:38:13,794] INFO Constructing missing identity-mappings...
[2019-07-10 16:38:48,129] INFO Constructing missing identity-mappings took 0:00:34.334269

In [5]: mapping.shape
Out[5]: (33728671,)

In [6]: complete_mapping.shape
Out[6]: (42628597,)

In [7]: pd.unique(mapping.values).shape
Out[7]: (8973324,)
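As a quick check of the overhead estimate, the two row counts printed in the session can be compared directly. The variable names here are only illustrative.

```python
# Row counts taken from mapping.shape and complete_mapping.shape in the session
base_rows = 33_728_671
complete_rows = 42_628_597

extra_rows = complete_rows - base_rows
overhead = extra_rows / base_rows

print(f"{extra_rows:,} extra rows ({overhead:.1%} overhead)")
```

The extra rows come to a little over a quarter of the current mapping size.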

@stuarteberg
Member Author

stuarteberg commented May 20, 2020

Perhaps a more succinct way of stating what the in-memory map should hold is the following:

It should contain an entry for every supervoxel in the volume EXCEPT for supervoxels which:

  • are the only supervoxel in their body
    AND
  • have the same ID as their body
    AND
  • have never been involved in any mutation
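Stated as a predicate, the rule might look like the sketch below. The function name and its arguments are hypothetical; in particular, it assumes the caller already knows the size of the body and whether the supervoxel has ever been part of a mutation.

```python
def needs_map_entry(sv: int, body: int, body_size: int, ever_mutated: bool) -> bool:
    """Return True if the in-memory map should hold a row for this supervoxel.

    A supervoxel is omitted only when all three conditions hold:
    it is alone in its body, it shares the body's ID, and it has
    never been involved in any mutation.
    """
    is_trivial = body_size == 1 and sv == body and not ever_mutated
    return not is_trivial

# An untouched singleton needs no row
untouched = needs_map_entry(sv=5, body=5, body_size=1, ever_mutated=False)
# The identity row of a multi-supervoxel body is kept
identity = needs_map_entry(sv=10, body=10, body_size=3, ever_mutated=True)
# A singleton whose ID differs from its body (e.g. after a cleave) is kept
cleaved = needs_map_entry(sv=41, body=40, body_size=1, ever_mutated=True)
```

Because the three conditions are joined by AND, failing any one of them is enough to require an entry.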

@stuarteberg stuarteberg added this to the VNC-start milestone May 20, 2020
@DocSavage DocSavage self-assigned this Apr 20, 2022
@DocSavage DocSavage removed this from the VNC-start milestone May 7, 2022
@DocSavage
Member

Obviated by resolution of issue #361
