
Support for neuroglancer multi-resolution sharded meshes #340

Merged
25 commits merged into seung-lab:master on Jun 8, 2020

Conversation

perlman
Contributor

@perlman commented May 7, 2020

This is an initial effort to read neuroglancer's multi-resolution sharded mesh format.

**TODO**

  • Request specific levels of detail
    • ... and only request the relevant fragments over http
  • Convert from "stored model" space coordinates into correct model coordinates
  • Convert from "model space" into physical space
  • Concatenate meshes into a single mesh
  • Match function of other cloud-volume mesh functions
  • [~] Check for missing manifests
    • (Via bulk prefetching of all manifests?)
    • Added an exists method
  • General code cleanup...

**Other issues (that probably won't be addressed)**

  • read-only
    • Eventual write support would be done via unsharded storage & a compaction stage
  • Only scale parameters in the homogeneous transform are supported
  • Support for unsharded storage of multi-resolution meshes (much of the code would be similar, e.g. the mesh fragments are the same)
  • Fuse meshes? (not sure how important this is)

Sample code

import cloudvolume

test_ids = [710435991, 8392766771, 8393399736]
vol = cloudvolume.CloudVolume('https://storage.googleapis.com/fafb-ffn1-20190805/segmentation', mip=0, cache=False)

for lod in reversed(range(4)):
    # Returns { segid: Mesh }
    meshes = vol.mesh.get(test_ids, lod=lod, concat=True)
    for segment in meshes:
        mesh = meshes[segment]
        mesh.viewer()

@william-silversmith
Contributor

This is great Eric! Looking forward to seeing where this goes. :)

@william-silversmith
Contributor

Linked to issue #267

@perlman
Contributor Author

perlman commented May 9, 2020

This code can now read the sharded meshes from both FAFB-FFN1 & FlyEM Hemibrain.

@william-silversmith: I added a quick hack to get the manifest location back with its contents, and am currently creating a new SimpleStorage for retrieving the meshes. How would you suggest reworking this to be a bit cleaner?

@william-silversmith: What should the mesh.get() function return for lod meshes? (The current return value is a list of fragments, one per lod.)

Much thanks to @jbms for adding clarification on a few key points in the spec.

@william-silversmith
Contributor

[I] am currently creating a new SimpleStorage for retrieving the meshes. How would you suggest reworking this to be a bit cleaner?

I think this is fine for now. The multi-LOD spec departs from the ordinary sharded format, so those departures would have to be stored somewhere... Perhaps a subclass of ShardReader would be slightly cleaner, but the performance cost of creating a new SimpleStorage is very low since the connection pool reuses connections. A ShardReader subclass would be able to avoid returning the offset from get_data, but it would also have to duplicate some code to maintain encapsulation.

The major reworking I'd be looking forward to would be threaded retrieval. I've been working on making that possible for ShardReader.exists, and the techniques can be ported to ShardReader.get_data. Once that happens, we can dramatically increase the performance here.

What should the mesh.get() function return for lod meshes? (The current return value is a list of fragments, one per lod.)

My feeling is that retrieving all the LODs all the time is kind of a waste of the benefits of the format. What about adding a get(segids, lod=scalar or list) flag? Similarly, for fusing, you can fuse the meshes retrieved within a given LOD. If you wanted to get a bit more sophisticated, perhaps we could have a way of retrieving meshes with respect to a point such that it cobbles together a cohesive view of the final mesh based on distance to that point similarly to neuroglancer.

Thanks for your excellent work! I like your code too Eric. <3 :) I'll give it a try a bit later today.

@perlman
Contributor Author

perlman commented May 9, 2020

A ShardReader subclass would be able to avoid returning the offset from get_data, but it would also have to duplicate some code to maintain encapsulation.

Makes sense. Maybe we can put half of that work into a private method in ShardReader, which can then be called from the subclass? (The index lookup is an example of code I wouldn't want replicated.)

My feeling is that retrieving all the LODs all the time is kind of a waste of the benefits of the format. What about adding a get(segids, lod=scalar or list) flag?

Done (and each LOD is downloaded as its own range request). The default is still all meshes defined in the mesh manifest.

Similarly, for fusing, you can fuse the meshes retrieved within a given LOD.

Next step. :)

If you wanted to get a bit more sophisticated, perhaps we could have a way of retrieving meshes with respect to a point such that it cobbles together a cohesive view of the final mesh based on distance to that point similarly to neuroglancer.

I'll leave this to someone else. ;-) I'd be content with a function to do mesh selection by resolution (e.g., I want the best mesh for 32 x 32 x 40).
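Something along those lines is all I'm picturing, e.g. (just a sketch; lod_resolutions_nm is a hypothetical input, not anything that exists in cloud-volume):

def pick_lod(lod_resolutions_nm, requested_nm):
    # Hypothetical helper: given per-LOD effective resolutions (index 0 = finest),
    # return the coarsest LOD that is still at least as fine as the requested
    # resolution, falling back to the finest LOD if nothing qualifies.
    candidates = [i for i, res in enumerate(lod_resolutions_nm) if res <= requested_nm]
    return max(candidates) if candidates else 0

# e.g. pick_lod([8, 16, 32, 64], 32) == 2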

@william-silversmith
Contributor

Maybe we can put half of that work into a private method in ShardReader, which can then be called from the subclass?

Sure! If you're up for it, give it a shot. If it gets awkward though, don't worry too much about the duplication of logic. I like to follow the "rule of three" where I'll sometimes let some code be duplicated until the third use so the commonalities become more clear. Just do whatever looks most elegant to you, which might include keeping the offset parameter.

If you wanted to get a bit more sophisticated, perhaps we could have a way of retrieving meshes with respect to a point such that it cobbles together a cohesive view of the final mesh based on distance to that point similarly to neuroglancer.

I'll leave this to someone else. ;-) I'd be content with a function to do mesh selection by resolution (e.g., I want the best mesh for 32 x 32 z 40).

That's fine, I was just spitballing. It's possible that feature is really only useful for visualization.

Thanks again for all the work you put into this Eric!

@william-silversmith
Contributor

Good news bad news time! Good news: I parallelized get_data so now it should be possible to dramatically speed this function up. Bad news: I suppose we need to re-integrate the return_offset flag.

@perlman changed the title from "WIP: Support for neuroglancer multi-resolution sharded meshes" to "Support for neuroglancer multi-resolution sharded meshes" on Jun 4, 2020
@william-silversmith
Contributor

This is looking pretty good! We can always brush up the performance aspect later. Would you mind writing some test(s) for it? If you point me to a particular shard file, I can upload it to gs://seunglab-test/ so it can be forever tested.

It doesn't need to be anything too crazy, an integration test is sufficient.

@perlman
Contributor Author

perlman commented Jun 6, 2020

This is looking pretty good! We can always brush up the performance aspect later. Would you mind writing some test(s) for it? If you point me to a particular shard file, I can upload it to gs://seunglab-test/ so it can be forever tested.

It doesn't need to be anything too crazy, an integration test is sufficient.

This sounds like a good idea. I don't have the ability to generate any data -- would it make sense just to pull one of the shard files from one of the public neuroglancer datasets (e.g. the hemibrain)?

@william-silversmith
Contributor

william-silversmith commented Jun 6, 2020 via email

@perlman
Contributor Author

perlman commented Jun 7, 2020

This now includes a minimal test case, with a reference to the files accessed. When I have a chance I can try to pick some good & bad IDs that all hash to the same shard file.

@william-silversmith
Contributor

This is hella cool. Thank you so much for your work!!

@william-silversmith merged commit dc2b8f4 into seung-lab:master on Jun 8, 2020
@jefferis

jefferis commented Jun 8, 2020

Hurrah! Thank you both very much (especially @perlman in this case!).

@william-silversmith
Contributor

I'm going to make a new release sometime today. :)

@perlman
Contributor Author

perlman commented Jun 8, 2020

I am sure there will be follow-up PRs to fix bugs & add other useful interfaces (like a bbox() defined by the octree regions). I'll see what pops up when @jefferis and others use it. ;)

@jefferis

@perlman @william-silversmith: a couple of queries for one or both of you after using this in earnest a bit.

  1. In contrast (I think) to some other mesh formats, the get method cannot accept strings (only ints). Would it be hard to change this? I often find 64-bit int support fragile across programming languages (e.g. base R only has int32, so R-to-Python calls don't support int64).
  2. Again, some of the other mesh sources I have used (FlyWire) have a save method. Here there is none, possibly because of the issue of multi-LOD meshes. Is it worth adding? Alternatively, is there a general pattern for saving (e.g. in obj) that is guaranteed to work across mesh formats?

@fcollman
Collaborator

I'll add my two cents on point 2. I think once you have vertices and faces there are a lot of mesh libraries out there for saving meshes. I think of cloud-volume as the interface for getting/saving highly distributed files for cloud-based connectomics. If you want to save versions of meshes as simple files, I think there are existing packages that cover a large range of file formats. MeshParty, for example, uses trimesh, which enables it to save in obj (or ply, or many others).

So in that case MeshParty calls cloud-volume to get a mesh and uses it to instantiate a class derived from a trimesh base class. We can then use that to save meshes in a variety of formats, though as we develop more analytic layers on top of the mesh we have created an HDF-based format to store those extra analytics. One immediately relevant example is a place to store the extra edges that link disconnected portions of proofread meshes. Others would include skeleton representations and the mappings between the skeleton and the mesh.
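As a minimal sketch of that pattern (assuming trimesh is installed; the segment ID is just the illustrative one from the sample code at the top of this PR, and the dict-style return discussed in this thread):

import cloudvolume
import trimesh

vol = cloudvolume.CloudVolume('https://storage.googleapis.com/fafb-ffn1-20190805/segmentation')
segid = 710435991  # illustrative ID borrowed from the sample code above
cv_mesh = vol.mesh.get(segid)[segid]

# Hand the vertices/faces over to trimesh, which knows how to write obj/ply/stl/etc.
tmesh = trimesh.Trimesh(vertices=cv_mesh.vertices, faces=cv_mesh.faces)
tmesh.export('{}.obj'.format(segid))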

@jefferis

jefferis commented Jun 13, 2020

I'll add my two cents on point 2. I think once you have vertices and faces there are a lot of mesh libraries out there for saving meshes.

Thanks, Forrest. That seems absolutely reasonable.

It's just that I wrote some wrapper code to fetch/save meshes from FlyWire and I was imagining that it would just work when I switched to hemibrain as a source, but it doesn't because the interface that cloudvolume presents seems to vary according to the type of mesh source – and to me that was a bit surprising. So it would be good to know if there is a recommended pattern for fetch+save.

PS MeshParty is v nice, but I've found it a bit awkward to install, so for people in the group who just want to fetch and save, I'd rather have it as an optional dependency.

@fcollman
Collaborator

Agreed about the install process; we have work to do there. Mostly I think by making more dependencies optional and writing better cross-platform instructions.

Regarding interfaces, I can see how the LOD dictionary is both necessary and confusing. I would agree with you that a more uniform-looking interface is a better design ideal: for example, making the highest LOD the default and only returning that, while allowing people to specify a LOD or specify that they actually want the full dictionary.

@william-silversmith
Contributor

william-silversmith commented Jun 13, 2020

@jefferis It looks like this was an oversight. I'll make sure to add save back in there. The implementation of save is very simple and worth understanding:

def save(self, segids, filepath=None, file_format='ply'):
    segids = toiter(segids)

    mesh = self.get(segids, fuse=True, remove_duplicate_vertices=True)

    if file_format == 'obj':
      data = mesh.to_obj()
    elif file_format == 'ply':
      data = mesh.to_ply()
    elif file_format == 'precomputed':
      data = mesh.to_precomputed()
    else:
      raise NotImplementedError('Only .obj, .ply, and precomputed are currently supported.')

    if not filepath:
      filepath = str(segids[0]) + "." + file_format
      if len(segids) > 1:
        filepath = "{}_{}.{}".format(segids[0], segids[-1], file_format)

    try:
      filepath.write(data)
    except AttributeError:
      with open(filepath, 'wb') as f:
        f.write(data)

As you can see, each Mesh object comes equipped with a way to convert it into an obj or ply byte string. You can then do what you will with it. I've often thought that the save function is somewhat superfluous and too tightly coupled with get, but it is convenient for someone without a deep understanding of CloudVolume.
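So even without a save method wired up, a one-off along these lines works (a sketch; it reuses the dataset and illustrative ID from the sample code at the top of this thread):

import cloudvolume

vol = cloudvolume.CloudVolume('https://storage.googleapis.com/fafb-ffn1-20190805/segmentation')
segid = 710435991
mesh = vol.mesh.get(segid)[segid]

# to_ply()/to_obj() return byte strings, so writing a file is a one-liner.
with open('{}.ply'.format(segid), 'wb') as f:
    f.write(mesh.to_ply())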

@perlman
Contributor Author

perlman commented Jun 13, 2020

In contrast (I think) to some other mesh formats, the get method cannot accept strings (only ints). Would it be hard to change this? I often find 64 bit int support fragile across programming languages (e.g. base R only has int32, so r to python calls doesn't support int64).

I'll let @william-silversmith chime in. uint64s are used in many functions, all of which would need a hook added to convert. Would it make sense to keep a small wrapper, in Python, alongside your R code for doing the string-to-uint64 conversion?

It's just that I wrote some wrapper code to fetch/save meshes from FlyWire and I was imagining that it would just work when I switched to hemibrain as a source, but it doesn't because the interface that cloudvolume presents seems to vary according to the type of mesh source – and to me that was a bit surprising. So it would be good to know if there is a recommended pattern for fetch+save.

https://github.com/seung-lab/cloud-volume/blob/master/cloudvolume/datasource/precomputed/mesh/multilod.py#L72
vs
https://github.com/seung-lab/cloud-volume/blob/master/cloudvolume/datasource/graphene/mesh/unsharded.py#L137

With the exception of fusing (not implemented yet), the return "should" be the same: { segid: Mesh }.

Please send me a snippet of how you are accessing the meshes and I'll see what I can do. (It's quite likely I broke something in my continual tweaking.)

Alternatively is there a general pattern for saving (e.g. in obj) that is guranteed to work across mesh formats?

I am shoving the data into a standard cloud-volume Mesh object, which ought to behave the same as the others... Maybe there is an issue with the lack of normals?

regarding interfaces, I can see how the LOD dictionary is both necessary and confusing. I would agree with you that a more uniform looking interface is a better design ideal. For example, making the highest LOD the default, and only returning that by default, and allowing people to specify LOD, or specify that they actually want the full dictionary.

@fcollman: Using lod=0 by default is what I converged on for the 'final' version, with no explicit support for multiple LODs. If someone has a use case for getting all LODs, I'd implement a separate function and use a more efficient range request to fetch them all at once.

@perlman
Contributor Author

perlman commented Jun 13, 2020

@jefferis It looks like this was an oversight. I'll make sure to add save back in there. The implementation of save is very simple and worth understanding

Aha! This function was never on my radar as something to implement.

Should it come for free in a base class?

It looks like I failed at having UnshardedLegacyPrecomputedMeshSource extend any existing class...

@william-silversmith
Contributor

I've been using UnshardedLegacyPrecomputedMeshSource as a base class in a few places now, but the more places we use it the more worried I become about the inheritance structure becoming brittle and hard to reason about. Adding it as a base class will fix this case though.

@perlman
Contributor Author

perlman commented Jun 14, 2020

I've been using UnshardedLegacyPrecomputedMeshSource as a base class in a few places now, but the more places we use it the more worried I become about the inheritance structure becoming brittle and hard to reason about. Adding it as a base class will fix this case though.

It sounds like a BaseMeshSource would be useful? I'm not entirely sure which functions should be defined there, but it would go a long way towards having a standard interface for mesh access.
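Something roughly like this is what I'm imagining (the name and contents are hypothetical, just a sketch):

class BaseMeshSource(object):
    """Hypothetical shared interface for all mesh sources."""

    def get(self, segids, **kwargs):
        # Each concrete source implements this and returns { segid: Mesh }.
        raise NotImplementedError()

    def save(self, segids, filepath=None, file_format='ply'):
        # Shared convenience built on get(), along the lines of the snippet above.
        for segid, mesh in self.get(segids).items():
            data = mesh.to_obj() if file_format == 'obj' else mesh.to_ply()
            with open(filepath or '{}.{}'.format(segid, file_format), 'wb') as f:
                f.write(data)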

@william-silversmith
Contributor

william-silversmith commented Jun 14, 2020 via email

@jefferis

Thanks everyone for the additional info, and thanks @william-silversmith for the save example.

Just a note that for my own libraries, for user-facing code equivalent to this:

def save(self, segids, filepath=None, file_format='ply'):
    segids = toiter(segids)

I have a function that might replace toiter, which converts reasonable inputs (including strings and numerics) to int64 ids with some basic validation checks. But my libraries are mostly intended for end users, whereas I guess CloudVolume targets a mix of developers and end users.
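In Python terms I imagine something roughly like this (a sketch only; coerce_segids is a made-up name and the accepted inputs are my assumption):

import numpy as np

def coerce_segids(segids):
    # Accept an int, a digit string, or an iterable of either, and return
    # a list of np.uint64 ids after some basic validation.
    if isinstance(segids, (int, np.integer, str)):
        segids = [segids]
    out = []
    for segid in segids:
        if isinstance(segid, str):
            if not segid.isdigit():
                raise ValueError("Not a non-negative integer id: " + segid)
            segid = int(segid)
        if segid < 0:
            raise ValueError("Segment ids must be non-negative: " + str(segid))
        out.append(np.uint64(segid))
    return out

# e.g. coerce_segids(["710435991", 8392766771]) -> [np.uint64 ids]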
