Support for neuroglancer multi-resolution sharded meshes #340
Conversation
This is great Eric! Looking forward to seeing where this goes. :)
Linked to issue #267
Thanks to @jbms for the help & documentation.
This code can now read the sharded meshes from both FAFB-FFN1 & FlyEM Hemibrain.
@william-silversmith: I added a quick hack to get the manifest location back with its contents, and am currently creating a new SimpleStorage for retrieving the meshes. How would you suggest reworking this to be a bit cleaner?
@william-silversmith: What should the mesh.get() function return for LOD meshes? (The current return value is a list of fragments, one per LOD.)
Much thanks to @jbms for adding clarification on a few key points in the spec.
I think this is fine for now. The multi-LOD spec departs from the ordinary sharded format, so those departures have to be stored somewhere. Perhaps a subclass of ShardReader would be slightly cleaner, but the performance cost of creating a new SimpleStorage is very low since the connection pool reuses connections. A ShardReader subclass would also be able to avoid returning the offset from get_data. The major reworking I'd be looking forward to is threaded retrieval. I've been working on making that possible for ShardReader.exists, and the techniques can be ported to ShardReader.get_data. Once that happens, we can dramatically increase the performance here.
My feeling is that retrieving all the LODs all the time is kind of a waste of the benefits of the format. What about adding a parameter to select which LODs to fetch?
Thanks for your excellent work! I like your code too Eric. <3 :) I'll give it a try a bit later today.
Makes sense. Maybe we can put half of that work into a private method in ShardReader, which can then be called from the subclass? (The index lookup is an example of code I wouldn't want to be replicated.)
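A rough sketch of that shape, purely illustrative: the helper names _locate_data and _read_range are hypothetical stand-ins for the shared index-lookup and range-request code, not the real ShardReader internals.

class ShardReader:
  def _locate_data(self, label):
    # Hypothetical shared helper: resolve which shard file holds `label`
    # and the byte offset/length of its payload. This is the index-lookup
    # code neither class should have to duplicate.
    raise NotImplementedError

  def _read_range(self, filename, offset, length):
    # Hypothetical helper: issue a single range request for the payload bytes.
    raise NotImplementedError

  def get_data(self, label):
    filename, offset, length = self._locate_data(label)
    return self._read_range(filename, offset, length)


class MultiLevelShardReader(ShardReader):
  def get_data(self, label):
    # The multi-resolution mesh manifest is interpreted relative to its
    # position in the shard file, so return the offset alongside the bytes.
    filename, offset, length = self._locate_data(label)
    return self._read_range(filename, offset, length), offset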
Done (and each LOD is downloaded as its own range request). The default is still all meshes defined in the mesh manifest.
Next step. :)
I'll leave this to someone else. ;-) I'd be content with a function to do mesh selection by resolution (e.g., I want the best mesh for 32 x 32 x 40).
Sure! If you're up for it, give it a shot. If it gets awkward though, don't worry too much about the duplication of logic. I like to follow the "rule of three" where I'll sometimes let some code be duplicated until the third use so the commonalities become more clear. Just do whatever looks most elegant to you, which might include keeping the offset parameter.
That's fine; I was just spitballing. It's possible that feature is really only useful for visualization. Thanks again for all the work you put into this Eric!
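For what it's worth, resolution-based selection could be as small as the sketch below; lod_scales is a hypothetical list of per-LOD (x, y, z) resolutions, standing in for whatever scale information the manifest exposes.

import numpy as np

def pick_lod(lod_scales, desired_resolution):
  """Return the index of the LOD whose resolution is closest to the request.

  lod_scales: hypothetical list of (x, y, z) resolutions in nm, finest first.
  desired_resolution: e.g. (32, 32, 40).
  """
  scales = np.asarray(lod_scales, dtype=float)
  target = np.asarray(desired_resolution, dtype=float)
  # Pick the LOD minimizing the distance between its scale and the target.
  return int(np.argmin(np.linalg.norm(scales - target, axis=1)))

# pick_lod([(8, 8, 8), (16, 16, 16), (32, 32, 32)], (32, 32, 40)) -> 2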
Good news, bad news time! Good news: I parallelized get_data, so now it should be possible to dramatically speed this function up. Bad news: I suppose we need to re-integrate the return_offset flag.
This is looking pretty good! We can always brush up the performance aspect later. Would you mind writing some test(s) for it? If you point me to a particular shard file, I can upload it to gs://seunglab-test/ so it can be forever tested. It doesn't need to be anything too crazy; an integration test is sufficient.
This sounds like a good idea. I don't have the ability to generate any data -- would it make sense just to pull one of the shard files from any of the public neuroglancer datasets? (e.g. hemibrain).
Yea that's perfect! Just use a public dataset.
This now includes a minimal test case, with a reference to the files accessed. When I have a chance I can try to pick some good & bad IDs that all hash to the same shard file.
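A minimal, network-dependent sketch of such a test: the bucket path is the public hemibrain release, and the segment id is just a placeholder for a body known to exist there.

import pytest
from cloudvolume import CloudVolume

# Assumed publicly readable hemibrain segmentation; SEGID is only a placeholder.
HEMIBRAIN = 'precomputed://gs://neuroglancer-janelia-flyem-hemibrain/v1.0/segmentation'
SEGID = 123456789  # placeholder: substitute a real hemibrain body id

@pytest.mark.skip(reason='requires network access to a public bucket')
def test_multires_sharded_mesh_fetch():
  cv = CloudVolume(HEMIBRAIN, use_https=True)
  mesh = cv.mesh.get(SEGID, lod=0)
  # The PR converged on returning a standard Mesh object for the requested LOD.
  assert len(mesh.vertices) > 0
  assert len(mesh.faces) > 0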
This is hella cool. Thank you so much for your work!!
Hurrah! Thank you both very much (especially @perlman in this case!).
I'm going to make a new release sometime today. :)
I am sure there will be follow-up PRs to fix bugs & add other useful interfaces (like a bbox() defined by the octree regions). I'll see what pops up when @jefferis and others use it. ;)
@perlman @william-silversmith: a couple of queries for one or both of you after using this in earnest a bit.
I'll add my two cents on point 2. I think once you have vertices and faces there are a lot of mesh libraries out there for saving meshes. I think of cloud-volume as the interface for getting/saving highly distributed files for cloud-based connectomics. If you want to save versions of meshes as simple files, I think there are existing packages that cover a large range of file formats.
MeshParty, for example, uses trimesh, which enables it to save in obj (or ply, or many others). So in that case MeshParty calls cloud-volume to get a mesh, and uses that to instantiate a class derived from a trimesh base class. We can then use that to save meshes in a variety of formats, though as we develop more analytic layers on top of the mesh we have created an HDF-based format to store these extra analytics. One immediately relevant example is a place to store the extra edges that link disconnected portions of proofread meshes. Others would include skeleton representations and the mappings between the skeleton and the mesh.
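A sketch of that pattern using trimesh directly: the cloudpath and segment id are placeholders, and it assumes the returned Mesh exposes vertices and faces arrays, as the cloud-volume Mesh does.

import trimesh
from cloudvolume import CloudVolume

cv = CloudVolume('precomputed://gs://example-bucket/segmentation', use_https=True)  # placeholder path
cv_mesh = cv.mesh.get(12345)  # placeholder segment id

# Hand the raw arrays to trimesh and let it handle file formats.
tm = trimesh.Trimesh(vertices=cv_mesh.vertices, faces=cv_mesh.faces)
tm.export('12345.obj')  # format inferred from the extension (.obj, .ply, .stl, ...)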
Thanks, Forrest. That seems absolutely reasonable. It's just that I wrote some wrapper code to fetch/save meshes from FlyWire and I was imagining that it would just work when I switched to hemibrain as a source, but it doesn't, because the interface that cloud-volume presents seems to vary according to the type of mesh source – and to me that was a bit surprising. So it would be good to know if there is a recommended pattern for fetch+save. PS: MeshParty is very nice, but I've found it a bit awkward to install, so for people in the group who just want to fetch and save, I'd rather have it as an optional dependency.
Agreed about the install process; we have work to do there, mostly by making more dependencies optional and writing better cross-platform instructions. Regarding interfaces, I can see how the LOD dictionary is both necessary and confusing. I would agree with you that a more uniform-looking interface is a better design ideal: for example, making the highest LOD the default and only returning that by default, while allowing people to specify an LOD, or to say that they actually want the full dictionary.
@jefferis It looks like this was an oversight. I'll make sure to add save back in there. The implementation of save is very simple and worth understanding:

def save(self, segids, filepath=None, file_format='ply'):
  segids = toiter(segids)
  mesh = self.get(segids, fuse=True, remove_duplicate_vertices=True)

  if file_format == 'obj':
    data = mesh.to_obj()
  elif file_format == 'ply':
    data = mesh.to_ply()
  elif file_format == 'precomputed':
    data = mesh.to_precomputed()
  else:
    raise NotImplementedError('Only .obj, .ply, and precomputed are currently supported.')

  if not filepath:
    filepath = str(segids[0]) + "." + file_format
    if len(segids) > 1:
      filepath = "{}_{}.{}".format(segids[0], segids[-1], file_format)

  try:
    filepath.write(data)
  except AttributeError:
    with open(filepath, 'wb') as f:
      f.write(data)

As you can see, each Mesh object comes equipped with a way to convert it into an obj or ply byte string. You can then do what you will with it. I've often thought that the …
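In other words, once save() is exposed on the multi-LOD source too, usage should look roughly like this (the cloudpath and segment ids are placeholders):

from cloudvolume import CloudVolume

cv = CloudVolume('precomputed://gs://example-bucket/segmentation', use_https=True)  # placeholder path

# Writes 12345.ply to the working directory; pass filepath= to control the name.
cv.mesh.save(12345)

# Multiple ids are fused into a single file, e.g. 12345_67890.obj.
cv.mesh.save([12345, 67890], file_format='obj')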
I'll let @william-silversmith chime in. uint64s are used in many functions, all of which would need to add the hook to convert. Would it make sense to keep a small wrapper, in Python, with your R code for doing the string-to-uint64 conversion?
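Such a wrapper could stay tiny, something like the sketch below (the function name is made up); numpy parses a decimal string into a uint64 without the precision loss of going through a double.

import numpy as np

def to_segid(label):
  """Hypothetical helper: convert a segment id passed as a decimal string
  (e.g. from R, which lacks native 64-bit integers) into a numpy uint64."""
  return np.uint64(label)

# to_segid("720575940621039145") can then be passed to mesh.get() and friends.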
https://github.com/seung-lab/cloud-volume/blob/master/cloudvolume/datasource/precomputed/mesh/multilod.py#L72
With the exception of fusing (not implemented yet), the return "should" be the same. Please send me a snippet of how you are accessing the meshes and I'll see what I can do. (It's quite likely I broke something in my continual tweaking.)
I am shoving the data into a standard cloud-volume Mesh object, which ought to behave the same as the others... Maybe there is an issue with the lack of normals?
@fcollman: Using lod=0 by default is what I converged on with the 'final' version, with no explicit support for multiple LODs. If someone has a use case for getting all LODs, I'd implement a separate function and use a more efficient range request to fetch them all at once.
Aha! This function was never on my radar as something to implement. Should it come for free in a base class? It looks like I failed at having UnshardedLegacyPrecomputedMeshSource extend any existing class...
I've been using UnshardedLegacyPrecomputedMeshSource as a base class in a few places now, but the more places we use it the more worried I become about the inheritance structure becoming brittle and hard to reason about. Adding it as a base class will fix this case though.
It sounds like a BaseMeshSource would be useful? I'm not entirely sure which functions should be defined there, but it would go a long way towards having a standard interface for mesh access.
That sounds about right. At the very least the save function should be defined in the base class.
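Something along these lines, sketched with assumed names rather than an actual implementation:

class BaseMeshSource:
  """Hypothetical shared base class for all mesh sources."""

  def get(self, segids, **kwargs):
    # Each concrete source (unsharded legacy, sharded multi-LOD, graphene)
    # implements its own fetch logic.
    raise NotImplementedError

  def save(self, segids, filepath=None, file_format='ply'):
    # Body omitted: same logic as the save() snippet above, written once
    # in terms of self.get() so every subclass gets it for free.
    ...


class UnshardedLegacyPrecomputedMeshSource(BaseMeshSource):
  def get(self, segids, fuse=True, **kwargs):
    ...


class MultiLevelPrecomputedMeshSource(BaseMeshSource):
  def get(self, segids, lod=0, **kwargs):
    ...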
Thanks everyone for the additional info, and @william-silversmith thanks for the save example. Just a note that for my own libraries, for user-facing code equivalent to this:
I have a function that might replace …
This is an initial effort to read neuroglancer's multi-resolution sharded mesh format.
**TODO**
**Other issues (that probably won't be addressed)**
**Sample code**
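A minimal sketch of exercising the new reader, assuming the public hemibrain release and a placeholder segment id (lod defaults to 0, the finest level, as discussed above):

from cloudvolume import CloudVolume

# Assumed public dataset with sharded multi-resolution meshes.
cv = CloudVolume(
  'precomputed://gs://neuroglancer-janelia-flyem-hemibrain/v1.0/segmentation',
  use_https=True,
)

# Fetch the finest-resolution mesh for one body; coarser levels via lod=1, 2, ...
mesh = cv.mesh.get(123456789, lod=0)  # placeholder segment id
print(len(mesh.vertices), len(mesh.faces))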