Large files issue #505
Comments
What chunk size did you use? This will control how many files you create. Also, a code snippet of your info file definition in particular would help debug what was going on. |
Interesting, maybe I should try 1024 instead? I used chunk_size=[256, 256, 1] for chunking; the code snippet for the info file definition is quoted in the reply below.
|
(7370/256)*(8768/256)*(1621/1) = 1,598,347 chunks.
do you want single z sections to be the chunks?
Right now you have ~1MB chunk files (256*256*16 bits)/(1024*1024 bits/MB)
You could probably get away with 5 MB chunks.
128x128x16 would be ~400K chunks.
…On Wed, Nov 3, 2021 at 6:00 PM manoaman ***@***.***> wrote:
Interesting, maybe I should try 1024 instead? I used chunk_size=[256,
256, 1] for chunking. The following is the code snippet for the info file
definition.
info = CloudVolume.create_new_info(
    num_channels=1,
    layer_type='image',  # 'image' or 'segmentation'
    data_type='uint16',  # can pick any popular uint
    encoding='raw',  # other options: 'jpeg', 'compressed_segmentation' (req. uint32 or uint64)
    resolution=[4000, 4000, 4000],  # X,Y,Z values in nanometers
    voxel_offset=[0, 0, 0],  # values X,Y,Z values in voxels
    chunk_size=[256, 256, 1],  # rechunk of image X,Y,Z in voxels
    volume_size=[7370, 8768, 1621]  # X,Y,Z size in voxels
)
|
I overall think Forrest has a good suggestion, but I think bits and bytes might be mixed up. 512x512x1 chunks would give you a 4x reduction in the number of files and would be 512x512x1 x 2 bytes = 512 KiB each without compression. Chunking in Z can give better performance while scrolling, though the initial upload is a little more complex to do. If you think you'd like to compact the number of files even further (it wasn't clear to me whether the file quota was per folder or for your whole account), after you upload you can use Igneous to transfer to the sharded format. |
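For anyone following along, here is a small sketch of the arithmetic behind these file-count and chunk-size estimates. The volume dimensions and uint16 dtype come from the info file quoted above; the helper itself is purely illustrative:
import math

volume_size = (7370, 8768, 1621)   # X, Y, Z voxels, from the info file above
bytes_per_voxel = 2                # uint16

def chunk_stats(chunk_size):
    # One file is written per chunk along each axis (rounded up at the edges).
    counts = [math.ceil(v / c) for v, c in zip(volume_size, chunk_size)]
    n_files = counts[0] * counts[1] * counts[2]
    # Uncompressed size of one full chunk file.
    chunk_kib = chunk_size[0] * chunk_size[1] * chunk_size[2] * bytes_per_voxel / 1024
    return n_files, chunk_kib

for cs in [(256, 256, 1), (512, 512, 1), (128, 128, 16), (256, 256, 16)]:
    n_files, chunk_kib = chunk_stats(cs)
    print(f"{cs}: ~{n_files:,} files, {chunk_kib:.0f} KiB per uncompressed chunk")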
Thank you @fcollman !! Initially, with a 256x256 chunk size, CloudVolume exited, so the quota was against a single folder. (I think the failure has to do with the storage's upper bound on how many files can be created.)
I intend to do the chunking in Z with Igneous. Can you tell me a little bit more about how to do that? Thank you all!! |
Glad we were able to help! The sharded format is a method for storing many chunks in a single file while still retaining random access to individual chunks. There's a slight performance penalty, but CloudVolume can read them just like the regular chunked format. You won't be able to write the sharded format easily without specialized knowledge except through Igneous (so no patching missing tiles). As an example of how to use Igneous to generate the sharded version: the QUEUE variable is either an AWS sqs:// queue or a file folder that will be populated with queue files. You can read more here.
Make sure you have the latest Igneous version as there was a bug fix in the last update. I tried to make sure that the shard generation takes a reasonable amount of RAM by sizing the files appropriately. The default uncompressed target size is 3.5GB each (it could use up to 2x that; the generated shard will be smaller due to compression). You can see more options for the transfer in the CLI --help output. One other thing to keep in mind is that downsampling sharded volumes generates only one additional level of hierarchy at a time. This can introduce a small integer truncation error per level. The regular downsampling method avoids this issue for 5 mips at a time. This is because generating multiple sharded levels at a time would use unreasonable amounts of memory. You can read more about sharding here: https://github.com/seung-lab/cloud-volume/wiki/Sharding:-Reducing-Load-on-the-Filesystem |
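As a small illustration of the point above that CloudVolume reads sharded and unsharded layers the same way, here is a minimal sketch; the layer path is hypothetical:
from cloudvolume import CloudVolume

# Hypothetical local precomputed layer; gs:// or s3:// paths work the same way.
vol = CloudVolume("file:///data/my_dataset/image", mip=0, progress=True)

# The slicing API is identical whether the layer on disk is sharded or unsharded;
# CloudVolume figures out which shard (or which chunk files) to read.
cutout = vol[0:512, 0:512, 100:116]
print(cutout.shape, cutout.dtype)   # (512, 512, 16, 1) for a single-channel volume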
Oh cool, I did not know Igneous was available from pip install. I've given it a try with the CLI but ran into an error.
|
Hi m, The pip install / CLI version of Igneous is newer, so not everyone has learned about it yet. I'm glad you find it convenient! Can you provide a more complete command? It's a little hard to debug without seeing the path that triggered the error. |
Oops, sorry about that. I've given it several tries after working through the protocol and format warnings, and this is what I'm seeing so far. The CLI command is something like the following:
source_dir contains the files chunked in Z.
Trying to follow this rule.
|
You can write simply:
|
Okay. I tried different combinations of FORMAT and PROTOCOL prefixes, and also without the prefixes. It turns out I had to explicitly specify them. The following command seemed to run okay.
Waiting on the queued tasks to finish.
|
That is fantastic! Just FYI, you can monitor queue progress with the ptq status command (ptq = Python Task Queue). |
The process still seems to be running and here is what I see from ptq status. I'll give it some time and check back later. Looks completed from the status? |
|
It's done! It doesn't automatically exit.
…On Sat, Nov 6, 2021, 9:15 PM manoaman ***@***.***> wrote:
The process still seems to be running and here is what I see from ptq
status .... I'll try and give some time to check back later. Looks
completed from the status?
Inserted: 140
Enqueued: 0 (0.0% left)
Completed: 140 (100.0%)
Leased: 0 (--%) of queue
|
Hi Will, I tested it out this morning in Neuroglancer and the sharded file format loads great. I do see the reduction in the number of files generated and in the total size the folder takes up on the storage (144 GB to 106 GB, 413,193 files to 141 files). This is nice.
One thing I realized is that the chunks (loading tiles) in Neuroglancer seem to have gotten much smaller. Is this because of the chunk size I specified? This is the Python script where I generate the precomputed chunks in Z.
And after that, the Igneous CLI command.
|
Hi m, Yep, the chunk size parameter is what controls the size of the tiles. Unfortunately, you'll need to generate a new shard layer; the existing one can't be modified in place. You can do this either from the original tiles or from the existing shards as a source. |
Okay. Let me try increasing the chunk size to 512,512,64 to see the change. It seems to be taking longer this time, so I will have to see how the results come out. 😁 This is a bit off topic: before getting the image stack into CloudVolume/Igneous, I had a tough time splitting a multi-page TIFF about 245GB in size. It took about 900GB~1TB of memory allocated on the high performance computing node, using ImageMagick for the splitting. (Anything smaller in memory resulted in "out of memory" errors.) Is this typical when handling large files before even getting to the chunking stage? |
Those are pretty big chunks (33 MB) so your Neuroglancer loading may become pretty slow. Might I suggest something closer to 256 x 256 x 16 (2 MB), 256 x 256 x 32 (4 MB), or 512 x 512 x 16 (8 MB)? I'll admit I haven't worked with very large single TIFF files myself; usually the files are split into single image slices. However, if you find a good TIFF library and use the right features from it, it should be very possible to work with the stack slice by slice instead of reading the whole thing into memory at once. You might have some luck perusing the documentation for the tifffile python package. If the images are not compressed, it seems you can read them as a memory mapped file: cgohlke/tifffile#52 If the package doesn't have what you need, its documentation also links to a number of other scientific TIFF packages that might have what you want. |
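To make the slice-by-slice idea concrete, here is a rough sketch using the tifffile package mentioned above together with CloudVolume. The paths are hypothetical, and it assumes the precomputed layer already has an info file with a Z chunk size of 1 (as in the uploads earlier in this thread) so that single-slice writes are chunk aligned:
import numpy as np
import tifffile
from cloudvolume import CloudVolume

tiff_path = "big_stack.tif"                     # hypothetical multi-page TIFF
layer_path = "file:///data/precomputed/image"   # hypothetical precomputed layer

vol = CloudVolume(layer_path, mip=0)

with tifffile.TiffFile(tiff_path) as tif:
    for z, page in enumerate(tif.pages):
        plane = page.asarray()   # reads only this slice, shape (rows=Y, cols=X)
        plane = plane.T          # reorder to CloudVolume's (X, Y) convention
        # Upload one Z slice at a time (dtype must match the layer's data_type).
        vol[:, :, z : z + 1] = plane[:, :, np.newaxis]

# For uncompressed TIFFs, tifffile.memmap(tiff_path) can map the whole stack
# without loading it into RAM (see the linked tifffile issue).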
Agh, thank you for reminding me about that. I should have considered the size of the chunks... From your experience, do you suggest each chunk be somewhere between 2 MB and 8 MB in size? (In fact, I was getting an error with 512,512,64 so I ended up using 512,512,16 instead.) It does seem like reading slice by slice would be a better approach for tackling large files. More z-stacks will bring more challenges for sure. 😱 Thank you again for suggesting the tifffile package! |
I think it depends a lot on the expected storage technology and internet connection. I think somewhere around 500 KiB to 2 MiB is a good range if you have gigabit. To cover an XY plane, you'll need to download at least dozens of chunks, so fully consuming your bandwidth with a few chunks isn't ideal. You can go higher; it's just that the latency will become more noticeable. It's also important not to go too thick in Z, as that will increase latency somewhat uselessly. If you push that too far, Neuroglancer will limit the number of chunks downloaded because too much memory would be used by non-visible depth. Everything is chunking, from the bottom of computer architecture to high-minded stuff like petascale volumes. 😁 Hope the package is helpful! |
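A quick back-of-the-envelope illustration of the latency reasoning above (my own numbers, ignoring request overhead and compression):
# Approximate time to pull a single chunk over a ~1 Gbit/s connection.
GIGABIT_BYTES_PER_S = 1_000_000_000 / 8   # ~125 MB/s

for chunk_mib in (0.5, 2, 8, 33):
    seconds = chunk_mib * 2**20 / GIGABIT_BYTES_PER_S
    print(f"{chunk_mib:>4} MiB chunk: ~{seconds * 1000:.0f} ms each at gigabit")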
Hi Will, Sorry to bother you again with continuous questions. If the shard format is going to be used in the first place, would it still matter what chunk sizes I specify in the pre-chunking stages with CloudVolume and Igneous (create_transfer_tasks)? Will the final sharded output depend on them? Thanks! |
The final transfer command that creates the shards will also use whatever chunk size you specify. The previous chunk sizes are irrelevant, so you should pick them to be convenient for the initial uploading. |
Hi @william-silversmith , It's been a while, but I should have asked this question to begin with. What is the largest file size a 3D volumetric image can be before processing in CloudVolume for practical Neuroglancer viewing? What I mean by "practical" here is that chunks are fully loaded in the browser without hitting the RAM limit (no black tiles in the display). In this example, the 3D volumetric image (TIFF) was 245GB in size. Is that too large to begin with? |
I see Jeremy answered your question in the linked discussion and I agree with him. Make sure to downsample your volume after uploading the initial set of tiles (pick a chunk size like 128x128x64). If you are still having problems visualizing the data, run downsampling again using the top mip level that was generated in the last step. This will build an even taller image pyramid. Once the pyramid is sufficiently tall, you will have no problems at all. |
Hi @william-silversmith , what do you mean by running downsampling again, in CloudVolume/Igneous terms? Are you referring to DownsampleTask? https://github.com/seung-lab/igneous#downsampling-downsampletask. Will a DownsampleTask work after rechunking (https://github.com/seung-lab/igneous#data-transfer--rechunking-transfertask)? So in the actual code, would it be something like this?
Thank you Will, |
Hi m, I think you will find it easier to use the Igneous command line interface if you can. The transfer tasks will automatically create a few levels of downsamples, so if you run downsampling from mip 0 again, you probably won't see much improvement. You'll probably also enjoy using FileQueue more as you can stop and restart jobs without starting again from the beginning. With XY dimension chunk size 128 and a task size of 1024, you should expect three downsamples to be generated. Try this:
igneous image xfer SRC DEST --mip 0 --chunk-size 128,128,64 --shape 1024,1024,64 --queue ./queue
igneous -p 8 execute -x ./queue
igneous image downsample SRC --mip 3 --num-mips 4 --queue ./queue
igneous -p 8 execute -x ./queue |
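For anyone who prefers the Python API that the question above was asking about, a rough equivalent of these CLI commands might look like the sketch below. The function and parameter names follow the patterns in the Igneous README linked earlier, but treat them as assumptions and check the signatures in your installed version:
from taskqueue import LocalTaskQueue
import igneous.task_creation as tc

src = "file:///data/precomputed/src"    # hypothetical source layer
dest = "file:///data/precomputed/dest"  # hypothetical destination layer

tq = LocalTaskQueue(parallel=8)

# Roughly mirrors: igneous image xfer SRC DEST --mip 0 --chunk-size 128,128,64 --shape 1024,1024,64
tasks = tc.create_transfer_tasks(src, dest, chunk_size=(128, 128, 64), shape=(1024, 1024, 64))
tq.insert(tasks)
tq.execute()

# Roughly mirrors: igneous image downsample SRC --mip 3 --num-mips 4
tasks = tc.create_downsampling_tasks(dest, mip=3, num_mips=4)
tq.insert(tasks)
tq.execute()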
Hi @william-silversmith , Okay, I've tried testing with a 3D image volume (7332 x 10131 x 3900; TIFF stacks, 329GB in total size) and I don't know if I succeeded at the downsampling stage. The file sizes don't seem to change in the destination folder before and after the downsample. I see an error running a CLI command, so I could be choosing the chunk or shape sizes incorrectly... Would you be able to advise what I am doing wrong? Here are the steps I took; the CloudVolume parameters, CLI commands, and destination folder contents are quoted in full in the reply below.
Error message: quite a few cloudvolume.exceptions.EmptyVolumeException errors printed on the terminal due to missing chunks.
|
Hi m,
The empty volume error appears if your image does not completely fill the space or if you're pointed at an incorrect location.
I noticed you reset the chunk size on the command line after you set it in the info file, which may not be what you want.
You can try using the --fill-missing flag, which will write zeroed data instead of throwing an exception.
For the transfer step, you can also try using --sharded to reduce the number of files written dramatically, though no downsamples will be generated from that step (so they will all need to be done via the downsample command).
…On Thu, Jun 30, 2022, 5:10 PM manoaman ***@***.***> wrote:
Hi Will,
Okay, I've tried testing with 3D image volume (7332 x 10131 x 3900; tiff
stacks, 329GB in total size) and I don't know if I succeeded at a
downsampling stage. The files sizes don't seem to change in the destination
folder before and after the downsample. I see an error running a CLI
command so I could be designing the chunk or shape sizes incorrectly...
Would you be able to advise what am I doing wrong?
Here are the steps I took.
------------------------------
1. Run CloudVolume to chunk XY dimension. (I chose 1024,1024,1 so that
I won't face I/O error on creating too many files. It seems that 1,000,000
files is the storage limit. Maybe configurable on the storage to increase
this limit to allow smaller chunks?)
*Configured parameters in a CloudVolume script:*
chunk_size=[1024, 1024, 1],
volume_size=[7332, 10131, 3900],
*Output files/folders in the destination folder:*
$ ls
1800_1800_2000 info progress provenance
------------------------------
2. Next, rechunked on XYZ with Igneous CLI.
*CLI:*
$ igneous image xfer SRC DEST --mip 0 --chunk-size 128,128,64 --shape 1024,1024,64 --queue ./queue
$ igneous -p 36 execute -x ./queue
*Output files/folders in the destination folder:*
$ du -sh ./
35M ./14400_14400_2000
2.9G ./1800_1800_2000
569M ./3600_3600_2000
138M ./7200_7200_2000
24K ./info
24K ./provenance
------------------------------
3. Lastly, the failing step. Downsample.
*CLI:*
$ igneous image downsample SRC --mip 3 --num-mips 4 --queue ./queue
$ igneous -p 36 execute -x ./queue
*Output files/folders in the destination folder:*
$ du -sh ./
35M ./14400_14400_2000
2.9G ./1800_1800_2000
569M ./3600_3600_2000
138M ./7200_7200_2000
24K ./info
24K ./provenance
*Error message:*
Quite a few cloudvolume.exceptions.EmptyVolumeException printed on the
terminal due to missing chunks.
ERROR FunctionTask(('igneous.tasks.image.image', 'DownsampleTask'),[],{'layer_path': 'file:///folder_name', 'mip': 3, 'shape': [2048, 2048, 64], 'offset': [0, 0, 1088], 'axis': 'z', 'fill_missing': False, 'sparse': False, 'delete_black_uploads': False, 'background_color': 0, 'dest_path': None, 'compress': None, 'factor': [2, 2, 1]},"327db0a1-27d1-4394-a20e-05c4cb7a2cea") raised 14400_14400_2000/0-128_256-384_1088-1152
Traceback (most recent call last):
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/taskqueue.py", line 375, in poll
task.execute(*execute_args, **execute_kwargs)
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/queueablefns.py", line 78, in execute
self(*args, **kwargs)
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/taskqueue/queueablefns.py", line 87, in __call__
return self.tofunc()()
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/igneous/tasks/image/image.py", line 467, in DownsampleTask
factor=factor,
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/igneous/tasks/image/image.py", line 426, in TransferTask
src_bbox, agglomerate=agglomerate, timestamp=timestamp
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/frontends/precomputed.py", line 709, in download
bbox.astype(np.int64), mip, parallel=parallel, renumber=bool(renumber)
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/__init__.py", line 183, in download
background_color=int(self.background_color),
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 281, in download
green=green, secrets=secrets, background_color=background_color
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 560, in download_chunks_threaded
green=green,
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/scheduler.py", line 104, in schedule_jobs
return schedule_threaded_jobs(fns, concurrency, progress, total)
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/scheduler.py", line 30, in schedule_threaded_jobs
tq.put(updatefn(fn))
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 257, in __exit__
self.wait(progress=self.with_progress)
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 227, in wait
self._check_errors()
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 191, in _check_errors
raise err
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 153, in _consume_queue
self._consume_queue_execution(fn)
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 180, in _consume_queue_execution
fn()
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/scheduler.py", line 23, in realupdatefn
res = fn()
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 528, in process
decode_fn, decompress
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 509, in download_chunk
background_color=background_color)
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 582, in decode
mip, background_color,
File "/homedir/.conda/envs/igneous/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/rx.py", line 629, in _decode_helper
raise EmptyVolumeException(input_bbox)
cloudvolume.exceptions.EmptyVolumeException: 14400_14400_2000/0-128_256-384_1088-1152
|
For converting into the shard format, the following command (quoted in the reply below) did not convert to .shard files; I still see .gz files in the folder. Execution of the tasks looked okay without throwing any errors.
Obviously, I can't shard all the levels at once. How do I go about converting each mip level to the shard format?
|
Hi m,
I think it would make sense to either use larger chunks or transfer using the --sharded flag. This will circumvent your limitation.
You then generate all mip levels by downsampling as sharded, one at a time. However, the upper levels have many fewer chunks, so it may be convenient to generate only the first mip as sharded and the upper levels as unsharded.
Will
…On Fri, Oct 7, 2022, 4:17 PM manoaman ***@***.***> wrote:
For converting into shard format, the following command did not convert to
.shard files. I still see .gz files in the folder. Execution of tasks
looked okay without throwing any errors.
igneous image downsample file:///nfs/precomputed/1024x1024x1_128x128x64 --mip 0 --num-mips 1 --queue ./queue --sharded
Obviously, I can't shard all the levels at once. How do I go about to
convert each mip level to shard formats?
igneous image downsample file:///nfs/precomputed/1024x1024x1_128x128x64 --mip 0 --num-mips 5 --queue ./queue --sharded
igneous: sharded downsamples only support producing one mip at a time.
|
Ohh, I see. It is okay to mix sharded and unsharded formats at different mip levels. I didn't think about that, and that sure makes sense. Thank you Will for your advice! |
If you specify chunk-size on the command line it will overwrite your previous settings. If you already set it, just omit it from the command.
shape corresponds to the size of a task and should be a power-of-two multiple of the chunk size to ensure standalone downsamples generate efficiently. Regardless, the shape must be chunk aligned or errors will result.
For the downsample command, each additional mip level quadruples the size of the task (though it will cap at the maximum number of downsamples once a single downsample is a chunk).
It looks like your info file is ok and is maximally downsampled for your chunk size.
Are you sure your queue is totally empty? Maybe it didn't finish processing.
One more tip: if you set --encoding png you can save almost 2x the disk space losslessly at the expense of compression speed.
…On Fri, Jul 1, 2022, 1:59 PM manoaman ***@***.***> wrote:
Thanks for the feedback, Will.
--fill-missing option did work. Thank you.
I didn't quite understand what you mean by "reset the chunk size". What chunk size should I be using in the first place? I think I'm a little confused about how to define chunk sizes when transitioning from a CloudVolume script to the Igneous CLI, and about the use of the --shape option.
I noticed you reset the chunk size on the command line after you set it in the info file which may not be what you want.
The results I see in the viewer so far are partially loaded chunks that stop loading, so I'm not sure if I succeeded in downsampling. I went as deep as --num-mips 8 and here are the generated files.
du -sh ./*
29M ./14400_14400_2000
2.0G ./1800_1800_2000
19M ./28800_28800_2000
487M ./3600_3600_2000
6.1M ./57600_57600_2000
117M ./7200_7200_2000
24K ./info
24K ./provenance
$ cat ./info
{
"data_type": "uint16",
"num_channels": 1,
"scales": [
{
"chunk_sizes": [
[
128,
128,
64
]
],
"encoding": "raw",
"key": "1800_1800_2000",
"resolution": [
1800,
1800,
2000
],
"size": [
7332,
10131,
3900
],
"voxel_offset": [
0,
0,
0
]
},
{
"chunk_sizes": [
[
128,
128,
64
]
],
"encoding": "raw",
"key": "3600_3600_2000",
"resolution": [
3600,
3600,
2000
],
"size": [
3666,
5066,
3900
],
"voxel_offset": [
0,
0,
0
]
},
{
"chunk_sizes": [
[
128,
128,
64
]
],
"encoding": "raw",
"key": "7200_7200_2000",
"resolution": [
7200,
7200,
2000
],
"size": [
1833,
2533,
3900
],
"voxel_offset": [
0,
0,
0
]
},
{
"chunk_sizes": [
[
128,
128,
64
]
],
"encoding": "raw",
"key": "14400_14400_2000",
"resolution": [
14400,
14400,
2000
],
"size": [
917,
1267,
3900
],
"voxel_offset": [
0,
0,
0
]
},
{
"chunk_sizes": [
[
128,
128,
64
]
],
"encoding": "raw",
"key": "28800_28800_2000",
"resolution": [
28800,
28800,
2000
],
"size": [
459,
634,
3900
],
"voxel_offset": [
0,
0,
0
]
},
{
"chunk_sizes": [
[
128,
128,
64
]
],
"encoding": "raw",
"key": "57600_57600_2000",
"resolution": [
57600,
57600,
2000
],
"size": [
230,
317,
3900
],
"voxel_offset": [
0,
0,
0
]
}
],
"type": "image"
}
Any thoughts?
|
Hi Will, do you know how to get around these errors? One error occurs when … Thank you,
|
Hi m, That function … In the second error, the chunk size appears to be zero? This seems like either an info file error or a mistake I made clamping values somewhere. |
Attaching the info files. |
Hi m, I wasn't able to reproduce the sharded issue, but I found a bug in my task shape calculation when the specified z chunk size was greater than the computed chunk shape. I released igneous-pipeline==4.10.0 which contains a fix for that issue. Thanks for reporting the bug (and providing the info files which allowed me to reproduce it easily)! Let me know if that helps. |
Hi Will, I'm glad to hear that I could contribute!! And thank you for the quick release!! I'll upgrade my Igneous to the latest version. -m |
Hello @william-silversmith I encountered a new error which I haven't experienced before, and I have a couple of questions. I'm tackling a 3D volume with dimensions of (41088 x 28416 x 10240), uint8. I initially used cloud-volume and chunked the xy plane in (4096 x 4096 x 1), and then used Igneous with a (128 x 128 x 64) chunk size, which hit the following error during the process. This was run on a high-RAM compute machine (1TB RAM, 36 CPU cores), and I'm running it again at the moment to reproduce the issue on a different machine.
Perhaps running the failed task queue jobs again would fix it? Any guidance here would be appreciated. Thank you! Happy Halloween!
|
Hi m, Can you show me the parameters you are using for downsampling? If you use the newest version of Igneous, it should try to keep these tasks to a reasonable memory limit. On the old version, you can try setting num_mips to a smaller number (3 or 4). The error you are encountering is likely an inability for malloc to find a contiguous memory region of that size. Smaller memory segments will probably make this go away. |
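As a rough illustration of the "each additional mip quadruples the task" rule mentioned earlier in this thread (my own back-of-the-envelope sketch of the older behavior, not a model of Igneous internals; the shape and dtype are placeholders):
# A downsample task reads a source region at the starting mip; each extra mip it
# produces doubles that region in X and Y, so memory grows ~4x per additional mip.
def est_task_memory_gib(single_mip_shape=(2048, 2048, 64), bytes_per_voxel=1, num_mips=1):
    x, y, z = single_mip_shape
    scale = 2 ** (num_mips - 1)   # extra doubling in X and Y per additional mip
    return (x * scale) * (y * scale) * z * bytes_per_voxel / 2**30

for mips in (2, 5, 7):
    print(f"num_mips={mips}: roughly {est_task_memory_gib(num_mips=mips):,.0f} GiB per task (uint8)")
That growth is why dropping num_mips to a smaller number (or using a version that caps task memory) can avoid the contiguous-allocation failure.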
Yes. I ran the following commands in sequence. That reminds me that I have not updated igneous-pipeline; the version I used is 4.17.0. Perhaps I should go ahead and upgrade to 4.20.1.
|
The line with num_mips 7 is possibly the culprit. Each additional mip
requires 4x the memory. I'd change that to 4 or 5 and see if it helps.
…On Tue, Oct 31, 2023, 5:57 PM manoaman ***@***.***> wrote:
Yes. I ran the following commands in sequence. That reminds me I have not
updated the igneous-pipeline. The version I used is Version: 4.17.0.
Perhaps I should go ahead and upgrade to 4.20.1.
igneous image xfer file:///nfs/3d_volume/xy_precomputed/ file:///nfs/3d_volume/ch0/ --mip 0 --chunk-size 128,128,64 --fill-missing --queue ./queue --sharded &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue &&
igneous image downsample file:///nfs/3d_volume/ch0/ --mip 0 --num-mips 2 --volumetric --fill-missing --queue ./queue &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue &&
igneous image downsample file:///nfs/3d_volume/ch0/ --mip 2 --num-mips 7 --volumetric --fill-missing --queue ./queue &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue;
|
Hi @william-silversmith , It looks like the low mip levels (400_400_800, 800_800_1600, 1600_1600_3200) create over 1 million chunks per folder and break things on the storage side. I'm going to change the chunk size from 128,128,64 to 128,128,128 instead. However, I cannot figure out how to convert the second and third level chunks (800_800_1600, 1600_1600_3200) into shard-format chunks. Can you advise how I can put specific levels into the shard format? The sequence of commands I'm using is attached. Thank you,
|
Hi m, You can create sharded downsamples by adding the --sharded flag to the downsample command. |
Is this correct for the first three mip levels?
|
Yes, that should work. The volumetric flag doesn't get tested too regularly, let me know if you run into problems. |
I've tested igneous image downsample and it appears that with the --sharded option, the view on the two other planes becomes corrupted in Neuroglancer; it's almost as if the same images are overlaid many times with the offset slightly shifted, and it looks blurry (details and screenshots quoted in the reply below). Looks like I can get away without using the --sharded option with igneous image downsample, but then the number of chunks becomes really large. Do you have any other suggestions on how to downsample images and use the shard format volumetrically? Thank you, |
That's really weird! Can you show me the command you are using and the info
file?
Does the problem resolve when you zoom in?
…On Tue, Nov 28, 2023, 11:41 AM manoaman ***@***.***> wrote:
Hi @william-silversmith <https://github.com/william-silversmith>
I've tested igneous image downsample and it appears that with --sharded
option, the view on two other planes (xy, yz) become corrupted viewing from
Neuroglancer. It's almost as if same images are overlayed many times with
offset slightly shifted and looks blurry. If I try to add --sharded
option to subsequent downsample commands on a deeper mip levels, the image
corruption becomes worse.
Looks like I can get away with this without using --sharded option with igneous
image downsample. However, the number of chunks becomes really large. Do
you have any other suggestions on downsample images and use shard format
volumetrically? I was thinking of using shard format for the first 2~3
levles (mip 0,1,2).
Thank you,
-m
Screenshot.2023-11-28.at.8.33.19.AM.png (view on web)
<https://github.com/seung-lab/cloud-volume/assets/47464840/10b89f22-68f0-4df5-a3b1-4bea4cc34ce8>
Screenshot.2023-11-28.at.8.33.04.AM.png (view on web)
<https://github.com/seung-lab/cloud-volume/assets/47464840/6ecb6483-2187-4b6e-814f-8b589f566fc5>
|
Hi Will, this is the command with the info file attached. Any clue as to what I might be processing incorrectly?
|
Huh. This is pretty weird. Can you give it a try without using |
Okay, running the modified command lines. Probably it'll finish tomorrow so I'll get back to you once I see the results. |
Hi @william-silversmith , the images (xz, yz planes) still appear the same as before. Could this be an issue with using "--sharded" during downsample, since I did not see this effect without it? I used igneous-pipeline (4.20.1).
|
Does the volume look like you would expect when you scroll in the z section
on the xy plane? Are you sure this isn't the data? Take a look at it with
only the highest resolution images.
…On Wed, Nov 29, 2023, 11:42 AM manoaman ***@***.***> wrote:
Hi @william-silversmith <https://github.com/william-silversmith> , images
(xz, yz planes) still appear the same as before. Could this be the issue
with using "--sharded" during downsample since I did not see this effect? I
used igneous-pipeline (4.20.1).
igneous image xfer file:///nafs/precomputed_xy/ file:///nfs/precomputed_xyz/ --mip 0 --chunk-size 128,128,128 --fill-missing --queue ./queue --sharded &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue &&
igneous image downsample file:///nfs/precomputed_xyz/ --mip 0 --num-mips 1 --sharded --volumetric --fill-missing --queue ./queue &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue &&
igneous image downsample file:///nfs/precomputed_xyz/ --mip 1 --num-mips 1 --sharded --fill-missing --queue ./queue &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue &&
igneous image downsample file:///nfs/precomputed_xyz/ --mip 2 --num-mips 1 --fill-missing --queue ./queue &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue &&
igneous image downsample file:///nfs/precomputed_xyz/ --mip 3 --num-mips 1 --fill-missing --queue ./queue &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue &&
igneous image downsample file:///nfs/precomputed_xyz/ --mip 4 --num-mips 1 --fill-missing --queue ./queue &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue &&
igneous image downsample file:///nfs/precomputed_xyz/ --mip 5 --num-mips 1 --fill-missing --queue ./queue &&
igneous -p 36 execute -x ./queue &&
ptq purge ./queue;
Screenshot.2023-11-29.at.8.28.56.AM.png (view on web)
<https://github.com/seung-lab/cloud-volume/assets/47464840/fb92b4a8-bf54-4815-b7c4-d65025e63010> Screenshot.2023-11-29.at.8.34.54.AM.png
(view on web)
<https://github.com/seung-lab/cloud-volume/assets/47464840/5b579fd0-9f4f-4756-9229-bd155d4aafe7>
info.txt
<https://github.com/seung-lab/cloud-volume/files/13503158/info.txt>
|
Yes, the xy plane looks okay to me. Looking at the xz and yz planes while gradually zooming in, I see the borders of the images get corrected at the highest resolution; I no longer see the blurred borders there. (And maybe at the second highest too, though from looking at the chunk statistics it is kind of hard to tell.) When I compared with the version of the chunks generated without the "--sharded" option in igneous image downsample, I did not see this effect. |
Let me try a different image to see if I can reproduce the same issue. I know for sure that the first two mip levels generate more than a million chunks, which causes an issue on the storage. There might be errors I overlooked in subsequent downsample steps... it is kind of difficult to debug at the moment. |
This is a really good idea. If you can reproduce something, I can definitely help debug it.
|
Okay, debugging in progress. I'm trying to simplify the test by limiting the downsample commands to the first two~three mip levels. The following command lines were used …
Next, I started over by running with … (questionable line) … and I will move on to test without … |
Hi @william-silversmith , the following commands generated a result which was satisfying in the Neuroglancer viewer. As you mentioned, I think there is an issue with using --sharded together with the volumetric (2x2x2) downsample.
One thing I found weird after running the questionable command: I see the task queue reporting more completed tasks than were inserted. This is something I have not seen before.
I know the 2-2-1 downsample works, but is there an alternative way I can accomplish a 2-2-2 downsample with the shard format for optimal viewing in Neuroglancer? |
I think this will take some debugging but I am otherwise occupied at the moment unfortunately... If possible, I would recommend sticking with 2x2x1 downsampling for shards for now. |
Hello,
I tried running CloudVolume on a TIFF stack which is 245GB in size (155MB each for about 1600+ slices). I realized the number of chunk files created in a directory has hit 1,000,001, and that seems to be an upper bound on what I can create. (Probably this value is configurable, but I'm not sure. Any thoughts?) The following is the error I see while running CloudVolume.
Should the TIFF files be downsized before running CloudVolume? If you could advise me on possible approaches, it would be nice to know.
Thank you!
-m