Description
Hi,
I am prototyping the use of ratarmount in a distributed environment: compute nodes that access shared storage (Lustre). I have huge datasets to deal with (up to 8 million files) and need to reduce the number of small files, so ratarmount seems like a perfect tool for this.
I tested it on my laptop with no problems at all. The problem appears when I try to use it in the "production" environment (the compute nodes).
The problem:
When I mount an archive (tar or tgz) for the first time, I get the following error, here on a small archive (26 GB, ~26000 files):
$ ratarmount tf_medium.tgz
Creating offset dictionary for /XXX/tf_medium.tgz ...
Position 331354366 of 26563406918 (1.25%). Remaining time: 2 min 38 s (current rate), 2 min 38 s (average rate). Spent time: 0 min 2 s
Position 675300055 of 26563406918 (2.54%). Remaining time: 2 min 30 s (current rate), 2 min 33 s (average rate). Spent time: 0 min 4 s
Traceback (most recent call last):
  File "/home/XXX/.local/bin/ratarmount", line 8, in <module>
    sys.exit(cli())
  File "/home/XXX/.local/lib/python3.6/site-packages/ratarmount.py", line 1798, in cli
    indexMinimumFileCount = args.index_minimum_file_count,
  File "/home/XXX/.local/lib/python3.6/site-packages/ratarmount.py", line 590, in __init__
    mountSources[key] = openMountSource(path, **options)
  File "/home/XXX/.local/lib/python3.6/site-packages/ratarmountcore/factory.py", line 149, in openMountSource
    raise CompressionError(f"Archive to open ({str(fileOrPath)}) has unrecognized format!")
ratarmountcore.utils.CompressionError: Archive to open (/XXX/tf_medium.tgz) has unrecognized format!
The index file is generated, but it is effectively empty: only the "files" table is defined, and it does not contain a single record.
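For reference, this is roughly how the index can be inspected. It is only a minimal sketch, assuming the index is a regular SQLite database; the helper name `summarize_index` and the path in the usage comment are placeholders, not part of ratarmount.

```python
import os
import sqlite3


def summarize_index(index_path):
    """Return {table_name: row_count} for every table in an SQLite file.

    Assumption: the index written next to the archive is a plain SQLite
    database. The function name is a hypothetical helper for this report.
    """
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.isfile(index_path):
        raise FileNotFoundError(index_path)

    connection = sqlite3.connect(index_path)
    try:
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            table: connection.execute(
                'SELECT COUNT(*) FROM "{}"'.format(table)
            ).fetchone()[0]
            for table in tables
        }
    finally:
        connection.close()


# Usage (placeholder path):
#   print(summarize_index("/XXX/tf_medium.tgz.index.sqlite"))
```

On an index like the one described above, this should report a single "files" table with a row count of 0.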
The behavior is the same with both tar and tgz files, so I guess the format is not really unrecognized.
The problem does not seem to be related to the Lustre filesystem, since the behavior is the same when the file is stored on a local filesystem.
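To double-check that the archive itself is sane independently of ratarmount, a quick test with only the Python standard library could look like the sketch below. The function name `describe_archive` is a hypothetical helper; it only looks at the gzip and tar magic bytes and asks `tarfile` whether it can open the file.

```python
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def describe_archive(path):
    """Run a few format checks that do not depend on ratarmount."""
    with open(path, "rb") as file:
        header = file.read(512)  # one tar header block

    return {
        # True for .tgz / .tar.gz files
        "gzip_magic": header[:2] == GZIP_MAGIC,
        # True for uncompressed POSIX/GNU tar files ("ustar" at offset 257)
        "ustar_magic": header[257:262] == b"ustar",
        # True if the standard library can open it (compression auto-detected)
        "tarfile_accepts": tarfile.is_tarfile(path),
    }


# Usage (placeholder path):
#   print(describe_archive("/XXX/tf_medium.tgz"))
```

If these checks pass on the compute node, the "unrecognized format" message more likely points at the environment than at the file.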
On the compute node, I am able to mount the archive if I specify "--index-minimum-file-count 50000", because no index file is created in that case.
On my laptop, I am able to mount a copy of this archive, and the index file is generated correctly.
With the index file created on my laptop, I am able to mount (and use) this archive on my compute node.
I see the same behavior on 5 different compute nodes (all of them have identical hardware).
I think that I installed all the recommended dependencies, but I may have missed something.
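To compare the laptop and a compute node, something like the following could be run on both. It is a sketch only: the module names in `OPTIONAL_MODULES` are my assumption about which optional backends are relevant, and `check_modules` is a hypothetical helper. It actually imports each module rather than just looking it up, so a broken binary extension also shows up as a failure.

```python
import importlib

# Assumption: import names of optional backends that may be relevant here.
# Adjust this list to whatever the installed version actually depends on.
OPTIONAL_MODULES = [
    "fuse",
    "indexed_gzip",
    "indexed_bzip2",
    "indexed_zstd",
    "xz",
    "rarfile",
]


def check_modules(names):
    """Try to import each module and report its version or the failure."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as error:
            # Not only ImportError: a missing shared library can raise others.
            results[name] = "FAILED: {!r}".format(error)
        else:
            results[name] = getattr(
                module, "__version__", "imported (no __version__)"
            )
    return results


# Usage:
#   for name, status in check_modules(OPTIONAL_MODULES).items():
#       print("{:15s} {}".format(name, status))
```

Any difference between the two outputs would be a first place to look.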
Here are some details on one of the compute nodes:
- OS: Red Hat Enterprise Linux Server release 7.9 (Maipo)
- Python 3.6.8
- pip3 21.3.1
- 2x Xeon Gold 6242R (20 cores / 40 threads each)
- Memory: 14x 16 GB
Any advice on where to look to diagnose the problem?