Replies: 7 comments 4 replies
-
Support on this. We started a new project last year that needs to store about 800M files with an average size of 380KB, growing by about 250M files per year, so MooseFS (and LizardFS) is not suited. BTW, we had to resort to testing some commercially supported solutions. One of them is customized from CephFS and has a feature that can be turned on to support billions of small files.
-
Interesting experience, @bash99. Thanks. I'd say that on properly specced hardware MooseFS can probably handle 800M files, but you might consider using a block device to accommodate them and reduce load on the metadata (master) server. IMHO CephFS is absolute rubbish; I regret wasting so much time on it. Ceph has (or had) no real concept of data integrity: OSDs merely detect inconsistencies (eventually) but don't even repair them automatically.
-
Dmitry,
So how do you know that the slowness is coming from the replication? One way to figure it out would be to do a test with, say, one million files stored as a single copy. After you get the total time for this test, you can repeat the same test with 2 copies, 3 copies, 4 copies, etc.
If the slowness is not related so much to the replication but instead comes from the metadata server operations, I don't think this new method is going to help much. In addition, what you are proposing doesn't fit the existing MooseFS architecture either (IMHO).
-- Marco
On 5/21/20 2:51 AM, Dmitry Smirnov wrote:
[quotes Dmitry's original post in full; the post is reproduced at the end of this thread]
-
This is not my conclusion. I've said that small-file replication is slower, and that comes from observing a cluster where small files are segregated to dedicated chunkservers. The slowness is not from the number of replicas but from the number of chunks, because HDD performance degrades significantly when a file system accommodates millions of files.
Nonsense. It can be implemented in chunkservers alone, without modifying any other components. Aggregation of smaller chunks into chunk files can be retrofitted without changing the architecture. In essence it is merely batch replication of several chunks together as one archive/meta-chunk.
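For illustration, a minimal Python sketch of that meta-chunk idea, assuming a tar-like container; all names, paths and the on-disk layout below are invented for the sketch, not MooseFS internals:

```python
# Hypothetical sketch only: paths, names and layout are invented and are
# not MooseFS internals.
import io
import tarfile
from pathlib import Path

def pack_meta_chunk(chunk_paths):
    """Bundle several small chunk files into a single in-memory tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for p in chunk_paths:
            tar.add(p, arcname=Path(p).name)
    return buf.getvalue()          # ship this as ONE replication payload

def unpack_meta_chunk(payload, dest_dir):
    """Receiving chunkserver restores the individual chunk files."""
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r") as tar:
        tar.extractall(dest_dir)

# e.g. one transfer instead of thousands:
# payload = pack_meta_chunk(sorted(Path("/var/lib/mfs/hdd0").glob("chunk_*.mfs")))
```

The point of the batching is that the receiving side pays one transfer setup cost per archive instead of one per chunk.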
-
"Hide" gobs of small objects by storing up to X chunks in a fixed-size pod, and replicate entire pods in a single stream. Such an architectural decision could greatly speed up data replication when MooseFS is used for WORM / CDN / archival types of workloads, but at the expense of extra complexity. Not sure how fragmented the pods would become on really busy systems with unpredictable workload sizes. @marcomilano - transactional overhead is a thing: streaming replication of entire pods, versus CHUNKS_READ_REP_LIMIT-throttled per-chunk copies, would make a cross-DC migration of 1.5B+ chunks much less of a PITA.
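To put rough numbers on that transactional overhead, here is a toy serial-transfer model; every figure in it is an assumption for illustration, not a measurement, and real replication overlaps transfers:

```python
# Rough model of why per-chunk replication hurts at 1.5B chunks.
# All numbers are assumptions for illustration, and the model is serial.
chunks    = 1_500_000_000        # chunk count from the comment above
avg_chunk = 380 * 1024           # bytes; small-file-sized chunks (assumed)
rtt       = 0.030                # assumed 30 ms per-transfer setup cost
bw        = 1.25e9               # assumed 10 Gbit/s link, in bytes/second

per_chunk  = rtt + avg_chunk / bw            # setup cost dominates small chunks
naive_days = chunks * per_chunk / 86_400

pod_size = 64                                # chunks packed per fixed-size pod
per_pod  = rtt + pod_size * avg_chunk / bw   # one setup cost amortized over 64
pod_days = (chunks / pod_size) * per_pod / 86_400

print(f"per-chunk: {naive_days:,.0f} days, per-pod: {pod_days:,.0f} days")
# under these assumptions: per-chunk ~526 days vs per-pod ~14 days
```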
-
You could look at storing the files in a relational database or a graph database instead, because DBMSs tend to handle small objects much better than filesystems do. You could also consider using memcached (with extstore) for storage of the files. I don't think the stated problem is particularly a flaw in MooseFS, since the problem is common to all filesystems.
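As a sketch of the DBMS route, a tiny blob store on SQLite from the Python standard library; the schema and key names are invented for illustration:

```python
# Minimal sketch of the DBMS suggestion using SQLite from the Python
# standard library; table and key names are invented for illustration.
import sqlite3

db = sqlite3.connect("smallfiles.db")
db.execute("CREATE TABLE IF NOT EXISTS blobs (path TEXT PRIMARY KEY, data BLOB NOT NULL)")

def put(path, payload):
    """Store one small file; millions of rows live in a single database file."""
    db.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)", (path, payload))
    db.commit()

def get(path):
    """Fetch a small file back by its path key."""
    row = db.execute("SELECT data FROM blobs WHERE path = ?", (path,)).fetchone()
    return row[0] if row else None

put("mail/inbox/0001.eml", b"Subject: hello\r\n\r\nbody")
assert get("mail/inbox/0001.eml").startswith(b"Subject:")
```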
-
I work around this by storing such files in a ZFS pool built from xTB loopback files backed by LizardFS 2.6.0 (probably moving back to MooseFS soon). Such a potential feature would be great, provided the metadata were also encapsulated into the SuperChunk. I've used this trick on and off for years and am doing it live right now with IMAP mailbox storage, as a way to move to a more agile data stack for this service.
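For anyone curious, a hedged sketch of that kind of setup; the mount point, pool name and sizes below are assumptions, it needs root and ZFS installed, and file-backed vdevs are documented mainly for testing:

```python
# Sketch of the loopback trick under stated assumptions: /mnt/mfs is a
# MooseFS/LizardFS mount and "smallpool" is a made-up pool name.
import subprocess

MFS_MOUNT = "/mnt/mfs"                                  # assumed mount point
BACKING = [f"{MFS_MOUNT}/zvol{i}.img" for i in range(4)]
SIZE = 1 * 1024**4                                      # 1 TiB per backing file

for path in BACKING:
    with open(path, "wb") as f:
        f.truncate(SIZE)          # sparse file: no data blocks allocated yet

# ZFS accepts absolute file paths as vdevs, so the cluster now replicates
# four big files instead of millions of tiny ones.
subprocess.run(["zpool", "create", "smallpool", *BACKING], check=True)
subprocess.run(["zfs", "create", "smallpool/mailboxes"], check=True)
```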
-
MooseFS has a young (still immature) rival, SeaweedFS (https://github.com/chrislusf/seaweedfs), which has some good design ideas.
One particularly good idea is to store multiple small files in "volumes" and replicate per volume.
It would be great to implement such a design in MooseFS by introducing a new type of chunk file and storing small files within such chunks.
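As a sketch of what such a chunk file could look like, here is a toy append-only volume with an in-memory offset index; the format is invented for illustration and is neither the SeaweedFS layout nor a concrete MooseFS proposal:

```python
# Toy append-only "volume": many small files in one big file plus an
# in-memory offset index, so disk I/O and replication see one large file.
import struct

class Volume:
    HEADER = struct.Struct("<I")          # 4-byte payload length prefix

    def __init__(self, path):
        self.f = open(path, "a+b")        # one large file per volume
        self.index = {}                   # file_id -> (offset, size)

    def put(self, file_id, payload):
        self.f.seek(0, 2)                 # always append at the end
        start = self.f.tell()
        self.f.write(self.HEADER.pack(len(payload)) + payload)
        self.index[file_id] = (start + self.HEADER.size, len(payload))

    def get(self, file_id):
        offset, size = self.index[file_id]
        self.f.seek(offset)               # one seek + one read per small file
        return self.f.read(size)

vol = Volume("volume_0001.dat")
vol.put("photo-1", b"\x89PNG...tiny image...")
assert vol.get("photo-1").startswith(b"\x89PNG")
```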
MooseFS performance with small files is far from optimal. I've been storing 30 million files (average size ~26 KB) in my cluster and found numerous problems, such as replication being a few orders of magnitude slower, etc. A large number of chunks/files kills the performance of rotational HDDs, so I moved the small files to SSDs to see how much it would help, but I was ultimately still disappointed with the results.
I decided to compare MooseFS performance to SeaweedFS with regard to small files.
I copied all 30 million small files to SeaweedFS and then measured the time taken by `rsync` to copy the small files to an empty 100GB SSD. It took 874 minutes (14h 34m) to fill the SSD with small files from MooseFS, while it took 356 minutes (5h 56m) to fill it from SeaweedFS. SeaweedFS accomplished the task roughly 2.5 times faster than MooseFS, but there is a catch: I had placed the small files on SSD-backed chunkservers, while SeaweedFS had the data on rotational HDDs. That's serious evidence of an efficient design when one can demonstrate 2+ times better performance on HDD versus SSD, isn't it?