on 8 cores CPU fclones seems not to use all cores #158

Closed
kapitainsky opened this issue Sep 7, 2022 · 26 comments · Fixed by #162
Labels
bug Something isn't working performance

Comments

@kapitainsky
Contributor

kapitainsky commented Sep 7, 2022

When trying to find out why yadf is faster than fclones I have noticed much better CPU utilisation in yadf.

macOS 12.5.1
CPU: 2.3 GHz 8-Core Intel Core i9
SSD - internal NVME

and, not surprisingly, yadf usually finishes a duplicate search 2x faster than fclones

Why - no idea. Maybe it scales better with number of processes? You used 4 core CPU, yadf tests used 6 core and I used 8 core. Also I use macOS when two other tests were run in Ubuntu. We all most likely used different SSD. Too many factors to say for sure why:)

Also I tested the latest versions. In your tests yadf was v0.13 when now it is v1.0

Actually I have an idea. I looked at htop because yadf is faster in any test I tried... yadf is using all my 8 cores vs fclones only using 4. Which explains why it is about 2x faster

With fclones I can see that all 8 cores are used during the initial stages - but then only 4 when content hashes are calculated.

Originally posted by @kapitainsky in #153 (comment)

@kapitainsky kapitainsky changed the title from "> Why - no idea. Maybe it scales better with number of processes? You used 4 core CPU, yadf tests used 6 core and I used 8 core. Also I use macOS when two other tests were run in Ubuntu. We all most likely used different SSD. Too many factors to say for sure why:)" to "on 8 cores CPU fclones seems not to use all cores" Sep 7, 2022
@kapitainsky
Contributor Author

kapitainsky commented Sep 7, 2022

Not very scientific but gives good feel what is going on. The same dataset:

yadf v1.0.0 - command to run yadf

[animated GIF capture: yadf]

fclones v0.27.2 - command to run fclones group .

[animated GIF capture: fclones]

These are slightly rough animated GIF captures. To restart both in sync, refresh the browser window - #158 (comment) - they do not run for exactly the same time and slowly drift out of sync.

It is clear that both programs use all cores initially for grouping (lengths for fclones (yadf does not do it), then prefix and suffix for both, followed by crunching content hashes). What is also interesting is that even in the initial phase yadf uses all available CPU power - fclones takes it easy

@kapitainsky
Contributor Author

I think I should stop using it - hahah - as I only create issues:)

@kapitainsky
Contributor Author

more multi core is becoming norm - in Apple world basic MacBook Air has 10 cores and Mac Studio as of today 20 - it will obviously only grow in the future so it is good idea to use what is available well - does not matter if 2 or 100

@pkolaczk pkolaczk added performance bug Something isn't working labels Sep 7, 2022
@pkolaczk
Owner

pkolaczk commented Sep 7, 2022

I think I should stop using it - hahah - as I only create issues:)

It's been great! Keep going!

@kapitainsky
Contributor Author

I think I should stop using it - hahah - as I only create issues:)

Be careful what you wish for. But I need a good deduplicator so I will be pestering you for the time being.

@kapitainsky
Contributor Author

fclones has the biggest potential to be close to perfect

@kapitainsky
Contributor Author

kapitainsky commented Sep 7, 2022

Before, I thought that jdupes was the best (it is still great) - fantastic attention to detail, and any person trying to learn C should study its code - but they ignored the macOS world, think that SSDs are only used in some high-end hardware, and believe (there is no other word) that byte-by-byte file comparison is the best - so now the bet is on fclones:) It has a good base design - configurable parallelism with clever heuristics - which makes the defaults easier. And attention to detail similar to jdupes. The rest is about filling gaps.

@kapitainsky
Contributor Author

kapitainsky commented Sep 8, 2022

I think good news. I have tried with -t 8 option and then it looks much better.

Overall CPU utilisation is lower than for yadf but now fclones finishes in the same time as yadf (75s in my test). With defaults it takes 150s for fclones to finish - 2 times slower on the spot.

So problem is with threads heuristics.

@kapitainsky
Contributor Author

kapitainsky commented Sep 8, 2022

Implementing #159 would help to diagnose it + would let users make a more informed decision before going into tuning.

IMHO it should show available cores and threads values used - either based on decision made by fclones for detected device(s) or user provided options
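A minimal sketch of the kind of diagnostic line suggested here, using only the Rust standard library. `ThreadSettings`, its field names, `available_cores` and `describe` are hypothetical names made up for illustration, not fclones' actual types or output format; the meaning assigned to the two thread counts is also an assumption.

```rust
use std::thread;

/// Hypothetical per-device thread settings (not fclones' real type).
/// Field names are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ThreadSettings {
    random_access: usize,
    sequential: usize,
}

/// Number of logical cores reported by the OS, falling back to 1
/// if the value cannot be determined.
fn available_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Builds one diagnostic line: detected cores, the device the settings
/// apply to, the thread counts in use, and where they came from
/// (e.g. a built-in default or a user-provided option).
fn describe(device: &str, settings: ThreadSettings, source: &str) -> String {
    format!(
        "cores available: {}, device: {}, threads: ({}, {}) [{}]",
        available_cores(),
        device,
        settings.random_access,
        settings.sequential,
        source
    )
}

fn main() {
    // Illustrative values only.
    let settings = ThreadSettings { random_access: 4, sequential: 1 };
    println!("{}", describe("unknown", settings, "default"));
}
```

Printing the source of the values next to the numbers would make it obvious when a fallback default was applied instead of a device-specific heuristic.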

@kapitainsky
Contributor Author

kapitainsky commented Sep 8, 2022

I think I have an answer.

  1. The default results above use the unknown device thread defaults (4,1) - which is consistent with a quick test I did timing fclones group . -t 1, fclones group . -t 4 and fclones group . -t 8

  2. Problem is probably here:

let regex = regex::Regex::new(r"^/dev/([fhs]d[a-z]|nvme[0-9]+).*").unwrap();

as on macOS devices are named like this: /dev/diskXsY e.g., /dev/disk1s5

EDIT: ok - this code is only for linux but it also should be used for macOS - as going by names is wrong in case of this OS. it is "mentally" much closer to linux than to windows. but has its quirks.
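To illustrate why that pattern cannot match macOS names, here is a small sketch. It does not use the `regex` crate; `linux_base_device` is a hypothetical, hand-written stand-in that is meant to return what capture group 1 of the pattern above would hold, so that the example needs only the standard library.

```rust
/// Hand-written stand-in for `^/dev/([fhs]d[a-z]|nvme[0-9]+).*`.
/// Returns the text that capture group 1 would contain, or None
/// when the pattern would not match at all.
fn linux_base_device(partition: &str) -> Option<&str> {
    let name = partition.strip_prefix("/dev/")?;
    let bytes = name.as_bytes();

    // Alternative 1: [fhs]d[a-z], e.g. "sda" in "sda1".
    if bytes.len() >= 3
        && matches!(bytes[0], b'f' | b'h' | b's')
        && bytes[1] == b'd'
        && bytes[2].is_ascii_lowercase()
    {
        return Some(&name[..3]);
    }

    // Alternative 2: nvme[0-9]+, e.g. "nvme0" in "nvme0n1p2".
    let rest = name.strip_prefix("nvme")?;
    let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits > 0 {
        Some(&name[..4 + digits])
    } else {
        None
    }
}

fn main() {
    for dev in ["/dev/sda1", "/dev/nvme0n1p2", "/dev/disk1s5"] {
        println!("{dev} -> {:?}", linux_base_device(dev));
    }
}
```

Linux-style partition names reduce to a base device, while a macOS-style name such as /dev/disk1s5 yields no match.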

@pkolaczk
Owner

pkolaczk commented Sep 8, 2022

That regex is a hack to get the device name from the partition name and is called on Linux only. On other systems it uses the raw device name it obtained from enumerating the device list. But maybe some of that device listing logic is wrong or some data is missing there, and then it can't find the device for each file.

@kapitainsky
Contributor Author

Yes, and it does not help that I do not know Rust:) but I think the issue is exactly with the device listing logic. I did some crude debugging and I can see that on macOS the first device is always unknown - then it is correct. Do you use the first device from the list as "/"?

@pkolaczk
Owner

pkolaczk commented Sep 8, 2022

Yes, the first device is the default one that is always unknown - this is just a fallback if no devices are retrieved from the system. But then there is the mount_points vector that should contain the real mount points with a device index, and each should point to some real device.
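The lookup described here might look roughly like the sketch below. It is an illustration under assumptions, not the fclones source: `Devices`, `DiskKind`, `add` and `kind_for` are hypothetical names, and the longest-prefix rule is one plausible way to pick a mount point for a path.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq)]
enum DiskKind {
    Unknown,
    Ssd,
    Hdd,
}

/// Hypothetical device table: index 0 is always the fallback device.
struct Devices {
    kinds: Vec<DiskKind>,
    /// Pairs of (mount point, index into `kinds`).
    mount_points: Vec<(PathBuf, usize)>,
}

impl Devices {
    fn new() -> Self {
        Devices { kinds: vec![DiskKind::Unknown], mount_points: Vec::new() }
    }

    /// Registers a device and the mount point that refers to it.
    fn add(&mut self, mount_point: &str, kind: DiskKind) {
        self.kinds.push(kind);
        let index = self.kinds.len() - 1;
        self.mount_points.push((PathBuf::from(mount_point), index));
    }

    /// Picks the device whose mount point is the longest prefix of `path`.
    /// Falls back to index 0 (unknown) when no mount point matches.
    fn kind_for(&self, path: &Path) -> DiskKind {
        let index = self
            .mount_points
            .iter()
            .filter(|entry| path.starts_with(&entry.0))
            .max_by_key(|entry| entry.0.components().count())
            .map(|entry| entry.1)
            .unwrap_or(0);
        self.kinds[index]
    }
}

fn main() {
    let mut devices = Devices::new();
    devices.add("/", DiskKind::Ssd);
    devices.add("/mnt/backup", DiskKind::Hdd);
    println!("{:?}", devices.kind_for(Path::new("/home/user/file")));
    println!("{:?}", devices.kind_for(Path::new("/mnt/backup/file")));
}
```

With this scheme, any path that is not under a listed mount point silently lands on the unknown device and inherits its conservative defaults.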

@kapitainsky
Contributor Author

ok... so this might be an issue because on macOS mount points are not always what they seem to be - APFS uses the concept of firmlinks, where one "partition" magically becomes one with another. It only applies to the operating system disk - the internal one. External disks work pretty much the same as in Linux.

@kapitainsky
Contributor Author

I think I could figure out what is going on if fclones had some debug output

@kapitainsky
Contributor Author

e.g., which device it assumed the folder passed to fclones group is on

@pkolaczk
Owner

pkolaczk commented Sep 8, 2022

Yeah, I'll add some

@kapitainsky
Contributor Author

If you could add the list of all enumerated devices, with as much of the info fclones is using as possible, that would help. I looked at the sysinfo crate and for macOS they do a bit of hacking - which might lead to fclones getting funny info.

@kapitainsky
Contributor Author

kapitainsky commented Sep 9, 2022

APFS macOS disk has multiple volumes/"partitions"

folder /Users (and any other user writable folders) data is stored in "partition" called "Macintosh HD - Data" mounted to /System/Volumes/Data

this is mount point and partition name fclones discovers via sysinfo and correctly marks as SSD

but from user perspective (and fclones) folder Users has path /Users/ - so any data there for fclones is on unknown device

this is macOS file system way - multiple "partitions" are fused together via firmlinks into one filesystem

@kapitainsky
Contributor Author

kapitainsky commented Sep 9, 2022

later when I have a moment I will try to think what would be the safest way to tackle it. Especially that fclones can be also used on older macs where things are different.

@kapitainsky
Contributor Author

I think I got to the bottom of it. Testing my first rust code now:) will do PR tomorrow.

@kapitainsky
Contributor Author

kapitainsky commented Sep 9, 2022

Here are the results of my debugging:

DiskDevices structure on macOS contains the following:

device_name = "VM"
file_system = "apfs"
mount_point = "/System/Volumes/VM"
type = SSD

device_name = "Preboot"
file_system = "apfs"
mount_point = "/System/Volumes/Preboot"
type = SSD

device_name = "Update"
file_system = "apfs"
mount_point = "/System/Volumes/Update"
type = SSD

device_name = "Macintosh HD - Data"
file_system = "apfs"
mount_point = "/System/Volumes/Data"
type = SSD

device_name = "extSamsung-SSD"
file_system = "apfs"
mount_point = "/Volumes/extSamsung-SSD"
type = SSD

device_name = "extSamsung-SSD-Temp"
file_system = "apfs"
mount_point = "/Volumes/extSamsung-SSD-Temp"
type = SSD

device_name = "Lacie01-Data"
file_system = "apfs"
mount_point = "/Volumes/Lacie01-Data"
type = HDD

device_name = "Lacie01-TM01"
file_system = "apfs"
mount_point = "/Volumes/Lacie01-TM01"
type = HDD

device_name = "Untitled 1"
file_system = "exfat"
mount_point = "/Volumes/USB stick"
type = HDD

in this case mix of system and external devices.

We do not have to worry about VM, Preboot nor Update - they are only used by system. Neither external devices - I only tried to see if their type is recognised correctly.

The problem is with:

device_name = "Macintosh HD - Data"
file_system = "apfs"
mount_point = "/System/Volumes/Data"
type = SSD

This is the device where all the users' data is... however users never see nor use this path, and neither does fclones. APFS uses firmlinks (in Apple's own words: "Bi-directional wormhole in path traversal. Firmlinks are used on the system volume to point to the user data on the data volume."). From the user's perspective, the data device folders are part of the root filesystem. - https://www.swiftforensics.com/2019/10/macos-1015-volumes-firmlink-magic.html

Very APFS specific - but it is what makes fclones unable to recognise correct device. As e.g. when trying to deduplicate folder /Users/kptsky/FilesForDedup there is no such device path in DiskDevices and as a result 'unknown' device is used.

Solution? For APFS we have to help and point 'Macintosh HD - Data' to root folder.

sending PR with my proposed solution.
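The remapping idea can be sketched as follows. This is an illustration of the approach described in this comment, not the code from the PR: `remap_mount_point` and `device_for` are hypothetical helpers, the file name `a.bin` is made up, and a real implementation would presumably apply the remap on macOS only.

```rust
use std::path::{Path, PathBuf};

/// Mount point of the APFS data volume, as listed in the debug output.
const DATA_VOLUME: &str = "/System/Volumes/Data";

/// Presents the data volume as the root filesystem; every other
/// mount point is returned unchanged.
fn remap_mount_point(mount_point: &Path) -> PathBuf {
    if mount_point == Path::new(DATA_VOLUME) {
        PathBuf::from("/")
    } else {
        mount_point.to_path_buf()
    }
}

/// Name of the device whose mount point is the longest prefix of
/// `path`, or "unknown" when no mount point matches.
fn device_for<'a>(mounts: &'a [(PathBuf, &'a str)], path: &Path) -> &'a str {
    mounts
        .iter()
        .filter(|m| path.starts_with(&m.0))
        .max_by_key(|m| m.0.components().count())
        .map(|m| m.1)
        .unwrap_or("unknown")
}

fn main() {
    let raw = vec![
        (PathBuf::from("/System/Volumes/Data"), "Macintosh HD - Data"),
        (PathBuf::from("/Volumes/Lacie01-Data"), "Lacie01-Data"),
    ];
    let remapped: Vec<(PathBuf, &str)> = raw
        .iter()
        .map(|(mount_point, name)| (remap_mount_point(mount_point), *name))
        .collect();

    let file = Path::new("/Users/kptsky/FilesForDedup/a.bin");
    println!("before remap: {}", device_for(&raw, file));
    println!("after remap:  {}", device_for(&remapped, file));
}
```

Because the longest matching prefix wins, external volumes under /Volumes still resolve to their own devices after the data volume has been remapped to the root.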

pkolaczk pushed a commit that referenced this issue Sep 9, 2022
macOS - '/System/Volumes/Data' DiskDevice path remapping

Fixes #158

On macOS APFS disk all users' data is mounted in '/System/Volumes/Data' but fused transparently using firmlinks and presented as part of the root filesystem.
It requires remapping the Data volume path for this DiskDevice to '/' in order for fclones to correctly recognise the device the deduplicated files are on.

Ref: 
https://www.swiftforensics.com/2019/10/macos-1015-volumes-firmlink-magic.html
https://eclecticlight.co/2020/01/23/catalina-boot-volumes/
@kapitainsky
Contributor Author

Thank you for merging it. Now it flies on macOS with defaults.

@pkolaczk
Owner

Thank you for all the hard work on this. Awesome contribution!

@pkolaczk
Owner

After thinking about this issue more, I came to the conclusion that the approach taken by fclones of identifying disks by file paths is fundamentally broken. I believe a much better way would be to use the device identifiers we already have in the FileId / FileInfo and use them to find the actual device, instead of messing with mount points. That information should be way more reliable as it comes from the OS directly. Unfortunately I haven't found a good, portable way to map those ids to the device names returned by sysinfo. I opened a feature request in sysinfo, let's see what they respond.
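For reference, a minimal sketch of how a device identifier can be read from file metadata on Unix-like systems with the standard library. Whether fclones' FileId / FileInfo obtains its identifier this way is an assumption, and `device_id` is a hypothetical helper; the hard part named above, mapping the id to a device name or type, is not shown.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Device ID (`st_dev`) of the filesystem that holds `path`,
/// as reported by the OS. Unix-only.
fn device_id(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.dev())
}

fn main() -> io::Result<()> {
    let id = device_id(Path::new("."))?;
    println!("current directory is on device id {id}");
    Ok(())
}
```

Two paths with the same device id are on the same mounted filesystem regardless of how the path was reached, which is why this would sidestep the firmlink problem.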

@kapitainsky
Contributor Author

As it is now it is good enough IMHO. I have looked at sysinfo source code for macOS part and it is full of hacks as well. Cross platform solutions are never easy:)
