Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.11.2 seems to hang? #255

Closed
wmertens opened this issue Nov 25, 2020 · 11 comments
Closed

v0.11.2 seems to hang? #255

wmertens opened this issue Nov 25, 2020 · 11 comments

Comments

@wmertens
Copy link

wmertens commented Nov 25, 2020

When I run 0.11.2 with Linux 5.9.9, it seems to run ok but hangs after it completes:

$ sudo /nix/store/8k23yfyrz8ssganbxyls1ca08g9c3xnm-duperemove-0.11.2/bin/duperemove -rdhA -vvvv --hashfile=/duperemove-hashes.db /
[sudo] password for wmertens: 
Warning: The hash file format in Duperemove master branch is under development and may change.
If the changes are not backwards compatible, you will have to re-create your hash file.
Using 128K blocks
Using hash: murmur3
Gathering file list...
Mountpoint traversal disallowed: / 
Adding files from database for hashing.
Database record ("/home/wmertens/.config/google-chrome/Safe Browsing/UrlMalBin.store", 6917241.590) differs from disk ("/home/wmertens/.config/google-chrome/Safe Browsing/UrlMalBin.store", 6919921.590), update flagged.
Database record ("/home/wmertens/.config/google-chrome/Safe Browsing/UrlSoceng.store", 6917242.590) differs from disk ("/home/wmertens/.config/google-chrome/Safe Browsing/UrlSoceng.store", 6919922.590), update flagged.
Using 8 threads for file hashing phase
[01/35] (02.86%) csum: /home/wmertens/.config/google-chrome/Default/Storage/ext/fahmaaghhglfmonjliepjlchgpgfmobi/def/GPUCache/index
[03/35] (08.57%) csum: /home/wmertens/.config/google-chrome/ShaderCache/GPUCache/index
[02/35] (05.71%) csum: /home/wmertens/.config/google-chrome/Default/GPUCache/index
[04/35] (11.43%) csum: /home/wmertens/Projects/StratoSync/data/nacf/data.sqlite3
[05/35] (14.29%) csum: /home/wmertens/.config/google-chrome/Default/Local Extension Settings/hjdoplcnndgiblooccencgcggcoihigg/000003.log
[07/35] (20.00%) csum: /home/wmertens/.config/google-chrome/GrShaderCache/GPUCache/index
[06/35] (17.14%) csum: /home/wmertens/.config/google-chrome/Default/Local Extension Settings/immpkjjlgappgfkkfieppnmlhakdmaab/000003.log
[08/35] (22.86%) csum: /home/wmertens/.config/google-chrome/Default/Storage/ext/fahmaaghhglfmonjliepjlchgpgfmobi/def/File System/000/p/Paths/000003.log
[09/35] (25.71%) csum: /home/wmertens/.config/google-chrome/Default/Session Storage/MANIFEST-000001
[10/35] (28.57%) csum: /home/wmertens/.config/google-chrome/Default/Visited Links
[11/35] (31.43%) csum: /home/wmertens/.config/google-chrome/Default/Application Cache/Cache/index
[12/35] (34.29%) csum: /home/wmertens/.config/Code/User/globalStorage/state.vscdb
[13/35] (37.14%) csum: /home/wmertens/.config/google-chrome/Default/Web Data
[14/35] (40.00%) csum: /home/wmertens/.config/google-chrome/Default/Login Data
[15/35] (42.86%) csum: /home/wmertens/.config/google-chrome/Safe Browsing/UrlMalBin.store
[16/35] (45.71%) csum: /home/wmertens/.config/google-chrome/Default/Network Action Predictor
[17/35] (48.57%) csum: /home/wmertens/.config/google-chrome/Default/Shortcuts
[18/35] (51.43%) csum: /home/wmertens/.config/google-chrome/Safe Browsing/UrlSoceng.store
[19/35] (54.29%) csum: /home/wmertens/.config/google-chrome/Default/Storage/ext/fahmaaghhglfmonjliepjlchgpgfmobi/def/GPUCache/data_1
[20/35] (57.14%) csum: /home/wmertens/.config/google-chrome/Default/Favicons
[21/35] (60.00%) csum: /duperemove-hashes.db
[22/35] (62.86%) csum: /home/wmertens/.config/google-chrome/ShaderCache/GPUCache/data_1
[23/35] (65.71%) csum: /home/wmertens/.config/Slack/GPUCache/data_1
[24/35] (68.57%) csum: /home/wmertens/.config/Code/GPUCache/data_1
[25/35] (71.43%) csum: /home/wmertens/.config/google-chrome/Default/GPUCache/data_1
[26/35] (74.29%) csum: /home/wmertens/.config/google-chrome/Default/Application Cache/Index
[27/35] (77.14%) csum: /home/wmertens/.config/google-chrome/GrShaderCache/GPUCache/data_1
[28/35] (80.00%) csum: /home/wmertens/.cache/mesa_shader_cache/index
[29/35] (82.86%) csum: /home/wmertens/.config/Code/User/workspaceStorage/575e38ed227e5768c6243162dc51bc46/state.vscdb
[30/35] (85.71%) csum: /home/wmertens/.config/google-chrome/Default/Application Cache/Cache/data_1
[31/35] (88.57%) csum: /home/wmertens/.config/google-chrome/Default/Local Extension Settings/ghbmnnjooekpmoecnnnilnnbdlolhkhi/000003.log
[32/35] (91.43%) csum: /var/log/journal/5e0c4c668f1546ee8336b59a5c5e71c6/system.journal
[33/35] (94.29%) csum: /home/wmertens/.config/google-chrome/Default/Cookies
[34/35] (97.14%) csum: /var/log/journal/5e0c4c668f1546ee8336b59a5c5e71c6/user-1000.journal
[35/35] (100.00%) csum: /home/wmertens/.config/google-chrome/Default/History
^C

If I let it be, it takes 200% CPU for hours and I let it crawl up to 6.8GB RAM before I killed it.

strace -f shows this after a while, it looks like its looping, lots of similar numbers:

[pid  6693] ioctl(6, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6692] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6690] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6693] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6689] <... pread64 resumed>"\n\3\312\0\213\1\377\0\3\341\17\351\17\322\17\273\17\216\17w\17`\17I\5Q\0172\17\33\t\267"..., 4096, 18714624) = 4096
[pid  6693] pread64(6,  <unfinished ...>
[pid  6692] ioctl(5, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6689] pwrite64(3, "\n\5W\0\225\1\225\0\17H\0171\f\255\1\225\79\4Z\3\347\4\237\10\10\10\222\nF\7\254"..., 4096, 45572096 <unfinished ...>
[pid  6693] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6692] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6690] ioctl(4, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6692] pread64(5,  <unfinished ...>
[pid  6689] <... pwrite64 resumed>)     = 4096
[pid  6690] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6693] ioctl(6, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6692] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6689] pread64(3,  <unfinished ...>
[pid  6693] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6690] pread64(4,  <unfinished ...>
[pid  6693] pread64(6,  <unfinished ...>
[pid  6689] <... pread64 resumed>"\n\2F\0\231\2\1\0\16b\4\22\r\357\3C\4n\r\330\r\301\4@\r\253\16\220\f=\r8"..., 4096, 38539264) = 4096
[pid  6692] ioctl(5, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6690] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6693] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6692] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6689] pread64(3,  <unfinished ...>
[pid  6692] pread64(5,  <unfinished ...>
[pid  6689] <... pread64 resumed>"\2\2\274\0m\1A\5\0\0(Y\4\240\4\205\4j\4O\0044\4\31\3\376\3\343\3\311\3\256"..., 4096, 33972224) = 4096
[pid  6693] ioctl(6, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6692] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6690] ioctl(4, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6689] pwrite64(3, "\n\f\305\0\205\1\243\2\f\256\r\357\f\227\f\200\fi\r \fR\f$\5;\f\r\v\366\v\337"..., 4096, 11030528 <unfinished ...>
[pid  6693] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6690] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0

Anything I can do to debug?

@lorddoskias
Copy link
Collaborator

Does this happen if you run duperemove on any directory or only on this particular one? Can you provide the output of duperemove when run with the --debug option. Looking at the repeated pattern of fiemap calls it seems it's locked up in a loop that constantly calls do_fiemap. Also can you provide the output of the filefrag -v /home/wmertens/.config/google-chrome/Default/History command.

@wmertens
Copy link
Author

@lorddoskias debug shows that it's running on several files, I captured duperemove-debug.txt

I then closed Chrome and then it was only looping on:

fiemap_iter: filename "/home/wmertens/.config/google-chrome/ShaderCache/GPUCache/index" idx 0 return poff 158912512 loff 131072 len 131072 flags 0x2001
Fiemap file "/home/wmertens/.config/google-chrome/ShaderCache/GPUCache/index", start: 262144, count: 292
0 extents found
$ filefrag -v .config/google-chrome/ShaderCache/GPUCache/index 
Filesystem type is: 9123683e
File size of .config/google-chrome/ShaderCache/GPUCache/index is 262512 (65 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:   99448109..  99448109:      1:            
   1:       32..      63:      38797..     38828:     32:   99448141: last,shared
.config/google-chrome/ShaderCache/GPUCache/index: 2 extents found

@lorddoskias
Copy link
Collaborator

Is this an XFS or a btrfs filesystem?

@wmertens
Copy link
Author

wmertens commented Nov 26, 2020

Btrfs. Fsck is fine.

@lorddoskias
Copy link
Collaborator

lorddoskias commented Nov 26, 2020

This is really odd, I'm not able to reproduce this:

./duperemove --debug --hashfile=ludnica.db -r mnt-test/
Detected 24 logical and 12 physical cpus (ht is on).
Warning: The hash file format in Duperemove master branch is under development and may change.
If the changes are not backwards compatible, you will have to re-create your hash file.
Using 128K blocks
Using hash: murmur3
Gathering file list...
Using 24 threads for file hashing phase
[1/2] (50.00%) csum: /home/nborisov/projects/kernel/duperemove/mnt-test/base-file
[2/2] (100.00%) csum: /home/nborisov/projects/kernel/duperemove/mnt-test/ref-file
Got 2 extent for file
Fiemap file "/home/nborisov/projects/kernel/duperemove/mnt-test/base-file", start: 0, count: 292
2 extents found
[0] logical: 0, physical: 13697024 length: 4194304
Got 2 extent for file
Fiemap file "/home/nborisov/projects/kernel/duperemove/mnt-test/ref-file", start: 0, count: 292
[1] logical: 5242880, physical: 17891328 length: 1048576
fiemap_iter: filename "/home/nborisov/projects/kernel/duperemove/mnt-test/base-file" idx 0 return poff 13697024 loff 0 len 4194304 flags 0x2000
2 extents found
[0] logical: 0, physical: 13631488 length: 4096
[1] logical: 131072, physical: 13697024 length: 131072
fiemap_iter: filename "/home/nborisov/projects/kernel/duperemove/mnt-test/ref-file" idx 0 return poff 13631488 loff 0 len 4096 flags 0x0
fiemap_iter: filename "/home/nborisov/projects/kernel/duperemove/mnt-test/ref-file" idx 1 return poff 13697024 loff 131072 len 131072 flags 0x2001
Csumed file /home/nborisov/projects/kernel/duperemove/mnt-test/ref-file(262144) in 0.000236 (1059.322034 mb/s)
db: write 2 hashes for file /home/nborisov/projects/kernel/duperemove/mnt-test/ref-file
fiemap_iter: filename "/home/nborisov/projects/kernel/duperemove/mnt-test/base-file" idx 1 return poff 17891328 loff 5242880 len 1048576 flags 0x1
Csumed file /home/nborisov/projects/kernel/duperemove/mnt-test/base-file(6291456) in 0.003643 (1646.994236 mb/s)
db: write 2 hashes for file /home/nborisov/projects/kernel/duperemove/mnt-test/base-file
Total files:  2
Total extent hashes: 4
Loading only duplicated hashes from hashfile.
Found 0 identical extents.
Simple read and compare of file data found 0 instances of extents that might benefit from deduplication.
Duperemove memory usage statistics:
struct file_block num: 0 sizeof: 88 total: 0 max: 41 max total: 3608
struct dupe_blocks_list num: 0 sizeof: 112 total: 0 max: 1 max total: 112
struct dupe_extents num: 0 sizeof: 112 total: 0 max: 0 max total: 0
struct extent num: 0 sizeof: 144 total: 0 max: 0 max total: 0
struct filerec num: 0 sizeof: 192 total: 0 max: 2 max total: 384
struct filerec_token num: 0 sizeof: 32 total: 0 max: 0 max total: 0
struct file_hash_head num: 0 sizeof: 48 total: 0 max: 2 max total: 96
struct find_dupes_cmp num: 0 sizeof: 16 total: 0 max: 0 max total: 0
Sqlite3 used: 0  highwater: 226072

And this is also on a btrfs filesystem. Also the log you provide is indeed strange. Initially duperemove finds 2 extents (which is corroborated by your filefrag output)

[2/9] (22.22%) csum: /home/wmertens/.config/google-chrome/Default/GPUCache/index
Fiemap file "/home/wmertens/.config/google-chrome/Default/GPUCache/index", start: 0, count: 292 <--- this initializes the fiemap context
fiemap_iter: filename "/home/wmertens/.config/google-chrome/Default/GPUCache/index" idx 0 return poff 361035153408 loff 0 len 4096 flags 0x0 <--- finds first extent
fiemap_iter: filename "/home/wmertens/.config/google-chrome/Default/GPUCache/index" idx 1 return poff 158912512 loff 131072 len 131072 flags 0x2001 <--- finds second extent, this one has flags 0x2001 which translates to FIEMAP_EXTENT_SHARED/FIEMAP_EXTENT_LAST , again corroborated by your filefrag. 

Because FIEMAP_EXTENT_LAST is set then the loop in csum_by_extent should break, even the looping output suggests this flag is set:

fiemap_iter: filename "/home/wmertens/.config/google-chrome/Default/GPUCache/index" idx 0 return poff 158912512 loff 131072 len 131072 flags 0x2001
Fiemap file "/home/wmertens/.config/google-chrome/Default/GPUCache/index", start: 262144, count: 292
0 extents found

At this point I'm wondering if perhahps your database file is using the old, v2 format which hashes files in blocksizes and uses slightly different logic. Can you provide the output of the following command:

sqlite3 /duperemove-hashes.db "select * from config" . Another thing, depending on how comfortable you are with gdb you can try running duperemove with single hashing thread by adding the --io-threads=1 option and break when the hang occurs and provide a calltrace with gdb's bt command and provide the output.

@lorddoskias
Copy link
Collaborator

Ok I managed to reproduce it, working on a fix.

@lorddoskias
Copy link
Collaborator

Should be fixed by fd20113 , please retest.

@lorddoskias
Copy link
Collaborator

Any feedback on the fix, I'd like to close the issue.

@wmertens
Copy link
Author

wmertens commented Dec 8, 2020 via email

@wmertens
Copy link
Author

It works, thanks! 🎉

@lorddoskias
Copy link
Collaborator

Fixed in fd20113

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants