[Help] Recovering data based on list of flawed sectors. #55

Closed
thican opened this issue Mar 9, 2017 · 13 comments

thican commented Mar 9, 2017

Hello,

Note: I was trying to find an email address for this kind of support, but as the Readme file requests it, I am opening a new issue.

TL;DR: I am asking for help finding a tool that can tell which file (if any) a given sector/block/whatever belongs to.

A couple of years ago, one of my HDDs started failing, so I unplugged it, and a few days later I recovered its data using GNU ddrescue. The HDD held a single partition with an exFAT FS covering almost the whole disk (inside an MBR partition table, IIRC).
As a result, I have two files: the first is a raw image, an (almost) perfect copy of the partition, which I can mount using loop devices (working nicely); the second is ddrescue's logfile, which contains the list of sectors that could not be read from the HDD.

Based on this logfile, I have about 4 MiB of bad sectors on a 1.4 TiB FS, but the problem is that I don't know whether those bad sectors held live data or belonged to deleted files.

I guess I am looking for a tool which can "reverse" the lookup and tell us “this sector belongs to file "/path/to/file"” or “no content on this sector”. My final objective is to get the list of broken files and then recover the remaining data from this raw image to another, safe device.

I guess I could try to implement this tool myself, but I don't know the "protocol" behind exFAT, hence my request for support, please.

Hope to hear from you soon.

Here is a short excerpt from the logfile:

# Rescue Logfile. Created by GNU ddrescue version 1.19
# Command line: ddrescue --input-position=0 --direct --preallocate --retry-passes=3 --verbose /dev/sdc1 wd_1500G.dd wd_1500G.logfile
# Start time:   2015-07-28 02:31:56
# Current time: 2015-07-28 02:32:02
# Retrying bad sectors... Retry 1 (forwards)
# current_pos  current_status
0x93B04F1400     -
#      pos        size  status
0x00000000  0x93B04F1000  +
0x93B04F1000  0x00001000  -
0x93B04F2000  0x00006000  +
0x93B04F8000  0x00001000  -
../..
0x93B055A000  0x00001000  -
0x93B055B000  0x59C2E3C000  +
0xED73397000  0x00001000  -
../..
0xED73DC1000  0x00001000  +
0xED73DC2000  0x00001000  -
0xED73DC3000  0x6FDCF3D000  +
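
The data lines of the excerpt above (position, size, status, with sizes in hexadecimal bytes) can be tallied per status to check figures such as the "4 MiB of bad sectors". A minimal Python sketch, assuming the first non-comment line is the current-position line; `summarize_mapfile` and `SAMPLE` are hypothetical names, and the sample holds only the first few lines of the excerpt:

```python
from collections import defaultdict

def summarize_mapfile(text):
    """Return {status: total_bytes} for the data blocks of a ddrescue logfile."""
    totals = defaultdict(int)
    seen_position_line = False
    for line in text.splitlines():
        fields = line.split()
        # Skip blanks, comments, and anything that is not a hex position
        # (such as the "../.." elision marker used in the excerpt).
        if not fields or not fields[0].startswith("0x"):
            continue
        if not seen_position_line:
            # First non-comment line: current_pos and current_status, not a block.
            seen_position_line = True
            continue
        if len(fields) != 3:
            continue
        _pos, size, status = fields
        totals[status] += int(size, 16)
    return dict(totals)

SAMPLE = """\
# current_pos  current_status
0x93B04F1400     -
#      pos        size  status
0x00000000  0x93B04F1000  +
0x93B04F1000  0x00001000  -
0x93B04F2000  0x00006000  +
0x93B04F8000  0x00001000  -
"""

totals = summarize_mapfile(SAMPLE)
```

On the full logfile, `totals["-"]` would be the number of bytes ddrescue marked as bad sectors.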
Owner

relan commented Mar 10, 2017

Hello,

Let's try! If fuse-exfat manages to mount the image, chances are good.

First of all, I'd ensure that FS metadata is OK. Does exfatfsck detect any errors on the rescued image?

Author

thican commented Mar 10, 2017

Hello,

First of all, thanks for the answer.

About exfatfsck, it indeed found 2 errors, on the same file (named "something.bin"):

# exfatfsck /mnt/temp/wd_1500G.dd
exfatfsck 1.2.4
Checking file system on /mnt/temp/wd_1500G.dd.
File system version           1.0
Sector size                 512 bytes
Cluster size                512 KB
Volume size                1397 GB
Used space                  988 GB
Available space             409 GB
ERROR: cluster 0x2baa02 of file 'something.bin' is not allocated.
ERROR: cluster 0x2baa03 of file 'something.bin' is not allocated.
Totally 6633 directories and 34131 files.
File system checking finished. ERRORS FOUND: 2.

Note: I see the output only gives the file's name, without its path inside the FS; what if I have multiple files with the same name? The output lacks context, I guess.

As you can see, 409 GB are free, so the other bad sectors might lie in the free space.

Thanks for the support.

Owner

relan commented Mar 11, 2017

About exfatfsck, it indeed found 2 errors, on the same file

That's strange. This error means the cluster bitmap is corrupt, which should not be the case because the missing parts start at 0x93B04F1000. The cluster bitmap is located at the beginning of the FS.
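
To compare the cluster numbers printed by exfatfsck with the byte positions in the ddrescue log, the two can be converted into each other. In exFAT, data clusters are numbered from 2 and live in the cluster heap, whose offset is recorded in the boot sector. The Python sketch below uses the 512 KB cluster size reported above; `HEAP_OFFSET` is a made-up placeholder, not a value read from this image, and both function names are hypothetical:

```python
CLUSTER_SIZE = 512 * 1024  # from the exfatfsck output above
HEAP_OFFSET = 0x400000     # hypothetical; the real value comes from the boot sector
FIRST_CLUSTER = 2          # exFAT numbers data clusters starting at 2

def cluster_to_offset(cluster, heap_offset=HEAP_OFFSET, cluster_size=CLUSTER_SIZE):
    """Byte offset (within the partition) of the first byte of a cluster."""
    if cluster < FIRST_CLUSTER:
        raise ValueError("data clusters start at 2")
    return heap_offset + (cluster - FIRST_CLUSTER) * cluster_size

def offset_to_cluster(offset, heap_offset=HEAP_OFFSET, cluster_size=CLUSTER_SIZE):
    """Cluster containing a byte offset, or None if it lies before the cluster heap."""
    if offset < heap_offset:
        return None  # boot region or FAT, not a data cluster
    return FIRST_CLUSTER + (offset - heap_offset) // cluster_size
```

With the real heap offset, `offset_to_cluster(0x93B04F1000)` would give the cluster that contains the first bad block in the log.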

I see the output only give the file's name, without its path inside the FS; what if I have multiple files with the same name? The output lacks its context, I guess.

You can enable debug output to make exfatfsck print full paths of all files it checks:

diff --git a/fsck/main.c b/fsck/main.c
index 19628e8..f4bbc94 100644
--- a/fsck/main.c
+++ b/fsck/main.c
@@ -26,7 +26,7 @@
 #include <inttypes.h>
 #include <unistd.h>
 
-#define exfat_debug(format, ...)
+// #define exfat_debug(format, ...)
 
 uint64_t files_count, directories_count;
 

I'd do the following:

  1. Mount the image (in read-only mode of course).
  2. Run find -type f to list all file paths.
  3. For each of those paths do dump/dumpexfat -f FILE IMAGE. It'll print fragments occupied by a FILE (see manpage). Note that you need at least exfat-utils 1.2.5 for this.
  4. Compare each printed range with ddrescue log.

This will be slow, but unfortunately exFAT does not have a reverse mapping for clusters, i.e. you cannot easily tell which file uses a particular cluster.
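
Steps 2 and 3 above could be scripted roughly as follows. This is a sketch, not the tool's documented interface beyond the `-f` option mentioned above: it assumes that FILE is the path inside the filesystem and that dumpexfat prints one fragment per line as "offset size" in decimal bytes. The function names are hypothetical.

```python
import os
import subprocess

def list_files(mount_point):
    """Yield paths of regular files relative to the FS root (like `find -type f`)."""
    for root, _dirs, names in os.walk(mount_point):
        for name in names:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                rel = os.path.relpath(full, mount_point)
                yield "/" + rel.replace(os.sep, "/")

def parse_fragments(output):
    """Parse lines of 'offset size' (decimal bytes) into (offset, size) tuples."""
    fragments = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            fragments.append((int(fields[0]), int(fields[1])))
    return fragments

def file_fragments(fs_path, image, dumpexfat="dumpexfat"):
    """Run `dumpexfat -f FILE IMAGE` and return the parsed fragment list."""
    result = subprocess.run(
        [dumpexfat, "-f", fs_path, image],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return parse_fragments(result.stdout)
```

Spawning one process per file is what makes this approach slow on an image with tens of thousands of files.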

@relan relan added the question label Mar 14, 2017
Owner

relan commented Jun 5, 2017

I hope you're fine and safe. :)

@relan relan closed this as completed Jun 5, 2017
Author

thican commented Nov 9, 2017

Hello!

Sorry, it has been a while without news, but I finally got my scripts working.
By the way, I am fine and safe, thank you, I hope you are as well.

So, I made two Python files, which can be reused efficiently (I hope it will help someone else too, so this message is kind of a tutorial) (note: GitHub doesn't want Python files, so I "gzip-ed" them):

  1. The first one, named exfat_dump_script.py.gz, is a Python script which requires read-only access to the content of the image (tested with Python 3.4 and Python 3.6; it should work with Python 3.5 too).
    It only requires the path of the (loop) device to which the image has been attached, in order to look up its mount point once mounted (see findmnt(8)). The dumpexfat utility requires access to the content of the medium, so I used the following mount command: mount -o loop,ro /path/to/image.dd /media/mount_point/
    Use losetup -l to find out which loop device is used by your image (for me, it was /dev/loop0).
    The Python script then writes to stdout (don't forget to save it!) the list of files with the position and size in bytes of each file, which is the dumpexfat output. Note 1: if a file is not contiguous, it will have multiple lines, which is not a problem. Note 2: I didn't handle 100% of the possible filenames (one containing a newline, for example), but I guess exFAT is more restrictive here than Linux filesystems, so it's okay (I guess).
    Now, with this output in hand, we no longer need the image (except for the final recovery) and can use the second Python script.

  2. The second Python file is named seek_matches.py.gz, and it needs the content of the ddrescue logfile.
    I won't explain everything, but basically it starts by reading the ddrescue logfile to get the list of flawed sectors based on their status (3rd column, "-" value), and then it compares the range of each file (or piece of a file) with each flawed-sector entry (using the function "_isCrossing").
    If a crossing is detected, the path of the file is appended to the JSON output, which is displayed at the end of the execution.
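
The crossing test described in item 2 boils down to an overlap check between byte ranges. The Python sketch below is not the attached `_isCrossing` (whose code is not reproduced in this thread) but one possible way to do it; it assumes the bad ranges do not overlap each other, as in a ddrescue logfile, and all names are hypothetical:

```python
from bisect import bisect_left

def ranges_cross(start_a, size_a, start_b, size_b):
    """True if the half-open ranges [start, start + size) share at least one byte."""
    return start_a < start_b + size_b and start_b < start_a + size_a

def damaged_files(fragments_by_path, bad_ranges):
    """Return the sorted paths whose fragments cross any bad range.

    fragments_by_path: {path: [(offset, size), ...]}
    bad_ranges: [(offset, size), ...], assumed not to overlap each other
    """
    bad = sorted(r for r in bad_ranges if r[1] > 0)
    starts = [start for start, _size in bad]
    hits = set()
    for path, fragments in fragments_by_path.items():
        for offset, size in fragments:
            if size <= 0:
                continue
            # The last bad range starting before the fragment's end is the only
            # candidate: earlier ones end before it starts, later ones start too late.
            i = bisect_left(starts, offset + size)
            if i and ranges_cross(offset, size, *bad[i - 1]):
                hits.add(path)
                break
    return sorted(hits)
```

Sorting the bad ranges once and using a binary search avoids comparing every fragment with every log entry.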

And that's it.

But somehow, I am a bit confused, because the last script didn't find any match on my image with the list of files and the list of flawed sectors. I made two files for some tests, and those work as expected.
So, before I copy possibly corrupted files, I would appreciate a review of those scripts; and if you want, you can ship them as recovery tools.

Sorry for providing some not-so-user-friendly pieces of software: you have to edit them to change some paths (at the end of each file, before the call to the main function).

I hope to hear from you soon, to be sure I didn't miss anything (I was aware that the values are in bytes).

Thanks again, and have a nice day.

Author

thican commented Nov 9, 2017

Oh, I forgot, here are the test files for the 2nd script:
test_dumpexfat.txt
This one has been renamed with the .txt extension, for github…
test_logfile.txt


dumblob commented Aug 28, 2019

@relan would you consider adding the linked scripts to contrib/ in this repository? Or even reimplementing them as part of exfatfsck?

Owner

relan commented Aug 29, 2019

would you consider adding the linked scripts to contrib/ in this repository?

No, the author said those scripts didn't work as expected.

Or even reimplementing them as part of exfatfsck?

Recovery utility would be nice to have, but realistically I'll hardly have any time to implement and (more importantly) test it.


dumblob commented Aug 29, 2019

Recovery utility would be nice to have, but realistically I'll hardly have any time to implement and (more importantly) test it.

Understood. Either way, if you have at least some ideas about what such a "recovery" utility should do and how it should work, just describe them in your own words in a TODO section in Readme.md and put a "help needed" tag on it. The reason is that people who understand exFAT to such a degree are an extremely scarce species 😉, while there are thousands of programmers able to write something like that from technical advice.

Author

thican commented Aug 29, 2019

Hello everyone,

Glad those scripts could help someone else.

would you consider adding the linked scripts to contrib/ in this repository?

No, the author said those scripts didn't work as expected.

Uh, no, I said “I [was] a bit confused, because the last script didn't find any match on my image with the list of files and the list of flawed sectors”; which in fact means “no files were corrupted”, which is good news.
In fact, it might not be a surprise: on a partition of about 1.5 TB, "only" 900 GB were used, and only 4M of non-contiguous sectors/blocks were unreadable, spread over the whole filesystem rather than just the 900 GB used part.

However, I might have to rework some parts of those Python scripts (IIRC, I think I made some mistakes), or even update them for Python 3.7 (the subprocess module got another update to its run method).
But it actually was working; I don't know why you are saying it didn't. :-)

Best regards,

Owner

relan commented Aug 30, 2019

I said “I [was] a bit confused, because the last script didn't find any match on my image with the list of files and the list of flawed sectors”; which in fact means “no files were corrupted”, which is good news.

Oh, I misunderstood you then.


hamlet commented Mar 6, 2021

Hello,
just a quick note to tell you that @thican's scripts did work for me. They found a few hundred impacted files, and a visual check of jpg files did show some holes in them. Pdf files were missing pages.
I had to slightly adapt the script because I ddrescued a whole disk, so I used kpartx to expose the single partition as a loop device, and parted -l to determine the offset to apply between dumpexfat and ddrescue results, which I hard-coded in the second script.
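
The offset adjustment described above amounts to shifting every partition-relative fragment by the partition's start on the disk. A small Python sketch of that idea, assuming 512-byte logical sectors and a start sector read from the partitioning tool (the value 2048 in the usage line is only an illustration, and the names are hypothetical):

```python
SECTOR_SIZE = 512  # assumption: logical sector size reported by the partitioning tool

def partition_start_bytes(start_sector, sector_size=SECTOR_SIZE):
    """Byte offset of the partition from the beginning of the whole-disk image."""
    return start_sector * sector_size

def to_disk_offsets(fragments, partition_start):
    """Shift (offset, size) fragments from partition-relative to disk-relative."""
    return [(offset + partition_start, size) for offset, size in fragments]

# Illustration only: a partition starting at sector 2048.
shifted = to_disk_offsets([(0, 4096)], partition_start_bytes(2048))
```

Shifting the file fragments (rather than the ddrescue positions) keeps the logfile values directly comparable with the original log.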

Beforehand I tried ddru_findbad, which uses sleuthkit, which supports exFAT, so it should have worked, but it did not for me. I read somewhere about a user who got lucky with it, but for me ifind failed, and I don't know enough to dig further.
Best regards


mmassing commented Sep 28, 2021

I've also successfully used slightly modified versions of @thican's scripts. Note that seek_matches.py (line 37) does not handle non-scraped ranges (status '/') correctly: they should be considered bad sectors as well, since these ranges are not copied by ddrescue.
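
One way to apply this is to treat every status other than '+' (finished) as unreadable. In the ddrescue mapfile, '?' is non-tried, '*' non-trimmed, '/' non-scraped and '-' bad sector. The Python sketch below is not a patch to seek_matches.py; it takes already parsed (pos, size, status) tuples and merges adjacent unreadable blocks, with hypothetical names throughout:

```python
UNREADABLE = frozenset("?*/-")  # every ddrescue status except '+' (finished)

def unreadable_ranges(blocks, statuses=UNREADABLE):
    """Merge blocks with an unreadable status into [(pos, size), ...].

    blocks: iterable of (pos, size, status) tuples, sizes in bytes
    """
    merged = []
    for pos, size, status in sorted(blocks):
        if status not in statuses or size <= 0:
            continue
        if merged and merged[-1][0] + merged[-1][1] >= pos:
            # Touches or overlaps the previous range: extend it.
            last_pos, last_size = merged[-1]
            merged[-1] = (last_pos, max(last_size, pos + size - last_pos))
        else:
            merged.append((pos, size))
    return merged
```

Passing `statuses={"-"}` reproduces the narrower behaviour of counting only blocks explicitly marked as bad sectors.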
