Can't read data correctly with 'Save identical file as references' features. #24

Closed
hasse69 opened this issue Mar 13, 2015 · 21 comments

hasse69 commented Mar 13, 2015

What steps will reproduce the problem?

  1. Create an archive with the 'Save identical files as references' option.
  2. Mount that archive.
  3. Compare the files byte by byte.

Step-by-step instructions below.

What is the expected output? What do you see instead?

# dd if=1/test/winrar-x64-501.exe bs=512 count=1 status=none | hexdump
0000000 b394 4d40 0220 0b03 0500 1000 0000 0000
0000010 0080 0600 6574 7473 312f 030a c002 fb0c
0000020 5106 cf1b b301 782a 1ecb 0302 000b 0005
0000030 0010 0000 8000 0000 7404 7365 0a74 0203
0000040 a8ae 05bf 1b51 01cf a891 29a9 030e ca06
0000050 0000 00ca 8000 0000 5102 7d4f ce90 45aa
0000060 ce00 7183 4e40 33f3 3b49 0302 ee0b f181
0000070 0080 d804 f8d8 0080 6a20 71a2 80c6 0025
0000080 7419 7365 2f74 2f31 6977 726e 7261 782d
0000090 3436 352d 3130 652e 6578 030a 8002 6e42
00000a0 6ca8 ceee 1d01 5677 0351 0405 0000     
00000ad

# dd if=1/test/1/winrar-x64-501.exe bs=512 count=1 status=none | hexdump
0000000 5a4d 0090 0003 0000 0004 0000 ffff 0000
0000010 00b8 0000 0000 0000 0040 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
0000030 0000 0000 0000 0000 0000 0000 00e8 0000
0000040 1f0e 0eba b400 cd09 b821 4c01 21cd 6854
0000050 7369 7020 6f72 7267 6d61 6320 6e61 6f6e
0000060 2074 6562 7220 6e75 6920 206e 4f44 2053
0000070 6f6d 6564 0d2e 0a0d 0024 0000 0000 0000
0000080 54aa 22ef 35ee 7181 35ee 7181 35ee 7181
0000090 4de7 7114 35e3 7181 4de7 7102 3594 7181
00000a0 4de7 7106 35ef 7181 4de7 7112 35fb 7181
00000b0 35ee 7180 351c 7181 4de7 7105 358b 7181
00000c0 4de7 7113 35ef 7181 4de7 7115 35ef 7181
00000d0 4de7 7110 35ef 7181 6952 6863 35ee 7181
00000e0 0000 0000 0000 0000 4550 0000 8664 0005
00000f0 ee85 529a 0000 0000 0000 0000 00f0 0023
0000100 020b 0009 ac00 0002 2a00 0003 0000 0000
0000110 1b6c 0002 1000 0000 0000 4000 0001 0000
0000120 1000 0000 0200 0000 0005 0002 0000 0000
0000130 0005 0002 0000 0000 1000 0006 0400 0000
0000140 1a2b 001f 0002 8100 0000 0010 0000 0000
0000150 1000 0000 0000 0000 0000 0010 0000 0000
0000160 1000 0000 0000 0000 0000 0000 0010 0000
0000170 3aa0 0003 0033 0000 236c 0003 00c8 0000
0000180 a000 0005 6640 0000 7000 0005 2100 0000
0000190 1400 001e 1858 0000 0000 0000 0000 0000
00001a0 c7c0 0002 001c 0000 0000 0000 0000 0000
00001b0 0000 0000 0000 0000 0000 0000 0000 0000
*
00001d0 c000 0002 06f0 0000 0000 0000 0000 0000
00001e0 0000 0000 0000 0000 0000 0000 0000 0000
00001f0 742e 7865 0074 0000 ab12 0002 1000 0000
0000200

The two files must be identical.
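
A note on reading the dumps above: by default `hexdump` prints 16-bit words in host byte order, which is little-endian on x86-64, so the pair `5a4d` stands for the bytes `4d 5a` ("MZ", the signature at the start of a Windows executable). The sketch below undoes that word swap for one line of output. The helper name `hexdump_line_to_bytes` is made up for illustration, and it assumes every word on the line is a full 16-bit word (the last line of an odd-length dump is padded, so it would yield one extra zero byte).

```python
def hexdump_line_to_bytes(line: str) -> bytes:
    """Convert one line of default `hexdump` output back to raw bytes.

    Assumes little-endian 16-bit words (as on x86-64) and ignores the
    leading offset column.
    """
    words = line.split()[1:]  # drop the offset column
    out = bytearray()
    for word in words:
        value = int(word, 16)
        out.append(value & 0xFF)  # low byte comes first in the file
        out.append(value >> 8)    # high byte comes second
    return bytes(out)


# First line of each dump from the report.
expected = hexdump_line_to_bytes("0000000 5a4d 0090 0003 0000 0004 0000 ffff 0000")
mounted = hexdump_line_to_bytes("0000000 b394 4d40 0220 0b03 0500 1000 0000 0000")

print(expected[:2])          # b'MZ'
print(mounted[:2] == b"MZ")  # False
```

Only the dump with the `MZ` signature starts like a Windows executable.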

What version of the product are you using? On what operating system?

OS : openSUSE 13.1 x86-64

    # rar2fs/rar2fs-1.19.1/rar2fs -V
    rar2fs v1.19.1 (DLL version 6)    Copyright (C) 2009-2013 Hans Beckerus
    This program comes with ABSOLUTELY NO WARRANTY.
    This is free software, and you are welcome to redistribute it under
    certain conditions; see <http://www.gnu.org/licenses/> for details.
    FUSE library version: 2.9.3
    fusermount version: 2.9.3
    using FUSE kernel interface version 7.19

Please provide any additional information below.

I used the x86-64 Windows version of WinRAR to create the archive.
It runs correctly under WINE. I guess the target OS of the WinRAR binary
doesn't affect this behavior.

Instructions:

1. Create the following directory hierarchy.

test/winrar-x64-501.exe
test/1/winrar-x64-501.exe

2. Create a RAR archive with the following options.

# wine 'C:\Program Files\WinRAR\Rar.exe' a -ma5 -m5 -oi:0 TEST.rar test

NOTE: With the -oi option, WinRAR scans for identical files and stores each duplicate as a 'reference' that points to the single copy of the actual compressed data. ":0" means "scan files of any size".
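
Conceptually, the option replaces every repeated file with a pointer to the first stored copy. The sketch below illustrates that idea only. It is not WinRAR's actual algorithm: `plan_archive`, the SHA-256 grouping, and the sorted processing order are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def plan_archive(files: dict[str, bytes], min_size: int = 0) -> dict[str, Optional[str]]:
    """Map each path to None (store its data) or to the path it references.

    `files` maps archive paths to file contents. Files smaller than
    `min_size` are never turned into references; 0 means "any size".
    """
    first_seen: dict[str, str] = {}  # content digest -> first path with it
    plan: dict[str, Optional[str]] = {}
    for path in sorted(files):  # processing order is an arbitrary choice here
        data = files[path]
        digest = hashlib.sha256(data).hexdigest()
        if len(data) >= min_size and digest in first_seen:
            plan[path] = first_seen[digest]  # store a reference only
        else:
            first_seen.setdefault(digest, path)
            plan[path] = None  # store the actual data
    return plan


payload = b"identical content"
plan = plan_archive({
    "test/winrar-x64-501.exe": payload,
    "test/1/winrar-x64-501.exe": payload,
})
print(plan)
```

With these two inputs, one path keeps the data and the other only records where that data lives.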

3. Verify that the archive was created as expected.

# unrar vta TEST.rar

UNRAR 5.00 freeware      Copyright (c) 1993-2013 Alexander Roshal

Archive: TEST.rar
Details: RAR 5

        Name: test/1/winrar-x64-501.exe
        Type: File
        Size: 1977432
 Packed size: 1851630
       Ratio: 93%
       mtime: 2013-12-01 17:09,000
  Attributes: ..A....
       CRC32: C671A26A
     Host OS: Windows
 Compression: RAR 5.0(v0) -m5 -md=2M

        Name: test/winrar-x64-501.exe
        Type: File copy
      Target: test\1\winrar-x64-501.exe
        Size: 1977432
 Packed size: 0
       Ratio: 0%
       mtime: 2013-12-01 17:09,000
  Attributes: ..A....
       CRC32: 00000000
     Host OS: Windows
 Compression: RAR 5.0(v0) -m0 -md=2M

        Name: test/1
        Type: Directory
       mtime: 2014-01-27 20:15,000
  Attributes: ...D...
       CRC32: 00000000
     Host OS: Windows
 Compression: RAR 5.0(v0) -m0 -md=0K

        Name: test
        Type: Directory
       mtime: 2014-01-27 20:15,000
  Attributes: ...D...
       CRC32: 00000000
     Host OS: Windows
 Compression: RAR 5.0(v0) -m0 -md=0K

        Name: QO
        Type: Service
        Size: 74
 Packed size: 74
       Ratio: 100%
  Attributes: .B
     Host OS: Windows
 Compression: RAR 5.0(v0) -m0 -md=128K

     Service: EOF

Yes, test/test/winrar-x64-501.exe is marked as 'File Copy'. Target is 'test/winrar-x64-501.exe'.
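
The 'File copy' entry and its target can also be picked out of the listing mechanically. A minimal sketch, assuming the `Key: value` layout shown in the `unrar vta` output above with entries separated by blank lines; `find_file_copies` is a hypothetical helper, and the sample text is abbreviated from that listing.

```python
import re


def find_file_copies(listing: str) -> dict[str, str]:
    """Return {copy_name: target_name} for 'File copy' entries."""
    copies: dict[str, str] = {}
    for block in re.split(r"\n\s*\n", listing):  # entries are separated by blank lines
        fields: dict[str, str] = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields.get("Type") == "File copy" and "Name" in fields and "Target" in fields:
            # The listing shows Windows separators in Target; normalize them.
            copies[fields["Name"]] = fields["Target"].replace("\\", "/")
    return copies


sample = r"""
        Name: test/1/winrar-x64-501.exe
        Type: File
        Size: 1977432

        Name: test/winrar-x64-501.exe
        Type: File copy
      Target: test\1\winrar-x64-501.exe
        Size: 1977432
"""

copies = find_file_copies(sample)
print(copies)  # {'test/winrar-x64-501.exe': 'test/1/winrar-x64-501.exe'}
```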

4. Mount this archive at any path.
5. Compare the files using cmp. The results are shown above.

I'll attach the example file described above.

Original issue reported on code.google.com by jyhpsycho on 2014-01-27

@hasse69 hasse69 self-assigned this Mar 13, 2015
@hasse69 hasse69 removed their assignment Mar 13, 2015

hasse69 commented May 2, 2015

some typo... :-(

Yes, test/test/winrar-x64-501.exe is marked as 'File Copy'. Target is 'test/winrar-x64-501.exe'.
=> Yes, test/winrar-x64-501.exe is marked as 'File Copy'. Target is 'test/1/winrar-x64-501.exe'.

Original issue reported on code.google.com by jyhpsycho on 2014-01-27 11:41:57


hasse69 commented May 2, 2015

Thanks for the issue report.
This is the first time I have heard of this feature (it looks very similar to symbolic links
though). I need to investigate how this feature works at a low level. Since the archive
itself is compressed it should work unless the UnRAR library API does not support this
feature.
Are the files otherwise presented correctly in the file system?

Original issue reported on code.google.com by hasse69 on 2014-01-27 15:04:26


hasse69 commented May 2, 2015

Ok, after digging around a bit I found that the "file copy" mode is more or less the
same as a "hard link". Hard links are currently not supported by rar2fs, neither are
Windows-style symbolic links and junctions. The redirection supported by rar2fs today
is Unix style symbolic links only. It should be possible to add support for more redirection
types but it will require a bit more effort. Would it be ok as a short term solution
to treat "file copy" as a symbolic link instead?

Original issue reported on code.google.com by hasse69 on 2014-01-27 16:14:24


hasse69 commented May 2, 2015

Technically it looks like a symbolic link: it stores the target path instead of the actual data,
and the data offset (an internal RARv5 data structure) doesn't point to actual data. So
treating it as a symbolic link seems to be sufficient for now. But symbolic links are handled
differently by some other software (e.g. cp, diff)...

Original issue reported on code.google.com by jyhpsycho on 2014-01-27 23:35:50


hasse69 commented May 2, 2015

WinRAR extracts a "File Copy" as a regular file, not as a hard link or symbolic link. That means
file operations (cp, diff, or many other programs) on a link give unexpected or different
results. If a "File Copy" is exposed as a link, I think there is no merit in mounting the RAR file
as a file system, because extracting it always works as expected but mounting
it doesn't. Can you implement it so that such files are exposed as regular files in the file system?
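
One way to picture the requested behavior: the entry keeps looking like a regular file, and the reference is followed only when data is read. The sketch below illustrates that under stated assumptions. `Entry` and `read_entry` are hypothetical names, not rar2fs code or the UnRAR API, and the data is held in memory purely for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    size: int
    data: Optional[bytes] = None   # unpacked data, if stored in this entry
    target: Optional[str] = None   # set only for "File copy" entries


def read_entry(entries: dict[str, Entry], name: str, offset: int, length: int) -> bytes:
    """Read from an entry, following 'File copy' references to the real data."""
    entry = entries[name]
    seen = {name}
    while entry.target is not None:
        if entry.target in seen:  # guard against reference loops
            raise ValueError(f"reference loop at {entry.target!r}")
        seen.add(entry.target)
        entry = entries[entry.target]
    if entry.data is None:
        raise ValueError(f"no data stored for {entry.name!r}")
    return entry.data[offset:offset + length]


entries = {
    "test/1/winrar-x64-501.exe": Entry("test/1/winrar-x64-501.exe", 4, data=b"MZ\x90\x00"),
    "test/winrar-x64-501.exe": Entry("test/winrar-x64-501.exe", 4,
                                     target="test/1/winrar-x64-501.exe"),
}
print(read_entry(entries, "test/winrar-x64-501.exe", 0, 2))  # b'MZ'
```

Both paths then return the same bytes, so tools such as cp or diff never see a link.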

Original issue reported on code.google.com by jyhpsycho on 2014-01-28 00:14:06


hasse69 commented May 2, 2015

Right. Agreed.
The "File Copy" mode is actually more similar to hard links than soft links. I will
look into adding support for "File Copy". Does not seem that hard, but I am a bit short
on time at the moment. Support for hard links, Windows soft links and junctions will
have to come later, if ever.
I will post a patch when I have something ready for test. 

Original issue reported on code.google.com by hasse69 on 2014-01-28 07:38:59


hasse69 commented May 2, 2015

This might take some time I am afraid :(
Unfortunately I just discovered that the solution for issue #22 created a new problem
that prevents proper display of "RAR inside RAR" for RAR5 archives. I need to look
into that problem first.

Original issue reported on code.google.com by hasse69 on 2014-01-28 17:54:22


hasse69 commented May 2, 2015

Ok, the problem with "RAR inside RAR" has been resolved and a new version has been released.
I will now continue looking into the "File Copy" issue. I had something almost working
until I stumbled into the other issue :( Google has also chosen to change how downloads
are hosted so I had to spend a few hours on restructuring the project page.

Original issue reported on code.google.com by hasse69 on 2014-01-29 00:04:01


hasse69 commented May 2, 2015

I have something now that is close to working. What is missing is the support for references
to "RAR inside RAR" files, e.g. when a RAR file is referencing an identical RAR file
but in another location inside the main archive. Probably not a very common use case
though. 

Original issue reported on code.google.com by hasse69 on 2014-01-29 15:28:51


hasse69 commented May 2, 2015

There is now a new version committed to trunk.
Please verify if it resolves the issue and report back the result.

Original issue reported on code.google.com by hasse69 on 2014-01-29 17:23:10


hasse69 commented May 2, 2015

It doesn't work correctly for me; I used an approx. 38GB RARv5 archive for the test. I found
that seemingly random data differences occur when I extract the contents to HDD and compare
them directly to the mounted ones using diff. I say 'random' because
the location reported by cmp varies when I unmount and remount the archive, or when I flush
the page cache manually using "echo 3 > /proc/sys/vm/drop_caches" and re-run diff.
Sometimes it reports no difference! But why does it randomly return wrong data? I think there
are some other bugs in it...
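
The symptom described here (a different result after each remount or cache flush) can be checked without a reference copy by hashing the same mounted file several times. A minimal sketch: `digest_of` and `reads_are_stable` are hypothetical helper names. Repeated reads may be served from the page cache, so in practice the cache would have to be dropped or the archive remounted between rounds, as described above. The demo runs on an ordinary temporary file, where the reads are expected to be stable.

```python
import hashlib
import os
import tempfile


def digest_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def reads_are_stable(path: str, rounds: int = 3) -> bool:
    """True if every round of reading the file produced the same digest."""
    return len({digest_of(path) for _ in range(rounds)}) == 1


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"\x00" * 4096)
    stable = reads_are_stable(sample)

print(stable)  # True for an ordinary file
```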

Original issue reported on code.google.com by jyhpsycho on 2014-01-30 03:38:35


hasse69 commented May 2, 2015

The whole procedure:

1. Extract the archive contents to HDD using unrar.

2. Mount that archive using rar2fs.

3. Run the following:
# diff -ur (Extracted path) (Mounted path)

It says some files differ.

4. Run cmp to identify where the data difference occurs.
# cmp (file in Extracted path) (file in Mounted path)

The reported location varies after a remount or a page cache flush.

I haven't checked whether it also occurs with "small archives", by which I mean <100 megabytes.
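
For scripting step 4 over many files, the offset that cmp reports can be computed directly. A minimal sketch: `first_difference` is a hypothetical helper, and it returns a 0-based offset, whereas cmp counts bytes from 1. The two temporary files stand in for the extracted and mounted copies.

```python
import os
import tempfile
from typing import Optional


def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> Optional[int]:
    """Return the 0-based offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # One file ended early: they differ where the shorter one stops.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:  # both files ended at the same point
                return None
            offset += len(chunk_a)


with tempfile.TemporaryDirectory() as tmp:
    extracted = os.path.join(tmp, "extracted.bin")
    mounted = os.path.join(tmp, "mounted.bin")
    with open(extracted, "wb") as f:
        f.write(b"abcdef")
    with open(mounted, "wb") as f:
        f.write(b"abcXef")
    offset = first_difference(extracted, mounted)
    same = first_difference(extracted, extracted)

print(offset)  # 3
print(same)    # None
```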

Original issue reported on code.google.com by jyhpsycho on 2014-01-30 03:50:22


hasse69 commented May 2, 2015

Thanks for testing. But from your description it sounds really strange.
Can you tell if this has anything to do with the 'Save identical file as references'
feature or if this is something else? Is it always the same file that differs or is
it different files? To me it does not sound like it has anything to do with the "File
Copy" mode, does it? Have you tried using RAR5 archives with and without "File Copy" and
also RAR4 archives? What compression level (-mN) do you use? I will try to reproduce
it myself. I will re-open the case for now, but if this is not related to the original
problem I would prefer a new issue report.

Original issue reported on code.google.com by hasse69 on 2014-01-30 07:10:03


hasse69 commented May 2, 2015

(No text was entered with this change)

Original issue reported on code.google.com by hasse69 on 2014-01-30 07:10:33


hasse69 commented May 2, 2015

Oh, I checked the results once more and realized that this does not seem to be related to the 'File Copy'
feature... I will run many combinations of tests using that data and report back later.
Those tests take a lot of time because the original data is very big (~40GB)...

Original issue reported on code.google.com by jyhpsycho on 2014-01-30 07:43:02


hasse69 commented May 2, 2015

Committed a patch to the trunk. Please update and try again.

Original issue reported on code.google.com by hasse69 on 2014-01-30 07:48:52


hasse69 commented May 2, 2015

The last patch only seemed to make things better, but I could still reproduce the problem
for archives with "File Copy" mode. Have not yet been able to reproduce this for archives
without it! I have now made yet another patch and committed it to trunk. For revision
r438 I have not been able to reproduce the issue as of yet. 

Original issue reported on code.google.com by hasse69 on 2014-01-30 08:22:17


hasse69 commented May 2, 2015

It doesn't work either. I tested several compression options and found that
stored (-m0) archives do not have the problem. It occurs only with compressed streams,
even without 'references'. I think this case is fixed; just open another issue, something
like "diff failed".

Original issue reported on code.google.com by jyhpsycho on 2014-01-30 09:29:54


hasse69 commented May 2, 2015

Can you please clarify, what works and what does not?
I am having issues closing this issue if you are still facing problems.

Original issue reported on code.google.com by hasse69 on 2014-01-30 10:01:06


hasse69 commented May 2, 2015

Oh, I see... I'm sorry for the unclear message.

Works: "Referenced" files. Archives created in stored mode don't have any problems,
though I only tested a few times.

Does not work: The file compare test sometimes fails at a random location.

Original issue reported on code.google.com by jyhpsycho on 2014-01-30 10:50:55


hasse69 commented May 2, 2015

Alright. I will close this case.
The original issue as reported was the fact that the 'Save identical file as references'
feature was not supported at all by rar2fs. In fact, it read from completely bogus
offsets in the archive resulting in a very deterministic error rate. This new problem
is something else and needs to be investigated separately as part of issue 25.
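
The first hexdump in the original report is consistent with this: rewritten in byte order, the data returned for the mounted file contains the path of the other archive entry, preceded by a byte equal to its length. The sketch below scans bytes 0x80 to 0x9f of that dump for printable runs, much like the `strings` tool. The hex literal is a manual transcription of those two dump lines with each 16-bit word byte-swapped, and reading the length byte as an archive header field is an interpretation, not something the thread confirms.

```python
import re

# Bytes 0x80..0x9f of the first dump in the report, rewritten in byte order.
misread = bytes.fromhex(
    "19746573742f312f77696e7261722d78"
    "36342d3530312e6578650a030280426e"
)

# Runs of at least four printable ASCII characters.
printable_runs = re.findall(rb"[\x20-\x7e]{4,}", misread)

print(printable_runs)                        # [b'test/1/winrar-x64-501.exe']
print(misread[0] == len(printable_runs[0]))  # True: 0x19 == 25
```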

Original issue reported on code.google.com by hasse69 on 2014-01-30 15:50:17
