Support for merging/combining deltas #36

Closed
GoogleCodeExporter opened this Issue Mar 24, 2015 · 19 comments

GoogleCodeExporter commented Mar 24, 2015

(I deleted the default issue description headers, since those apply to a
bug report but not so much to a feature request)

Say I have a huge file (3GB) that is version 1. I also have an xdelta3 patch
for version1 -> version2 and another for version2 -> version3. I would like
to do version1 -> version3 patching in one step, since doing it in two steps
would require actually creating version2 on disk.

A solution would be allowing stdin as source. Something like this:
xdelta30q -d -s version1 v1-to-v2.xd3 | xdelta30q -d -s - v2-to-v3.xd3 version3

A dash in place of a filename is a fairly standard way of saying "use
stdin/stdout". md5sum uses it with its -c/--check switch, for example. Doing
it this way would also make the patching almost twice as fast on a dual-core
machine, since both decoders would run at the same time. On Unix systems, named
pipes (mkfifo) could also be used, but that's more cumbersome to set up
(and I'm on Windows anyway).

Another scenario: I have version1 and those two patches, and want to have
all three versions. I have a dual-core* and want to decompress both at the
same time. Command:
xdelta30q -d -s version1 v1-to-v2.xd3 | tee version2 | xdelta30q -d -s - v2-to-v3.xd3 version3

* I'm only saying that to explain my request. I *wish* I really had a dual
core :(

Original issue reported on code.google.com by nicolas....@gmail.com on 1 Jun 2007 at 7:28

GoogleCodeExporter commented Mar 24, 2015

Another solution might be some form of diff file combining:

vcdiff-combine v1-to-v2.xd3 v2-to-v3.xd3 > v1-to-v3.xd3
xdelta30q -d -s version1 v1-to-v3.xd3 > version3

Is such a thing possible?
I guess this doesn't help with reducing everything to a single command line, but
it would make repeated multi-diff patching much easier. Redundant data between
v1-to-v2.xd3 and v2-to-v3.xd3 might also be eliminated in the first step, making
processing faster in subsequent use of the new v1-to-v3.xd3 file. Assuming the
diffs each insert less than half of the file size of version3, v1-to-v3.xd3 is
bound to use much less disk space than an interim version2 would.

Original comment by jaredha...@gmail.com on 5 Jun 2007 at 3:26
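
Such a combining step is possible in principle because a delta is a list of instructions that either copy a range from the source or add literal bytes. Below is a minimal sketch of composing two deltas, under loudly stated assumptions: the `op`/`delta` types and the `merge_deltas`, `copy_range` and `apply_delta` helpers are hypothetical names, the model is a simplified in-memory one rather than the VCDIFF wire format, and COPYs from the target window itself are ignored. Each ADD in v2-to-v3 is kept as-is, and each COPY from version2 is replaced by whichever pieces of v1-to-v2 produced that range of version2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified delta model (not the VCDIFF wire format):
   OP_COPY copies `len` bytes from the source starting at `addr`;
   OP_ADD inserts `len` literal bytes from `data`. */
typedef enum { OP_ADD, OP_COPY } op_kind;

typedef struct {
  op_kind kind;
  size_t addr;      /* source offset (OP_COPY only) */
  size_t len;       /* number of target bytes produced */
  const char *data; /* literal bytes (OP_ADD only) */
} op;

#define MAX_OPS 64

typedef struct {
  op ops[MAX_OPS];
  size_t n;
} delta;

static void emit(delta *out, op o) {
  assert(out->n < MAX_OPS); /* fixed capacity keeps the sketch short */
  out->ops[out->n++] = o;
}

/* Append the pieces of `a` that produce bytes [pos, pos + len) of a's target. */
static void copy_range(const delta *a, size_t pos, size_t len, delta *out) {
  size_t t = 0; /* target offset at which a->ops[i] starts */
  for (size_t i = 0; i < a->n && len > 0; i++) {
    const op *o = &a->ops[i];
    size_t end = t + o->len;
    if (pos < end) {
      size_t skip = pos - t; /* bytes of this op that lie before pos */
      size_t take = o->len - skip;
      if (take > len) take = len;
      op piece = *o;
      piece.len = take;
      if (o->kind == OP_COPY) piece.addr += skip;
      else piece.data += skip;
      emit(out, piece);
      pos += take;
      len -= take;
    }
    t = end;
  }
  assert(len == 0); /* the range must lie inside a's target */
}

/* Compose a (v1 -> v2) with b (v2 -> v3) into out (v1 -> v3). */
static void merge_deltas(const delta *a, const delta *b, delta *out) {
  out->n = 0;
  for (size_t i = 0; i < b->n; i++) {
    const op *o = &b->ops[i];
    if (o->kind == OP_ADD) emit(out, *o);     /* literals carry over */
    else copy_range(a, o->addr, o->len, out); /* rewrite v2 refs as v1 refs */
  }
}

/* Apply d to src, writing the target into dst; returns the target length. */
static size_t apply_delta(const delta *d, const char *src, char *dst) {
  size_t n = 0;
  for (size_t i = 0; i < d->n; i++) {
    const op *o = &d->ops[i];
    memcpy(dst + n, o->kind == OP_COPY ? src + o->addr : o->data, o->len);
    n += o->len;
  }
  return n;
}
```

For example, with version1 = "hello world", a v1-to-v2 delta producing "hello brave world" and a v2-to-v3 delta producing "brave new world", the merged delta is ADD "brave ", ADD "new ", then a COPY of 5 bytes at offset 6 of version1. Applying it to version1 yields "brave new world" without ever materializing version2.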

GoogleCodeExporter commented Mar 24, 2015

I just tried to use a named pipe (created with mkfifo) as source while *creating* a
diff, and it gave this error:
xdelta3: source file must be seekable

I can understand why seeking may be needed to create the patch. However, is it
required for the source file to be seekable when *applying* a patch? (I haven't yet
tried to use a fifo there)

Original comment by nicolas....@gmail.com on 9 Jun 2007 at 11:13
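
Whether applying a patch needs a seekable source depends on where its COPY instructions read from. The sketch below is a hypothetical model, not xdelta3's actual source-buffering logic: it assumes a decoder that reads the source strictly forward (as from a pipe) and keeps only the most recent `buf` bytes in memory, and it checks whether every COPY could still be served. The `copy_ref` type and `streamable` function are illustrative names. Copies that reach far behind the furthest point already read are what would make a pipe insufficient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Source range read by one COPY instruction (hypothetical simplified model). */
typedef struct {
  size_t addr; /* source offset */
  size_t len;  /* bytes copied */
} copy_ref;

/* Returns true if a decoder that reads the source strictly forward and keeps
   only the most recent `buf` bytes could serve every COPY, in decode order. */
static bool streamable(const copy_ref *copies, size_t n, size_t buf) {
  assert(copies != NULL || n == 0);
  size_t read_upto = 0; /* furthest source offset read so far */
  for (size_t i = 0; i < n; i++) {
    size_t end = copies[i].addr + copies[i].len;
    if (end > read_upto) read_upto = end; /* read ahead as far as needed */
    /* first source byte that is still buffered */
    size_t oldest = read_upto > buf ? read_upto - buf : 0;
    if (copies[i].addr < oldest) return false; /* discarded: would need a seek */
  }
  return true;
}
```

In this model, copies that walk forward through the source succeed with a small buffer, while a copy that jumps back to the start of the source succeeds only if the buffer is large enough to still hold it.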

GoogleCodeExporter commented Mar 24, 2015

This is a common request. Right now the command line does not support multiple
arguments; I'd like to reorganize the code to support that first. After that, it
shouldn't be too difficult to add merge-delta support.

Original comment by josh.mac...@gmail.com on 14 Dec 2007 at 9:32

  • Changed title: Support for merging/combining deltas
  • Changed state: Accepted

GoogleCodeExporter commented Mar 24, 2015

Original comment by josh.mac...@gmail.com on 14 Dec 2007 at 9:32

  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect

GoogleCodeExporter commented Mar 24, 2015

FYI, I've begun work on a new "merge" command.

Original comment by josh.mac...@gmail.com on 15 Dec 2007 at 1:16

GoogleCodeExporter commented Mar 24, 2015

Original comment by josh.mac...@gmail.com on 15 Dec 2007 at 9:00

  • Changed state: Started

GoogleCodeExporter commented Mar 24, 2015

FYI, as of SVN 240 the "merge" command is working, but only lightly tested. I will
update this issue again when it's been tested well enough for a new release.

Original comment by josh.mac...@gmail.com on 22 Apr 2008 at 4:03

GoogleCodeExporter commented Mar 24, 2015

I've been trying to use the new merge command and have had no luck getting it
to do anything. Should xdelta3 -m 1.vcdiff -m 2.vcdiff 3.vcdiff out.vcdiff work,
or are there other things I need to have on the command line?
Also, to double check: 1.vcdiff should take the file from ver1 to ver2, 2.vcdiff
should take it from ver2 to ver3, and 3.vcdiff should take it from ver3 to ver4,
and the output should be a file that takes ver1 to ver4? Does it need the source
at all?

I keep seeing "cannot merge inputs which do not have a source file"

Could you give me an example usage that works for you so I can figure it out from
there? Thanks.

Also, is there a way to pipe the output of the merged vcdiffs to stdout so it can be
read right back into an xdelta3 decompress from the original file?

An unrelated feature request I'll toss in here would be the ability to use a source
file over the network in a low-bandwidth way (like rsync, I guess), which looks like
it could be done, at least at first glance, using the fingerprinting already being
done. I'm going to play around with that idea a little bit, but the merge command
would be so so so so so useful, and I just can't figure out where I'm screwing up in
actually getting it to work. Thanks for an absolutely awesome tool.

Original comment by mmart...@gmail.com on 28 May 2008 at 6:33

GoogleCodeExporter commented Mar 24, 2015

Yeah, I would really love this to work too and can't make it work. If you do
merge 01 01a, then 01a is basically the same as 01 but without the header. It still
works with -d. But if you do merge -m 01 12 02, then printdelta on 02 seems to suggest
everything's ok, but -d -s 0 02 2a does not put anything in 2a. So I can't work this
one out. It would be so very useful. Maybe the current implementation doesn't work
if the source files aren't the same length? Dunno.

Original comment by matthew....@gmail.com on 13 Aug 2008 at 9:53

GoogleCodeExporter commented Mar 24, 2015

So using -A, there are differences:

calculating the delta directly between 0 and 2:

> ./xdelta3 printdelta /tmp/foo02
VCDIFF version:               0
VCDIFF header size:           5
VCDIFF header indicator:      none
VCDIFF secondary compressor:  none
VCDIFF window number:         0
VCDIFF window indicator:      VCD_ADLER32 
VCDIFF adler32 checksum:      1BBB0401
VCDIFF delta encoding length: 22
VCDIFF target window length:  12
VCDIFF data section length:   12
VCDIFF inst section length:   1
VCDIFF addr section length:   0
  Offset Code Type1 Size1 @Addr1 + Type2 Size2 @Addr2
  000000 013  ADD    12        

And then the merged version (created with merge -m /tmp/foo01 /tmp/foo12 /tmp/foo02a):

> ./xdelta3 printdelta /tmp/foo02a
VCDIFF version:               0
VCDIFF header size:           5
VCDIFF header indicator:      none
VCDIFF secondary compressor:  none
VCDIFF window number:         0
VCDIFF window indicator:      none
VCDIFF delta encoding length: 18
VCDIFF target window length:  12
VCDIFF data section length:   12
VCDIFF inst section length:   1
VCDIFF addr section length:   0
  Offset Code Type1 Size1 @Addr1 + Type2 Size2 @Addr2
  000000 013  ADD    12        

So there's no window indicator and the encoding length is different (22 vs 18).
-d -s foo0 foo02 foo2.1 does recreate foo2 correctly. -d -s foo0 foo02a foo2.2 does
not put anything in foo2.2.

Original comment by matthew....@gmail.com on 13 Aug 2008 at 10:02
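
The 4-byte gap between the two delta encoding lengths (22 vs 18) matches the size of the Adler-32 checksum that the VCD_ADLER32 window indicator signals, so the merged delta appears to have simply dropped the per-window checksum. For reference, here is a minimal Adler-32 in C as defined in RFC 1950. The function name `adler32_sum` is illustrative and is not xdelta3's own routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adler-32 (RFC 1950): two running sums modulo 65521, packed as (b << 16) | a. */
static uint32_t adler32_sum(const unsigned char *buf, size_t len) {
  const uint32_t MOD = 65521u; /* largest prime below 2^16 */
  uint32_t a = 1, b = 0;
  assert(buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++) {
    a = (a + buf[i]) % MOD; /* 1 plus the sum of the bytes */
    b = (b + a) % MOD;      /* sum of the running values of a */
  }
  return (b << 16) | a;
}
```

It returns 0x11E60398 for the ASCII string "Wikipedia", a commonly cited test value, and 1 for empty input.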

GoogleCodeExporter commented Mar 24, 2015

You're right that this code isn't really working yet. However, I've implemented a
new testing framework, and now that I have a good test ready, I'm in the process of
debugging this stuff. Won't be long now.

Original comment by josh.mac...@gmail.com on 14 Aug 2008 at 2:57

GoogleCodeExporter commented Mar 24, 2015

I just fixed a bug in the merge code and would appreciate it if you could test with
the latest version in SVN. If it doesn't work, please post the files and command lines
used, and I'll be happy to add them to my test suite. As of now, my test actually
passes (that doesn't mean everything is working, but it's a good sign).

Original comment by josh.mac...@gmail.com on 16 Aug 2008 at 3:52

GoogleCodeExporter commented Mar 24, 2015

Ok, it works in some cases. However, not all. The following fails: I'm trying to
combine two large deltas, both of which are inserts and both of which end before the
original file starts.

cp /usr/share/dict/words mergetest/w0
tail -n 50000 mergetest/w0 > mergetest/w1
tail -n 5000 mergetest/w1 > mergetest/w2

./xdelta3 -f -v -s mergetest/w2 mergetest/w1 mergetest/w21
./xdelta3 -f -v -s mergetest/w1 mergetest/w0 mergetest/w10

sha1sum mergetest/w0
./xdelta3 -f -v -v -d -c -s mergetest/w1 mergetest/w10 | sha1sum -
# check sums match, all is well, no errors yet

sha1sum mergetest/w1
./xdelta3 -f -v -v -d -c -s mergetest/w2 mergetest/w21 | sha1sum -
# check sums still match, all is well, no errors yet

./xdelta3 -f -v merge -m mergetest/w21 mergetest/w10 mergetest/w20
# printdelta does return a lot and all looks like it might be ok, but...
./xdelta3 -f -v -v -d -c -s mergetest/w2 mergetest/w20
xdelta3: input window size: 386177
xdelta3: source mergetest/w2 winsize 44 KB size 45631
xdelta3: source block size: 45631
xdelta3: source file too short: XD3_INTERNAL

# no dice.
I've attached the original /usr/share/dict/words file that I used, though I would
guess this bug wouldn't be specific to that!

On the other hand, doing this test the other way round (i.e. going from big to small)
works fine.

Original comment by matthew....@gmail.com on 16 Aug 2008 at 7:33

GoogleCodeExporter commented Mar 24, 2015

SVN 267 introduces a test that reproduces this issue, at least.

Original comment by josh.mac...@gmail.com on 5 Sep 2008 at 3:07

GoogleCodeExporter commented Mar 24, 2015

SVN 268 fixes at least one merge bug, but your test still fails. I'll keep at it
another day.

Original comment by josh.mac...@gmail.com on 5 Sep 2008 at 4:33

GoogleCodeExporter commented Mar 24, 2015

SVN 271 fixes two more merge bugs, and the new tests (inspired by your example) are
now passing. I ran your example by hand to verify it.

However, this "final step" in getting merge to work is not a permanent solution,
because the algorithm is slow and somewhat inefficient. The TODO in xdelta3-merge.h
reads:

          // TODO: this is slow because of the recursion, which
          // could reach a depth equal to the number of target
          // copies, and this is compression-inefficient because
          // it can produce duplicate adds.

Original comment by josh.mac...@gmail.com on 7 Sep 2008 at 9:46
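
One plausible reading of that TODO, sketched under stated assumptions: the types and the `resolve`, `flatten` and `apply_flat` helpers below are hypothetical and are not the code in xdelta3-merge.h. In VCDIFF-style addressing, a COPY whose address is at or past the source length reads earlier target bytes. Rewriting such a copy in terms of source copies and adds means recursing into whatever produced those target bytes, so the recursion depth grows with each chained target copy, and an ADD that is referenced again gets emitted again. Overlapping self-copies are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified delta with VCDIFF-style addressing: an OP_COPY with
   addr < src_len reads the source; addr >= src_len reads earlier target bytes
   at offset addr - src_len. Overlapping self-copies are not supported. */
typedef enum { OP_ADD, OP_COPY } op_kind;
typedef struct { op_kind kind; size_t addr, len; const char *data; } op;
typedef struct { op ops[64]; size_t n; size_t src_len; } delta;

static void emit(delta *out, op o) {
  assert(out->n < 64); /* fixed capacity keeps the sketch short */
  out->ops[out->n++] = o;
}

/* Emit source copies and adds that produce target bytes [pos, pos + len) of d.
   Target copies recurse, so depth grows along chains of target copies, and an
   ADD reached again through a target copy is emitted again (a duplicate add). */
static void resolve(const delta *d, size_t pos, size_t len, delta *out) {
  size_t t = 0; /* target offset at which d->ops[i] starts */
  for (size_t i = 0; i < d->n && len > 0; i++) {
    const op *o = &d->ops[i];
    size_t end = t + o->len;
    if (pos < end) {
      size_t skip = pos - t, take = o->len - skip;
      if (take > len) take = len;
      if (o->kind == OP_ADD) {
        op piece = {OP_ADD, 0, take, o->data + skip};
        emit(out, piece);
      } else if (o->addr < d->src_len) {
        op piece = {OP_COPY, o->addr + skip, take, NULL};
        emit(out, piece);
      } else {
        size_t tpos = o->addr - d->src_len; /* earlier target offset */
        assert(tpos + o->len <= t);         /* no overlapping self-copies */
        resolve(d, tpos + skip, take, out);
      }
      pos += take;
      len -= take;
    }
    t = end;
  }
  assert(len == 0);
}

/* Rewrite d into out so that out contains no target copies. */
static void flatten(const delta *d, delta *out) {
  size_t total = 0;
  for (size_t i = 0; i < d->n; i++) total += d->ops[i].len;
  out->n = 0;
  out->src_len = d->src_len;
  resolve(d, 0, total, out);
}

/* Apply a flattened delta (source copies and adds only) to src. */
static size_t apply_flat(const delta *d, const char *src, char *dst) {
  size_t n = 0;
  for (size_t i = 0; i < d->n; i++) {
    const op *o = &d->ops[i];
    assert(o->kind == OP_ADD || o->addr < d->src_len);
    memcpy(dst + n, o->kind == OP_ADD ? o->data : src + o->addr, o->len);
    n += o->len;
  }
  return n;
}
```

With source "abc" and a target built as ADD "xy", a COPY of "abc" from the source, then two chained target copies, `flatten` recurses two levels deep for the last copy and emits the "xy" ADD three times, which reproduces the duplicate adds the TODO mentions.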

GoogleCodeExporter commented Mar 24, 2015

Now that the algorithm is at least correct, the merge command needs finishing
touches: preserving window checksums and application headers.  Stay tuned.  I'll
probably handle some other pending issues before I solve the TODO above.

Original comment by josh.mac...@gmail.com on 7 Sep 2008 at 9:48

GoogleCodeExporter commented Mar 24, 2015

3.0u

Original comment by josh.mac...@gmail.com on 13 Sep 2008 at 1:22

  • Changed state: Fixed

GoogleCodeExporter commented Mar 24, 2015

Looks like my original request (source from stdin so I can use a pipeline) was
recently implemented too, wasn't it?

Original comment by nicolas....@gmail.com on 9 Dec 2011 at 8:04
