ten megabytes per minute #2

Closed
jaredjennings opened this Issue Jun 5, 2013 · 11 comments

Owner

jaredjennings commented Jun 5, 2013

After the unit tests all passed, and the -n switch was changed to mean "don't actually burn discs," I ran a test backup of a directory containing a rip of a CD. 700 MiB of data, on 30 MiB "discs," with no compression, AES encryption, and redundancy, took... well, I forgot what, but the rate worked out to ten megabytes per minute, and subjectively most of the time was spent running par2. And this was not even counting the time it takes to burn discs.

My aim is for darbrrb to back up hundreds of gigabytes of data, and to do it in less than a week.

Owner

jaredjennings commented Jun 19, 2013

par2-tbb is par2cmdline modified to use Intel Threading Building Blocks. It seems this software can work on multiple files at once. But I've never heard of TBB and I'm not keen on using it.

Owner

jaredjennings commented Jun 19, 2013

Gibraltar uses CUDA to perform Reed-Solomon encoding and decoding on a GPU rather than the CPU. Besides the impressive results, another great thing about this paper is the quick introduction it gives to how Reed-Solomon coding is commonly done on CPUs.

Owner

jaredjennings commented Jun 19, 2013

FFLAS appears to be an optimized way to do finite-field matrix and vector operations. It may offer a more highly optimized means of doing the Reed-Solomon arithmetic than par2cmdline presently has.

Owner

jaredjennings commented Jun 19, 2013

I feel an itch to make a par2 utility in Haskell. I'm trying to ignore it, and think toward working with par2cmdline instead, but the PAR2 format is so well specified that writing another program that produces it is tempting.

par2cmdline is written in about 13,000 lines of C++. I think a PAR2 utility could be written in fewer lines of Haskell, but neither FFLAS nor PARI appears to have Haskell bindings yet, so there would be a detour to get the tools ready.

Owner

jaredjennings commented Jun 19, 2013

Looked at FFLAS and par2cmdline's reedsolomon.cpp and galois.h. reedsolomon.cpp is right down in the weeds, XORing things, building lookup tables, and so on. That's the part that would need to be replaced with calls to FFLAS, I think.
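For reference, the kind of lookup-table arithmetic reedsolomon.cpp does by hand can be sketched like this. For brevity the sketch uses GF(2^8) with the common 0x11d generator polynomial; PAR2 itself uses GF(2^16) with a different polynomial, so the field and polynomial here are illustrative assumptions, not par2cmdline's actual values.

```python
# Minimal log/antilog-table multiplication in GF(2^8).
POLY = 0x11d  # a standard primitive polynomial for GF(2^8); an assumption here

exp = [0] * 512   # antilog table, doubled so gf_mul can skip a mod-255 step
log = [0] * 256
x = 1
for i in range(255):
    exp[i] = x
    log[x] = i
    x <<= 1
    if x & 0x100:   # overflowed 8 bits: reduce by the generator polynomial
        x ^= POLY
for i in range(255, 512):
    exp[i] = exp[i - 255]

def gf_mul(a, b):
    """Multiply two GF(2^8) elements via table lookups."""
    if a == 0 or b == 0:
        return 0
    return exp[log[a] + log[b]]
```

This table-driven multiply is exactly the sort of inner loop that a library like FFLAS would subsume into optimized matrix routines.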

Looked at PARI documentation. It was suggested because it's not made out of C++ templates, which would be difficult to bind to Haskell. But instead of embodying complexities of types in C++ templates, it swallows the complexities into its C code and interfaces. Not something that can be easily and quickly bound to Haskell.

Those StackOverflow questions linked above were asked by Johannes Weiß, who appears to have settled on NTL for doing whatever he was doing with finite fields.

A possibility I've overlooked so far is not binding the whole finite field library to Haskell, but writing a Reed-Solomon algorithm using the library and binding that into Haskell. That would be hard enough for me.

Owner

jaredjennings commented Jun 28, 2013

I ran another test. First I made 200MB of random data, in 20 files.

for i in `seq 1 20`; do dd if=/dev/urandom of=big$(printf %02d $i) bs=1048576 count=10; done

Then I backed it up using darbrrb, with 30MB discs, 3 data discs per set, 2 parity discs per set, 10 slices per disc. This took 770 seconds, and when I did it again, it took 770 seconds again. This was under Debian GNU/Linux i386. When I reinstalled the same machine with Debian GNU/Linux amd64, it took 593 and 590 seconds to do it twice again. These figures work out to something like 15 MB/min and 20 MB/min, respectively. Still not what they need to be, but better.
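The arithmetic behind those figures, as a quick sketch (the helper name is made up):

```python
# 200 MB in 770 s (i386) vs. 593 s (amd64), converted to MB per minute.
def mb_per_min(megabytes, seconds):
    """Convert a run of `megabytes` completed in `seconds` to MB/min."""
    return megabytes * 60 / seconds

i386_rate = mb_per_min(200, 770)   # about 15.6 MB/min
amd64_rate = mb_per_min(200, 593)  # about 20.2 MB/min
```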

Owner

jaredjennings commented Aug 26, 2013

A careful look through the Gibraltar paper yields a footnote I overlooked before: "Gibraltar is available at http://www.cis.uab.edu/hpcl/gibraltar, along with sample applications to test it."

Owner

jaredjennings commented Jun 24, 2015

After a lengthy hiatus, I found that while PAR2 specifies the use of a 16-bit Galois field, Gibraltar uses an 8-bit Galois field. I didn't want to change Gibraltar, because the paper goes into detail about how its speed was optimized; but I didn't want to sprout my own file format either, because PAR2 has been established for 10 years. When I decided to make my own file format anyway, I found that 33 commits have happened to par2cmdline since I forked it, and that several of them fix ways to send par2cmdline into an infinite loop with a maliciously crafted par2 file. This made me realize that there may be more pitfalls in making one's own file format than I reckoned on.

I hacked on Gibraltar, and arrived at jaredjennings/libgibraltar@735f7c5. I found that the log/antilog tables necessary for 16-bit Galois field arithmetic, which are 65536 entries long instead of 256, and are made of 16-bit numbers rather than 8-bit numbers, don't fit in the shared memory of my graphics card. I could be five feet away from a solution to that problem and not know it, because I haven't learned CUDA.
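To put numbers on that: doubling the field's bit width squares the table length and doubles each entry's width. A quick sketch of the arithmetic (the 48 KiB shared-memory figure is an assumption, typical of CUDA cards of that era, not a measurement of my card):

```python
# Why GF(2^16) log/antilog tables outgrow GPU shared memory.
def table_bytes(bits):
    entries = 2 ** bits                 # one entry per field element
    entry_size = 1 if bits <= 8 else 2  # bytes per entry
    return 2 * entries * entry_size     # one log table + one antilog table

SHARED_MEM = 48 * 1024  # assumed shared memory per block, in bytes

table_bytes(8)   # 512 bytes: fits easily
table_bytes(16)  # 262144 bytes (256 KiB): over five times the 48 KiB budget
```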

Now it appears that, to get fast redundancy calculations, I need my own file format and tool to read and write it, and the best I can do is to look at the history and tests of par2cmdline in an attempt to avoid duplicating too much work.

Owner

jaredjennings commented Feb 7, 2016

I thought I had a breakthrough with Gibraltar and par2cmdline, but then I found I didn't, and came back around to needing a new file format. After much handwringing, paper-reading, googling, and more paper-reading, I've found that cooler, newer codes exist than Reed-Solomon, which may be easier to compute. Nope, just kidding, they're only faster because you implement them in hardware. And there are so many patents around error-correcting codes that it looks like a minefield to try to implement good new software that will do it. I mean, some guy used Intel's SIMD operations (SSE2 or so, I assume) to make a speed breakthrough with Reed-Solomon, but then got a patent smackdown.

Then I realized that all the features PAR2 has over PAR1 are not being used by darbrrb. I fetched parchive, and modified darbrrb slightly to use it instead of par2. I tested the resulting darbrrb on 1.5GB of random data, and found it to be around five and a half times faster than par2.

Since I filed this ticket, I've gotten a new computer, so par2 goes 34 MB/min. parchive went 184 MB/min. Ah, now I see the test above, and will run exactly that one.

Owner

jaredjennings commented Feb 7, 2016

OK - same test as above, with par2, 345 seconds; with parchive (PAR1), 60 seconds flat. That's 34.7 MB/min with par2 and 200 MB/min with parchive. (The discrepancy for parchive with the above comment may be explained by the idea that parchive is no longer the dominant influence on the time taken; perhaps dar did something faster when it had smaller data. I don't know.)

Owner

jaredjennings commented Feb 7, 2016

OK - there have been several commits to go back to parchive (PAR1), most notably fd0092e. I'm going to call this fixed.
