Consider replacing MD5 with CRC32 for quick check #2396
Comments
I have started looking into the par2 code in SAB and how to get the slices. I haven't looked at it much before, so it may take some time. Just wanted to let you all know so we don't do the same work.
I had a look through NZBGet and it appears to implement what's discussed here. Apart from repairs, the MD5 calculation must be the most demanding job SAB does, so I'm optimistic this could be a large improvement, especially for slower devices. Some of the comments on the function below may have some useful information or things to watch out for.
I guess because I never understood that par2 contains CRC, I always misunderstood when hugbug said that they use CRC to do quick checking. I thought he was solely talking about per-article CRC instead of combining them!
First seemingly working version: https://github.com/puzzledsab/sabnzbd/tree/feature/block_hash I did not try to calculate the file CRC out of order in the decoder because doing it in the assembler seems simpler. I would have had to use index to get the article index in the array, and there were more calculations involved. Maybe if we start using sparse files. @Safihre: The new crc32.py has the standard SABnzbd GPL header. Maybe I should remove it, considering it's @animetosho's code?
@puzzledsab nice, does it work? 🤩
It generates the correct CRC32 file values from the PAR2 file, and the quick check says the files are all ok. Unless I'm missing something, it should be working. I don't have any nzbs with broken files for testing. I kept the article.md5sum working like before, but using the CRC32 file value. The PAR2 code is a bit complex and used in multiple contexts, so I didn't want to make more changes than I had to. From what I understand, you intend to change it before the 4.0 release. Edit: It detected the bad file when I deleted the last part of a file in an nzb.
If you can extend the tests in …
Nice work @puzzledsab! I generally write code as public domain, so feel free to include/use/modify/license it in any way you see fit. Sabyenc could probably return both the CRC and the file begin or end offsets, which would allow the CRC to be merged in the download code. There'd be a bit of an assumption that the begin/end value in the yEnc header is accurate, but if not, the computed hash would be wrong. @Safihre do you want to keep the …
@animetosho why do we need the file offsets? I don't understand the CRC fully, but in @puzzledsab's Python version we don't seem to need them either?
It's not strictly necessary, as you point out. The first example in the first post shows how the CRC can be accumulated out of order. To do the accumulation, you'll need to know the offset of the data the CRC refers to. Edit: thinking about it, if true threading becomes a thing in Python, aggregating at the end might be easier anyway. I see puzzledsab's code also avoids computing …
Thanks!
From what I can tell, there's a check on the size. Although it's not guaranteed to be the same, I'd expect it to be most of the time, so the check should usually hit the cached value.
Ah I missed that. Thanks. |
Bitwise logic, like that for CRC, generally codes nicely in lower level languages, though it's often not too different in higher level languages. Note that I don't code much in Python, so everyone here would definitely know more than I do. Performance-wise, a few hundred cycles of bit manipulation per article doesn't really concern me if done in Python, but profiling will give you a more definitive answer.
I already did some testing to find out if the caching was worth it. Seconds for 100k rounds, CRCs calculated from random 750KB blocks:
Found this earlier; I never knew there are two CRC32 variants (crc32a/b). Not that it should matter for our needs (as we're staying somewhat within a small ecosystem bubble and not using external tools).
That's a bit slower than I was expecting (though that is effectively over 72GB of data) - thanks for testing! Since there's some interest in speed, here's a faster `crc_2pow`:

```python
# compute 2**n in the CRC32 field, without a lookup table
# (square-and-multiply over the 32 bits of n; 0x80000000 represents 1)
def crc_2pow_slow(n: int):
    k = 0x80000000
    for bit in range(0, 32):
        k = crc_multiply(k, k)  # square
        if n & (0x80000000 >> bit):
            k = (k >> 1) ^ (CRC32_POLYNOMIAL & -(k & 1))  # multiply by 2 (i.e. x)
    return k

# 8 tables of 16 entries each, covering n one nibble at a time
CRC32_POWER_TABLE = [
    [crc_2pow_slow(v << (tbl * 4)) for v in range(0, 16)]
    for tbl in range(0, 8)
]

# compute 2**n using the precomputed tables
def crc_2pow(n: int):
    result = CRC32_POWER_TABLE[0][n & 15]
    n >>= 4
    tbl = 1
    while n:
        if n & 15:
            result = crc_multiply(result, CRC32_POWER_TABLE[tbl & 7][n & 15])
        n >>= 4
        tbl += 1
    return result
```

(not sure whether you prefer pre-computed tables, or runtime computed)

From my quick test, it's around 2.5x the speed of the original.
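The snippet above relies on a `crc_multiply` helper and a `CRC32_POLYNOMIAL` constant defined earlier in the discussion but not reproduced in this quote. A minimal sketch consistent with the reflected-CRC convention used above (it needs to be defined before `CRC32_POWER_TABLE` is built):

```python
CRC32_POLYNOMIAL = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib and PAR2

def crc_multiply(a: int, b: int) -> int:
    # carry-less multiplication of two CRC values modulo the polynomial,
    # in the reflected bit order used above (0x80000000 represents 1)
    result = 0
    for _ in range(32):
        if a & 0x80000000:
            result ^= b
        a = (a << 1) & 0xFFFFFFFF
        b = (b >> 1) ^ (CRC32_POLYNOMIAL & -(b & 1))
    return result
```

A quick consistency check between the two versions:

```python
for n in (0, 1, 8, 64, 12345, 6_000_000):
    assert crc_2pow(n) == crc_2pow_slow(n)
```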
If you're checking against PAR2's CRC32, you have to use the same polynomial as that. So we don't actually have a choice in the matter.
I don't think it matters too much considering it normally runs as a server. I assume it will only be calculated at startup.
It was just under 2x faster in my tests, but that's pretty good too.
Would it be more efficient to do the crc_multiply in sabyenc? I don't know if crcutil provides any of the required functions. @animetosho I did notice cloudera/crcutil#3; should that be fixed? IsSSE42Available() returns false for me when it shouldn't, but I've no idea if the suggested fix is correct.
@Safihre: Do you know if the distributions can use mypyc? I think the bpsmeter could benefit from it too. I have modified it locally so that the only thing stopping it from compiling is that mysterious T() function for translations.
@animetosho maybe... I looked into it but found no straightforward way.
I was mostly thinking about how crc32calc.py would benefit from it. What about including the size comparison? It would be very easy to add a counter to the assembler, but it would require another nzf variable. Do you prefer reading it from the filesystem?
What's the Python <-> C transition overhead like? With …
I think they've got most things implemented.
It's pointless for us regardless, as the … The CLMUL instruction could be useful, but was introduced after crcutil was written.
I suspect 0 is actually a valid CRC, so if possible, it's probably best to avoid using that as a special marker. Edit: example:
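(The linked example didn't survive extraction; as a simple stand-in, the empty input already hashes to 0:)

```python
import zlib

# 0 is a perfectly legitimate CRC32 value: the empty string produces it,
# and for any prefix there is some 4-byte suffix that forces the CRC to 0
assert zlib.crc32(b"") == 0
```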
I suspected that too, but it's just so much easier, and all that happens on a false positive is that the par2 check is run. For instance, concatenating 0 and a new CRC32 value gives the new value, so I can initialize them to 0 and start concatenating without checking every time whether they need to be initialized. It's not a big deal to change it, though. We'll need some other way of signalling a bad CRC from sabyenc.
Sorry, I was more referring to this code. Perhaps it could return …
Considering this implemented! Will fix the last comment of @animetosho in the next sabyenc update :)
Thanks for everyone's contributions! |
Particularly you, @animetosho :) I tried importing zlib's crc32_combine as described here: https://stackoverflow.com/questions/35259527/using-zlib-crc32-combine-in-python It was only marginally faster than your optimized Python code for the full calculation, and much slower for the precalculated one. It's a bit strange, considering the mypyc version of your code is 10x faster. Maybe it's the way it is imported, but still...
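For reference, the approach in that Stack Overflow answer boils down to calling the C zlib's `crc32_combine` through ctypes. A minimal sketch (the library lookup and the width of `z_off_t` vary by platform, and some builds only export `crc32_combine64`):

```python
import ctypes
import ctypes.util
import zlib

_libz = ctypes.CDLL(ctypes.util.find_library("z"))
_libz.crc32_combine.argtypes = [ctypes.c_ulong, ctypes.c_ulong, ctypes.c_long]
_libz.crc32_combine.restype = ctypes.c_ulong

a, b = b"hello ", b"world"
combined = _libz.crc32_combine(zlib.crc32(a), zlib.crc32(b), len(b))
assert combined == zlib.crc32(a + b)
```

Every call crosses the Python/C boundary via ctypes, which has noticeably more per-call overhead than a compiled extension module; that overhead could account for the numbers above.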
@puzzledsab if you're curious, try https://github.com/mnightingale/sabyenc/tree/feature/crc32_combine - it's barely any effort to add it to sabyenc. I even used the faster method definition to have as little overhead as possible. https://gist.github.com/mnightingale/9f03901063dcf66510a20f0acb8e8d6c

500K rounds:

```
python:  17.913410699999076
crcutil:  0.7693139000002702
```
@Safihre: We should probably use that. The same test as before took 0.0684 seconds.
Nice. Indeed, let's add it there, but with a tiny bit of input verification, just to prevent a segfault if, due to some bug, one of the 2 values is …
That's surprising. zlib, crcutil and my original Python version look to be largely the same algorithm. Maybe there's a fair bit of FFI overhead with the zlib test.
To not stray too far off topic in a previous conversation, I've decided to move the discussion here.
Summary so far:

- `assemble` computes the MD5 of the file it's assembling. MD5 computation may become a performance limiter in the future (and already consumes a fair bit of CPU).
- PAR2 files already contain a CRC32 for each input block (in the IFSC packets), and per-article CRC32s can be combined into a whole-file CRC32.

Essentially this could eliminate the need to compute the MD5 hash of each file, whilst still verifying the data against what's specified in the PAR2. The downside is that CRC32 is a less reliable hash than MD5.
If we're going to experiment, I thought I'd point the following out, in case it's missed:
When assembling the CRC from decoded articles, they don't need to be assembled in order (unlike MD5). This is handy since articles don't necessarily arrive in order, and there's no need to keep the article CRCs around.
Example:
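A sketch of the idea, reusing the `crc_multiply`/`crc_2pow` helpers quoted earlier in the thread (the data layout below is made up for illustration): each article's CRC is shifted by the number of bits that follow it in the file, and the shifted terms can then be XORed together in any order.

```python
import zlib

# assumes crc_multiply() and crc_2pow() from the snippets earlier in the thread
data = bytes(range(256)) * 3000
articles = [
    (0, data[:100_000]),
    (100_000, data[100_000:250_000]),
    (250_000, data[250_000:]),
]
file_size = len(data)

file_crc = 0
for offset, payload in reversed(articles):  # deliberately out of order
    trailing_bits = (file_size - offset - len(payload)) * 8
    file_crc ^= crc_multiply(zlib.crc32(payload), crc_2pow(trailing_bits))

assert file_crc == zlib.crc32(data)
```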
When assembling the CRC from PAR2's IFSC, the block size is constant across the entire PAR2. If you look at the definition of `crc_concat`: since `len2` is constant, it follows that `crc_2pow(len2*8)` is also constant. So as a simple optimisation, it only needs to be computed once, and can be reused in the loop.

Example:
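Neither the `crc_concat` definition nor the example made it into this copy of the post, so here is a sketch of both based on the formula described above. `ifsc_block_crcs` and `block_size` are illustrative names, and note that PAR2 computes slice CRCs over zero-padded blocks, so a file's final partial block needs extra care.

```python
def crc_concat(crc1: int, crc2: int, len2: int) -> int:
    # CRC32 of block1+block2, given crc1 = crc32(block1),
    # crc2 = crc32(block2) and len2 = len(block2) in bytes
    return crc_multiply(crc1, crc_2pow(len2 * 8)) ^ crc2

# naive: recomputes crc_2pow(block_size * 8) for every block
file_crc = 0
for block_crc in ifsc_block_crcs:
    file_crc = crc_concat(file_crc, block_crc, block_size)

# optimised: the block size is constant across the whole PAR2,
# so the shift factor only needs to be computed once
shift = crc_2pow(block_size * 8)
file_crc = 0
for block_crc in ifsc_block_crcs:
    file_crc = crc_multiply(file_crc, shift) ^ block_crc
```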
(In theory, `crc_multiply` could be made faster if one of the coefficients is constant, but that optimisation is probably not necessary here.)