I was tinkering with the examples and was a bit surprised that a single bit error can corrupt the decoding. The README does mention this: "If values have changed in a shard, it cannot be reconstructed".
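For anyone who wants to reproduce this, here is a minimal sketch, assuming the library in question is github.com/klauspost/reedsolomon (which the quoted README line appears to come from). A single flipped bit makes `Verify` fail, and since `Reconstruct` only fills in missing (nil) shards, silent corruption poisons the decoded output:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 4 data shards + 2 parity shards, matching the example defaults.
	enc, err := reedsolomon.New(4, 2)
	if err != nil {
		panic(err)
	}

	data := bytes.Repeat([]byte("hello reed-solomon "), 100)
	shards, err := enc.Split(data)
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	ok, _ := enc.Verify(shards)
	fmt.Println("verify before bit flip:", ok) // true

	// Flip a single bit in one shard.
	shards[0][10] ^= 0x01

	// Verify detects that *something* changed, but cannot say which
	// shard is bad, so the corruption cannot be repaired from here.
	ok, _ = enc.Verify(shards)
	fmt.Println("verify after bit flip:", ok) // false
}
```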
I was thinking it might be useful to have an example closer to a real-world use case.
The example README does mention hashing the shards so you can delete the corrupt ones. But in the default example (4 data shards + 2 parity), even with a hash of each shard, 3 single-bit errors landing in 3 different shards would force you to discard all 3, which is more than the 2 losses the code can tolerate, so decoding fails.
So I was thinking of adding a SHA-256 per 64KB block. That way any bit corruption would be detected at block granularity, and that particular block could be reconstructed from the other 5 shards.
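Something like the sketch below, using only the standard library. `blockHashes` and `corruptBlocks` are hypothetical helper names for illustration; the idea is to store one digest per 64KB block alongside each shard, then nil out only the shard slices for blocks that fail verification and let reconstruction handle the rest:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const blockSize = 64 * 1024 // 64KB

// blockHashes (hypothetical helper) returns one SHA-256 digest per 64KB
// block of a shard. The digests would be stored alongside the shard:
// 32 bytes per 65536 bytes, roughly 0.05% overhead.
func blockHashes(shard []byte) [][32]byte {
	hashes := make([][32]byte, 0, (len(shard)+blockSize-1)/blockSize)
	for off := 0; off < len(shard); off += blockSize {
		end := off + blockSize
		if end > len(shard) {
			end = len(shard)
		}
		hashes = append(hashes, sha256.Sum256(shard[off:end]))
	}
	return hashes
}

// corruptBlocks (hypothetical helper) compares a shard against its stored
// digests and reports the indices of blocks that no longer match. For each
// corrupt block, the caller would treat that shard as missing for the
// affected stripe and rebuild it from the other five shards.
func corruptBlocks(shard []byte, stored [][32]byte) []int {
	var bad []int
	for i, h := range blockHashes(shard) {
		if h != stored[i] {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	shard := make([]byte, 4*blockSize)
	stored := blockHashes(shard)
	shard[blockSize+5] ^= 0x01 // single bit error in block 1
	fmt.Println("corrupt blocks:", corruptBlocks(shard, stored)) // [1]
}
```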
I wrote a little simulator to inject errors, and I believe that a 2GB file encoded with the defaults (4 data shards + 2 parity shards) would survive, on average, between 280 and 1200 64KB-block errors before any part of the file is corrupted. Quite a bit better than 3 bit errors. Storage overhead would be roughly 0.05% (32 hash bytes per 64KB block). Additionally, even with 8192 errors, 95% of the file would be recoverable.
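For reference, a minimal version of such a simulator could look like the following. It assumes errors land uniformly at random over the shard blocks of a 2GB file (8192 stripes of 64KB across 6 shards), and that per-block hashes let each stripe tolerate up to 2 corrupt blocks; exact numbers will vary run to run:

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const (
		stripes   = 8192 // 2GB / 4 data shards / 64KB blocks per shard
		shards    = 6    // 4 data + 2 parity
		tolerated = 2    // corrupt blocks a stripe can lose and still decode
		trials    = 1000
	)

	lo, hi, sum := stripes*shards, 0, 0
	for t := 0; t < trials; t++ {
		// Corrupt distinct (shard, block) cells in random order until
		// some stripe accumulates tolerated+1 bad blocks.
		hits := make([]int, stripes)
		survived := 0
		for _, c := range rand.Perm(stripes * shards) {
			s := c % stripes
			hits[s]++
			if hits[s] > tolerated {
				break
			}
			survived++
		}
		sum += survived
		if survived < lo {
			lo = survived
		}
		if survived > hi {
			hi = survived
		}
	}
	fmt.Printf("64KB-block errors survived per trial: min=%d mean=%d max=%d\n",
		lo, sum/trials, hi)
}
```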
Would you accept a pull request to add SHA-256 block checksums to your simple encoder/decoder?