Commit

refactor/rewrite the cli
laktak committed Dec 21, 2023
1 parent 806ceb1 commit 2159f0b
Showing 18 changed files with 695 additions and 348 deletions.
3 changes: 0 additions & 3 deletions Pipfile
@@ -7,6 +7,3 @@ name = "pypi"
blake3 = ">=0.3.4"

[dev-packages]

[requires]
python_version = "3.11"
6 changes: 2 additions & 4 deletions Pipfile.lock

Generated file; diff not rendered.

58 changes: 39 additions & 19 deletions README.md
@@ -22,30 +22,34 @@ Some cloud providers re-encode your videos or compress your images to save space

## Installation

The easiest way to install python CLI tools is with [pipx](https://pipx.pypa.io/latest/installation/).

```
pip install --user chkbit
pipx install chkbit
```

Or in its own environment:
You can also use pip:

```
pipx install chkbit
pip install --user chkbit
```

**NOTE** version 3 now uses the blake3 hash algorithm by default as it is not only better but also faster than md5.

## Usage

Run `chkbit -u PATH` to create/update the chkbit index.

chkbit will

- create a `.chkbit` index in every subdirectory of the path it was given.
- update the index with md5/sha512/blake3 hashes for every file.
- update the index with blake3 (see --algo) hashes for every file.
- report damage for files that failed the integrity check since the last run (check the exit status).

Run `chkbit PATH` to verify only.

```
usage: chkbit [-h] [-u] [--algo ALGO] [-f] [-i] [-s] [-w N] [-q] [-v] [PATH ...]
usage: chkbit [-h] [-u] [--algo ALGO] [-f] [-s] [--index-name NAME] [--ignore-name NAME] [-w N] [--plain] [-q] [-v] [PATH ...]
Checks the data integrity of your files. See https://github.com/laktak/chkbit-py
@@ -54,12 +58,14 @@ positional arguments:
options:
-h, --help show this help message and exit
-u, --update update indices (without this chkbit will only verify files)
--algo ALGO hash algorithm: md5, sha512, blake3
-u, --update update indices (without this chkbit will verify files in readonly mode)
--algo ALGO hash algorithm: md5, sha512, blake3 (default: blake3)
-f, --force force update of damaged items
-i, --verify-index verify files in the index only (will not report new files)
-s, --skip-symlinks do not follow symlinks
-w N, --workers N number of workers to use, default=5
--index-name NAME filename where chkbit stores its hashes (default: .chkbit)
--ignore-name NAME filename that chkbit reads its ignore list from (default: .chkbitignore)
-w N, --workers N number of workers to use (default: 5)
--plain show plain status instead of being fancy
-q, --quiet quiet, don't show progress/information
-v, --verbose verbose output
@@ -74,7 +80,7 @@ Status codes:
EXC: internal exception
```

chkbit is set to use only 5 workers by default so it will not slow your system to a crawl. You can specify a higher number to make it a lot faster (requires about 128kB of memory per worker).
chkbit is set to use only 5 workers by default so it will not slow your system to a crawl. You can specify a higher number to make it a lot faster if the IO throughput can also keep up.
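
As a rough illustration of the bounded worker count described above, the following sketch hashes several inputs with a fixed-size thread pool. It assumes a `ThreadPoolExecutor`; chkbit's own worker implementation is not shown in this hunk, and `digest` and `hash_all` are hypothetical names. `sha512` from the standard library stands in for blake3.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data: bytes) -> str:
    # sha512 from the standard library stands in for blake3 here.
    return hashlib.sha512(data).hexdigest()


def hash_all(blobs, workers=5):
    """Hash every blob using at most `workers` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blobs))


results = hash_all([b"foo1", b"foo2", b"foo3"], workers=2)
print(len(results))
```

Raising `workers` only helps while the disk can keep feeding data to the threads, which matches the IO caveat in the README text.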

## Repair

@@ -123,7 +129,7 @@ When you run it again it first checks the modification time,

### I wish to use a stronger hash algorithm

chkbit now supports sha512 and blake3. You can specify it with `--algo sha512` or `--algo blake3`.
chkbit now uses blake3 by default. You can also specify it with `--algo sha512` or `--algo md5`.

Note that existing index files will use the hash that they were created with. If you wish to update all hashes you need to delete your existing indexes first.

@@ -145,19 +151,30 @@ Create test and set the modified time:
```
$ echo foo1 > test; touch -t 201501010000 test
$ chkbit -u .
add ./test
Processed 1 file(s).
Indices were updated.
new ./test
Processed 1 file.
- 192.31 files/second
- 0.00 MB/second
- 1 directory was updated
- 1 file hash was added
- 0 file hashes were updated
```
`add` indicates the file was added.

`new` indicates a new file was added.

Now update test with a new modified:
```
$ echo foo2 > test; touch -t 201501010001 test # update test & modified
$ chkbit -u .
upd ./test
Processed 1 file(s).
Indices were updated.
Processed 1 file.
- 191.61 files/second
- 0.00 MB/second
- 1 directory was updated
- 0 file hashes were added
- 1 file hash was updated
```

`upd` indicates the file was updated.
@@ -167,10 +184,13 @@ Now update test with the same modified to simulate damage:
$ echo foo3 > test; touch -t 201501010001 test
$ chkbit -u .
DMG ./test
Processed 0 file(s).
Processed 1 file.
- 173.93 files/second
- 0.00 MB/second
chkbit detected damage in these files:
./test
error: detected 1 file(s) with damage!
error: detected 1 file with damage!
```

`DMG` indicates damage.
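
The three runs in this walkthrough can be summarised as a small decision function. This is only a sketch inferred from the README example, not chkbit's actual code: `classify`, the `Entry` tuple, and the `ok` label are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical index entry: (modification time, hex digest) from the last run.
Entry = Tuple[int, str]


def classify(old: Optional[Entry], mtime: int, digest: str) -> str:
    """Return a status label for one file, mirroring the README walkthrough."""
    if old is None:
        return "new"  # no record yet: the file is added to the index
    old_mtime, old_digest = old
    if digest == old_digest:
        return "ok"  # content unchanged (label assumed for this sketch)
    if mtime != old_mtime:
        return "upd"  # content and mtime both changed: a regular edit
    return "DMG"  # content changed but mtime did not: likely damage


print(classify(None, 201501010000, "hash-of-foo1"))
print(classify((201501010000, "hash-of-foo1"), 201501010001, "hash-of-foo2"))
print(classify((201501010001, "hash-of-foo2"), 201501010001, "hash-of-foo3"))
```

The key point is the last branch: a changed hash with an unchanged modification time is what the README reports as damage.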
9 changes: 0 additions & 9 deletions chkbit.py

This file was deleted.

5 changes: 3 additions & 2 deletions chkbit/__init__.py
@@ -1,4 +1,5 @@
from chkbit.status import Status
from chkbit.context import Context
from chkbit.hashfile import hashfile, hashtext
from chkbit.index import Index, Stat
from chkbit.indexthread import IndexThread
from chkbit.index import Index
from chkbit.index_thread import IndexThread
31 changes: 28 additions & 3 deletions chkbit/context.py
@@ -1,10 +1,35 @@
import queue
from chkbit import Status


class Context:
def __init__(self, verify_index, update, force, hash_algo, skip_symlinks):
self.verify_index = verify_index
self.update = update
def __init__(
self,
*,
num_workers=5,
force=False,
update=False,
hash_algo="blake3",
skip_symlinks=False,
index_filename=".chkbit",
ignore_filename=".chkbitignore",
):
self.num_workers = num_workers
self.force = force
self.update = update
self.hash_algo = hash_algo
self.skip_symlinks = skip_symlinks
self.index_filename = index_filename
self.ignore_filename = ignore_filename

self.result_queue = queue.Queue()
self.hit_queue = queue.Queue()

if hash_algo not in ["md5", "sha512", "blake3"]:
raise Exception(f"{hash_algo} is unknown.")

def log(self, stat: Status, path: str):
self.result_queue.put((0, stat, path))

def hit(self, *, cfiles: int = 0, cbytes: int = 0):
self.result_queue.put((1, cfiles, cbytes))
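
The new `log` and `hit` methods both write tagged tuples to `result_queue`: tag `0` for a status line, tag `1` for progress counters. A consumer might separate them roughly as follows. `MiniContext` is a reduced stand-in for the class above and `drain` is a hypothetical helper; the consumer in this commit is not shown here, and plain strings replace the `Status` enum.

```python
import queue


class MiniContext:
    """Stand-in for Context, reduced to the two producer methods."""

    def __init__(self):
        self.result_queue = queue.Queue()

    def log(self, stat, path):
        self.result_queue.put((0, stat, path))

    def hit(self, *, cfiles=0, cbytes=0):
        self.result_queue.put((1, cfiles, cbytes))


def drain(q):
    """Consume all queued items; return (log_lines, total_files, total_bytes)."""
    lines, files, nbytes = [], 0, 0
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        if item[0] == 0:  # status message
            _, stat, path = item
            lines.append(f"{stat} {path}")
        else:  # progress counters
            _, cfiles, cbytes = item
            files += cfiles
            nbytes += cbytes
    return lines, files, nbytes


ctx = MiniContext()
ctx.log("new", "./test")
ctx.hit(cfiles=1)
ctx.hit(cbytes=5)
print(drain(ctx.result_queue))
```

Sending both kinds of message through one queue keeps status lines and counters in the order the workers produced them.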
12 changes: 8 additions & 4 deletions chkbit/hashfile.py
@@ -1,11 +1,12 @@
import hashlib
from typing import Callable


BLOCKSIZE = 2**10 * 128 # kb


def hashfile(path, hash_algo=None):
if not hash_algo or hash_algo == "md5":
def hashfile(path: str, hash_algo: str, *, hit: Callable[[str], None]):
if hash_algo == "md5":
h = hashlib.md5()
elif hash_algo == "sha512":
h = hashlib.sha512()
@@ -14,14 +15,17 @@ def hashfile(path, hash_algo=None):

h = blake3()
else:
raise Exception(f"{hash_algo} is unknown.")
raise Exception(f"algo '{hash_algo}' is unknown.")

with open(path, "rb") as f:
while True:
buf = f.read(BLOCKSIZE)
if len(buf) <= 0:
l = len(buf)
if l <= 0:
break
h.update(buf)
if hit:
hit(l)
return h.hexdigest()
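
The pattern introduced here, reading a file in fixed-size blocks and reporting each block's length to a callback, can be tried in isolation. The sketch below is not the project's function: `hash_with_progress` and `demo` are hypothetical names, `hashlib.new` replaces the explicit algorithm branches, and `sha512` is used so that no third-party package is needed. Note that the callback receives an integer byte count.

```python
import hashlib
import os
import tempfile

BLOCKSIZE = 2**10 * 128  # 128 KiB per read, as in the diff above


def hash_with_progress(path, algo="sha512", hit=None):
    """Hash a file block by block, calling hit(n) with each block's size."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while True:
            buf = f.read(BLOCKSIZE)
            if not buf:
                break
            h.update(buf)
            if hit:
                hit(len(buf))
    return h.hexdigest()


def demo():
    """Hash a temporary file slightly larger than one block."""
    seen = []
    data = b"x" * (BLOCKSIZE + 10)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        digest = hash_with_progress(tmp.name, hit=seen.append)
    finally:
        os.unlink(tmp.name)
    return digest, seen, data


digest, seen, data = demo()
print(sum(seen) == len(data))
```

Passing `seen.append` as the callback collects the block sizes, which is enough to drive a bytes-per-second display like the one in the new README output.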

