fix local-mode caching, update readme
ryan-williams committed Aug 24, 2023
1 parent 209340f commit 8a54fbc
Showing 5 changed files with 38 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -18,6 +18,6 @@ jobs:
           TWINE_USERNAME: __token__
           TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
         run: |
-          pip install setuptools twine
+          pip install setuptools twine wheel
           python setup.py sdist bdist_wheel
           twine upload dist/*
19 changes: 19 additions & 0 deletions README.md
@@ -3,11 +3,17 @@ Disk-space tree-maps and statistics

https://github.com/runsascoded/disk-tree/assets/465045/3d03e817-b72d-4ac3-993e-191c41927d0c

## Index
<!-- toc -->
- [Install](#install)
- [Examples](#examples)
    - [S3 bucket](#s3)
    - [Local directory](#local)
- [Notes](#notes)
    - [Caching](#caching)
    - [Performance](#performance)
    - [Max. entries](#max-entries)
    - [TUI-only mode](#tui-only)
<!-- /toc -->

## Install <a id="install"></a>
@@ -85,5 +91,18 @@ disk-tree -odisk-tree.html -csize disk-tree
![](screenshots/disk-tree%20repo%20screenshot.png)
(default color scale is "RdBu"; see Plotly options [here][plotly color scales], `-csize=<scale>` to configure)

## Notes <a id="notes"></a>

### Caching <a id="caching"></a>
`disk-tree` caches file stats in a SQLite database, defaulting to `~/.config/disk-tree/disk-tree.db` and a `1d` TTL (see `-C`/`--cache-path` and `-t`/`--ttl`, resp.).

### Performance <a id="performance"></a>
`disk-tree` is reasonably performant on S3 buckets (it caches the result of `aws s3 ls --recursive s3://…`, and hydrates its cache from there), but ["local mode"](#local) is slow, as it stats every file and directory in a given tree, in a single-threaded tree-traversal.

### Max. entries <a id="max-entries"></a>
Plotly treemaps fall over with too many elements; `-m`/`--max-entries` (default `10k`) determines the maximum number of nodes (files and directories) to attempt to render.

### TUI-only mode <a id="tui-only"></a>
If you omit the `-o<path>.html` in [the examples above](#examples), `disk-tree` will simply print the sizes of all children of the specified URL, and exit.

[plotly color scales]: https://plotly.com/python/builtin-colorscales/#builtin-sequential-color-scales
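The Performance note says local mode stats every file and directory in a single-threaded tree traversal. A minimal sketch of that kind of traversal, using the standard library's `os.scandir`, shows where the cost comes from: each directory is listed and each file is stat'ed, strictly one after another. The `Node` class and `walk` function are hypothetical illustrations (disk-tree's own traversal is not shown in this diff), and skipping symlinks is an assumption made here to avoid cycles.

```python
import os
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical per-directory summary (not disk-tree's `File` model)."""
    path: str
    size: int = 0             # total bytes of regular files underneath
    num_descendants: int = 0  # files + directories underneath, excluding self
    children: List["Node"] = field(default_factory=list)


def walk(path: str) -> Node:
    """Visit every entry under `path` depth-first, one at a time (no threads)."""
    node = Node(os.path.abspath(path))
    with os.scandir(node.path) as entries:
        for entry in entries:
            if entry.is_symlink():
                continue  # assumption: skip symlinks to avoid cycles / double counting
            if entry.is_dir():
                child = walk(entry.path)  # recurse: subdirectories are processed serially
                node.children.append(child)
                node.size += child.size
                node.num_descendants += 1 + child.num_descendants
            elif entry.is_file():
                node.size += entry.stat().st_size  # one stat() call per file
                node.num_descendants += 1
    return node
```

Because every `scandir` and `stat` call waits on the filesystem before the next one starts, wall-clock time grows roughly with the number of entries in the tree.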
11 changes: 9 additions & 2 deletions disk_tree/cache.py
@@ -99,8 +99,15 @@ def compute_s3(self, url, bucket, root_key):
         #return aggd
         return S3.query.get((bucket, root_key))
 
-    def compute(self, path, now=None, fsck=False, excludes=None):
+    def compute_file(self, path, now=None, fsck=False, excludes=None):
         path = abspath(path)
+        record = self.get(path)
+        if record:
+            return record
+        else:
+            return self.compute(path, now=now, fsck=fsck, excludes=excludes)
+
+    def compute(self, path, now=None, fsck=False, excludes=None):
         if excludes and any(is_descendant(path, exclude) for exclude in excludes):
             err(f'skipping excluded: {path}')
             return None
@@ -201,7 +208,7 @@ def insert(self, file, commit=True):
             db.session.commit()
 
     def get(self, path):
-        existing = File.query.filter_by(path=path).first()
+        existing = File.query.get(path)
         if existing:
             now = dt.now()
             if now - existing.checked_at <= self.ttl:
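In `cache.py`, the new `compute_file` absolutizes the path, returns the cached record from `self.get` if there is one, and only otherwise falls back to `compute`; `get` now does a primary-key lookup and checks `checked_at` against `self.ttl`. The sketch below illustrates that cache-aside pattern with the standard library's `sqlite3` instead of SQLAlchemy. The `StatCache` class, its `file` table, the rule that an expired entry is treated like a miss, and the one-file `getsize` stand-in for the expensive computation are all assumptions for illustration, not disk-tree's implementation.

```python
import os
import sqlite3
from datetime import datetime, timedelta
from typing import Optional, Tuple

# (path, size in bytes, time the entry was last checked)
Record = Tuple[str, int, datetime]


class StatCache:
    """Hypothetical TTL'd stat cache; table and column names are illustrative."""

    def __init__(self, db_path: str = ":memory:", ttl: timedelta = timedelta(days=1)):
        self.conn = sqlite3.connect(db_path)
        self.ttl = ttl  # mirrors the README's default `1d` TTL
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS file ("
            "path TEXT PRIMARY KEY, size INTEGER NOT NULL, checked_at TEXT NOT NULL)"
        )

    def get(self, path: str) -> Optional[Record]:
        # Primary-key lookup, analogous to `File.query.get(path)` in the diff.
        row = self.conn.execute(
            "SELECT size, checked_at FROM file WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            return None
        size, checked_at = row[0], datetime.fromisoformat(row[1])
        if datetime.now() - checked_at <= self.ttl:
            return path, size, checked_at  # still fresh: serve from the cache
        return None  # expired (assumption: treat like a miss)

    def compute(self, path: str) -> Record:
        # Stand-in for the expensive step: here just one file's size.
        now = datetime.now()
        size = os.path.getsize(path)
        self.conn.execute(
            "INSERT OR REPLACE INTO file (path, size, checked_at) VALUES (?, ?, ?)",
            (path, size, now.isoformat()),
        )
        self.conn.commit()
        return path, size, now

    def compute_file(self, path: str) -> Record:
        # Cache-aside: normalize the key, try the cache, fall back to computing.
        path = os.path.abspath(path)
        return self.get(path) or self.compute(path)
```

Normalizing with `abspath` before the lookup is what lets `./foo` and its absolute spelling hit the same cached row.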
21 changes: 9 additions & 12 deletions disk_tree/main.py
@@ -1,21 +1,19 @@
-from urllib.parse import ParseResult
-
-from typing import Optional
-
-from click import argument, command, option
-from functools import partial
-from humanize import naturalsize
 from os import getcwd, makedirs, remove
 from os.path import abspath, exists
+
 import pandas as pd
 import plotly.express as px
-import sys
+from click import argument, command, option
+from functools import partial
+from humanize import naturalsize
 from re import fullmatch
 from subprocess import check_call
+import sys
 from sys import stderr
 from tempfile import NamedTemporaryFile
-
-from utz import basename, concat, DF, dirname, dt, env, process, singleton, sxs, urlparse, err
+from typing import Optional
+from urllib.parse import ParseResult
+from utz import basename, concat, DF, dirname, env, process, singleton, sxs, urlparse, err
 
 from disk_tree.config import SQLITE_PATH
 from disk_tree.db import init
@@ -83,7 +81,7 @@ def load_file(url: str, cache: 'Cache', fsck: bool = False, excludes: Optional[l
     if excludes:
         excludes = [ abspath(exclude) for exclude in excludes ]
         print(f'excludes: {excludes}')
-    root = cache.compute(root, fsck=fsck, excludes=excludes)
+    root = cache.compute_file(root, fsck=fsck, excludes=excludes)
     entries = root.descendants(excludes=excludes)
     keys = [ 'path', 'kind', 'size', 'mtime', 'num_descendants', 'parent', 'checked_at', ]
     df = DF([
@@ -133,7 +131,6 @@ def load_s3(url: str, parsed: ParseResult, cache: 'Cache', profile: str = None,
 def cli(url, color, cache_path, fsck, max_entries, no_max_entries, sort_by_name, out_path, no_open, profile, size_mode, cache_ttl, tmp_html, excludes):
     from disk_tree.config import ROOT_DIR
     db = init(cache_path)
-    from .model import File, S3
     db.create_all()
 
     from disk_tree.cache import Cache
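`load_file` passes each entry of `excludes` through `abspath`, and `Cache.compute` skips any `path` for which `is_descendant(path, exclude)` holds. The definition of `is_descendant` is not part of this diff, so the following is a hypothetical stand-in built on `os.path.commonpath`, assuming a path counts as a descendant of itself; `drop_excluded` is likewise an illustrative helper, not a disk-tree function.

```python
import os
from typing import Iterable, List


def is_descendant(path: str, ancestor: str) -> bool:
    """Hypothetical stand-in: True if `path` is `ancestor` or lies beneath it."""
    path, ancestor = os.path.abspath(path), os.path.abspath(ancestor)
    # commonpath compares whole components, so "/data/logs2" is not under "/data/logs".
    # (On Windows it raises ValueError for paths on different drives.)
    return os.path.commonpath([path, ancestor]) == ancestor


def drop_excluded(paths: Iterable[str], excludes: Iterable[str]) -> List[str]:
    """Keep only paths outside every exclude, absolutizing excludes up front."""
    excludes = [os.path.abspath(e) for e in excludes]
    return [p for p in paths if not any(is_descendant(p, e) for e in excludes)]
```

Comparing path components, rather than using `str.startswith`, is what keeps a sibling such as `/data/logs2` from being swallowed by an exclude of `/data/logs`.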
Binary file modified screenshots/disk-tree repo screenshot.png