Releases: kaparoo/kaparoo-python
v0.5.0
Added
- kaparoo.utils.aggregate (still experimental): Var and Std reductions
-- weighted population variance and standard deviation, accumulated online
(Welford) and merged exactly (Chan's parallel algorithm), so they nest
across loop levels like the other reductions.
- kaparoo.data.sequences.FileListSequence: a "one file per item"
DataSequence over an explicit, ordered list of files. Unlike
FileFolderSequence it takes the files directly (no root discovery),
so they may live in unrelated directories -- or, on Windows, different
drives -- which FileFolderSequence cannot represent. Subclasses
implement only load_file / get_meta; the input order is preserved
verbatim (duplicates kept) and files are loaded lazily.
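
The online update and exact merge described for Var / Std can be sketched as follows. This is a minimal illustration of the general technique (a weighted Welford update plus Chan's pairwise merge), not the library's code: the class name WeightedVarSketch and its attributes are hypothetical, and the real Var / Std API may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class WeightedVarSketch:
    """Hypothetical stand-in for a weighted population-variance reduction."""

    weight: float = 0.0  # total weight seen so far
    mean: float = 0.0    # running weighted mean
    m2: float = 0.0      # weighted sum of squared deviations from the mean

    def update(self, value: float, weight: float = 1.0) -> None:
        """Weighted Welford step: constant memory, one pass."""
        if weight <= 0:
            return
        total = self.weight + weight
        delta = value - self.mean
        self.mean += delta * weight / total
        self.m2 += weight * delta * (value - self.mean)
        self.weight = total

    def merge(self, other: "WeightedVarSketch") -> None:
        """Chan's parallel merge: pools two partial results exactly."""
        if other.weight == 0:
            return
        total = self.weight + other.weight
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.weight * other.weight / total
        self.mean += delta * other.weight / total
        self.weight = total

    def variance(self) -> float:
        """Population variance (divides by the total weight)."""
        return self.m2 / self.weight if self.weight > 0 else math.nan

    def std(self) -> float:
        return math.sqrt(self.variance())
```

Merging two partial accumulators this way gives the same result as feeding every sample into one accumulator, which is what lets an inner loop level (say, batches) be pooled into an outer one (an epoch) without storing the samples.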
Fixed
- make_dirs now raises NotADirectoryError (matching make_dir) when a
path exists but is not a directory, instead of the divergent
FileExistsError that mkdir produced.
- make_dir / make_dirs validate every path before any directory is
wiped or created, so a deterministically bad entry (e.g. a file in the
list) no longer leaves earlier directories already cleaned or created.
- make_dir(clean=True) / make_dirs(clean=True) reject a symlink with
NotADirectoryError rather than failing deep inside shutil.rmtree;
cleaning never operates through a link.
- reserve_path / reserve_paths treat a symlink -- including a broken
one, which Path.exists reports as absent -- as occupying the path.
- StagedFile.commit (with overwrite=False) no longer fails outright on a
filesystem without hardlink support (FAT/exFAT, some network mounts): it
falls back to an existence check plus replace instead of losing the staged
content to a raw OSError.
- StagedFile.commit / StagedDirectory.commit now fsync the destination's
parent directory after the move, so the committed result survives a crash
on POSIX (a no-op where directories cannot be fsynced, e.g. Windows).
- StagedDirectory.commit with overwrite=True now restores the original
directory if moving the staged one into place fails, instead of leaving
the destination missing with the old contents stranded under a <name>.old
name; the backup removal is best-effort.
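
The hardlink fallback and the parent-directory fsync mentioned above can be illustrated with the standard library alone. The sketch below is an assumption about how such a commit step could look, not the actual StagedFile.commit code; the helper names commit_exclusive and fsync_parent are hypothetical.

```python
import os
from pathlib import Path


def fsync_parent(path: Path) -> bool:
    """Flush the directory entry for `path`; return False where unsupported."""
    try:
        fd = os.open(path.parent, os.O_RDONLY)
    except OSError:
        return False  # e.g. Windows cannot open a directory this way
    try:
        os.fsync(fd)
        return True
    except OSError:
        return False  # some filesystems refuse to fsync a directory
    finally:
        os.close(fd)


def commit_exclusive(staged: Path, dest: Path) -> None:
    """Move `staged` to `dest` without overwriting an existing `dest`."""
    try:
        # os.link fails atomically with FileExistsError if dest exists.
        os.link(staged, dest)
    except FileExistsError:
        raise  # staged content is left in place for the caller
    except OSError:
        # Assumed cause: no hardlink support (FAT/exFAT, some network
        # mounts). Fall back to a check plus replace, which is not
        # race-free. lexists also counts a broken symlink as occupied.
        if os.path.lexists(dest):
            raise FileExistsError(str(dest))
        os.replace(staged, dest)
    else:
        os.unlink(staged)  # the link succeeded; drop the staging name
    fsync_parent(dest)  # best-effort durability of the new entry
```

The fallback branch trades atomic exclusivity for portability, which is why it is only taken when linking itself is unavailable.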
v0.4.0
Added
- kaparoo.filesystem.staged.StagedFile: a safe (atomic) file writer.
Content is staged in a temporary file in the destination's directory and
moved into place only on commit, so readers never see a half-written file
and a failed write leaves any existing file untouched. Usable as a context
manager (commit on clean exit, discard on exception) or explicitly like
a file object (write / seek / tell / flush, plus commit /
abort, path, committed, and the underlying file). Text by default
(StagedFile[str]) with optional encoding / newline; binary=True
gives a binary writer (StagedFile[bytes]), the type parameter tracking
the mode. overwrite=False (default) fails fast on an existing destination
and creates the file atomically; overwrite=True replaces it, keeping its
permissions; make_parents=True creates a missing parent directory. An
uncommitted writer discards its staged file on garbage collection.
- kaparoo.filesystem.staged.StagedDirectory: the directory counterpart of
StagedFile. Files are written into a temporary workdir in the
destination's parent and moved into place on commit. Same context-manager /
explicit usage and commit / abort / path / committed API (plus
workdir), and the same overwrite / make_parents options. Creating a
new directory is atomic (single rename); replacing an existing one
(overwrite=True) swaps the old aside and removes it, which is not fully
atomic. An uncommitted builder discards its staging directory on garbage
collection.
- kaparoo.filesystem.utils.reserve_path / reserve_paths: a guard (and
its bulk form) for a path that should not yet exist, returning it
(optionally stringified) so the caller can create something there.
exist_ok (named as in make_dir / Path.mkdir) is a
non-destructive bypass (nothing is deleted) and make_parents
creates the parent directory when missing.
Raises FileExistsError on conflict. reserve_paths is fail-fast and
takes no root (compose with wrap_paths(prepend=...)). For directory
destinations prefer make_dir(exist_ok=...); for exclusive file creation
the stdlib open(path, "x") suffices.
- clean option on make_dir / make_dirs: when an existing directory
is present, remove its contents and recreate it empty (a fresh slate).
Destructive, and only ever wipes a directory -- a non-directory at
the path still raises NotADirectoryError. clean=True makes exist_ok
moot, since the directory is removed and remade.
- kaparoo.filesystem directory checks dir_not_empty,
dir_not_empty_unsafe, dirs_not_empty, and dirs_not_empty_unsafe,
the negated counterparts of the dir_empty series. dirs_not_empty
is True only when every directory is non-empty.
- kaparoo.utils.aggregate module (experimental -- the API may change in
a later release): Aggregator for nested, pluggable metric aggregation
(the batch → epoch → run pattern). Each metric is
reduced by a Reduction -- built-ins Mean (weighted), Sum, Min,
Max, Last, and Fold (a scalar monoid from a callable) -- with
per-metric overrides. Reductions are online (constant memory); nested
levels compose via merge (exact sample-weighted pooling) or
update(child.compute(), ...) (different reduction per level). Custom
reductions subclass Reduction / UnweightedReduction.
- SegmentTimer.measure(label): a stopwatch-style context manager (and
decorator) that records a segment covering only the wrapped block, so
time spent outside any measure block is excluded from records /
summary. Complements lap, which splits the timeline into
contiguous segments. Pauses inside the block are excluded; a block
that raises records nothing.
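
The stage-then-move pattern behind StagedFile can be sketched with the standard library. This is a simplified illustration under stated assumptions, not the library's implementation: the function staged_write is hypothetical, it handles text mode only, its overwrite=False check is a plain fail-fast test rather than an atomic exclusive create, and it does not preserve the permissions of a replaced file.

```python
import os
import tempfile
from contextlib import contextmanager, suppress
from pathlib import Path


@contextmanager
def staged_write(dest, *, overwrite=False):
    """Yield a text file; move it to `dest` only if the block succeeds."""
    dest = Path(dest)
    if not overwrite and os.path.lexists(dest):
        raise FileExistsError(str(dest))
    # Stage next to the destination so the final move stays on one
    # filesystem and is therefore a single rename.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            yield fh
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before the move
        os.replace(tmp_name, dest)  # commit on clean exit
    except BaseException:
        # Discard on any failure; an existing dest is left untouched.
        with suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Used as `with staged_write("report.txt") as fh: fh.write(...)`, the destination appears only after the block exits cleanly, so a reader never observes a half-written file.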
Changed
- Renamed LapTimer -> SegmentTimer, LapRecord -> SegmentRecord,
and the record field lap_time -> duration, reflecting that the
timer now records named segments via both lap (split) and the new
measure (block). The lap method keeps its name.
- Timer.resume / SegmentTimer.resume now return None instead of
the pause duration in nanoseconds. The value had no consumer
(suspend discarded it) and leaked a raw-nanosecond figure that broke
the timer's unit abstraction. Subclasses that need the pause
interval override the new protected _resume hook instead.
v0.3.0
Added
- kaparoo.data.sequences subpackage: a Sequence-based foundation for
dataset code.
  - DataSequence[T, M] ABC with abstract get_item / get_meta and
    default get_items / get_metas / get_pair / get_pairs.
    __getitem__ returns the item only.
  - Composers: SlicedSequence (stable-length view at given indices,
    duplicates allowed and order preserved); ConcatSequence
    (O(log N) lookup over multiple sources via cumulative lengths +
    bisect_right); WindowedSequence[T, M_in, M_out] (abstract
    sliding window with size / step / skip; get_item is
    implemented, get_meta is left abstract).
  - Templates: FileFolderSequence (folder-rooted, one file per item;
    subclasses implement list_files / load_file / get_meta;
    supports the "set state BEFORE super().__init__()" pattern for
    parameterized subclasses); SingleFileSequence (thin ABC for
    "one file, many records" formats).
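
The "cumulative lengths + bisect_right" lookup noted for ConcatSequence works roughly as below. The class ConcatSketch is a hypothetical, stripped-down illustration of the indexing technique only; it has no metadata support and is not the library's class.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatSketch:
    """Read-only concatenated view over several indexable sources."""

    def __init__(self, *sources):
        self._sources = sources
        # _ends[i] is the global index one past the last item of source i.
        self._ends = list(accumulate(len(s) for s in sources))

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        # First source whose end is strictly greater than index.
        # bisect_right also steps over empty sources (repeated ends).
        which = bisect_right(self._ends, index)
        start = self._ends[which - 1] if which else 0
        return self._sources[which][index - start]
```

Because the cumulative ends are sorted, each lookup is a binary search over the number of sources rather than a linear scan.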
Changed
- generate_batches: step, skip, start, stop, and drop_last
are now keyword-only. Empty ranges (start == stop) are accepted
and yield no batches. Docstring expanded.
Fixed
- register_filter decorator now preserves the decorated subclass's
type. Previously it widened to type[Filter], so static checkers
rejected subclass-specific constructor calls on decorated classes.
- generate_batches with drop_last=False: the final partial window
no longer extends past stop when stop < len(sequence).
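
The clamping behaviour described in the generate_batches fix can be shown with a small generator. This is a hypothetical sketch, not the library function: batches_sketch and its size parameter are assumptions, windows do not overlap, and the step and skip options are left out.

```python
def batches_sketch(seq, size, *, start=0, stop=None, drop_last=False):
    """Yield consecutive slices of `seq` within [start, stop)."""
    if size <= 0:
        raise ValueError("size must be positive")
    stop = len(seq) if stop is None else min(stop, len(seq))
    for lo in range(start, stop, size):
        hi = min(lo + size, stop)  # clamp the last window to stop
        if drop_last and hi - lo < size:
            return  # discard a final partial window
        yield seq[lo:hi]
```

With ten items, size 4 and stop=6, the second window is clamped to two items instead of reaching index 8, and an empty range (start == stop) yields nothing.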
Removed
- kaparoo.data.sequence (single module) and kaparoo.data.utils --
replaced by the kaparoo.data.sequences subpackage. The previous
DataSequence.by_index / by_indices API was a placeholder and
has been superseded by get_item / get_items / get_meta /
get_metas / get_pair / get_pairs.
v0.2.1
Added
- Filter serialization: Filter.to_dict() / Filter.from_dict() with
a "kind"-discriminated polymorphic dispatcher. Each concrete
filter round-trips through a JSON-compatible dict.
- register_filter(kind) decorator for registering custom Filter
subclasses with the polymorphic dispatcher.
- Filter.parse(value) -- normalizes either a Filter instance
(passed through) or a FilterDict into a Filter.
- FilterDict TypedDict family at
kaparoo.filesystem.search.filters.types: FilterDict (base,
kind-only), PatternFilterDict, MultiPatternFilterDict,
LogicalChildrenFilterDict, LogicalChildFilterDict. User-defined
filter dicts extend these to type-check against Filter.parse.
- Search.run / search_paths / search_files / search_dirs
accept a FilterDict for part_filter and name_filter in
addition to a Filter instance.
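
A "kind"-discriminated dispatcher of the sort listed above could be structured as in the sketch below. It is a self-contained illustration that reuses the names Filter, register_filter, to_dict, from_dict and parse from these notes, but the bodies are assumptions: the _REGISTRY dict, the _from_fields hook and the SuffixFilter class are hypothetical and do not describe kaparoo's internals.

```python
from typing import Any, Callable, TypeVar


class Filter:
    kind: str  # set by register_filter on each concrete subclass

    def to_dict(self) -> dict[str, Any]:
        raise NotImplementedError

    @classmethod
    def _from_fields(cls, data: dict[str, Any]) -> "Filter":
        raise NotImplementedError  # hypothetical per-subclass hook

    @staticmethod
    def from_dict(data: dict[str, Any]) -> "Filter":
        # Dispatch on the "kind" discriminator.
        return _REGISTRY[data["kind"]]._from_fields(data)

    @staticmethod
    def parse(value: "Filter | dict[str, Any]") -> "Filter":
        # A Filter instance passes through; a dict is deserialized.
        return value if isinstance(value, Filter) else Filter.from_dict(value)


_REGISTRY: dict[str, type[Filter]] = {}  # hypothetical registry

# Binding the TypeVar to type[Filter] lets the decorator return the same
# class type it received instead of widening it to type[Filter].
F = TypeVar("F", bound=type[Filter])


def register_filter(kind: str) -> Callable[[F], F]:
    def decorator(cls: F) -> F:
        cls.kind = kind
        _REGISTRY[kind] = cls
        return cls

    return decorator


@register_filter("suffix")
class SuffixFilter(Filter):  # hypothetical example filter
    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def to_dict(self) -> dict[str, Any]:
        return {"kind": self.kind, "suffix": self.suffix}

    @classmethod
    def _from_fields(cls, data: dict[str, Any]) -> "SuffixFilter":
        return cls(data["suffix"])
```

Because the decorator is generic over the class it receives, a static checker still sees SuffixFilter (with its own constructor signature) after decoration.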
v0.2.0
Published on PyPI: https://pypi.org/project/kaparoo-python/0.2.0/
uv add kaparoo-python # or: pip install kaparoo-python
Requires Python 3.14+.
See CHANGELOG.md for the full list of changes.