A collection of functions and data structures that we've found useful over the years.


pyutil -- a library of useful Python functions and classes

Many of these utilities (or their ancestors) were developed for the Mojo Nation, Mnet, Allmydata.com "Mountain View", Tahoe-LAFS, or SimpleGeo's products. (Where the code was developed for a for-profit company, the copyright holder donated the pyutil code to the public under the open source licences listed below.)



current modules

  • mathutil.py - integer power, floor, ceil, and nearest multiples; permute and fit slope
  • memutil.py - statistics and diagnostics for memory use and garbage collection
  • platformutil.py - get platform including Linux distro; more accurate and less noisy than platform.platform()
  • strutil.py - common prefix and suffix of two strings, and newline processing
  • assertutil.py - test preconditions, postconditions, and assertions
  • benchutil.py - benchmark a function by running it repeatedly
  • fileutil.py - work with files and directories
  • iputil.py - query available local IPv4 addresses
  • jsonutil.py - wrapper around simplejson which converts decimal inputs to Python Decimal objects instead of to Python floats
  • lineutil.py - remove extra whitespace from files
  • testutil.py - utilities for use in unit tests, especially in Twisted
  • time_format.py - date and time formatting operations
  • version_class.py - parse version strings into a Version Number object
  • verlib.py - utility to compare version strings, by Tarek Ziadé
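
The idea described for jsonutil.py above (decoding JSON decimals as Decimal objects rather than floats) can be sketched with the standard library alone. This is only an illustration using the stdlib json module and its parse_float hook, not pyutil's jsonutil API; the helper name loads_decimal is hypothetical.

```python
import json
from decimal import Decimal


def loads_decimal(text):
    """Hypothetical helper: parse JSON, decoding non-integer numbers as Decimal."""
    # parse_float is called with the original numeric string for every JSON
    # number that has a fractional part or an exponent, so no precision is
    # lost to binary floating point. Integers are still decoded as int.
    return json.loads(text, parse_float=Decimal)


doc = loads_decimal('{"price": 1.10, "count": 3}')
total = doc["price"] + Decimal("0.20")
print(total)
```

Because the value never passes through a float, the trailing zero in "1.10" survives and arithmetic on it stays exact.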

out of shape

I don't currently use these, but I still think they are possibly good ideas.

  • nummedobj.py - number objects in order of creation for consistent debug output
  • observer.py - the Observer pattern
  • increasing.py - an implementation of a monotonically increasing timer. A better future implementation would use CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW where available: http://stackoverflow.com/questions/1205722/how-do-i-get-monotonic-time-durations-in-python/1205762#1205762
  • repeatable_random.py - make the random and time modules deterministic, so that executions are reproducible
  • strutil.py - string utilities
  • cache.py - multiple implementations of a least-recently-used in-memory caching strategy, optimized for different sizes (note: I, Zooko, nowadays prefer a random-replacement cache eviction strategy over least-recently-used because the former has more consistent and predictable behavior)
  • odict.py - ordered dictionary implementation: see PEP 372. Note: there is now (as of Python 2.7) an ordered dict implementation in the standard library, but I haven't checked if it is as good as this one.
  • zlibutil.py - zlib decompression in limited memory
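
The idea behind the monotonically increasing timer listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in increasing.py: the class name IncreasingTimer and its clock parameter are hypothetical. On Python 3.3 and later, time.monotonic() provides a monotonic clock directly, which is the kind of clock the note about CLOCK_MONOTONIC has in mind.

```python
import time


class IncreasingTimer:
    """Hypothetical wrapper: never reports a time earlier than one already reported."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last = clock()

    def time(self):
        now = self._clock()
        # If the underlying clock was set backwards, keep returning the
        # latest value seen so far instead of going back in time.
        if now > self._last:
            self._last = now
        return self._last


timer = IncreasingTimer()
start = timer.time()
elapsed = timer.time() - start  # never negative, even if the wall clock jumps back

# With a monotonic clock no wrapper is needed for measuring durations.
mono_start = time.monotonic()
mono_elapsed = time.monotonic() - mono_start
```

The wrapper only hides backward jumps; it does not correct for forward jumps, which is one reason a true monotonic clock is preferable where it exists.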


deprecated

I no longer use these and I don't recommend that you do either.

  • logutil.py - send log messages to Twisted logger if present, else Python library logger
  • weakutil.py - allows a bound method's object to be GC'd
  • twistedutil.py - callLater_weakly, a variant of Twisted's callLater which interacts more nicely with weakrefs
  • PickleSaver.py - make all or part of an object persistent, by saving it to disk when it's garbage collected
  • humanreadable.py - an improved version of the builtin repr() function
  • hashexpand.py - cryptographically strong pseudo-random number generator based on SHA256
  • find_exe.py - try different paths in search of an executable
  • dictutil.py - several specialized dict extensions, as well as some convenient functions for working with dicts
  • randutil.py - various ways to get random bytes
  • xor.py - xor two same-length strings together
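
For readers who only need the operation that xor.py describes, a plain Python 3 sketch is short. This is not the code from xor.py; the function name xor_bytes is hypothetical, and it works on bytes objects rather than Python 2 strings.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Hypothetical helper: XOR two byte strings of the same length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # Iterating over bytes yields ints, so each pair can be XORed directly.
    return bytes(x ^ y for x, y in zip(a, b))


masked = xor_bytes(b"\x0f\xf0", b"\xff\xff")
restored = xor_bytes(masked, b"\xff\xff")  # XOR with the same mask undoes it
```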

Thanks to Peter Westlake and Ravi Pinjala for help documenting what these do.



issue tracker


darcs repository


(To get the latest source, run darcs get --lazy http://tahoe-lafs.org/source/pyutil/trunk.)

tests and benchmarks

To run tests: python ./setup.py trial -s pyutil.test.current.

You can also run the tests with the standard pyunit test runner instead of trial, but a couple of the tests will fail due to the absence of Trial's "Skip This Test" feature.

You can also run the tests of the out-of-shape and deprecated modules:

python ./setup.py trial -s pyutil.test.out_of_shape

python ./setup.py trial -s pyutil.test.deprecated

Or of all modules:

python ./setup.py trial -s pyutil.test

Some modules have self-benchmarks provided. For example, to benchmark the cache module: python -OOu -c 'from pyutil.test import test_cache; test_cache.quick_bench()'

or for more complete and time-consuming results: python -OOu -c 'from pyutil.test import test_cache; test_cache.slow_bench()'

(The "-O" is important when benchmarking, since cache has extensive self-tests that are optimized out when -O is included.)
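
Why -O matters can be seen in a small sketch. Python removes assert statements, and any code guarded by `if __debug__:`, when run with -O or -OO, so self-tests written that way cost nothing in a benchmark run. The function cache_put below is hypothetical and is not taken from pyutil's cache module.

```python
def cache_put(cache, key, value):
    """Hypothetical example of a function with built-in self-tests."""
    # Removed entirely when Python runs with -O or -OO.
    assert key is not None, "precondition: key must not be None"
    cache[key] = value
    if __debug__:
        # This whole block is also compiled out under -O.
        assert cache[key] == value, "postcondition: value must be stored"
    return cache


# __debug__ is True in a normal run and False under -O or -OO.
print(__debug__)
stored = cache_put({}, "a", 1)
```

Running the same file with and without -O shows the difference: without it, passing None as the key raises AssertionError; with it, the check is skipped.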


You may use this package under the GNU General Public License, version 2 or, at your option, any later version. You may use this package under the Transitive Grace Period Public Licence, version 1.0 or, at your option, any later version. You may use this package under the Simple Permissive Licence, version 1 or, at your option, any later version. (You may choose to use this package under the terms of any one of these licences, at your option.)

See the file COPYING.GPL for the terms of the GNU General Public License, version 2. See the file COPYING.TGPPL.html for the terms of the Transitive Grace Period Public Licence, version 1.0. See the file COPYING.SPL.txt for the terms of the Simple Permissive Licence, version 1.