Command-line file tagging and organization tool. Mirror of; pull requests there or by email please!


Khph is a tool for managing collections of files from the command line. It allows you to tag files via links, and query your collection with a simple query language.

Khph is for when you just want to tag your recent trip's photos, without worrying about what all-in-one program to use, or how it stores its metadata. With khph, the file system is the database.


Copyright 2016 Bryan Gardiner

A range of successive copyright years may be written as XXXX-YYYY as an abbreviation for listing all of the years from XXXX to YYYY inclusive, individually.

Khph is free software. It is licensed under the Affero General Public License (v3 or later, at your option). See the LICENSE file for more information.


Khph manages files within a project. A project is just a directory tree with a file whose name ends with .khph in it (the project file or config file). This file may define some properties of the project, or it may be empty, or even absent if a project is specified explicitly on the command line.

Within a project, khph examines directories, and files both hard-linked and soft-linked. Soft-linked directories are ignored. All links to the same target file are grouped together and called an entry, and you can then perform queries and modifications to this structure. Broken soft links are ignored, but otherwise, entries may have hard links, soft links, or both. Only links inside the project are counted, so if a project contains a soft link pointing outside the project, then the corresponding entry will have a soft link but no hard link.
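The grouping of links into entries can be illustrated with a minimal Python sketch. This is not khph's implementation; it assumes a POSIX file system, identifies a target file by its device and inode numbers, and the function name group_entries is hypothetical.

```python
import os
import stat
from collections import defaultdict

def group_entries(project_root):
    """Group links under project_root by the file they point to (hypothetical helper)."""
    entries = defaultdict(lambda: {"hard": [], "soft": []})
    # followlinks=False means soft-linked directories are never descended into.
    for dirpath, _dirnames, filenames in os.walk(project_root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            kind = "soft" if os.path.islink(path) else "hard"
            try:
                st = os.stat(path)  # follows a soft link to its target
            except OSError:
                continue  # broken soft link: ignored
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files form entries in this sketch
            # Assumption: (device, inode) identifies the target file.
            entries[(st.st_dev, st.st_ino)][kind].append(path)
    return dict(entries)
```

Because only links found during the walk are recorded, a soft link whose target lies outside the project yields an entry with a soft link and no hard link, as described above.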

At a level above raw links, khph has the idea of tags. Khph classifies certain directories within a project as source directories. All files (or soft links to files) somewhere below source directories are considered source files, which are candidates for tagging. Directories that are not source directories and are not contained in source directories are tag directories, or just tags. A link from a tag directory to a file in a source directory indicates the presence of that tag on the file. Tags are naturally hierarchical. Tag presence on a file is not inherited up or down the tree automatically, but hierarchical querying is supported.

That's it for the data model. An entry has a (positive) number of links to it, which may translate to it having tags.
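As a rough illustration of how links translate to tags, the sketch below derives an entry's tags from its link paths. It assumes paths are given relative to the project root with / separators; tags_of and is_under are hypothetical names, not part of khph.

```python
from pathlib import PurePosixPath

def is_under(path, directory):
    """True if path is the directory itself or lies somewhere below it."""
    return path == directory or directory in path.parents

def tags_of(entry_links, source_dirs):
    """Return the tag directories that hold a link to this entry (hypothetical helper)."""
    sources = [PurePosixPath(d) for d in source_dirs]
    tags = set()
    for link in map(PurePosixPath, entry_links):
        # A link outside every source directory sits in a tag directory.
        if not any(is_under(link, s) for s in sources):
            tags.add(str(link.parent))
    return tags

links = ["photos/2016/a.jpg", "places/tokyo/a.jpg", "people/me/a.jpg"]
print(sorted(tags_of(links, ["photos"])))
```

The parent tag places is not reported for this entry, which reflects the rule that tag presence is not inherited up or down the tree.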

Project configuration

The project file, if present, is a YAML file that can contain the following keys:

  • sourceDirs: A list of paths, relative to the project root, that should be treated as source directories. Everything under the directories listed here is treated as a source file or subdirectory.

  • ignore: A list of paths, relative to the project root, that khph should pretend don't exist.

Invoking khph

The khph binary expects any number of options and exactly one command. A command consists of one or more arguments telling khph what to do. The arguments for a command may be interleaved with option arguments. Parts of a command are written as --foo arguments, just as options are, but a command's own -- arguments must appear in the order shown below. For example, the --tag-add command cannot be written as --files ENTRY... --tag-add TAG....

If no project is specified with --project, then all directories from the current directory up to the file system root are searched in turn. The first *.khph file found is used; if there are multiple such files in a single directory, an arbitrary one is chosen. If a project is given explicitly as a program argument, then that is used instead and this search is not performed.
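The upward search can be sketched in Python as follows. This is an illustration of the rule above rather than khph's code; find_project_file is a hypothetical name, and sorting the candidates is only a way to make the "arbitrary" choice deterministic in the sketch.

```python
from pathlib import Path

def find_project_file(start):
    """Search start and each of its ancestors for a *.khph file (hypothetical helper)."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        candidates = sorted(p for p in directory.glob("*.khph") if p.is_file())
        if candidates:
            # khph picks an arbitrary file here; the sketch takes the first by name.
            return candidates[0]
    return None  # reached the file system root without finding a project file
```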


The supported commands are listed below. An UPPERCASE word indicates a required argument, [BRACKETS] indicate an optional argument, and ELLIPSES... indicate an argument that may be repeated zero or more times.

  • --list [QUERY]: Determines the entries that match the query, and prints out one link for each. An arbitrary hard link is preferred to an arbitrary soft link. If no query is provided, all entries are matched.

  • --tag TAG...: Lists tags matching any of the given tag specifications, or all tags if no argument is given.

  • --tag-create TAG...: Creates tags by the given names, if they don't exist already. By-name TAGs are not allowed here.

  • --tag-add TAG... --files ENTRY...: Adds tags to entries that don't already have them.

  • --tag-remove TAG... --files ENTRY...: Removes tags from entries that have them.

  • --realize TAG QUERY: Sets the tag to contain the results of the query. The tag is created if it does not exist, and the tag is removed from all entries before the query is run, so that it does not affect the query's results.

ENTRY, TAG, and QUERY are described in the following sections.
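The ordering of steps in --realize can be modelled in memory. The sketch below is an assumption-laden illustration, not khph's implementation: tags are a dict from tag name to a set of entry names, a query is any Python function of an entry and the tag table, and realize is a hypothetical name.

```python
def realize(tags, entries, tag, query):
    """Model of --realize: clear the tag first, then fill it with the query's matches."""
    # Creating or emptying the tag before evaluation keeps it from
    # influencing the query's results.
    tags[tag] = set()
    tags[tag] = {e for e in entries if query(e, tags)}
    return tags[tag]

entries = {"a.jpg", "b.jpg", "c.jpg"}
tags = {"tokyo": {"a.jpg", "b.jpg"}, "album": {"c.jpg"}}

# "album" is replaced by the entries tagged "tokyo"; c.jpg loses the tag.
realize(tags, entries, "album", lambda e, t: e in t["tokyo"])
print(sorted(tags["album"]))
```

A query that mentions the tag being realized sees it as empty, so in this model a query such as "entries not in album" matches every entry.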

Supported options are as follows. These are all optional.

  • -?, --help: Displays a help message showing how to invoke khph and exits. This is a summary of the information in this section.

  • -p, --project=PATH: Sets the location of the project. Either a file or directory may be provided. If a file is given, then that file is used as the project file, and its directory is the project root. If a directory is given, then it is used as the project directory; if that directory contains some *.khph file(s), then one is used, otherwise no project file is used.

  • --print-summaries[=BOOL]: Default true. Prints brief summary messages about actions performed to stderr.

  • --print-file-actions[=BOOL]: Default false. Prints out shell command equivalents to actions performed to stderr.

  • --tags-in-name[=BOOL]: Default false. Appends tag names to link base names created using --realize.

Entry specifications

In program arguments, ENTRY and TAG are known as entry specifications and tag specifications. They are patterns that refer to zero or more paths. They have a few different forms, and each is converted into real paths differently.

Relative specifications begin with a . component, for example ./foo or ./foo/bar. These paths are interpreted relative to the current working directory. Currently, the current working directory and the resulting path must point inside the project. These specifications refer to a path without requiring its existence, although the requested command may need it to exist.

Absolute specifications are specifications that are not relative and contain a path separator, for example /foo, /foo/bar, or foo/bar. These paths are always interpreted relative to the project root, so in this case, the effective path would be <project>/foo/bar. Like relative specifications, absolute specifications may refer to paths that don't exist.

By-name specifications include no path separators. Examples are foo or bar. These match all existing paths in the project whose last component equals the string. In contrast to the previous forms, by-name specifications test for existence and may refer to an arbitrary number of paths. As usual, a specific number of paths may be expected at any place a by-name specification is accepted.

Tag specifications are the same as entry specifications, except that they refer to tags instead of entries, and for by-name specifications, tag (i.e. non-source directory) existence is checked instead of entry (i.e. linked file) existence.

These examples use / as a path separator, but all separators for your platform are usable.
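The three forms can be told apart mechanically. The following sketch assumes / as the only separator and treats a bare . as relative, which the text above does not specify; classify_spec and resolve_spec are hypothetical names.

```python
from pathlib import PurePosixPath

def classify_spec(spec):
    """Classify a specification as 'relative', 'absolute', or 'by-name'."""
    if spec == "." or spec.startswith("./"):
        return "relative"   # begins with a "." component
    if "/" in spec:
        return "absolute"   # not relative, but contains a separator
    return "by-name"        # no separator at all

def resolve_spec(spec, project_root, cwd):
    """Turn a relative or absolute spec into a path; by-name specs need a search."""
    kind = classify_spec(spec)
    if kind == "relative":
        return PurePosixPath(cwd) / spec  # pathlib drops the "." component
    if kind == "absolute":
        # Interpreted relative to the project root, even with a leading "/".
        return PurePosixPath(project_root) / spec.lstrip("/")
    return None  # by-name: matches existing paths by last component

print(resolve_spec("./foo/bar", "/proj", "/proj/sub"))
print(resolve_spec("/foo/bar", "/proj", "/proj/sub"))
```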


Khph supports a human-readable query syntax for matching entries. A query is a predicate that is applied to entries. When a query is passed on the command line, it must be passed as a single argument. The grammar is described here in approximate "EBNF with functions".

query* = plurality<("link" | "links" | "path" | "paths"), linkQuery>
       | plurality<("tag" | "tags"), tagQuery>

/* Matches a list based on how its contents match.
   all: All items must match, if any exist.
   all1: All items must match, and at least one must exist.
   some: At least one item must match (with value) or exist (without).
   no: No items may match (with value) or exist (without). */
plurality<name, value> = "all", name, value
                       | "all1", name, value
                       | ["some"], name, [value]  /* "some" may be omitted. */
                       | "no", name, [value]

linkQuery* = "is", ("hard" | "soft")
           | "matches", entrySpec
           | ["file" | "directory"], stringQuery
           | treeQuery<tagSpec>  /* By-name specs must match exactly one tag. */

tagQuery* = treeQuery<tagSpec>

treeQuery<value>* = "at", value
                  | "atabove", value
                  | "atbelow", value
                  | "above", value
                  | "below", value

stringQuery* = "eq", string
             | ("contains" | "contain"), string
             | ("starts" | "start"), "with", string
             | ("ends" | "end"), "with", string

entrySpec = string

tagSpec = string

string = /* Starts with a " or ', ends with the same character.
            Escape sequences are \\, \0, \n, \r, \t, \v, and the
            sequence for the current delimiter. */

This lets you write queries such as, assuming that me is a subtag of people:

tag atbelow "tokyo" and tag atbelow "people" and not atbelow "me"

This parses as:

(some tag atbelow "tokyo") and (some tag (atbelow "people" and not atbelow "me"))

and would find all photos in Tokyo with someone other than myself.
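To make the parse concrete, here is a small Python model of that query. It is a sketch under assumptions: tags are written as full paths such as places/tokyo instead of by-name specifications, the meanings of at, below, and atbelow are inferred from their names, and all function names are hypothetical.

```python
from pathlib import PurePosixPath as P

def at(spec):
    return lambda tag: P(tag) == P(spec)

def below(spec):
    return lambda tag: P(spec) in P(tag).parents

def atbelow(spec):
    return lambda tag: at(spec)(tag) or below(spec)(tag)

def some(items, pred):
    """One iteration over the list: true if any single item satisfies pred."""
    return any(pred(item) for item in items)

def matches(entry_tags):
    # (some tag atbelow "tokyo") and
    # (some tag (atbelow "people" and not atbelow "me"))
    in_tokyo = some(entry_tags, atbelow("places/tokyo"))
    someone_else = some(
        entry_tags,
        lambda t: atbelow("people")(t) and not atbelow("people/me")(t),
    )
    return in_tokyo and someone_else

print(matches(["places/tokyo", "people/alice"]))
print(matches(["places/tokyo", "people/me"]))
```

In this model a photo tagged with both people/me and people/bob still matches, because a single tag (people/bob) satisfies the second condition.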

Which brings us to our next point: logical operators and parentheses for grouping are supported around each of the syntax rules above whose name ends in a *, and the usual not > and > or precedence applies.

Note that this allows some queries which sound like logical English but are incorrect, for example this:

tags at "me" and at "you"

which is equivalent to:

some tag (at "me" and at "you")

which is impossible, because no single tag can be at two locations in the tree. Here what was intended was:

tag at "me" and tag at "you"

Remember: one occurrence of a plurality (e.g. "some tag" or even just "tag") means one iteration through that list (e.g. of tags on the entry).
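The four pluralities, and the pitfall just described, can be checked with a short Python model. This is an illustrative sketch, not khph's evaluator; plurality is a hypothetical function and tags are reduced to plain names compared with ==.

```python
def plurality(kind, items, pred=None):
    """Evaluate a plurality over a list; pred may be omitted for 'some' and 'no'."""
    items = list(items)
    if kind == "all":
        return all(pred(i) for i in items)
    if kind == "all1":
        return bool(items) and all(pred(i) for i in items)
    if kind == "some":
        return any(pred(i) for i in items) if pred else bool(items)
    if kind == "no":
        return not any(pred(i) for i in items) if pred else not items
    raise ValueError(f"unknown plurality: {kind}")

tags = ["me", "you"]

# tags at "me" and at "you": one iteration, so one tag must be both.
wrong = plurality("some", tags, lambda t: t == "me" and t == "you")

# tag at "me" and tag at "you": two iterations, one per condition.
right = (plurality("some", tags, lambda t: t == "me")
         and plurality("some", tags, lambda t: t == "you"))

print(wrong, right)
```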