CK: Collective Knowledge
This blog post gives a brief introduction to CK and its basic concepts. There is a ton of documentation in the CK wiki on GitHub, and all of it can easily feel overwhelming. This is why I wrote this deliberately short and lightweight introduction to the fundamental concepts of CK that helped me a lot in understanding it.
I assume that you have the CK tool installed on your machine, which you can easily check by running `ck version`. If this returns an error, install CK by running `pip install ck` [^1].
[^1]: If you have trouble installing CK this way, you can find more information in the CK wiki.
So what is CK?
Put generically, CK is a tool that helps you organise and work with stuff you care about. Stuff can be a lot of different things, such as research data, programs or scripts analysing this data, as well as the results obtained by the analysis, just to give a typical research workflow as an example.
CK helps you organise this stuff by assigning unique identifiers (so-called 'UIDs') to every entry registered with CK. Entries are stored in repositories, which facilitate sharing. Modules are a special type of entry that implement the functionality of CK. CK comes with a set of built-in modules, but you can also write custom modules yourself.
Entries, repositories, and modules are the basic vocabulary of CK. Let's start talking more about them.
CK tracks entries by assigning them unique identifiers. Each entry is stored in a separate directory, and CK also stores additional metadata for each entry in the form of a few JSON files. These files are stored in the `.cm` subdirectory of the entry. There are three metadata files:
- `.cm/info.json` stores information such as the author or the license of the entry.
- `.cm/meta.json` stores arbitrary meta information about the entry, which the CK modules use to process this entry. One important example is tags: identifying words that can be used to filter entries.
- `.cm/desc.json` is intended for a documentary description of the entry, but is currently mostly empty.
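As a minimal sketch of this layout, the following Python snippet reads the three metadata files of an entry directly from disk. It does not use CK itself: the helpers `read_entry_metadata` and `make_demo_entry` are hypothetical, and the example contents (an `author` key in `info.json`, a `tags` key in `meta.json`) are assumptions made purely for illustration.

```python
import json
import tempfile
from pathlib import Path

# The three metadata files described above, relative to the .cm subdirectory.
METADATA_FILES = ("info.json", "meta.json", "desc.json")


def read_entry_metadata(entry_dir):
    """Return {'info': ..., 'meta': ..., 'desc': ...} for one entry directory.

    Hypothetical helper (not part of CK); a missing file yields an empty dict.
    """
    cm_dir = Path(entry_dir) / ".cm"
    metadata = {}
    for file_name in METADATA_FILES:
        path = cm_dir / file_name
        key = path.stem  # "info", "meta" or "desc"
        if path.is_file():
            metadata[key] = json.loads(path.read_text(encoding="utf-8"))
        else:
            metadata[key] = {}
    return metadata


def make_demo_entry(root):
    """Create a made-up entry on disk so the sketch runs without CK."""
    entry_dir = Path(root) / "dataset" / "my-dataset"
    cm_dir = entry_dir / ".cm"
    cm_dir.mkdir(parents=True)
    # Assumed file contents, for illustration only.
    (cm_dir / "info.json").write_text(
        json.dumps({"author": "Jane Doe"}), encoding="utf-8"
    )
    (cm_dir / "meta.json").write_text(
        json.dumps({"tags": ["demo", "autotuning"]}), encoding="utf-8"
    )
    # desc.json is deliberately left out to show the empty-dict fallback.
    return entry_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        metadata = read_entry_metadata(make_demo_entry(tmp))
        print(metadata["meta"].get("tags", []))
```

Reading the files by hand like this is only meant to show that an entry is nothing more than a directory with some JSON next to it; in practice you would let CK manage these files.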
In CK a repository is a collection of entries which are meant to be shared with other people. CK uses `git`, a tool which makes it incredibly easy to share repositories among team members or make them publicly available. Websites such as GitHub or Bitbucket can be used to host CK repositories online.
CK stores all of the repositories in one central folder. On Linux and macOS this is by default:
Modules in CK group entries as well as actions to operate on these entries. CK entries which are operated on by a particular module are put in a directory which has the same name as the module. For example:
- Programs, which are compiled and run by the `program` module, are put in a directory called `program`.
- Datasets, which are extended by the `dataset` module, are put in a directory called `dataset`.
- Experiments, which are added, browsed, and rerun by the `experiment` module, are put in a directory called `experiment`.
This leads to a familiar directory structure where the top-level directories are named after CK modules, e.g., `experiment`. The second-level directories store the actual programs, datasets, and experiments you care about, e.g., `experiment/my-awesome-experiment`. These are themselves CK entries with their own metadata and UIDs.
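To make the two-level structure concrete, here is a small Python sketch that walks a repository directory and lists the module/entry pairs it finds. It is not how CK itself enumerates entries; `list_entries` and `make_demo_repo` are hypothetical helpers, and the repository and entry names are made up.

```python
import tempfile
from pathlib import Path


def list_entries(repo_dir):
    """Return sorted (module, entry) pairs found under a repository directory.

    Hypothetical helper (not part of CK): top-level directories are treated
    as module names and second-level directories as entries. Hidden
    directories such as .cm or .git are skipped.
    """
    pairs = []
    for module_dir in Path(repo_dir).iterdir():
        if not module_dir.is_dir() or module_dir.name.startswith("."):
            continue
        for entry_dir in module_dir.iterdir():
            if entry_dir.is_dir() and not entry_dir.name.startswith("."):
                pairs.append((module_dir.name, entry_dir.name))
    return sorted(pairs)


def make_demo_repo(root):
    """Create a made-up repository layout so the sketch runs without CK."""
    repo_dir = Path(root) / "my-repo"
    for module, entry in [
        ("program", "my-program"),
        ("dataset", "my-dataset"),
        ("experiment", "my-awesome-experiment"),
    ]:
        # Every entry gets its own .cm subdirectory for metadata.
        (repo_dir / module / entry / ".cm").mkdir(parents=True)
    return repo_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for module, entry in list_entries(make_demo_repo(tmp)):
            print(f"{module}/{entry}")
```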
Actions in CK are functionalities offered by modules to operate on CK entries. Let's look at a few concrete examples:
- The `program` module offers actions for compiling and running programs.
- The `dataset` module offers an action for adding new files to an existing dataset (`add_file_to`).
- The `experiment` module offers actions for adding new experiments, browsing existing ones, or rerunning them.
Every command line in CK has the same basic form to perform an action of a particular module:
`ck action module`
Therefore, we write `ck compile program`, `ck add_file_to dataset`, `ck rerun experiment`, and so on.
This style is deliberately designed so that the commands read like sentences. I call this `ck action module` structure the grammar of CK.
CK commands which refer to particular entries specify them using the following notation:
`ck action module:entry`
Sometimes CK needs help to distinguish between entries in different repositories. In these cases we have to write:
`ck action repository:module:entry`
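The grammar can be sketched in a few lines of Python. The snippet below only splits the three address forms shown above (`module`, `module:entry`, `repository:module:entry`) and assembles the argument list for a command without running it. The names `Address`, `parse_address`, and `build_command` are hypothetical, and rejecting empty parts is a simplification of mine, not a statement about what CK accepts.

```python
from typing import NamedTuple, Optional


class Address(NamedTuple):
    """The parts of a CK address; repository and entry are optional."""
    repository: Optional[str]
    module: str
    entry: Optional[str]


def parse_address(address):
    """Split 'module', 'module:entry' or 'repository:module:entry'.

    Hypothetical helper covering only the three forms from the text.
    """
    parts = address.split(":")
    if len(parts) > 3 or not all(parts):
        raise ValueError(f"not a valid address: {address!r}")
    if len(parts) == 1:
        return Address(None, parts[0], None)
    if len(parts) == 2:
        return Address(None, parts[0], parts[1])
    return Address(parts[0], parts[1], parts[2])


def build_command(action, address):
    """Return the argument list for 'ck action address' without running it."""
    parse_address(address)  # validate the address shape first
    return ["ck", action, address]


if __name__ == "__main__":
    print(parse_address("experiment:my-awesome-experiment"))
    print(build_command("compile", "program:my-program"))
```

Keeping the command as a list of arguments rather than one string means it could be handed to something like `subprocess.run` without any shell quoting concerns.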
Many modules allow you to specify additional options as command line flags. You can get a full list of the actions supported by a particular module by calling:
`ck help module`
CK modules for managing repositories and modules
There exist CK modules for managing repositories and modules themselves. These are called `repo` and `module` and are briefly described here.
Repositories are a central concept in CK (as we have seen above). They are managed by the `repo` module.
Here are some things one can do with this module:
- `ck info repo` lists information about the `repo` module
- `ck help repo` lists all possible actions one can perform with a CK repository
- `ck list repo` lists all installed repositories
There are a number of things one can do with a particular repository. We take the `ck-autotuning` repository as an example:
- `ck pull repo:ck-autotuning` installs the `ck-autotuning` repository or updates it to the latest version on the remote server (it performs a `git pull` on the GitHub repository: https://github.com/ctuning/ck-autotuning)
- `ck info repo:ck-autotuning` lists information about the `ck-autotuning` repository
- `ck find repo:ck-autotuning` lists the path where the `ck-autotuning` repository is installed
Modules are managed by a module called `module`.
The actions are similar to those for repositories:
- `ck info module` lists information about the `module` module
- `ck help module` lists all possible actions one can perform with a CK module
- `ck list module` lists all installed modules, across all installed repositories
To list only the modules of a particular repository, for example `ck-autotuning`, one can execute:
`ck list module --repo_uoa=ck-autotuning`
The `--repo_uoa=ck-autotuning` part is an input argument passed to the `list` action of the `module` module. To list all the possible input arguments of an action, call `ck action module --help`.
So for example: `ck list module --help`. This will print a description of the action, which input arguments it processes, and what output it returns.
Common CK actions
There are some actions which can be used on every module. These are called common actions. You can list all common actions by running:
Furthermore, you can always call `ck action module --help` to learn about the input arguments and return values of an action.
Many of the common actions are for managing CK entries. The most important of them are:
- `ck add module:entry` adds a new CK entry called `entry` to the module named `module`
- `ck cp module1:entry1 module2:entry2` copies the CK entry called `entry1` from `module1` to `entry2` in `module2`
- `ck find module:entry` prints the path of the CK entry named `entry` from module `module`
- `ck mv module1:entry1 module2:entry2` moves the CK entry called `entry1` from `module1` to `entry2` in `module2`
- `ck rm module:entry` removes (deletes) an existing CK entry called `entry` from the module named `module`
Where to go from here?
I only scratched the surface of CK. I haven't talked about the metadata format (which is JSON) or the implementation of your own custom modules (which is commonly done in Python).
As I said in the beginning, there is plenty of documentation available on the CK wiki. It is incredibly useful to keep the vocabulary (entries, repositories, modules) and the grammar (`ck action module`) of CK in mind while reading these documents and starting to play around with CK.
To see how to implement your own workflow with CK by following an example, read the Getting Started Guide.
To learn how to implement portable workflows with CK by
- describing and detecting existing software
- setting up a software environment
- automating the installation of missing software
- and more ...
read the corresponding sections in the Portable Workflows page.
Also, ask questions on the CK mailing list. The community is very open to answering your questions!