Articles about CK

CK: Collective Knowledge

This blog post gives a brief introduction to CK and its basic concepts. There is a ton of existing documentation in the CK wiki on GitHub, and all of it can easily feel overwhelming. This is why I wrote this deliberately short and lightweight introduction to some of the fundamental concepts of CK, the ones that helped me most in understanding it.

I assume that you have the CK tool installed on your machine, which you can easily check by running ck version. If this returns an error, install CK by running pip install ck [^1].

[^1]: If you have trouble installing CK this way, you can find more information in the CK wiki.

So what is CK?

To put it generically, CK is a tool that helps you organise and work with the stuff you care about. Stuff can be many different things, such as research data, the programs or scripts analysing this data, and the results obtained by the analysis, to take a typical research workflow as an example.

CK helps you organise this stuff by assigning unique identifiers (so-called 'UIDs') to every entry registered with CK. Entries are stored in repositories, which make them easy to share. Modules are a special type of entry that implement the functionality of CK. CK comes with a set of built-in modules, but you can also write custom modules yourself.

Entries, repositories, and modules are the basic vocabulary of CK. Let's look at each of them in more detail.

CK Entries

CK tracks entries by assigning them unique identifiers. Each entry is stored in a separate directory, and CK also stores additional metadata for each entry in the form of a few JSON files. These files live in the .cm subdirectory of the entry. There are three metadata files:

  • .cm/info.json stores information such as the author and the license of the entry.
  • .cm/meta.json stores arbitrary meta information about the entry, which CK modules use to process it. One important example is tags: keywords that can be used to select entries sharing a common property.
  • .cm/desc.json is intended for a human-readable description of the entry, but is currently mostly empty.
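To make this layout concrete, here is a minimal Python sketch that creates an entry directory with the three metadata files and then selects entries by tag. It only mimics the structure described above: the JSON keys in info.json (uid, author, license) and the 16-hex-digit UID are illustrative assumptions rather than CK's exact schema, and the helpers make_entry and entries_with_tags are hypothetical, not part of CK.

```python
from __future__ import annotations

import json
import secrets
import tempfile
from pathlib import Path


def make_entry(module_dir: Path, name: str, tags: list[str]) -> Path:
    """Create a hypothetical entry directory with the three .cm metadata files."""
    cm_dir = module_dir / name / ".cm"
    cm_dir.mkdir(parents=True)
    uid = secrets.token_hex(8)  # 16 hex digits: an illustrative stand-in for a CK UID
    # The keys below are illustrative assumptions, not CK's exact schema.
    info = {"uid": uid, "author": "me", "license": "MIT"}
    (cm_dir / "info.json").write_text(json.dumps(info, indent=2))
    (cm_dir / "meta.json").write_text(json.dumps({"tags": tags}, indent=2))
    (cm_dir / "desc.json").write_text(json.dumps({}))  # often left empty
    return module_dir / name


def entries_with_tags(module_dir: Path, wanted: set[str]) -> list[str]:
    """Return names of entries whose meta.json lists every tag in `wanted`."""
    found = []
    for meta_file in sorted(module_dir.glob("*/.cm/meta.json")):
        tags = set(json.loads(meta_file.read_text()).get("tags", []))
        if wanted <= tags:
            found.append(meta_file.parent.parent.name)
    return found


with tempfile.TemporaryDirectory() as tmp:
    datasets = Path(tmp) / "dataset"
    make_entry(datasets, "images-small", ["dataset", "image", "jpeg"])
    make_entry(datasets, "text-corpus", ["dataset", "text"])
    print(entries_with_tags(datasets, {"image"}))  # ['images-small']
```

Filtering by a set of required tags is what makes tags useful: both entries share the dataset tag, but only one carries image.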

CK Repositories

In CK, a repository is a collection of entries meant to be shared with other people. CK uses git, which makes it incredibly easy to share repositories among team members or to make them publicly available. Websites such as GitHub or Bitbucket can be used to host CK repositories online.

CK stores all repositories in one central folder. On Linux and macOS, this is $HOME/CK_REPOS by default.

CK Modules

Modules in CK group entries together with the actions that operate on these entries. CK entries operated on by a particular module are put in a directory with the same name as the module. For example:

  • Programs, which are compiled and run by the program module, are put in a directory called program.
  • Datasets, which are extended by the dataset module, are put in a directory called dataset.
  • Experiments, which are added, browsed, and rerun by the experiment module, are put in a directory called experiment.

This leads to a familiar directory structure: the top-level directories are named after CK modules, e.g., program, dataset, and experiment, while the second-level directories store the actual programs, datasets, and experiments you care about, e.g., program/my-awesome-program, dataset/my-awesome-dataset, and experiment/my-awesome-experiment. These are themselves CK entries with their own metadata and UIDs.
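The following Python sketch walks a repository laid out this way and lists every entry in the module:entry form CK uses to name entries. It assumes a plain <module>/<entry>/ layout and treats hidden directories (such as .git or .cm) as bookkeeping rather than entries; the helper list_entries is hypothetical and not part of CK.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_entries(repo_dir: Path) -> list[str]:
    """List 'module:entry' pairs in a repository laid out as <module>/<entry>/."""
    pairs = []
    for module_dir in sorted(p for p in repo_dir.iterdir() if p.is_dir()):
        if module_dir.name.startswith("."):
            continue  # hidden directories are bookkeeping, not modules
        for entry_dir in sorted(p for p in module_dir.iterdir() if p.is_dir()):
            if not entry_dir.name.startswith("."):
                pairs.append(f"{module_dir.name}:{entry_dir.name}")
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    for path in ["program/my-awesome-program", "dataset/my-awesome-dataset",
                 "experiment/my-awesome-experiment"]:
        (repo / path).mkdir(parents=True)
    print(list_entries(repo))
    # ['dataset:my-awesome-dataset', 'experiment:my-awesome-experiment',
    #  'program:my-awesome-program']
```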

Actions in CK are the operations that modules offer on CK entries. Here are a few concrete examples:

  • The program module offers actions for compiling and running programs (compile and run).
  • The dataset module offers an action for adding new files into an existing dataset (add_file_to).
  • The experiment module offers actions for adding new experiments, browsing existing ones, and rerunning experiments.
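One way to picture a module as a bundle of actions is a lookup table from action names to handlers. The Python sketch below is purely illustrative: the ACTIONS table, the perform function, and the string-returning handlers are hypothetical stand-ins, not how CK modules are actually implemented.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical registry: each module maps its action names to handler functions.
ACTIONS: dict[str, dict[str, Callable[[str], str]]] = {
    "program": {
        "compile": lambda entry: f"compiling {entry}",
        "run": lambda entry: f"running {entry}",
    },
    "dataset": {
        "add_file_to": lambda entry: f"adding a file to {entry}",
    },
    "experiment": {
        "add": lambda entry: f"adding {entry}",
        "browse": lambda entry: f"browsing {entry}",
        "rerun": lambda entry: f"rerunning {entry}",
    },
}


def perform(action: str, module: str, entry: str) -> str:
    """Look up `action` in the module's table and apply it to `entry`."""
    handlers = ACTIONS.get(module)
    if handlers is None:
        raise KeyError(f"unknown module: {module}")
    if action not in handlers:
        raise KeyError(f"module {module} has no action {action}")
    return handlers[action](entry)


print(perform("compile", "program", "my-awesome-program"))  # compiling my-awesome-program
```

The point of the table is that the same action name can mean different things in different modules, while each module only exposes the actions that make sense for its entries.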

Every CK command has the same basic form, which performs an action of a particular module:

ck action module

Therefore, we write: ck compile program, ck add_file_to dataset, ck rerun experiment, and so on.

This style is deliberately designed so that the commands read like sentences. I call this ck action module structure the grammar of CK.

CK commands that refer to particular entries specify them using the following notation:

ck action module:entry

Sometimes CK needs help to distinguish between entries in different repositories. In these cases, we write:

ck action repository:module:entry
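The three forms of this grammar can be captured by a small parser. This Python sketch, with a hypothetical Command type and parse function, splits a command into action, repository, module, and entry; it deliberately ignores any further arguments such as flags, and it is not CK's own parsing code.

```python
from typing import NamedTuple, Optional


class Command(NamedTuple):
    action: str
    module: str
    entry: Optional[str] = None
    repo: Optional[str] = None


def parse(command_line: str) -> Command:
    """Split 'ck action [repository:]module[:entry]' into its parts (simplified)."""
    words = command_line.split()
    if len(words) < 3 or words[0] != "ck":
        raise ValueError("expected: ck action module[:entry]")
    action, target = words[1], words[2]  # anything after the target is ignored
    parts = target.split(":")
    if len(parts) == 1:  # ck action module
        return Command(action, parts[0])
    if len(parts) == 2:  # ck action module:entry
        return Command(action, parts[0], entry=parts[1])
    if len(parts) == 3:  # ck action repository:module:entry
        return Command(action, parts[1], entry=parts[2], repo=parts[0])
    raise ValueError(f"too many ':' in {target!r}")


print(parse("ck find my-repo:program:my-awesome-program"))
# Command(action='find', module='program', entry='my-awesome-program', repo='my-repo')
```

Here my-repo and my-awesome-program are placeholder names.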

Many modules accept additional options as command-line flags. To get a full list of the actions supported by a particular module, call:

ck help module

CK modules for managing repositories and modules

There are CK modules for managing repositories and modules themselves. They are called repo and module and are briefly described here.

repo

As we have seen above, repositories are a central concept in CK. They are managed by the repo module.

Here are some things one can do with this module:

  • ck info repo lists information about the repo module itself
  • ck help repo lists all possible actions one can perform with a CK repository
  • ck list repo lists all installed repositories

There are a number of things one can do with a particular repository. We take the ck-autotuning repository as an example:

  • ck pull repo:ck-autotuning installs the ck-autotuning repository, or updates it to the latest version from the remote server (it performs a git pull on the GitHub repository https://github.com/ctuning/ck-autotuning)
  • ck info repo:ck-autotuning lists information about the ck-autotuning repository
  • ck find repo:ck-autotuning lists the path where the ck-autotuning repository is installed
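Conceptually, "installs or updates" means cloning the repository when it is missing and pulling when it is already there. The Python sketch below only builds the corresponding git command rather than running it; the pull_command helper and the one-folder-per-repository layout under the central folder are assumptions, and CK itself does additional bookkeeping that this sketch leaves out.

```python
from __future__ import annotations

from pathlib import Path


def pull_command(repos_dir: Path, name: str, url: str) -> list[str]:
    """Build the git command for 'install or update': clone if absent, else pull."""
    target = repos_dir / name
    if (target / ".git").is_dir():
        # Already installed: update the existing clone in place.
        return ["git", "-C", str(target), "pull"]
    # Not installed yet: clone it into the central repositories folder.
    return ["git", "clone", url, str(target)]


cmd = pull_command(Path.home() / "CK_REPOS", "ck-autotuning",
                   "https://github.com/ctuning/ck-autotuning")
print(cmd)
```

To actually execute the command, one could pass the returned list to subprocess.run(cmd, check=True).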

module

Modules are managed by a module called module.

Similarly to repositories, modules can be managed with the following commands:

  • ck info module lists information about the module module itself
  • ck help module lists all possible actions one can perform with a CK module
  • ck list module lists all installed modules, across all installed repositories

To list only the modules of a particular repository, for example ck-autotuning, run:

ck list module --repo_uoa=ck-autotuning

The --repo_uoa=ck-autotuning part is an input argument passed to the list action of the module module. To list all possible input arguments of an action, call ck action module --help, for example ck list module --help. This prints a description of the action, the input arguments it accepts, and the output it returns.
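Flags like --repo_uoa=ck-autotuning end up as named input arguments of the action. As a rough sketch of that mapping (not CK's actual parser), the hypothetical parse_flags function below turns --key=value arguments into a Python dictionary; recording a bare flag such as --help as True is a choice of this sketch.

```python
from __future__ import annotations


def parse_flags(args: list[str]) -> dict[str, object]:
    """Collect '--key=value' (and bare '--flag') arguments into an input dictionary."""
    inputs: dict[str, object] = {}
    for arg in args:
        if not arg.startswith("--"):
            continue  # words such as 'ck', 'list', 'module' are not flags
        key, sep, value = arg[2:].partition("=")
        # A bare flag like --help has no '='; this sketch records it as True.
        inputs[key] = value if sep else True
    return inputs


print(parse_flags(["ck", "list", "module", "--repo_uoa=ck-autotuning"]))
# {'repo_uoa': 'ck-autotuning'}
```

Splitting on the first = only means a value may itself contain = characters.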

Common CK actions

Some actions can be used with every module; these are called common actions. You can list them all by running ck help.

Furthermore, you can always call ck action module --help to learn about the input arguments and return values of an action.

Many of the common actions manage CK entries. The most important ones are:

  • ck add module:entry adds a new CK entry called entry to the module named module.
  • ck cp module1:entry1 module2:entry2 copies the CK entry called entry1 from module1 to entry2 in module2.
  • ck find module:entry prints the path of the CK entry named entry from module module.
  • ck mv module1:entry1 module2:entry2 moves the CK entry called entry1 from module1 to entry2 in module2.
  • ck rm module:entry removes (deletes) an existing CK entry called entry from the module named module.
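To see roughly what these common actions amount to on disk, here is a toy Python version that operates on plain <module>/<entry>/ directories using shutil. The Repo class is hypothetical: real CK also maintains UIDs and the .cm metadata, none of which this sketch handles.

```python
import shutil
import tempfile
from pathlib import Path


class Repo:
    """A toy repository: entries live in <root>/<module>/<entry>/ (no metadata)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, cid: str) -> Path:
        module, entry = cid.split(":")  # expects 'module:entry'
        return self.root / module / entry

    def add(self, cid: str) -> Path:
        path = self._path(cid)
        path.mkdir(parents=True)  # raises FileExistsError if the entry exists
        return path

    def find(self, cid: str) -> Path:
        path = self._path(cid)
        if not path.is_dir():
            raise FileNotFoundError(cid)
        return path

    def cp(self, src: str, dst: str) -> Path:
        target = self._path(dst)
        shutil.copytree(self.find(src), target)  # also creates missing parents
        return target

    def mv(self, src: str, dst: str) -> Path:
        target = self._path(dst)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(self.find(src)), str(target))
        return target

    def rm(self, cid: str) -> None:
        shutil.rmtree(self.find(cid))


with tempfile.TemporaryDirectory() as tmp:
    repo = Repo(Path(tmp))
    repo.add("dataset:raw")
    repo.cp("dataset:raw", "dataset:backup")
    repo.mv("dataset:backup", "experiment:archived")
    repo.rm("dataset:raw")
    print(sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).glob("*/*")))
    # ['experiment/archived']
```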

Where to go from here?

I have only scratched the surface of CK. I haven't talked about the metadata format (which is JSON) or how to implement your own custom modules (which is commonly done in Python).

As I said at the beginning, there is plenty of documentation available on the CK wiki. It is incredibly useful to keep the vocabulary (entries, repositories, modules) and the grammar (ck action module) of CK in mind while reading these documents and playing around with CK.

The two best starting points are the Getting Started Guide and the Portable Workflows page.

To see how to implement your own workflow with CK by following an example, read the Getting Started Guide.

To learn how to implement portable workflows with CK by

  • Describing and detecting existing software
  • Setting up the software environment
  • Automating the installation of missing software
  • and more ...

read the corresponding sections in the Portable Workflows page.

Also, feel free to ask questions on the CK mailing list. The community is very open to answering them!