Merge pull request #94 from epigen/implied_columns

Implement implied columns
pepkit · May 4, 2017 · ff0db70
2 parents 6e78c6d + 9eee2a2

Showing 10 changed files with 391 additions and 104 deletions.
diff --git a/doc/source/changelog.rst b/doc/source/changelog.rst
@@ -1,6 +1,14 @@
 Changelog
 ******************************
 
+- **v0.6** (*unreleased*):
+
+  - New
+
+    - Adds support for the implied_columns section of the project config file
+
+    - Adds support for Python 3
+
 - **v0.5** (*2017-03-01*):
 
   - New

diff --git a/doc/source/implied-columns.rst b/doc/source/implied-columns.rst
@@ -0,0 +1,22 @@
+.. _advanced-implied-columns:
+
+Implied columns
+=============================================
+
+At some point, you will need a single sample attribute (or column) to populate several different pipeline arguments. In other words, the value of a given attribute may **imply** values for other attributes. Rather than enumerating all of these secondary, implied attributes yourself, you can infer them from the value of the original attribute. For example, if the ``organism`` attribute is ``human``, you may want to set the attribute ``genome`` to ``hg38`` **and** the attribute ``macs_genome_size`` to ``hs``. Looper lets you do this with a feature called *implied columns*. Instead of hard-coding ``genome`` and ``macs_genome_size`` in the sample annotation sheet, you specify that organism ``human`` implies one set of attribute-value pairs (and, perhaps, organism ``mouse`` implies another), all in your project configuration file.
+
+To do this, just add an ``implied_columns`` section to your project_config.yaml file.
+Example:
+
+.. code-block:: yaml
+
+  implied_columns:
+    organism:
+      human:
+        genome: "hg38"
+        macs_genome_size: "hs"
+      mouse:
+        genome: "mm10"
+        macs_genome_size: "mm"
+
+In this example, any samples with organism set to "human" will automatically also have attributes for genome (hg38) and for macs_genome_size (hs). Any samples with organism set to "mouse" will have the corresponding values.
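
Reviewer note: the behaviour described in implied-columns.rst can be sketched in a few lines of Python. This is only an illustration of the idea, not looper's actual implementation. The function name `apply_implied_columns`, the use of plain dicts for samples, and the choice to let implied values overwrite existing ones are all assumptions made for the example.

```python
def apply_implied_columns(sample, implied_columns):
    """Return a copy of `sample` with implied attributes filled in.

    `sample` is a plain dict of attribute -> value (hypothetical stand-in
    for a sample object). `implied_columns` mirrors the YAML section:
    {source_attribute: {source_value: {implied_attribute: implied_value}}}.
    """
    result = dict(sample)  # do not mutate the caller's sample
    for source_attr, value_map in implied_columns.items():
        source_value = sample.get(source_attr)
        # No entry for this value (or attribute missing): nothing is implied.
        implied = value_map.get(source_value, {})
        # Assumption: implied values overwrite any existing value.
        result.update(implied)
    return result


# Same mapping as the YAML example in the docs.
IMPLIED = {
    "organism": {
        "human": {"genome": "hg38", "macs_genome_size": "hs"},
        "mouse": {"genome": "mm10", "macs_genome_size": "mm"},
    }
}

human = apply_implied_columns({"sample_name": "s1", "organism": "human"}, IMPLIED)
other = apply_implied_columns({"sample_name": "s2", "organism": "yeast"}, IMPLIED)
```

Here `human` gains `genome` and `macs_genome_size`, while `other` is returned unchanged because `yeast` has no entry in the mapping.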
diff --git a/doc/source/index.rst b/doc/source/index.rst
@@ -22,6 +22,7 @@ Contents
 
 	define-your-project.rst
 	derived-columns.rst
+	implied-columns.rst
 	cluster-computing.rst
 	advanced.rst
 

diff --git a/doc/source/project-config.rst b/doc/source/project-config.rst
@@ -51,6 +51,23 @@ Example:
 For more details, see :ref:`advanced-derived-columns`.
 
 
+Project config section: implied_columns
+"""""""""""""""""""""""""""""""""""""""""""
+``implied_columns`` lets you infer additional sample attributes from the value of an existing attribute, which is useful for populating pipeline arguments.
+
+Example:
+
+.. code-block:: yaml
+
+  implied_columns:
+    organism:
+      human:
+        genome: "hg38"
+        macs_genome_size: "hs"
+
+For more details, see :ref:`advanced-implied-columns`.
+
+
 Project config section: subprojects
 """""""""""""""""""""""""""""""""""""""""""""""
 

diff --git a/looper/__init__.py b/looper/__init__.py
@@ -41,7 +41,6 @@ def setup_looper_logger(level, additional_locations=None, devmode=False):
     :return logging.Logger: project-root logger
     """
 
-    logging.addLevelName(0, "EVERYTHING")
     logging.addLevelName(5, "VERY_FINE")
 
     fmt = DEV_LOGGING_FMT if devmode else DEFAULT_LOGGING_FMT