planning 2018.05.16

info

define a single caching directory shared by all transformers, not one per transformer (open question: scoped per project, in a context handler, or somewhere else?)
cache/save transformer output
delete the cache at the end of pipeline computations
use separate output dirs for train, validation, test, and user-defined splits
input notation is too complicated (nested dicts in the input) -> simplify the interface -> DataStep should merge input_step and input_data into one API piece. Current nested form:

data = {
    'input': {
        'X': X_train,
        'y': y_train,
    }
}
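
One possible shape for the simplified input, as a rough sketch only: the nested dict could be flattened into a single mapping before it reaches a step. The helper name `merge_inputs` and the rule of raising on duplicate keys are assumptions made for illustration; they are not part of any existing API, and the toy `X_train` / `y_train` values are placeholders.

```python
def merge_inputs(nested):
    """Hypothetical helper: flatten {'source': {'key': value}} into {'key': value}.

    Raises KeyError if two sources provide the same key, so that nothing
    is silently overwritten.
    """
    flat = {}
    for source, payload in nested.items():
        for key, value in payload.items():
            if key in flat:
                raise KeyError(
                    "key {!r} from source {!r} is already defined".format(key, source)
                )
            flat[key] = value
    return flat


# Placeholder data standing in for the real training set.
X_train = [[0.0, 1.0], [1.0, 0.0]]
y_train = [0, 1]

data = {
    'input': {
        'X': X_train,
        'y': y_train,
    }
}

flat = merge_inputs(data)  # a single-level dict with keys 'X' and 'y'
```

Failing on a duplicate key is only one option; prefixing each key with its source name would avoid collisions at the cost of longer names.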
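
The caching points above (one shared directory, per-split output dirs, cleanup at the end of the pipeline) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name `PipelineCache`, the `<root>/<split>/<transformer>.pkl` layout, and the use of pickle for persistence. Where the root directory should be scoped is still the open question from the notes.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


class PipelineCache:
    """Hypothetical helper: one cache root shared by all transformers."""

    def __init__(self, root):
        self.root = Path(root)

    def path_for(self, split, transformer_name):
        # One sub-directory per split (train, validation, test, or user-defined).
        split_dir = self.root / split
        split_dir.mkdir(parents=True, exist_ok=True)
        return split_dir / "{}.pkl".format(transformer_name)

    def save(self, split, transformer_name, output):
        with open(self.path_for(split, transformer_name), "wb") as f:
            pickle.dump(output, f)

    def load(self, split, transformer_name):
        with open(self.path_for(split, transformer_name), "rb") as f:
            return pickle.load(f)

    def __enter__(self):
        self.root.mkdir(parents=True, exist_ok=True)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Delete the whole cache once pipeline computations are finished.
        shutil.rmtree(self.root, ignore_errors=True)
        return False


cache_root = Path(tempfile.mkdtemp()) / "cache"
with PipelineCache(cache_root) as cache:
    cache.save("train", "scaler", {"X": [1, 2, 3]})
    cache.save("validation", "scaler", {"X": [4]})
    restored = cache.load("train", "scaler")
    splits_on_disk = sorted(p.name for p in cache_root.iterdir())

root_exists_after = cache_root.exists()
```

Using a context manager ties the cleanup to the end of the `with` block, so the cache is also removed when a step raises.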