This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

planning 2018.05.16

Kamil A. Kaczmarek edited this page May 16, 2018 · 11 revisions

info

  • notebooks to be merged without any serious changes
  • name is gradus

status

MR results for:

discussion

  • Define a single caching directory shared by all transformers, not one per transformer. Open question: where should it live (per project, in a context handler, or somewhere else)?
  • Cache/save the output.
  • Delete the cache when the pipeline computations finish.
  • Use separate output directories for the train, validation, and test splits, and for user-defined splits.
  • The input notation is complicated because it relies on nested dicts. Simplify the interface: DataStep should merge input_step and input_data into one API piece. Example of the nested notation:
data = {
    'input': {
        'X': X_train,
        'y': y_train,
    }
}
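
One possible reading of "simplify interface" is to let the caller work with a flat mapping instead of the nested one. The sketch below is only an illustration under that assumption: `flatten_inputs` is a hypothetical helper, not part of the project, and the sample values for `X_train` and `y_train` are made up.

```python
def flatten_inputs(nested):
    """Flatten {'source': {'key': value}} into {'key': value}.

    Hypothetical helper (an assumption, not the project's API).
    Raises on key collisions instead of silently overwriting.
    """
    flat = {}
    for source, payload in nested.items():
        for key, value in payload.items():
            if key in flat:
                raise ValueError(
                    f"duplicate key {key!r} (seen again in {source!r})"
                )
            flat[key] = value
    return flat


# Made-up sample data, only to make the sketch runnable.
X_train = [[0.0, 1.0], [1.0, 0.0]]
y_train = [0, 1]

data = {'input': {'X': X_train, 'y': y_train}}
flat = flatten_inputs(data)
```

With a flat mapping, a step can look up `flat['X']` directly. The cost is that key names must be unique across all sources, which is why the helper raises on duplicates.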

notes

  • If you add an input that is never used, it still shows up on the graph.
  • Rename cache_transformer -> persisted_transformer.
  • Do not transform twice!
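
The caching points from the discussion and the "do not transform twice" note can be sketched together: one cache directory shared by all steps, output persisted after the first run and reused afterwards, and the directory deleted once the pipeline is done. This is a minimal sketch under assumptions, not the project's implementation: `PersistedStep`, its `run` method, and the `calls` counter are hypothetical names, and pickle is just one possible storage format.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


class PersistedStep:
    """Hypothetical wrapper: runs `transform` once and persists the output."""

    def __init__(self, name, transform, cache_dir):
        self.name = name
        self.transform = transform
        # All steps share one directory; each gets its own file in it.
        self.path = Path(cache_dir) / f"{name}.pkl"
        self.calls = 0  # counts real transform executions

    def run(self, data):
        # Reuse the persisted output instead of transforming again.
        if self.path.exists():
            with self.path.open("rb") as f:
                return pickle.load(f)
        self.calls += 1
        output = self.transform(data)
        with self.path.open("wb") as f:
            pickle.dump(output, f)
        return output


# Single cache directory for the whole pipeline run.
cache_dir = tempfile.mkdtemp(prefix="pipeline_cache_")
try:
    step = PersistedStep("double", lambda xs: [2 * x for x in xs], cache_dir)
    first = step.run([1, 2, 3])
    second = step.run([1, 2, 3])  # served from the cache
finally:
    # Delete the cache at the end of the pipeline computations.
    shutil.rmtree(cache_dir)
```

Note that this sketch keys the cache on the step name only, so it would return stale output if the same step were run on different data. A real version would need to separate caches per split (train, validation, test, user-defined), for example with one subdirectory per split.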