Documentation

Rob edited this page Jan 7, 2021 · 2 revisions
  • Pipeline(self, source, extract, transformations, load)
    • A data pipeline, consisting of a data object (a DataFrame) and a set of Steps.
      • source: The data source for the pipeline. Either a DataFrame object or the file path of a CSV file to read.
      • extract: (Optional) The Step to run for extraction.
      • transformations: List of Steps and Transforms to run.
      • load: (Optional) The final Step in a pipeline. Should save or pass Pipeline.data somewhere.
  • Step(self, func, *args, **kwargs)
    • A function and a set of arguments. The function is called with those arguments during Pipeline.run().
  • Transform(self, func, *args, **kwargs)
    • A subclass of Step. When run, its function is passed Pipeline.data as the first positional argument.
  • Load(self, func, *args, **kwargs)
    • A subclass of Transform. It requires a destination keyword argument, which indicates where the data will be saved or passed.
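
The relationships between the classes above can be sketched in a few lines of Python. This is an illustrative reimplementation and not the library's actual source. Several details are assumptions: the `run` method on each step, the default values of `None` for `extract`, `transformations`, and `load`, the rule that a Transform's return value replaces `Pipeline.data`, and the forwarding of `destination` to the load function. A list of dicts stands in for the DataFrame so that the sketch runs without pandas, and reading a CSV file path is left out.

```python
# Illustrative stand-ins for the documented classes (assumed behaviour, not the real source).

class Step:
    """A function and the arguments it is called with during Pipeline.run()."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def run(self, data=None):
        # A plain Step does not receive the pipeline data.
        return self.func(*self.args, **self.kwargs)


class Transform(Step):
    """A Step whose function gets Pipeline.data as the first positional argument."""

    def run(self, data=None):
        return self.func(data, *self.args, **self.kwargs)


class Load(Transform):
    """A Transform that requires a `destination` keyword argument."""

    def __init__(self, func, *args, destination, **kwargs):
        # Assumption: destination is forwarded to func as a keyword argument.
        super().__init__(func, *args, destination=destination, **kwargs)


class Pipeline:
    def __init__(self, source, extract=None, transformations=None, load=None):
        self.data = source  # assumption: source is already an in-memory data object
        self.extract = extract
        self.transformations = transformations or []
        self.load = load

    def run(self):
        if self.extract is not None:
            self.extract.run(self.data)
        for step in self.transformations:
            result = step.run(self.data)
            # Assumption: only a Transform's return value replaces the data.
            if isinstance(step, Transform) and result is not None:
                self.data = result
        if self.load is not None:
            self.load.run(self.data)
        return self.data


# Hypothetical helper functions used only for this example.
def add_total(rows):
    return [{**r, "total": r["price"] * r["qty"]} for r in rows]


def drop_below(rows, minimum):
    return [r for r in rows if r["total"] >= minimum]


saved = {}


def save(rows, destination):
    saved[destination] = rows


log = []
pipeline = Pipeline(
    source=[{"price": 2, "qty": 3}, {"price": 1, "qty": 1}],
    transformations=[
        Step(log.append, "starting"),      # called without the data
        Transform(add_total),              # called as add_total(data)
        Transform(drop_below, minimum=5),  # called as drop_below(data, minimum=5)
    ],
    load=Load(save, destination="out"),
)
result = pipeline.run()
```

Omitting `destination` in this sketch raises a `TypeError` when the Load is constructed, which is one way to enforce the documented requirement before the pipeline runs.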