Skip to content

lexual/bumblebee

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bumblebee

Write ETL descriptions in YAML. Should be trivial to extend to JSON.

Just a thought-bubble at the moment, but the idea is:

  • Readable by anyone, including non-programmers.
  • Create a simple webapp with GUI to create this YAML.

e.g. this dataset: http://www.bom.gov.au/fwo/IDV60901/IDV60901.95936.axf

read_from_row_that_starts_with: sort_order
number_of_rows_to_skip_at_file_end: 1
read_these_columns_in_these_formats:
    date:
        - local_date_time_full[80]
list_of_actions:
    - rename_column:
        - time = local_date_time_full[80]
        - name = name[80]
    - run_these_formula:
        - air_temp_f = air_temp * 9 / 5 + 32
    - only_keep_rows_where:
        - apparent_t < 13 & name == 'Melbourne (Olympic Park)'
    - only_keep_these_columns:
        - name
        - time
        - air_temp
        - air_temp_f
        - apparent_t

Then Python calls are:

import bumblebee as bb

c = bb.Convertor.from_yaml('etl.yaml')
output = c.transform('my_data.csv')

Gives:

            name                time  air_temp  air_temp_f  apparent_t
    0  Melbourne 2014-08-08 22:00:00      10.1       50.18         6.3
    1  Melbourne 2014-08-08 21:30:00      10.5       50.90         7.8
    2  Melbourne 2014-08-08 21:00:00      10.9       51.62         8.8

Current Operations:

  • change_date_or_time_format
  • copy_column
  • rename_column
  • extract_query_string
  • extract_text
  • only_keep_these_columns
  • only_keep_rows_where
  • only_edit_rows_where
  • run_these_formula
  • remove_columns
  • remove_duplicates
  • replace_text
  • add_text_at_end
  • add_text_at_start
  • sum_up_by
  • make_column_names_lowercase
  • make_column_names_alphanumeric
  • ensure_column_is_in_this_format

Supports Python 2 & 3.

About

Transforms things

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published