Releases: holukas/dataflow

v0.21.1

01 Dec 14:27

v0.21.1 | 5 Sep 2024

  • Fixed bug: the output file that contains variables that were not greenlit was not created correctly
    (dataflow.main.DataFlow._store_info_csv)

Full Changelog: v0.21.0...v0.21.1

v0.21.0

08 Jul 13:47

v0.21.0 | 8 Jul 2024

  • Added new function to parse position indices for specific variables. In the config files, the
    setting parse_pos_indices can be set for single variables where position indices are available.

Full Changelog: v0.20.1...v0.21.0

v0.20.1

04 Jul 08:04

v0.20.1 | 4 Jul 2024

  • Fixed release

v0.20.0 | 4 Jul 2024

  • This is a major update that refactors many parts of the code
  • Removed: dbc-influxdb dependency; the required functionality is now built directly into dataflow
  • Adjusted how .read_csv() reads data files, to comply with current pandas requirements. For all filetypes, the
    timestamp is now built in a separate step after reading the file, never during
    reading. (dataflow.filetypereader.filetypereader.FileTypeReader._add_timestamp)
  • Refactored the way rawfunc is handled. rawfunc variables are now created and added to the main dataframe before
    looping through the main dataframe. This means that rawfunc variables are now handled like the variables from the
    data files. All relevant tag entries are adjusted during the execution of the respective rawfunc.
  • Uploading to the database does not require the Python dependency dbc-influxdb anymore. dataflow uses its own
    uploading routine. This was necessary to guarantee faster execution and cleaner code.
  • Added new function to apply gain between two dates (dataflow.rawfuncs.common.apply_gain_between_dates)
  • Added new function to add offset between two dates (dataflow.rawfuncs.common.add_offset_between_dates)
  • Added new rawfunc to correct O2 measurements using temperature, used at
    site CH-CHA (dataflow.rawfuncs.ch_cha.correct_o2)
  • gain is now set to 1 as a float if not explicitly given; previously it was an integer
  • Added new database tag offset
  • Added new database tag site
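
The gain and offset functions listed above restrict a correction to a time window. A minimal sketch of that idea follows, assuming a pandas Series with a DatetimeIndex and an inclusive date range. The function names, signatures and sample values here are hypothetical and are not taken from dataflow.rawfuncs.common.

```python
import pandas as pd


def apply_gain_between(series: pd.Series, start: str, stop: str, gain: float) -> pd.Series:
    """Multiply values inside [start, stop] by gain, leave all others untouched."""
    out = series.astype(float)  # astype returns a new Series, the input is not modified
    in_window = (out.index >= pd.Timestamp(start)) & (out.index <= pd.Timestamp(stop))
    out.loc[in_window] = out.loc[in_window] * gain
    return out


def add_offset_between(series: pd.Series, start: str, stop: str, offset: float) -> pd.Series:
    """Add offset to values inside [start, stop], leave all others untouched."""
    out = series.astype(float)
    in_window = (out.index >= pd.Timestamp(start)) & (out.index <= pd.Timestamp(stop))
    out.loc[in_window] = out.loc[in_window] + offset
    return out


# Made-up daily data for illustration only
index = pd.date_range("2024-07-01", periods=5, freq="D")
raw = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index)

scaled = apply_gain_between(raw, "2024-07-02", "2024-07-03", gain=2.0)
shifted = add_offset_between(raw, "2024-07-04", "2024-07-05", offset=-0.5)
```

Whether the real functions treat the window bounds as inclusive is not stated in these notes; the sketch simply picks inclusive bounds.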

Full Changelog: v0.12.2...v0.20.1

v0.12.2

11 Jun 10:40

v0.12.2 | 11 Jun 2024

Changes

  • Run ID now also includes nanoseconds to better differentiate between (many) log files of runs that were started in
    parallel (dataflow.common.times.make_run_id)
  • Added parameter to add an optional suffix to Run ID
  • Updated dbc-influxdb dependency to v0.11.3
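
To illustrate why nanosecond resolution helps separate runs started in parallel, here is a small sketch using only the standard library. The ID layout, the prefix and the function name sketch_run_id are assumptions for illustration; the actual format produced by dataflow.common.times.make_run_id is not described in these notes.

```python
import time
from datetime import datetime, timezone


def sketch_run_id(prefix: str = "DF", suffix: str = "") -> str:
    """Build an ID such as DF-20240611-104000-123456789, with an optional suffix."""
    # time.time_ns() returns the time since the epoch as integer nanoseconds
    seconds, nanos = divmod(time.time_ns(), 1_000_000_000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y%m%d-%H%M%S")
    run_id = f"{prefix}-{stamp}-{nanos:09d}"
    return f"{run_id}-{suffix}" if suffix else run_id


plain_id = sketch_run_id()
suffixed_id = sketch_run_id(suffix="A")
```

Two runs started within the same second still get different IDs as long as their nanosecond parts differ, and the suffix gives a second, explicit way to tell them apart.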

Full Changelog: v0.12.1...v0.12.2

v0.12.1

21 May 12:28

v0.12.1 | 21 May 2024

Changes

  • Variables are now strictly converted to float, because the automatic detection of datatypes confused the database,
    which led to values being skipped because int was expected but float was
    delivered (dataflow.main.DataFlow._to_numeric)
  • Columns of object type are now explicitly excluded,
    using infer_objects=False (dataflow.main.DataFlow._convert_to_float_or_string)
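
The two changes above can be pictured with a short pandas sketch: numeric columns are cast to float regardless of how they were inferred, and non-numeric columns are left alone. The column names and values are made up, and this is not the code in dataflow.main.DataFlow; it only shows the general technique.

```python
import pandas as pd

# Made-up example: TA was inferred as integer, SW as float, FLAG holds strings
df = pd.DataFrame({
    "TA": [1, 2, 3],
    "SW": [0.5, 1.5, 2.5],
    "FLAG": ["a", "b", "c"],
})

# Cast every numeric column to float so the database always receives the same type
numeric_cols = df.select_dtypes(include="number").columns
converted = df.astype({col: float for col in numeric_cols})
```

After the cast, a column that happens to contain only whole numbers in one file and decimals in the next is delivered as float in both cases.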

Full Changelog: v0.12.0...v0.12.1

v0.12.0

07 May 17:37

v0.12.0 | 7 May 2024

Handling old meteo files

  • In the configs it is now possible to define multiple IDs that identify good data rows. In dataflow this is now
    handled accordingly.
  • In the configs, this is done by specifying e.g. data_keep_good_rows: [ 0, [ 102, 103 ], [ 202, 203 ] ], which
    means that all data rows that start with either 102 or 103 are kept and use the variable info in data_vars,
    and all data rows that start with either 202 or 203 are kept and use the variable info given in data_vars2.
  • If single integers are given instead of a list, all records that start with that integer are kept. For
    example, data_keep_good_rows: [ 0, 102, 202 ] means that all data rows that start with 102 are kept and use the
    variable info in data_vars, and all data rows that start with 202 use the variable info given in data_vars2.
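
The mapping described above, from row IDs to data_vars or data_vars2, can be sketched as follows. The helper names are hypothetical. The meaning of the leading 0 in the setting is not explained in these notes, so the sketch skips it rather than guessing.

```python
from typing import Optional


def id_groups(data_keep_good_rows: list) -> list:
    """Normalise all entries after the first one into lists of IDs."""
    groups = []
    for entry in data_keep_good_rows[1:]:  # the first entry is left uninterpreted
        groups.append([entry] if isinstance(entry, int) else list(entry))
    return groups


def vars_key_for_row(row_id: int, groups: list) -> Optional[str]:
    """Return the config key holding the variable info for this row, or None to drop the row."""
    for position, ids in enumerate(groups):
        if row_id in ids:
            return "data_vars" if position == 0 else f"data_vars{position + 1}"
    return None


groups_multi = id_groups([0, [102, 103], [202, 203]])
groups_single = id_groups([0, 102, 202])

key_103 = vars_key_for_row(103, groups_multi)   # "data_vars"
key_202 = vars_key_for_row(202, groups_single)  # "data_vars2"
key_999 = vars_key_for_row(999, groups_multi)   # None, row is not kept
```

Wrapping single integers in one-element lists lets both config styles share the same lookup.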

Additions

  • Added new function to calculate soil water content SWC from SDP variables measured at the site CH-CHA. The
    function to do the calculation was taken from the previous MeteoScreening tool. Conversions for other sites follow
    later. (dataflow.rawfuncs.ch_cha.calc_swc_from_sdp and dataflow.main.DataFlow._execute_rawfuncs)
  • After reading the data file, all rows that do not contain a timestamp are now
    removed. This is the case e.g. for the file CH-CHA_iDL_BOX1_1min_20160930-1545.csv.gz that contains the
    string ap>0.004216865 in the 3rd row of the timestamp column. (dataflow.main.DataFlow._varscanner)
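
A common way to drop rows without a usable timestamp is to parse the column after reading and discard whatever fails to parse. The sketch below shows that approach on made-up sample data; the column names and the timestamp format are assumptions, and it is not the code in dataflow.main.DataFlow._varscanner.

```python
import io

import pandas as pd

# Made-up file content with one garbled entry in the timestamp column
raw = io.StringIO(
    "TIMESTAMP,TA\n"
    "2016-09-30 15:45:00,12.1\n"
    "2016-09-30 15:46:00,12.2\n"
    "ap>0.004216865,12.3\n"
    "2016-09-30 15:48:00,12.4\n"
)

df = pd.read_csv(raw)

# Parse after reading; entries that do not match the format become NaT
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Remove rows without a timestamp, then use the timestamp as index
df = df.dropna(subset=["TIMESTAMP"]).set_index("TIMESTAMP")
```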

Changes

  • Updated date offsets to be compliant with new versions of
    pandas (dataflow.common.times.timedelta_to_string)
  • Adjusted check for missing IDs due to the new option in data_keep_good_rows as described
    above (dataflow.main.DataFlow._check_special_format_alternating_missed_ids)
  • Updated detection of good rows for special format alternating; it can now handle multiple IDs that mark good
    rows (dataflow.filetypereader.special_format_alternating.special_format_alternating)

Bugfixes

  • Fixed EmptyDataError bug when reading gzip-compressed files that have filesize zero when uncompressed. This
    error occurs when completely empty files are gzipped. In this case, the filesize of the compressed file is > 0.
    When the script then tries to uncompress the file, the exception pd.errors.EmptyDataError is raised. There are now
    more checks implemented to avoid empty dataframes. (dataflow.filetypereader.filetypereader.FileTypeReader._readfile)
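
The situation described above can be reproduced with the standard library alone: gzipping an empty payload still yields a file larger than zero bytes, so a check on the compressed filesize is not enough. The sketch below shows one possible pre-check; the helper name gzip_payload_is_empty is hypothetical and this is not necessarily the check that dataflow performs.

```python
import gzip
import os
import tempfile


def gzip_payload_is_empty(path: str) -> bool:
    """Return True if the gzip file at path decompresses to zero bytes."""
    with gzip.open(path, "rb") as handle:
        return handle.read(1) == b""


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "empty.csv.gz")

    # Gzip an empty payload: header and trailer are written, but no data
    with gzip.open(path, "wb"):
        pass

    size_on_disk = os.path.getsize(path)     # greater than zero
    is_empty = gzip_payload_is_empty(path)   # True, nothing to parse
```

An alternative is to attempt the read and catch pd.errors.EmptyDataError, which avoids opening the file twice at the cost of relying on the exception.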

Full Changelog: v0.11.4...v0.12.0

v0.11.4

01 Mar 23:14

Full Changelog: v0.11.3...v0.11.4

v0.11.3

01 Mar 22:16

Full Changelog: v0.11.2...v0.11.3

v0.11.2

23 Feb 11:21

Full Changelog: v0.11.1...v0.11.2

v0.11.1

03 Feb 12:55
