Releases · holukas/dataflow · GitHub

01 Dec 14:27

holukas

v0.21.1 Latest

v0.21.1 | 5 Sep 2024

Fixed bug: the output file that contains variables that were not greenlit was not created correctly (dataflow.main.DataFlow._store_info_csv)

Full Changelog: v0.21.0...v0.21.1

08 Jul 13:47

holukas

v0.21.0

v0.21.0 | 8 Jul 2024

Added new function to parse position indices for specific variables. In the config files, the
setting parse_pos_indices can be set for single variables where position indices are available.

Full Changelog: v0.20.1...v0.21.0

04 Jul 08:04

holukas

v0.20.1

v0.20.1 | 4 Jul 2024

Fixed release

v0.20.0 | 4 Jul 2024

This is a major update that refactors many parts of the code.
Removed the dbc-influxdb dependency; the required functionality is now built directly into dataflow.
Adjusted how .read_csv() reads data files to comply with current pandas requirements. For all filetypes, the timestamp is now built in a separate step after reading the file, never during reading. (dataflow.filetypereader.filetypereader.FileTypeReader._add_timestamp)
Refactored the way rawfunc is handled. rawfunc variables are now created and added to the main dataframe before looping through the main dataframe. This means that rawfunc variables are now handled like the variables from the data files. All relevant tag entries are adjusted during the execution of the respective rawfunc.
Uploading to the database no longer requires the Python dependency dbc-influxdb. dataflow uses its own uploading routine. This was necessary to guarantee faster execution and cleaner code.
Added new function to apply gain between two dates (dataflow.rawfuncs.common.apply_gain_between_dates)
Added new function to add offset between two dates (dataflow.rawfuncs.common.add_offset_between_dates)
Added new rawfunc to correct O2 measurements using temperature, used at site CH-CHA (dataflow.rawfuncs.ch_cha.correct_o2)
gain is now set to 1 as a float if not specifically given; previously it was an integer.
Added new database tag offset
Added new database tag site
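
The idea of applying a gain or an offset only within a date range can be illustrated with a minimal, pandas-free sketch. The helper names scale_between and shift_between, their parameters, and the inclusive treatment of both range boundaries are assumptions made for this illustration; they are not the signatures of the functions in dataflow.rawfuncs.common.

```python
from datetime import datetime


def scale_between(series, start, stop, gain=1.0):
    """Multiply values whose timestamp lies in [start, stop] by gain.

    series is a dict mapping datetime -> float (hypothetical layout).
    Values outside the range are returned unchanged.
    """
    return {ts: (val * gain if start <= ts <= stop else val)
            for ts, val in series.items()}


def shift_between(series, start, stop, offset=0.0):
    """Add offset to values whose timestamp lies in [start, stop]."""
    return {ts: (val + offset if start <= ts <= stop else val)
            for ts, val in series.items()}


# Three daily records; only the middle one falls into the range.
records = {
    datetime(2024, 7, 1): 10.0,
    datetime(2024, 7, 2): 10.0,
    datetime(2024, 7, 3): 10.0,
}
scaled = scale_between(records, datetime(2024, 7, 2), datetime(2024, 7, 2), gain=2.0)
shifted = shift_between(records, datetime(2024, 7, 2), datetime(2024, 7, 2), offset=-1.5)
```

The default gain of 1.0 is a float here as well, so values keep the same type whether or not a gain is given.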

What's Changed

v0.20.0 by @holukas in #8

New Contributors

@holukas made their first contribution in #8

Full Changelog: v0.12.2...v0.20.1

11 Jun 10:40

holukas

v0.12.2

v0.12.2 | 11 Jun 2024

Changes

Run ID now also includes nanoseconds to better differentiate between (many) log files of runs that were started in parallel (dataflow.common.times.make_run_id)
Added parameter to add an optional suffix to Run ID
Updated dbc-influxdb dependency to v0.11.3
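
A run ID with nanosecond resolution and an optional suffix could look roughly like the sketch below. The function name build_run_id, the prefix, and the exact ID layout are assumptions for illustration only; the actual format produced by dataflow.common.times.make_run_id is not documented here.

```python
import time
from datetime import datetime, timezone


def build_run_id(prefix="DF", suffix=""):
    """Build a run ID from the current time with nanosecond resolution.

    Hypothetical layout: PREFIX-YYYYMMDD-HHMMSS-NNNNNNNNN[-SUFFIX]
    """
    # time.time_ns() returns nanoseconds since the epoch as an integer.
    seconds, nanos = divmod(time.time_ns(), 1_000_000_000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y%m%d-%H%M%S")
    run_id = f"{prefix}-{stamp}-{nanos:09d}"
    if suffix:
        run_id = f"{run_id}-{suffix}"
    return run_id


run_id = build_run_id(suffix="test")
```

How fine the clock actually ticks depends on the platform, so an optional suffix is a useful second way to tell parallel runs apart.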

Full Changelog: v0.12.1...v0.12.2

21 May 12:28

holukas

v0.12.1

v0.12.1 | 21 May 2024

Changes

Variables are now strictly converted to float, because the automatic detection of datatypes confused the database, which led to values being skipped because int was expected but float was delivered (dataflow.main.DataFlow._to_numeric)
Columns of object type are now explicitly excluded via infer_objects=False (dataflow.main.DataFlow._convert_to_float_or_string)
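
The reasoning behind the strict conversion can be sketched without pandas: if every numeric column is always delivered as float, the database never sees an int in one upload and a float in the next. The helper to_float_or_string below is a hypothetical, simplified stand-in and not the code of dataflow.main.DataFlow._convert_to_float_or_string.

```python
def to_float_or_string(column):
    """Return the column as floats if every value converts, else as strings.

    Deciding per column avoids mixing numeric types within one variable.
    """
    try:
        return [float(value) for value in column]
    except (TypeError, ValueError):
        # At least one value is not numeric: keep the whole column as text.
        return [str(value) for value in column]


numeric = to_float_or_string([1, 2, 3.5])
textual = to_float_or_string(["ok", 1])
```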

Full Changelog: v0.12.0...v0.12.1

07 May 17:37

holukas

v0.12.0

v0.12.0 | 7 May 2024

Handling old meteo files

In the configs it is now possible to define multiple IDs that identify good data rows, and dataflow handles them accordingly.
In the configs, this is done by specifying e.g. data_keep_good_rows: [ 0, [ 102, 103 ], [ 202, 203 ] ], which means that all data rows that start with either 102 or 103 are kept and use the variable info in data_vars, and all data rows that start with either 202 or 203 are kept and use the variable info given in data_vars2.
If single integers are given instead of lists, all records that start with that integer are kept. For example, data_keep_good_rows: [ 0, 102, 202 ] means that all data rows that start with 102 are kept and use the variable info in data_vars, and all data rows that start with 202 use the variable info given in data_vars2.
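
The mapping from row IDs to variable blocks described above can be sketched as follows. This is an illustration under assumptions: the first element of data_keep_good_rows is interpreted as the index of the column that holds the row ID, and the helper names build_id_lookup and assign_rows are hypothetical, not part of dataflow.

```python
def build_id_lookup(data_keep_good_rows):
    """Map each good-row ID to the name of its variable block.

    Assumption: element 0 is the column index of the row ID; each following
    element is either a single ID or a list of IDs for one variable block.
    """
    id_col, *groups = data_keep_good_rows
    lookup = {}
    for block, group in enumerate(groups, start=1):
        ids = group if isinstance(group, list) else [group]
        name = "data_vars" if block == 1 else f"data_vars{block}"
        for row_id in ids:
            lookup[row_id] = name
    return id_col, lookup


def assign_rows(rows, data_keep_good_rows):
    """Keep only rows with a known ID and pair each with its variable block."""
    id_col, lookup = build_id_lookup(data_keep_good_rows)
    kept = []
    for row in rows:
        try:
            row_id = int(row[id_col])
        except (ValueError, IndexError):
            continue  # no usable ID in this row, so it is not a good row
        if row_id in lookup:
            kept.append((lookup[row_id], row))
    return kept


rows = [["102", "1.5"], ["999", "x"], ["203", "7"], ["abc"], []]
multi = assign_rows(rows, [0, [102, 103], [202, 203]])
single = assign_rows(rows, [0, 102, 202])
```

With the list form, the row starting with 203 is kept and assigned to data_vars2; with the single-integer form only 102 and 202 are accepted, so that row is dropped.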

Additions

Added new function to calculate soil water content SWC from SDP variables measured at the site CH-CHA. The function to do the calculation was taken from the previous MeteoScreening tool. Conversions for other sites will follow later. (dataflow.rawfuncs.ch_cha.calc_swc_from_sdp and dataflow.main.DataFlow._execute_rawfuncs)
After reading the data file, all rows that do not contain a timestamp are now removed. This is the case e.g. for the file CH-CHA_iDL_BOX1_1min_20160930-1545.csv.gz, which contains the string ap>0.004216865 in the 3rd row of the timestamp column. (dataflow.main.DataFlow._varscanner)

Changes

Updated date offsets to be compliant with new versions of pandas. (dataflow.common.times.timedelta_to_string)
Adjusted check for missing IDs due to the new option in data_keep_good_rows as described above (dataflow.main.DataFlow._check_special_format_alternating_missed_ids)
Updated detection of good rows for special format alternating; it can now handle multiple IDs that mark good rows (dataflow.filetypereader.special_format_alternating.special_format_alternating)

Bugfixes

Fixed EmptyDataError bug when reading compressed gzip files that have filesize zero when uncompressed. This error occurs when completely empty files are gzipped. In this case, the filesize of the compressed file is > 0, so a check on the compressed filesize does not detect the problem. When the script then tries to uncompress the file, the exception pd.errors.EmptyDataError is raised. There are now more checks implemented to avoid empty dataframes. (dataflow.filetypereader.filetypereader.FileTypeReader._readfile)
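
The underlying effect can be reproduced with the standard library alone: gzipping zero bytes still yields a non-empty file, because the gzip header and trailer are always written. The helper gzip_is_effectively_empty is a hypothetical pre-check written for this illustration; it is not the check used in FileTypeReader._readfile.

```python
import gzip
import io


def gzip_is_effectively_empty(compressed):
    """Return True if the gzip payload decompresses to zero bytes."""
    with gzip.open(io.BytesIO(compressed), "rb") as fh:
        # Reading a single byte is enough to tell empty from non-empty.
        return fh.read(1) == b""


empty_gz = gzip.compress(b"")            # a completely empty file, gzipped
data_gz = gzip.compress(b"a,b\n1,2\n")   # a small CSV, gzipped

print(len(empty_gz) > 0)                      # compressed size is not zero
print(gzip_is_effectively_empty(empty_gz))    # but there is no data inside
```

Running such a check before handing the file to a CSV reader makes it possible to skip empty files instead of catching the exception afterwards.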

Full Changelog: v0.11.4...v0.12.0

01 Mar 23:14

holukas

v0.11.4

Full Changelog: v0.11.3...v0.11.4

01 Mar 22:16

holukas

v0.11.3

Full Changelog: v0.11.2...v0.11.3

23 Feb 11:21

holukas

v0.11.2

Full Changelog: v0.11.1...v0.11.2

03 Feb 12:55

holukas

v0.11.1

.