Releases · holukas/diive

v0.77.0 (Latest) · released by holukas · 11 Jun 14:02 · commit 60e6623

v0.77.0 | 11 Jun 2024

Additions

Plotting cumulatives with CumulativeYear now also shows the cumulative for the reference, i.e. for the mean over the
reference years (diive.core.plotting.cumulative.CumulativeYear)
DielCycle plots now accept a ylim parameter (diive.core.plotting.dielcycle.DielCycle)
Added a long-term dataset for local testing purposes, internal use only (diive.configs.exampledata.load_exampledata_parquet_long)
Added several classes in preparation for long-term gap-filling in a future update

Changes

Several updates and changes to the base class for regressor decision
trees (diive.core.ml.common.MlRegressorGapFillingBase):
- The data are now split into a training set and a test set at the very start of the regressor setup. The test set is
  used to evaluate models on unseen data. The default split is 80% training and 20% test data.
- Plotting (scores, importances, etc.) is now generally separated from the methods that calculate them.
- The same random_state is now used for all processing steps.
- Refactored code.
- Beautified console output.
When correcting relative humidity values above 100%, the corrected time series is now capped at a maximum of 100 after
the (daily) offset has been removed (diive.pkgs.corrections.offsetcorrection.remove_relativehumidity_offset)
During feature reduction in machine learning regressors, features with permutation importance < 0 are now always
removed (diive.core.ml.common.MlRegressorGapFillingBase._remove_rejected_features)
Changed default parameters for quick random forest gap-filling (diive.pkgs.gapfilling.randomforest_ts.QuickFillRFTS)
I tried to improve the clarity of the console output for several functions and methods
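The up-front 80/20 split with a shared random_state can be sketched in plain Python as follows. This is an illustrative
sketch, not the code of MlRegressorGapFillingBase: the helper name split_train_test and its parameters are hypothetical,
and a real setup would more likely use a library routine such as scikit-learn's train_test_split.

```python
import random


def split_train_test(records, test_size=0.2, random_state=42):
    """Hypothetical helper: shuffle record positions with a fixed seed and
    hold out a fraction of them as an unseen test set."""
    rng = random.Random(random_state)  # same seed -> same split on every run
    positions = list(range(len(records)))
    rng.shuffle(positions)
    n_test = round(len(records) * test_size)
    test_pos = sorted(positions[:n_test])  # keep chronological order within each set
    train_pos = sorted(positions[n_test:])
    train = [records[i] for i in train_pos]
    test = [records[i] for i in test_pos]
    return train, test


data = list(range(100))
train, test = split_train_test(data)
print(len(train), len(test))  # 80 20
```

Reusing the same random_state in the later steps (model fitting, permutation importance) keeps a whole run reproducible.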

Environment

Added package dtreeviz to visualize decision trees

Notebooks

Updated notebook (notebooks/GapFilling/RandomForestGapFilling.ipynb)
Updated notebook (notebooks/GapFilling/LinearInterpolation.ipynb)
Updated notebook (notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb)
Updated notebook (notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb)
Updated notebook (notebooks/GapFilling/RandomForestParamOptimization.ipynb)
Updated notebook (notebooks/GapFilling/QuickRandomForestGapFilling.ipynb)

Tests

Updated and fixed test case (tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments)
Updated and fixed test case (tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest)

What's Changed

Ml long term gap filling by @holukas in #128

Full Changelog: v0.76.2...v0.77.0

Contributors

holukas


v0.76.2 · released by holukas · 24 May 23:19 · commit ceebdb4

v0.76.2 | 23 May 2024

Additions

Added function to calculate absolute double differences of a time series, which is the sum of absolute differences
between a data record and its preceding and next record. Used in class zScoreIncrements for finding (isolated)
outliers that are distant from neighboring records. (diive.core.dfun.stats.double_diff_absolute)
Added small function to calculate z-score stats of a time series (diive.core.dfun.stats.sstats_zscore)
Added small function to calculate stats for absolute double differences of a time series (diive.core.dfun.stats.sstats_doublediff_abs)

Changes

Changed the algorithm for outlier detection when using zScoreIncrements. Data points are now flagged as outliers if
the z-scores of three absolute differences (previous record, next record and the sum of both) all exceed a specified
threshold. (diive.pkgs.outlierdetection.incremental.zScoreIncrements)
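A plain-Python sketch of this rule: compute the absolute difference to the previous record, to the next record, and
their sum (the absolute double difference), convert each series to z-scores, and flag a record only if all three
z-scores exceed the threshold. The function names are hypothetical; diive itself works on pandas Series, and its
handling of the first and last records may differ (here they are simply never flagged).

```python
import statistics


def zscores(values):
    """z-score of each non-None value, using the mean and sample SD of all non-None values."""
    valid = [v for v in values if v is not None]
    mean = statistics.mean(valid)
    sd = statistics.stdev(valid)
    return [None if v is None else (v - mean) / sd for v in values]


def flag_zscore_increments(series, threshold=3.0):
    """Return 1 for records whose three increment z-scores all exceed the threshold, else 0."""
    n = len(series)
    d_prev = [None] + [abs(series[i] - series[i - 1]) for i in range(1, n)]
    d_next = [abs(series[i] - series[i + 1]) for i in range(n - 1)] + [None]
    # absolute double difference: sum of both absolute differences
    d_sum = [None if a is None or b is None else a + b for a, b in zip(d_prev, d_next)]
    flags = []
    for zp, zn, zs in zip(zscores(d_prev), zscores(d_next), zscores(d_sum)):
        is_outlier = None not in (zp, zn, zs) and zp > threshold and zn > threshold and zs > threshold
        flags.append(int(is_outlier))
    return flags


# Smooth, slightly varying series with one isolated spike at position 25
series = [10.0 + 0.1 * (i % 5) for i in range(50)]
series[25] = 30.0
flags = flag_zscore_increments(series)
print([i for i, f in enumerate(flags) if f])  # [25]
```

Requiring all three z-scores to be large means the neighbours of a spike (which have only one large difference) are
not flagged along with it.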

Notebooks

Added new notebook for outlier detection using class LocalOutlierFactorAllData (notebooks/OutlierDetection/LocalOutlierFactorAllData.ipynb)

Tests

Added new test case for LocalOutlierFactorAllData (tests.test_outlierdetection.TestOutlierDetection.test_lof_alldata)

What's Changed

More stats by @holukas in #116

Full Changelog: v0.76.1...v0.76.2

Contributors

holukas


v0.76.1 · released by holukas · 17 May 10:10 · commit 7878a8b

v0.76.1 | 17 May 2024

Additions

It is now possible to set a fixed random seed when creating impulse noise (diive.pkgs.createvar.noise.add_impulse_noise)
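Why a fixed seed matters can be illustrated with a minimal sketch. The function impulse_noise_sketch is hypothetical
(its parameters fraction and magnitude are assumptions, not the signature of add_impulse_noise): it adds spikes of a
fixed size at randomly chosen positions, and the same seed always produces the same spikes.

```python
import random


def impulse_noise_sketch(values, fraction=0.05, magnitude=10.0, seed=None):
    """Hypothetical sketch: add spikes of +/- magnitude to a random subset of records.
    Returns the noisy values and the (sorted) spike positions."""
    rng = random.Random(seed)  # a fixed seed makes positions and signs reproducible
    noisy = list(values)
    n_spikes = int(len(values) * fraction)
    positions = rng.sample(range(len(values)), n_spikes)
    for pos in positions:
        noisy[pos] += rng.choice((-1, 1)) * magnitude
    return noisy, sorted(positions)


a, spikes_a = impulse_noise_sketch([0.0] * 100, seed=42)
b, spikes_b = impulse_noise_sketch([0.0] * 100, seed=42)
print(a == b, spikes_a == spikes_b)  # True True
```

This reproducibility is what makes such artificial spikes useful in test cases: an outlier test can be checked against
known spike positions on every run.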

Changes

In class zScoreIncrements, outliers are now detected by calculating the sum of the absolute differences between a
data point and its preceding and next data points. Previously, only the non-absolute difference to the preceding data
point was considered. The sum of absolute differences is then used to calculate the z-score and, in turn, to flag
outliers. (diive.pkgs.outlierdetection.incremental.zScoreIncrements)

Notebooks

Added new notebook for outlier detection using class zScoreIncrements (notebooks/OutlierDetection/zScoreIncremental.ipynb)
Added new notebook for outlier detection using class LocalSD (notebooks/OutlierDetection/LocalSD.ipynb)

Tests

Added new test case for zScoreIncrements (tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments)
Added new test case for LocalSD (tests.test_outlierdetection.TestOutlierDetection.test_localsd)

What's Changed

Added more notebooks and test cases by @holukas in #108

Full Changelog: v0.76.0...v0.76.1

Contributors

holukas


v0.76.0 · released by holukas · 14 May 21:33 · commit 53d72fc

v0.76.0 | 14 May 2024

Diel cycle plot

The new class DielCycle plots diel cycles per month or across all data for time series data. At the moment, it plots
the (monthly) diel cycles as means (+/- standard deviation). It uses the time info contained in the datetime timestamp
index of the data. All aggregates are calculated by grouping data by time and, optionally, separately for each month.
The diel cycles have the same time resolution as the time component of the timestamp index, e.g. hourly.
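The grouping described above can be sketched in plain Python. This is not diive's diel_cycle (which works on pandas
data with a timestamp index); the helper diel_cycle_sketch is hypothetical and only illustrates grouping by
(month, time of day) and returning the mean and standard deviation per group.

```python
import statistics
from collections import defaultdict
from datetime import datetime, time, timedelta


def diel_cycle_sketch(timestamps, values, per_month=True):
    """Hypothetical helper: mean and standard deviation per time of day, optionally
    separately for each month. Returns {key: (mean, sd)} where the key is
    (month, time_of_day) or just time_of_day."""
    groups = defaultdict(list)
    for ts, value in zip(timestamps, values):
        if value is None:  # skip missing values
            continue
        key = (ts.month, ts.time()) if per_month else ts.time()
        groups[key].append(value)
    result = {}
    for key, vals in sorted(groups.items()):
        # the sample SD needs at least two values; report None otherwise
        sd = statistics.stdev(vals) if len(vals) > 1 else None
        result[key] = (statistics.fmean(vals), sd)
    return result


# Two days of hourly data: each hour's value is 1 higher on day 2 than on day 1
start = datetime(2024, 1, 1)
stamps = [start + timedelta(hours=h) for h in range(48)]
vals = [float(ts.hour + ts.day - 1) for ts in stamps]
cycle = diel_cycle_sketch(stamps, vals)
print(cycle[(1, time(12, 0))])  # (12.5, 0.7071...)
```

Because the grouping key is the time component of each timestamp, hourly data yield 24 groups per month and
half-hourly data would yield 48, matching the statement that the diel cycle keeps the index's time resolution.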

New features

Added new class DielCycle for plotting diel cycles per month (diive.core.plotting.dielcycle.DielCycle)
Added new function diel_cycle for calculating diel cycles per month. This function is also used by the plotting
class DielCycle (diive.core.times.resampling.diel_cycle)

Additions

Added color scheme that contains 12 colors, one for each month. Not perfect, but better than
before. (diive.core.plotting.styles.LightTheme.colors_12_months)

Notebooks

Added new notebook for plotting diel cycles (per month) (notebooks/Plotting/DielCycle.ipynb)
Added new notebook for calculating diel cycles (per month) (notebooks/Resampling/ResamplingDielCycle.ipynb)

Tests

Added test case for new function diel_cycle (tests.test_resampling.TestResampling.test_diel_cycle)

What's Changed

Diel cycle plot by @holukas in #107

Full Changelog: v0.75.0...v0.76.0

Contributors

holukas


v0.75.0 · released by holukas · 26 Apr 11:26 · commit e648180

v0.75.0 | 26 Apr 2024

XGBoost gap-filling

XGBoost can now be used to fill gaps in time series data.
In diive, XGBoost is implemented in class XGBoostTS, which adds options for easily including e.g. lagged variants of
feature variables, timestamp info (DOY, month, ...) and a continuous record number. It also allows direct feature
reduction by including a purely random feature (consisting of completely random numbers) and calculating the
'permutation importance'. All features whose permutation importance is lower than that of the random feature can then
be removed from the dataset, i.e., from the list of features, before building the final model.
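The feature-reduction idea can be sketched without any ML library: shuffle one feature at a time, measure how much
the prediction error grows, and keep only features that beat the random feature. Everything here is hypothetical and
simplified (a hand-written stand-in for a fitted model, MSE as the score, a feature named RANDOM); XGBoostTS computes
permutation importance on real fitted models and its exact scoring may differ. The sketch also applies the rule from
v0.77.0 that features with negative permutation importance are always removed.

```python
import random
import statistics


def mse(y_true, y_pred):
    return statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(predict, features, y, n_repeats=5, seed=42):
    """Mean increase in MSE when one feature column is shuffled.
    `predict` maps a row dict {feature: value} to a prediction."""
    rng = random.Random(seed)
    n = len(y)

    def predict_all(columns):
        return [predict({name: col[i] for name, col in columns.items()}) for i in range(n)]

    baseline = mse(y, predict_all(features))
    importances = {}
    for name in features:
        increases = []
        for _ in range(n_repeats):
            shuffled = list(features[name])
            rng.shuffle(shuffled)  # break the link between this feature and the target
            increases.append(mse(y, predict_all({**features, name: shuffled})) - baseline)
        importances[name] = statistics.fmean(increases)
    return importances


def select_features(importances, random_feature="RANDOM"):
    """Keep features that beat the purely random feature; also drop any
    feature with negative importance (the v0.77.0 rule)."""
    threshold = importances[random_feature]
    return [name for name, imp in importances.items()
            if name != random_feature and imp > threshold and imp >= 0]


# Toy data: the target depends only on "a"; "b" and "RANDOM" are noise
rng = random.Random(0)
n = 200
features = {
    "a": [rng.uniform(0, 10) for _ in range(n)],
    "b": [rng.uniform(0, 10) for _ in range(n)],
    "RANDOM": [rng.random() for _ in range(n)],
}
y = [2 * v for v in features["a"]]


def predict(row):
    return 2 * row["a"]  # stand-in for a fitted model


importances = permutation_importance(predict, features, y)
print(select_features(importances))  # ['a']
```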

XGBoostTS and RandomForestTS both use the same base class MlRegressorGapFillingBase. This base class will also
facilitate the implementation of other gap-filling algorithms in the future.

Another fun (for me) addition is the new class TimeSince. It calculates the time since the last occurrence of
specific conditions. One example where this class can be useful is the calculation of 'time since last precipitation',
expressed as number of records, which can help identify dry conditions. More examples: 'time since freezing
conditions' based on air temperature; 'time since management' based on management info, e.g. fertilization events.
Please see the notebook for some illustrative examples.
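A minimal sketch of the counting logic, assuming that records inside the range reset the counter to 0 and that records
before the first occurrence are left undefined (None). TimeSince's actual handling of these edge cases, and its
parameter names, may differ; time_since is a hypothetical name.

```python
def time_since(values, lower, upper):
    """Hypothetical sketch: number of records since the series was last inside
    [lower, upper]. In-range records get 0; each later record adds 1.
    Records before the first in-range record get None (no event seen yet),
    and missing values (None) count as out of range."""
    counts = []
    counter = None
    for value in values:
        if value is not None and lower <= value <= upper:
            counter = 0
        elif counter is not None:
            counter += 1
        counts.append(counter)
    return counts


# 'Time since last precipitation': in range means precipitation >= 0.01 mm
precip = [0.0, 0.5, 0.0, 0.0, 1.2, 0.0]
print(time_since(precip, lower=0.01, upper=float("inf")))  # [None, 0, 1, 2, 0, 1]
```

For 'time since freezing conditions' the same helper would be called with lower=float("-inf") and upper=0.0 on air
temperature.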

Please note that diive is still under development and bugs can be expected.

New features

Added gap-filling class XGBoostTS for time series data,
using XGBoost (diive.pkgs.gapfilling.xgboost_ts.XGBoostTS)
Added new class TimeSince: counts the number of records (incremental number / counter) since the last time a time
series was inside a specified range, useful for e.g. counting the time since last precipitation, since last freezing
temperature, etc. (diive.pkgs.createvar.timesince.TimeSince)

Additions

Added base class for machine learning regressors, which is basically the code shared between the different
methods. At the moment used by RandomForestTS and XGBoostTS. (diive.core.ml.common.MlRegressorGapFillingBase)
Added option to change line color directly in TimeSeries plots (diive.core.plotting.timeseries.TimeSeries.plot)

Notebooks

Added new notebook for gap-filling using XGBoostTS with minimal settings (notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb)
Added new notebook for gap-filling using XGBoostTS with more extensive settings (notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb)
Added new notebook for creating TimeSince variables (notebooks/CalculateVariable/TimeSince.ipynb)

Tests

Added test case for XGBoost gap-filling (tests.test_gapfilling.TestGapFilling.test_gapfilling_xgboost)
Updated test case for random forest gap-filling (tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest)
Harmonized test case for XGBoostTS with test case of RandomForestTS
Added test case for TimeSince variable creation (tests.test_createvar.TestCreateVar.test_timesince)

What's Changed

Adding xgboost by @holukas in #102

Full Changelog: v0.74.1...v0.75.0

Contributors

holukas


v0.74.1 · released by holukas · 22 Apr 22:54 · commit b9c0129

v0.74.1 | 23 Apr 2024

This update adds the first notebooks (and tests) for outlier detection methods. Only two tests are included so far and
both are relatively simple, but both notebooks already show in principle how outlier removal is handled. An important
aspect is that the individual outlier methods in diive do not remove outliers by default; instead, they create a flag
that shows where the outliers are located. The flag can then be used to remove the data points.
This update also adds a small function that creates artificial spikes in time series data, which is very useful for
testing outlier detection methods.
More outlier removal notebooks will be added in the future, including a notebook that shows how to combine results from
multiple outlier tests into one single overall outlier flag.
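The flag-first approach can be illustrated with an absolute-limits check. This is a minimal sketch, not diive's API:
the function names and the flag values (1 = outlier, 0 = OK, None = missing) are assumptions chosen for illustration.

```python
def flag_absolute_limits(values, minval, maxval):
    """Hypothetical sketch: 1 = outside [minval, maxval] (outlier), 0 = OK,
    None = missing value. Nothing is removed at this stage."""
    return [None if v is None else int(not minval <= v <= maxval) for v in values]


def apply_flag(values, flag):
    """Remove (set to None) every record flagged as an outlier."""
    return [None if f == 1 else v for v, f in zip(values, flag)]


air_temp = [12.0, 13.5, 99.0, 14.1, -60.0]
flag = flag_absolute_limits(air_temp, minval=-50, maxval=50)
print(flag)                        # [0, 0, 1, 0, 1]
print(apply_flag(air_temp, flag))  # [12.0, 13.5, None, 14.1, None]
```

Keeping the flag separate from the data leaves the original values untouched, so several flags can be inspected or
combined before anything is actually removed.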

New features

Added: new function to add impulse noise to time series (diive.pkgs.createvar.noise.impulse)

Notebooks

Added: new notebook for outlier detection: absolute limits, separately for daytime and nighttime
data (notebooks/OutlierDetection/AbsoluteLimitsDaytimeNighttime.ipynb)
Added: new notebook for outlier detection: absolute limits (notebooks/OutlierDetection/AbsoluteLimits.ipynb)

Tests

Added: test case for outlier detection: absolute limits, separately for daytime and
nighttime data (tests.test_outlierdetection.TestOutlierDetection.test_absolute_limits)
Added: test case for outlier detection: absolute
limits (tests.test_outlierdetection.TestOutlierDetection.test_absolute_limits)

What's Changed

Outlier notebooks by @holukas in #95
Update README.md by @inkenbrandt in #86
Update pyproject.toml by @inkenbrandt in #85

Full Changelog: v0.74.0...v0.74.1

Contributors

inkenbrandt and holukas


v0.74.0 · released by holukas · 21 Apr 12:29 · commit 6a4d7a2

v0.74.0 | 21 Apr 2024

Additions

Added: new function to remove rows that do not have timestamp
info (NaT) (diive.core.times.times.remove_rows_nat and diive.core.times.times.TimestampSanitizer)
Added: new settings VARNAMES_ROW and VARUNITS_ROW in filetypes YAML files, allows better and more specific
configuration when reading data files (diive/configs/filetypes)
Added: many (small) example data files for various filetypes, e.g. ETH-RECORD-TOA5-CSVGZ-20HZ
Added: new optional check in TimestampSanitizer that compares the detected time resolution of a time series with
the nominal (expected) time resolution. Runs automatically when reading files with ReadFileType, in which case
the FREQUENCY from the filetype configs is used as the nominal time
resolution. (diive.core.times.times.TimestampSanitizer, diive.core.io.filereader.ReadFileType)
Added: application of TimestampSanitizer after inserting a timestamp and setting it as index with
function insert_timestamp; this ensures that the freq/freqstr info is available for the new timestamp
index (diive.core.times.times.insert_timestamp)
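Removing rows without timestamp info amounts to parsing every timestamp, turning unparseable entries into a missing
value, and dropping those rows. The plain-Python sketch below uses None where pandas would use NaT (as with
pd.to_datetime(..., errors='coerce')); the function names and the fixed timestamp format are hypothetical and do not
reflect the actual signature of remove_rows_nat.

```python
from datetime import datetime


def parse_timestamp(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string; return None (standing in for NaT) if that fails."""
    try:
        return datetime.strptime(text, fmt)
    except (TypeError, ValueError):
        return None


def remove_rows_without_timestamp(rows):
    """Drop (timestamp, value) rows whose timestamp could not be parsed."""
    parsed = [(parse_timestamp(ts), value) for ts, value in rows]
    return [(ts, value) for ts, value in parsed if ts is not None]


rows = [
    ("2024-04-21 00:00:00", 1.0),
    ("not a timestamp", 2.0),  # e.g. a string mixed in with the timestamps
    ("2024-04-21 00:30:00", 3.0),
]
clean = remove_rows_without_timestamp(rows)
print(len(clean))  # 2
```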

Notebooks

General: Ran all notebook examples to make sure they work with this version of diive
Added: new notebook for reading EddyPro fluxnet output file with DataFileReader
parameters (notebooks/ReadFiles/Read_single_EddyPro_fluxnet_output_file_with_DataFileReader.ipynb)
Added: new notebook for reading EddyPro fluxnet output file with ReadFileType and pre-defined
filetype EDDYPRO-FLUXNET-CSV-30MIN (notebooks/ReadFiles/Read_single_EddyPro_fluxnet_output_file_with_ReadFileType.ipynb)
Added: new notebook for reading multiple EddyPro fluxnet output files with MultiDataFileReader and pre-defined
filetype EDDYPRO-FLUXNET-CSV-30MIN (notebooks/ReadFiles/Read_multiple_EddyPro_fluxnet_output_files_with_MultiDataFileReader.ipynb)

Changes

Renamed: function get_len_header to parse_header (diive.core.dfun.frames.parse_header)
Renamed: exampledata files (diive/configs/exampledata)
Renamed: filetypes YAML files to always include the file extension in the file name (diive/configs/filetypes)
Reduced: file size for most example data files

Tests

Added: various test cases for loading filetypes (tests/test_loaddata.py)
Added: test case for loading and merging multiple
files (tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_multiple_EDDYPRO_FLUXNET_CSV_30MIN)
Added: test case for reading EddyPro fluxnet output file with DataFileReader
parameters (tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_EDDYPRO_FLUXNET_CSV_30MIN_datafilereader_parameters)
Added: test case for resampling series to 30MIN time
resolution (tests.test_time.TestTime.test_resampling_to_30MIN)
Added: test case for inserting timestamp with a different convention (middle, start,
end) (tests.test_time.TestTime.test_insert_timestamp)
Added: test case for inserting timestamp as index (tests.test_time.TestTime.test_insert_timestamp_as_index)

Bugfixes

Fixed: bug in class DetectFrequency when inferred frequency is None (diive.core.times.times.DetectFrequency)
Fixed: bug in class DetectFrequency where pd.Timedelta() would crash if the input frequency did not contain a
number. Timedelta does not accept e.g. the frequency string min for minutely time resolution, even though
e.g. pd.infer_freq() outputs min for data in 1-minute time resolution. Timedelta requires a number, in this
case 1min. Results from infer_freq() are now checked for whether they contain a number and, if not, 1 is prepended
to the frequency string. (diive.core.times.times.DetectFrequency)
Fixed: bug in notebook WindDirectionOffset, related to frequency detection during heatmap plotting
Fixed: bug in TimestampSanitizer where the script would crash if the timestamp contained an element that could
not be converted to datetime, e.g., when there is a string mixed in with the regular timestamps. Data rows with
invalid timestamps are now parsed as NaT by using errors='coerce'
in pd.to_datetime(data.index, errors='coerce'). (diive.core.times.times.convert_timestamp_to_datetime
and diive.core.times.times.TimestampSanitizer)
Fixed: bug when plotting heatmap (diive.core.plotting.heatmap_datetime.HeatmapDateTime)
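The frequency-string check from the DetectFrequency fixes above can be sketched as a small string helper. The name
ensure_freq_has_number is hypothetical; DetectFrequency may implement the check differently.

```python
def ensure_freq_has_number(freq):
    """Prepend '1' to a frequency string that contains no number, e.g. 'min' -> '1min',
    so that it can be passed to pd.Timedelta(). None (no frequency inferred) is passed through."""
    if freq is None:
        return None
    if any(ch.isdigit() for ch in freq):
        return freq
    return "1" + freq


for f in ["min", "30min", "h", None]:
    print(f, "->", ensure_freq_has_number(f))
```

With the prepended number, a value such as '1min' can be handed to pd.Timedelta, whereas the bare 'min' fails as
described in the fix.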

What's Changed

Update read csv and notebooks by @holukas in #93
Added new and updated test cases by @holukas in #94

Full Changelog: v0.73.0...v0.74.0

Contributors

holukas


v0.73.0 · released by holukas · 17 Apr 20:59 · commit b8a9369

v0.73.0 | 17 Apr 2024

New features

Added new function trim_frame that trims the start and end of a dataframe based on the available records of a
variable (diive.core.dfun.frames.trim_frame)
Added new option to export borderless
heatmaps (diive.core.plotting.heatmap_base.HeatmapBase.export_borderless_heatmap)
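The trimming logic can be sketched on a list of row dicts: find the first and last record where the variable is
available and keep everything in between. trim_to_available and the variable names TA and SW_IN are hypothetical
stand-ins; trim_frame itself operates on a pandas DataFrame.

```python
def trim_to_available(rows, column):
    """Hypothetical sketch: drop leading and trailing rows where `column` is
    missing (None). Gaps between the first and last available record are kept."""
    available = [i for i, row in enumerate(rows) if row.get(column) is not None]
    if not available:
        return []
    return rows[available[0]:available[-1] + 1]


rows = [
    {"TA": None, "SW_IN": 0.0},
    {"TA": 12.1, "SW_IN": 5.0},
    {"TA": None, "SW_IN": 20.0},  # gap inside the record: kept
    {"TA": 13.4, "SW_IN": 40.0},
    {"TA": None, "SW_IN": 35.0},
]
trimmed = trim_to_available(rows, "TA")
print(len(trimmed))  # 3
```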

Additions

Added more info in comments of class WindRotation2D (diive.pkgs.echires.windrotation.WindRotation2D)
Added example data for EddyPro full_output
files (diive.configs.exampledata.load_exampledata_eddypro_full_output_CSV_30MIN)
Added code in an attempt to harmonize frequency detection from data: in class DetectFrequency, the detected
frequency strings are now converted from Timedelta (pandas) to offset (pandas) to .freqstr. This yields
the frequency string as seen by (the current version of) pandas. The idea is to harmonize between different
representations, e.g. T or min for minutes. Currently pandas does not seem to be consistent with e.g. the
representation of minutes, using T in .infer_freq() but min for Timedelta (see
here). (diive.core.times.times.DetectFrequency)

Changes

Updated class DataFileReader to comply with new pandas kwargs when
using .read_csv() (diive.core.io.filereader.DataFileReader._parse_file)
Environment: updated pandas to v2.2.2 and pyarrow to v15.0.2
Updated date offsets in config filetypes to be compliant with pandas version 2.2+ (see here and here), e.g., 30T was
changed to 30min. This seems to work without raising a warning; however, if the frequency is inferred from available
data, the resulting frequency string still shows e.g. 30T, i.e. T for minutes instead of min. (diive/configs/filetypes)
Changed variable names in WindRotation2D to be in line with the variable names given in the paper by Wilczak et
al. (2001) https://doi.org/10.1023/A:1018966204465

Removals

Removed function timedelta_to_string because this can be done with pandas to_offset().freqstr
Removed function generate_freq_str (unused)

Tests

Added test case for reading EddyPro full_output
files (tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_eddypro_full_output_CSV_30MIN)
Updated test for frequency detection (tests.test_timestamps.TestTime.test_detect_freq)

What's Changed

Adding trim frame by @holukas in #81

Full Changelog: v0.72.1...v0.73.0

Contributors

holukas


v0.72.1 · released by holukas · 26 Mar 21:15 · commit c90732c

v0.72.1 | 26 Mar 2024

pyproject.toml now uses the inequality syntax >= instead of the caret syntax ^ because caret version capping is
restrictive and prevents compatibility in conda installations. See #74
Added badges to README.md
Smaller diive logo in README.md

What's Changed

Update pyproject.toml by @inkenbrandt in #74
Minor updates by @holukas in #77

Full Changelog: v0.72.0...v0.72.1

Contributors

inkenbrandt and holukas


v0.72.0 · released by holukas · 25 Mar 21:41 · commit 2b634b8

v0.72.0 | 25 Mar 2024

New feature

Added new heatmap plotting class HeatmapYearMonth that plots a variable in year/month
classes (diive.core.plotting.heatmap_datetime.HeatmapYearMonth)
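Conceptually, a year/month heatmap needs one aggregate per (year, month) cell. The sketch below assumes the monthly
mean as the default aggregate (HeatmapYearMonth's actual aggregation and options are not described here), and
year_month_matrix is a hypothetical name.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def year_month_matrix(timestamps, values, agg=statistics.fmean):
    """Hypothetical sketch: aggregate values into {year: {month: aggregate}},
    i.e. the grid a year/month heatmap colours (years on one axis, months on the other)."""
    cells = defaultdict(list)
    for ts, value in zip(timestamps, values):
        if value is not None:
            cells[(ts.year, ts.month)].append(value)
    matrix = defaultdict(dict)
    for (year, month), vals in sorted(cells.items()):
        matrix[year][month] = agg(vals)
    return dict(matrix)


stamps = [datetime(2023, 1, 1), datetime(2023, 1, 15), datetime(2023, 2, 1), datetime(2024, 1, 1)]
print(year_month_matrix(stamps, [1.0, 3.0, 5.0, 7.0]))
# {2023: {1: 2.0, 2: 5.0}, 2024: {1: 7.0}}
```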

Changes

Refactored code for class HeatmapDateTime (diive.core.plotting.heatmap_datetime.HeatmapDateTime)
Added new base class HeatmapBase for heatmap plots. Currently used by HeatmapYearMonth
and HeatmapDateTime (diive.core.plotting.heatmap_base.HeatmapBase)

Notebooks

Added new notebook for HeatmapDateTime (notebooks/Plotting/HeatmapDateTime.ipynb)
Added new notebook for HeatmapYearMonth (notebooks/Plotting/HeatmapYearMonth.ipynb)

Bugfixes

Fixed bug in HeatmapDateTime where the last record of each day was not shown

What's Changed

Heatmap plot update by @holukas in #75
Heatmap plot update by @holukas in #76

Full Changelog: v0.71.6...v0.72.0

Contributors

holukas
