Histogram error on large floats #244

Closed
twalen opened this issue Aug 12, 2022 · 1 comment
Labels
bug Something isn't working

Comments


twalen commented Aug 12, 2022

Running pm_stability_report on float columns with large values triggers an AssertionError in some cases.

For example, running the following code:

import pandas as pd
import numpy as np
import popmon

np.random.seed(1)
n = 1000
start_date = pd.to_datetime("2022-01-01")
example = pd.DataFrame({
    "dt": [start_date + pd.DateOffset(i//100) for i in range(n)], 
    "a": (np.random.rand(n) - 0.5) * 10**4
})
example.loc[len(example)//2, 'a'] *= 10**4
example.pm_stability_report(time_axis="dt", time_width="1w")

This gives the following output:

% python popmon_bug.py
.../.virtualenvs/random/lib/python3.7/site-packages/histogrammar/dfinterface/make_histograms.py:172: UserWarning: time-axis "dt" already found in binning specifications. not overwriting.
  f'time-axis "{time_axis}" already found in binning specifications. not overwriting.'
2022-08-12 14:14:19,649 INFO [histogram_filler_base]: Filling 1 specified histograms. auto-binning.
100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 463.15it/s]
2022-08-12 14:14:19,652 INFO [hist_splitter]: Splitting histograms "hists" as "split_hists"
2022-08-12 14:14:19,654 INFO [hist_comparer]: Comparing "split_hists" with rolling sum of 1 previous histogram(s).
2022-08-12 14:14:19,666 INFO [hist_profiler]: Profiling histograms "split_hists" as "profiles"
2022-08-12 14:14:19,692 INFO [hist_comparer]: Comparing "split_hists" with reference "split_hists"
2022-08-12 14:14:19,702 INFO [pull_calculator]: Comparing "comparisons" with median/mad of reference "comparisons"
2022-08-12 14:14:19,713 INFO [pull_calculator]: Comparing "profiles" with median/mad of reference "profiles"
2022-08-12 14:14:19,749 INFO [apply_func]: Computing significance of (rolling) trend in means of features
2022-08-12 14:14:19,752 INFO [compute_tl_bounds]: Calculating static bounds for "profiles"
2022-08-12 14:14:19,795 INFO [compute_tl_bounds]: Calculating static bounds for "comparisons"
2022-08-12 14:14:19,806 INFO [compute_tl_bounds]: Calculating traffic light alerts for "profiles"
2022-08-12 14:14:19,819 INFO [compute_tl_bounds]: Calculating traffic light alerts for "comparisons"
2022-08-12 14:14:19,825 INFO [apply_func]: Generating traffic light alerts summary.
2022-08-12 14:14:19,828 INFO [alerts_summary]: Combining alerts into artificial variable "_AGGREGATE_"
2022-08-12 14:14:19,831 INFO [report_pipelines]: Generating report "html_report".
2022-08-12 14:14:19,831 INFO [overview_section]: Generating section "Overview". skip empty plots: True
100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 276.10it/s]
2022-08-12 14:14:19,842 INFO [histogram_section]: Generating section "Histograms".
  0%|                                                                         | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "popmon_bug.py", line 13, in <module>
    example.pm_stability_report(time_axis="dt", time_width="1w")
  File ".../python3.7/site-packages/popmon/pipeline/report.py", line 196, in df_stability_report
    reference=reference_hists,
  File ".../python3.7/site-packages/popmon/pipeline/report.py", line 71, in stability_report
    result = pipeline.transform(datastore)
  File ".../python3.7/site-packages/popmon/base/pipeline.py", line 69, in transform
    datastore = module.transform(datastore)
  File ".../python3.7/site-packages/popmon/pipeline/report_pipelines.py", line 250, in transform
    return super().transform(datastore)
  File ".../python3.7/site-packages/popmon/base/pipeline.py", line 69, in transform
    datastore = module.transform(datastore)
  File ".../python3.7/site-packages/popmon/base/module.py", line 50, in _transform
    outputs = func(self, *list(inputs.values()))
  File ".../python3.7/site-packages/popmon/visualization/histogram_section.py", line 141, in transform
    plots = parallel(_plot_histograms, args)
  File ".../python3.7/site-packages/popmon/utils.py", line 52, in parallel
    func(*args) if mode == "args" else func(**args) for args in args_list
  File ".../python3.7/site-packages/popmon/utils.py", line 52, in <listcomp>
    func(*args) if mode == "args" else func(**args) for args in args_list
  File ".../python3.7/site-packages/popmon/visualization/histogram_section.py", line 247, in _plot_histograms
    hists, feature, hist_names, y_label, is_num, is_ts
  File ".../python3.7/site-packages/popmon/visualization/utils.py", line 297, in plot_histogram_overlay
    len(bin_edges), len(bin_values), x_label
AssertionError: bin edges (+ upper edge) and bin values have inconsistent lengths: 43 vs 41. a

twalen commented Aug 12, 2022

It seems that this might be an issue with Histogrammar.
From debugging, it looks like in SparselyBin, in some cases, len(hist.bin_edges(low, high)) > len(hist.bin_entries(low, high)) + 1:

https://github.com/histogrammar/histogrammar-python/blob/master/histogrammar/primitives/sparselybin.py#L717
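
To make the failing condition explicit, here is a minimal sketch of the length invariant that the assertion checks: a binned histogram should have exactly one more edge than it has bin values. The helper name `bins_are_consistent` and the placeholder lists are assumptions for illustration only; they are not part of popmon or Histogrammar, and only the lengths (43 and 41) come from the traceback.

```python
def bins_are_consistent(bin_edges, bin_values):
    """Return True when there is exactly one more edge than bin values."""
    return len(bin_edges) == len(bin_values) + 1


# Placeholder data: only the lengths (43 edges, 41 values) are taken
# from the AssertionError message; the contents are made up.
edges = [float(i) for i in range(43)]
values = [0] * 41

print(bins_are_consistent(edges, values))      # False: 43 != 41 + 1
print(bins_are_consistent(edges, [0] * 42))    # True: 43 == 42 + 1
```

Running the same kind of check directly on the outputs of hist.bin_edges(low, high) and hist.bin_entries(low, high) should help confirm whether the mismatch originates in SparselyBin rather than in the plotting code.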
