Configuration update (#91)
* Small cleanup of spark histogrammar
* Small update of configuration section of docs
* Update configuration.rst
mbaak committed Feb 4, 2021
1 parent 2823025 commit 4ebca5b
Showing 2 changed files with 7 additions and 8 deletions.
9 changes: 7 additions & 2 deletions docs/source/configuration.rst
@@ -12,7 +12,7 @@ Reference types
When generating a report from a DataFrame, the reference type can be set with the option ``reference_type``,
in four different ways:

-1. Using the DataFrame on which the stability report is built as a self-reference. This reference method is static: each time slot is compared to all the previous slots in the DataFrame (all included in one distribution). This is the default reference setting.
+1. Using the DataFrame on which the stability report is built as a self-reference. This reference method is static: each time slot is compared to all the slots in the DataFrame (all included in one distribution). This is the default reference setting.

.. code-block:: python
@@ -40,6 +40,10 @@ in four different ways:
# generate stability report with specific monitoring rules
report = df.pm_stability_report(reference_type="expanding", shift=1)
+Note that, by default, popmon also performs a rolling comparison of the histograms in each time period with those in the
+previous time period. The results of these comparisons contain the term "prev1", and are found in the comparisons section
+of a report.


Binning specifications
----------------------
@@ -53,7 +57,8 @@ To specify the time-axis binning alone, do:
report = df.pm_stability_report(time_axis='date', time_width='1w', time_offset='2020-1-6')
-The default time width is 4 weeks ('4w'). All other features (except for 'date') are auto-binned in this example.
+The default time width is 30 days ('30d'), with time offset 2010-1-4 (a Monday).
+All other features (except for 'date') are auto-binned in this example.

To specify your own binning specifications for individual features or combinations of features, do:

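The documentation change above makes the self-reference compare each time slot with all slots in the DataFrame, rather than only the previous ones. The sketch below lists, for each slot, which slots act as its reference under three schemes: the static self-reference, an expanding reference, and the rolling "prev1" comparison. It is a hypothetical illustration, not popmon's implementation. The function `reference_slots` and its `mode` argument are made up for this example. The expanding case assumes that `shift=1` means the reference ends one slot before the current one.

```python
def reference_slots(n_slots, mode="self", shift=1):
    """For each time slot i, return the indices of the slots it is compared with.

    Hypothetical helper for illustration only; not part of popmon's API.
    """
    refs = []
    for i in range(n_slots):
        if mode == "self":
            # Static self-reference: all slots of the DataFrame, pooled into one
            # distribution, including slot i itself.
            refs.append(list(range(n_slots)))
        elif mode == "expanding":
            # Assumed semantics: every slot up to and including slot i - shift.
            refs.append(list(range(max(i - shift + 1, 0))))
        elif mode == "prev1":
            # Rolling comparison with the immediately preceding slot only.
            refs.append([i - 1] if i > 0 else [])
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return refs


for mode in ("self", "expanding", "prev1"):
    print(mode, reference_slots(3, mode))
# self [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
# expanding [[], [0], [0, 1]]
# prev1 [[], [0], [1]]
```

With the self-reference, even the first slot has a reference (the full DataFrame). The expanding and "prev1" schemes have nothing to compare the first slot with.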
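The updated default time binning (30-day bins with offset 2010-1-4) can be checked by hand. Assuming bins are aligned on the offset and all have the same width, a timestamp `t` falls in bin `floor((t - offset) / width)`. The helper below is a hypothetical standard-library sketch of that arithmetic. It is not popmon's or histogrammar's code, and `time_bin` is an illustrative name.

```python
from datetime import datetime, timedelta


def time_bin(ts, width=timedelta(days=30), offset=datetime(2010, 1, 4)):
    """Return (index, start) of the fixed-width time bin containing ts.

    Hypothetical helper; the defaults mirror the documented '30d' width and
    2010-1-4 offset.
    """
    # timedelta // timedelta is integer floor division, so timestamps before
    # the offset land in negative bin indices.
    index = (ts - offset) // width
    start = offset + index * width
    return index, start


print(time_bin(datetime(2021, 2, 4)))
# (134, datetime.datetime(2021, 1, 6, 0, 0))

# Weekly bins, as in the time_width='1w', time_offset='2020-1-6' example
print(time_bin(datetime(2020, 1, 15), timedelta(weeks=1), datetime(2020, 1, 6)))
# (1, datetime.datetime(2020, 1, 13, 0, 0))
```

Thirty days is not a whole number of weeks, so later 30-day bin edges do not stay on Mondays: 2021-01-06 is a Wednesday. Weekly bins keep the weekday of the offset.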
6 changes: 0 additions & 6 deletions popmon/hist/filling/spark_histogrammar.py
@@ -7,7 +7,6 @@
"""

import histogrammar as hg
-import histogrammar.sparksql
import numpy as np
from tqdm import tqdm

@@ -189,8 +188,6 @@ def process_features(self, df, cols_by_type):
to_ns = sparkcol(col).cast("timestamp").cast("float") * 1e9
idf = idf.withColumn(col, to_ns)

-hg.sparksql.addMethods(idf)

return idf

def construct_empty_hist(self, df, features):
@@ -218,9 +215,6 @@ def construct_empty_hist(self, df, features):

hist = self.get_hist_bin(hist, features, quant, col, dt)

-# set data types in histogram
-dta = [self.var_dtype[col] for col in features]
-hist.datatype = dta[0] if len(features) == 1 else dta
return hist

def fill_histograms(self, idf):
