Merge pull request #59 from ing-bank/develop
v0.3.9
sbrugman committed Sep 29, 2020
2 parents 7bd287a + 23c69ba commit a2ede6b
Showing 47 changed files with 691 additions and 352 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -142,4 +142,7 @@ cython_debug/
 docs/build
 
 .vscode
-.idea/
+.idea/
+
+# Developer's playground
+/playground/
10 changes: 4 additions & 6 deletions Makefile
@@ -1,14 +1,12 @@
 ifeq ($(check),1)
-ISORT_ARG= --check-only
-BLACK_ARG= --check
+CHECK_ARG= --check
 else
-ISORT_ARG=
-BLACK_ARG=
+CHECK_ARG=
 endif
 
 lint:
-	isort $(ISORT_ARG) --project popmon --thirdparty histogrammar --thirdparty pybase64 --multi-line=3 --trailing-comma --force-grid-wrap=0 --use-parentheses --line-width=88 -y
-	black $(BLACK_ARG) .
+	isort $(CHECK_ARG) --profile black --project popmon --thirdparty histogrammar --thirdparty pybase64 .
+	black $(CHECK_ARG) .
 
 install:
 	pip install -e .
145 changes: 1 addition & 144 deletions docs/source/readme.rst
@@ -1,144 +1 @@
-===========================
-Population Shift Monitoring
-===========================
-
-|build| |docs|
-
-* Version: 0.3.8. Released: July 2020
-* Documentation: https://popmon.readthedocs.io
-* Repository: https://github.com/ing-bank/popmon
-* Authors: ING Wholesale Banking Advanced Analytics
-
-|
-|logo|
-
-`popmon` is a package that allows one to check the stability of a dataset.
-`popmon` works with both pandas and spark datasets.
-
-`popmon` creates histograms of features binned in time-slices,
-and compares the stability of the profiles and distributions of
-those histograms using statistical tests, both over time and with respect to a reference.
-It works with numerical, ordinal, categorical features, and the histograms can be higher-dimensional,
-e.g. it can also track correlations between any two features.
-`popmon` can automatically flag and alert on changes observed over time, such
-as trends, shifts, peaks, outliers, anomalies, changing correlations, etc,
-using monitoring business rules.
-
-Documentation
-=============
-
-The entire `popmon` documentation including tutorials can be found at `read-the-docs <https://popmon.readthedocs.io>`_.
-
-
-Examples
-========
-
-- `Flight Delays and Cancellations Kaggle data <https://crclz.com/popmon/reports/flight_delays_report.html>`_
-- `Synthetic data (code example below) <https://crclz.com/popmon/reports/test_data_report.html>`_
-
-Check it out
-============
-
-The `popmon` library requires Python 3.6+ and is pip friendly. To get started, simply do:
-
-.. code-block:: bash
-
-  $ pip install popmon
-
-or check out the code from our GitHub repository:
-
-.. code-block:: bash
-
-  $ git clone https://github.com/ing-bank/popmon.git
-  $ pip install -e popmon
-
-where in this example the code is installed in edit mode (option -e).
-
-You can now use the package in Python with:
-
-.. code-block:: python
-
-  import popmon
-
-**Congratulations, you are now ready to use the popmon library!**
-
-Quick run
-=========
-
-As a quick example, you can do:
-
-.. code-block:: python
-
-  import pandas as pd
-  import popmon
-  from popmon import resources
-
-  # open synthetic data
-  df = pd.read_csv(resources.data('test.csv.gz'), parse_dates=['date'])
-  df.head()
-
-  # generate stability report using automatic binning of all encountered features
-  # (importing popmon automatically adds this functionality to a dataframe)
-  report = df.pm_stability_report(time_axis='date', features=['date:age', 'date:gender'])
-
-  # to show the output of the report in a Jupyter notebook you can simply run:
-  report
-
-  # or save the report to file and open in a browser
-  report.to_file("monitoring_report.html")
-
-To specify your own binning specifications and features you want to report on, you do:
-
-.. code-block:: python
-
-  # time-axis specifications alone; all other features are auto-binned.
-  report = df.pm_stability_report(time_axis='date', time_width='1w', time_offset='2020-1-6')
-
-  # histogram selections. Here 'date' is the first axis of each histogram.
-  features=[
-      'date:isActive', 'date:age', 'date:eyeColor', 'date:gender',
-      'date:latitude', 'date:longitude', 'date:isActive:age'
-  ]
-
-  # Specify your own binning specifications for individual features or combinations thereof.
-  # This bin specification uses open-ended ("sparse") histograms; unspecified features get
-  # auto-binned. The time-axis binning, when specified here, needs to be in nanoseconds.
-  bin_specs={
-      'longitude': {'bin_width': 5.0, 'bin_offset': 0.0},
-      'latitude': {'bin_width': 5.0, 'bin_offset': 0.0},
-      'age': {'bin_width': 10.0, 'bin_offset': 0.0},
-      'date': {'bin_width': pd.Timedelta('4w').value,
-               'bin_offset': pd.Timestamp('2015-1-1').value}
-  }
-
-  # generate stability report
-  report = df.pm_stability_report(features=features, bin_specs=bin_specs, time_axis=True)
-
-These examples also work with spark dataframes.
-You can see the output of such example notebook code `here <https://crclz.com/popmon/reports/test_data_report.html>`_.
-For all available examples, please see the `tutorials <https://popmon.readthedocs.io/en/latest/tutorials.html>`_ at read-the-docs.
-
-Project contributors
-====================
-
-Special thanks to the following people who have contributed to the development of this package: `Ahmet Erdem <https://github.com/aerdem4>`_, `Fabian Jansen <https://github.com/faab5>`_, `Nanne Aben <https://github.com/nanne-aben>`_, Mathieu Grimal.
-
-Contact and support
-===================
-
-* Issues & Ideas & Support: https://github.com/ing-bank/popmon/issues
-
-Please note that ING WBAA provides support only on a best-effort basis.
-
-License
-=======
-Copyright ING WBAA. `popmon` is completely free, open-source and licensed under the `MIT license <https://en.wikipedia.org/wiki/MIT_License>`_.
-
-.. |logo| image:: https://raw.githubusercontent.com/ing-bank/popmon/master/docs/source/assets/popmon-logo.png
-    :alt: POPMON logo
-    :target: https://github.com/ing-bank/popmon
-.. |build| image:: https://github.com/ing-bank/popmon/workflows/build/badge.svg
-    :alt: Build status
-.. |docs| image:: https://readthedocs.org/projects/popmon/badge/?version=latest
-    :alt: Package docs status
+.. include:: ../../README.rst
10 changes: 4 additions & 6 deletions make.bat
@@ -4,14 +4,12 @@ setlocal enabledelayedexpansion
 
 IF "%1%" == "lint" (
     IF "%2%" == "check" (
-        SET ISORT_ARG= --check-only
-        SET BLACK_ARG= --check
+        SET CHECK_ARG= --check
     ) ELSE (
-        set ISORT_ARG=
-        set BLACK_ARG=
+        set CHECK_ARG=
     )
-    isort !ISORT_ARG! --project popmon --thirdparty histogrammar --thirdparty pybase64 --multi-line=3 --trailing-comma --force-grid-wrap=0 --use-parentheses --line-width=88 -y
-    black !BLACK_ARG! .
+    isort !CHECK_ARG! --profile black --project popmon --thirdparty histogrammar --thirdparty pybase64 .
+    black !CHECK_ARG! .
     GOTO end
 )
 
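Apart from the lint configuration above, the Python hunks that follow only touch docstrings. In each one, the space after the opening `"""` is dropped, and a closing `"""` that sat alone on the next line is pulled onto the summary line. This appears consistent with re-running `black` through the updated `lint` target.

The sketch below reproduces only that visible before/after pattern. It is not black's algorithm, and `normalize_docstring` is a hypothetical helper. These hunks never show the closing quotes of a multi-line docstring, so the sketch assumes those are left where they are.

```python
def normalize_docstring(source: str) -> str:
    """Hypothetical helper mimicking the docstring edits visible in this diff.

    `source` is the docstring literal as written in the file, including the
    surrounding triple quotes.
    """
    quote = '"""'
    if len(source) < 6 or not (source.startswith(quote) and source.endswith(quote)):
        raise ValueError("expected a triple-quoted docstring literal")
    # Drop whitespace directly after the opening quotes.
    body = source[len(quote):-len(quote)].lstrip()
    summary = body.rstrip()
    if "\n" not in summary:
        # Single-line docstring: pull the closing quotes onto the summary line.
        return quote + summary + quote
    # Multi-line docstring: keep the closing quotes where they were (assumption).
    return quote + body + quote


before = '""" Compare histogram to reference histograms\n    """'
print(normalize_docstring(before))  # """Compare histogram to reference histograms"""
```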
8 changes: 3 additions & 5 deletions popmon/alerting/compute_tl_bounds.py
@@ -67,7 +67,7 @@ def traffic_light_summary(row, cols=None, prefix=""):
 
 
 def traffic_light(value, red_high, yellow_high, yellow_low=0, red_low=0):
-    """ Get corresponding traffic light given a value and traffic light bounds.
+    """Get corresponding traffic light given a value and traffic light bounds.
 
     :param float value: value to check
     :param float red_high: higher bound of red traffic light
@@ -337,8 +337,7 @@ def df_single_op_pull_bounds(
 
 
 class DynamicBounds(Pipeline):
-    """ Calculate dynamic traffic light bounds based on pull thresholds and dynamic mean and std.deviation.
-    """
+    """Calculate dynamic traffic light bounds based on pull thresholds and dynamic mean and std.deviation."""
 
     def __init__(
         self, read_key, rules, store_key="", suffix_mean="_mean", suffix_std="_std"
@@ -380,8 +379,7 @@ def transform(self, datastore):
 
 
 class StaticBounds(Pipeline):
-    """ Calculate static traffic light bounds based on pull thresholds and static mean and std.deviation.
-    """
+    """Calculate static traffic light bounds based on pull thresholds and static mean and std.deviation."""
 
     def __init__(
         self, read_key, rules, store_key="", suffix_mean="_mean", suffix_std="_std"
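`DynamicBounds` and `StaticBounds` compute traffic-light bounds from pull thresholds and a dynamic or static mean and standard deviation. `traffic_light` then maps a value onto such bounds.

The sketch below is an illustration only, built on three assumptions:

- A pull threshold `t` translates into the absolute bounds `mean ± t * std`.
- The lights are encoded as 0 (green), 1 (yellow) and 2 (red).
- Boundaries are inclusive.

`pull_bounds` and `classify` are hypothetical helpers, not popmon functions.

```python
from typing import Dict


def pull_bounds(mean: float, std: float, yellow_pull: float, red_pull: float) -> Dict[str, float]:
    """Hypothetical: turn pull thresholds into absolute traffic-light bounds.

    A pull is (value - mean) / std, so a pull threshold t corresponds to the
    absolute bound mean + t * std on the high side and mean - t * std on the low side.
    """
    return {
        "red_high": mean + red_pull * std,
        "yellow_high": mean + yellow_pull * std,
        "yellow_low": mean - yellow_pull * std,
        "red_low": mean - red_pull * std,
    }


def classify(value: float, red_high: float, yellow_high: float,
             yellow_low: float, red_low: float) -> int:
    """Hypothetical traffic light: 0 = green, 1 = yellow, 2 = red (assumed encoding)."""
    if value >= red_high or value <= red_low:
        return 2
    if value >= yellow_high or value <= yellow_low:
        return 1
    return 0


bounds = pull_bounds(mean=10.0, std=2.0, yellow_pull=2.0, red_pull=4.0)
# red_high=18.0, yellow_high=14.0, yellow_low=6.0, red_low=2.0
print([classify(v, **bounds) for v in (10.0, 15.0, 19.0, 1.0)])  # [0, 1, 2, 2]
```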
4 changes: 2 additions & 2 deletions popmon/analysis/apply_func.py
@@ -187,7 +187,7 @@ def transform(self, datastore):
 def apply_func_array(
     feature, metrics, apply_to_df, assign_to_df, apply_funcs, same_key
 ):
-    """ Apply list of functions to dataframe
+    """Apply list of functions to dataframe
 
     Split off for parallellization reasons
 
@@ -231,7 +231,7 @@ def apply_func_array(
 
 
 def apply_func(feature, selected_metrics, df, arr):
-    """ Apply function to dataframe
+    """Apply function to dataframe
 
     :param str feature: feature currently looping over
     :param list selected_metrics: list of selected metrics to apply to
27 changes: 9 additions & 18 deletions popmon/analysis/comparison/hist_comparer.py
@@ -135,8 +135,7 @@ def hist_compare(row, hist_name1="", hist_name2="", max_res_bound=7.0):
 
 
 class HistComparer(Pipeline):
-    """ Base pipeline to compare histogram to previous rolling histograms
-    """
+    """Base pipeline to compare histogram to previous rolling histograms"""
 
     def __init__(
         self,
@@ -192,8 +191,7 @@ def __init__(
 
 
 class RollingHistComparer(HistComparer):
-    """ Compare histogram to previous rolling histograms
-    """
+    """Compare histogram to previous rolling histograms"""
 
     def __init__(
         self,
@@ -238,8 +236,7 @@ def transform(self, datastore):
 
 
 class PreviousHistComparer(RollingHistComparer):
-    """ Compare histogram to previous histograms
-    """
+    """Compare histogram to previous histograms"""
 
     def __init__(
         self,
@@ -262,8 +259,7 @@ def __init__(
 
 
 class ExpandingHistComparer(HistComparer):
-    """ Compare histogram to previous expanding histograms
-    """
+    """Compare histogram to previous expanding histograms"""
 
     def __init__(
         self,
@@ -305,8 +301,7 @@ def transform(self, datastore):
 
 
 class ReferenceHistComparer(HistComparer):
-    """ Compare histogram to reference histograms
-    """
+    """Compare histogram to reference histograms"""
 
     def __init__(
         self,
@@ -349,8 +344,7 @@ def transform(self, datastore):
 
 
 class NormHistComparer(Pipeline):
-    """ Base pipeline to compare histogram to normalized histograms
-    """
+    """Base pipeline to compare histogram to normalized histograms"""
 
     def __init__(
         self,
@@ -396,8 +390,7 @@ def __init__(
 
 
 class RollingNormHistComparer(NormHistComparer):
-    """ Compare histogram to previous rolling normalized histograms
-    """
+    """Compare histogram to previous rolling normalized histograms"""
 
     def __init__(self, read_key, store_key, window, shift=1, hist_col="histogram"):
         """Initialize an instance of RollingNormHistComparer.
@@ -425,8 +418,7 @@ def transform(self, datastore):
 
 
 class ExpandingNormHistComparer(NormHistComparer):
-    """ Compare histogram to previous expanding normalized histograms
-    """
+    """Compare histogram to previous expanding normalized histograms"""
 
     def __init__(self, read_key, store_key, shift=1, hist_col="histogram"):
         """Initialize an instance of ExpandingNormHistComparer.
@@ -450,8 +442,7 @@ def transform(self, datastore):
 
 
 class ReferenceNormHistComparer(NormHistComparer):
-    """ Compare histogram to reference normalized histograms
-    """
+    """Compare histogram to reference normalized histograms"""
 
     def __init__(self, reference_key, assign_to_key, store_key, hist_col="histogram"):
         """Initialize an instance of ReferenceNormHistComparer.
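The comparer classes in `hist_comparer.py` differ mainly in what each time slice is compared against:

- a rolling window of earlier slices,
- only the previous slice,
- all earlier slices (expanding),
- or a fixed reference.

The NumPy sketch below illustrates that distinction on dense count vectors, assuming identical binning for every slice. `reference_for` and `chi2_distance` are hypothetical. Summing the counts of earlier slices is just one plausible way to combine them, and partial windows at the start are allowed here. The chi-square-style distance only stands in for popmon's own comparison statistics.

```python
from typing import Optional

import numpy as np


def reference_for(
    hists: np.ndarray,
    i: int,
    mode: str,
    window: int = 3,
    fixed_ref: Optional[np.ndarray] = None,
) -> Optional[np.ndarray]:
    """Hypothetical: reference histogram for time slice ``i``.

    ``hists`` has shape (n_slices, n_bins), one row of counts per time slice.
    Only slices strictly before ``i`` are used (a shift of one slice).
    Returns None when there is nothing to compare against.
    """
    if mode == "reference":
        return fixed_ref
    if mode == "previous":
        start = i - 1
    elif mode == "rolling":
        start = i - window
    elif mode == "expanding":
        start = 0
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    earlier = hists[max(start, 0):i]
    if len(earlier) == 0:
        return None
    # Combine the earlier slices by summing their counts bin by bin.
    return earlier.sum(axis=0)


def chi2_distance(h: np.ndarray, ref: np.ndarray) -> float:
    """Stand-in statistic: chi-square-style distance between normalized histograms."""
    p = h / h.sum()
    q = ref / ref.sum()
    denom = p + q
    mask = denom > 0  # skip bins that are empty in both histograms
    return float(0.5 * np.sum((p[mask] - q[mask]) ** 2 / denom[mask]))


if __name__ == "__main__":
    hists = np.array([[10, 10, 0], [12, 8, 0], [8, 12, 0], [5, 5, 10]], dtype=float)
    for mode in ("previous", "rolling", "expanding"):
        ref = reference_for(hists, 3, mode, window=2)
        print(mode, ref, round(chi2_distance(hists[3], ref), 3))
```

With `window=2`, the rolling reference for the last slice uses only the two slices before it, while the expanding reference uses all three.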
2 changes: 1 addition & 1 deletion popmon/analysis/functions.py
@@ -382,7 +382,7 @@ def expand_norm_hist_mean_cov(df, shift=1, *args, **kwargs):
 
 
 def normalized_hist_mean_cov(x, hist_name=""):
-    """ Mean normalized histogram and its covariance of list of input histograms
+    """Mean normalized histogram and its covariance of list of input histograms
 
     Usage: df['hists'].apply(normalized_hist_mean_cov) ; series.apply(normalized_hist_mean_cov)
 
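The docstring of `normalized_hist_mean_cov` describes the mean normalized histogram and its covariance over a list of input histograms. Here is a minimal NumPy sketch of that computation. It assumes the histograms are already aligned, equal-length count vectors, whereas popmon's own function works on histogram objects. The name `mean_cov_of_normalized` is hypothetical.

```python
import numpy as np


def mean_cov_of_normalized(hists):
    """Hypothetical sketch: mean and covariance of normalized histograms.

    ``hists`` is a sequence of equal-length count vectors, one per histogram.
    Returns ``(mean, cov)`` with shapes (n_bins,) and (n_bins, n_bins).
    """
    counts = np.asarray(hists, dtype=float)
    totals = counts.sum(axis=1)
    nonempty = totals > 0
    # Normalize each non-empty histogram so that its bins sum to 1.
    normed = counts[nonempty] / totals[nonempty][:, None]
    if len(normed) < 2:
        raise ValueError("need at least two non-empty histograms for a covariance")
    mean = normed.mean(axis=0)
    # rowvar=False: rows are observations (histograms), columns are bins.
    cov = np.cov(normed, rowvar=False)
    return mean, cov


mean, cov = mean_cov_of_normalized([[1, 3], [3, 1], [0, 0]])
print(mean)  # [0.5 0.5]
print(cov)
```

Skipping empty histograms avoids a division by zero during normalization. That choice is an assumption of this sketch.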
12 changes: 6 additions & 6 deletions popmon/analysis/hist_numpy.py
@@ -66,7 +66,7 @@ def prepare_2dgrid(hist):
 
 
 def set_2dgrid(hist, xkeys, ykeys):
-    """ Set 2d grid of first two dimenstions of input histogram
+    """Set 2d grid of first two dimenstions of input histogram
 
     Used as input by get_2dgrid(hist).
 
@@ -116,7 +116,7 @@ def set_2dgrid(hist, xkeys, ykeys):
 
 
 def get_2dgrid(hist, get_bin_labels=False):
-    """ Get filled x,y grid of first two dimensions of input histogram
+    """Get filled x,y grid of first two dimensions of input histogram
 
     :param hist: input histogrammar histogram
     :return: x,y grid of first two dimenstions of input histogram
@@ -141,7 +141,7 @@ def get_2dgrid(hist, get_bin_labels=False):
 
 
 def get_consistent_numpy_2dgrids(hc_list=[], get_bin_labels=False):
-    """ Get list of consistent x,y grids of first two dimensions of (sparse) input histograms
+    """Get list of consistent x,y grids of first two dimensions of (sparse) input histograms
 
     :param list hc_list: list of input histogrammar histograms
     :param bool get_bin_labels: if true, return x-keys and y-keys describing binnings of 2d-grid.
@@ -181,7 +181,7 @@ def get_consistent_numpy_2dgrids(hc_list=[], get_bin_labels=False):
 
 
 def get_consistent_numpy_1dhists(hc_list, get_bin_labels=False):
-    """ Get list of consistent numpy hists for list of sparse input histograms
+    """Get list of consistent numpy hists for list of sparse input histograms
 
     Note: a numpy histogram is a union of lists of bin_edges and number of entries
 
@@ -232,7 +232,7 @@ def get_consistent_numpy_1dhists(hc_list, get_bin_labels=False):
 
 
 def get_consistent_numpy_entries(hc_list, get_bin_labels=False):
-    """ Get list of consistent numpy bin_entries for list of 1d input histograms
+    """Get list of consistent numpy bin_entries for list of 1d input histograms
 
     :param list hist_list: list of input histogrammar histograms
     :return: list of consistent 1d numpy arrays with bin_entries for list of input histograms
@@ -439,7 +439,7 @@ def assert_similar_hists(hc_list, check_type=True, assert_type=used_hist_types):
 
 
 def check_same_hists(hc1, hc2):
-    """ Check if two hists are the same
+    """Check if two hists are the same
 
     :param hc1: input histogram container 1
     :param hc2: input histogram container 2
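Several functions in `hist_numpy.py` bring (sparse) histograms onto a consistent binning before comparing them. Their docstrings note that a numpy histogram pairs bin edges with bin entries. The sketch below illustrates the idea under these assumptions:

- Each sparse histogram is a plain `{bin_index: count}` dict with a shared, regular bin width and offset, not a histogrammar object.
- `align_sparse_hists` is a hypothetical helper.

It spans every bin from the lowest to the highest occupied index and fills one dense array per histogram. Bins a histogram does not contain get zero entries.

```python
from typing import Dict, List, Tuple

import numpy as np


def align_sparse_hists(
    hists: List[Dict[int, float]], bin_width: float, bin_offset: float = 0.0
) -> Tuple[List[np.ndarray], np.ndarray]:
    """Hypothetical: put sparse {bin_index: count} histograms on one shared binning.

    Returns one dense entries array per input histogram plus the common bin
    edges, i.e. numpy-style (entries, edges) pairs that can be compared bin by bin.
    """
    occupied = set().union(*(h.keys() for h in hists))
    if not occupied:
        raise ValueError("all input histograms are empty")
    lo, hi = min(occupied), max(occupied)
    # Contiguous bin indices from the lowest to the highest occupied bin.
    indices = np.arange(lo, hi + 1)
    # Regular bins: edge k sits at bin_offset + k * bin_width; n bins need n + 1 edges.
    edges = bin_offset + bin_width * np.arange(lo, hi + 2)
    # Bins a histogram does not contain get zero entries.
    entries = [np.array([h.get(int(k), 0.0) for k in indices], dtype=float) for h in hists]
    return entries, edges


entries, edges = align_sparse_hists([{0: 2, 2: 1}, {1: 4, 3: 1}], bin_width=5.0)
print(edges)
print(entries)
```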
