Commit
0 parents
commit 33ac752
Showing 152 changed files with 14,973 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
.flake8 | ||
.coverage | ||
|
||
*.pyc | ||
*.egg-info | ||
.coverage | ||
docs/build | ||
|
||
version.py | ||
|
||
.vscode | ||
.idea |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# See: https://docs.readthedocs.io/en/latest/yaml-config.html | ||
# .readthedocs.yml | ||
|
||
build: | ||
  image: latest | ||
|
||
python: | ||
  version: 3.7 | ||
  setup_py_install: true | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
============= | ||
Release notes | ||
============= | ||
|
||
Version 0.3.1, March 2020 | ||
------------------------- | ||
|
||
Released the first open-source version of popmon. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
Copyright 2020 ING Wholesale Banking Advanced Analytics | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | ||
documentation files (the "Software"), to deal in the Software without restriction, including without limitation | ||
the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and | ||
to permit persons to whom the Software is furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all copies or substantial portions | ||
of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED | ||
TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL | ||
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF | ||
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER | ||
DEALINGS IN THE SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
################################################################################################ | ||
# | ||
# NOTICE: pass-through licensing of bundled components | ||
# | ||
# Population Shift Monitoring ("popmon") gathers together a toolkit of pre-existing third-party | ||
# open-source software components. These software components are governed by their own licenses | ||
# which Population Shift Monitoring does not modify or supersede, please consult the originating | ||
# authors. These components altogether have a mixture of the following licenses: Apache 2.0, GNU, | ||
# MIT, BSD2, BSD3 licenses. | ||
# | ||
# Although we have examined the licenses to verify acceptance of commercial and non-commercial | ||
# use, please see and consult the original licenses or authors. | ||
# | ||
# Here is the full list of license dependencies: | ||
# | ||
# numpy: https://github.com/numpy/numpy/blob/master/LICENSE.txt | ||
# scipy: https://github.com/scipy/scipy/blob/master/LICENSE.txt | ||
# pandas: https://github.com/pandas-dev/pandas/blob/master/LICENSE | ||
# histogrammar: https://github.com/histogrammar/histogrammar-python/blob/master/LICENSE | ||
# phik: https://github.com/KaveIO/PhiK/blob/master/LICENSE | ||
# pyyaml: https://github.com/yaml/pyyaml/blob/master/LICENSE | ||
# jinja2: https://github.com/noirbizarre/jinja2/blob/master/LICENSE | ||
# tqdm: https://github.com/tqdm/tqdm/blob/master/LICENCE | ||
# matplotlib: https://github.com/matplotlib/matplotlib/blob/master/LICENSE/LICENSE | ||
# joblib: https://github.com/joblib/joblib/blob/master/LICENSE.txt | ||
# pybase64: https://github.com/mayeut/pybase64/blob/master/LICENSE | ||
# htmlmin: https://github.com/mankyd/htmlmin/blob/master/LICENSE | ||
# eskapade: https://github.com/KaveIO/Eskapade/blob/master/LICENSE | ||
# statsmodels: https://github.com/statsmodels/statsmodels/blob/master/LICENSE.txt | ||
# root: https://root.cern.ch/license | ||
# | ||
# There are several popmon functions/classes where code or techniques have been reproduced and/or modified | ||
# from existing open-source packages. We list these here: | ||
# | ||
# Package: ROOT | ||
# popmon file: popmon/stats/numpy.py | ||
# Functions: uu_chi2(), ks_test(), ks_prob() | ||
# Reference: https://root.cern.ch/doc/master/classTH1.html | ||
# License: GNU | ||
# For details see: https://root.cern.ch/license | ||
# | ||
# Package: StatsModels | ||
# popmon file: popmon/stats/numpy.py | ||
# Functions: mad() | ||
# Reference: https://www.statsmodels.org/dev/_modules/statsmodels/robust/scale.html#mad | ||
# License: | ||
# For details see: https://github.com/statsmodels/statsmodels/blob/master/LICENSE.txt | ||
# | ||
# Package: Eskapade | ||
# popmon file: popmon/visualization/backend.py | ||
# Functions: set_matplotlib_backend(), check_interactive_backend(), in_ipynb() | ||
# Reference: https://github.com/KaveIO/Eskapade-Core/blob/master/python/escore/utils.py | ||
# popmon file: popmon/visualization/utils.py | ||
# Functions: plot_overlay_1d_histogram_b64() | ||
# Reference: https://github.com/KaveIO/Eskapade/blob/master/python/eskapade/visualization/vis_utils.py#L397 | ||
# popmon file: popmon/hist/filling/spark_histogrammar.py | ||
# Class: SparkHistogrammar | ||
# Ref: https://github.com/KaveIO/Eskapade-Spark/blob/master/python/eskapadespark/links/spark_histogrammar_filler.py | ||
# popmon file: popmon/hist/filling/pandas_histogrammar.py | ||
# Class: PandasHistogrammar | ||
# Reference: https://github.com/KaveIO/Eskapade/blob/master/python/eskapade/analysis/links/hist_filler.py | ||
# popmon file: popmon/hist/filling/histogram_filler_base.py | ||
# Class: HistogramFillerBase | ||
# Reference: https://github.com/KaveIO/Eskapade/blob/master/python/eskapade/analysis/histogram_filling.py | ||
# License: Apache v2.0 | ||
# For details see: https://github.com/KaveIO/Eskapade-Core/blob/master/LICENSE | ||
# | ||
################################################################################################ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
=========================== | ||
Population Shift Monitoring | ||
=========================== | ||
|
||
* Version: 0.3.1. Released: April 2020 | ||
* Documentation: https://popmon.readthedocs.io | ||
* Repository: https://github.com/ing-bank/popmon | ||
|
||
.. figure:: https://github.com/ing-bank/popmon/blob/master/docs/source/assets/popmon-logo.png | ||
   :width: 300px | ||
   :align: center | ||
|
||
`popmon` is a package that allows one to check the stability of a dataset. | ||
`popmon` works with both pandas and spark datasets. | ||
|
||
`popmon` creates histograms of features binned in time-slices, | ||
and compares the stability of the profiles and distributions of | ||
those histograms using statistical tests, both over time and with respect to a reference. | ||
It works with numerical, ordinal, and categorical features, and the histograms can be higher-dimensional, | ||
e.g. it can also track correlations between any two features. | ||
`popmon` can automatically flag and alert on changes observed over time, such | ||
as trends, shifts, peaks, outliers, anomalies, changing correlations, etc., | ||
using monitoring business rules. | ||
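As an illustration of the idea described above, the following standard-library sketch bins one feature per weekly time slice and scores each slice against a reference histogram. It is not popmon's implementation: the helper names (`bin_index`, `histograms_per_slice`, `chi2_distance`), the sample records, and the simple chi-square-like distance are all assumptions made for this example.

```python
import math
from collections import Counter
from datetime import date


def bin_index(value, bin_width, bin_offset=0.0):
    """Return the index of the open-ended bin that contains `value`."""
    return math.floor((value - bin_offset) / bin_width)


def histograms_per_slice(records, bin_width, bin_offset=0.0):
    """Group (date, value) records by ISO (year, week) and histogram each group."""
    slices = {}
    for day, value in records:
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        slices.setdefault(key, Counter())[bin_index(value, bin_width, bin_offset)] += 1
    return slices


def chi2_distance(hist, reference):
    """Chi-square-like distance between two normalized histograms (illustrative only)."""
    n_hist = sum(hist.values())
    n_ref = sum(reference.values())
    total = 0.0
    for b in set(hist) | set(reference):
        p = hist.get(b, 0) / n_hist
        q = reference.get(b, 0) / n_ref
        if p + q > 0:
            total += (p - q) ** 2 / (p + q)
    return total


# Hypothetical records: (date, age). The second week is visibly shifted to older ages.
records = [
    (date(2020, 1, 6), 23), (date(2020, 1, 7), 27),
    (date(2020, 1, 8), 35), (date(2020, 1, 9), 38),
    (date(2020, 1, 13), 61), (date(2020, 1, 14), 64),
    (date(2020, 1, 15), 67), (date(2020, 1, 16), 35),
]

slices = histograms_per_slice(records, bin_width=10.0)
reference = slices[(2020, 2)]  # use the first week as the reference
for key in sorted(slices):
    print(key, dict(slices[key]), round(chi2_distance(slices[key], reference), 3))
```

A slice identical to the reference scores zero, and the score grows as the distribution moves away from it; a monitoring rule could then flag slices whose score exceeds a chosen threshold.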
|
||
Documentation | ||
============= | ||
|
||
The entire `popmon` documentation including tutorials can be found at `read-the-docs <https://popmon.readthedocs.io>`_. | ||
|
||
|
||
Examples | ||
======== | ||
|
||
- `Flight Delays and Cancellations Kaggle data <https://nbviewer.jupyter.org/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_advanced.ipynb>`_ | ||
|
||
Check it out | ||
============ | ||
|
||
The `popmon` library requires Python 3.6 or later and is pip friendly. To get started, simply do: | ||
|
||
.. code-block:: bash | ||
|
||
  $ pip install popmon | ||
|
||
or check out the code from our GitHub repository: | ||
|
||
.. code-block:: bash | ||
|
||
  $ git clone https://github.com/ing-bank/popmon.git | ||
  $ pip install -e popmon | ||
|
||
where in this example the code is installed in edit mode (option -e). | ||
|
||
You can now use the package in Python with: | ||
|
||
.. code-block:: python | ||
|
||
  import popmon | ||
|
||
**Congratulations, you are now ready to use the popmon library!** | ||
|
||
Quick run | ||
========= | ||
|
||
As a quick example, you can do: | ||
|
||
.. code-block:: python | ||
|
||
  import pandas as pd | ||
  import popmon | ||
  from popmon import resources | ||
  # open fake car insurance data | ||
  df = pd.read_csv(resources.data('test.csv.gz')) | ||
  df['date'] = pd.to_datetime(df['date']) | ||
  df.head() | ||
  # generate stability report using automatic binning of all encountered features | ||
  report = df.pm_stability_report(time_axis='date') | ||
  # to show the output of the report in a Jupyter notebook you can simply run: | ||
  report | ||
  # or save the report to file and open in a browser | ||
  report.to_file("monitoring_report.html") | ||
|
||
To specify your own binning specifications and features you want to report on, you do: | ||
|
||
.. code-block:: python | ||
|
||
  # time-axis specifications alone; all other features are auto-binned. | ||
  report = df.pm_stability_report(time_axis='date', time_width='1w', time_offset='2020-1-6') | ||
  # histogram selections. Here 'date' is the first axis of each histogram. | ||
  features=[ | ||
      'date:isActive', 'date:age', 'date:eyeColor', 'date:gender', | ||
      'date:latitude', 'date:longitude', 'date:isActive:age' | ||
  ] | ||
  # Specify your own binning specifications for individual features or combinations thereof. | ||
  # This bin specification uses open-ended ("sparse") histograms; unspecified features get | ||
  # auto-binned. The time-axis binning, when specified here, needs to be in nanoseconds. | ||
  bin_specs={ | ||
      'longitude': {'bin_width': 5.0, 'bin_offset': 0.0}, | ||
      'latitude': {'bin_width': 5.0, 'bin_offset': 0.0}, | ||
      'age': {'bin_width': 10.0, 'bin_offset': 0.0}, | ||
      'date': {'bin_width': pd.Timedelta('4w').value, | ||
               'bin_offset': pd.Timestamp('2015-1-1').value} | ||
  } | ||
  # generate stability report | ||
  report = df.pm_stability_report(features=features, bin_specs=bin_specs) | ||
|
||
These examples also work with spark dataframes. | ||
You can see the output of such example notebook code `here <https://nbviewer.jupyter.org/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_advanced.ipynb>`_. | ||
For all available examples, please see the `tutorials <https://popmon.readthedocs.io/en/latest/tutorials.html>`_ at read-the-docs. | ||
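The bin specification above states that time-axis binning must be given in nanoseconds. The sketch below shows what those integers look like, using only the standard library. The helper names `timedelta_ns` and `timestamp_ns` are hypothetical, and the assumption that a timezone-naive pandas timestamp is counted from the UTC epoch is ours; the results should correspond to `pd.Timedelta('4w').value` and `pd.Timestamp('2015-1-1').value`.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def timedelta_ns(delta):
    """Convert a timedelta to integer nanoseconds without float rounding."""
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NS_PER_SECOND + delta.microseconds * 1_000


def timestamp_ns(moment):
    """Nanoseconds between the Unix epoch (UTC) and a timezone-aware datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return timedelta_ns(moment - epoch)


# Four-week bins, with the first bin edge placed at 2015-01-01 00:00 UTC.
bin_width_ns = timedelta_ns(timedelta(weeks=4))
bin_offset_ns = timestamp_ns(datetime(2015, 1, 1, tzinfo=timezone.utc))

print(bin_width_ns)   # 2419200000000000
print(bin_offset_ns)  # 1420070400000000000
```

Working in integers keeps the bin edges exact, which matters because epoch timestamps in nanoseconds exceed the range that a 64-bit float can represent without rounding.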
|
||
Contact and support | ||
=================== | ||
|
||
* Issues & Ideas & Support: https://github.com/ing-bank/popmon/issues | ||
|
||
Please note that ING WBAA provides support only on a best-effort basis. | ||
|
||
License | ||
======= | ||
`popmon` is completely free, open-source and licensed under the `MIT license <https://en.wikipedia.org/wiki/MIT_License>`_. |