Initial commit
tomcis committed Apr 27, 2020
0 parents commit 33ac752
Showing 152 changed files with 14,973 additions and 0 deletions.
12 changes: 12 additions & 0 deletions .gitignore
@@ -0,0 +1,12 @@
.flake8
.coverage

*.pyc
*.egg-info
.coverage
docs/build

version.py

.vscode
.idea
10 changes: 10 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,10 @@
# See: https://docs.readthedocs.io/en/latest/yaml-config.html
# .readthedocs.yml

build:
  image: latest

python:
  version: 3.7
  setup_py_install: true

8 changes: 8 additions & 0 deletions CHANGES.rst
@@ -0,0 +1,8 @@
=============
Release notes
=============

Version 0.3.1, March 2020
-------------------------

Released the first open-source version of popmon.
15 changes: 15 additions & 0 deletions LICENSE
@@ -0,0 +1,15 @@
Copyright 2020 ING Wholesale Banking Advanced Analytics

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
documentation files (the "Software"), to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions
of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
68 changes: 68 additions & 0 deletions NOTICE
@@ -0,0 +1,68 @@
################################################################################################
#
# NOTICE: pass-through licensing of bundled components
#
# Population Shift Monitoring ("popmon") gathers together a toolkit of pre-existing third-party
# open-source software components. These software components are governed by their own licenses
# which Population Shift Monitoring does not modify or supersede; please consult the originating
# authors. These components altogether have a mixture of the following licenses: Apache 2.0, GNU,
# MIT, BSD2, BSD3 licenses.
#
# Although we have examined the licenses to verify acceptance of commercial and non-commercial
# use, please see and consult the original licenses or authors.
#
# Here is the full list of license dependencies:
#
# numpy: https://github.com/numpy/numpy/blob/master/LICENSE.txt
# scipy: https://github.com/scipy/scipy/blob/master/LICENSE.txt
# pandas: https://github.com/pandas-dev/pandas/blob/master/LICENSE
# histogrammar: https://github.com/histogrammar/histogrammar-python/blob/master/LICENSE
# phik: https://github.com/KaveIO/PhiK/blob/master/LICENSE
# pyyaml: https://github.com/yaml/pyyaml/blob/master/LICENSE
# jinja2: https://github.com/noirbizarre/jinja2/blob/master/LICENSE
# tqdm: https://github.com/tqdm/tqdm/blob/master/LICENCE
# matplotlib: https://github.com/matplotlib/matplotlib/blob/master/LICENSE/LICENSE
# joblib: https://github.com/joblib/joblib/blob/master/LICENSE.txt
# pybase64: https://github.com/mayeut/pybase64/blob/master/LICENSE
# htmlmin: https://github.com/mankyd/htmlmin/blob/master/LICENSE
# eskapade: https://github.com/KaveIO/Eskapade/blob/master/LICENSE
# statsmodels: https://github.com/statsmodels/statsmodels/blob/master/LICENSE.txt
# root: https://root.cern.ch/license
#
# There are several popmon functions/classes where code or techniques have been reproduced and/or modified
# from existing open-source packages. We list these here:
#
# Package: ROOT
# popmon file: popmon/stats/numpy.py
# Functions: uu_chi2(), ks_test(), ks_prob()
# Reference: https://root.cern.ch/doc/master/classTH1.html
# License: GNU
# For details see: https://root.cern.ch/license
#
# Package: StatsModels
# popmon file: popmon/stats/numpy.py
# Functions: mad()
# Reference: https://www.statsmodels.org/dev/_modules/statsmodels/robust/scale.html#mad
# License:
# For details see: https://github.com/statsmodels/statsmodels/blob/master/LICENSE.txt
#
# Package: Eskapade
# popmon file: popmon/visualization/backend.py
# Functions: set_matplotlib_backend(), check_interactive_backend(), in_ipynb()
# Reference: https://github.com/KaveIO/Eskapade-Core/blob/master/python/escore/utils.py
# popmon file: popmon/visualization/utils.py
# Functions: plot_overlay_1d_histogram_b64()
# Reference: https://github.com/KaveIO/Eskapade/blob/master/python/eskapade/visualization/vis_utils.py#L397
# popmon file: popmon/hist/filling/spark_histogrammar.py
# Class: SparkHistogrammar
# Ref: https://github.com/KaveIO/Eskapade-Spark/blob/master/python/eskapadespark/links/spark_histogrammar_filler.py
# popmon file: popmon/hist/filling/pandas_histogrammar.py
# Class: PandasHistogrammar
# Reference: https://github.com/KaveIO/Eskapade/blob/master/python/eskapade/analysis/links/hist_filler.py
# popmon file: popmon/hist/filling/histogram_filler_base.py
# Class: HistogramFillerBase
# Reference: https://github.com/KaveIO/Eskapade/blob/master/python/eskapade/analysis/histogram_filling.py
# License: Apache v2.0
# For details see: https://github.com/KaveIO/Eskapade-Core/blob/master/LICENSE
#
################################################################################################
127 changes: 127 additions & 0 deletions README.rst
@@ -0,0 +1,127 @@
===========================
Population Shift Monitoring
===========================

* Version: 0.3.1. Released: April 2020
* Documentation: https://popmon.readthedocs.io
* Repository: https://github.com/ing-bank/popmon

.. figure:: https://github.com/ing-bank/popmon/blob/master/docs/source/assets/popmon-logo.png
   :width: 300px
   :align: center

`popmon` is a package that allows one to check the stability of a dataset.
`popmon` works with both pandas and spark datasets.

`popmon` creates histograms of features binned in time-slices,
and compares the stability of the profiles and distributions of
those histograms using statistical tests, both over time and with respect to a reference.
It works with numerical, ordinal, and categorical features, and the histograms can be higher-dimensional,
e.g. it can also track correlations between any two features.
`popmon` can automatically flag and alert on changes observed over time, such
as trends, shifts, peaks, outliers, anomalies, changing correlations, etc.,
using monitoring business rules.
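
To make the idea concrete, here is a minimal, standard-library sketch of the approach described above: one histogram per time slice, each compared with a reference, and a simple rule that flags large shifts. It is an illustration only and not popmon's implementation. The helper names, the toy records, the comparison metric (largest change in bin fraction) and the threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def normalized_histogram(values):
    """Fraction of entries per category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def max_fraction_shift(hist, reference):
    """Largest absolute difference in bin fraction between two histograms."""
    keys = set(hist) | set(reference)
    return max(abs(hist.get(k, 0.0) - reference.get(k, 0.0)) for k in keys)


# toy records (assumed data): (time slice, categorical feature value)
records = [
    ("2020-01", "blue"), ("2020-01", "blue"), ("2020-01", "brown"), ("2020-01", "brown"),
    ("2020-02", "blue"), ("2020-02", "blue"), ("2020-02", "brown"), ("2020-02", "brown"),
    ("2020-03", "blue"), ("2020-03", "brown"), ("2020-03", "brown"), ("2020-03", "brown"),
]

# build one histogram per time slice
per_slice = defaultdict(list)
for time_slice, value in records:
    per_slice[time_slice].append(value)
histograms = {t: normalized_histogram(v) for t, v in per_slice.items()}

# compare every slice with a reference (here: the first slice) and flag large shifts
reference = histograms["2020-01"]
THRESHOLD = 0.2  # illustrative rule only, not a popmon default
shifts = {t: max_fraction_shift(h, reference) for t, h in histograms.items()}
flagged = [t for t, shift in sorted(shifts.items()) if shift > THRESHOLD]
```

In this toy data only the last slice differs from the reference, so it is the only one that ends up in ``flagged``. The statistical tests and business rules that `popmon` itself applies are described in the documentation.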

Documentation
=============

The entire `popmon` documentation including tutorials can be found at `read-the-docs <https://popmon.readthedocs.io>`_.


Examples
========

- `Flight Delays and Cancellations Kaggle data <https://nbviewer.jupyter.org/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_advanced.ipynb>`_

Check it out
============

The `popmon` library requires Python 3.6+ and is pip friendly. To get started, simply do:

.. code-block:: bash

  $ pip install popmon

or check out the code from our GitHub repository:

.. code-block:: bash

  $ git clone https://github.com/ing-bank/popmon.git
  $ pip install -e popmon

where in this example the code is installed in editable mode (option -e).

You can now use the package in Python with:

.. code-block:: python

  import popmon

**Congratulations, you are now ready to use the popmon library!**

Quick run
=========

As a quick example, you can do:

.. code-block:: python

  import pandas as pd
  import popmon
  from popmon import resources

  # open fake car insurance data
  df = pd.read_csv(resources.data('test.csv.gz'))
  df['date'] = pd.to_datetime(df['date'])
  df.head()

  # generate stability report using automatic binning of all encountered features
  report = df.pm_stability_report(time_axis='date')

  # to show the output of the report in a Jupyter notebook you can simply run:
  report

  # or save the report to file and open in a browser
  report.to_file("monitoring_report.html")

To specify your own binning and the features you want to report on, do:

.. code-block:: python

  # time-axis specifications alone; all other features are auto-binned.
  report = df.pm_stability_report(time_axis='date', time_width='1w', time_offset='2020-1-6')

  # histogram selections. Here 'date' is the first axis of each histogram.
  features = [
      'date:isActive', 'date:age', 'date:eyeColor', 'date:gender',
      'date:latitude', 'date:longitude', 'date:isActive:age'
  ]

  # Specify your own binning specifications for individual features or combinations thereof.
  # This bin specification uses open-ended ("sparse") histograms; unspecified features get
  # auto-binned. The time-axis binning, when specified here, needs to be in nanoseconds.
  bin_specs = {
      'longitude': {'bin_width': 5.0, 'bin_offset': 0.0},
      'latitude': {'bin_width': 5.0, 'bin_offset': 0.0},
      'age': {'bin_width': 10.0, 'bin_offset': 0.0},
      'date': {'bin_width': pd.Timedelta('4w').value,
               'bin_offset': pd.Timestamp('2015-1-1').value}
  }

  # generate stability report
  report = df.pm_stability_report(features=features, bin_specs=bin_specs)

These examples also work with spark dataframes.
You can see the output of such example notebook code `here <https://nbviewer.jupyter.org/github/ing-bank/popmon/blob/master/popmon/notebooks/popmon_tutorial_advanced.ipynb>`_.
For all available examples, please see the `tutorials <https://popmon.readthedocs.io/en/latest/tutorials.html>`_ at read-the-docs.
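
The bin specifications above can be read as follows. This is a sketch under an assumption: with open-ended binning, bin ``i`` is taken to cover the half-open range from ``bin_offset + i * bin_width`` up to ``bin_offset + (i + 1) * bin_width``. The helpers ``to_ns`` and ``bin_index`` are hypothetical and are not part of `popmon`; they use only the standard library. For the time axis, ``pd.Timedelta(...).value`` and ``pd.Timestamp(...).value`` give integer nanoseconds, which is mimicked here with ``datetime``.

```python
import math
from datetime import datetime, timedelta

NS_PER_SECOND = 10**9
EPOCH = datetime(1970, 1, 1)


def to_ns(moment: datetime) -> int:
    """Nanoseconds since the Unix epoch (the quantity pd.Timestamp(...).value holds)."""
    return int((moment - EPOCH).total_seconds()) * NS_PER_SECOND


def bin_index(value, bin_width, bin_offset=0.0) -> int:
    """Hypothetical helper: index of the open-ended bin that contains value."""
    return math.floor((value - bin_offset) / bin_width)


# numeric features, using the widths from the bin_specs example
age_bin = bin_index(37, bin_width=10.0)          # ages from 30 up to 40
longitude_bin = bin_index(-12.5, bin_width=5.0)  # negative indices are possible

# time axis: width and offset expressed in nanoseconds
four_weeks_ns = int(timedelta(weeks=4).total_seconds()) * NS_PER_SECOND
offset_ns = to_ns(datetime(2015, 1, 1))
date_bin = bin_index(to_ns(datetime(2015, 2, 10)), four_weeks_ns, offset_ns)
```

Because no fixed range is declared, values far from the offset simply land in bins with large positive or negative indices, which is why only a width and an offset are needed per feature.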

Contact and support
===================

* Issues & Ideas & Support: https://github.com/ing-bank/popmon/issues

Please note that ING WBAA provides support only on a best-effort basis.

License
=======
`popmon` is completely free, open-source and licensed under the `MIT license <https://en.wikipedia.org/wiki/MIT_License>`_.
