32 enhance sphinx bibtext #35

Merged
merged 2 commits into from
Jun 10, 2022
51 changes: 26 additions & 25 deletions README.rst
@@ -49,31 +49,32 @@ Detector List

Menelaus implements the following drift detectors.

+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Type | Detector | Abbreviation | Streaming | Batch | Ref. |
+===================+================================================================+===============+============+========+=========+
| Change detection | Cumulative Sum Test | CUSUM | x | | [C1]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Change detection | Page-Hinkley | PH | x | | [C2]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Concept drift | ADaptive WINdowing | ADWIN | x | | [C3]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Concept drift | Drift Detection Method | DDM | x | | [C4]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Concept drift | Early Drift Detection Method | EDDM | x | | [C5]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Concept drift | Linear Four Rates | LFR | x | | [C6]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Concept drift | Statistical Test of Equal Proportions to Detect concept drift | STEPD | x | | [C7]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Data drift | Confidence Distribution Batch Detection | CDBD | | x | [C8]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Data drift | Hellinger Distance Drift Detection Method | HDDDM | | x | [C9]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Data drift | kdq-Tree Detection Method | kdq-Tree | x | x | [C10]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
| Data drift | PCA-Based Change Detection | PCA-CD | x | | [C11]_ |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+---------+
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Type | Detector | Abbreviation | Streaming | Batch | Ref. |
+===================+================================================================+===============+============+========+==================================+
| Change detection | Cumulative Sum Test | CUSUM | x | | :cite:t:`hinkley1971inference` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Change detection | Page-Hinkley | PH | x | | :cite:t:`page1954continuous` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Concept drift | ADaptive WINdowing | ADWIN | x | | :cite:t:`bifet2007learning` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Concept drift | Drift Detection Method | DDM | x | | :cite:t:`gama2004learning` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Concept drift | Early Drift Detection Method | EDDM | x | | :cite:t:`baena2006early` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Concept drift | Linear Four Rates | LFR | x | | :cite:t:`wang2015concept` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Concept drift | Statistical Test of Equal Proportions to Detect concept drift | STEPD | x | | :cite:t:`nishida2007detecting` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Data drift | Confidence Distribution Batch Detection | CDBD | | x | :cite:t:`lindstrom2013drift` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Data drift | Hellinger Distance Drift Detection Method | HDDDM | | x | :cite:t:`ditzler2011hellinger` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Data drift | kdq-Tree Detection Method | kdq-Tree | x | x | :cite:t:`dasu2006information` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+
| Data drift | PCA-Based Change Detection | PCA-CD | x | | :cite:t:`qahtan2015pca` |
+-------------------+----------------------------------------------------------------+---------------+------------+--------+----------------------------------+



The three main types of detector are described below. More details can be found
4 changes: 4 additions & 0 deletions docs/source/conf.py
@@ -33,6 +33,7 @@
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinxcontrib.bibtex",
]

autodoc_default_options = {
@@ -42,6 +43,9 @@
"special-members": "__init__",
}

bibtex_bibfiles = ["refs.bib"]
bibtex_reference_style = "author_year"

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
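
Taken together, the two conf.py hunks leave the citation-related configuration looking roughly like the fragment below. This is a minimal sketch of only the lines touched here, assuming ``refs.bib`` sits in ``docs/source`` next to ``conf.py``; the real ``extensions`` list may contain further entries outside the diff context, and everything else in the file is omitted.

```python
# Minimal conf.py fragment (sketch): only the settings touched by this change.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinxcontrib.bibtex",  # provides the :cite: roles and the bibliography directive
]

# BibTeX files to load, given relative to the Sphinx source directory.
bibtex_bibfiles = ["refs.bib"]

# Use author/year citations instead of the default label style.
bibtex_reference_style = "author_year"
```

With these settings, a ``:cite:t:`` role in a docstring or rst file resolves against ``refs.bib``, and the ``.. bibliography::`` directive lists the cited entries.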

12 changes: 1 addition & 11 deletions docs/source/references.rst
@@ -2,14 +2,4 @@

References
==================================
.. [C1] Hinkley, David V. "Inference about the change-point from cumulative sum tests." Biometrika 58.3 (1971): 509-523.
.. [C2] Page, Ewan S. "Continuous inspection schemes." Biometrika 41.1/2 (1954): 100-115.
.. [C3] Bifet, Albert, and Ricard Gavalda. "Learning from time-changing data with adaptive windowing." Proceedings of the 2007 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2007.
.. [C4] Gama, Joao, et al. "Learning with drift detection." Brazilian symposium on artificial intelligence. Springer, Berlin, Heidelberg, 2004.
.. [C5] Baena-Garcıa, Manuel, et al. "Early drift detection method." Fourth international workshop on knowledge discovery from data streams. Vol. 6. 2006.
.. [C6] Wang, Heng, and Zubin Abraham. "Concept drift detection for streaming data." 2015 international joint conference on neural networks (IJCNN). IEEE, 2015.
.. [C7] Nishida, Kyosuke, and Koichiro Yamauchi. "Detecting concept drift using statistical testing." International conference on discovery science. Springer, Berlin, Heidelberg, 2007.
.. [C8] Lindstrom, Patrick, Brian Mac Namee, and Sarah Jane Delany. "Drift detection using uncertainty distribution divergence." Evolving Systems 4.1 (2013): 13-25.
.. [C9] Ditzler, Gregory, and Robi Polikar. "Hellinger distance based drift detection for nonstationary environments." 2011 IEEE symposium on computational intelligence in dynamic and uncertain environments (CIDUE). IEEE, 2011.
.. [C10] Dasu, Tamraparni, et al. "An information-theoretic approach to detecting changes in multi-dimensional data streams." In Proc. Symp. on the Interface of Statistics, Computing Science, and Applications. 2006.
.. [C11] Qahtan, Abdulhakim A., et al. "A pca-based change detection framework for multidimensional data streams: Change detection in multidimensional data streams." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015.
.. bibliography::
102 changes: 102 additions & 0 deletions docs/source/refs.bib
@@ -0,0 +1,102 @@
@article{hinkley1971inference,
title={Inference about the change-point from cumulative sum tests},
author={Hinkley, David V},
journal={Biometrika},
volume={58},
number={3},
pages={509--523},
year={1971},
publisher={Oxford University Press}
}

@article{page1954continuous,
title={Continuous inspection schemes},
author={Page, Ewan S},
journal={Biometrika},
volume={41},
number={1/2},
pages={100--115},
year={1954},
publisher={JSTOR}
}

@inproceedings{bifet2007learning,
title={Learning from time-changing data with adaptive windowing},
author={Bifet, Albert and Gavalda, Ricard},
booktitle={Proceedings of the 2007 SIAM international conference on data mining},
pages={443--448},
year={2007},
organization={SIAM}
}

@inproceedings{gama2004learning,
title={Learning with drift detection},
author={Gama, Joao and Medas, Pedro and Castillo, Gladys and Rodrigues, Pedro},
booktitle={Brazilian symposium on artificial intelligence},
pages={286--295},
year={2004},
organization={Springer}
}

@inproceedings{baena2006early,
title={Early drift detection method},
author={Baena-Garc{\i}a, Manuel and del Campo-{\'A}vila, Jos{\'e} and Fidalgo, Ra{\'u}l and Bifet, Albert and Gavalda, R and Morales-Bueno, Rafael},
booktitle={Fourth international workshop on knowledge discovery from data streams},
volume={6},
pages={77--86},
year={2006}
}

@inproceedings{wang2015concept,
title={Concept drift detection for streaming data},
author={Wang, Heng and Abraham, Zubin},
booktitle={2015 international joint conference on neural networks (IJCNN)},
pages={1--9},
year={2015},
organization={IEEE}
}

@inproceedings{nishida2007detecting,
title={Detecting concept drift using statistical testing},
author={Nishida, Kyosuke and Yamauchi, Koichiro},
booktitle={International conference on discovery science},
pages={264--269},
year={2007},
organization={Springer}
}

@article{lindstrom2013drift,
title={Drift detection using uncertainty distribution divergence},
author={Lindstrom, Patrick and Mac Namee, Brian and Delany, Sarah Jane},
journal={Evolving Systems},
volume={4},
number={1},
pages={13--25},
year={2013},
publisher={Springer}
}

@inproceedings{ditzler2011hellinger,
title={Hellinger distance based drift detection for nonstationary environments},
author={Ditzler, Gregory and Polikar, Robi},
booktitle={2011 IEEE symposium on computational intelligence in dynamic and uncertain environments (CIDUE)},
pages={41--48},
year={2011},
organization={IEEE}
}

@inproceedings{dasu2006information,
title={An information-theoretic approach to detecting changes in multi-dimensional data streams},
author={Dasu, Tamraparni and Krishnan, Shankar and Venkatasubramanian, Suresh and Yi, Ke},
booktitle={In Proc. Symp. on the Interface of Statistics, Computing Science, and Applications},
year={2006},
organization={Citeseer}
}

@inproceedings{qahtan2015pca,
title={A pca-based change detection framework for multidimensional data streams: Change detection in multidimensional data streams},
author={Qahtan, Abdulhakim A and Alharbi, Basma and Wang, Suojin and Zhang, Xiangliang},
booktitle={Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
pages={935--944},
year={2015}
}
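
Because the citation keys are now spread across the README, the docstrings, and ``refs.bib``, a small consistency check can catch a mistyped key before the docs build does. The sketch below is not part of this change; the function names and the regular expressions are assumptions chosen for illustration, and it only understands the ``:cite:t:`` role used in this diff.

```python
from __future__ import annotations

import re

# Matches the key of each BibTeX entry, e.g. "@article{hinkley1971inference,".
BIB_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# Matches the key list inside a :cite:t:`...` role.
CITE_ROLE = re.compile(r":cite:t:`([^`]+)`")


def bib_keys(bib_text: str) -> set[str]:
    """Return the entry keys defined in a BibTeX source string."""
    return set(BIB_KEY.findall(bib_text))


def cited_keys(text: str) -> set[str]:
    """Return the keys referenced via :cite:t: roles in rst or docstring text."""
    keys: set[str] = set()
    for group in CITE_ROLE.findall(text):
        # A single role may list several comma-separated keys.
        keys.update(k.strip() for k in group.split(","))
    return keys


def missing_keys(text: str, bib_text: str) -> set[str]:
    """Return keys that are cited in `text` but not defined in `bib_text`."""
    return cited_keys(text) - bib_keys(bib_text)


if __name__ == "__main__":
    bib = """
    @article{hinkley1971inference, title={...}}
    @article{page1954continuous, title={...}}
    """
    doc = "Ref. :cite:t:`hinkley1971inference` and :cite:t:`qahtan2015pca`"
    print(missing_keys(doc, bib))  # prints {'qahtan2015pca'}
```

In practice the two strings would be read from ``docs/source/refs.bib`` and from each file that carries a citation; an empty result means every cited key has an entry.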
1 change: 1 addition & 0 deletions setup.cfg
@@ -51,6 +51,7 @@ dev =
sphinx
sphinx-rtd-theme
sphinx-autoapi
sphinxcontrib-bibtex

[options.packages.find]
where=src
2 changes: 1 addition & 1 deletion src/menelaus/change_detection/cusum.py
@@ -9,7 +9,7 @@ class CUSUM(DriftDetector):
single model performance metric, or could be applied to the mean of a
feature variable of interest.

Ref. [C1]_
Ref. :cite:t:`hinkley1971inference`

Attributes:
total_updates (int): number of samples the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/change_detection/page_hinkley.py
@@ -21,7 +21,7 @@ class PageHinkley(DriftDetector):
If the threshold is too small, PH may result in many false alarms. If too
large, the PH test will be more robust, but may miss true drift.

Ref. [C2]_
Ref. :cite:t:`page1954continuous`

Attributes:
total_updates (int): number of samples the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/concept_drift/adwin.py
@@ -34,7 +34,7 @@ class ADWIN(DriftDetector):
When drift occurs, the index of the element at the beginning of ADWIN's new
window is stored in ``self.retraining_recs``.

Ref. [C3]_
Ref. :cite:t:`bifet2007learning`

Attributes:
total_updates (int): number of samples the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/concept_drift/ddm.py
@@ -20,7 +20,7 @@ class DDM(DriftDetector):
The index of the first sample which triggered a warning/drift state
(relative to ``self.updates_since_reset``) is stored in ``self.retraining_recs``.

Ref. [C4]_
Ref. :cite:t:`gama2004learning`

Attributes:
total_updates (int): number of samples the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/concept_drift/eddm.py
@@ -24,7 +24,7 @@ class EDDM(DriftDetector):
The index of the first sample which triggered a warning/drift state
(relative to ``self.updates_since_reset``) is stored in ``self.retraining_recs``.

Ref. [C5]_
Ref. :cite:t:`baena2006early`

Attributes:
total_updates (int): number of samples the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/concept_drift/lfr.py
@@ -23,7 +23,7 @@ class LinearFourRates(DriftDetector):
of number of time steps and estimated empirical rate, if a given combination
has been simulated before, the bounds are re-used.

Ref. [C6]_
Ref. :cite:t:`wang2015concept`

Attributes:
total_updates (int): number of samples the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/concept_drift/stepd.py
@@ -31,7 +31,7 @@ class STEPD(DriftDetector):
using only a single data point vs. being required to retrain on the entire
set.

Ref. [C7]_
Ref. :cite:t:`nishida2007detecting`

Attributes:
total_updates (int): number of samples the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/data_drift/cdbd.py
@@ -45,7 +45,7 @@ class CDBD(HistogramDensityMethod):
drift is not detected. The reference batch is updated to include this
most recent test batch. All statistics are maintained.

Ref. [C8]_
Ref. :cite:t:`lindstrom2013drift`

Attributes:
total_updates (int): number of batches the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/data_drift/hdddm.py
@@ -105,7 +105,7 @@ class HDDDM(HistogramDensityMethod):
calculated using the first test batch, allowing for detection of
drift on this batch.

Ref. [C9]_
Ref. :cite:t:`ditzler2011hellinger`

Attributes:
total_updates (int): number of batches the drift detector has ever
2 changes: 1 addition & 1 deletion src/menelaus/data_drift/histogram_density_method.py
@@ -119,7 +119,7 @@ class HistogramDensityMethod(DriftDetector):
and samples since reset will be number of batches passed to HDM
plus 1, due to splitting of reference batch

Ref. [C8]_ and [C9]_
Ref. :cite:t:`lindstrom2013drift` and :cite:t:`ditzler2011hellinger`

Attributes:
total_updates (int): number of batches the drift detector has ever
16 changes: 8 additions & 8 deletions src/menelaus/data_drift/kdq_tree.py
@@ -51,7 +51,7 @@ class KdqTree(DriftDetector):
Note that the current implementation does not explicitly handle categorical
data.

Ref. [C10]_
Ref. :cite:t:`dasu2006information`


Attributes:
@@ -299,18 +299,18 @@ def to_plotly_dataframe(self, tree_id1="build", tree_id2="test", max_depth=None)

* ``name``: a label corresponding to which feature this split is on
* ``idx``: a unique ID for the node, to pass
``plotly.express.treemap``'s id argument
``plotly.express.treemap``'s id argument
* ``parent_idx``: the ID of the node's parent
* ``cell_count``: how many samples are in this node in the
reference tree.
reference tree.
* ``depth``: how deep the node is in the tree
* ``count_diff``: if ``tree_id2`` is specified, the change in
counts from the reference tree.
counts from the reference tree.
* ``kss``: the Kulldorff Spatial Scan Statistic for this node,
defined as the Kullback-Leibler divergence for this node
between the reference and test trees, using the individual
node and all other nodes combined as the bins for the
distributions.
defined as the Kullback-Leibler divergence for this node
between the reference and test trees, using the individual
node and all other nodes combined as the bins for the
distributions.
"""

return self._kdqtree.to_plotly_dataframe(tree_id1, tree_id2, max_depth)
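
The ``kss`` column described in the docstring above can be made concrete with a small worked sketch. It assumes the two-bin reading of that description: for one node, the reference and test distributions are each reduced to (this node, all other nodes combined), and the Kullback-Leibler divergence is taken from the reference distribution to the test distribution. The helper name, the direction of the divergence, and the ``eps`` guard are assumptions for illustration, not the library's implementation.

```python
import math


def two_bin_kl(ref_count, ref_total, test_count, test_total, eps=1e-12):
    """KL divergence between two-bin distributions (node vs. all other nodes).

    Hypothetical helper: ``eps`` guards against division by zero and is not
    part of the docstring's definition.
    """
    # Share of samples in this node and in the rest of the tree.
    p = (ref_count / ref_total, 1 - ref_count / ref_total)
    q = (test_count / test_total, 1 - test_count / test_total)
    # Sum p_i * log(p_i / q_i) over the two bins, skipping empty p bins.
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Identical node proportions give zero divergence.
print(two_bin_kl(10, 100, 20, 200))  # 0.0
# A node that grows from 10% to 30% of the samples gives a positive value.
print(two_bin_kl(10, 100, 30, 100))
```

Under this reading, a node whose share of samples is unchanged between the two trees scores zero, and the score grows as the share shifts.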
4 changes: 2 additions & 2 deletions src/menelaus/data_drift/pca_cd.py
@@ -24,7 +24,7 @@ class PCACD(DriftDetector):
Once drift is detected, the reference window is replaced with the current
test window, and the test window is initialized.

Ref. [C11]_
Ref. :cite:t:`qahtan2015pca`

Attributes:
total_updates (int): number of samples the drift detector has ever
@@ -56,7 +56,7 @@ def __init__(
"""
Args:
window_size (int): size of the reference window. Note that
``PCA_CD``will only try to detect drift periodically, either
``PCA_CD`` will only try to detect drift periodically, either
every 100 observations or 5% of the ``window_size``, whichever
is smaller.
ev_threshold (float, optional): Threshold for percent explained