Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
34 changed files
with
3,255 additions
and
55 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,153 @@ | ||
.. image:: https://img.shields.io/pypi/v/estream.svg | ||
:target: https://pypi.python.org/pypi/estream | ||
:alt: PyPI Version | ||
.. image:: https://img.shields.io/pypi/l/estream.svg | ||
:target: https://github.com/mickeycj/estream/blob/master/LICENSE | ||
:alt: License | ||
.. image:: https://travis-ci.org/mickeycj/estream.svg | ||
:target: https://travis-ci.org/mickeycj/estream | ||
:alt: Travis CI Build Status | ||
|
||
==================================== | ||
An E-Stream implementation in Python | ||
===================================== | ||
==================================== | ||
|
||
E-Stream is an evolution-based technique for stream clustering which supports | ||
five behaviors: | ||
|
||
1. Appearance | ||
2. Disappearance | ||
3. Self-evolution | ||
4. Merge | ||
5. Split | ||
|
||
These behaviors are achieved by representing each cluster as a *Fading Cluster | ||
Structure with Histogram (FCH)*, utilizing a histogram for each feature of the | ||
data. | ||
|
||
The details for the underlying concepts can be found `here <https://www.researchgate.net/publication/221571035_E-Stream_Evolution-Based_Technique_for_Stream_Clustering>`_: | ||
|
||
Udommanetanakit, K, Rakthanmanon, T, Waiyamai, K, *E-Stream: Evolution-Based | ||
Technique for Stream Clustering*, Advanced Data Mining and Applications: Third | ||
International Conference, 2007 | ||
|
||
------------------- | ||
How to use E-Stream | ||
------------------- | ||
|
||
The ``estream`` package aims to be substibutable with ``sklearn`` classes so it | ||
can be used interchangably with other transformers with similar API. | ||
|
||
.. code-block:: python | ||
from estream import EStream | ||
from sklearn.datasets.samples_generator import make_blobs | ||
estream = EStream() | ||
data, _ = make_blobs() | ||
estream.fit(data) | ||
E-Stream contains a number of parameters that can be set; the major ones are as | ||
follows: | ||
|
||
- ``max_clusters``: This limits the number of clusters the clustering can have | ||
before the existing clusters have to be merged. The default is set to | ||
*10*. | ||
- ``stream_speed/decay_rate``: These determine the fading factor of the | ||
clusters. In this implementation, the fading function is constant derived | ||
from the default values of *10* and *0.1*, respectively. | ||
- ``remove_threshold``: This sets the lower bound for each cluster's weight | ||
before they are considered to be removed. The default is set to *0.1*. | ||
- ``merge_threshold``: This determines whether two close clusters can be merged | ||
togther. The default is set to *1.25*. | ||
- ``radius_threshold``: This determines the minimum range from an existing | ||
cluster that a new data must be in order to be merged into one. The default | ||
is set to *3.0*. | ||
- ``active_threshold``: This sets the minimum weight of each cluster before | ||
they are considered active. The default is set to *5.0*. | ||
|
||
An example of setting these parameters: | ||
|
||
.. code-block:: python | ||
from estream import EStream | ||
from sklearn.datasets.samples_generator import make_blobs | ||
estream = EStream(max_clusters=5, | ||
merge_threshold=0.5, | ||
radius_threshold=1.5, | ||
active_threshold=3.0) | ||
data, _ = make_blobs() | ||
estream.fit(data) | ||
------------ | ||
Installation | ||
------------ | ||
|
||
Currently, the package is only available through either ``PyPI``: | ||
|
||
.. code-block:: bash | ||
pip install estream | ||
or a manual install: | ||
|
||
.. code-block:: bash | ||
wget https://github.com/mickeycj/estream/archive/master.zip | ||
unzip master.zip | ||
rm master.zip | ||
cd estream-master | ||
python setup.py install | ||
-------------- | ||
Help & Support | ||
-------------- | ||
|
||
Currently, there is no dedicated documentation available, so any questions or | ||
issues can be asked via my `email <chanonjenakom@gmail.com>`_. | ||
|
||
-------- | ||
Citation | ||
-------- | ||
|
||
If you make use of this software for your work, please cite the paper from the | ||
Advanced Data Mining and Applications: Third International Conference: | ||
|
||
.. code-block:: bibtex | ||
@inproceedings{inproceedings, | ||
author = {Udommanetanakit, Komkrit, and Rakthanmanon, Thanawin and Waiyamai, Kitsana}, | ||
year = {2007}, | ||
month = {08}, | ||
pages = {605-615}, | ||
title = {E-Stream: Evolution-Based Technique for Stream Clustering}, | ||
volume = {4632}, | ||
doi = {10.1007/978-3-540-73871} | ||
} | ||
Moreover, this implementation is based on a MOA implementaion of E-Stream (and | ||
other related algorithms) by `David Ratier <https://gitub.com/ratierd>`_. The | ||
original source code can be found in this `repository <https://gitub.com/ratierd/MOA>`_. | ||
|
||
------- | ||
License | ||
------- | ||
|
||
The ``estream`` package is under the GNU General Public License. | ||
|
||
------------ | ||
Contributing | ||
------------ | ||
|
||
Contributions are always welcome! Everything ranging from code to notebooks and | ||
examples/documentation will be very valuable to the growth of this project. To | ||
contribute, please `fork this project <https://github.com/mickeycj/estream/issues#fork-destination-box>`_ | ||
, make your changes and submit a pull request. I will do my best to fix any | ||
issues and merge your code into the main branch. | ||
|
||
:Author: Chanon Jenakom | ||
:Version: $Revision: 0.0.1 $ | ||
:Version: 0.0.3 | ||
:Dedicated: To DAKDL, Kasetsart University |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
from estream.estream import EStream | ||
from estream.fading_cluster import FadingCluster | ||
from estream.histogram import Histogram |
Oops, something went wrong.