Commit

Merge pull request #36 from gmrukwa/develop
Release v2.3.14

- fix seeding issues in GAP
- add CLI for Dunn's index version of DiviK
- more permissive tests: misidentification of 1 cluster is allowed
- compatibility layer for gin-config
gmrukwa committed Jan 12, 2020
2 parents ce2ceab + 9f05af6 commit cdcb821
Showing 20 changed files with 489 additions and 32 deletions.
2 changes: 1 addition & 1 deletion .bettercodehub.yml
@@ -1,3 +1,3 @@
-component_depth: 3
+component_depth: 2
 languages:
 - python
2 changes: 1 addition & 1 deletion .github/workflows/deploy.yml
@@ -12,7 +12,7 @@ on:
 env:
   MAJOR: ${{ 2 }}
   MINOR: ${{ 3 }}
-  FIXUP: ${{ 13 }}
+  FIXUP: ${{ 14 }}
   PACKAGE_INIT_FILE: ${{ 'divik/__init__.py' }}
   DOCKER_REPO: ${{ 'gmrukwa/divik' }}
   IS_ALPHA: ${{ github.event_name == 'pull_request' }}
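The workflow values above (MAJOR 2, MINOR 3, FIXUP 14) line up with the `__version__ = '2.3.14'` string changed in `divik/__init__.py` in this commit. Whether the workflow actually compares the two is not visible in this diff. The following is only a sketch, under that assumption, of how such a consistency check could look. The regular expression and variable names are illustrative.

```python
import re

# Values mirroring the workflow's env block after this commit.
MAJOR, MINOR, FIXUP = 2, 3, 14

# The version line of divik/__init__.py as changed in this commit.
init_source = "__version__ = '2.3.14'\n"

# Assumed convention: the release version is MAJOR.MINOR.FIXUP.
expected = f"{MAJOR}.{MINOR}.{FIXUP}"

# Extract the quoted version string from the module source.
match = re.search(r"__version__\s*=\s*['\"]([^'\"]+)['\"]", init_source)
found = match.group(1) if match else None

print(expected == found)  # True
```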
19 changes: 15 additions & 4 deletions README.md
@@ -13,8 +13,9 @@ Python implementation of Divisive iK-means (DiviK) algorithm.

 > This section will be further developed soon.
-1) [`divik`](divik/_cli/divik.md) - runs DiviK in one of many scenarios
-2) [`kmeans`](divik/_cli/auto_kmeans.md) - runs K-means
+1) [`divik`](divik/_cli/divik.md) - runs DiviK in GAP-only scenario
+2) [`dunn-divik`](dunn-divik/_cli/dunn-divik.md) - runs DiviK in GAP & Dunn scenario
+2) [`kmeans`](divik/_cli/auto_kmeans.md) - runs K-means with GAP statistic
 3) `linkage` - runs agglomerative clustering
 4) [`inspect`](divik/_cli/inspect.md) - visualizes DiviK result
 5) `visualize` - generates `.png` file with visualization of clusters for 2D
@@ -39,7 +40,7 @@ docker pull gmrukwa/divik
 To install specific version, you can specify it in the command, e.g.:

 ```bash
-docker pull gmrukwa/divik:2.3.13
+docker pull gmrukwa/divik:2.3.14
 ```

 ## Python package
@@ -59,9 +60,19 @@ pip install divik
 or any stable tagged version, e.g.:

 ```bash
-pip install divik==2.3.13
+pip install divik==2.3.14
 ```

+If you want to have compatibility with
+[`gin-config`](https://github.com/google/gin-config), you can install
+necessary extras with:
+
+```bash
+pip install divik[gin]
+```
+
+**Note:** Remember about `\` before `[` and `]` in `zsh` shell.
+
 # References

 This software is part of contribution made by [Data Mining Group of Silesian
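The zsh note added to the README in the hunk above is needed because zsh treats `[gin]` as a glob character class, so an unquoted `divik[gin]` typically fails with a "no matches found" error. Besides backslash-escaping, quoting the whole requirement works in both bash and zsh. A minimal sketch that builds a safely quoted command using only the standard library (variable names are illustrative):

```python
import shlex

# Requirement string with an extras marker, as used in the README.
requirement = "divik[gin]"

# shlex.quote wraps the string in single quotes because '[' and ']'
# are not in its set of shell-safe characters.
command = "pip install " + shlex.quote(requirement)
print(command)  # pip install 'divik[gin]'
```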
27 changes: 26 additions & 1 deletion divik/__init__.py
@@ -1,4 +1,4 @@
-__version__ = '2.3.13'
+__version__ = '2.3.14'

 from ._seeding import seeded
 from ._utils import DivikResult
@@ -7,6 +7,30 @@
 from divik import cluster
 from divik import sampler
 from ._summary import plot, reject_split
+from ._gin_compat import (
+    configurable,
+    parse_gin_args
+)
+
+for __estimator in [
+    feature_extraction.KneePCA,
+    feature_extraction.LocallyAdjustedRbfSpectralEmbedding,
+    feature_selection.GMMSelector,
+    feature_selection.OutlierSelector,
+    feature_selection.PercentageSelector,
+    feature_selection.HighAbundanceAndVarianceSelector,
+    feature_selection.NoSelector,
+    feature_selection.OutlierAbundanceAndVarianceSelector,
+    cluster.KMeans,
+    cluster.GAPSearch,
+    cluster.DunnSearch,
+    cluster.DiviK,
+    cluster.DunnDiviK,
+    sampler.UniformSampler,
+    sampler.UniformPCASampler,
+    sampler.StratifiedSampler,
+]:
+    configurable(__estimator)

 __all__ = [
     "__version__",
@@ -15,6 +39,7 @@
     "feature_extraction",
     "sampler",
     "seeded",
+    "configurable", "parse_gin_args",
     'DivikResult',
     "plot", "reject_split",
 ]
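The loop added to `divik/__init__.py` passes every public estimator to `configurable`, presumably so that each class can be referenced from a gin-config file. The `_gin_compat` module itself is not among the lines shown here, so the following is only a hypothetical sketch of how such an optional-dependency layer could look. It assumes registration through `gin.external_configurable` when gin-config is installed and a no-op otherwise. The module label `divik_sketch` and the `ToyEstimator` class are invented for illustration.

```python
# Hypothetical sketch; NOT the actual divik._gin_compat implementation.
try:
    import gin  # optional dependency, installed via the "gin" extra
except ImportError:
    gin = None


def configurable(klass):
    """Register klass with gin-config if available; otherwise return it as-is."""
    if gin is None:
        return klass
    # external_configurable registers a class defined elsewhere without
    # requiring a decorator at its definition site.
    return gin.external_configurable(klass, module="divik_sketch")


class ToyEstimator:
    """Stand-in for an estimator such as cluster.KMeans (illustrative only)."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters


registered = configurable(ToyEstimator)
# Behaves the same with or without gin-config installed.
print(registered(n_clusters=3).n_clusters)
```

With this design, importing the package never fails when gin-config is absent, and users who install the extra get the registration as a side effect of the import.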
6 changes: 0 additions & 6 deletions divik/_cli/_utils.py
@@ -49,12 +49,6 @@ def prepare_destination(destination: str, omit_datetime: bool = False) -> str:


 def setup_logger(destination: str, verbose: bool = False):
-    try:
-        import divik._matlab_legacy
-        logger = logging.getLogger(divik._matlab_legacy.__name__)
-        logger.setLevel(logging.CRITICAL)
-    except ImportError:
-        pass  # In environments without MATLAB this should work as well
     log_destination = os.path.join(destination, 'logs.txt')
     if verbose:
         log_format = '%(asctime)s [%(levelname)s] %(filename)40s:%(lineno)3s' \
21 changes: 21 additions & 0 deletions divik/_cli/dunn_divik.json
@@ -0,0 +1,21 @@
+{
+  "gap_trials": 10,
+  "distance_percentile": 99.0,
+  "max_iter": 100,
+  "distance": "correlation",
+  "minimal_size": 16,
+  "rejection_size": 2,
+  "rejection_percentage": null,
+  "minimal_features_percentage": 0.01,
+  "features_percentage": 0.05,
+  "fast_kmeans_iter": 10,
+  "k_max": 10,
+  "sample_size": 1000,
+  "normalize_rows": true,
+  "use_logfilters": true,
+  "filter_type": "gmm",
+  "n_jobs": -1,
+  "random_seed": 0,
+  "verbose": true
+}
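
A configuration file like `dunn_divik.json` maps directly onto Python values when read with the standard `json` module: `null` becomes `None`, `true` becomes `True`, and `-1` stays an integer. How the `dunn-divik` CLI actually consumes this file is not shown in this diff, so the snippet below is only a sketch that parses an excerpt of the same keys. The `CONFIG_TEXT` name and the override step are illustrative assumptions.

```python
import json

# Excerpt of the keys from dunn_divik.json, embedded to keep the example self-contained.
CONFIG_TEXT = """
{
  "gap_trials": 10,
  "distance": "correlation",
  "rejection_percentage": null,
  "k_max": 10,
  "normalize_rows": true,
  "n_jobs": -1,
  "random_seed": 0
}
"""

config = json.loads(CONFIG_TEXT)

# JSON null/true map to Python None/True.
assert config["rejection_percentage"] is None
assert config["normalize_rows"] is True

# Override a single value without mutating the loaded defaults.
custom = {**config, "k_max": 5}
print(custom["k_max"], config["k_max"])  # 5 10
```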
