cut classification by within-group Chi-square: experimental code
Rolecek et al. (2009) modified twinspan so that, instead of making
dichotomous splits level by level, it only splits the most
heterogeneous group at each step. This function tries to achieve the
same by returning a classification vector in which the most
heterogeneous classes are split.
jarioksa committed Dec 14, 2021
1 parent 18fa394 commit 03bad03
Showing 1 changed file with 28 additions and 0 deletions.
28 changes: 28 additions & 0 deletions R/cut.twinspan.R
@@ -54,3 +54,31 @@
}
cl
}

## cut by group homogeneity as defined by within-group
## Chi-square. This could give a clustering similar to Rolecek's
## modified twinspan, which only splits the most heterogeneous group at
## each step instead of dichotomizing. However, we use the internal
## twinspan criterion of Chi-square instead of dissimilarities. There
## is no guarantee that the tree will be ordered, but let's hope so;
## as.dendrogram(x, height="chi") will show this.

#' @export
`chicut` <-
    function(x, ngroups)
{
    if (missing(ngroups))
        ngroups <- 1 # return one group if nothing is asked
    chi <- twintotalchi(x)
    chi[chi <= 0] <- NA # not evaluated
    ix <- order(chi, decreasing = TRUE) # order by heterogeneity
    clim <- 2^(0:x$levelmax) - 1L # group id limits by level: 0, 1, 3, 7, ...
    class <- rep(1, x$nquadrat)
    for(i in seq_len(ngroups - 1)) {
        lev <- max(which(ix[i] > clim)) # level where children of ix[i] appear
        id <- 2L * ix[i] # children of group k are numbered 2k and 2k+1
        cl <- cut(x, level=lev) # classification of 'x' at that level
        class[cl == id] <- id
        class[cl == id+1L] <- id + 1L
    }
    class
}
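
The splitting logic above can be illustrated with a small base-R sketch that does not use the twinspan package at all. Everything in it is hypothetical: the Chi-square values in `chi`, the eight-quadrat membership in `memb` (standing in for what `cut(x, level=lev)` would return), and the helper name `toy_chicut`. It only assumes the numbering used in the function, where group k divides into groups 2k and 2k+1.

```r
## Toy illustration only: made-up data, not the twinspan API.
levelmax <- 2
clim <- 2^(0:levelmax) - 1L           # 0, 1, 3

## Hypothetical within-group Chi-square for the divided groups 1, 2, 3
chi <- c(10, 4, 6)
ix <- order(chi, decreasing = TRUE)   # most heterogeneous first: 1, 3, 2

## Hypothetical group membership of 8 quadrats at levels 1 and 2
memb <- list(c(2, 2, 2, 2, 3, 3, 3, 3),
             c(4, 4, 5, 5, 6, 6, 7, 7))

toy_chicut <- function(ngroups) {
    cl <- rep(1, 8)                        # start with everything in group 1
    for (i in seq_len(ngroups - 1)) {
        lev <- max(which(ix[i] > clim))    # level of the children of ix[i]
        id <- 2L * ix[i]                   # children are id and id + 1
        cl[memb[[lev]] == id] <- id
        cl[memb[[lev]] == id + 1L] <- id + 1L
    }
    cl
}

toy_chicut(3)   # 2 2 2 2 6 6 7 7
```

With three groups requested, group 1 is split first and then group 3, because its made-up Chi-square (6) is larger than that of group 2 (4), so the result mixes classes from level 1 (group 2) and level 2 (groups 6 and 7).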
