-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing modified Twinspan according to Rolecek et al. (2009) #2
Comments
There are a couple of issues here. Firstly, the choice of the next step should be implemented in Fortran code which makes it all a bit trickier. In particular as I also want to maintain the original behaviour of TWINSPAN so that we need IF...ELSE... branches. Second issue is how to define the heterogeneity. I got to log in with my university credentials to read the linked paper, and therefore I am not sure how Roleček defined the heterogeneity, but I have a vague memory that it was some external dissimilarity index. Then an alternative that we can easily implement. The twinspan package has now function The standard cut by division level gives this distribution of quadrats in groups 8…15 > table(cut(tw, 3))
8 9 10 11 12 13 14 15
14 25 18 30 22 18 32 11 With the experimental function the distribution of quadrats to eight groups is: > table(twinspan:::chicut(tw, 8)) # preliminary (also name which can change), not yet exported
7 8 9 10 12 13 22 23
43 14 25 18 22 18 19 11 Group 7 was not split, and so groups 14 & 15 are missing. Group 11 is also missing because it was split to 22 & 23. Other groups are the same in both cuts The first experimental version of function is available in commit 03bad03 in branch cut-by-chisquare. |
Implements Rolecek et al., JVS 20, 596-602 (2009) modified TWINSPAN where divisions are picked by heterogeneity (but honouring hierarchy). Basically there are two functions achieve this: cuth() that returns a classification vector for quadrats or species for most heterogeneous classes, and modification of as.hclust that now can display the TWINSPAN tree using heterogeneity as height. The heterogeneity is extracted from twinspan object, and it is the total chi-square (or inertia or sum of all eigenvalues) of the matrix internally used by twinspan in that division. Wish of github issue #2.
I have essentially implemented this using "alternative method" of the Roleček paper. This involves one new user-level function and one new argument in another, plus some internal support functions. I kept the old TWINSPAN code untouched and only manipulate the result so that splitting heterogeneities are taken into account. Function > library(twinspan)
> library(vegan) # ordilabel
> data(ahti)
> tw <- twinspan(ahti)
> cl0 <- as.hclust(tw)
> cl <- as.hclust(tw, height="chi")
> plot(cl0)
> ordilabel(cl0, "internal", cl0$nodelabels) # plot(tw) does all this, but does not do the rect.hclust
> rect.hclust(cl0, 8)
> plot(cl, main="Rolecek Tree")
> ordilabel(cl, "internal", cl$nodelabels)
> rect.hclust(cl, 8)
The trees can also be displayed in the > image(tw, height="chi", leadingspecies=TRUE)
I wrote a new function > cut(tw, level=3) # level 3 has 2^3 = 8 classes (at maximum)
[1] 11 11 11 11 11 11 11 11 14 11 11 11 12 11 11 11 14 10 10 10 10 10 10 10 10
[26] 10 10 10 10 10 10 10 10 10 8 8 8 8 8 8 8 8 8 8 8 8 8 9 8 9
[51] 9 9 9 9 9 9 9 9 9 11 11 11 9 9 9 9 9 11 11 11 11 11 9 11 10
[76] 9 9 9 9 11 11 9 11 11 11 11 11 15 14 14 14 14 15 14 14 14 14 14 14 14
[101] 14 14 14 14 14 14 14 14 14 12 12 12 15 15 15 15 15 14 14 14 14 15 14 15 14
[126] 14 15 13 12 14 13 14 13 13 12 13 13 13 14 12 12 12 12 12 9 12 9 12 12 12
[151] 13 13 13 12 9 13 12 13 13 12 12 12 12 12 13 13 13 13 13 15
> cuth(tw, n=8)
[1] 23 23 23 22 23 23 23 23 7 22 22 23 12 22 23 22 7 10 10 10 10 10 10 10 10
[26] 10 10 10 10 10 10 10 10 10 8 8 8 8 8 8 8 8 8 8 8 8 8 9 8 9
[51] 9 9 9 9 9 9 9 9 9 22 23 23 9 9 9 9 9 22 22 22 22 22 9 22 10
[76] 9 9 9 9 22 22 9 22 22 22 22 22 7 7 7 7 7 7 7 7 7 7 7 7 7
[101] 7 7 7 7 7 7 7 7 7 12 12 12 7 7 7 7 7 7 7 7 7 7 7 7 7
[126] 7 7 13 12 7 13 7 13 13 12 13 13 13 7 12 12 12 12 12 9 12 9 12 12 12
[151] 13 13 13 12 9 13 12 13 13 12 12 12 12 12 13 13 13 13 13 7 As the criterion of heterogeneity I used total Chi-square also known as Inertia also known as the sum of all eigenvalues. I use the same Chi-square of the same matrix (or submatrix) that is used in TWINSPAN: stacked pseudospecies with downweighting. For species, I also use the internally constructed species matrix – which is a bit trickier to construct. Functions The greatest complication was that the classification tree can be reversed: child group is more heterogeneous than her parent group. It seems that I can handle these cases. If you look carefully the Finally about the direct alternative implementation following more closely Roleček's original implementation. Unlike I wrote, it could be implemented in R without need of modifying the Fortran code. A clever implementation would need refactoring current This is now merged, and I appreciate your comments, criticism and ideas of development @kamapu and @zdealveindy (and for your information @gavinsimpson). |
Wow! Thank you, for considering the issue. I'll get through (I'm thinking to run a classification in Juice in parallel and compare the outputs). |
As mentioned once, we have tested this package during a course using a small data set from Fujiwara et al. (2014). Now my question is if there is a possibility to implement the modified Twinspan according to Roleček et al. (2009).
During the course I also had contacted @zdealveindy, who is also interested in such implementation (maybe even able to support?).
Though the modified Twinspan is already implemented on Juice, this software is restricted to Windows and therefore an implementation on R will increase the accessibility of the assessment for people working in different operative systems, like me (I'm working on Linux).
The text was updated successfully, but these errors were encountered: