Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write.tree error #53

Closed
lpipes opened this issue Apr 20, 2022 · 25 comments
Closed

write.tree error #53

lpipes opened this issue Apr 20, 2022 · 25 comments

Comments

@lpipes
Copy link

lpipes commented Apr 20, 2022

Hi, I'm getting this error when I try to write out my tree. Any suggestions on how to fix this? Thanks!

write.tree(tree,file="global_out.tree")
Error: C stack usage  7969476 is too close to the limit
@KlausVigo
Copy link
Contributor

Dear @lpipes,
is it possible to provide a reproducible example.
Kind regards,
Klaus

@lpipes
Copy link
Author

lpipes commented Apr 20, 2022

I can't upload the file because it is too big (>30MB). Is there somewhere I can send it? The commands that I'm using are

tree<-read.tree("global.tree")
tree<-multi2di(tree)
write.tree(tree,file="global_out.tree")

@lpipes
Copy link
Author

lpipes commented Apr 21, 2022

I also get the same error with write.nexus()

write.nexus(tree,file="global_out.nexus")
Error: C stack usage  7969428 is too close to the limit

@emmanuelparadis
Copy link
Owner

Hi Lenore,
Here's a small test to assess if the data/file size is a problem:

R> x <- rtree(1e6)
R> write.tree(x, "x.tre")
R> file.size("x.tre")/1e6
[1] 35.88815

So it seems really possible to write a 35 MB Newick file (the same in NEXUS where the output file is 63 MB big).
Not sure what to suggest... Do you run this from RStudio? Sometimes this IDE is a bit hungry with the RAM (the above example was with the simplest CLI).

@KlausVigo
Copy link
Contributor

Hi @lpipes,
how many tips has your tree (Ntip(tree))?
Can you please send the output of sessionInfo().

@lpipes
Copy link
Author

lpipes commented Apr 21, 2022

Hi @emmanuelparadis if I read in the tree and then write the tree, I can do that without error. But it's only when I try to read in the tree and then do tree<-multi2di(tree) and then write the tree, do I get the error. I'm not using any IDE. I'm just running R from the terminal.

@KlausVigo

> Ntip(tree)
[1] 6799103
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ape_5.6-2

loaded via a namespace (and not attached):
[1] compiler_4.1.3  tools_4.1.3     parallel_4.1.3  Rcpp_1.0.7     
[5] nlme_3.1-157    grid_4.1.3      lattice_0.20-45

@emmanuelparadis
Copy link
Owner

@lpipes:

  1. Do you need to save this tree in a file to read it with another program? If your aim is just to save it to read it back in R for a later session, then you may consider using saveRDS() which is much faster and outputs a slightly smaller file (and does not lose any information).
  2. Could you type degree(tree)? Before and after running multi2di (in the second you should have something like in the example below).
  3. I made a few more tests to mimick your problem and this run OK for me:
R> n <- 6799103
R> x <- rtree(n)
R> s <- x$edge.length < .001
R> x$edge.length[s] <- 0
R> x <- di2multi(x)
R> degree(x)
  Degree       N
1      1 6799103
2      2       1
3      3 6785459
4      4    6803
5      5      12
R> x <- multi2di(x, random = FALSE)
R> degree(x)
  Degree       N
1      1 6799103
2      2       1
3      3 6799101
R> write.tree(x, "x2.tre") # takes ~ 5 min
R> file.size("x2.tre") / 1e6
[1] 250.266
R> saveRDS(x, "x2.rds") # takes ~ 30 sec
R> file.size("x2.rds") / 1e6
[1] 152.8073

@lpipes
Copy link
Author

lpipes commented Apr 25, 2022

@emmanuelparadis
(1) Yes I need to output the tree to read it into a different program (not in R). I'm just using ape to make the tree strictly bifurcating.
(2) Before I read the tree:

> degree(tree)
      Degree       N
1          1 6799314
2          2    7216
3          3  678409
4          4  242123
5          5  122957
6          6   73503
7          7   48624
8          8   34428
9          9   25518
10        10   19405
11        11   15412
12        12   12556
13        13   10409
14        14    8402
15        15    7281
16        16    6158
17        17    5353
18        18    4649
19        19    4134
20        20    3614
21        21    3168
22        22    2877
23        23    2519
24        24    2283
25        25    2148
26        26    1906
27        27    1723
28        28    1509
29        29    1440
30        30    1302
31        31    1214
32        32    1082
33        33    1074
34        34     980
35        35     900
36        36     870
37        37     840
38        38     795
39        39     698
40        40     639
41        41     658
42        42     585
43        43     567
44        44     549
45        45     539
46        46     467
47        47     460
48        48     420
49        49     410
50        50     405
51        51     360
52        52     371
53        53     316
54        54     329
55        55     353
56        56     310
57        57     290
58        58     282
59        59     275
60        60     256
61        61     262
62        62     252
63        63     213
64        64     206
65        65     226
66        66     208
67        67     203
68        68     166
69        69     168
70        70     192
71        71     173
72        72     168
73        73     175
74        74     155
75        75     139
76        76     150
77        77     146
78        78     123
79        79     122
80        80     107
81        81     111
82        82     116
83        83     127
84        84     109
85        85     114
86        86      96
87        87      92
88        88     107
89        89     100
90        90      77
91        91      88
92        92      86
93        93      67
94        94      72
95        95      77
96        96      87
97        97      70
98        98      76
99        99      59
100      100      87
101      101      65
102      102      62
103      103      58
104      104      78
105      105      50
106      106      66
107      107      66
108      108      71
109      109      56
110      110      54
111      111      53
112      112      61
113      113      55
114      114      46
115      115      52
116      116      49
117      117      51
118      118      55
119      119      56
120      120      56
121      121      50
122      122      38
123      123      42
124      124      39
125      125      39
126      126      40
127      127      43
128      128      38
129      129      30
130      130      30
131      131      36
132      132      33
133      133      34
134      134      31
135      135      33
136      136      26
137      137      27
138      138      34
139      139      18
140      140      40
141      141      42
142      142      27
143      143      30
144      144      23
145      145      33
146      146      28
147      147      25
148      148      27
149      149      28
150      150      31
151      151      20
152      152      31
153      153      26
154      154      46
155      155      25
156      156      26
157      157      20
158      158      22
159      159      18
160      160      20
161      161      24
162      162      22
163      163      15
164      164      22
165      165      19
166      166      21
167      167      17
168      168      27
169      169      23
170      170      28
171      171      13
172      172      13
173      173      18
174      174      19
175      175      16
176      176      19
177      177      18
178      178      17
179      179      13
180      180      18
181      181      16
182      182      21
183      183      18
184      184      21
185      185      17
186      186      18
187      187      19
188      188      18
189      189       9
190      190      20
191      191      10
192      192      12
193      193      12
194      194      11
195      195      17
196      196       6
197      197       7
198      198      14
199      199      13
200      200      17
201      201       8
202      202       9
203      203       9
204      204       8
205      205      14
206      206      14
207      207      12
208      208       9
209      209       9
210      210      14
211      211      13
212      212      13
213      213       9
214      214       9
215      215       4
216      216      10
217      217      12
218      218       8
219      219       6
220      220       7
221      221       9
222      222       9
223      223      17
224      224       7
225      225       6
226      226      12
227      227       3
228      228       6
229      229      11
230      230       9
231      231      10
232      232       8
233      233      11
234      234       7
235      235       8
236      236       9
237      237       8
238      238       4
239      239       2
240      240      10
241      241      10
242      242       7
243      243       6
244      244       9
245      245       6
246      246       6
247      247       4
248      248      11
249      249       7
250      250       5
251      251       3
252      252      10
253      253       4
254      254       7
255      255       9
256      256       8
257      257       3
258      258      13
259      259       4
260      260       4
261      261       7
262      262       2
263      263       5
264      264      11
265      265       9
266      266       4
267      267       1
268      268       7
269      269       5
270      270       7
271      271       4
272      272       4
273      273       5
274      274       6
275      275       4
276      276       3
277      277       6
278      278       6
279      279       7
280      280       3
281      281       9
282      282       2
283      283       4
284      284       6
286      286       3
287      287       4
288      288       5
289      289      10
290      290       4
291      291       5
292      292       1
293      293       8
294      294       4
295      295       3
296      296       4
297      297       7
298      298       5
299      299       4
300      300       2
301      301       3
302      302       8
303      303       5
304      304       2
305      305       2
306      306       5
307      307       3
308      308       3
309      309       3
310      310       6
311      311       5
312      312       7
313      313       6
314      314       3
315      315       4
316      316       3
317      317       6
318      318       5
319      319       6
320      320       4
321      321       5
322      322       3
323      323       3
324      324       7
325      325       6
326      326       4
327      327       2
328      328       3
329      329       3
330      330       8
331      331       4
332      332       4
333      333       5
334      334       2
335      335       7
336      336       2
337      337       3
338      338       3
339      339       4
340      340       4
341      341       7
342      342       3
343      343       2
344      344       4
345      345       1
346      346       7
347      347       2
348      348       6
349      349       2
350      350       3
351      351       5
352      352       1
353      353       3
354      354       3
355      355       1
356      356       1
357      357       1
358      358       7
359      359       4
360      360       6
361      361       2
362      362       4
363      363       4
364      364       2
365      365       4
367      367       4
368      368       2
369      369       2
371      371       2
372      372       2
373      373       6
374      374       1
375      375       2
376      376       3
377      377       1
378      378       3
379      379       3
381      381       2
382      382       3
383      383       4
384      384       2
385      385       2
386      386       1
387      387       3
388      388       4
389      389       3
390      390       1
391      391       2
393      393       3
394      394       6
395      395       3
397      397       3
398      398       1
399      399       4
400      400       4
401      401       2
402      402       1
404      404       1
405      405       2
406      406       1
407      407       1
408      408       2
409      409       4
410      410       1
411      411       3
413      413       2
414      414       1
415      415       2
416      416       1
417      417       2
418      418       1
419      419       5
420      420       4
421      421       5
422      422       2
423      423       3
424      424       4
425      425       1
427      427       3
428      428       4
429      429       3
430      430       1
431      431       2
432      432       1
433      433       7
435      435       2
436      436       3
438      438       4
439      439      10
440      440       3
441      441       3
443      443       2
444      444       5
445      445       2
446      446       3
447      447       1
448      448       2
449      449       1
450      450       3
451      451       3
452      452       1
455      455       1
456      456       2
458      458       3
459      459       2
460      460       1
461      461       1
462      462       3
463      463       2
465      465       2
466      466       2
467      467       1
468      468       1
469      469       1
470      470       3
471      471       2
472      472       4
473      473       1
474      474       1
477      477       1
478      478       2
480      480       4
481      481       2
482      482       2
483      483       1
484      484       3
485      485       3
488      488       3
489      489       2
490      490       1
491      491       1
492      492       1
493      493       2
494      494       3
495      495       4
496      496       1
499      499       2
500      500       1
501      501       1
503      503       3
504      504       3
505      505       2
507      507       2
509      509       3
510      510       3
512      512       1
513      513       1
514      514       1
515      515       1
518      518       2
519      519       1
520      520       3
521      521       1
522      522       3
523      523       1
527      527       2
528      528       1
529      529       1
530      530       1
531      531       1
532      532       1
533      533       2
535      535       3
536      536       2
538      538       1
539      539       2
540      540       1
541      541       1
544      544       2
545      545       2
546      546       2
547      547       2
549      549       3
550      550       1
551      551       1
553      553       3
555      555       2
557      557       1
558      558       3
559      559       3
560      560       3
561      561       4
563      563       2
564      564       1
565      565       1
567      567       1
568      568       2
570      570       5
572      572       1
575      575       2
579      579       1
580      580       3
582      582       3
583      583       1
584      584       1
586      586       2
587      587       1
590      590       1
591      591       3
594      594       1
595      595       2
596      596       2
597      597       1
598      598       2
602      602       1
604      604       1
606      606       2
607      607       2
608      608       1
611      611       1
617      617       2
619      619       1
623      623       1
624      624       3
625      625       1
626      626       1
630      630       1
631      631       1
633      633       3
634      634       1
636      636       1
637      637       1
639      639       2
641      641       1
642      642       2
643      643       1
646      646       1
647      647       1
648      648       2
655      655       2
663      663       1
668      668       1
669      669       1
672      672       1
674      674       1
676      676       2
677      677       1
678      678       1
679      679       1
680      680       1
682      682       1
683      683       2
684      684       1
685      685       1
687      687       1
688      688       2
689      689       1
691      691       1
692      692       3
697      697       1
701      701       1
703      703       3
704      704       1
705      705       1
714      714       1
717      717       2
718      718       2
721      721       1
730      730       2
731      731       1
733      733       1
736      736       2
740      740       1
742      742       1
744      744       1
746      746       1
748      748       2
749      749       1
754      754       1
755      755       2
757      757       1
765      765       1
766      766       1
769      769       1
772      772       1
777      777       1
779      779       1
780      780       1
781      781       1
785      785       1
787      787       1
793      793       1
795      795       1
800      800       2
801      801       1
802      802       1
804      804       1
808      808       1
813      813       1
814      814       1
822      822       1
824      824       1
829      829       1
830      830       1
831      831       1
832      832       1
841      841       1
842      842       1
845      845       1
850      850       1
852      852       2
857      857       1
858      858       1
862      862       1
866      866       1
867      867       1
873      873       1
875      875       1
878      878       2
887      887       1
890      890       1
893      893       2
898      898       1
903      903       1
904      904       1
912      912       2
914      914       1
916      916       2
917      917       2
921      921       2
935      935       1
937      937       2
939      939       1
954      954       1
977      977       1
981      981       1
983      983       1
986      986       1
988      988       1
992      992       1
994      994       1
1012    1012       2
1014    1014       1
1015    1015       1
1017    1017       1
1018    1018       3
1032    1032       1
1036    1036       1
1048    1048       1
1055    1055       1
1058    1058       1
1060    1060       1
1066    1066       1
1067    1067       1
1078    1078       1
1079    1079       1
1086    1086       1
1091    1091       1
1095    1095       1
1096    1096       1
1105    1105       1
1112    1112       1
1113    1113       2
1128    1128       1
1139    1139       1
1148    1148       1
1160    1160       1
1161    1161       1
1167    1167       1
1175    1175       1
1178    1178       1
1179    1179       1
1184    1184       1
1192    1192       1
1223    1223       1
1228    1228       1
1231    1231       1
1251    1251       1
1255    1255       1
1269    1269       1
1277    1277       1
1301    1301       1
1302    1302       1
1308    1308       1
1309    1309       1
1323    1323       2
1332    1332       1
1334    1334       2
1340    1340       1
1342    1342       1
1349    1349       1
1363    1363       1
1370    1370       1
1375    1375       1
1378    1378       1
1379    1379       1
1394    1394       1
1399    1399       1
1404    1404       1
1410    1410       1
1435    1435       1
1436    1436       1
1445    1445       1
1446    1446       1
1449    1449       1
1459    1459       1
1495    1495       1
1507    1507       1
1516    1516       1
1555    1555       1
1578    1578       1
1593    1593       1
1626    1626       1
1649    1649       1
1651    1651       1
1657    1657       1
1664    1664       1
1665    1665       1
1675    1675       1
1676    1676       1
1677    1677       1
1680    1680       1
1688    1688       1
1701    1701       1
1717    1717       1
1857    1857       1
1860    1860       1
1866    1866       1
1870    1870       1
1939    1939       1
1963    1963       2
2000    2000       1
2053    2053       1
2091    2091       1
2097    2097       1
2100    2100       1
2107    2107       1
2124    2124       1
2158    2158       1
2167    2167       1
2192    2192       1
2202    2202       1
2209    2209       1
2232    2232       1
2276    2276       1
2316    2316       1
2349    2349       1
2386    2386       1
2389    2389       1
2390    2390       1
2420    2420       1
2429    2429       1
2516    2516       1
2575    2575       1
2603    2603       1
2615    2615       1
2646    2646       1
2706    2706       1
2777    2777       1
2862    2862       1
2879    2879       1
2923    2923       1
2934    2934       1
3157    3157       1
3165    3165       1
3187    3187       1
3543    3543       1
3582    3582       1
3594    3594       1
3779    3779       1
3929    3929       1
4013    4013       1
4027    4027       1
4220    4220       1
4290    4290       1
4350    4350       1
4457    4457       1
4572    4572       1
4576    4576       1
4604    4604       1
4776    4776       1
5036    5036       1
5131    5131       1
5516    5516       1
5934    5934       1
5985    5985       1
6201    6201       1
6564    6564       1
6753    6753       1
6796    6796       1
6810    6810       1
6991    6991       1
6998    6998       1
7122    7122       1
7166    7166       1
8065    8065       1
8975    8975       1
8992    8992       1
10069  10069       1
10416  10416       1
10802  10802       1
12296  12296       1
13501  13501       1
14097  14097       1
14796  14796       1
16807  16807       1
18583  18583       1
19482  19482       1
19991  19991       1
24597  24597       1
33084  33084       1
38062  38062       1
62153  62153       1

But I am getting a very odd error after reading the tree

> library(ape)
> degree(tree)
Error in degree(tree) : could not find function "degree"

@lpipes
Copy link
Author

lpipes commented Apr 25, 2022

I saved the tree as an rds and then reloaded it into another session and was able to run degree

> degree(tree)
  Degree       N
1      1 6799103
2      2       1
3      3 6799101

@emmanuelparadis
Copy link
Owner

It seems we identified the cause: see #54. Your problem can be replicated with:

R> tr <- stree(2000, "l")
R> write.tree(tr)
Erreur : C stack usage  9526916 is too close to the limit

I've just finished a rewrite of the internal function that creates the Newick string: I paste the code below so you can try it. As a check; it should give you (of course if tree is not too unbalanced):

R> .write.tree3(tree) == write.tree(tree)
[1] TRUE

And to write in a file: cat(.write.tree3(tr), file = "output.tre").

@KlausVigo: I'm not sure we need/want to keep the recursive version .write.tree2() in ape...?

.write.tree3 <- function(phy, digits = 10, tree.prefix = "", check_tips = TRUE)
{
    brl <- !is.null(phy$edge.length)
    nodelab <- !is.null(phy$node.label)
    if (check_tips) phy$tip.label <- checkLabel(phy$tip.label)
    if (nodelab) phy$node.label <- checkLabel(phy$node.label)
    f.d <- paste0(":%.", digits, "g")

    n <- length(phy$tip.label)

    ## terminal branches:
    terms <- phy$edge[, 2] <= n
    TERMS <- phy$tip.label[phy$edge[terms, 2]]
    if (brl) TERMS <- paste0(TERMS, sprintf(f.d, phy$edge.length[terms]))

    ## internal branches:
    phy$edge[!terms, 2]
    INTS <- ")"
    if (nodelab) INTS <- paste0(INTS, phy$node.label[-1])
    if (brl) INTS <- paste0(INTS, sprintf(f.d, phy$edge.length[!terms]))

    ## borrowed from phangorn:
    parent <- phy$edge[, 1]
    children <- phy$edge[, 2]
    kids <- vector("list", n + phy$Nnode)
    for (i in 1:length(parent))
        kids[[parent[i]]] <- c(kids[[parent[i]]], children[i])
    Nkids <- lengths(kids)
    root <- parent[! parent %in% children][1]

    LEFT <- -.Machine$integer.max

    buildCladeIntegers <- function(node) {
        N <- Nkids[node]
        n <- 2 * N + 1L
        res <- integer(n)
        if (N > 1) res[] <- NA_integer_ # initialise with ','
        res[1] <- LEFT
        res[n] <- -node
        res[seq(2L, n - 1L, 2L)] <- kids[[node]]
        res
    }

    iSTR <- buildCladeIntegers(root)

    repeat {
        whr <- which(iSTR > n)
        if (!length(whr)) break
        for (i in rev(whr)) {
            z <- iSTR[i]
            iSTR <- c(iSTR[1:(i - 1L)],
                      buildCladeIntegers(z),
                      iSTR[(i + 1L):length(iSTR)])
        }
    }

    STRING <- character(length(iSTR))
    STRING[iSTR == LEFT] <- "("
    STRING[match(-phy$edge[!terms, 2], iSTR)] <- INTS
    STRING[is.na(iSTR)] <- ","
    STRING[match(1:n, iSTR)] <- TERMS

    ## treat the root:
    ROOT <- if (nodelab) paste0(")", phy$node.label[1]) else ")"
    if (!is.null(phy$root.edge))
        ROOT <- paste0(ROOT, sprintf(f.d, phy$root.edge))
    ROOT <- paste0(ROOT, ";")
    STRING[length(STRING)] <- ROOT

    paste(STRING, collapse = "")
}

@KlausVigo
Copy link
Contributor

Hi @emmanuelparadis

@KlausVigo: I'm not sure we need/want to keep the recursive version .write.tree2() in ape...?

I don't think we should have two versions of .write.tree() in ape. I just realized that the new version is a bit slower O(n²) instead of O(n) for large trees. I will have a look if I can speed it up.
We might should start collecting trees, which used to cause problems and write some tests to check the robustness of the new function.

Cheers,
Klaus

@emmanuelparadis
Copy link
Owner

Hi @KlausVigo,
I replaced seq by seq.int (the former is generic, the latter is primitive) and it's significantly faster.
You're right about the scaling: .write.tree3 is now faster than .write.tree2 with 1000 (or less) tips, but slower with bigger trees. I'm trying to profile more but not sure where to look for...
I was thinking to keep both versions and use the recursive one on big trees if they are not too unbalanced, but adding the tests will slow down the whole procedure....

@KlausVigo
Copy link
Contributor

Hi @emmanuelparadis,
it is this line:

iSTR <- c(iSTR[1:(i - 1L)], buildCladeIntegers(z), iSTR[(i + 1L):length(iSTR)])

where most of the time is spent and also most memory allocations happen.
I assume it is c() that grows the vector causes the problem.

@emmanuelparadis
Copy link
Owner

It should be possible without resizing iSTR! Its elements from 1 until i - 1 are not modified by this operation... Let me see this further...

@lpipes
Copy link
Author

lpipes commented Apr 26, 2022

@emmanuelparadis @KlausVigo Thanks for working on this! I'm trying out the fix now. It has been several hours since I started the .write.tree3 command so I will let you know when it has finished.

@emmanuelparadis
Copy link
Owner

@lpipes It seems you are victim of the O(n^2) curse! I paste below another version which seems to scale with O(n) and is even faster than the current internal function in ape (and it accepts totally unbalanced trees!). The basic idea is to build the Newick string progressively thanks to the postorder tree-traversal. So there's no need to do recursions, just an iteration along the edges after getting their postorder ordering. I made tests with some random trees generated with rtree and it's all good.

@KlausVigo This version assumes that the root is the 'n + 1' node (hence the code borrowed from phnagorn is not needed anymore). I think you wanted to have this relaxed... Maybe this can still be relaxed if postorder() is flexible to this?

.write.tree3 <- function(phy, digits = 10, tree.prefix = "", check_tips = TRUE)
{
    brl <- !is.null(phy$edge.length)
    nodelab <- !is.null(phy$node.label)
    if (check_tips) phy$tip.label <- checkLabel(phy$tip.label)
    if (nodelab) phy$node.label <- checkLabel(phy$node.label)
    f.d <- paste0(":%.", digits, "g")
    n <- length(phy$tip.label)

    ## terminal branches:
    terms <- phy$edge[, 2] <= n
    TERMS <- phy$tip.label[phy$edge[terms, 2]]
    if (brl) TERMS <- paste0(TERMS, sprintf(f.d, phy$edge.length[terms]))

    ## internal branches, including root edge:
    INTS <- rep(")", phy$Nnode)
    if (nodelab) INTS <- paste0(INTS, phy$node.label)
    if (brl) {
        tmp <- phy$edge.length[!terms][order(phy$edge[!terms, 2])]
        tmp <- c("", sprintf(f.d, tmp))
        if (!is.null(phy$root.edge)) tmp[1L] <- sprintf(f.d, phy$root.edge)
        INTS <- paste0(INTS, tmp)
    }
    INTS[1] <- paste0(INTS[1], ";")

###    ## borrowed from phangorn:
###    parent <- phy$edge[, 1]
###    children <- phy$edge[, 2]
###    kids <- vector("list", n + phy$Nnode)
###    for (i in 1:length(parent))
###        kids[[parent[i]]] <- c(kids[[parent[i]]], children[i])
###    Nkids <- lengths(kids, FALSE)
###    root <- parent[! parent %in% children][1]
###
    o <- postorder(phy)
    ANC <- phy$edge[o, 1L]
    DESC <- phy$edge[o, 2L]
    NEWICK <- character(n + phy$Nnode)
    NEWICK[1:n] <- TERMS
    root <- n + 1L
    from <- to <- 1L
    repeat {
        thenode <- ANC[from]
        if (thenode == root) {
            to <- length(ANC)
        } else {
            while (ANC[to + 1L] == thenode) to <- to + 1L
        }
        tmp <- paste(NEWICK[DESC[from:to]], collapse = ",")
        tmp <- paste0("(", tmp, INTS[thenode - n])
        NEWICK[thenode] <- tmp
        if (thenode == root) break
        from <- to + 1L
    }
    NEWICK[root]
}

@KlausVigo
Copy link
Contributor

Hi @emmanuelparadis,
that's great!

@KlausVigo This version assumes that the root is the 'n + 1' node (hence the code borrowed from phnagorn is not needed anymore). I think you wanted to have this relaxed... Maybe this can still be relaxed if postorder() is flexible to this?

If the tree is in postorder than the root is always tree$edege[nrow(tree$edge), 1], so it can easily relaxed. If postorder works on such a tree is another thing ;)

Cheers,
Klaus

@lpipes
Copy link
Author

lpipes commented Apr 28, 2022

@emmanuelparadis @KlausVigo Thanks for your help! I was able to print out the tree! I'll close this issue now.

@teng-gao
Copy link

teng-gao commented May 2, 2022

Hi guys,

Do you think the below issue we experience with treeio::.write.tree4 is possibly related to this issue?
YuLab-SMU/treeio#78

Thanks,
Teng

@KlausVigo
Copy link
Contributor

Hi @teng-gao,
this might might not be the answer you expected, but treeio::.write.tree4 seems mainly a copy from the ape function. The issues which come into my mind and you might want to get an answer for are:
Why not contributing and trying to improve the ape function and importing it, but having different copies of this function in your own package?
Are the contributions of the ape authors properly acknowledged, see e.g. the first bullet point in the CRAN policies?
I leave it here and sorry for the rant.
Klaus

@teng-gao
Copy link

teng-gao commented May 3, 2022

Hi @KlausVigo,

Sorry for the confusion. I'm not the author of the treeio package - I raised the issue on their Github, which I thought is related to the ape function discussed here. Seems that it's no coincidence because the function was derived from ape..

Thanks,
Teng

@lpipes lpipes reopened this Nov 8, 2022
@lpipes
Copy link
Author

lpipes commented Nov 8, 2022

Hi, my tree has become bigger now (~10,000,000 leaf nodes) and I have been trying to use .write.tree3() to print out my tree for about a day now without any luck. Should I expect it to take longer?

@lpipes
Copy link
Author

lpipes commented Nov 8, 2022

> degree(tree)
  Degree        N
1      1 10801287
2      2        1
3      3 10801285
> Ntip(tree)
[1] 10801287
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ape_5.6-2

loaded via a namespace (and not attached):
[1] compiler_4.2.2  parallel_4.2.2  Rcpp_1.0.9      nlme_3.1-160   
[5] grid_4.2.2      lattice_0.20-45

@emmanuelparadis
Copy link
Owner

Hi,
Since your tree is almost a "comb", you can estimate the expected times with stree():

R> N <- 10^(2:7)
R> RT <- NULL
R> for (n in N) {tr <- stree(n); tr$edge.length <- runif(n); RT <- rbind(RT, system.time(write.tree(tr, file = "tmp.tre")))}
R> rownames(RT) <- N
R> RT
      user.self sys.self elapsed user.child sys.child
100       0.001    0.001   0.002          0         0
1000      0.007    0.000   0.006          0         0
10000     0.033    0.000   0.032          0         0
1e+05     0.369    0.020   0.389          0         0
1e+06     6.048    0.124   6.230          0         0
1e+07    65.056    0.951  66.610          0         0

We can check whether the presence of branch lengths is a significant factor:

R> RT2 <- NULL
R> for (n in N) {tr <- stree(n); RT2 <- rbind(RT2, system.time(write.tree(tr, file = "tmp.tre")))}
R> rownames(RT2) <- N
R> RT2
      user.self sys.self elapsed user.child sys.child
100       0.003    0.024   0.027          0         0
1000      0.002    0.000   0.002          0         0
10000     0.016    0.000   0.016          0         0
1e+05     0.191    0.000   0.192          0         0
1e+06     1.772    0.008   1.796          0         0
1e+07    19.596    0.053  19.879          0         0

And the same test with fully binary trees (with branch lengths) but only until one million tips:

R> RT3 <- NULL
R> for (n in N[1:5]) {tr <- rtree(n); RT3 <- rbind(RT3, system.time(write.tree(tr, file = "tmp.tre")))}
R> rownames(RT3) <- N[1:5]
R> RT3
      user.self sys.self elapsed user.child sys.child
100       0.003    0.000   0.002          0         0
1000      0.014    0.000   0.014          0         0
10000     0.219    0.000   0.221          0         0
1e+05     2.209    0.004   2.220          0         0
1e+06    38.798    0.036  38.975          0         0

So even in this "worst" case, this should not take one hour...

Maybe you need to install the current version of ape from GitHub (ver. 5.6-3). From your message, it seems you loaded the above code .write.tree3() which could be less optimal.

@lpipes
Copy link
Author

lpipes commented Nov 9, 2022

I updated with the current version of ape from Github and it printed within 1 hour! Thanks a lot.

@lpipes lpipes closed this as completed Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants