Discrepancy GTDB website and taxdump changelog #91

Open
3 tasks done
danielpodlesny opened this issue Feb 21, 2024 · 5 comments

Comments

@danielpodlesny

Prerequisites

  • make sure you are using the latest version by taxonkit version
  • read the usage

Describe your issue

  • describe the problem

Thanks for developing taxonkit and for sharing the taxdumps! It saves so much trouble.

There was this change in GTDB: R202 "CAG-521" -> R207 "Aphodousia".

I used your latest GTDB taxdump changelog, which shows that CAG-521 was DELETED and Aphodousia is NEW. However, I cannot find any record linking the two, i.e. that one was renamed to the other.

Following the docs, I run into this:

echo "CAG-521" | taxonkit name2taxid --data-dir $R202 | taxonkit lineage --taxid-field 2 --data-dir $R207

16:22:16.067 [WARN] taxid 1435403146 was deleted
CAG-521 1435403146

I'm not sure whether this is due to the taxdumps or taxonkit, so I am posting here.

CAG-521:

cat <(zless gtdb-taxid-changelog.csv.gz | head -n1 | sed 's/,/\t/g') <(zless gtdb-taxid-changelog.csv.gz | grep 'R20[27]' | grep 'CAG-521' | sed 's/,/\t/g') | column -t

taxid       version  change  change-value  name         rank                                                                                  lineage                                                                                  lineage-taxids
279141433   R207     DELETE  CAG-521       sp003543795  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp003543795            609216830;1641076285;329474883;2125578642;1754850155;1435403146;279141433
349671556   R207     DELETE  CAG-521       sp900554675  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp900554675            609216830;1641076285;329474883;2125578642;1754850155;1435403146;349671556
494738701   R207     DELETE  CAG-521       sp900545335  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp900545335            609216830;1641076285;329474883;2125578642;1754850155;1435403146;494738701
516566981   R207     DELETE  CAG-521       sp000437635  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp000437635            609216830;1641076285;329474883;2125578642;1754850155;1435403146;516566981
587147611   R207     DELETE  CAG-521       sp900553105  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp900553105            609216830;1641076285;329474883;2125578642;1754850155;1435403146;587147611
602392633   R202     NEW     902388655     no           rank                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp902388655;902388655  609216830;1641076285;329474883;2125578642;1754850155;1435403146;1756269640;602392633
725664906   R207     DELETE  CAG-521       sp900546995  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp900546995            609216830;1641076285;329474883;2125578642;1754850155;1435403146;725664906
747615494   R202     NEW     900754945     no           rank                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp900544345;900754945  609216830;1641076285;329474883;2125578642;1754850155;1435403146;1008765200;747615494
825111592   R202     NEW     900765595     no           rank                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp902388655;900765595  609216830;1641076285;329474883;2125578642;1754850155;1435403146;1756269640;825111592
1008765200  R207     DELETE  CAG-521       sp900544345  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp900544345            609216830;1641076285;329474883;2125578642;1754850155;1435403146;1008765200
1251617747  R207     DELETE  CAG-521       sp002329575  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp002329575            609216830;1641076285;329474883;2125578642;1754850155;1435403146;1251617747
1435403146  R207     DELETE  CAG-521       genus        Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521  609216830;1641076285;329474883;2125578642;1754850155;1435403146
1756269640  R202     NEW     CAG-521       sp902388655  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp902388655            609216830;1641076285;329474883;2125578642;1754850155;1435403146;1756269640
1756269640  R207     DELETE  CAG-521       sp902388655  species                                                                               Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521  sp902388655            609216830;1641076285;329474883;2125578642;1754850155;1435403146;1756269640

Aphodousia:

cat <(zless gtdb-taxid-changelog.csv.gz | head -n1 | sed 's/,/\t/g') <(zless gtdb-taxid-changelog.csv.gz | grep 'R20[27]' | grep 'Aphodousia' | sed 's/,/\t/g') | column -t

taxid       version  change          change-value  name             rank                                                                                     lineage                                                                                             lineage-taxids
13156977    R207     CHANGE_LIN_TAX  900544315     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp002329575;900544315      609216830;1641076285;329474883;2125578642;1754850155;1577673191;465580961;13156977
101047054   R207     CHANGE_LIN_TAX  003543795     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp003543795;003543795      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1688730210;101047054
159241665   R207     NEW             018714185     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  gallistercoris;018714185   609216830;1641076285;329474883;2125578642;1754850155;1577673191;262228660;159241665
255288910   R207     NEW             Aphodousia    sp017383055      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp017383055                609216830;1641076285;329474883;2125578642;1754850155;1577673191;255288910
262228660   R207     NEW             Aphodousia    gallistercoris   species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  gallistercoris             609216830;1641076285;329474883;2125578642;1754850155;1577673191;262228660
265694794   R207     NEW             Aphodousia    sp900546995      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900546995                609216830;1641076285;329474883;2125578642;1754850155;1577673191;265694794
366289909   R207     NEW             905204555     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecalis;905204555         609216830;1641076285;329474883;2125578642;1754850155;1577673191;626891884;366289909
394452769   R207     NEW             Aphodousia    secunda_A        species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  secunda_A                  609216830;1641076285;329474883;2125578642;1754850155;1577673191;394452769
404286898   R207     CHANGE_LIN_TAX  900544345     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900544345;900544345      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1354447789;404286898
420315642   R207     CHANGE_LIN_TAX  000437635     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecalis;000437635         609216830;1641076285;329474883;2125578642;1754850155;1577673191;626891884;420315642
465580961   R207     NEW             Aphodousia    sp002329575      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp002329575                609216830;1641076285;329474883;2125578642;1754850155;1577673191;465580961
506319002   R207     NEW             Aphodousia    sp900545335      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900545335                609216830;1641076285;329474883;2125578642;1754850155;1577673191;506319002
599325129   R207     NEW             Aphodousia    sp905201055      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp905201055                609216830;1641076285;329474883;2125578642;1754850155;1577673191;599325129
602392633   R207     CHANGE_LIN_TAX  902388655     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp902388655;902388655      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1800285846;602392633
609241997   R207     NEW             017646335     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp017646335;017646335      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1665428462;609241997
612589562   R207     CHANGE_LIN_TAX  900546995     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900546995;900546995      609216830;1641076285;329474883;2125578642;1754850155;1577673191;265694794;612589562
626891884   R207     NEW             Aphodousia    faecalis         species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecalis                   609216830;1641076285;329474883;2125578642;1754850155;1577673191;626891884
663056101   R207     NEW             905206345     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp905206345;905206345      609216830;1641076285;329474883;2125578642;1754850155;1577673191;2119934576;663056101
732865391   R207     NEW             Aphodousia    faecigallinarum  species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecigallinarum            609216830;1641076285;329474883;2125578642;1754850155;1577673191;732865391
747615494   R207     CHANGE_LIN_TAX  900754945     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900544345;900754945      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1354447789;747615494
825111592   R207     CHANGE_LIN_TAX  900765595     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp902388655;900765595      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1800285846;825111592
1023245325  R207     CHANGE_LIN_TAX  900554675     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900554675;900554675      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1477125710;1023245325
1024674004  R207     NEW             905187975     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900545335;905187975      609216830;1641076285;329474883;2125578642;1754850155;1577673191;506319002;1024674004
1044151090  R207     NEW             905197765     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900546995;905197765      609216830;1641076285;329474883;2125578642;1754850155;1577673191;265694794;1044151090
1116976797  R207     NEW             017500925     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp017383055;017500925      609216830;1641076285;329474883;2125578642;1754850155;1577673191;255288910;1116976797
1247819496  R207     NEW             905212135     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp002329575;905212135      609216830;1641076285;329474883;2125578642;1754850155;1577673191;465580961;1247819496
1317984236  R207     NEW             018712705     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecalis;018712705         609216830;1641076285;329474883;2125578642;1754850155;1577673191;626891884;1317984236
1321012077  R207     CHANGE_LIN_TAX  002329575     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp002329575;002329575      609216830;1641076285;329474883;2125578642;1754850155;1577673191;465580961;1321012077
1335180995  R207     NEW             Aphodousia    faecipullorum    species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecipullorum              609216830;1641076285;329474883;2125578642;1754850155;1577673191;1335180995
1354447789  R207     NEW             Aphodousia    sp900544345      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900544345                609216830;1641076285;329474883;2125578642;1754850155;1577673191;1354447789
1477125710  R207     NEW             Aphodousia    sp900554675      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900554675                609216830;1641076285;329474883;2125578642;1754850155;1577673191;1477125710
1577673191  R207     NEW             Aphodousia    genus            Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia  609216830;1641076285;329474883;2125578642;1754850155;1577673191
1651854969  R207     NEW             905212345     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp905212345;905212345      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1703059417;1651854969
1665428462  R207     NEW             Aphodousia    sp017646335      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp017646335                609216830;1641076285;329474883;2125578642;1754850155;1577673191;1665428462
1688730210  R207     NEW             Aphodousia    sp003543795      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp003543795                609216830;1641076285;329474883;2125578642;1754850155;1577673191;1688730210
1703059417  R207     NEW             Aphodousia    sp905212345      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp905212345                609216830;1641076285;329474883;2125578642;1754850155;1577673191;1703059417
1800285846  R207     NEW             Aphodousia    sp902388655      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp902388655                609216830;1641076285;329474883;2125578642;1754850155;1577673191;1800285846
1808979396  R207     NEW             905201055     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp905201055;905201055      609216830;1641076285;329474883;2125578642;1754850155;1577673191;599325129;1808979396
1827248664  R207     NEW             018714205     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecipullorum;018714205    609216830;1641076285;329474883;2125578642;1754850155;1577673191;1335180995;1827248664
1832179602  R207     NEW             016901835     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  secunda_A;016901835        609216830;1641076285;329474883;2125578642;1754850155;1577673191;394452769;1832179602
1834432164  R207     NEW             018714755     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecigallinarum;018714755  609216830;1641076285;329474883;2125578642;1754850155;1577673191;732865391;1834432164
1859933766  R207     NEW             905198185     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900544345;905198185      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1354447789;1859933766
1927048762  R207     CHANGE_LIN_TAX  900544925     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  faecalis;900544925         609216830;1641076285;329474883;2125578642;1754850155;1577673191;626891884;1927048762
1927253407  R207     NEW             017383055     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp017383055;017383055      609216830;1641076285;329474883;2125578642;1754850155;1577673191;255288910;1927253407
1949292207  R207     CHANGE_LIN_TAX  900553105     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900553105;900553105      609216830;1641076285;329474883;2125578642;1754850155;1577673191;2063973024;1949292207
1960767925  R207     NEW             905196825     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp902388655;905196825      609216830;1641076285;329474883;2125578642;1754850155;1577673191;1800285846;1960767925
2033651867  R207     CHANGE_LIN_TAX  900545335     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900545335;900545335      609216830;1641076285;329474883;2125578642;1754850155;1577673191;506319002;2033651867
2063973024  R207     NEW             Aphodousia    sp900553105      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp900553105                609216830;1641076285;329474883;2125578642;1754850155;1577673191;2063973024
2119934576  R207     NEW             Aphodousia    sp905206345      species                                                                                  Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  sp905206345                609216830;1641076285;329474883;2125578642;1754850155;1577673191;2119934576
2146281175  R207     NEW             018715245     no               rank                                                                                     Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia  secunda_A;018715245        609216830;1641076285;329474883;2125578642;1754850155;1577673191;394452769;2146281175
@shenwei356
Owner

Thanks for the feedback.

I am very glad to see that GTDB has a https://gtdb.ecogenomic.org/taxon-history page!

taxonkit taxid-changelog was originally designed for the NCBI taxonomy, where changes are more gradual and less drastic than in GTDB. As a result, some results are unsatisfying; I'm sorry about that.

I've checked the source code and also some records, such as a g__CAG-521 species. I do think I should revise the command at some point, once I finish my current work.

@danielpodlesny
Author

Thanks a lot for looking into this already.

So do you see this as a problem in the taxid-changelog command, or in the taxdumps and the lineage command? Would lineage pick up this change correctly if it were documented differently in the taxdumps, or could this command never resolve it?

@shenwei356
Owner

lineage works fine. The problem is only in taxid-changelog, which does not handle some edge cases appropriately.

As for each single version of the GTDB taxonomy, it is correct and there is no known issue. Only the deleted.dmp and merged.dmp files, which most tools do not use, are imperfect.

@shenwei356
Owner

I just released a new version of gtdb-taxdump, which has better support for duplicated names with different ranks. The TaxIds have changed completely (this is unrelated to this issue).

(I am also returning to this issue before the new release of taxonkit.)

I wondered whether I could improve it. For now, the answer is no.
In the NCBI taxonomy, the TaxIds are stable, so I can directly check whether a taxon name has changed by comparing the names in two adjacent versions.
For the GTDB taxonomy, however, I generate TaxIds from the hash value of

  1. before v0.16.0: the taxon name
  2. after v0.16.0: rank+taxon_name

So it's hard to detect renaming events for GTDB taxonomy.
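
The effect of name-derived TaxIds can be illustrated with a small sketch. It is only an illustration: cksum stands in for the real hash function (which is not stated in this thread), name_to_id is a hypothetical helper, and the way rank and name are joined is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: derive a numeric ID from a string.
# cksum (a CRC) is only a stand-in for the real hash, which is not given here.
name_to_id() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# "rank + taxon name", joined with a space purely for illustration.
old_id=$(name_to_id "genus CAG-521")
new_id=$(name_to_id "genus Aphodousia")

echo "CAG-521    -> $old_id"
echo "Aphodousia -> $new_id"
# The two IDs are unrelated, so a rename is indistinguishable from
# one taxon being deleted and another one being created.
```

Because the ID is a function of the name alone, nothing in the ID itself ties the old genus to the new one, which is why the changelog reports DELETE and NEW rather than a rename.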

But if we check the change history of a single assembly, it's OK: it shows CHANGE_LIN_TAX, meaning there were big changes in its lineage.

$ grep GCA_003543795.1 gtdb-taxdump/R214/taxid.map 
GCA_003543795.1 60618853

$ zcat gtdb-taxid-changelog.csv.gz \
    | csvtk grep -f taxid -p 60618853  \
    | csvtk cut -f -change-value,-lineage-taxids \
    | csvtk pretty -W 40 -x ";" -S light

┌----------┬---------┬----------------┬-----------┬---------┬------------------------------------------┐
| taxid    | version | change         | name      | rank    | lineage                                  |
├==========┼=========┼================┼===========┼=========┼==========================================┤
| 60618853 | R089    | NEW            | 003543795 | no rank | Bacteria;Proteobacteria;                 |
|          |         |                |           |         | Gammaproteobacteria;Burkholderiales;     |
|          |         |                |           |         | Burkholderiaceae;CAG-521;                |
|          |         |                |           |         | CAG-521 sp003543795;003543795            |
├----------┼---------┼----------------┼-----------┼---------┼------------------------------------------┤
| 60618853 | R207    | CHANGE_LIN_TAX | 003543795 | no rank | Bacteria;Proteobacteria;                 |
|          |         |                |           |         | Gammaproteobacteria;Burkholderiales;     |
|          |         |                |           |         | Burkholderiaceae;Aphodousia;             |
|          |         |                |           |         | Aphodousia sp003543795;003543795         |
├----------┼---------┼----------------┼-----------┼---------┼------------------------------------------┤
| 60618853 | R214    | CHANGE_LIN_TAX | 003543795 | no rank | Bacteria;Pseudomonadota;                 |
|          |         |                |           |         | Gammaproteobacteria;Burkholderiales;     |
|          |         |                |           |         | Burkholderiaceae_A;Aphodousia;           |
|          |         |                |           |         | Aphodousia sp003543795;003543795         |
└----------┴---------┴----------------┴-----------┴---------┴------------------------------------------┘
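
Building on the assembly history above, a genus rename could be inferred by comparing the genus field of successive lineages for the same assembly TaxId. The following is a rough sketch with awk, not a taxonkit feature. It assumes the changelog columns are taxid,version,change,change-value,name,rank,lineage,lineage-taxids (as in the header printed earlier in this thread), that rows for one TaxId appear in version order, and that the genus is the sixth lineage element. The inline rows are abbreviated from the table above with change-value and lineage-taxids left empty, and genus_renames is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read changelog CSV on stdin and print, per TaxId,
# every version in which the genus part of the lineage changed.
# Naive comma splitting; acceptable here because no field contains a comma.
genus_renames() {
    awk -F ',' '
        NR == 1 { next }                  # skip the header row
        {
            split($7, lin, ";")           # lineage is ";"-separated
            genus = lin[6]                # assumed position of the genus
            if (($1 in prev) && prev[$1] != genus)
                print $1, $2, prev[$1] " -> " genus
            prev[$1] = genus
        }
    '
}

genus_renames <<'EOF'
taxid,version,change,change-value,name,rank,lineage,lineage-taxids
60618853,R089,NEW,,003543795,no rank,Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;CAG-521;CAG-521 sp003543795;003543795,
60618853,R207,CHANGE_LIN_TAX,,003543795,no rank,Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Burkholderiaceae;Aphodousia;Aphodousia sp003543795;003543795,
60618853,R214,CHANGE_LIN_TAX,,003543795,no rank,Bacteria;Pseudomonadota;Gammaproteobacteria;Burkholderiales;Burkholderiaceae_A;Aphodousia;Aphodousia sp003543795;003543795,
EOF
# prints: 60618853 R207 CAG-521 -> Aphodousia
```

On the real file, the same function could be fed with zcat gtdb-taxid-changelog.csv.gz restricted to "no rank" (assembly) rows; if most assemblies of an old genus report the same new genus, that is reasonable evidence of a rename.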

@shenwei356
Owner

shenwei356 commented Mar 6, 2024

I have also added notes to the help message of taxid-changelog.

$ taxonkit taxid-changelog -h
Create TaxId changelog from dump archives

Attention:
  1. This command was originally designed for NCBI taxonomy, where the the TaxIds are stable.
  2. For other taxonomic data created by "taxonkit create-taxdump", e.g., GTDB-taxdump,
    some change events might be wrong, because
     a) There would be dramatic changes between the two versions.
     b) Different taxons in multiple versions might have the same TaxIds, because we only
        check and eliminate taxid collision within a single version.
     So a single version of taxonomic data created by "taxonkit create-taxdump" has no problem,
     it's just the changelog might not be perfect.

Note in create-taxdump:

  3. We only check and eliminate taxid collision within a single version of taxonomy data.
     Therefore, if you create taxid-changelog with "taxid-changelog", different taxons
     in multiple versions might have the same TaxIds and some change events might be wrong.

     So a single version of taxonomic data created by "taxonkit create-taxdump" has no problem,
     it's just the changelog might not be perfect.
