In name alignment report, suggest listing an aggregate summary of unresolved names across all used taxonomies #3

Closed
jhpoelen opened this issue Mar 10, 2022 · 3 comments

Comments

@jhpoelen

@jrhillae suggested including in the name alignment report a summary of names that were missing from one or more taxonomies.

Currently, unresolved names are listed separately for each taxonomy (NCBI, Catalogue of Life, GBIF backbone).

The desired outcome is a summary view that lists each name together with the number of taxonomies in which it was matched:

| name | number of matched taxonomies |
| --- | --- |
| Donald duck | 0 |
| Homo sapiens | 5 |
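
As a rough illustration of the requested summary, the sketch below counts, for each provided name, how many taxonomies resolved it. It is not the report's actual implementation: the function name `matched_taxonomy_counts`, the taxonomy keys, and the sample data are all hypothetical, and it assumes the unresolved names per taxonomy are already available as sets.

```python
def matched_taxonomy_counts(provided_names, unresolved_by_taxonomy):
    """Map each provided name to the number of taxonomies that resolved it.

    A name counts as matched in a taxonomy when it is absent from that
    taxonomy's set of unresolved names (an assumption of this sketch).
    """
    counts = {}
    for name in provided_names:
        counts[name] = sum(
            1
            for unresolved in unresolved_by_taxonomy.values()
            if name not in unresolved
        )
    return counts


# Hypothetical input: three taxonomies, none of which knows "Donald duck".
provided = ["Donald duck", "Homo sapiens"]
unresolved = {
    "col": {"Donald duck"},
    "ncbi": {"Donald duck"},
    "gbif": {"Donald duck"},
}

summary = matched_taxonomy_counts(provided, unresolved)

# List names with the fewest matches first, so problem names stand out.
for name, matched in sorted(summary.items(), key=lambda kv: (kv[1], kv[0])):
    print(f"{name}\t{matched}")
```

Sorting by ascending match count puts names that no taxonomy recognizes at the top of the view.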
jhpoelen pushed a commit to globalbioticinteractions/globinizer that referenced this issue Mar 10, 2022
@jhpoelen
Author

@jrhillae I've added a top 10 aggregate count of mismatched names to the alignment report. Please confirm or suggest improvements.

2022-03-10T21:56:58.8636297Z ███    ██  █████  ███    ███ ███████                                        
2022-03-10T21:56:58.8637460Z ████   ██ ██   ██ ████  ████ ██                                             
2022-03-10T21:56:58.8637929Z ██ ██  ██ ███████ ██ ████ ██ █████                                          
2022-03-10T21:56:58.8638309Z ██  ██ ██ ██   ██ ██  ██  ██ ██                                             
2022-03-10T21:56:58.8638797Z ██   ████ ██   ██ ██      ██ ███████                                        
2022-03-10T21:56:58.8640394Z                                                                             
2022-03-10T21:56:58.8641275Z  █████  ██      ██  ██████  ███    ██ ███    ███ ███████ ███    ██ ████████ 
2022-03-10T21:56:58.8641746Z ██   ██ ██      ██ ██       ████   ██ ████  ████ ██      ████   ██    ██    
2022-03-10T21:56:58.8642215Z ███████ ██      ██ ██   ███ ██ ██  ██ ██ ████ ██ █████   ██ ██  ██    ██    
2022-03-10T21:56:58.8642650Z ██   ██ ██      ██ ██    ██ ██  ██ ██ ██  ██  ██ ██      ██  ██ ██    ██    
2022-03-10T21:56:58.8643062Z ██   ██ ███████ ██  ██████  ██   ████ ██      ██ ███████ ██   ████    ██    
2022-03-10T21:56:58.8643344Z                                                                             
2022-03-10T21:56:58.8643731Z ██████  ██    ██     ███    ██  ██████  ███    ███ ███████ ██████           
2022-03-10T21:56:58.8644157Z ██   ██  ██  ██      ████   ██ ██    ██ ████  ████ ██      ██   ██          
2022-03-10T21:56:58.8644616Z ██████    ████       ██ ██  ██ ██    ██ ██ ████ ██ █████   ██████           
2022-03-10T21:56:58.8645010Z ██   ██    ██        ██  ██ ██ ██    ██ ██  ██  ██ ██      ██   ██          
2022-03-10T21:56:58.8645438Z ██████     ██        ██   ████  ██████  ██      ██ ███████ ██   ██          
2022-03-10T21:56:58.8645779Z 
2022-03-10T21:56:58.8646065Z ⚠️ Disclaimer: The name alignment results in this review should be considered
2022-03-10T21:56:58.8646425Z friendly, yet naive, notes from an unsophisticated taxonomic robot. 
2022-03-10T21:56:58.8646793Z Please carefully review the results listed below and share issues/ideas
2022-03-10T21:56:58.8647152Z by email info at globalbioticinteractions.org or by opening an issue at 
2022-03-10T21:56:58.8647587Z https://github.com/globalbioticinteractions/globalbioticinteractions/issues .
2022-03-10T21:57:30.0410809Z Miller 5.6.2
2022-03-10T21:57:30.1856848Z s3cmd version 2.2.0
2022-03-10T21:57:30.2642335Z openjdk version "1.8.0_322"
2022-03-10T21:57:30.2651031Z OpenJDK Runtime Environment (Zulu 8.60.0.21-CA-linux64) (build 1.8.0_322-b06)
2022-03-10T21:57:30.2653522Z OpenJDK 64-Bit Server VM (Zulu 8.60.0.21-CA-linux64) (build 25.322-b06, mixed mode)
2022-03-10T21:57:30.2697416Z nomer not found... installing from [https://github.com/globalbioticinteractions/nomer/releases/download/0.2.11/nomer.jar]
2022-03-10T21:58:11.0416564Z nomer version 0.2.11
2022-03-10T21:58:11.0444721Z 
2022-03-10T21:58:11.0445803Z Review of [jhpoelen/GRIIS] started at [2022-03-10T21:58:11+00:00].
2022-03-10T21:58:11.0465987Z cat: '*.txt': No such file or directory
2022-03-10T21:58:11.0466509Z cat: '*.tsv': No such file or directory
2022-03-10T21:58:11.0532852Z mlr: Header/data length mismatch (11 != 1) at file "(stdin)" line 617.
2022-03-10T21:58:11.3897508Z 
2022-03-10T21:58:11.3898592Z --- [col] start ---
2022-03-10T21:58:11.3898872Z 
2022-03-10T21:58:11.8975956Z [main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [col]
2022-03-10T21:58:11.9910447Z [main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [gbif-parse]
2022-03-10T21:58:12.4365680Z [main] INFO org.globalbioticinteractions.nomer.match.CatalogueOfLifeTaxonService - [Catalogue of Life] taxonomy already indexed at [/home/runner/work/GRIIS/GRIIS/./.nomer/catalogue_of_life/catalogue_of_life], no need to import.
2022-03-10T21:58:14.1055535Z 
2022-03-10T21:58:14.1056511Z real	0m2.716s
2022-03-10T21:58:14.1056974Z user	0m4.646s
2022-03-10T21:58:14.1057472Z sys	0m0.323s
2022-03-10T21:58:14.1230989Z [col] aligned 702 resolved names to 613 provided names.
2022-03-10T21:58:14.1231809Z [col] first 10 unresolved names include:
2022-03-10T21:58:14.1233006Z 
2022-03-10T21:58:14.1304084Z | providedExternalId | providedName |
2022-03-10T21:58:14.1304820Z | --- | --- |
2022-03-10T21:58:14.1305128Z |  | Acanthoscelides pallidipennis (Motschulsky, 1874) |
2022-03-10T21:58:14.1305402Z |  | Aedes albopictus Skuse, 1894 |
2022-03-10T21:58:14.1305684Z |  | Amaranthus blitum subsp. blitum |
2022-03-10T21:58:14.1305923Z |  | Amaranthus budensis Priszter |
2022-03-10T21:58:14.1306195Z |  | Amphibalanus amphitrite Darwin, 1854 |
2022-03-10T21:58:14.1306475Z |  | Aproceros leucopoda Takeuchi, 1939 |
2022-03-10T21:58:14.1306720Z |  | Artemisia lavandulaefolia Nakai |
2022-03-10T21:58:14.1307279Z |  | Aster salignus L. |
2022-03-10T21:58:14.1307506Z |  | Aster versicolor hort. ex Steud. |
2022-03-10T21:58:14.1307764Z |  | Balanus improvisus Darwin, 1854 |
2022-03-10T21:58:14.1323570Z 
2022-03-10T21:58:14.1330419Z --- [col] end ---
2022-03-10T21:58:14.1330593Z 
2022-03-10T21:58:14.1330738Z 
2022-03-10T21:58:14.1331026Z --- [ncbi] start ---
2022-03-10T21:58:14.1331173Z 
2022-03-10T21:58:14.6676670Z [main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [gbif-parse]
2022-03-10T21:58:14.7457859Z [main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [ncbi-taxon]
2022-03-10T21:58:15.1497823Z [main] INFO org.globalbioticinteractions.nomer.match.NCBITaxonService - NCBI taxonomy already indexed at [/home/runner/work/GRIIS/GRIIS/./.nomer/ncbi/ncbi], no need to import.
2022-03-10T21:58:16.4104492Z 
2022-03-10T21:58:16.4105369Z real	0m2.279s
2022-03-10T21:58:16.4106497Z user	0m4.207s
2022-03-10T21:58:16.4106779Z sys	0m0.261s
2022-03-10T21:58:16.4297374Z [ncbi] aligned 577 resolved names to 613 provided names.
2022-03-10T21:58:16.4297726Z [ncbi] first 10 unresolved names include:
2022-03-10T21:58:16.4297893Z 
2022-03-10T21:58:16.4371203Z | providedExternalId | providedName |
2022-03-10T21:58:16.4371868Z | --- | --- |
2022-03-10T21:58:16.4372276Z |  | Agetus typicus Krøyer, 1849 |
2022-03-10T21:58:16.4372571Z |  | Agropogon littoralis (Smith) C.E.Hubbard |
2022-03-10T21:58:16.4372849Z |  | Ailanthus altissima Swingle |
2022-03-10T21:58:16.4373087Z |  | Amaranthus budensis Priszter |
2022-03-10T21:58:16.4373379Z |  | Amaranthus emarginatus Salzm. ex Uline & W.L.Bray |
2022-03-10T21:58:16.4373655Z |  | Artemisia lavandulaefolia Nakai |
2022-03-10T21:58:16.4374282Z |  | Aster laevis L. |
2022-03-10T21:58:16.4374496Z |  | Aster salignus L. |
2022-03-10T21:58:16.4374753Z |  | Aster versicolor hort. ex Steud. |
2022-03-10T21:58:16.4375007Z |  | Balsamita major Desf. |
2022-03-10T21:58:16.4390375Z 
2022-03-10T21:58:16.4391279Z --- [ncbi] end ---
2022-03-10T21:58:16.4391653Z 
2022-03-10T21:58:16.4391947Z 
2022-03-10T21:58:16.4392337Z --- [gbif] start ---
2022-03-10T21:58:16.4392514Z 
2022-03-10T21:58:16.9696057Z [main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [gbif-parse]
2022-03-10T21:58:17.0194969Z [main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [gbif-taxon]
2022-03-10T21:58:17.4322780Z [main] INFO org.globalbioticinteractions.nomer.match.GBIFTaxonService - GBIF taxonomy ids already indexed at [/home/runner/work/GRIIS/GRIIS/./.nomer/gbif/gbif], no need to import.
2022-03-10T21:58:18.9677906Z 
2022-03-10T21:58:18.9678588Z real	0m2.530s
2022-03-10T21:58:18.9678937Z user	0m4.641s
2022-03-10T21:58:18.9679225Z sys	0m0.284s
2022-03-10T21:58:18.9883548Z [gbif] aligned 979 resolved names to 613 provided names.
2022-03-10T21:58:18.9884155Z [gbif] first 10 unresolved names include:
2022-03-10T21:58:18.9884406Z 
2022-03-10T21:58:18.9953162Z | providedExternalId | providedName |
2022-03-10T21:58:18.9953984Z | --- | --- |
2022-03-10T21:58:18.9954241Z |  | Amaranthus blitum subsp. blitum |
2022-03-10T21:58:18.9954747Z |  | Malva sylvestris var. mauritiana (L.) Boiss. |
2022-03-10T21:58:18.9955171Z |  | Procambarus fallax f. virginalis Martin, Dorn, Kawai, van der Heiden & Scholtz, 2010 |
2022-03-10T21:58:18.9955515Z |  | Rosa xalba L. |
2022-03-10T21:58:18.9955888Z |  | Salicornia procumbens var. stricta (G.Mey.) J.Duvigneaud & J.Lambinon |
2022-03-10T21:58:18.9956514Z |  | Solanum triflorum var. ponticum (Prodán) Borza |
2022-03-10T21:58:18.9981656Z 
2022-03-10T21:58:18.9982563Z --- [gbif] end ---
2022-03-10T21:58:19.5377382Z [main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [gbif-parse]
2022-03-10T21:58:19.5377929Z 
2022-03-10T21:58:19.5377936Z 
2022-03-10T21:58:19.5378125Z --- [itis] start ---
2022-03-10T21:58:19.5378305Z 
2022-03-10T21:58:19.6082222Z [main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [itis-taxon-id]
2022-03-10T21:58:19.9802598Z [main] INFO org.globalbioticinteractions.nomer.match.ITISTaxonService - ITIS taxonomy already indexed at [/home/runner/work/GRIIS/GRIIS/./.nomer/itis/itis], no need to import.
2022-03-10T21:58:21.2089386Z 
2022-03-10T21:58:21.2090557Z real	0m2.213s
2022-03-10T21:58:21.2091117Z user	0m4.112s
2022-03-10T21:58:21.2091659Z sys	0m0.264s
2022-03-10T21:58:21.2286277Z [itis] aligned 541 resolved names to 613 provided names.
2022-03-10T21:58:21.2286829Z [itis] first 10 unresolved names include:
2022-03-10T21:58:21.2287076Z 
2022-03-10T21:58:21.2358254Z | providedExternalId | providedName |
2022-03-10T21:58:21.2359041Z | --- | --- |
2022-03-10T21:58:21.2359578Z |  | Achillea roseoalba Ehrend. |
2022-03-10T21:58:21.2359920Z |  | Adelges cooleyi (Gillette, 1907) |
2022-03-10T21:58:21.2360214Z |  | Adelges nordmannianae (Eckstein, 1890) |
2022-03-10T21:58:21.2371452Z |  | Agetus flaccus (Giesbrecht, 1891) |
2022-03-10T21:58:21.2372245Z |  | Agetus typicus Krøyer, 1849 |
2022-03-10T21:58:21.2372689Z |  | Agropogon littoralis (Smith) C.E.Hubbard |
2022-03-10T21:58:21.2373042Z |  | Alkekengi officinarum Moench |
2022-03-10T21:58:21.2373317Z |  | Amaranthus blitum subsp. blitum |
2022-03-10T21:58:21.2373659Z |  | Amaranthus budensis Priszter |
2022-03-10T21:58:21.2373974Z |  | Amaranthus emarginatus Salzm. ex Uline & W.L.Bray |
2022-03-10T21:58:21.2374469Z 
2022-03-10T21:58:21.2374860Z --- [itis] end ---
2022-03-10T21:58:21.2375181Z 
2022-03-10T21:58:21.2906769Z top 10 unresolved names sorted by decreasing number of mismatches across taxonomies
2022-03-10T21:58:21.2908177Z ---
2022-03-10T21:58:21.3058835Z       4 providedName
2022-03-10T21:58:21.3059700Z       4 Rosa xalba L.
2022-03-10T21:58:21.3069491Z       3 Solanum triflorum var. ponticum (Prodán) Borza
2022-03-10T21:58:21.3070307Z       3 Salicornia procumbens var. stricta (G.Mey.) J.Duvigneaud & J.Lambinon
2022-03-10T21:58:21.3071321Z       3 Procambarus fallax f. virginalis Martin, Dorn, Kawai, van der Heiden & Scholtz, 2010
2022-03-10T21:58:21.3071745Z       3 Petunia atkinsiana D.Don ex Loudon
2022-03-10T21:58:21.3072065Z       3 Macropsis eleagni Emeljanov, 1964
2022-03-10T21:58:21.3072488Z       3 Chenopodium schraderanum Schult.
2022-03-10T21:58:21.3072795Z       3 Aster versicolor hort. ex Steud.
2022-03-10T21:58:21.3073098Z       3 Aster salignus L.
2022-03-10T21:58:21.3073490Z ---
2022-03-10T21:58:21.3073642Z 
2022-03-10T21:58:21.3073648Z 
2022-03-10T21:58:21.3111104Z mlr: unacceptable empty CSV key at file "(stdin)" line 1.
2022-03-10T21:58:21.3115457Z 
2022-03-10T21:58:21.3115947Z gzip: stdout: Broken pipe
2022-03-10T21:58:21.3405791Z   adding: names-aligned.csv (stored 0%)
2022-03-10T21:58:21.3732722Z   adding: names-aligned.tsv (deflated 90%)
2022-03-10T21:58:21.4053521Z   adding: names-aligned.txt (deflated 90%)
2022-03-10T21:58:21.6690030Z [main] INFO org.globalbioticinteractions.nomer.cmd.CmdClean - cleaning cache at [./.nomer]...
2022-03-10T21:58:22.2447064Z [main] INFO org.globalbioticinteractions.nomer.cmd.CmdClean - cleaning cache at [./.nomer] done.
2022-03-10T21:58:22.2771237Z 
2022-03-10T21:58:22.2772438Z [jhpoelen/GRIIS] has 352 names alignment note(s)
2022-03-10T21:58:22.2794957Z 
2022-03-10T21:58:22.2795242Z 
2022-03-10T21:58:22.2797322Z If you'd like, you can generate your own name alignment by:
2022-03-10T21:58:22.2798156Z   - installing GloBI's Nomer via https://github.com/globalbioticinteractions/nomer
2022-03-10T21:58:22.2799245Z   - inspecting the align-names.sh script at https://github.com/globalbioticinteractions/globinizer/blob/master/align-names.sh
2022-03-10T21:58:22.2800017Z   - write your own script for name alignment
2022-03-10T21:58:22.2801460Z 
2022-03-10T21:58:22.2801981Z Please email info@globalbioticinteractions.org for questions/ comments.
2022-03-10T21:58:22.2802408Z 
2022-03-10T21:58:22.2803015Z Download the name alignment results with the single-use, and expiring, file.io link at:
2022-03-10T21:58:22.5209803Z https://file.io/jJ08trsXSIf2
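
In the top 10 list of the log above, the first row reads `4 providedName`, which looks like the column header of each per-taxonomy table being counted as if it were a name. The sketch below shows one way to build such a ranking while skipping the header. It is an assumption-laden illustration, not the logic of `align-names.sh`: the function `top_unresolved`, the `HEADER` constant, and the sample names are hypothetical.

```python
from collections import Counter

# Assumed header value, taken from the `providedName` column in the log tables.
HEADER = "providedName"


def top_unresolved(unresolved_by_taxonomy, limit=10):
    """Rank names by the number of taxonomies that failed to resolve them."""
    counts = Counter()
    for names in unresolved_by_taxonomy.values():
        # Count each name at most once per taxonomy and drop the header row.
        counts.update(set(names) - {HEADER})
    # Decreasing mismatch count first, then name, for a stable ordering.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


# Hypothetical per-taxonomy lists, each still carrying its header line.
sample = {
    "col": [HEADER, "Example name A", "Donald duck"],
    "ncbi": [HEADER, "Example name A", "Donald duck"],
    "gbif": [HEADER, "Donald duck"],
    "itis": [HEADER, "Donald duck", "Example name B"],
}

ranking = top_unresolved(sample)
for name, mismatches in ranking:
    print(f"{mismatches:7d} {name}")
```

Filtering the header before counting keeps it from taking a slot in the top 10.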

@jrhillae
Owner

@jhpoelen: This is perfect. Thank you

@jhpoelen
Author

@jrhillae thanks for taking the time to review my work.
