Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Newick parsing #69

Merged
merged 1 commit into from Nov 2, 2023
Merged

Improve Newick parsing #69

merged 1 commit into from Nov 2, 2023

Conversation

tsibley
Copy link
Member

@tsibley tsibley commented Nov 2, 2023

Replaces the basic Newick parser with an external one that handles single-quoted node names. Quoted names are allowed by the format¹ and occur in trees produced by NCBI Pathogens.²

The parser's API is a little awkward for our use case, but it's perfectly workable. Out of several parsers I tried on NPM, this was the only one which handled quoted names, so use it despite the slightly awkward API.

¹ See https://en.wikipedia.org/wiki/Newick_format#Notes for lack of
any formal spec.

² https://discussion.nextstrain.org/t/displaying-trees-from-ncbi-pathogen-browser-in-auspice-us/1456

Checklist

  • Checks pass

@nextstrain-bot nextstrain-bot temporarily deployed to auspice-us-trs-improved-udntd9 November 2, 2023 19:57 Inactive
Replaces the basic Newick parser with an external one that handles
single-quoted node names.  Quoted names are allowed by the format¹ and
occur in trees produced by NCBI Pathogens.²

The parser's API is a little awkward for our use case, but it's
perfectly workable.  Out of several parsers I tried on NPM, this was the
only one which handled quoted names, so use it despite the slightly
awkward API.

¹ See <https://en.wikipedia.org/wiki/Newick_format#Notes> for lack of
  any formal spec.

² <https://discussion.nextstrain.org/t/displaying-trees-from-ncbi-pathogen-browser-in-auspice-us/1456>
@tsibley tsibley requested a review from a team November 2, 2023 19:59
@tsibley tsibley temporarily deployed to auspice-us-trs-improved-udntd9 November 2, 2023 19:59 Inactive
@tsibley
Copy link
Member Author

tsibley commented Nov 2, 2023

I tested this by dragging on a couple example Newick files to auspice.us vs. the review app for this PR (well, local server). The particular example that's fixed by this PR is this Newick tree:

((((((('clinical, 2021-06-01, USA, PNUSAS204121, PDT001057153.1':5,'clinical, 2020-07-22, USA, PNUSAS154799, PDT000794464.1':0.4,'clinical, 2020-11-05, USA, PNUSAS173850, PDT000878638.1':0.4):4,('clinical, 2021-11-12, USA, PNUSAS243890, PDT001173813.1':2,'clinical, 2021-07-12, USA, PNUSAS211661, PDT001088323.1':1):5):1,('clinical, 2021-07-23, USA, PNUSAS214401, PDT001095411.1':11,'clinical, 2020-12-07, USA, PNUSAS183708, PDT000905701.1':6):7):2,'clinical, 2019-12-30, USA, PNUSAS125933, PDT000651740.1':21,'clinical, 2022-11-28, USA, PNUSAS319288, PDT001503767.1':14):4,'clinical, 2021-12-06, USA, PNUSAS248727, PDT001197358.1':25):1,((('clinical, 2022-12-16, USA, PNUSAS322393, PDT001531480.1':4,'clinical, 2022-09-02, USA, PNUSAS296914, PDT001408059.1':0.4):12,'clinical, 2018-01-26, USA, stool, PNUSAS028961, PDT000281869.2':11):7,('clinical, 2023-10-25, PNUSAS395246, PDT001953269.1':6,'clinical, 2019-02-15, USA, PNUSAS056113, PDT000466197.1':7,'clinical, 2019-12-24, USA, PNUSAS124969, PDT000649918.1':8):2):12):9,('clinical, 2023-09-27, USA, PNUSAS386979, PDT001907074.1':2,'clinical, 2020-11-05, USA, PNUSAS173830, PDT000878827.1':1):6,'clinical, 2019-08-28, USA, PNUSAS093707, PDT000575200.1':20,'clinical, 2020-09-16, USA, PNUSAS167774, PDT000837790.1':5);

and it renders before/after like so:

Screen Shot 2023-11-02 at 16 08 14
Screen Shot 2023-11-02 at 16 08 59

Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like switching the newick parser resolves #66 as well. Using the example tree.nwk that Cornelius provided in the test app, I see the auspice.us error notification

Screenshot 2023-11-02 at 4 10 22 PM

and the following error message in the console

tree.nwk failed to be read as a newick tree. Error: Error: End of buffer reached.

Should we catch errors raised by the parser and display a the more specific error message?

Copy link
Member

@victorlin victorlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared the following between review app and what's currently at auspice.us:

  • The example file from discussion post does not show tip labels on auspice.us but does on the review app
  • tree.nwk from local run of zika-tutorial - identical
  • tree.nwk from local run of rsv - identical

@tsibley
Copy link
Member Author

tsibley commented Nov 2, 2023

@victorlin Not just tip labels for the first test case you list, but the actual structure of the tree is vastly different (because the current parser misparses completely).

@tsibley
Copy link
Member Author

tsibley commented Nov 2, 2023

@joverlee521

Should we catch errors raised by the parser and display a the more specific error message?

If we could turn the obtuse parser errors (e.g. End of buffer reached) into user-actionable errors (e.g. Check that your Newick tree file is well formed.) I'm not sure how feasible that is.

@tsibley tsibley merged commit b034aea into master Nov 2, 2023
@tsibley tsibley deleted the trs/improved-newick-parsing branch November 2, 2023 23:58
@tsibley
Copy link
Member Author

tsibley commented Nov 3, 2023

Merging this since it improves the situation. We can improve user-reported errors in a subsequent PR as desired. With some light prodding of newick-js, I got these errors out of it:

Extra content beyond end of Newick tree: …
Not long enough to be a Newick string: …
End of buffer reached.

Grepping the source reveals two more that are applicable to its parsing routines:

At start of buffer.
Unexpected character "${token}" in Newick tree string at position ${buffer.pos - 1}.

These all basically boil down to "Check the syntax of your Newick tree."

jameshadfield added a commit that referenced this pull request Dec 6, 2023
Two recent issues (#71, #72) provide examples where the improved parsing
either didn't parse a valid newick tree or (much more worryingly)
returned an entirely incorrect tree structure, including nodes not
present in the newick. See those issues for details, including the tree
files. While this reversion will re-introduce bugs such as #66 and the
bug in
<https://discussion.nextstrain.org/t/displaying-trees-from-ncbi-pathogen-browser-in-auspice-us/1456/4>,
but they are lesser than the bugs introduced by #69.

This reverts commit cabba98, although
subsequent changes to package-lock.json mean it's not a clean revert.
jameshadfield added a commit that referenced this pull request Dec 6, 2023
Two recent issues (#71, #72) provide examples where the improved parsing
either didn't parse a valid newick tree or (much more worryingly)
returned an entirely incorrect tree structure, including nodes not
present in the newick. See those issues for details, including the tree
files. While this reversion will re-introduce bugs such as #66 and the
bug in
<https://discussion.nextstrain.org/t/displaying-trees-from-ncbi-pathogen-browser-in-auspice-us/1456/4>,
but they are lesser than the bugs introduced by #69.

This reverts commit cabba98, although
subsequent changes to package-lock.json mean it's not a clean revert.
jameshadfield added a commit that referenced this pull request Dec 6, 2023
Two recent issues (#71, #72) provide examples where the improved parsing
either didn't parse a valid newick tree or (much more worryingly)
returned an entirely incorrect tree structure, including nodes not
present in the newick. See those issues for details, including the tree
files. While this reversion will re-introduce bugs such as #66 and the
bug in
<https://discussion.nextstrain.org/t/displaying-trees-from-ncbi-pathogen-browser-in-auspice-us/1456/4>,
but they are lesser than the bugs introduced by #69.

This reverts commit cabba98, although
subsequent changes to package-lock.json mean it's not a clean revert.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants