Allow "minimal" augur export #273

Closed
trvrb opened this issue Apr 15, 2019 · 5 comments · Fixed by #1299
Labels
enhancement New feature or request moderate problem Requires an average amount of work priority: moderate To be resolved after high priority issues source: office hours Issue mentioned during office hours

Comments

@trvrb
Member

trvrb commented Apr 15, 2019

Currently, it's way harder than it should be to just get a simple auspice JSON from a Newick tree + CSV metadata. I propose to allow augur export to not require --node-data and also to not require --auspice-config. Currently, this is the Zika export:

augur export \
--tree results/tree.nwk \
--metadata results/metadata.tsv \
--node-data results/branch_lengths.json results/traits.json \
--auspice-config config/auspice_config.json \
--output-tree auspice/zika_tree.json \
--output-meta auspice/zika_meta.json

This should work (dropping --auspice-config), but does not:

augur export \
--tree results/tree.nwk \
--metadata results/metadata.tsv \
--node-data results/branch_lengths.json results/traits.json \
--output-tree auspice/zika_tree.json \
--output-meta auspice/zika_meta.json

Augur runs, but the resulting meta.json is not auspice compatible.

This should also work (dropping --node-data), but does not:

augur export \
--tree results/tree.nwk \
--metadata results/metadata.tsv \
--output-tree auspice/zika_tree.json \
--output-meta auspice/zika_meta.json

Augur complains that --node-data is required.

@huddlej
Contributor

huddlej commented Mar 14, 2020

This issue still isn't completely resolved after the augur v6 upgrade. Although users can omit metadata from the augur export v2 command, augur still prints "ERROR: read_metadata called without a filename". The corresponding code in augur/utils.py needs to be updated to output a warning instead.

Also, at least one node data JSON is required by augur export v2. The updated version of Trevor's simplest export would now be:

augur export v2 \
--tree results/tree.nwk \
--metadata results/metadata.tsv \
--output auspice/zika.json

To allow this, the following sections need to be updated in augur/export_v2.py:

@huddlej huddlej added enhancement New feature or request moderate problem Requires an average amount of work labels Mar 14, 2020
@jameshadfield
Member

Ran into this today so adding a message here as a reminder. John's summation ☝️ is still accurate I believe, although --metadata is optional!
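
For reference, the input combinations discussed so far can be sketched as a small Python helper that assembles the `augur export v2` argument list. It uses only flags quoted in this thread (`--tree`, `--metadata`, `--node-data`, `--auspice-config`, `--output`). The helper name `build_export_command` is hypothetical, and whether a given augur version accepts the minimal form depends on the fixes discussed in this issue.

```python
import shlex


def build_export_command(tree, output, metadata=None, node_data=(), auspice_config=None):
    """Assemble an `augur export v2` argument list (hypothetical helper).

    Only `tree` and `output` are mandatory here; every other input is
    appended only when the caller supplies it.
    """
    cmd = ["augur", "export", "v2", "--tree", tree]
    if metadata:
        cmd += ["--metadata", metadata]
    if node_data:
        # --node-data takes one or more JSON paths
        cmd += ["--node-data", *node_data]
    if auspice_config:
        cmd += ["--auspice-config", auspice_config]
    cmd += ["--output", output]
    return cmd


if __name__ == "__main__":
    # The "minimal" export this issue asks for: a tree and nothing else.
    minimal = build_export_command("results/tree.nwk", "auspice/zika.json")
    print(shlex.join(minimal))
```

The list form can be passed to `subprocess.run` as-is, which avoids the line-continuation pitfalls of hand-written multi-line shell commands.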

@jameshadfield
Member

jameshadfield commented Jun 8, 2021

Six monthly update 😂

PR #727 added some functional test coverage of augur export v2 including a minimal example.

  • Currently a coloring must be provided (via node-data or metadata file) as the schema requires at least 1. Note that technically the dataset file is produced, and auspice can display it, but the command exits with code 2. The schema should be relaxed here.
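
To illustrate what relaxing the schema would mean, here is a rough sketch of a "minimum number of colorings" check. It is not augur's validation code (augur validates against a JSON schema); the function name `check_colorings` and the `min_items` parameter are made up for this example, and the assumption is that colorings live in an array at `meta.colorings` in the dataset JSON.

```python
def check_colorings(dataset, min_items=1):
    """Return a list of problems found with dataset["meta"]["colorings"].

    min_items=1 mimics the strict behaviour described above;
    min_items=0 mimics a relaxed schema that accepts an empty array.
    """
    colorings = dataset.get("meta", {}).get("colorings", [])
    if not isinstance(colorings, list):
        return ["meta.colorings must be an array"]
    if len(colorings) < min_items:
        return [
            f"meta.colorings has {len(colorings)} item(s); "
            f"at least {min_items} required"
        ]
    return []


if __name__ == "__main__":
    dataset = {"meta": {"colorings": []}, "tree": {"name": "root"}}
    print("strict: ", check_colorings(dataset, min_items=1))
    print("relaxed:", check_colorings(dataset, min_items=0))
```

With the strict setting an otherwise usable dataset is reported as invalid, which matches the symptom above: the file is written and displays, but the command exits with a non-zero code.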

@huddlej
Contributor

huddlej commented Apr 13, 2023

This issue came up at office hours today (and last week): someone wanted to make a Nextclade dataset and needed to create an Auspice JSON version of their Newick tree for their custom dataset. This user did not have any node data JSON files and did not need any, but they were forced to mock up a nearly empty JSON file to get augur export to work the way they wanted.
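
The workaround described here can be sketched in a few lines of Python. This is an assumption-laden illustration: it assumes a node-data JSON is an object with a top-level `nodes` key, and that an empty mapping there counts as "nearly empty". The exact file the user wrote is not recorded in this thread, and `write_placeholder_node_data` is a hypothetical name.

```python
import json
from pathlib import Path


def write_placeholder_node_data(path):
    """Write a node-data JSON that carries no per-node attributes.

    Assumed shape: {"nodes": {}}. Whether a given augur version accepts
    this exact placeholder is not confirmed by the thread.
    """
    placeholder = {"nodes": {}}
    Path(path).write_text(json.dumps(placeholder, indent=2) + "\n")
    return placeholder


if __name__ == "__main__":
    write_placeholder_node_data("placeholder_node_data.json")
    print(Path("placeholder_node_data.json").read_text())
```

The resulting file would then be passed via `--node-data` purely to satisfy the argument check, which is the friction this issue aims to remove.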

@huddlej huddlej added the source: office hours Issue mentioned during office hours label Apr 13, 2023
@huddlej
Contributor

huddlej commented Jun 22, 2023

This issue came up again at office hours, raised by a different person. Even just making --node-data optional would be a huge win for people.

jameshadfield added a commit that referenced this issue Aug 30, 2023
These work fine in Auspice. While the 'colorings' property is optional,
`augur export v2` will always set a (possibly empty) array.

Addresses comment in #273 <#273 (comment)>
jameshadfield added a commit that referenced this issue Aug 31, 2023
These work fine in Auspice. While the 'colorings' property is optional,
`augur export v2` will always set a (possibly empty) array. I also chose
to allow the auspice config file to have an empty colorings definition,
which in practice behaves the same as leaving it out.

Addresses comment in #273 <#273 (comment)>
jameshadfield added a commit that referenced this issue Aug 31, 2023
Allows a minimal `augur export` using only a (newick) tree as input,
functionality that we've wanted for over 4 years! To facilitate this we
parse branch lengths¹ from the newick file if such data wasn't available
in the node-data inputs (e.g. because there are none!).

The code for deciding where to read divergence from has been refactored
and in the process improved: the (rare? never encountered?) case where
divergence was sometimes read from node-data keys 'mutation_length' and
sometimes from 'branch_length' can no longer happen.

If data is provided which doesn't define divergence or num_date
(regardless of whether node-data files were provided as inputs), then
the resulting dataset will fail validation.

Closes #273 <#273>

¹ I suppose these might represent time in certain cases, but I haven't
seen such data in Newick files.
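
As a rough illustration of the idea in this commit message (taking branch lengths from the Newick file and turning them into per-node divergence), here is a self-contained sketch. It is not augur's implementation: the tiny parser handles only plain `name:length` Newick without quoting or comments, the names `parse_newick` and `cumulative_divergence` are made up for the example, and treating divergence as the sum of branch lengths from the root is an assumption.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Supports unquoted names and optional ':length' suffixes only.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses in Newick string")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character {text[pos]!r}")
        # Optional node name
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length; defaults to 0.0 when absent
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in "(),":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def cumulative_divergence(node, parent_div=0.0, out=None):
    """Map each named node to the summed branch length from the root."""
    if out is None:
        out = {}
    div = parent_div + node["length"]
    if node["name"]:
        out[node["name"]] = div
    for child in node["children"]:
        cumulative_divergence(child, div, out)
    return out


if __name__ == "__main__":
    tree = parse_newick("((A:1,B:2)AB:3,C:5)root;")
    print(cumulative_divergence(tree))
```

A tree whose Newick file has no branch lengths would give every node a divergence of zero under this sketch, which is in the spirit of the caveat above about datasets that define neither divergence nor num_date.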
jameshadfield added a commit that referenced this issue Sep 20, 2023
jameshadfield added a commit that referenced this issue Sep 20, 2023