andv: initial addition by rneher · Pull Request #446 · nextstrain/nextclade_data

rneher · 2026-05-08T16:30:04Z

Description of proposed changes

Checklist

Check if changes affect downstream workflows which depend on this dataset. For instance, Nextstrain ingest workflows may break if clade nomenclature changes. Consider fixing those workflows, or at least opening an issue.

rneher · 2026-05-08T20:05:20Z

Trial link:
https://master.clades.nextstrain.org/?dataset-server=gh:@andv

ivan-aksamentov · 2026-05-08T20:18:54Z

Review from Codex. Claude is refusing once it sees pathogen.json

Nothing super special it seems. Posting because AI perception of the problem is interesting.

⚠️ AI-generated content below. Verify all claims.

Testing

Try in Nextclade Web:

I ran nextclade run with Docker image nextstrain/nextclade:latest, which reported nextclade 3.21.2. All three references and all three example FASTA files completed with QC ¹ status good.

| Segment | Input | Sequence | QC | Coverage | Substitutions |
| --- | --- | --- | --- | --- | --- |
| L | reference | `NC_003468.2 Andes virus segment L, complete genome` | good | 100.00% | 0 |
| L | examples | `AF291704` | good | 8.23% | 0 |
| M | reference | `NC_003467.2 Andes virus segment M, complete genome` | good | 100.00% | 0 |
| M | examples | `AF004659` | good | 14.71% | 25 |
| S | reference | `NC_003466.1 Andes virus segment S, complete sequence` | good | 100.00% | 0 |
| S | examples | `AF004660` | good | 28.86% | 20 |

These are encouraging smoke-test results: the references place cleanly, the examples run without QC failures, and the dataset is already usable for basic ANDV sequence checks.

The clade output column is empty for all six runs. This matches the tree JSONs: all three trees declare a gt coloring, but no inspected leaves have gt, clade_membership, or branch clade labels.
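A minimal sketch of how the table above and the empty-clade observation could be reproduced from a Nextclade TSV. It assumes the standard output columns `seqName`, `clade`, `qc.overallStatus`, `coverage` (reported as a fraction), and `totalSubstitutions`; the function name and the inline example rows are hypothetical and only mirror the M-segment values reported above.

```python
import csv
import io


def summarize_nextclade_tsv(tsv_text):
    """Return one summary dict per sequence row of a Nextclade TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "seqName": row["seqName"],
            "qc": row["qc.overallStatus"],
            # Assumption: coverage is a fraction between 0 and 1.
            "coverage_pct": round(float(row["coverage"]) * 100, 2),
            "substitutions": int(row["totalSubstitutions"]),
            # An empty or missing clade cell counts as "no clade assigned".
            "has_clade": bool((row.get("clade") or "").strip()),
        })
    return rows


# Illustrative rows only, mirroring the M-segment results in the table.
EXAMPLE_TSV = (
    "seqName\tclade\tqc.overallStatus\tcoverage\ttotalSubstitutions\n"
    "NC_003467.2\t\tgood\t1.0\t0\n"
    "AF004659\t\tgood\t0.1471\t25\n"
)

for summary in summarize_nextclade_tsv(EXAMPLE_TSV):
    print(summary)
```

Running the same function over the real TSV for each segment should make it easy to see at a glance whether any row has a clade assigned.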

Science

Background on the pathogen, its classification, genome organization, epidemiology, and reference strains used in this dataset.

Taxonomy and genome organization

Andes virus is represented in current NCBI taxonomy as species Orthohantavirus andesense, within genus Orthohantavirus, family Hantaviridae. Hantavirids are negative-sense RNA viruses with three genome segments encoding N, GPC, and L/RdRP proteins, summarized by Bradfute et al. 2024 [1]. The submitted dataset follows that segment structure: L uses RdRp, M uses GPC, and S uses N as defaultCds.

Reference strains and annotations

The three references are the Chile-9717869 RefSeq ² segments described by Meissner et al. 2002 [2]: L is NC_003468.2, M is NC_003467.2, and S is NC_003466.1. The NCBI records list the source organism as Orthohantavirus andesense, strain Chile-9717869, and segment-specific lengths of 6562 nt, 3671 nt, and 1871 nt. The submitted references match those lengths, and the GFF3 ³ CDS ⁴ starts/stops extracted from the submitted FASTA files are valid: L RdRp starts with ATG and ends with TAG, M GPC starts with ATG and ends with TAA, S N starts with ATG and ends with TAG, and S pNP starts with ATG and ends with TAA.
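The start/stop check described above can be sketched roughly as follows. This is not the script used for the review: it assumes forward-strand CDS features with 1-based inclusive GFF3 coordinates, adds a length-divisible-by-three check that the review does not mention, and uses a toy sequence rather than the real references. The name `check_cds` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(sequence, start, end):
    """Check a forward-strand CDS given 1-based inclusive coordinates."""
    # GFF3 coordinates are 1-based inclusive, Python slices are 0-based
    # half-open, so only the start needs shifting.
    cds = sequence[start - 1:end].upper()
    in_frame = len(cds) % 3 == 0
    return {
        "start_codon": cds[:3],
        "stop_codon": cds[-3:],
        "in_frame": in_frame,
        "valid": in_frame and cds[:3] == "ATG" and cds[-3:] in STOP_CODONS,
    }


# Toy sequence: the CDS ATGAAACCCTAG occupies positions 3..14.
toy = "CCATGAAACCCTAGGG"
print(check_cds(toy, 3, 14))
```

The sketch does not look for internal stop codons or handle reverse-strand features; both would need extra handling in a fuller check.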

Epidemiology and surveillance context

Andes virus is associated with hantavirus cardiopulmonary syndrome in southern South America, and Chilean surveillance work identified Oligoryzomys longicaudatus as the primary reservoir for ANDV South in Chile (Medina et al. 2009 [3]). Andes virus is unusual among hantaviruses because human-to-human transmission has been reported, including the Epuyén, Argentina outbreak described by Coelho et al. 2025 [4] and discussed in a systematic review by Toledo et al. 2022 [5]. This makes transparent sequence provenance useful for the trees, especially if future releases include outbreak or pre-publication sequences.

Current MV Hondius outbreak relevance

This PR lands at a useful moment for the MV Hondius response. As of 8 May 2026, ECDC reports 8 cases linked to the ship: 5 confirmed, 2 probable, 1 suspected, and 3 deaths [doc]. WHO reports Andes virus confirmed by sequencing in Geneva, 12 countries notified about earlier disembarked passengers, and 2,500 diagnostic kits sent from Argentina to laboratories in 5 countries [doc].

Country status, 6 public reports checked:

South Africa: first laboratory-confirmed patient, in intensive care after evacuation from Ascension [doc].
Switzerland: confirmed former passenger receiving care in Zurich; Geneva sequencing confirmed Andes virus [doc].
Netherlands: two Dutch deaths, one confirmed for hantavirus; three evacuated patients received from the ship [doc].
United Kingdom: one British national among evacuated suspected cases; returning British nationals routed into isolation, testing, and contact tracing [doc].
Singapore: two exposed residents isolated and tested after ship and flight exposure [doc].
Spain: Tenerife arrival prepared; Spanish residents routed to supervised quarantine in Madrid [doc].

That country spread is exactly where this dataset can help. It gives labs in different countries the same L/M/S workflow for first-pass sequence checks: segment identity, orientation, coverage, mutation summaries, and low-quality assembly flags. Standardized Nextclade TSVs are easier to compare across laboratories while epidemiological teams continue exposure reconstruction.

This dataset is not yet an outbreak tree. The submitted trees do not include sample dates, country/ship metadata, outbreak labels, or genotype/clade assignments. The immediate value is practical triage and comparable sequence interpretation across countries.

Blocking issues

Issues affecting scientific correctness, data integrity, or user-facing accuracy. These block adoption of the dataset until addressed.

🔴 H1. S README reference accession needs correction

data/nextstrain/orthohantavirus/andv/s/README.md#L7 lists NC_0034686, but the S-segment reference is NC_003466 in data/nextstrain/orthohantavirus/andv/s/pathogen.json#L16, data/nextstrain/orthohantavirus/andv/s/reference.fasta#L1, and the NCBI S-segment record NC_003466.1. The listed NC_0034686 does not correspond to the S segment and looks like a typo that introduced an extra digit.

Effect: Users reading the dataset metadata get the wrong reference accession for the S segment. This should be straightforward to correct because the source files already use the right accession elsewhere.

Fix: Change the S README reference row to NC_003466.
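A consistency check of this kind can be sketched in a few lines. The sketch assumes the FASTA header starts with a versioned accession, as in the sequence names reported in the testing table; both function names are hypothetical, and extracting the accession from the README row is left out.

```python
def fasta_accession(header):
    """Return the accession from a FASTA header line, without its version."""
    first_token = header.lstrip(">").split()[0]
    # "NC_003466.1" -> "NC_003466"
    return first_token.split(".")[0]


def readme_matches_reference(readme_accession, fasta_header):
    """True when the README accession equals the reference FASTA accession."""
    return readme_accession.strip() == fasta_accession(fasta_header)


header = ">NC_003466.1 Andes virus segment S, complete sequence"
print(readme_matches_reference("NC_0034686", header))  # value currently in the README
print(readme_matches_reference("NC_003466", header))   # proposed fix
```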

Non-blocking issues

Cosmetic issues, minor inconsistencies, and documentation improvements. Fix if time allows.

🟡 M1. README workflow links need a reachable target

All three READMEs point to https://github.com/nextstrain/andv/tree/main/nextclade at data/nextstrain/orthohantavirus/andv/l/README.md#L8, data/nextstrain/orthohantavirus/andv/m/README.md#L8, and data/nextstrain/orthohantavirus/andv/s/README.md#L8. GitHub API checks for nextstrain/andv and nextstrain/andv/contents/nextclade?ref=main returned 404 from this environment.

Fix: Update the workflow links to a public repository/path, or omit the workflow row until the workflow repository is public. Once this is reachable, the README will give users a useful provenance path for reproducing the dataset.

🟡 M2. S tree metadata can expose the pNP CDS already present in the GFF3

data/nextstrain/orthohantavirus/andv/s/genome_annotation.gff3#L8 includes the overlapping pNP CDS at 122..313, matching the RefSeq S record NC_003466.1. data/nextstrain/orthohantavirus/andv/s/tree.json#L5 only lists N and nuc under meta.genome_annotations.

Effect: The dataset can still use the GFF3 annotation, but adding pNP to the tree metadata would make the S segment annotation more complete and consistent.

Fix: Add pNP to meta.genome_annotations in the S tree, or remove it from the GFF3 if it should not be part of the dataset annotation.
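One way to detect this mismatch automatically is to compare CDS names in the GFF3 with the keys under `meta.genome_annotations`. The sketch below assumes CDS names are stored in the `Name` attribute (the real file may use a different attribute such as `gene`), and the N coordinates in the toy GFF3 are placeholders, not values taken from the PR. Function names are hypothetical.

```python
def gff3_cds_names(gff3_text):
    """Collect the Name attribute of every CDS feature in a GFF3 text."""
    names = set()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # Assumption: the CDS name lives in the "Name" attribute.
        if "Name" in attrs:
            names.add(attrs["Name"])
    return names


def missing_from_tree(gff3_text, tree):
    """Return CDS names present in the GFF3 but absent from the tree metadata."""
    annotated = set(tree.get("meta", {}).get("genome_annotations", {})) - {"nuc"}
    return sorted(gff3_cds_names(gff3_text) - annotated)


TOY_GFF3 = "\n".join([
    "##gff-version 3",
    # N coordinates are placeholders for illustration only.
    "\t".join(["NC_003466.1", "RefSeq", "CDS", "43", "1329", ".", "+", "0", "Name=N"]),
    "\t".join(["NC_003466.1", "RefSeq", "CDS", "122", "313", ".", "+", "0", "Name=pNP"]),
])
TOY_TREE = {"meta": {"genome_annotations": {"nuc": {}, "N": {}}}}

print(missing_from_tree(TOY_GFF3, TOY_TREE))
```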

🟡 M3. Trees are ready for genotype coloring once labels are added

All three trees declare meta.colorings[].key == "gt" at data/nextstrain/orthohantavirus/andv/l/tree.json#L22, data/nextstrain/orthohantavirus/andv/m/tree.json#L22, and data/nextstrain/orthohantavirus/andv/s/tree.json#L22, but all inspected leaves have accession and author attributes, not gt, clade_membership, or branch clade labels. Docker nextclade run confirms the effect: the clade output column is empty for the L, M, and S references and examples.

Effect: Users get a tree placement and QC result, which is already useful. Adding labels would make the Genotype coloring actionable in the UI and fill the clade column in CLI output.

Fix: Populate genotype/clade labels on tree nodes, or remove the gt coloring until assignments are available.
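The leaf inspection can be approximated with a short traversal. This sketch assumes an Auspice v2-style layout in which the top-level `tree` key holds a single root node with nested `children` and per-node `node_attrs`; it does not inspect branch labels, and the toy tree below is invented for illustration.

```python
def iter_leaves(node):
    """Yield every leaf (node without children) below the given node."""
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            stack.extend(children)
        else:
            yield current


def count_labelled_leaves(tree, keys=("gt", "clade_membership")):
    """Return (total leaves, leaves carrying at least one of the given keys)."""
    total = 0
    labelled = 0
    for leaf in iter_leaves(tree["tree"]):
        total += 1
        attrs = leaf.get("node_attrs", {})
        if any(key in attrs for key in keys):
            labelled += 1
    return total, labelled


# Toy tree: leaves carry accession and author only, as described above.
TOY_TREE = {
    "meta": {"colorings": [{"key": "gt"}]},
    "tree": {
        "name": "root",
        "children": [
            {"name": "A", "node_attrs": {"accession": "X1", "author": {"value": "a"}}},
            {"name": "B", "node_attrs": {"accession": "X2", "author": {"value": "b"}}},
        ],
    },
}

print(count_labelled_leaves(TOY_TREE))
```

A result where the second number is zero while `gt` is declared in `meta.colorings` would correspond to the situation reported here.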

🔵 L1. Several added text files lack trailing newlines

git diff 41c473c..HEAD -- data/nextstrain/orthohantavirus/andv data_output/nextstrain/orthohantavirus/andv reports 15 No newline at end of file markers across added source and generated dataset files.

Fix: Add trailing newlines to the source files and rebuild the generated output.
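As a rough illustration of the fix, the check and repair can be expressed on raw bytes. Both helper names are hypothetical; applying them to files on disk (for example by reading and writing bytes with `pathlib.Path`) is left to the caller, and empty files are deliberately left untouched.

```python
def lacks_trailing_newline(data: bytes) -> bool:
    """True for non-empty content that does not end with a newline byte."""
    return len(data) > 0 and not data.endswith(b"\n")


def add_trailing_newline(data: bytes) -> bytes:
    """Append a single newline only when one is missing."""
    return data + b"\n" if lacks_trailing_newline(data) else data


print(lacks_trailing_newline(b'{"files": {}}'))
print(add_trailing_newline(b'{"files": {}}'))
```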

Notes


The dataset paths nextstrain/orthohantavirus/andv/{l,m,s} are lowercase, segment-specific, registered in data/nextstrain/collection.json#L101, and appear in data_output/index.json.
This is a timely addition: it gives Nextclade users immediate L, M, and S segment support for Andes virus during an active international response.
The source and generated dataset zips include the expected seven files for each segment: CHANGELOG.md, README.md, genome_annotation.gff3, pathogen.json, reference.fasta, sequences.fasta, and tree.json.
The reference sequence is present as a tree leaf in all three segment trees: NC_003468 in L, NC_003467 in M, and NC_003466 in S.
The example sequence is present as a tree leaf for L (AF291704) and S (AF004660). The M example (AF004659) is not a leaf and will be placed at runtime, which is acceptable if intentional.
The tree leaf counts are 72 for L, 134 for M, and 103 for S. Leaf accessions and authors are populated for all leaves, but num_date values are absent, so I could not assess temporal range from the tree metadata.
The reference leaves have empty branch mutation sets and Docker nextclade run reports zero private mutations for all three references, so the reference sequences are represented in the trees.
The source pathogen.json files do not set $schema, version, compatibility, or meta; this is schema-allowed, and the generated output adds version.tag: unreleased.
The appearance of this dataset is temporally aligned with the MV Hondius Andes hantavirus response, but the submitted files do not contain outbreak-specific sample labels, dates, country metadata, or clade labels.
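The file-inventory note above could be checked with a sketch like the following. It assumes the seven files sit at the root of each zip, which the review does not state explicitly; `zip_file_report` and `build_demo_zip` are hypothetical helpers, and the demo archive is built in memory with empty files.

```python
import io
import zipfile

EXPECTED_FILES = {
    "CHANGELOG.md",
    "README.md",
    "genome_annotation.gff3",
    "pathogen.json",
    "reference.fasta",
    "sequences.fasta",
    "tree.json",
}


def zip_file_report(zip_bytes):
    """Compare the member names of a zip archive with the expected file set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        present = set(archive.namelist())
    return {
        "missing": sorted(EXPECTED_FILES - present),
        "unexpected": sorted(present - EXPECTED_FILES),
    }


def build_demo_zip(names):
    """Build an in-memory zip with empty files, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(names):
            archive.writestr(name, "")
    return buffer.getvalue()


print(zip_file_report(build_demo_zip(EXPECTED_FILES)))
```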

Glossary


1. QC (quality control). Nextclade status and scoring checks for sequence quality signals such as missing data, mixed sites, private mutations, clustered SNPs, frameshifts, and stop codons.
2. RefSeq. NCBI curated reference sequence record used here for the Chile-9717869 Andes virus segment references.
3. GFF3 (Generic Feature Format version 3). A tabular genome annotation format using 1-based inclusive coordinates for features such as genes and CDS intervals.
4. CDS (coding sequence). The nucleotide interval translated into a protein product.

References


1. Bradfute, Steven B., Charles H. Calisher, Boris Klempa, Jonas Klingström, Jens H. Kuhn, Lies Laenen, et al. 2024. "ICTV Virus Taxonomy Profile: Hantaviridae 2024." Journal of General Virology 105. https://doi.org/10.1099/jgv.0.001975
2. Meissner, J. D., J. E. Rowe, M. K. Borucki, and S. C. St Jeor. 2002. "Complete nucleotide sequence of a Chilean hantavirus." Virus Research 89:131-143. https://doi.org/10.1016/S0168-1702(02)00129-6
3. Medina, Rafael A., Fernando Torres-Perez, Heriberto Galeno, Marcela Navarrete, Pablo A. Vial, R. Eduardo Palma, et al. 2009. "Ecology, genetic diversity, and phylogeographic structure of Andes virus in humans and rodents in Chile." Journal of Virology 83:2446-2459. https://doi.org/10.1128/JVI.01057-08
4. Coelho, R., S. Kehl, N. Periolo, E. Biondo, D. Alonso, C. Perez, et al. 2025. "Virological characterization of a new isolated strain of Andes virus involved in the recent person-to-person transmission outbreak reported in Argentina." PLOS Neglected Tropical Diseases 19:e0013205. https://doi.org/10.1371/journal.pntd.0013205
5. Toledo, J., M. M. Haby, L. Reveiz, L. Sosa Leon, R. Angerami, and S. Aldighieri. 2022. "Evidence for human-to-human transmission of hantavirus: A systematic review." Journal of Infectious Diseases 226:1362-1371. https://doi.org/10.1093/infdis/jiab461

rneher · 2026-05-08T20:23:18Z

ha! merged at the same time!

But I fixed the issue in the README upstream. So with the next update (likely tomorrow or Sunday), that will be fixed.

andv: initial addition

fb5a256

rneher temporarily deployed to refs/pull/446/merge May 8, 2026 16:31 — with GitHub Actions Inactive

nextstrain-bot and others added 2 commits May 8, 2026 16:32

chore: rebuild [skip ci]

ef473e0

andv: fix examples, adjust annotation

89e1da4

rneher had a problem deploying to refs/heads/andv May 8, 2026 20:03 — with GitHub Actions Error

rneher temporarily deployed to refs/pull/446/merge May 8, 2026 20:03 — with GitHub Actions Inactive

chore: rebuild [skip ci]

4f2c262

andv: fix examples, adjust annotation

d5952c5

rneher had a problem deploying to refs/heads/andv May 8, 2026 20:10 — with GitHub Actions Error

rneher deployed to refs/pull/446/merge May 8, 2026 20:11 — with GitHub Actions Active

nextstrain-bot and others added 2 commits May 8, 2026 20:12

chore: rebuild [skip ci]

896ee40

andv: adjust private mutations threshold

23fb04c

rneher deployed to refs/heads/andv May 8, 2026 20:16 — with GitHub Actions Active

chore: rebuild [skip ci]

ca1b713

rneher merged commit 331bdbc into master May 8, 2026

rneher deleted the andv branch May 8, 2026 20:18

rneher merged 8 commits into master from andv