Review from Codex. Claude is refusing once it sees pathogen.json. Nothing super special, it seems. Posting because the AI perception of the problem is interesting.
Testing

Try in Nextclade Web: I ran
These are encouraging smoke-test results: the references place cleanly, the examples run without QC failures, and the dataset is already usable for basic ANDV sequence checks.

The Science

Background on the pathogen, its classification, genome organization, epidemiology, and reference strains used in this dataset.

Taxonomy and genome organization

Andes virus is represented in current NCBI taxonomy as species Orthohantavirus andesense, within genus Orthohantavirus, family Hantaviridae. Hantavirids are negative-sense RNA viruses with three genome segments encoding N, GPC, and L/RdRp proteins, summarized by Bradfute et al. 2024 [1]. The submitted dataset follows that segment structure: L uses RdRp, M uses GPC, and S uses N as

Reference strains and annotations

The three references are the Chile-9717869 RefSeq segments described by Meissner et al. 2002 [2]: L is NC_003468.2, M is NC_003467.2, and S is NC_003466.1. The NCBI records list the source organism as Orthohantavirus andesense, strain Chile-9717869, with segment lengths of 6562 nt (L), 3671 nt (M), and 1871 nt (S). The submitted references match those lengths, and the GFF3 CDS starts/stops extracted from the submitted FASTA files are valid:

L RdRp starts with ATG and ends with TAG.
M GPC starts with ATG and ends with TAA.
S N starts with ATG and ends with TAG.
S pNP starts with ATG and ends with TAA.

Epidemiology and surveillance context

Andes virus is associated with hantavirus cardiopulmonary syndrome in southern South America, and Chilean surveillance work identified Oligoryzomys longicaudatus as the primary reservoir for ANDV South in Chile (Medina et al. 2009 [3]). Andes virus is unusual among hantaviruses because human-to-human transmission has been reported, including the Epuyén, Argentina outbreak described by Coelho et al. 2025 [4] and discussed in a systematic review by Toledo et al. 2022 [5].
This makes transparent sequence provenance useful for the trees, especially if future releases include outbreak or pre-publication sequences.

Current MV Hondius outbreak relevance

This PR lands at a useful moment for the MV Hondius response. As of 8 May 2026, ECDC reports 8 cases linked to the ship (5 confirmed, 2 probable, 1 suspected) and 3 deaths [doc]. WHO reports Andes virus confirmed by sequencing in Geneva, 12 countries notified about earlier disembarked passengers, and 2,500 diagnostic kits sent from Argentina to laboratories in 5 countries [doc].

Country status, 6 public reports checked:
That country spread is exactly where this dataset can help. It gives labs in different countries the same L/M/S workflow for first-pass sequence checks: segment identity, orientation, coverage, mutation summaries, and low-quality assembly flags. Standardized Nextclade TSVs are easier to compare across laboratories while epidemiological teams continue exposure reconstruction.

This dataset is not yet an outbreak tree. The submitted trees do not include sample dates, country/ship metadata, outbreak labels, or genotype/clade assignments. The immediate value is practical triage and comparable sequence interpretation across countries.

Blocking issues

Issues affecting scientific correctness, data integrity, or user-facing accuracy. These block adoption of the dataset until addressed.

🔴 H1. S README reference accession needs correction

data/nextstrain/orthohantavirus/andv/s/README.md#L7 lists

Effect: Users reading the dataset metadata get the wrong reference accession for the S segment. This should be straightforward to correct because the source files already use the right accession elsewhere.

Fix: Change the S README reference row to

Non-blocking issues

Cosmetic issues, minor inconsistencies, and documentation improvements. Fix if time allows.

🟡 M1. README workflow links need a reachable target

All three READMEs point to

Fix: Update the workflow links to a public repository/path, or omit the workflow row until the workflow repository is public. Once this is reachable, the README will give users a useful provenance path for reproducing the dataset.

🟡 M2. S tree metadata can expose the pNP CDS already present in the GFF3

data/nextstrain/orthohantavirus/andv/s/genome_annotation.gff3#L8 includes the overlapping

Effect: The dataset can still use the GFF3 annotation, but adding

Fix: Add

🟡 M3. Trees are ready for genotype coloring once labels are added

All three trees declare

Effect: Users get a tree placement and QC result, which is already useful. Adding labels would make the

Fix: Populate genotype/clade labels on tree nodes, or remove the

🔵 L1. Several added text files lack trailing newlines
Fix: Add trailing newlines to the source files and rebuild the generated output.

Notes
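For the trailing-newline item (L1), a check along the following lines could confirm which files still need the fix before rebuilding. This is a minimal sketch, not part of the submitted dataset or of Nextclade: the function name `files_missing_trailing_newline` and the default list of file suffixes are assumptions chosen for illustration.

```python
from pathlib import Path

# Hypothetical default: the text-file suffixes to inspect are an assumption.
TEXT_SUFFIXES = (".md", ".json", ".gff3", ".fasta", ".tsv")


def files_missing_trailing_newline(root, suffixes=TEXT_SUFFIXES):
    """Return sorted paths under `root` whose last byte is not a newline.

    Empty files are skipped, since they have no final line to terminate.
    """
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        data = path.read_bytes()
        if data and not data.endswith(b"\n"):
            missing.append(path)
    return missing
```

Calling `files_missing_trailing_newline("data/nextstrain/orthohantavirus/andv")` would list the affected files under that dataset directory, assuming the repository is checked out locally.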
Glossary

References
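The CDS start/stop check reported under "Reference strains and annotations" could be reproduced roughly as follows. This is a sketch under stated assumptions, not the procedure the review actually ran: each CDS is assumed to be a single interval, GFF3 coordinates are treated as 1-based and inclusive, and the helper names (`cds_intervals`, `check_cds`, `reverse_complement`) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_intervals(gff3_text):
    """Extract (name, start, end, strand) for every CDS row in GFF3 text."""
    intervals = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        # Fall back to ID, then a generic label, if no Name attribute exists.
        name = attrs.get("Name") or attrs.get("ID") or "CDS"
        intervals.append((name, int(cols[3]), int(cols[4]), cols[6]))
    return intervals


def check_cds(seq, start, end, strand="+"):
    """Check start codon, stop codon, and frame for one CDS interval.

    `start` and `end` are 1-based inclusive, as in GFF3.
    """
    cds = seq.upper()[start - 1:end]
    if strand == "-":
        cds = reverse_complement(cds)
    return {
        "starts_with_atg": cds[:3] == "ATG",
        "ends_with_stop": cds[-3:] in STOP_CODONS,
        "length_multiple_of_3": len(cds) % 3 == 0,
    }
```

Given a reference sequence string and the text of its annotation file, looping over `cds_intervals(gff3_text)` and calling `check_cds` on each interval would give one small report per CDS.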
Ha! Merged at the same time! But I fixed the issue in the README upstream, so with the next update (likely tomorrow or Sunday), that will be fixed.
Description of proposed changes
Checklist