fix(export): In export schema, allow cds name with nuc prefix, as long as not equal to nuc #1434

Merged
merged 4 commits into master from fix-export-schema
Mar 14, 2024

Conversation

corneliusroemer
Member

@corneliusroemer corneliusroemer commented Mar 7, 2024

Fixes #1433

Description of proposed changes

Fixes bug #1433, introduced in v23.1.0, which causes export validation to fail when a gene name starts with nuc, e.g. nucleocapsid.

The regex excluded every string that starts with nuc, not just the exact string nuc. Adding a trailing $ inside the negative lookahead, i.e. (?!nuc$) instead of (?!nuc), fixes that.
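To make the difference concrete, here is a minimal sketch using Python's re module. The two patterns are copied from the schema-annotations.json patch quoted later in this conversation; the helper name matches and the sample gene names (other than nuc and nucleocapsid) are illustrative assumptions, and Python's regex engine is only an approximation of the one a JSON Schema validator may use.

```python
import re

# Patterns as quoted in the schema patch in this PR conversation.
OLD_PATTERN = re.compile(r"^(?!nuc)[a-zA-Z0-9*_-]+$")   # rejects anything starting with "nuc"
NEW_PATTERN = re.compile(r"^(?!nuc$)[a-zA-Z0-9*_-]+$")  # rejects only the exact string "nuc"


def matches(pattern, name):
    """Hypothetical helper: True if the name is accepted by the pattern."""
    return pattern.search(name) is not None


# Sample names; "ORF1a" and "N" are illustrative only.
for name in ["nuc", "nucleocapsid", "ORF1a", "N"]:
    print(f"{name!r}: old={matches(OLD_PATTERN, name)}, new={matches(NEW_PATTERN, name)}")
```

With the old pattern the lookahead fails as soon as the string begins with nuc, so nucleocapsid is rejected along with nuc. With the trailing $ the lookahead only fails when nuc is followed directly by the end of the string.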

Checklist

  • Checks pass
  • If making user-facing changes, add a message in CHANGES.md summarizing the changes in this PR

It seems we don't test export validation much. I looked at whether I could quickly add a test case for this bug to prevent regressions, but couldn't see a neat way. I don't think we should block this PR on the missing tests; instead, we should open an issue to improve validation testing. It would be nice to have more unit tests for export in tests/test_validate.py, and potentially to test validation in an E2E test as well.

Member

@jameshadfield jameshadfield left a comment

Can confirm that this fixes the case where keys in the branch_attrs.mutations couldn't start with "nuc" ✅

There's a related bug in the part of the schema which checks the meta.genome_annotations, where we use the same regex but fail to disallow additional properties (which is why it wasn't flagged up in validation). Could you add the following patch to this PR?

diff --git a/augur/data/schema-annotations.json b/augur/data/schema-annotations.json
index 37a9075a..e064a34b 100644
--- a/augur/data/schema-annotations.json
+++ b/augur/data/schema-annotations.json
@@ -24,8 +24,9 @@
         }
     },
     "required": ["nuc"],
+    "additionalProperties": false,
     "patternProperties": {
-        "^(?!nuc)[a-zA-Z0-9*_-]+$": {
+        "^(?!nuc$)[a-zA-Z0-9*_-]+$": {
             "$comment": "Each object here defines a single CDS",
             "type": "object",
             "oneOf": [{ "$ref": "#/$defs/startend" }, { "$ref": "#/$defs/segments" }],

P.S. The reason for the negative look-ahead is that we use a different schema for "nuc" objects vs "CDS" objects.
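The interaction described above can be sketched without a validator library. The snippet below is a simplified, standard-library model of how JSON Schema decides which keys count as "additional": a key is additional only if it is neither listed under properties nor matched by any patternProperties regex, and additional keys are allowed unless additionalProperties is false. The function name unexpected_keys, the assumption that nuc is declared under properties, and the empty placeholder objects are all illustrative, not taken from augur.

```python
import re

OLD = r"^(?!nuc)[a-zA-Z0-9*_-]+$"   # pattern before the patch
NEW = r"^(?!nuc$)[a-zA-Z0-9*_-]+$"  # pattern after the patch


def unexpected_keys(obj, properties, pattern_properties, additional_properties):
    """Hypothetical helper: keys that would be rejected as additional properties.

    Simplified model of JSON Schema semantics, not a real validator.
    """
    if additional_properties:
        # Default behaviour: unmatched keys are accepted and left unchecked.
        return []
    return [
        key
        for key in obj
        if key not in properties
        and not any(re.search(p, key) for p in pattern_properties)
    ]


# Placeholder annotation objects; the real entries carry coordinates.
annotations = {"nuc": {}, "nucleocapsid": {}}

# Old pattern, additional properties allowed: the bug goes unnoticed.
print(unexpected_keys(annotations, {"nuc"}, [OLD], True))
# Old pattern with "additionalProperties": false: the bug becomes visible.
print(unexpected_keys(annotations, {"nuc"}, [OLD], False))
# Patched pattern with "additionalProperties": false: the CDS name is accepted.
print(unexpected_keys(annotations, {"nuc"}, [NEW], False))
```

In the first case nucleocapsid matches no pattern, so the CDS subschema is never applied to it, yet nothing is reported. That is consistent with the comment above that the problem was not flagged in validation until additional properties are disallowed.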

Co-authored-by: james hadfield <hadfield.james@gmail.com>
@corneliusroemer
Member Author

Sorry for the delay @jameshadfield - I've added the patch now

@corneliusroemer corneliusroemer merged commit a9572d5 into master Mar 14, 2024
20 checks passed
@corneliusroemer corneliusroemer deleted the fix-export-schema branch March 14, 2024 19:35
Development

Successfully merging this pull request may close these issues.

Export schema wrongly fails on gene names starting with 'nuc' due to lookahead