This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

slow validation caused by string replaceAll to remove nulls #120

Merged


awgymer
Collaborator

@awgymer awgymer commented Oct 18, 2023

Fixes #90

It turns out that the replaceAll call used to remove null values from the JSON string representation of the samplesheet is the bottleneck here.

Using a JsonGenerator built with .excludeNulls() to serialize the object to a string, and then parsing that string back into a JSONArray, solves this.
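For illustration, here is a minimal sketch of the JsonGenerator approach, not the exact code from this change. The rows and column names are hypothetical, and JsonSlurper stands in for the JSONArray round trip so the snippet runs with only the Groovy standard library:

```groovy
import groovy.json.JsonGenerator
import groovy.json.JsonSlurper

// Hypothetical rows, shaped like a single-end samplesheet where the
// second FASTQ column is empty and therefore comes through as null.
def rows = [
    [sample: 'S1', fastq_1: 's1_R1.fastq.gz', fastq_2: null],
    [sample: 'S2', fastq_1: 's2_R1.fastq.gz', fastq_2: null],
]

// Build a generator that drops null-valued entries during serialization,
// so no regex pass over the resulting string is needed.
def generator = new JsonGenerator.Options()
    .excludeNulls()
    .build()

String json = generator.toJson(rows)

// Parse the string back into objects (assumption: the real code builds a
// JSONArray from this string instead of using JsonSlurper).
def parsed = new JsonSlurper().parseText(json)

// The null-valued key is gone from every row; the other keys remain.
assert parsed.every { !it.containsKey('fastq_2') }
assert parsed*.sample == ['S1', 'S2']
```

The point of the design is that nulls are skipped while the string is being written, rather than stripped afterwards by pattern matching over the whole serialized samplesheet.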

Using a single-end samplesheet with 60 rows:

Before (null removal takes roughly 10 minutes):

Oct-18 11:57:31.911 [main] DEBUG nextflow.validation.SchemaValidator - Starting validation: 'samplesheet.se.60.csv' with 'assets/schema_input.json'
Oct-18 11:57:31.916 [main] DEBUG nextflow.validation.SchemaValidator - Removing null
Oct-18 12:07:39.610 [main] DEBUG nextflow.validation.SchemaValidator - nulls removed
Oct-18 12:07:39.613 [main] DEBUG nextflow.validation.SchemaValidator - Validation passed: 'samplesheet.se.60.csv' with 'assets/schema_input.json'

After (null removal takes about 1 ms):

Oct-18 12:29:44.364 [main] DEBUG nextflow.validation.SchemaValidator - Starting validation: 'samplesheet.se.60.csv' with 'assets/schema_input.json'
Oct-18 12:29:44.369 [main] DEBUG nextflow.validation.SchemaValidator - Removing null
Oct-18 12:29:44.370 [main] DEBUG nextflow.validation.SchemaValidator - nulls removed
Oct-18 12:29:44.373 [main] DEBUG nextflow.validation.SchemaValidator - Validation passed: 'samplesheet.se.60.csv' with 'assets/schema_input.json'

N.B. Interestingly, if you use a YAML samplesheet rather than a CSV, you do not see the same issue. This leads me to suspect that null values in YAML input are parsed differently at some point before this step:

Oct-18 11:47:07.491 [main] DEBUG nextflow.validation.SchemaValidator - Starting validation: 'samplesheet.se.60.yaml' with 'assets/schema_input.json'
Oct-18 11:47:07.502 [main] DEBUG nextflow.validation.SchemaValidator - Removing null
Oct-18 11:47:07.503 [main] DEBUG nextflow.validation.SchemaValidator - nulls removed
Oct-18 11:47:07.503 [main] DEBUG nextflow.validation.SchemaValidator - Validation passed: 'samplesheet.se.60.yaml' with 'assets/schema_input.json'

@nvnieuwk
Collaborator

Looks good to me! Thanks @awgymer

@nvnieuwk nvnieuwk merged commit 2abce0d into nextflow-io:master Oct 23, 2023
3 checks passed
Successfully merging this pull request may close these issues.

Very slow validation for single-end fastq samplesheets