Error with post-trimmed read 2 sample names from FastQC in MultiQC #690

Closed
pcantalupo opened this issue Aug 18, 2021 · 1 comment

@pcantalupo
Contributor

I ran the pipeline (v3.3) on 6 paired-end samples. The pipeline does not seem to preserve the R2 sample name after trimming. See the FastQC status check heatmaps before and after trimming. For example, JCV1_1 remains JCV1_1 but JCV1_2 becomes JCV1. The same happens for all the samples.

Before:
fastqc-status-check-heatmap_beforeTRIMGalore

After:
fastqc-status-check-heatmap_afterTRIMGalore

This also affects the General Stats table, where the FastQC trimmed values (the last three columns) end up in the wrong row.

genstats

This issue also occurred on a different RNAseq project where I used v3.0 of the pipeline.

Here are the pre- and post-trimming FastQC results. It looks like the 'Sample' column in the post-trimming FastQC table is not named correctly for the R2 read.

Pre trimming:

Sample	Filename	File type	Encoding	Total Sequences	Sequences flagged as poor quality	Sequence length	%GC	total_deduplicated_percentage	avg_sequence_length	basic_statistics	per_base_sequence_quality	per_tile_sequence_quality	per_sequence_quality_scores	per_base_sequence_content	per_sequence_gc_content	per_base_n_content	sequence_length_distribution	sequence_duplication_levels	overrepresented_sequences	adapter_content
JCV1_1	JCV1_1.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	53988221.0	0.0	35-74	48.0	51.65727699433496	73.45675394638397	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
JCV1_2	JCV1_2.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	53988221.0	0.0	35-74	49.0	56.03894332814916	73.45215231300176	pass	fail	pass	pass	fail	pass	pass	warn	warn	pass	pass
JCV3_1	JCV3_1.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	55790711.0	0.0	35-74	48.0	49.57419108155582	73.44694628466019	pass	pass	pass	pass	fail	pass	pass	warn	fail	pass	pass
JCV3_2	JCV3_2.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	55790711.0	0.0	35-74	49.0	55.62103312671023	73.44326395123375	pass	fail	pass	pass	fail	pass	pass	warn	warn	pass	pass
JCV5_1	JCV5_1.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	51641365.0	0.0	35-74	48.0	55.536616814309006	73.42772939870973	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
JCV5_2	JCV5_2.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	51641365.0	0.0	35-74	49.0	62.153167989875435	73.42946053033262	pass	fail	pass	pass	fail	pass	pass	warn	warn	pass	pass
Mock1_1	Mock1_1.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	64865850.0	0.0	35-74	48.0	47.9371879640707	73.41032418445145	pass	pass	pass	pass	fail	pass	pass	warn	fail	pass	pass
Mock1_2	Mock1_2.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	64865850.0	0.0	35-74	49.0	57.283224581998546	73.40567429548831	pass	fail	pass	pass	fail	pass	pass	warn	warn	pass	pass
Mock3_1	Mock3_1.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	46656078.0	0.0	35-74	48.0	55.6823353585729	73.4445828901435	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
Mock3_2	Mock3_2.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	46656078.0	0.0	35-74	49.0	53.2820294003623	73.43593981474396	pass	warn	pass	pass	fail	pass	pass	warn	warn	pass	pass
Mock5_1	Mock5_1.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	57498035.0	0.0	35-74	48.0	50.44614854207501	73.45285368447809	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
Mock5_2	Mock5_2.fastq.gz	Conventional base calls	Sanger / Illumina 1.9	57498035.0	0.0	35-74	49.0	55.536796769515774	73.44776766719072	pass	fail	pass	pass	fail	pass	pass	warn	warn	pass	pass

Post trimming:

Sample	Filename	File type	Encoding	Total Sequences	Sequences flagged as poor quality	Sequence length	%GC	total_deduplicated_percentage	avg_sequence_length	basic_statistics	per_base_sequence_quality	per_tile_sequence_quality	per_sequence_quality_scores	per_base_sequence_content	per_sequence_gc_content	per_base_n_content	sequence_length_distribution	sequence_duplication_levels	overrepresented_sequences	adapter_content
JCV1	JCV1_2_val_2.fq.gz	Conventional base calls	Sanger / Illumina 1.9	53936652.0	0.0	18-74	49.0	58.1674796198945	72.05477585075172	pass	fail	pass	pass	warn	pass	pass	warn	warn	pass	pass
JCV1_1	JCV1_1_val_1.fq.gz	Conventional base calls	Sanger / Illumina 1.9	53936652.0	0.0	18-74	48.0	53.96828580058934	72.11584226622001	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
JCV3	JCV3_2_val_2.fq.gz	Conventional base calls	Sanger / Illumina 1.9	55724377.0	0.0	18-74	49.0	57.65426668969561	72.0466925991833	pass	fail	pass	pass	warn	pass	pass	warn	warn	pass	pass
JCV3_1	JCV3_1_val_1.fq.gz	Conventional base calls	Sanger / Illumina 1.9	55724377.0	0.0	18-74	48.0	51.8237332193932	72.1178986352777	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
JCV5	JCV5_2_val_2.fq.gz	Conventional base calls	Sanger / Illumina 1.9	51561784.0	0.0	20-74	49.0	64.99403650656434	72.448301400898	pass	fail	pass	pass	fail	pass	pass	warn	warn	pass	pass
JCV5_1	JCV5_1_val_1.fq.gz	Conventional base calls	Sanger / Illumina 1.9	51561784.0	0.0	20-74	48.0	58.42667796286335	72.58038077192985	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
Mock1	Mock1_2_val_2.fq.gz	Conventional base calls	Sanger / Illumina 1.9	64734888.0	0.0	18-74	49.0	59.44608404392952	72.04222582419544	pass	fail	pass	pass	warn	pass	pass	warn	warn	pass	pass
Mock1_1	Mock1_1_val_1.fq.gz	Conventional base calls	Sanger / Illumina 1.9	64734888.0	0.0	18-74	48.0	50.47607122113252	72.11616595366628	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
Mock3	Mock3_2_val_2.fq.gz	Conventional base calls	Sanger / Illumina 1.9	46598471.0	0.0	18-74	48.0	55.28937387473722	72.08408889639318	pass	warn	pass	pass	warn	pass	pass	warn	warn	pass	pass
Mock3_1	Mock3_1_val_1.fq.gz	Conventional base calls	Sanger / Illumina 1.9	46598471.0	0.0	18-74	48.0	58.07780212586319	72.10868051872346	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
Mock5	Mock5_2_val_2.fq.gz	Conventional base calls	Sanger / Illumina 1.9	57435151.0	0.0	18-74	49.0	57.55793259095435	72.04206507614127	pass	fail	pass	pass	warn	pass	pass	warn	warn	pass	pass
Mock5_1	Mock5_1_val_1.fq.gz	Conventional base calls	Sanger / Illumina 1.9	57435151.0	0.0	18-74	48.0	52.662549864037864	72.11543154121767	pass	pass	pass	pass	fail	pass	pass	warn	warn	pass	pass
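
One way to spot the affected rows in a table like the one above is to compare each Sample value with the name implied by its Filename. The following is a minimal sketch, not part of the pipeline: the helper names are hypothetical, and it assumes trimmed files are named `<sample>_<read>_val_<read>.fq.gz`, as in the Filename column above.

```python
import re

# Assumption: trimmed paired-end files look like <sample>_<read>_val_<read>.fq.gz,
# matching the Filename column in the post-trimming table.
TRIMMED_NAME = re.compile(r"^(.+_[12])_val_[12]\.fq\.gz$")


def expected_sample(filename):
    """Return the sample name including its read suffix, or None if the name does not match."""
    match = TRIMMED_NAME.match(filename)
    return match.group(1) if match else None


def mismatched_rows(rows):
    """Yield (sample, filename, expected) for rows whose Sample differs from the expected name."""
    for sample, filename in rows:
        expected = expected_sample(filename)
        if expected is not None and sample != expected:
            yield sample, filename, expected


# Two (Sample, Filename) pairs copied from the post-trimming table.
rows = [
    ("JCV1", "JCV1_2_val_2.fq.gz"),
    ("JCV1_1", "JCV1_1_val_1.fq.gz"),
]

for sample, filename, expected in mismatched_rows(rows):
    print(f"{filename}: reported as {sample!r}, expected {expected!r}")
```

Only the R2 row is flagged here, which matches the pattern seen across all six samples.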

@drpatelh thinks the issue is due to this line:

- '_2_val_2'

The pipeline would have to be tested without that line to see whether removing it fixes the problem.
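
To illustrate why a `'_2_val_2'` entry could produce the names seen in the post-trimming table, here is a simplified model of truncation-style sample name cleaning. It is a sketch under stated assumptions, not MultiQC's actual implementation: the function `clean_sample_name`, the contents of both pattern lists, and the order in which patterns are applied are all hypothetical.

```python
def clean_sample_name(filename, truncate_patterns):
    """Cut the name at the first occurrence of each pattern, applied in list order.

    Simplified model only: real tools may support other cleaning modes and ordering.
    """
    name = filename
    for pattern in truncate_patterns:
        index = name.find(pattern)
        if index != -1:
            name = name[:index]
    return name


# Hypothetical pattern lists; only '_2_val_2' comes from the issue text.
with_r2_pattern = ["_2_val_2", "_val_1", "_val_2", ".fq.gz"]
without_r2_pattern = ["_val_1", "_val_2", ".fq.gz"]

for filename in ["JCV1_1_val_1.fq.gz", "JCV1_2_val_2.fq.gz"]:
    before = clean_sample_name(filename, with_r2_pattern)
    after = clean_sample_name(filename, without_r2_pattern)
    print(f"{filename}: with '_2_val_2' -> {before}, without -> {after}")
```

In this model, the `'_2_val_2'` pattern matches the R2 filename starting at the read suffix itself, so the `_2` is cut off along with the rest, while the R1 filename is unaffected. Dropping the pattern leaves both read suffixes in place.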

For reference, this issue was discussed on the rnaseq Slack channel on Monday Aug 16th 2021

@pcantalupo added the bug label on Aug 18, 2021
drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Sep 22, 2021
@drpatelh
Member

Should be fixed in drpatelh@11562dc

@drpatelh changed the title from "R2 sample names are not preserved in the FastQC trimmed Status Check heatmap nor General Stats table" to "Error with post-trimmed read 2 sample names from FastQC in MultiQC" on Sep 22, 2021
drpatelh added a commit that referenced this issue Sep 22, 2021