First, activate the artic-ncov2019 conda environment if it is not already active:
conda activate artic-ncov2019
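If you are not sure whether the environment is active, a quick check like the one below can help. It is a minimal sketch. It assumes that `conda activate` sets CONDA_DEFAULT_ENV to the environment name, which is the usual behaviour for named environments. The helper name check_artic_env is hypothetical and not part of ARTIC or conda.

```shell
# Hypothetical helper: returns non-zero if the artic-ncov2019 environment
# does not look active or if the 'artic' command cannot be found.
check_artic_env() {
  # conda normally sets CONDA_DEFAULT_ENV to the name of the active environment
  if [ "${CONDA_DEFAULT_ENV:-}" != "artic-ncov2019" ]; then
    echo "artic-ncov2019 is not the active conda environment" >&2
    return 1
  fi
  # The environment should provide the 'artic' executable on PATH
  if ! command -v artic >/dev/null 2>&1; then
    echo "'artic' was not found on PATH" >&2
    return 1
  fi
  return 0
}
```

Run `check_artic_env` before the next step. If it prints a message, activate the environment again.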
Then filter the reads with the command artic guppyplex, using the following parameters:
Description | Parameter | Our value |
---|---|---|
Input directory containing the basecalled reads | --directory | ~/workdir/data_artic/basecall_01/ |
Output FASTQ file | --output | ~/workdir/data_artic/basecall_filtered_01.fastq |
Minimum read length (bases) | --min-length | 400 |
Maximum read length (bases) | --max-length | 700 |
(optional) Skip the read quality check | --skip-quality-check | set (flag, takes no value) |
Because the quality check was already done during basecalling, we can pass the --skip-quality-check flag. This shortens the runtime; since the reads were already quality-checked, the output should change very little.
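To illustrate what --min-length and --max-length do, here is a rough awk sketch of the length filter on its own. It keeps a read only if its length lies between the two bounds (inclusive). It is not a replacement for guppyplex, which also does other work, such as merging all FASTQ files from the input directory into one output file and, unless skipped, the quality check. The sketch assumes unwrapped FASTQ (exactly four lines per read), and the function name length_filter is hypothetical.

```shell
# Hypothetical helper: length_filter MIN MAX [FILE...]
# Keeps FASTQ records whose sequence length is within [MIN, MAX] (inclusive).
# Reads stdin if no files are given. Assumes 4 lines per record.
length_filter() {
  local min_len="$1" max_len="$2"
  shift 2
  awk -v min="$min_len" -v max="$max_len" '
    NR % 4 == 1 { header = $0 }   # @read_id line
    NR % 4 == 2 { seq = $0 }      # sequence line
    NR % 4 == 3 { plus = $0 }     # "+" separator line
    NR % 4 == 0 {                 # quality line completes the record
      len = length(seq)
      if (len >= min && len <= max) print header "\n" seq "\n" plus "\n" $0
    }' "$@"
}
```

For example, `length_filter 400 700 < reads.fastq > reads_400_700.fastq` would keep only the reads in the amplicon-sized window used above.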
To perform the filtering for one dataset, we can use the following command:
artic guppyplex --skip-quality-check --min-length 400 --max-length 700 --directory ~/workdir/data_artic/basecall_01/ --output ~/workdir/data_artic/basecall_filtered_01.fastq
To save time, run this step for the first dataset (01) only, and process the other datasets later if time permits.
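To check that the filtered file looks as expected, you can count its reads and report the shortest and longest one. This is a minimal sketch that assumes unwrapped FASTQ (four lines per read); fastq_length_summary is a hypothetical helper name.

```shell
# Hypothetical helper: prints "reads=N min=A max=B" for a FASTQ file.
# Assumes 4 lines per record, so every 2nd line of a record is the sequence.
fastq_length_summary() {
  awk 'NR % 4 == 2 {
         len = length($0); n++
         if (n == 1 || len < min) min = len
         if (n == 1 || len > max) max = len
       }
       END {
         if (!n) { print "reads=0"; exit }
         printf "reads=%d min=%d max=%d\n", n, min, max
       }' "$1"
}
```

For the guppyplex output, for example `fastq_length_summary ~/workdir/data_artic/basecall_filtered_01.fastq`, the reported minimum should be at least 400 and the maximum at most 700.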
If you want to process all datasets, you can use a loop:
for i in {1..5}; do artic guppyplex --skip-quality-check --min-length 400 --max-length 700 --directory ~/workdir/data_artic/basecall_0$i/ --output ~/workdir/data_artic/basecall_filtered_0$i.fastq; done
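A somewhat more defensive variant of the loop is sketched below. It assumes the same layout of basecall_NN/ input directories and basecall_filtered_NN.fastq output files as above. It zero-pads the dataset number with printf, so it would also work for datasets 10 and above, where "0$i" would produce "010". It skips missing input directories and stops at the first failing run. Setting DRY_RUN=1 prints the commands instead of running them. The function names and the DRY_RUN variable are hypothetical, not part of ARTIC.

```shell
# Print the command instead of running it when DRY_RUN=1.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Filter datasets 01..05 with the parameters used above.
# $1: base directory (default: ~/workdir/data_artic).
filter_all_datasets() {
  local base="${1:-$HOME/workdir/data_artic}"
  local i n
  for i in 1 2 3 4 5; do
    n="$(printf '%02d' "$i")"   # zero-pad: 1 -> 01
    if [ ! -d "$base/basecall_$n" ]; then
      echo "skipping $base/basecall_$n (not found)" >&2
      continue
    fi
    # Stop at the first failing run instead of silently continuing
    run_or_echo artic guppyplex --skip-quality-check \
      --min-length 400 --max-length 700 \
      --directory "$base/basecall_$n/" \
      --output "$base/basecall_filtered_$n.fastq" || return 1
  done
}
```

Running `DRY_RUN=1 filter_all_datasets` first lets you check the generated commands before starting the actual filtering with `filter_all_datasets`.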
In the next step, we use the filtered reads to generate consensus sequences.
ARTIC bioinformatics SOP https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html