Step by step running susCovONT

Marit Hetland edited this page Mar 30, 2021 · 1 revision

This is a step-by-step guide to running the pipeline and transferring the resulting files to another server for storage. It was written for lab personnel at susamr and may be useful for people who are not particularly familiar with the command line. Otherwise, please just use the normal How to run instructions.

Running susCovONT on a Linux computer

When you have copied the fast5_pass and fastq_pass folders and the sequencing_summary*.txt file to the Linux computer (see GridION: Transferring files):

  1. Set the name of the run folder (RUN_NAME) and the path where you run your COVID runs as variables:
RUN_NAME=20210213_1359_X5_FAO88697_5cf6e6f0     ### Change to your current run name
ONT_COVID_PATH=/media/susamr/maggie/ONT_covid   ### Change to the path where you run all susCovONT run folders
  2. Go to that location on your Linux computer:
cd ${ONT_COVID_PATH}/${RUN_NAME}/
  3. Then run the susCovONT pipeline:
susCovONT -i . -s sample_names.csv

Here, -i . is the input directory you cd-ed into in step 2 (named ${RUN_NAME}, for example 20210213_1359_X5_FAO88697_5cf6e6f0), and -s sample_names.csv is a file that links barcodes to sample names. It should look like:

barcode,sample_name
barcode01,NK
barcode02,V123456_P1
barcode03,E234567_P1
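A quick way to catch a malformed sample sheet before starting a run is sketched below. The function name check_sample_sheet is hypothetical and not part of susCovONT, and the rules it enforces (exact header, two comma-separated fields, a barcode followed by digits) are assumptions read off the example above rather than the pipeline's documented requirements. Carriage returns are stripped before checking because files saved from Excel on Windows typically use CRLF line endings.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of susCovONT: check the layout of a sample sheet.
# Assumed rules: header "barcode,sample_name", then lines like "barcode01,NK".
check_sample_sheet() {
    local sheet="$1"
    [ -s "$sheet" ] || { echo "Missing or empty file: $sheet" >&2; return 1; }
    # Strip carriage returns, then validate line by line with awk.
    tr -d '\r' < "$sheet" | awk -F',' '
        NR == 1 {
            if ($0 != "barcode,sample_name") { print "Bad header: " $0; bad = 1 }
            next
        }
        $0 == "" { next }   # ignore blank lines
        NF != 2 || $1 !~ /^barcode[0-9]+$/ || $2 == "" {
            print "Bad line " NR ": " $0
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

Running check_sample_sheet sample_names.csv prints any offending lines and returns a non-zero status if something looks wrong.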

You can make the sample_names.csv file in two ways: either in Excel, saving it as CSV (comma separated), or in the terminal:

nano sample_names.csv

Type in the header "barcode,sample_name" and then one line per barcode (barcode01,NK etc.). When you are finished in nano, press Ctrl+X, then Y, then Enter to save the file.
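Before starting the pipeline, it can help to confirm that the run folder holds everything mentioned above. The following is a minimal sketch: check_run_folder is a hypothetical name, and it assumes that the sequencing_summary*.txt file and sample_names.csv sit directly in the run folder next to fast5_pass and fastq_pass.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of susCovONT.
# Assumes all inputs sit directly inside the run folder.
check_run_folder() {
    local dir="${1:-.}" missing=0 sub
    for sub in fast5_pass fastq_pass; do
        [ -d "$dir/$sub" ] || { echo "Missing folder: $dir/$sub" >&2; missing=1; }
    done
    # compgen -G succeeds only if the pattern matches at least one path
    compgen -G "$dir/sequencing_summary*.txt" > /dev/null \
        || { echo "Missing file: $dir/sequencing_summary*.txt" >&2; missing=1; }
    [ -f "$dir/sample_names.csv" ] \
        || { echo "Missing file: $dir/sample_names.csv" >&2; missing=1; }
    return "$missing"
}
```

From inside the run folder, check_run_folder . && susCovONT -i . -s sample_names.csv would start the pipeline only when nothing is reported missing.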

After running susCovONT

When susCovONT has run successfully and you have your output report, move the files to your server of choice for storage. This assumes you can connect to the server via Samba from the Linux computer.

  1. On the Linux, go into the folder on the server where you store the completed Covid-runs, for example:
cd /run/user/1000/gvfs/smb-share:server=yourserver.net,share=location/folder/Covid_genomics/CovSeq_2021/

(or open the folder in Files and right-click + "Open in Terminal")

  2. If not already created, make a folder for the run:
mkdir ${RUN_NAME}
  3. Then copy the files (check that ONT_COVID_PATH and RUN_NAME are set for your current run):
rsync -chavzP --stats --no-perms --omit-dir-times ${ONT_COVID_PATH}/${RUN_NAME}/*  ./${RUN_NAME}/
  4. When the files have been copied to the server, you can remove them from the Linux computer:
rsync -chavzP --stats --no-perms --omit-dir-times --remove-source-files ${ONT_COVID_PATH}/${RUN_NAME}/*  ./${RUN_NAME}/

When the files have been stored on your server, check that they have transferred and are not empty, for example with ls fast5_pass and cat fast5_pass/*/*5 | head. You can then delete the run folder for that run from the GridION and the Linux computer.
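The manual check can be complemented by a scripted comparison, run after the copy in step 3 and before the source files are removed in step 4. The sketch below is an assumption-laden illustration: verify_copy and list_files are hypothetical names, it relies on GNU find (for -printf and -quit), and it compares only relative file names and sizes, so it would not detect corrupted content of the right size.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of susCovONT. Requires bash and GNU find.

# Print "relative/path<TAB>size" for every file under a folder, sorted.
list_files() {
    (cd "$1" && find . -type f -printf '%P\t%s\n' | LC_ALL=C sort)
}

# Compare a source folder with its copy by file name and size,
# and flag empty files on the destination.
verify_copy() {
    local src="$1" dst="$2"
    if ! diff <(list_files "$src") <(list_files "$dst") >&2; then
        echo "File names or sizes differ between $src and $dst" >&2
        return 1
    fi
    if [ -n "$(find "$dst" -type f -empty -print -quit)" ]; then
        echo "Empty files found in $dst" >&2
        return 1
    fi
    echo "Copy looks complete"
}
```

With the variables from the earlier steps still set, this could be called from the server folder as verify_copy "${ONT_COVID_PATH}/${RUN_NAME}" "./${RUN_NAME}".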

Update master FASTA and REPORT files

In the CovSeq_2021 directory there are three files:

  • A REPORT.CSV (CovSeq_2021__all_report.csv) file with the output-report (lineage, qc, etc.) for all genomes
  • A REPORT.CSV (CovSeq_2021__nonFAIL_report.csv) file with the output-report (lineage, qc, etc.) for all PASS or WARN genomes (i.e. not FAIL)
  • A FASTA (CovSeq_2021__nonFAIL.fasta) file with all genomes that have QC_status PASS or WARN (i.e. not FAIL)

The last thing to do is to update these files. Do this by:

Update the REPORT.CSV files with the latest run:

cd /run/user/1000/gvfs/smb-share:server=yourserver.net,share=location/folder/Covid_genomics/CovSeq_2021/
cat ${RUN_NAME}/*_report.csv | grep -v "^Sample" >> CovSeq_2021__all_report.csv
cat ${RUN_NAME}/*_report.csv | grep -v "^Sample" |  grep -v "FAIL" >> CovSeq_2021__nonFAIL_report.csv

Update the FASTA file with the latest run:

cd /run/user/1000/gvfs/smb-share:server=yourserver.net,share=location/folder/Covid_genomics/CovSeq_2021/
cat ${RUN_NAME}/003_consensusFasta/*_sequences.fasta >> CovSeq_2021__nonFAIL.fasta
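After appending, counting the entries in the master files gives a simple sanity check, for instance to notice that a run was appended twice. This is only a sketch with hypothetical function names. Whether the FASTA count should equal the nonFAIL report count depends on whether the *_sequences.fasta files contain only PASS and WARN genomes, which this page does not state, so treat a mismatch as a prompt to look closer rather than as an error.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of susCovONT.

# Number of sequences in a FASTA file (lines starting with ">").
count_fasta_records() {
    grep -c '^>' "$1" || true   # grep -c prints 0 but returns non-zero when nothing matches
}

# Number of non-empty data rows in a report, skipping the first (header) line.
count_report_rows() {
    tail -n +2 "$1" | grep -c . || true
}
```

For example, count_fasta_records CovSeq_2021__nonFAIL.fasta and count_report_rows CovSeq_2021__nonFAIL_report.csv can be run before and after the update to see how many entries the latest run added.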

Remake the MASTER files

If you accidentally delete or overwrite any of these files, you can easily remake them with the commands below. You do not need to change any paths or names here, so just copy and paste the code:

cd /run/user/1000/gvfs/smb-share:server=yourserver.net,share=location/folder/Covid_genomics/CovSeq_2021/

#Remove the files that are to be recreated
rm CovSeq_2021__all_report.csv CovSeq_2021__nonFAIL_report.csv CovSeq_2021__nonFAIL.fasta

#Create report for all runs
cat */*_report.csv | head -1 >> CovSeq_2021__all_report.csv
cat */*_report.csv | grep -v "^Sample" >> CovSeq_2021__all_report.csv

#Then create report of those with only PASS and WARN QC
cat CovSeq_2021__all_report.csv |  grep -v "FAIL" >> CovSeq_2021__nonFAIL_report.csv

#Create FASTA file with all of the PASS and WARN genomes:
cat */003_consensusFasta/*_sequences.fasta >> CovSeq_2021__nonFAIL.fasta
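The report rebuild above can also be sketched with awk, which keeps the header from the first report only and so avoids the separate head and grep steps. The function names merge_reports and drop_fail are hypothetical, and the sketch assumes every *_report.csv starts with exactly one header line. Because the output is written with > rather than >>, the master files are overwritten and do not need to be removed first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of susCovONT.

# Concatenate report files, keeping the header line of the first file only.
# Assumes each input file begins with exactly one header line.
merge_reports() {
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Keep the header plus every row that does not mention FAIL.
drop_fail() {
    awk 'NR == 1 || !/FAIL/'
}
```

From the CovSeq_2021 directory, usage could look like merge_reports */*_report.csv > CovSeq_2021__all_report.csv followed by drop_fail < CovSeq_2021__all_report.csv > CovSeq_2021__nonFAIL_report.csv.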