# Quality check and filtering with fastp

An alternative to `fastqc` is [fastp](https://github.com/OpenGene/fastp#simple-usage). This program lets you examine the quality of your reads and trim or filter them in a single step.

Our untrimmed reads are in `/home/gea_user/data/pre-imported/sra-fastq`:

In [None]:
ls /home/gea_user/data/pre-imported/sra-fastq

We can run fastp specifying the following options: 

- `-i`: name of our input file (the loop below stores it in a variable called `$input`)
- `-o`: name of our output file (stored in a variable called `$output`, which takes the form `filename_fastp_trimmed.gz`)
- `-q`: qualified quality Phred limit - the minimum [phred](https://en.wikipedia.org/wiki/Phred_quality_score) quality score a base needs to count as qualified; reads with too many unqualified bases are discarded (we will set this to a score of 30 - `-q 30`)
- `-l`: minimum length requirement; shorter reads are discarded (we will set this to 75 nucleotides - `-l 75`)
- `-h`: name of the HTML report (we will use the output file name with `_report.html` appended - `-h filename_fastp_trimmed.gz_report.html`)
- `-j`: name of the JSON report (we will use the output file name with `_report.json` appended - `-j filename_fastp_trimmed.gz_report.json`)
- `-w`: Threads (we will set this to 8 - `-w 8` since CyVerse nodes will typically have 8 CPUs)
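To make the `-q 30` threshold concrete, here is a minimal sketch that converts a Phred score Q into the estimated probability that a base call is wrong, using the standard relation P = 10^(-Q/10). The helper name `phred_to_error` is hypothetical and is not part of fastp.

```shell
# Convert a Phred quality score Q to an estimated base-call error
# probability using P = 10^(-Q/10).
# phred_to_error is a hypothetical helper name, not a fastp command.
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

phred_to_error 30   # prints 0.001 (1 error in 1,000 base calls)
phred_to_error 20   # prints 0.01  (1 error in 100 base calls)
```

So a threshold of 30 treats a base as qualified only when its estimated error rate is at most 1 in 1,000.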

In [None]:
for input in /home/gea_user/data/pre-imported/sra-fastq/*.fastq.gz
 do
 # Strip the directory and the .fastq.gz suffix, then add our own suffix
 output="$(basename --suffix=.fastq.gz "$input")_fastp_trimmed.gz"
 echo "$input"
 # Name both reports after the output file
 htmlreporttitle="${output}_report.html"
 jsonreporttitle="${output}_report.json"
 fastp -h "$htmlreporttitle" \
  -j "$jsonreporttitle" \
  -i "$input" \
  -o "$output" \
  -q 30 \
  -l 75 \
  -w 8
 done
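If you want to preview the file names the loop produces without running fastp, the naming step can be sketched on its own. This is only an illustration: `fastp_names` is a hypothetical helper, `SAMPLE` is a placeholder rather than one of the real input files, and the POSIX form `basename FILE SUFFIX` is used here as an equivalent of the `--suffix` form in the loop.

```shell
# Print the three file names the loop would create for one input path.
# fastp_names is a hypothetical helper; it does not run fastp.
fastp_names() {
  local sample
  # Drop the directory and the .fastq.gz suffix
  sample=$(basename "$1" .fastq.gz)
  printf '%s\n' \
    "${sample}_fastp_trimmed.gz" \
    "${sample}_fastp_trimmed.gz_report.html" \
    "${sample}_fastp_trimmed.gz_report.json"
}

# SAMPLE is a placeholder name used only for illustration
fastp_names /home/gea_user/data/pre-imported/sra-fastq/SAMPLE.fastq.gz
```

Because the names contain no directory part, all three files are written to the current working directory.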

We now have 18 output files. The `.html` files contain a report with tables and graphs, and the `.json` files hold similar summary information in a text-based format. The files ending in `_fastp_trimmed.gz` are the sequence reads that have been trimmed and are ready for pseudoalignment with Kallisto. Let's organize the files by creating new directories for them.
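Before moving anything, a quick count per file type can confirm that every input produced all three outputs. The sketch below assumes the outputs are still in the current directory; `count_fastp_outputs` is a hypothetical helper name.

```shell
# Count the files matching each fastp output pattern in a directory
# (defaults to the current directory).
# count_fastp_outputs is a hypothetical helper name.
count_fastp_outputs() {
  local dir="${1:-.}"
  local pattern n
  for pattern in '*_report.html' '*_report.json' '*_fastp_trimmed.gz'; do
    n=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$pattern" "$n"
  done
}

count_fastp_outputs
```

The three counts should be equal; a smaller number for one pattern suggests that fastp failed on at least one input.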

In [None]:
mkdir -p /home/gea_user/rna-seq-project/fastp-results
mkdir -p /home/gea_user/rna-seq-project/fastp-trimmed
mv *.html /home/gea_user/rna-seq-project/fastp-results
mv *.json /home/gea_user/rna-seq-project/fastp-results
mv *_fastp_trimmed.gz /home/gea_user/rna-seq-project/fastp-trimmed

The analysis reports from `fastp` are here:

In [None]:
ls /home/gea_user/rna-seq-project/fastp-results

The trimmed reads from `fastp` are here (and ready to be used in the Kallisto notebook):

In [None]:
ls /home/gea_user/rna-seq-project/fastp-trimmed
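As an optional sanity check, you can count how many reads survived filtering in each trimmed file. This is a minimal sketch that assumes standard four-line FASTQ records; `count_reads` is a hypothetical helper name.

```shell
# Count reads in a gzip-compressed FASTQ file, assuming each record
# spans exactly four lines. count_reads is a hypothetical helper name.
count_reads() {
  echo $(( $(gzip -cd "$1" | wc -l) / 4 ))
}

for f in /home/gea_user/rna-seq-project/fastp-trimmed/*_fastp_trimmed.gz; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  printf '%s\t%s\n' "$(basename "$f")" "$(count_reads "$f")"
done
```

These counts can be compared against the read totals shown in the corresponding HTML reports.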