eolesin edited this page Dec 3, 2021 · 3 revisions

Carbon oxidation state investigations of whole genomes

Starting from the human-decontaminated reads (the `02_HUMAN_Decontam` output), we deduplicate the reads. Information on read duplication is important for read mapping and abundance estimates, but for the carbon oxidation state calculations we want only non-duplicated reads.
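
Before deduplicating, it can help to check how much exact duplication the cleaned reads actually contain. The sketch below is a hypothetical helper (`count_exact_dup_pairs` is not part of any tool used here). It assumes uncompressed, 4-line-per-record FASTQ files with R1 and R2 in the same order, and treats a pair as a duplicate only when both its R1 and R2 sequences exactly match an earlier pair. This is plain exact matching, not what cd-hit-est does: cd-hit-est clusters reads by a sequence-identity threshold, so its output need not match this count.

```shell
# Hypothetical helper: print "<total pairs> <exact duplicate pairs>" for a
# paired-end FASTQ set. Assumes uncompressed 4-line FASTQ records and that
# R1 and R2 list the reads in the same order.
count_exact_dup_pairs() {
  local r1="$1" r2="$2"
  # paste joins R1 and R2 line by line (tab-separated); awk keeps only the
  # sequence line (line 2 of each 4-line record) and counts repeated pairs.
  # Note: every distinct pair is held in memory, so large files need RAM.
  paste "$r1" "$r2" \
    | awk -F'\t' 'NR % 4 == 2 { key = $1 "\t" $2; total++; if (seen[key]++) dup++ }
                  END { printf "%d %d\n", total, dup + 0 }'
}

# Example (paths from the command on this page):
# count_exact_dup_pairs 02_HUMAN_Decontam/GS19-ROV16-BS04-cleanR1.fq \
#                       02_HUMAN_Decontam/GS19-ROV16-BS04-cleanR2.fq
```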

```shell
# dereplicate Illumina paired-end reads
conda activate cd-hit
cd-hit-est -i 02_HUMAN_Decontam/GS19-ROV16-BS04-cleanR1.fq -j 02_HUMAN_Decontam/GS19-ROV16-BS04-cleanR2.fq -o 13_CDHIT/BS04_cdhitout_R1 -op 13_CDHIT/BS04_cdhitout_R2 -M 1000000
```
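
A simple sanity check after the run is to compare the number of records going into and coming out of cd-hit-est. The helper below is a hypothetical sketch (`count_records` is not part of CD-HIT). Because this page does not state whether the output files are FASTA or FASTQ, it detects the format from the first character: a leading `>` is counted as FASTA by header lines, and anything else is assumed to be uncompressed 4-line FASTQ and counted as lines / 4. Counting lines that start with `@` would be unsafe for FASTQ, since quality strings can also begin with `@`.

```shell
# Hypothetical helper: count sequence records in a FASTA or FASTQ file.
# Assumption: FASTQ input is uncompressed with exactly 4 lines per record.
count_records() {
  local f="$1"
  if [ "$(head -c 1 "$f")" = ">" ]; then
    # FASTA: one header line per record
    grep -c '^>' "$f"
  else
    # FASTQ: 4 lines per record; $(( )) also strips wc's padding on macOS
    echo $(( $(wc -l < "$f") / 4 ))
  fi
}

# Example: compare input and output read counts for R1
# count_records 02_HUMAN_Decontam/GS19-ROV16-BS04-cleanR1.fq
# count_records 13_CDHIT/BS04_cdhitout_R1
```

The R1 and R2 outputs should report the same number of records, since the reads are kept or removed as pairs.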
