05_COMAPPING
I decided to go back and reformat the contigs so that each contig name carries a prefix indicating which sample it belongs to. The reasoning is that this might let us build a central anvi'o profile that includes contigs from all samples.
One issue I saw was that a minimum contig length cutoff of 2500 (-l 2500) eliminated 70% of the contigs in the sample. This could be a huge disadvantage, as we are essentially throwing away the majority of the contigs (though not necessarily the majority of the assembled bases) at this step. But I tried it anyway.
for SET in `cat /export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/AMOR_2019`; \
do anvi-script-reformat-fasta $SET/$SET.final.contigs.fa -l 2500 \
--simplify-names -o $SET/$SET.fa; anvi-gen-contigs-database \
--num-threads 40 -f $SET/$SET.fa -o $SET/$SET.contigs.db; \
done
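To judge how much a given cutoff really costs, it helps to count retained bases as well as retained contigs. The function below is a minimal sketch of such a check. The name contig_retention is hypothetical and not part of anvi'o, and it assumes that contigs at least as long as the cutoff are the ones kept by -l.

```shell
# Hypothetical helper (not an anvi'o tool): report how many contigs and bases
# would survive a minimum-length cutoff.
# Usage: contig_retention <fasta> <min_length>
# Assumption: a contig is kept when its length is >= min_length.
contig_retention() {
    awk -v min="$2" '
        # Add the sequence that just ended to the running totals.
        function tally() {
            if (seen) {
                n++; bases += len
                if (len >= min) { kept++; kept_bases += len }
            }
        }
        /^>/ { tally(); seen = 1; len = 0; next }   # header starts a new contig
        { len += length($0) }                       # sequence line (may be wrapped)
        END {
            tally()
            if (n == 0 || bases == 0) { print "no sequence found"; exit 1 }
            printf "contigs kept: %d/%d (%.1f%%)\n", kept, n, 100 * kept / n
            printf "bases kept: %d/%d (%.1f%%)\n", kept_bases, bases, 100 * kept_bases / bases
        }
    ' "$1"
}
```

For example, contig_retention $SET/$SET.final.contigs.fa 2500 prints both percentages for one sample, so a 70% loss in contig count can be compared with the loss in total sequence.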
# redid the contig databases after reformatting the contigs from each sample with
# their respective sample name prefixes (changes "-" to "_" in these names):
for SET in `cat /export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/AMOR_2020_Good`; \
do bar=${SET//-/_}; anvi-script-reformat-fasta $SET/final.contigs.fa -l 1000 \
--simplify-names --prefix c_$bar -o $SET/$SET.prefixed.fa; \
done
for SET in `cat /export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/AMOR_2020_Good`; \
do anvi-gen-contigs-database --num-threads 40 -f $SET/$SET.prefixed.fa \
-o $SET/$SET.prefixed.contigs.db& \
done
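The prefixes produced by the loops above can be previewed before running anvi'o, which makes it easy to spot sample names that would give awkward or colliding prefixes. This is a minimal sketch: the function names are hypothetical, and tr is used in place of the bash-only ${SET//-/_} substitution so that it also runs in plain sh.

```shell
# Hypothetical helper: print the contig-name prefix a sample would receive,
# i.e. "c_" followed by the sample name with "-" changed to "_".
sample_prefix() {
    printf 'c_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

# Print "sample<TAB>prefix" for every sample in a list file (one name per line).
preview_prefixes() {
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue   # skip blank lines
        printf '%s\t%s\n' "$sample" "$(sample_prefix "$sample")"
    done < "$1"
}
```

Running preview_prefixes on the sample list file shows the mapping for every sample without touching any FASTA files.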
2. Second, I made a concatenated contigs file that includes all contigs of all samples we want to use in this study.
I concatenated the prefixed contig files from each sample subdirectory. A disadvantage of this method is that similar contigs assembled from different samples are probably all present in the merged file, so many reads can map equally well to several contigs, and this could make the data "noisy and difficult" (https://groups.google.com/g/anvio/c/G-PXEjqcbmc?pli=1).
find . -type f -name '*prefixed.fa' -exec cat {} + > merged.contigs.fa
anvi-script-reformat-fasta -l 2500 merged.contigs.fa -o merged.contigs2500.fa
anvi-gen-contigs-database --num-threads 40 -f merged.contigs2500.fa -o merged.contigs2500.db
This is what I would call "co-mapping" rather than "co-assembly": the contigs come from the individual assemblies, and reads from all samples are mapped to them. The aim is to improve the differential coverage information for binning later.
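The merged FASTA only works if the sample prefixes keep every contig name unique, so a quick sanity check before building the contigs database can save a failed run. The sketch below uses a hypothetical function name, duplicate_contig_names, and compares only the part of each header before the first whitespace.

```shell
# Hypothetical helper: list contig names that occur more than once in a FASTA.
# Prints nothing and returns 0 when every name is unique; prints the
# duplicated names and returns 1 otherwise.
duplicate_contig_names() {
    dups=$(grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//' | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf '%s\n' "$dups"
        return 1
    fi
}
```

For example, duplicate_contig_names merged.contigs.fa should print nothing if the prefixing step did its job.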
# Build the bowtie2 databases
for SET in `cat AMOR_2020_Good`; do bowtie2-build 03_INDIV_ASSEMBLY/$SET/$SET.prefixed.fa 05_COMAPPING/$SET --threads 20; done
# Create one folder per sample to hold the mappings, named after the sample whose assembly is mapped to.
for i in $samples; do mkdir $i; done
# Perform the mapping. Each sample has its folder. Each folder contains mapping data for all samples.