05_COMAPPING

0. Create nice deflines and build the contig databases for both the 2019 and 2020 samples.

I decided to go back and reformat the contigs so that each contig name carries a prefix indicating the sample it came from. The reasoning is that this would let us build a single, central anvi'o profile that includes contigs from all samples, while still being able to tell which sample each contig belongs to.
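Because every contig name then starts with its sample, the sample can be recovered from the name alone. A minimal sketch, assuming names of the form `c_<sample>_<number>`: the `c_$bar` prefix used below, followed by an underscore and a running number appended by `anvi-script-reformat-fasta` (that last part is an assumption about its output). `sample_of_contig` and the sample name `AMOR_2020_S1` are hypothetical.

```shell
#!/usr/bin/env bash
# Recover the sample name from a prefixed contig name.
# Assumes names of the form c_<sample>_<number>, e.g. c_AMOR_2020_S1_000000000012
# (hypothetical sample name); underscores inside <sample> are kept.
sample_of_contig() {
    local contig=$1
    local prefix=${contig%_*}      # drop the trailing _<number>
    printf '%s\n' "${prefix#c_}"   # drop the leading c_
}
```

Only the last underscore-separated field is removed, so sample names that themselves contain underscores (after the `-` to `_` conversion) come back intact.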

One issue I saw was that a minimum contig length cutoff of 2500 bp was eliminating 70% of the contigs in the sample. This could be a serious disadvantage, as we would be throwing away most of the contigs at this step. The fraction of assembled bases lost is smaller than 70%, though, because only the shortest contigs are dropped. I tried it anyway.
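To see how much a cutoff actually costs, it helps to compare the fraction of contigs removed with the fraction of bases removed. A minimal sketch using `awk` on a plain (possibly line-wrapped) FASTA file; `length_cutoff_loss` is a hypothetical helper, not an anvi'o command.

```shell
#!/usr/bin/env bash
# Report what a minimum-length cutoff would discard from a FASTA file:
# the fraction of contigs and the fraction of assembled bases below the cutoff.
length_cutoff_loss() {
    local fasta=$1 cutoff=$2
    awk -v min="$cutoff" '
        # Add one finished record to the running totals
        function tally(l) {
            n++; total += l
            if (l < min) { n_short++; b_short += l }
        }
        # A header line closes the previous record and starts a new one
        /^>/ { if (seen) tally(len); len = 0; seen = 1; next }
        # Sequence lines (possibly wrapped) add to the current record length
        { len += length($0) }
        END {
            if (seen) tally(len)
            if (n == 0 || total == 0) { print "no sequences found" > "/dev/stderr"; exit 1 }
            printf "contigs_removed=%.3f bases_removed=%.3f\n", n_short / n, b_short / total
        }' "$fasta"
}
```

For example, `length_cutoff_loss "$SET/final.contigs.fa" 2500` prints both fractions for one assembly, which makes it easy to compare cutoffs such as 1000 and 2500 before committing to one.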

1. First, I made individual contig databases for each of the samples.

for SET in $(cat /export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/AMOR_2019); do
    anvi-script-reformat-fasta "$SET/$SET.final.contigs.fa" -l 2500 \
        --simplify-names -o "$SET/$SET.fa"
    anvi-gen-contigs-database --num-threads 40 -f "$SET/$SET.fa" -o "$SET/$SET.contigs.db"
done

# Redo the contig databases after reformatting the contigs of each sample with a
# sample-name prefix. Hyphens in sample names are changed to underscores
# (${SET//-/_}), since anvi'o expects contig names made of letters, digits and
# underscores. Note that this round uses a 1000 bp length cutoff.
for SET in $(cat /export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/AMOR_2020_Good); do
    bar=${SET//-/_}
    anvi-script-reformat-fasta "$SET/final.contigs.fa" -l 1000 \
        --simplify-names --prefix "c_$bar" -o "$SET/$SET.prefixed.fa"
done
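The prefix construction in the loop above can be wrapped in a small function that also rejects sample names that would still contain unsupported characters after the `-` to `_` substitution. A minimal sketch, assuming contig names should contain only ASCII letters, digits and underscores; `sample_prefix` and the sample name `AMOR-2020-S1` are hypothetical.

```shell
#!/usr/bin/env bash
# Build the contig-name prefix from a sample name, as in the loop above:
# hyphens become underscores and "c_" is prepended.
sample_prefix() {
    local name=$1
    local bar=${name//-/_}
    local prefix="c_$bar"
    # Reject anything that is not letters, digits or underscores
    if [[ ! $prefix =~ ^[A-Za-z0-9_]+$ ]]; then
        echo "unsupported characters in sample name: $name" >&2
        return 1
    fi
    printf '%s\n' "$prefix"
}
```

Inside the loop this would become `--prefix "$(sample_prefix "$SET")"`, so a badly named sample fails early instead of producing a FASTA file that anvi'o later refuses.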

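# Build one contigs database per sample. Each job is sent to the background (&)
# with 40 threads; "wait" blocks until all of them have finished.
for SET in $(cat /export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/AMOR_2020_Good); do
    anvi-gen-contigs-database --num-threads 40 -f "$SET/$SET.prefixed.fa" \
        -o "$SET/$SET.prefixed.contigs.db" &
done
wait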
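Starting every `anvi-gen-contigs-database` job at once, each with 40 threads, can oversubscribe the server when there are many samples. One option is to cap the number of concurrent jobs. A minimal bash sketch (needs bash 4.3 or later for `wait -n`); `run_limited` is a hypothetical helper, not part of anvi'o.

```shell
#!/usr/bin/env bash
# Run CMD once per ITEM in the background, with at most MAX jobs at a time.
# Usage: run_limited MAX CMD ITEM...   (needs bash >= 4.3 for "wait -n")
run_limited() {
    local max=$1 cmd=$2 item running=0
    shift 2
    for item in "$@"; do
        if (( running >= max )); then
            wait -n                      # block until any one job finishes
            running=$((running - 1))
        fi
        "$cmd" "$item" &
        running=$((running + 1))
    done
    wait                                 # wait for the jobs still running
}
```

For example, with a hypothetical wrapper `build_db() { anvi-gen-contigs-database --num-threads 10 -f "$1/$1.prefixed.fa" -o "$1/$1.prefixed.contigs.db"; }`, the call `run_limited 4 build_db $(cat AMOR_2020_Good)` keeps at most four databases (40 threads in total) building at any time.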

2. Second, I made a concatenated contigs file that includes all contigs of all samples we want to use in this study.

I concatenated the prefixed contig files from each sample subdirectory. A disadvantage of this method is that the individual assemblies likely contain many of the same genomes, so many reads will be able to map to several contigs, and this could make the data "noisy and difficult" (see https://groups.google.com/g/anvio/c/G-PXEjqcbmc?pli=1).

find . -type f -name '*prefixed.fa' -exec cat {} + > merged.contigs.fa

# Keep only merged contigs of at least 2500 bp, then build a single contigs database
anvi-script-reformat-fasta -l 2500 merged.contigs.fa -o merged.contigs2500.fa
anvi-gen-contigs-database --num-threads 40 -f merged.contigs2500.fa -o merged.contigs2500.db
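Since the merged file mixes contigs from all assemblies, a quick sanity check is that no defline occurs twice and that every sample prefix is represented. A minimal sketch, assuming the `c_<sample>_<number>` names produced by `--prefix c_$bar` survive the second `anvi-script-reformat-fasta` call unchanged (it is run without `--simplify-names`); both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print any defline that occurs more than once (should print nothing).
duplicate_deflines() {
    grep '^>' "$1" | sort | uniq -d
}

# Count contigs per sample prefix, assuming names look like c_<sample>_<number>.
contigs_per_prefix() {
    grep '^>' "$1" \
        | sed -e 's/^>//' -e 's/_[0-9]*$//' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Running `duplicate_deflines merged.contigs2500.fa` and `contigs_per_prefix merged.contigs2500.fa` also shows how many contigs each sample keeps after the 2500 bp cutoff.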

3. Third, I went ahead and mapped all reads from all samples to each individual assembly.

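This is what I would call "co-mapping" rather than "co-assembly": each sample is still assembled individually, but the reads of every sample are mapped to every assembly. The purpose is to obtain differential coverage information across samples for binning later.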
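Because every sample's reads are mapped to every assembly, n samples mean n × n bowtie2 runs and n × n BAM files, so the cost grows quadratically with the number of samples. A small sketch that lists the (assembly, sample) pairs from a one-name-per-line sample list; `comapping_pairs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# List every (assembly, sample) pair covered by co-mapping: the reads of each
# sample are mapped to the assembly of every sample, including its own.
comapping_pairs() {
    local samples_file=$1 asm samp
    local -a samples
    mapfile -t samples < "$samples_file"
    for asm in "${samples[@]}"; do
        for samp in "${samples[@]}"; do
            printf '%s\t%s\n' "$asm" "$samp"
        done
    done
}
```

`comapping_pairs AMOR_2020_Good | wc -l` gives the number of mapping runs to budget for.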

# Build the bowtie2 databases (one index per individual assembly)
for SET in $(cat AMOR_2020_Good); do
    bowtie2-build --threads 20 "03_INDIV_ASSEMBLY/$SET/$SET.prefixed.fa" "05_COMAPPING/$SET"
done
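Before starting the long mapping loop, it can be worth confirming that every index was written completely. A minimal sketch based on the file names bowtie2-build produces (`.1`, `.2`, `.3`, `.4`, `.rev.1`, `.rev.2`, with `.bt2l` instead of `.bt2` for very large references); `bt2_index_complete` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Return 0 if a complete bowtie2 index exists for the given basename.
# bowtie2-build writes six files; very large references use .bt2l instead of .bt2.
bt2_index_complete() {
    local base=$1 ext suffix missing
    for ext in bt2 bt2l; do
        missing=0
        for suffix in 1 2 3 4 rev.1 rev.2; do
            if [[ ! -s "$base.$suffix.$ext" ]]; then
                missing=1
                break
            fi
        done
        if (( missing == 0 )); then
            return 0
        fi
    done
    return 1
}
```

For example: `for SET in $(cat AMOR_2020_Good); do bt2_index_complete "05_COMAPPING/$SET" || echo "missing index: $SET"; done`.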

# Create one folder per assembly for the mappings, named after the sample whose
# assembly the reads are mapped to. The folders go under 05_COMAPPING/, where the
# mapping loop below writes its output.
samples=$(cat /export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/AMOR_2020_Good)
for i in $samples; do mkdir -p "05_COMAPPING/$i"; done


# Perform the mapping. Each assembly (SET) has its own folder, and each folder
# holds the mapping of every sample's reads (samp) to that assembly.
mapfile -t Smparray < AMOR_2020_Good             # sample names, one per line
while read -r SET; do
    for samp in "${Smparray[@]}"; do
        # Map the reads of $samp to assembly $SET, convert to BAM (mapped reads
        # only), sort, index, and remove the intermediates; stop on any failure.
        bowtie2 --threads 40 \
            -x "05_COMAPPING/$SET" \
            -1 "02_HUMAN_Decontam/$samp-cleanR1.fq" \
            -2 "02_HUMAN_Decontam/$samp-cleanR2.fq" \
            --no-unal \
            -S "05_COMAPPING/$SET/$samp.sam" &&
        samtools view -F 4 -bS "05_COMAPPING/$SET/$samp.sam" > "05_COMAPPING/$SET/$samp-RAW.bam" &&
        samtools sort "05_COMAPPING/$SET/$samp-RAW.bam" -o "05_COMAPPING/$SET/$samp.bam" &&
        samtools index "05_COMAPPING/$SET/$samp.bam" &&
        rm "05_COMAPPING/$SET/$samp.sam" "05_COMAPPING/$SET/$samp-RAW.bam"
    done
done < AMOR_2020_Good
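With n × n mapping runs it is easy to miss one that failed, so a completeness check over the expected outputs is useful before profiling. A minimal sketch, assuming the `<assembly>/<sample>.bam` layout above and the default `<file>.bam.bai` name written by `samtools index`; `missing_bams` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# List every expected sorted BAM or BAM index that is missing or empty.
missing_bams() {
    local samples_file=$1 outdir=$2 asm samp f
    local -a samples
    mapfile -t samples < "$samples_file"
    for asm in "${samples[@]}"; do
        for samp in "${samples[@]}"; do
            for f in "$outdir/$asm/$samp.bam" "$outdir/$asm/$samp.bam.bai"; do
                [[ -s $f ]] || printf '%s\n' "$f"
            done
        done
    done
}
```

Running `missing_bams AMOR_2020_Good 05_COMAPPING` should print nothing once every pair has been mapped, sorted and indexed.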
