09_GENOME_DEREPLICATION

eolesin edited this page Jun 8, 2021 · 21 revisions

Run anvi-summarize on the dastool consensus bins

This exports every bin in the dastool collection from anvi'o as a FASTA file.

# AMOR_2020_Good lists the sample names; MERGED_PATH and CONTIG_PATH must end with "/"
for i in $(cat AMOR_2020_Good); do
    anvi-summarize -p "${MERGED_PATH}${i}-MERGED-PROFILE/PROFILE.db" \
        -c "${CONTIG_PATH}${i}/${i}.prefixed.contigs.db" \
        --reformat-contig-names -C dastool -o "$i"
done
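Because the loop above runs anvi-summarize once per sample, a single missing database stops it partway through. The following is a minimal sketch of a pre-flight check, assuming the same directory layout as the command above. The function name `check_anvio_inputs` and its argument order are hypothetical and not part of anvi'o.

```shell
# Hypothetical pre-flight check (not an anvi'o command).
# Arguments: sample list file, MERGED_PATH, CONTIG_PATH (both ending with "/").
check_anvio_inputs() {
    sample_list="$1"
    merged_path="$2"
    contig_path="$3"
    status=0
    while IFS= read -r sample; do
        # Same paths that the anvi-summarize loop builds for each sample
        profile_db="${merged_path}${sample}-MERGED-PROFILE/PROFILE.db"
        contigs_db="${contig_path}${sample}/${sample}.prefixed.contigs.db"
        for db in "$profile_db" "$contigs_db"; do
            if [ ! -f "$db" ]; then
                echo "missing: $db" >&2
                status=1
            fi
        done
    done < "$sample_list"
    # Returns 0 only when every database was found
    return "$status"
}
```

It could be called as check_anvio_inputs AMOR_2020_Good "$MERGED_PATH" "$CONTIG_PATH" before starting the loop; any missing database is reported on standard error.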

Rename the bin contigs

Prefix each contig header with the original sample name, so that it carries the sample name in addition to the bin info. Within the ANVI-SUMMARIZE output folder:

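for i in $(cat AMOR_2020_Good); do
    # Each subdirectory of bin_by_bin is one bin
    for j in $(ls "${i}/bin_by_bin"); do
        # Prefix FASTA header lines only (anchored on ^>); run once, or the prefix is added twice
        sed -i "s/^>/>${i}_/" "${i}/bin_by_bin/${j}/${j}-contigs.fa"
    done
done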
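To see what the rename does before editing real bins in place, the same sed substitution can be tried on a throwaway file. This is only a sketch: the helper name `prefix_headers`, the sample name `SampleA` and the contig name are made up for illustration, and `sed -i` is used in its GNU form, as in the loop above.

```shell
# Hypothetical helper: prefix every FASTA header in a file with a sample name.
prefix_headers() {
    sample="$1"
    fasta="$2"
    # Anchor on ^> so that only header lines are touched
    sed -i "s/^>/>${sample}_/" "$fasta"
}

# Toy demonstration in a scratch directory (all names here are made up)
demo_dir=$(mktemp -d)
printf '>Bin_1_contig_000000000001\nACGT\n' > "$demo_dir/toy-contigs.fa"
prefix_headers SampleA "$demo_dir/toy-contigs.fa"
head -n 1 "$demo_dir/toy-contigs.fa"
# prints: >SampleA_Bin_1_contig_000000000001
```

The sequence line is left unchanged; only the line starting with ">" gains the prefix.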

Create a list of the full file paths of all genomes.

# Within the ANVI-SUMMARIZE output folder:
# Starting find from $PWD makes it print absolute paths; ">" overwrites so reruns do not duplicate entries
find "$PWD" -type f -name "*-contigs.fa" > genome_paths
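Before handing the path list to a dereplication tool, it may be worth confirming that every listed genome exists and is not empty. The sketch below is one way to do that. The function name `check_genome_paths` is hypothetical, and the scratch directory only stands in for the real ANVI-SUMMARIZE output folder.

```shell
# Hypothetical check: count the listed paths and flag any that are missing or empty.
check_genome_paths() {
    list="$1"
    total=0
    missing=0
    while IFS= read -r fasta_path; do
        total=$((total + 1))
        # -s is true only for files that exist and have a size greater than zero
        if [ ! -s "$fasta_path" ]; then
            echo "missing or empty: $fasta_path" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "$total paths listed, $missing missing or empty"
    [ "$missing" -eq 0 ]
}

# Toy demonstration in a scratch directory (file names are made up)
demo_dir=$(mktemp -d)
printf '>c1\nACGT\n' > "$demo_dir/a-contigs.fa"
find "$demo_dir" -type f -name "*-contigs.fa" > "$demo_dir/genome_paths"
check_genome_paths "$demo_dir/genome_paths"
```

On the real data the call would be check_genome_paths genome_paths, and the reported total should match the number of bins exported across all samples.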
