03_ASSEMBLY

eolesin edited this page Jan 17, 2021 · 8 revisions

Step 0: Metadata preparation

Gather and integrate the metadata you expect to need later in the analysis. We had sequencing information, sample details, and file-size records for the human contamination that we removed with bbmap in the previous step. I wrote a script, integrate.py, that joins these files on a shared column name.
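
The join logic is not shown on this page, so the following is only a minimal sketch of what a script like integrate.py might do, not the script itself. It assumes the metadata tables are tab-separated and share a column, here hypothetically named `sample`; the function names `read_table` and `join_on_column` and the example columns are made up for illustration.

```python
import csv
import io


def read_table(handle, delimiter="\t"):
    """Read a delimited table into a list of dicts (one per row)."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def join_on_column(tables, key):
    """Outer-join several tables (lists of dicts) on a shared column.

    Rows are matched on the value in `key`; columns that a table does
    not provide for a sample are simply absent from its merged record.
    """
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical example data; the real files and column names will differ.
seq_info = read_table(io.StringIO("sample\treads\nS1\t1000\nS2\t2500\n"))
sample_info = read_table(io.StringIO("sample\tsite\nS1\tFavne\nS2\tSoria Moria\n"))

joined = join_on_column([seq_info, sample_info], key="sample")
```

With real files, `read_table(open(path, newline=""))` would replace the in-memory strings. Keeping every sample that appears in any table (an outer join) is a design choice here, so that missing metadata shows up as missing columns rather than as silently dropped samples.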

Step 1: Create a program that picks out samples based on metadata values and puts them into a comma-separated list.

I hard-coded some of the values in the script, but included input lines at the points where the script could ask for user input instead. I didn't need that much flexibility to create the lists for the assemblies, but we may need it for future file lists.

See: pickout_input2.py
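
As a rough illustration of the idea (not the contents of pickout_input2.py), selecting samples by a metadata value and joining them into a comma-separated list could look like the sketch below. The column names `sample` and `site`, the file suffix, and both function names are assumptions for the example.

```python
def pick_samples(records, column, value, name_column="sample"):
    """Return the names of all samples whose `column` equals `value`."""
    return [r[name_column] for r in records if r.get(column) == value]


def comma_list(names, suffix=""):
    """Join sample names into one comma-separated string.

    `suffix` is appended to every name, e.g. to turn sample names
    into read file names.
    """
    return ",".join(name + suffix for name in names)


# Hypothetical metadata records; real column names and values will differ.
records = [
    {"sample": "S1", "site": "Favne"},
    {"sample": "S2", "site": "Soria Moria"},
    {"sample": "S3", "site": "Favne"},
]

favne = pick_samples(records, column="site", value="Favne")
# The "_R1.fastq.gz" suffix is an assumed naming scheme, not the real one.
r1_list = comma_list(favne, suffix="_R1.fastq.gz")
```

The hard-coded `column` and `value` arguments are the places where a more flexible version could call `input()` instead.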

Step 2: Use pickout_input2.py to populate a file with the datasets you want to run megahit with.

We are open to running different co-assemblies (and can easily do so), but for now we decided to run one co-assembly per site. Some sites have around 22 samples, others just 5. We are happy to hear about other strategies; the current co-assemblies are:

  • Loki's Castle
  • All Favne (NPD field) - 22 samples
  • Jan Mayen Gradient
  • Bruse gradient
  • Soria Moria
  • Ægir
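
To make the per-site grouping concrete, here is a hedged sketch of how one line per co-assembly could be built in the three-field layout that the loop in Step 3 reads (dataset name, comma-separated R1 files, comma-separated R2 files, separated by single spaces). The column names, file suffixes, and the function name `assembly_lines` are assumptions, not taken from pickout_input2.py.

```python
from collections import defaultdict


def assembly_lines(records, group_column="site", name_column="sample",
                   r1_suffix="_R1.fastq.gz", r2_suffix="_R2.fastq.gz"):
    """Build one 'dataset R1s R2s' line per group (here: per site)."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_column]].append(r[name_column])

    lines = []
    for site, names in groups.items():
        # Fields are space-separated, so the dataset name must not contain spaces.
        dataset = site.replace(" ", "_")
        r1s = ",".join(n + r1_suffix for n in names)
        r2s = ",".join(n + r2_suffix for n in names)
        lines.append(f"{dataset} {r1s} {r2s}")
    return lines


# Hypothetical metadata records; real sample names and suffixes will differ.
records = [
    {"sample": "S1", "site": "Favne"},
    {"sample": "S2", "site": "Soria Moria"},
    {"sample": "S3", "site": "Favne"},
]

lines = assembly_lines(records)
# To write the file used in Step 3:
# open("METADATA/megahitSamples.txt", "w").write("\n".join(lines) + "\n")
```

Switching to a different co-assembly strategy would then only mean changing `group_column` to another metadata column.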

Step 3: Perform the co-assemblies using a bash loop with megahit:

# Each line of megahitSamples.txt holds three space-separated fields:
# dataset name, comma-separated R1 files, comma-separated R2 files.
while read -r Dataset R1s R2s; do
     # -m 0.85: use up to 85% of the machine's memory; -t 40: use 40 threads
     megahit -1 "$R1s" -2 "$R2s" --min-contig-len 1000 -m 0.85 \
          -o "03_ASSEMBLIES/$Dataset/" -t 40
done < METADATA/megahitSamples.txt
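
Because each co-assembly can run for a long time, it may be worth checking the sample file before starting the loop. The sketch below is one possible check, assuming the three-field layout the loop reads; the function name `check_sample_line` is hypothetical and this is not part of the original workflow.

```python
def check_sample_line(line):
    """Split one 'dataset R1s R2s' line and check that the read lists pair up.

    Returns (dataset, r1_files, r2_files) or raises ValueError.
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    dataset, r1s, r2s = fields
    r1_files = r1s.split(",")
    r2_files = r2s.split(",")
    if len(r1_files) != len(r2_files):
        raise ValueError(
            f"{dataset}: {len(r1_files)} R1 files but {len(r2_files)} R2 files"
        )
    return dataset, r1_files, r2_files


# Hypothetical line; real file names will differ.
dataset, r1_files, r2_files = check_sample_line(
    "Favne S1_R1.fastq.gz,S3_R1.fastq.gz S1_R2.fastq.gz,S3_R2.fastq.gz"
)
```

Running it over every line of METADATA/megahitSamples.txt would catch a missing field or an unpaired read file before any assembly is launched.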


