03_ASSEMBLY
Gather and integrate the metadata you may need later in the analysis. We had sequencing information, sample details, and file-size records of the human contamination that we removed with bbmap in the previous step. I wrote a script, integrate.py, to join these files on a common column name.
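As a rough illustration of joining on a common column, here is a minimal sketch using only the Python standard library. It is not the contents of integrate.py; the column name `sample`, the other column names, and the inline example tables are all hypothetical stand-ins for the real metadata files.

```python
import csv
import io


def read_table(handle, delimiter="\t"):
    """Read a delimited table into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def join_on_column(tables, key):
    """Outer-join several tables (lists of row dicts) on a shared column."""
    merged = {}
    for rows in tables:
        for row in rows:
            # Start a record the first time a key value is seen, then add columns.
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return list(merged.values())


# Hypothetical inputs; in practice these would be opened from files.
sequencing = read_table(io.StringIO("sample\treads\nS1\t1000\nS2\t2000\n"))
details = read_table(io.StringIO("sample\tsite\nS1\tSoria Moria\nS2\tJan Mayen Gradient\n"))
sizes = read_table(io.StringIO("sample\thuman_removed_bytes\nS1\t512\n"))

merged = join_on_column([sequencing, details, sizes], key="sample")
for row in merged:
    print(row)
```

Samples missing from one table (S2 has no size record here) simply lack that column in the merged output, so it is worth checking for gaps before relying on the joined file.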
Step 1: Create a program that picks out samples based on metadata values and puts them into a comma-separated list.
I hard-coded some of the values into the script, but included input lines where one could prompt for them instead. I didn't need that much flexibility to create lists for assemblies, but we may need it for future file lists.
See: pickout_input2.py
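The selection step can be sketched roughly as follows. This is an assumption about the general approach, not the code in pickout_input2.py; the function names, the column names `sample` and `site`, and the example rows are hypothetical.

```python
def pick_samples(rows, column, value, name_column="sample"):
    """Return the names of all samples whose metadata column equals value."""
    return [row[name_column] for row in rows if row[column] == value]


def comma_list(items):
    """Join sample names into a single comma-separated string."""
    return ",".join(items)


# Hypothetical metadata rows; real rows would come from the integrated table.
metadata = [
    {"sample": "S1", "site": "Soria Moria"},
    {"sample": "S2", "site": "Jan Mayen Gradient"},
    {"sample": "S3", "site": "Soria Moria"},
]

# The value is hard-coded here; input("Site: ") could be used to prompt instead.
picked = pick_samples(metadata, "site", "Soria Moria")
print(comma_list(picked))  # S1,S3
```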
We are open to (and easily able to) run different co-assemblies, but for now we decided to run one co-assembly per site. Some sites have around 22 samples, others only 5. We are happy to hear other strategies, but for now these are the co-assemblies:
- Loki's Castle
- All Favne (NPD field) - 22 samples
- Jan Mayen Gradient
- Bruse gradient
- Soria Moria
- Ægir
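# Each line of METADATA/megahitSamples.txt holds three space-separated fields:
# <Dataset> <comma-separated R1 files> <comma-separated R2 files>
# megahit creates each output directory itself, so 03_ASSEMBLIES/$Dataset/ must not exist yet.
while read -r Dataset R1s R2s; do
    megahit -1 "$R1s" -2 "$R2s" --min-contig-len 1000 -m 0.85 \
        -o "03_ASSEMBLIES/$Dataset/" -t 40
done < METADATA/megahitSamples.txt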
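The loop above expects one line per co-assembly with three space-separated fields. A minimal sketch of how such lines could be generated from per-sample metadata is shown below. The read file naming patterns, the column names, and replacing spaces in site names with underscores are all assumptions for illustration, not the layout actually used for this dataset.

```python
from collections import defaultdict


def megahit_lines(rows, group_column="site", name_column="sample",
                  r1_pattern="{}_R1.fastq.gz", r2_pattern="{}_R2.fastq.gz"):
    """Build '<Dataset> <R1 list> <R2 list>' lines, one per group of samples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row[name_column])

    lines = []
    for dataset, samples in groups.items():
        # Fields are space-separated, so the dataset label must not contain spaces.
        label = dataset.replace(" ", "_")
        r1s = ",".join(r1_pattern.format(s) for s in samples)
        r2s = ",".join(r2_pattern.format(s) for s in samples)
        lines.append(f"{label} {r1s} {r2s}")
    return lines


# Hypothetical metadata rows.
rows = [
    {"sample": "S1", "site": "Soria Moria"},
    {"sample": "S2", "site": "Jan Mayen Gradient"},
    {"sample": "S3", "site": "Soria Moria"},
]

lines = megahit_lines(rows)
print("\n".join(lines))
```

Keeping the R1 and R2 lists in the same sample order matters, since megahit pairs the files by position.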
In 2020 the Dahle group sent 60 samples from various chimneys across the AMOR for sequencing. This wiki shares the pipeline I used to process that dataset. The intent is to be specific about every step involved, so that other lab members do not have to repeat the same time-consuming processes. Using my Git page has the added benefit of accountability, and of having someone to email if something doesn't work for you. :)