-
Notifications
You must be signed in to change notification settings - Fork 0
/
06-Genome.Rmd
executable file
·59 lines (43 loc) · 3.38 KB
/
06-Genome.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
```{r setupgenome, include=FALSE}
rm(list = ls()) ; invisible(gc()) ; set.seed(42)
library(knitr)
library(tidyverse)
library(raster)
library(leaflet)
library(sf)
theme_set(bayesplot::theme_default())
opts_chunk$set(
echo = F, message = F, warning = F, fig.height = 6, fig.width = 8,
cache = T, cache.lazy = F)
```
# Genome assembly
*Assembly of high-quality annotated reference genomes for three trees of three different species.*
## Sequencing
* CNRGV, Toulouse
* Genomic reads with PacBio HiFi
* Optical maps with Bionano
* Illumina HiSeq/NovaSeq
## Assembly workflow
### V1
The first assembly will be produced by William Marande in the CNRGV.
### V2
A second version can be produced using a `singularity` and `snakemake` workflow developed by Ludovic Duvaux (possible collaboration).
The wrokflow uses the following steps:
1. converting `bam` files to `fastq`: converting HiFi reads to regular fastq with the package `simlink`
1 assembly of contigs: using `hifiasm` first but step to test additional assembler such as `canu` and `spades`
1. removing haplotigs: [`purge_dups`](https://github.com/dfguan/purge_dups) and `purge_haplotigs`
1. optical maps: proprietary program of bionano & manual curation by CNRGV to obatin a haploid optical map
1. gap closing: using Illumina reads with [`SOAPdenovo2`](https://sourceforge.net/projects/soapdenovo2/files/GapCloser/bin/r6/)
1. quality check:
* [`Jellyfish`](https://github.com/gmarcais/Jellyfish)
* [KAT plot](https://www.researchgate.net/publication/309161336_The_present_and_future_of_de_novo_whole-genome_assembly)
* BUSCO
* synteny:
* Dgeanies
* jupiterPlot
* quast
## Annotation workflow
Automatic singularity & snakemake workflow to annotate genomes: https://github.com/sylvainschmitt/genomeAnnotation .
## Project description
> In each species, using a single genotype, such as leaves from a single branch, we will extract DNA and obtain long reads -2 runs of each Mini and Prometh ION targeting a total coverage of 100x -and short reads -sequencing one of the libraries to be used for the detection of mutations (below) using Illumina NovaSeq6000 technology and targeting a minimum coverage of 100x. The sequencing services will be subcontracted to CEA Genoscope, Evry. Ideally, we would use cambium tissue to produce the reference genomes, however, since a limited amount of cambium tissue can be sampled per tree, we will use leaf tissue. Assembling the quality-filtered reads from both sequencing technologies will provide long scaffolds (Johnson et al., 2019). In each species, we will build an “optimal map” of the genome using high-resolution restriction maps from single, labelled molecules of DNA obtained through Bionano technology (subcontracted to CNRGV, Toulouse). In combination with the previous scaffolds this will provide a chromosomal assembly.The three genomes will be annotated using automated pipelines (Bolger, Arsova, & Usadel, 2018).
> Seedling original genotype (T4.1): Short read sequencing data will be demultiplexed, quality-filtered, and mapped to the reference genomes. The “zero mutation reference” genotype will be constructed from the consensus genotype calls obtained from short read sequencing of the three cambium samples using a standard variant calling approach such as GATK (“Best Practices for Variant Calling with the GATK,” 2015).