# Genome Sequencing and Analyses of Ralstonia solanacearum Phylotype I Strains FJAT-91, FJAT-452 and FJAT-462 Isolated from Tomato, Eggplant, and Chilli Pepper in China

Abstract: Ralstonia solanacearum is an extremely destructive pathogen able to cause disease in a wide range of host plants. Here we report the draft genome sequences of the strains FJAT-91, FJAT-452 and FJAT-462, isolated from tomato, eggplant, and chilli pepper, respectively, in China. In addition to the genome annotation, we performed a search for type-III secreted effectors in these strains, providing a detailed annotation of their presence and distinctive features compared to the effector repertoire of the reference phylotype I strain (GMI1000). In this analysis, we found that each strain has a unique effector repertoire, encoding both strain-specific effector variants and variations shared among all three strains. Our study, based on strains isolated from different hosts within the same geographical location, provides insight into effector repertoires sufficient to cause disease in different hosts, and may contribute to the identification of host specificity determinants for R. solanacearum. 

#### Pipeline: Firstly, we assembled the genomes of the China strains de novo and annotated them. Secondly, we analysed the type III effector (T3E) repertoire of these strains and compared it with that of the reference genome.

## Part One. De novo genome assembly, annotation and classification of the China bacterial strains

### Firstly, the genomes were assembled with SOAPdenovo2 and gaps were closed with GapCloser. Assembly quality was checked with QUAST. Meta-assembly against the reference genome GMI1000 was performed with CONTIGuator.

```sh
cd ~/sunyd/albert_bacteria/process
bash ~/sunyd/albert_bacteria/script/genome_assembly.sh ~/sunyd/albert_bacteria
bash ~/sunyd/albert_bacteria/script/scaffold_assembly.sh ~/sunyd/albert_bacteria
```
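QUAST reports contiguity statistics such as N50 for each assembly. As a minimal, independent sanity check, the sketch below computes N50 from a FASTA file with standard tools only. The function name `n50` is hypothetical and is not part of the repository scripts.

```sh
# Hypothetical helper (not one of the repository scripts).
# Prints the N50 of the sequences in a FASTA file.
n50() {
    awk '
        /^>/ { if (len) print len; len = 0; next }  # header: emit previous record length
        { len += length($0) }                       # sequence line: accumulate length
        END { if (len) print len }                  # emit the last record
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                # N50 is the length at which the running sum reaches half the total
                if (run >= half) { print lens[i]; exit }
            }
        }'
}

# Usage (the path is illustrative): n50 assembly.fasta
```

The value can be compared with the N50 that QUAST prints for the same file.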

Label|FJAT-91 Size (Mb)|FJAT-452 Size (Mb)|FJAT-462 Size (Mb)|Topology
-----|-----|-----|-----|-----:
Chromosome|2.86|3.29|3.06|circular
Plasmid|1.41|1.61|1.53|circular
Scaffolds|0.35|0.43|0.5|linear

Table 1. Summary of the assembled genomes: each strain has one chromosome and one plasmid.

### Secondly, genome annotation was performed using Prokka with the option for non-coding RNA (ncRNA) search. The COG database and Pfam v30.0 were used for functional annotation of genes.

```sh
bash ~/sunyd/albert_bacteria/script/prokka.sh ~/sunyd/albert_bacteria/process
bash ~/sunyd/albert_bacteria/script/prokka_result_calculation.sh ~/sunyd/albert_bacteria/process
```

Attribute|FJAT-91 value|FJAT-91 % of total|FJAT-452 value|FJAT-452 % of total|FJAT-462 value|FJAT-462 % of total
-----|-----|-----|-----|-----|-----|-----:
Genome size (bp)|4620128|100|5334434|100|5083617|100
DNA coding (bp)|3003037|65|3696229|69.29|3397556|66.83
DNA G + C (bp)|2799660|60.6|3324908|62.33|3123835|61.45
DNA scaffolds|329|100|309|100|358|100
Total genes|6522|100|6729|100|6758|100
Protein coding genes|6457|99|6658|98.94|6696|99.08
RNA genes|65|1|71|1.06|62|0.92
Genes with function prediction|2544|39.4|3075|46.19|2855|42.64
Genes assigned to COGs|2714|42.03|3263|49.01|3046|45.49
Genes with Pfam domains|2361|36.56|2948|44.28|2674|39.93
Genes with signal peptides|270|4.18|334|5.02|303|4.53
Genes with transmembrane helices|291|4.51|349|5.24|311|4.64

Table 2. Genome and annotation statistics of the three newly sequenced Ralstonia solanacearum strains.
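The "DNA G + C" row is a base count divided by the genome size. The sketch below shows one way to derive both numbers from a FASTA file. The function name `gc_content` is hypothetical, and it counts every character on a sequence line (including any N) towards the total, which may differ from how `prokka_result_calculation.sh` does it.

```sh
# Hypothetical helper (not one of the repository scripts).
# Prints: <G+C bases> <total bases> <G+C percentage>
gc_content() {
    awk '
        /^>/ { next }                        # skip FASTA headers
        {
            seq = toupper($0)
            total += length(seq)
            gc += gsub(/[GC]/, "", seq)      # gsub returns the number of replacements
        }
        END { if (total) printf "%d %d %.2f\n", gc, total, 100 * gc / total }
    ' "$1"
}

# Usage (the path is illustrative): gc_content FJAT-91.fasta
```

For FJAT-91, Table 2 gives 2799660 G + C bases out of 4620128, which is about 60.6%.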

### Thirdly, the species tree was built.

#### For the newly assembled strains, we used tblastn and Biopython to extract the aligned sequences; for the other strains, we used blastp and blastdbcmd. We then aligned the sequences with MUSCLE, removed poorly aligned positions with Gblocks, and built a phylogenetic tree with PhyML.

```sh
bash ~/sunyd/albert_bacteria/script/prepare_genome_others.sh
bash ~/sunyd/albert_bacteria/script/species_tree.sh ~/sunyd/albert_bacteria
```

<img src="https://github.com/fengkuangbaozha/bacteria_assembly_T3E/blob/fig/paper_fig/fig2.png?raw=true" style="width: 600px;"/>
Figure 2. Phylogenetic tree showing the position of the Ralstonia solanacearum strains sequenced in this study, relative to other sequenced strains from the same species.

## Part Two. Compare the repertoire of type-III effectors between China strains and the reference strain.

### Firstly, identify type-III effectors in China strains and compare with the reference genome.

#### Type III effectors (T3Es) in the three newly sequenced strains were identified and annotated in two steps. First, 52, 62 and 60 T3Es from the R. solanacearum species complex were identified in FJAT-91, FJAT-452 and FJAT-462, respectively, based on the Prokka annotations. Second, the assembled genome sequences of the three strains were searched with BLAST against T3E protein sequences with an e-value cut-off of 1e-30, which identified 72, 78 and 75 T3Es in FJAT-91, FJAT-452 and FJAT-462, respectively.

```sh
bash ~/sunyd/albert_bacteria/script/t3e_identification.sh ~/sunyd/albert_bacteria/process
```
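The homology step applies an e-value cut-off of 1e-30 to the BLAST hits. The sketch below shows how such a cut-off could be applied, assuming the search was saved in BLAST tabular format (`-outfmt 6`, where column 1 is the query ID and column 11 is the e-value) with the T3E proteins as queries. Both the output format and the function name `t3e_hits` are assumptions, since `t3e_identification.sh` is not shown here.

```sh
# Hypothetical helper (not one of the repository scripts).
# Lists the unique query IDs with at least one hit at or below the cut-off.
# $1 = e-value cut-off, $2 = BLAST tabular file (-outfmt 6)
t3e_hits() {
    awk -F '\t' -v cutoff="$1" '
        $11 + 0 <= cutoff + 0 { print $1 }   # "+ 0" forces a numeric comparison
    ' "$2" | LC_ALL=C sort -u
}

# Usage (the path is illustrative): t3e_hits 1e-30 FJAT-91.t3e.tsv | wc -l
```

Counting the lines of the output gives the number of effectors detected for a strain.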

Attribute|FJAT-91|FJAT-452|FJAT-462
-----|-----|-----|-----:
Number of T3E genes by Prokka annotation |52|62|60
Total T3E genes after homology search|72|78|75
Number of T3E genes not present in GMI1000|2|4|4
Annotation of T3E genes not present in GMI1000|ripAL, ripF2|ripAL, ripF2, ripS7, hyp7|ripAL, ripBE, ripF2, hyp7
Number of T3E genes in GMI1000 but not found in newly sequenced strain|7|3|6
Annotation of T3E genes in GMI1000 but not found in newly sequenced strain|ripAG, ripAI, ripM, ripP3, ripS4, ripY, hyp16|ripM, ripP3, hyp16|ripAI, ripAM, ripM, ripP3, ripS4, hyp16

Table 3. Annotation and comparison of Type III effector genes in the three newly sequenced strains.
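The two "not present" rows of Table 3 are set differences between the effector list of a strain and that of GMI1000. A minimal sketch of that comparison, assuming each list is a plain text file with one effector name per line, is given below. The function name `only_in_first` and the file layout are assumptions, not the repository's implementation.

```sh
# Hypothetical helper (not one of the repository scripts).
# Prints the names that occur in list $1 but not in list $2.
only_in_first() {
    oif_a=$(mktemp)
    oif_b=$(mktemp)
    # comm requires both inputs sorted in the same collation order
    LC_ALL=C sort -u "$1" > "$oif_a"
    LC_ALL=C sort -u "$2" > "$oif_b"
    LC_ALL=C comm -23 "$oif_a" "$oif_b"   # -23 keeps lines unique to the first file
    rm -f "$oif_a" "$oif_b"
}

# Usage (the paths are illustrative):
#   only_in_first FJAT-91.t3e.txt GMI1000.t3e.txt   # in the strain, absent from GMI1000
#   only_in_first GMI1000.t3e.txt FJAT-91.t3e.txt   # in GMI1000, not found in the strain
```

Effector names must be spelled identically in both lists, including letter case, for the comparison to be meaningful.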

### Secondly, extract the sequences of the new type III effectors for experimental validation.

```sh
bash ~/sunyd/albert_bacteria/script/new_t3e_check.sh ~/sunyd/albert_bacteria/process
```

<img src="https://raw.githubusercontent.com/fengkuangbaozha/bacteria_assembly_T3E/fig/paper_fig/fig3.png" style="height: 600px;"/>
Figure 3. Schematic diagram of ripBE (a), hyp7 (b) and ripAL (c) sequence alignment. The nucleotide sequences of these genes in the strains sequenced in this study are 100% identical to each other, except for hyp7, which has an insertion annotated as a transposase in FJAT-462 (numbers indicate the insertion site). The percentage of identity compared to the orthologs in other sequenced strains is indicated in the figure.

### Thirdly, compare sequence variations (SNPs and INDELs) in the common type III effectors between the China strains and GMI1000.

#### 1. Download the Ralstonia solanacearum GMI1000 genome and annotation from EMBL, and the T3E sequences from the ralstoT3E website.

```sh
bash ~/sunyd/albert_bacteria/script/prepare_t3e_GMI1000.sh
```

#### 2. Analyze SNPs and INDELs between the China strains and GMI1000. Firstly, clean the reads with SolexaQA and cutadapt. Secondly, map the reads to the GMI1000 genome with BWA. Thirdly, identify SNPs and INDELs with samtools and keep only the high-quality ones. Fourthly, annotate the variations with snpEff. Fifthly, compare the variations among the three China strains. Sixthly, annotate the variation type.

```sh
bash ~/sunyd/albert_bacteria/script/snpeff_GMI1000.sh  # build the effector GTF as an independent snpEff database
bash ~/sunyd/albert_bacteria/script/genomic_variation.sh ~/sunyd/albert_bacteria/process
```
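The text does not state which criterion defines a "high quality" variant. As one possible illustration, the sketch below keeps VCF records whose QUAL column (column 6) meets a minimum value. The threshold is a caller-supplied assumption and the function name `filter_by_qual` is hypothetical; the actual filter in `genomic_variation.sh` may differ.

```sh
# Hypothetical helper (not one of the repository scripts).
# $1 = minimum QUAL (an assumed threshold), $2 = VCF file
filter_by_qual() {
    awk -F '\t' -v min="$1" '
        /^#/ { print; next }                       # keep header lines unchanged
        $6 != "." && $6 + 0 >= min + 0 { print }   # keep records with QUAL >= min
    ' "$2"
}

# Usage (the threshold and path are illustrative): filter_by_qual 30 FJAT-91.vcf
```

Records with a missing QUAL (`.`) are dropped in this sketch.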

Type|FJAT-91|FJAT-452|FJAT-462
-----|-----|-----|-----:
All variants|652|798|692
Missense variant|268|350|299
Synonymous variant|378|439|382
Frameshift variant|2|4|6
Inframe deletion|1|1|2
Inframe insertion|3|3|2
Stop codon gain|0|0|1
Stop codon loss|0|1|0

Table 4. Numbers and types of sequence variations (SNPs and INDELs) identified in the Type III effector genes of the three newly sequenced strains compared with the reference strain GMI1000.
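Counts per variant type like those in Table 4 can be tallied from a snpEff-annotated VCF. The sketch below assumes the annotations are stored in the `ANN` INFO field, where the second `|`-separated subfield is the effect name, and it counts only the first annotation of each variant. The function name `count_effects` is hypothetical, and the repository may count differently.

```sh
# Hypothetical helper (not one of the repository scripts).
# Prints "<effect><TAB><count>" for the first ANN entry of every record.
count_effects() {
    awk -F '\t' '
        /^#/ { next }                               # skip VCF header lines
        {
            n = split($8, info, ";")                # INFO is column 8
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^ANN=/) {
                    split(substr(info[i], 5), ann, "|")
                    count[ann[2]]++                 # ann[2] is the effect name
                    break
                }
            }
        }
        END { for (e in count) print e "\t" count[e] }
    ' "$1" | LC_ALL=C sort
}

# Usage (the path is illustrative): count_effects FJAT-91.ann.vcf
```

In Table 4, the per-type rows of each strain add up to its "All variants" row, for example 268 + 378 + 2 + 1 + 3 = 652 for FJAT-91.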

<img src="https://github.com/fengkuangbaozha/bacteria_assembly_T3E/blob/fig/paper_fig/fig4.png?raw=true" alt="Drawing" style="width: 400px;"/>
Figure 4. Venn diagram of T3E gene variants identified in FJAT-91, FJAT-452 and FJAT-462 when compared to the reference strain GMI1000.
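The central region of such a Venn diagram is the set of variants found in every strain. A minimal sketch of that intersection is shown below, under the assumption that a variant is identified by its chromosome, position, reference allele and alternative allele in each per-strain VCF. The function names `variant_keys` and `shared_variants` are hypothetical.

```sh
# Hypothetical helpers (not part of the repository scripts).
# variant_keys: one CHROM:POS:REF:ALT key per distinct variant in a VCF
variant_keys() {
    awk -F '\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | LC_ALL=C sort -u
}

# shared_variants: keys that occur in every VCF given as an argument
shared_variants() {
    for vcf in "$@"; do
        variant_keys "$vcf"
    done | LC_ALL=C sort | uniq -c | awk -v n="$#" '$1 == n { print $2 }'
}

# Usage (the paths are illustrative):
#   shared_variants FJAT-91.vcf FJAT-452.vcf FJAT-462.vcf | wc -l
```

Because each file contributes a key at most once, a key counted as many times as there are files is present in all of them.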

```sh
bash ~/sunyd/albert_bacteria/script/check_variation.sh ~/sunyd/albert_bacteria/process
```

<img src="https://github.com/fengkuangbaozha/bacteria_assembly_T3E/blob/fig/paper_fig/fig5.png?raw=true" alt="Drawing" style="width: 600px;"/>
Figure 5. Examples of sequence variations in the ripA1, ripAZ1 and ripX genes that are shared among the three newly sequenced strains compared to the reference strain GMI1000. The right panel shows the resulting alterations in the amino acid sequences of the T3E proteins.

### Notice: our partner is responsible for the genome sequencing and the subsequent experimental validation.