Read length matters: identifying the phiStx2 att site

PeterSlickers edited this page Jun 16, 2011 · 17 revisions

HPA has released a third WGS sequence of an isolate of the 2011 HUSEC outbreak strain. Sequencing was done on a 454 GS Junior system, which typically yields read lengths of 400-500 nt. This is 4-5 times longer than with either Ion Torrent or Illumina. The initial assembly comprises only 13 contigs, the largest of which is longer than 2.6 Mb, or about half the size of the chromosome.

With this assembly it is possible to unambiguously identify the location of the stx2-carrying prophage in the chromosome. The stx2-carrying phage is integrated into the wrbA gene, a well-known attachment site for stx-carrying bacteriophages. The attachment site is located in the first (and largest) contig of the HPA assembly. Integration of the phage interrupts the wrbA gene.

The phage attachment site within the wrbA gene is not occupied in strain 55989 (CU928145.2).

Preliminary feature table:

feature    sequence        start   end     strand  note
wrbA       scaffold00001   498032  498591  -1      truncated
int_wrbA   scaffold00001   498607  499941  -1
xis_wrbA   scaffold00001   499970  500269  -1
stxA2      scaffold00001   522514  523473   1
stxB2      scaffold00001   523485  523754   1
B7MQ08     scaffold00001   549237  550503   1
B7MQ09     scaffold00001   550573  558954   1
wrbA       scaffold00001   559460  559513  -1      truncated

The site-specific integrase and excisionase genes are located at one end of the prophage.

Two bacteriophage genes of unknown function (highly similar to UniProt accessions B7MQ08 and B7MQ09 from ED1a) lie next to the other end of the prophage. Allelic variants of B7MQ08 and B7MQ09 are also found at a second site within scaffold00001, which points to another prophage insertion site. B7MQ09 encodes a large protein of 2793 aa, which may be a phage structural protein.
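The protein length quoted for B7MQ09 can be checked against the coordinates in the feature table. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the annotated range ends with a stop codon (both are assumptions, not stated in the table):

```python
# Coordinates of B7MQ09 on scaffold00001, copied from the feature table.
# Assumption: 1-based, inclusive coordinates.
start, end = 550573, 558954

gene_length_nt = end - start + 1               # inclusive range
codons, remainder = divmod(gene_length_nt, 3)  # remainder 0 means an intact reading frame
protein_length_aa = codons - 1                 # assumption: the last codon is a stop codon

print(gene_length_nt, remainder, protein_length_aa)
```

Under these assumptions the range spans 8382 nt, which is 2794 codons, or 2793 aa once the stop codon is subtracted.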

The multiple sequence alignment of B7MQ09 shows that several contigs from the AFOB and AFOG assemblies match this large gene only partially. It seems likely that short-read assembly is confused by the presence of two slightly different copies of the target.

The approximate size of prophage phiStx2 is:

559460-498591 = 60869 nt
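The same calculation as a small sketch. The variable and function names are illustrative only; the two coordinates are the end of the first truncated wrbA fragment and the start of the second one, as listed in the feature table:

```python
left_wrbA_end = 498591     # last base of the first truncated wrbA fragment
right_wrbA_start = 559460  # first base of the second truncated wrbA fragment


def approx_insert_size(left_end: int, right_start: int) -> int:
    """Distance between the two flanking fragments, as computed in the text."""
    return right_start - left_end


size = approx_insert_size(left_wrbA_end, right_wrbA_start)
bases_between = size - 1   # bases lying strictly between the two fragments

print(size, bases_between)
```

The one-base difference between the two numbers is one reason the size is best read as approximate.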

A multiple sequence alignment of the wrbA gene clearly reveals that the Life Technologies AFOB00000000 assembly is not reliable at the attachment site. It contains three contigs matching the wrbA gene: two of them (AFOB01000188.1, AFOB01000143.1) represent the interrupted gene, while the third (AFOB01000030.1) shows an uninterrupted gene (though with a distorted reading frame). The third contig must now be considered an artefact of the assembly process, in which CU928145.2 (which contains the uninterrupted wrbA gene) was used as a skeleton.

I'm wondering if we will see conventional Sanger sequencing in the next round of this sequencing race.

The Illumina way

BGI has solved the problem of assembling short reads by creating an additional set of Illumina paired-end reads (insert sizes 500 bp, 2 kb, 6 kb, see [https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki/Assemblies]). In this way BGI was able to assemble the first complete genome of the 2011 outbreak strain.

As with the HPA assembly, the prophage harbouring stx2 is integrated into the wrbA gene.

feature    sequence            start    end      strand  note
wrbA       TY-2482_chromosome  5166557  5167116  -1      truncated
int_wrbA   TY-2482_chromosome  5167132  5168466  -1
xis_wrbA   TY-2482_chromosome  5168495  5168794  -1
stxA2      TY-2482_chromosome  5191039  5191998   1
stxB2      TY-2482_chromosome  5192010  5192279   1
B7MQ08     TY-2482_chromosome  5217762  5219027   1
B7MQ09     TY-2482_chromosome  5219097  5227478   1
wrbA       TY-2482_chromosome  5228139  5228181  -1      truncated
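
The HPA and BGI feature tables can be compared position by position. The sketch below is only an illustration: `Feature` and `parse_features` are hypothetical helper names, the rows are copied from the two tables on this page, and the coordinates are assumed to be 1-based and inclusive.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str
    seq_id: str
    start: int
    end: int
    strand: int
    note: str


def parse_features(text: str) -> List[Feature]:
    """Parse whitespace-separated rows: name, sequence, start, end, strand, optional note."""
    features = []
    for line in text.strip().splitlines():
        fields = line.split()
        name, seq_id, start, end, strand = fields[:5]
        note = " ".join(fields[5:])
        features.append(Feature(name, seq_id, int(start), int(end), int(strand), note))
    return features


HPA = """
wrbA       scaffold00001   498032  498591  -1 truncated
int_wrbA   scaffold00001   498607  499941  -1
xis_wrbA   scaffold00001   499970  500269  -1
stxA2      scaffold00001   522514  523473   1
stxB2      scaffold00001   523485  523754   1
B7MQ08     scaffold00001   549237  550503   1
B7MQ09     scaffold00001   550573  558954   1
wrbA       scaffold00001   559460  559513  -1 truncated
"""

BGI = """
wrbA       TY-2482_chromosome  5166557  5167116 -1  truncated
int_wrbA   TY-2482_chromosome  5167132  5168466 -1
xis_wrbA   TY-2482_chromosome  5168495  5168794 -1
stxA2      TY-2482_chromosome  5191039  5191998  1
stxB2      TY-2482_chromosome  5192010  5192279  1
B7MQ08     TY-2482_chromosome  5217762  5219027  1
B7MQ09     TY-2482_chromosome  5219097  5227478  1
wrbA       TY-2482_chromosome  5228139  5228181 -1  truncated
"""

hpa = parse_features(HPA)
bgi = parse_features(BGI)

# Shift of each feature start between the two coordinate systems.
shifts = [b.start - h.start for h, b in zip(hpa, bgi)]
for h, b, shift in zip(hpa, bgi, shifts):
    print(f"{h.name:<9} shift {shift:>8}  "
          f"len HPA {h.end - h.start + 1:>5}  len BGI {b.end - b.start + 1:>5}")

# Approximate prophage span: start of the second wrbA fragment minus end of the first.
span_hpa = hpa[-1].start - hpa[0].end
span_bgi = bgi[-1].start - bgi[0].end
print("span HPA:", span_hpa, "span BGI:", span_bgi)
```

With the values as listed, the first six features are shifted by the same amount (4668525), while B7MQ09 and the second wrbA fragment are shifted slightly differently, and the span computed in the same way comes out as 61023 nt for the BGI assembly versus 60869 nt for HPA. Whether this reflects real sequence differences or differences in assembly and annotation cannot be decided from the tables alone.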