Complete Documentation in single page
- Enter your gene name(s) into the ‘Search TILLING data’ box (Fig 1a). You may also search for mutant lines using the IWGSC scaffold name or by BLASTing the sequence. A BLAST search may be useful when your gene model is incomplete or not present in the IWGSC reference (see below for more details).
- Select the population you wish to find mutations in. You may select Cadenza (hexaploid), Kronos (tetraploid) or both (Fig 1b). The default is to search both populations.
- Click ‘Search’ (Fig 1c). This will open up an HTML output (Fig 3).
- You may also search for all mutations present in a specific mutant line by entering the Kronos or Cadenza mutant line name. Please note that it may take 3-8 minutes to retrieve each line from the database and display the data in the HTML output.
- There is also an option to upload a “.csv” file containing a list of your terms of choice (Fig 1d).
An alternative way to search for mutations in your gene of interest is to BLAST the gene sequence. This could be useful if, for example, your gene of interest has an incomplete gene model or no gene model present in the IWGSC CSS assembly. Copy and paste your sequence into the ‘BLAST scaffolds’ box (Fig 1e) and click ‘BLAST’. The result will include:
- The visual representation of the query sequence and the hits to the database (Fig 2a).
- Table with the BLAST output (Fig 2b). Each row will be a scaffold that shows similarity to the Query input sequence. The scaffolds will either be an IWGSC CSS scaffold (e.g. IWGSC_CSS_5AL_scaff_2810281) or de novo assembled scaffolds with the “UCW_Kronos_U” or “TGAC_Cadenza_U” prefix. These ‘U’ scaffolds consist of de novo assembled reads from the exome capture and sequencing that did not align to any of the IWGSC CSS scaffolds (See Krasileva et al; Supplementary online text section 2.1). Clicking on the scaffold name (Fig 2c) will link to the HSP alignment below (Fig 2d). Beside the scaffold name is a link called “Mutations” which will hyperlink to the HTML output for the EMS mutations in the subject sequence (Fig 2e). This table is the same table that would have been obtained by searching the database using that scaffold name as described above. The E value and length of the Subject sequence are also part of the output.
- The High-scoring Segment Pair (HSP) alignment of the Query and Subject sequences. Here the “View Mutations” button will also open up the HTML output for the EMS mutations (Fig 2f).
The HTML file output will have 25 columns and one row for each mutation found in your gene(s) of interest. A description of the columns is in table 1.
Column Number | Column name | Description |
---|---|---|
1 | Scaffold | This is the name of the IWGSC CSS scaffold using the EnsemblPlants nomenclature. The name has a hyperlink to the actual FASTA sequence of the scaffold which allows users to identify the position within the genomic DNA context. |
2 | Chr | The chromosome to which the IWGSC scaffold is assigned. This is a useful filtering feature. |
3 | Line | Name of the mutant line, which consists of the name of the mutant population (Cadenza or Kronos) and a four-digit number (e.g. Cadenza1574). |
4 | Category | EMS mutations were classified according to different categories, which are detailed below. |
5 | Position | The position of the SNP relative to the start of the respective IWGSC scaffold. The actual FASTA sequence is available through the hyperlink in the scaffold name (Column 1) |
6 | Chromosome position | The position of the SNP relative to the chromosome pseudomolecules hosted at EnsemblPlants (Release 27) |
7 | Rf | Reference: This is the base call at the position in the reference genome of Chinese Spring (CS) published by the IWGSC. |
8 | wt | Wildtype: This is the base call at the particular position in a non-mutagenized Wild Type control of either the hexaploid variety Cadenza or the tetraploid variety Kronos. |
9 | mt | Mutant: This is the base call at the position in the EMS-mutagenized individual from either the hexaploid variety Cadenza or the tetraploid variety Kronos. |
10 | Het/hom | Classification of the individual as heterozygous or homozygous for the mutant allele. If the wildtype coverage (WT_cov) is 0 or <15% of the total reads then the SNP is called as homozygous mutant. Otherwise the SNP is called as heterozygous. |
11 | WT cov | Wild type coverage: Number of reads with the Wild type allele at the position. |
12 | Mut cov | Mutant coverage: Number of reads with the Mutant allele at the position. |
13 | Gene | This is the name of the IWGSC gene model that is present on the IWGSC scaffold. More than one gene can be present on a single scaffold. The “.X” ending refers to the transcript variant which was used to predict the effect of the mutation. The gene name is hyperlinked to the EnsemblPlants gene page. |
14 | Consequence | Consequence of the mutation on the transcript as predicted by the Variant Effect Predictor (VEP) tool from Ensembl based on the IWGSC gene model. Full details of the different predicted effects are described in the VEP documentation. |
15 | cDNA pos | Position of the mutation relative to the cDNA of the predicted IWGSC gene model. |
16 | CDS pos | Position of the mutation relative to the coding sequence (CDS) of the predicted IWGSC gene model. |
17 | Amino acids | Indicates whether a mutation leads to the change of an amino acid in the translated protein of the predicted gene model. For UTR, intron and synonymous variants this value is empty; for non-synonymous mutations the amino acids encoded by the wild type and mutant alleles are shown. For example a value of “P/L” means that the wild type codon encoding for a proline (P) residue was mutated to a codon encoding for a Leucine (L) residue. |
18 | Codons | Sequence of the wild type / mutant codon in the mutant line. The uppercase letter indicates the base which was mutated. For example, cCa/cTa corresponds to a mutation in the middle position of the CCA codon (P) which was mutated to a CTA codon (L). |
19 | Sift score | Sorting Intolerant From Tolerant (SIFT) probability score that the amino acid change is tolerated. Scores <0.05 are considered deleterious whereas all others are considered tolerated. SIFT was implemented through Ensembl VEP as described in Krasileva et al; Materials and Methods section 3.6. |
20 | Primer type | The predicted specificity of the designed KASP assay according to its ability to preferentially amplify a single target genome (specific), discriminate against at least one alternative genome (semi-specific), or its inability to discriminate, hence amplifying all genomes or paralogues of the gene (non-specific). Specific and semi-specific assays are preferred where possible for downstream validation of the mutation. Primers were designed using http://polymarker.tgac.ac.uk/ (Ramirez-Gonzalez et al 2015). |
21 | Orientation | Plus (+) or minus (-) according to the strand orientation of wild type and alternative primers on the IWGSC scaffold (column 1) sequence. |
22 | WT primer* | Sequence of the wild type primer. Uppercase bases within the primer sequence correspond to genome specific SNPs, whereas the 3’ uppercase base corresponds to the target EMS SNP. |
23 | Alt primer* | Sequence of the alternative primer which amplifies the mutant allele. |
24 | Common | Common primer sequence for the KASP assay |
- Note: The diagnostic allelic primers do not contain probe sequences for fluorescent dyes (FAM and HEX/VIC) used in a typical KASP assay. These must be added to the 5’ end of the primers. Add the FAM probe sequence (GAAGGTGACCAAGTTCATGCT) to the 5’ end of the WT primer and VIC/HEX probe (GAAGGTCGGAGTCAACGGATT) to the 5’ end of the Alt primer.
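
The tail addition described in the note above can be sketched in a few lines of Python. The two tail sequences are the ones quoted in the note; the function name and the example primer sequences are hypothetical and only illustrate the string handling.

```python
# Tail sequences quoted in the note above.
FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"  # VIC/HEX


def add_kasp_tails(wt_primer: str, alt_primer: str) -> tuple[str, str]:
    """Return (WT, Alt) primers with the FAM and VIC/HEX tails on the 5' end."""
    return FAM_TAIL + wt_primer.strip(), HEX_TAIL + alt_primer.strip()


# Hypothetical primer sequences, for illustration only.
wt_tailed, alt_tailed = add_kasp_tails("acgtacgtacgtacgtG", "acgtacgtacgtacgtA")
print(wt_tailed)
print(alt_tailed)
```

The common primer (column 24) is left unchanged, since the note only asks for tails on the two allele-specific primers.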
####Exporting results:
The HTML output can also be exported as an Excel-compatible “.csv” file by clicking on the “Export” button (Fig 3a). This allows users to download the results for a given scaffold, gene or mutant line and perform additional analysis and filtering steps in a local environment.
####Filtering:
The results can be filtered for specific terms or names by using the search box in the HTML output (Fig 3b). For example, the search term “stop” will filter the results to show only those mutations which have a “stop_gained” consequence. To remove the filtering, press the “x” at the end of the search box.
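
The same kind of filtering can be applied locally to an exported “.csv” file. The sketch below is a minimal example that assumes the search box performs a case-insensitive substring match across all columns; that behaviour, the column headers and the line names in the sample data are assumptions made for illustration, so check them against your own export.

```python
import csv
import io


def filter_rows(rows, term):
    """Keep rows in which any field contains the term (case-insensitive)."""
    term = term.lower()
    return [row for row in rows if any(term in str(v).lower() for v in row.values())]


# Hypothetical excerpt of an exported file; real exports have more columns.
SAMPLE_CSV = """Line,Consequence
Kronos0001,stop_gained
Kronos0002,missense_variant
Cadenza0003,stop_gained
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
hits = filter_rows(rows, "stop")
print([row["Line"] for row in hits])
```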
####Sorting:
The results for a given column can be sorted by clicking directly on the column header (Fig 3c). For example, to order the mutations by their zygosity, click the “Het/Hom” header; to sort the mutations by their position within the target gene cDNA, click the “cDNA pos” header.
####Displaying columns:
The columns displayed can be changed by pressing the “show/hide columns” icon (Fig 3d) and selecting or deselecting the desired columns.
###Categories of mutations:
We defined five categories of mutations to be displayed on the www.wheat-tilling.com website. The first three relate to the minimum number of reads including a mutation that were required by the MAPS pipeline to consider a mutation real. The minimum coverage (MC) to call a mutation was established independently for homozygous (Hom) and heterozygous (Het) mutants using the HomMC and HetMC parameters (See Krasileva et al; Materials and Methods section 3.2.1). We selected as the optimal threshold a minimum coverage of five mutant reads for heterozygous (HetMC5) and three for homozygous mutations (HomMC3). Additional mutations were also identified at lower coverages (HetMC4/HomMC3 and HetMC3/HomMC2), but at the expense of potential errors. It is safer to use mutations identified at higher stringencies (HetMC5/HomMC3), but if a desired mutation is identified in the lower categories, it still has a good probability of being real. For a detailed discussion on the use of these categories please see Krasileva et al Main text, Supplemental Online Text section 2.3 and tables S6-S7.
1. HetMC5/HomMC3: Highest stringency level. We estimated the potential error in this category to be below 1% using a series of parameters. Mutations in this category are >99% EMS-type and the % EMS error was calculated at 0.2%.
2. HetMC4/HomMC3: Includes an additional ~350,000 Kronos and ~630,000 Cadenza mutants detected at heterozygous coverage of four. Mutations in this category are ~86% EMS-type in both populations and the % EMS error was calculated at <3%.
3. HetMC3/HomMC2: This includes an additional ~600,000 Kronos and ~1,050,000 Cadenza mutants detected at heterozygous coverage of three and homozygous coverage of two. Mutations in this category are >56% EMS-type and the % EMS error was calculated at 10%.
Two additional categories were used to classify the mutations above.
4. Multi-Map (MM): This category was used to designate regions which have high sequence similarity across multiple scaffolds, so that reads from these regions map to multiple scaffolds. These regions correspond to recently duplicated genes, homoeologues with unusually high sequence similarity or artificially duplicated scaffolds generated during the assembly of the IWGSC reference sequence (see Krasileva et al Materials and Methods Section 3.3).
5. Residual Heterogeneity (RH): This category corresponds to SNPs which were present in multiple individuals and are most likely the result of residual heterogeneity in the initial seed stocks used for the EMS mutagenesis (see Krasileva et al Materials and Methods Section 3.4).
The www.wheat-tilling.com output displays by default only HetMC5/HomMC3 mutations. If users cannot find an adequate mutation within this category they can click “Show lower quality mutations” to display mutations classified as HetMC4/HomMC3, HetMC3/HomMC2, MM and RH (Fig 3e).
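
When working with an exported file that contains all categories, the default view can be reproduced locally. This is a minimal sketch that assumes the Category column holds the labels exactly as written above; the function name, the `Category` and `Line` keys and the sample rows are hypothetical.

```python
# Highest-stringency category, shown by default on the website.
DEFAULT_CATEGORY = "HetMC5/HomMC3"


def visible_mutations(rows, show_lower_quality=False):
    """Mimic the default view: keep only HetMC5/HomMC3 unless asked otherwise."""
    if show_lower_quality:
        return list(rows)
    return [row for row in rows if row["Category"] == DEFAULT_CATEGORY]


# Hypothetical rows, for illustration only.
rows = [
    {"Line": "Kronos0001", "Category": "HetMC5/HomMC3"},
    {"Line": "Kronos0002", "Category": "HetMC3/HomMC2"},
    {"Line": "Cadenza0003", "Category": "RH"},
]
print([row["Line"] for row in visible_mutations(rows)])
print(len(visible_mutations(rows, show_lower_quality=True)))
```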
When selecting a mutant line, several criteria should be taken into consideration. There will be many lines with point mutations in the gene of interest; therefore, it may not be immediately obvious which line is the best to take forward. Every case will be slightly different and depend on the aim of the study, but here we describe some common use cases. We recently published a review which might serve as a useful reference for general information on wheat, the available wheat genomic resources and issues surrounding polyploidy in wheat: [Borrill et al 2015](http://onlinelibrary.wiley.com/doi/10.1111/nph.13533/full).
###Type of mutation
The “Consequence” column in the results table describes the type of mutation predicted based on the gene model. As these consequences are predicted based on the IWGSC gene model, it is important to assess the annotation of the gene to ensure that it is consistent with the user’s gene model. If the IWGSC gene model is consistent with the user’s gene model, then the predictions can be relied upon. If the gene model is incomplete or missing (for example in the case of UCW_Kronos_U and TGAC_Cadenza_U scaffolds), then a manual annotation will be required (see below).
In many cases, the most desirable consequence will be “stop_gained” (premature termination codon), which will most likely result in a loss of function of the gene of interest. It is important to determine where in the protein the stop codon occurs, as this might have implications for the phenotype.
If there are no “stop_gained” mutations available in the database, then a “splice_donor_variant” or “splice_acceptor_variant” would be of interest. These are mutations in the GT or AG splice sites that are at the start and end of introns, respectively. A mutation that results in loss of one of these sites often leads to incorrect splicing resulting in a non-functional protein. As before, it is important to understand the context of the mutation and identify downstream GT or AG sequences that could potentially be used as alternative splice sites.
It is normally advisable to select truncation (“stop_gained” or “splice_acceptor/donor_variants”) mutants where possible. If there are none available then missense mutations can be pursued. These are coding sequence mutations that result in a change of amino acid. The effect of these mutations on the protein was predicted using the Sorting Intolerant From Tolerant (SIFT) algorithm, which scores the probability that a particular amino acid substitution will be tolerated in the protein. It can be interpreted in the same way as a p-value: a lower SIFT score implies a low probability that the substitution will be tolerated, and hence the mutation is classified as deleterious (e.g. a SIFT score <0.05 implies that the mutation is very likely to affect protein function). More information on how the SIFT algorithm works can be found at http://sift.jcvi.org. Alternatively, highly conserved residues within specific protein domains could be assessed for missense mutations.
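
The order of preference described above can be written as a simple ranking function. This is only a sketch of the guidance in this section, not a rule applied by the website: the numeric ranks, the function name and the example candidates are hypothetical, and the consequence strings are assumed to be the VEP terms quoted above plus `missense_variant`.

```python
def priority(consequence, sift_score=None):
    """Lower rank means a more desirable mutation for a loss-of-function study."""
    if "stop_gained" in consequence:
        return 0
    if "splice_donor_variant" in consequence or "splice_acceptor_variant" in consequence:
        return 1
    if "missense_variant" in consequence:
        # SIFT scores below 0.05 are considered deleterious.
        if sift_score is not None and sift_score < 0.05:
            return 2
        return 3
    return 4  # UTR, intron, synonymous and other variants


# Hypothetical candidates: (line, consequence, SIFT score).
candidates = [
    ("Kronos0002", "missense_variant", 0.01),
    ("Kronos0001", "stop_gained", None),
    ("Cadenza0003", "missense_variant", 0.40),
    ("Cadenza0004", "splice_donor_variant", None),
]
ranked = sorted(candidates, key=lambda c: priority(c[1], c[2]))
print([line for line, _, _ in ranked])
```

A ranking like this is only a first pass; the position of the mutation within the protein and the accuracy of the gene model still need to be checked by hand.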
It is important to invest time in defining the best mutations to be used in the crossing scheme. As described in Borrill et al 2015:
“This is especially relevant in situations of functional redundancy between homoeologues, where it is necessary to cross individual mutants to generate double or triple knockouts to observe a phenotype. The use of truncations in all homoeologues will generate a complete null across genomes, thus allowing for correct interpretation of the resulting phenotype. However, the use of missense mutations is risky if one of the mutations does not effectively abolish gene function, thereby limiting the phenotypic effect in the double/triple mutant.”
###Zygosity
Mutations may be in either a homozygous or heterozygous state. Homozygous mutations have the advantage that they will be present in all seeds and so fewer seeds will need to be screened to confirm mutations. In addition, phenotypes can be evaluated immediately in the plants since they should all be homozygous for the desired mutation (if the phenotype is expressed in a single mutant).
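
The Het/hom call itself comes from read coverage (Table 1, column 10): a SNP is called homozygous when the wild type coverage is 0 or below 15% of the total reads. A minimal sketch of that rule is shown below; it assumes that “total reads” means WT cov plus Mut cov, and the function name and the returned labels are hypothetical.

```python
def call_zygosity(wt_cov: int, mut_cov: int) -> str:
    """Apply the rule from Table 1: hom if WT coverage is 0 or <15% of total reads."""
    total = wt_cov + mut_cov  # assumption: total reads = WT cov + Mut cov
    if total == 0:
        raise ValueError("no reads at this position")
    # Integer comparison avoids floating-point issues at the 15% boundary.
    if wt_cov == 0 or wt_cov * 100 < 15 * total:
        return "hom"
    return "het"


print(call_zygosity(0, 5))   # no wild type reads
print(call_zygosity(1, 9))   # 10% wild type reads
print(call_zygosity(5, 5))   # 50% wild type reads
```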
Heterozygous mutations require some additional work but have distinct advantages. The sequence information for a particular mutant comes from a single M2 plant and the seeds which are shipped to users are either M4 or M5 seed (two or three additional self-pollinated generations). Heterozygous mutations will most likely still be segregating in the M4 or M5 seed so a larger number of seeds will need to be screened to identify homozygous mutations. Despite this apparent drawback, the opportunity to identify both homozygous mutant and wild type plants provides a proper experimental control in case the user wants to assess phenotypes in this first generation. It is important to consider that some of the mutations that were heterozygous in the M2 may have been fixed in the M4 or M5 seed, whereas other mutations may have been lost through genetic drift.
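
As a rough guide to how much segregation to expect, the genotype fractions after repeated self-pollination of a heterozygous M2 plant can be calculated directly: the heterozygous fraction halves with each generation. The sketch below assumes simple Mendelian inheritance with no selection or drift, so real seed batches (especially small samples) will deviate from these values; the function name is hypothetical.

```python
from fractions import Fraction


def selfing_fractions(generations_after_m2: int) -> dict:
    """Expected genotype fractions after selfing a heterozygous M2 plant."""
    het = Fraction(1, 2) ** generations_after_m2
    hom = (1 - het) / 2  # split equally between mutant and wild type
    return {"hom_mut": hom, "het": het, "wt": hom}


for name, generations in (("M4", 2), ("M5", 3)):
    fractions = selfing_fractions(generations)
    print(name, {genotype: str(value) for genotype, value in fractions.items()})
```

Under these assumptions about 3/8 of M4 seeds and 7/16 of M5 seeds would be homozygous for the mutation, with the same proportion homozygous wild type.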
In all cases it is advisable that users confirm the mutations with the designed KASP assays or alternative methods (for example Sanger sequencing) to ensure the presence of the mutation in the plants that will be phenotyped or used for crossing.
###Population
We have developed two populations which we use for complementary purposes. As outlined in Uauy et al 2009:
“We use the tetraploid TILLING population to generate mutants for basic research projects because it is easier and faster to generate complete null mutants. A single generation of crosses between A and B genome mutations, followed by selection of homozygous double mutants in the F2 populations is sufficient to generate null mutants. However, when a targeted mutant has important breeding applications we screen the hexaploid TILLING population for mutations, because hexaploid wheat represents most of the wheat grown around the world (~95%).”
###Category
As stated above, the www.wheat-tilling.com output displays by default HetMC5/HomMC3 mutations. If users cannot find an adequate mutation within this category they can click “Show lower quality mutations” to display mutations classified as HetMC4/HomMC3, HetMC3/HomMC2, MM and RH. As in any category, users are advised to order primers and confirm the mutation in the seeds received.
###Primer availability
If several mutant lines have a suitable mutation, one factor to consider is the availability of chromosome-specific KASP genotyping primers (column 20). However, the absence of genotyping primers should not stop you from selecting a mutation if it is otherwise fit for purpose, as primers for KASP and alternative genotyping methods can be designed manually.
Seeds can be ordered directly through the UK Germplasm Resource Unit (GRU) website called SeedStor which has a link at the bottom of the HTML output (Fig 3f) and the bottom of the home page (Fig 1). This link will direct you to a shopping cart for the Kronos and Cadenza TILLING populations. Mutations were identified in DNA from a single M2 plant whose M3 seeds were pooled and bulked for distribution. Therefore the seeds which are distributed correspond to M4 or M5 seed (depending on seed availability).
To order seeds you need to enter the mutant line number(s) (for example Kronos1234 or Cadenza0250) into the ordering form. Seeds from both populations can be placed within one order. You can also request seeds from the wild type Kronos or Cadenza parents. The form is sent to the GRU who will process the order, which is kept confidential at all times. The developers of the populations do not have access to the orders being produced.
Requests for seeds operate on a cost recovery basis for parties interested in research purposes only (£15 per line + £10 flat handling fee per shipment). This MTA allows the use of the mutants for research purposes only. This includes crossing between mutants and Cadenza and Kronos wild type plants. In cases where potential industrial applications are envisaged and crosses are performed to cultivars/breeding lines other than Cadenza and Kronos, there is an MTA which confers “freedom to operate” for the mutant lines. This MTA is £200 per line (+£10 handling fee per shipment) and provides a non-exclusive right to use the mutant line and derivatives for commercial purposes. Details are described in the corresponding MTAs available on the SeedStor website.
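
The cost of an order follows directly from the figures quoted above. The sketch below assumes a single shipment and uses only those figures; current prices should be confirmed on SeedStor, and the function name is hypothetical.

```python
def order_cost_gbp(n_lines: int, commercial: bool = False) -> int:
    """Cost of one shipment, using the per-line and handling fees quoted above."""
    per_line = 200 if commercial else 15  # "freedom to operate" MTA vs research MTA
    handling_fee = 10  # flat fee per shipment
    return per_line * n_lines + handling_fee


print(order_cost_gbp(3))                   # three lines, research use
print(order_cost_gbp(2, commercial=True))  # two lines, commercial MTA
```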
##6. What to do next
###Germinating and growing plants:
Please visit the www.wheat-training.com website, which has a detailed explanation of how to germinate seeds and the basics of growing wheat under glasshouse conditions.
###Designing crossing schemes:
Many of the mutations will need to be back-crossed to reduce the mutation load or inter-crossed to produce double and triple mutants for the corresponding homoeologues. We describe different crossing schemes and approaches on the www.wheat-training.com website.
##7. What to do if you have an incomplete gene model
As mentioned above, the predicted consequence of a mutation is based on the IWGSC CSS gene model. Therefore, if the gene model is incomplete, missing or incorrect, the predicted consequence of a mutation may not be accurate. Users can manually annotate their gene model based on the IWGSC or the UCW_Kronos_U and TGAC_Cadenza_U scaffolds and then use this to determine the effect of the mutations on their alternative or improved gene model. We explain this process below and on the wheat-training website.
###Use of CODDLE and PARSESNP to identify mutations
CODDLE and PARSESNP are two programs which can be used to predict the consequence of mutations according to the improved gene model.
- Obtain the scaffold name of the IWGSC CSS, Kronos_U or Cadenza_U scaffold that the sequence is on.
- Use the scaffold name as the search query on www.wheat-tilling.com to download the Excel file with all the mutations in this scaffold.
- Use this to create a variants text file: this should consist of two columns. The first column should contain all the mutation positions following the format REF base, scaffold position, MUT base. The second column should contain the name of the mutant line containing this mutation. See Fig 4 for an example. This should be saved as a '.txt' file.
  - Hint: to generate the information for column 1 you can concatenate the 'ref', 'pos' and 'mt' columns from the exported csv file. Paste the formula “=concatenate(H2,E2,I2)” into the Excel output to produce this format for the first mutation in row 2.
- Download the genomic sequence of the whole IWGSC CSS scaffold by clicking on the name of the scaffold in the HTML output table.
- Make a coding sequence for your gene using fragments of the genomic sequence. It is very important that there are no SNPs between the genomic sequence and coding sequence.
- Go to the CODDLE website.
- Copy and paste your genomic sequence into the ‘Submit genomic sequence’ box (Fig 5a).
- Copy and paste your CDS into the ‘Submit cDNA sequence’ box (Fig 5b).
- Click ‘Begin Processing’ (Fig 5c).
- You will first be taken to a page with predicted conserved blocks in your protein; click the ‘Proceed with PARSESNP’ button.
- On the PARSESNP page upload your variants text file (point 3 above; Fig 6a), set the ‘No. of variants to enter by hand’ (Fig 6b) to 0 and click ‘PARSE-SNPs in Your Gene’ (Fig 6c).
- On the next page (your variants file displayed in a table on the webpage) click ‘submit’.
- The result will be a table with information including the predicted effect of all the mutations in your variants file on your gene of interest.
- From here you can proceed with selecting your mutant line as normal and then ordering seeds through SeedStor.
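
As an alternative to the Excel formula in the hint above, the variants text file can be generated from the exported csv with a short script. This is a sketch under stated assumptions: the column headers (`ref`, `pos`, `mt`, `Line`) are passed as parameters because the exact headers in your export should be checked, the two columns are assumed to be tab-separated, and the sample data and function name are hypothetical.

```python
import csv
import io


def variants_lines(csv_text, ref_col="ref", pos_col="pos", mut_col="mt", line_col="Line"):
    """Build 'REF base + position + MUT base <tab> line name' for each mutation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        f"{row[ref_col]}{row[pos_col]}{row[mut_col]}\t{row[line_col]}"
        for row in reader
    ]


# Hypothetical excerpt of an exported file; header names are assumptions.
SAMPLE_EXPORT = "Line,pos,ref,mt\nKronos0001,1234,G,A\nCadenza0002,5678,C,T\n"

lines = variants_lines(SAMPLE_EXPORT)
print("\n".join(lines))
```

The printed lines can then be saved as a '.txt' file and uploaded on the PARSESNP page. Compare the result with the example in Fig 4 before uploading, since the separator expected by PARSESNP is not specified here.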