Dan Gatti and Belinda Cornes have updated the annotation files for mouse genome build GRCm39. These also include sex-averaged genetic map positions for an updated version of the Cox et al. (2009) map; see CoxMapV3. (Note that the CoxMapV3 maps were corrected on 2023-03-17 using the original crimap software rather than the "improved" version of crimap, and that we've further "smoothed" the maps slightly to avoid segments with 0 recombination.)
- GigaMUGA: `gm_uwisc_v4.csv`, `gm_uwisc_dict_v4.csv`
- MegaMUGA: `mm_uwisc_v4.csv`, `mm_uwisc_dict_v4.csv`
- MiniMUGA: `mini_uwisc_v5.csv`, `mini_uwisc_dict_v5.csv`
- Original MUGA: `muga_uwisc_v4.csv`, `muga_uwisc_dict_v4.csv`

I had identified a number of potential problems in the GigaMUGA annotation file, as well as discrepancies between the GigaMUGA and MegaMUGA files. I suspect that some of the columns in the GigaMUGA annotation file from UNC have been at least partially scrambled.
I emailed GeneSeek to get files with the probe sequences on the arrays, and on 2018-11-02 I received a `.xlsx` file by email from Ben Pejsar, Genomic Market Development Manager, Neogen GeneSeek Operations.
My goals were to:

- blast the sequences for all markers in each of the arrays against the mouse genome
- figure out which SNPs have a single hit in the mouse genome, and where
- compare the sequences and probe locations, and the markers with multiple hits, to the UNC annotation file

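The first goal needs the probe sequences in FASTA format before `blastn` can be run. The repository's own code for this is in R; the following is only a minimal Python sketch of the idea. The function names, the marker names, and the database name `mouse_genome` are hypothetical and not taken from the repository.

```python
def to_fasta(probes, width=60):
    """Format {marker_name: probe_sequence} as FASTA text.

    Sequences longer than `width` are wrapped onto several lines.
    """
    lines = []
    for marker, seq in probes.items():
        lines.append(f">{marker}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def blastn_command(query_fasta, db, out_file):
    """Build (but do not run) a blastn call that writes tabular output.

    `-outfmt 6` requests tab-separated output with blastn's default columns.
    The `db` argument is whatever name the genome BLAST database was given.
    """
    return ["blastn", "-query", query_fasta, "-db", db,
            "-outfmt", "6", "-out", out_file]


if __name__ == "__main__":
    # Made-up marker names and sequences, for illustration only
    example = {"markerA": "ACGT" * 20, "markerB": "TTGCA"}
    print(to_fasta(example), end="")
    print(" ".join(blastn_command("probes.fa", "mouse_genome", "hits.tsv")))
```

The command list can be passed to `subprocess.run` once `blastn` is installed and a genome database has been built.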
Summary of findings:

- The `unique` column in the UNC annotation file for the GigaMUGA array was messed up.
- We should use `NA` for chromosome and position of markers whose probe does not have a single perfect match in the mouse genome assembly.
- For a small number of markers (the transversions, with two-bead Illumina probes), the probe sequence in the GeneSeek file includes the SNP and the SNP basepair positions in the UNC GigaMUGA file were off by 1.
- For the markers with unique probes, the GigaMUGA annotation file has the correct chromosome and position (except for the off-by-1 cases), while the MegaMUGA annotation file has six markers with incorrect chromosome assignment.
- There are a bunch of markers with different names but the same probe sequence. More troubling, there are 29 markers that are on both the MegaMUGA and GigaMUGA arrays but with different probes on the two arrays. These are switches from plus to minus strand but without changing the marker name, and for 8 of them, the sequence on one array is either not unique or has no perfect match in the genome.

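The rule of assigning `NA` unless a probe has exactly one perfect match can be sketched as follows. This is an illustrative Python version, not the repository's R code. It assumes `blastn` was run with `-outfmt 6` (whose default columns are listed in `BLAST_COLS`), and it treats a hit as perfect when it spans the full probe with no mismatches or gap openings. The function names are hypothetical, and `None` stands in for `NA`.

```python
import csv
import io
from collections import defaultdict

# Default column order of `blastn -outfmt 6` tabular output
BLAST_COLS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def perfect_hits(blast_tsv, probe_lengths):
    """Collect full-length hits without mismatches or gaps for each probe.

    blast_tsv     : text of the tabular blastn output
    probe_lengths : {marker_name: probe length in bases}
    """
    hits = defaultdict(list)
    reader = csv.DictReader(io.StringIO(blast_tsv),
                            fieldnames=BLAST_COLS, delimiter="\t")
    for row in reader:
        marker = row["qseqid"]
        full_length = int(row["length"]) == probe_lengths[marker]
        exact = int(row["mismatch"]) == 0 and int(row["gapopen"]) == 0
        if full_length and exact:
            sstart, send = int(row["sstart"]), int(row["send"])
            # In tabular output, minus-strand hits have sstart > send
            strand = "plus" if sstart <= send else "minus"
            hits[marker].append(
                (row["sseqid"], min(sstart, send), max(sstart, send), strand))
    return hits


def assign_positions(hits, markers):
    """Map each marker to its single perfect hit, or None (NA) otherwise."""
    return {m: hits[m][0] if len(hits.get(m, [])) == 1 else None
            for m in markers}
```

Markers with no perfect hit and markers with several perfect hits both end up as `None`, which matches the "single perfect match" criterion stated above.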
The following document describes what I've found:
The new annotation files are in the `UWisc` directory of this repository. This includes a file, `mm_gm_commonmark_uwisc_v1.csv`, indicating which markers are assaying common SNPs, within and between the two arrays.
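Two of the checks behind the findings above are simple to express in code: finding markers that share a probe sequence, and finding markers that appear on both arrays under the same name but with different probes. The Python sketch below is illustrative only. The function names are hypothetical, and it is not the procedure used to build `mm_gm_commonmark_uwisc_v1.csv`.

```python
from collections import defaultdict


def markers_sharing_probe(probes):
    """Group marker names that have an identical probe sequence.

    probes : {marker_name: probe_sequence}
    Returns a sorted list of groups, each a sorted list of two or more names.
    """
    by_seq = defaultdict(list)
    for marker, seq in probes.items():
        by_seq[seq.upper()].append(marker)
    return sorted(sorted(group) for group in by_seq.values() if len(group) > 1)


def markers_with_changed_probe(array_a, array_b):
    """List markers present on both arrays whose probe sequences differ."""
    shared = array_a.keys() & array_b.keys()
    return sorted(m for m in shared if array_a[m].upper() != array_b[m].upper())
```

To look for shared probes between arrays as well as within them, the two dictionaries can be merged after prefixing each marker name with its array, so that identically named markers are kept apart.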

- `UWisc` - the new annotation files
- `Blast` - includes R code for constructing fasta files with the array sequences, and for using `blastn` to map them to the mouse genome. The ReadMe file explains the source of the mouse genome files and of the command-line blastn program. (I installed `blastn` on Linux with `sudo apt install ncbi-blast+`.)
- `GeneSeek` - includes the `.xlsx` file with probe sequences, from Ben Pejsar at GeneSeek.
- `Python` - `xlsx2csv.py` script for pulling worksheets from a `.xlsx` file as a CSV file.
- `R` - R code and R Markdown files with the analyses. `new_annotations.Rmd` is the key document.
- `UNC` - the ReadMe file has URLs for the UNC annotation files.
- `GenMaps` - raw genetic map files derived using the Mouse Map Converter.
- `docs` - compiled RMarkdown files, available on the web:

Vivek Kumar asked me to take a look at the miniMUGA array, using an annotation file he got from Fernando Pardo Manuel de Villena.
The miniMUGA paper has now been published, with some additions to the array. Initially published at bioRxiv on 2020-03-14, it provides official annotations with the Supplemental material, as Table S2.
My original analysis is at https://kbroman.org/MUGAarrays/mini_annotations.html
But I've now added a comparison to the new annotations: https://kbroman.org/MUGAarrays/mini_revisited.html
My annotation files are in the `UWisc` directory, with the original ones labeled `v1` and the ones based on the new array `v2`.
Note that new miniMUGA annotation information was provided with Blanchard et al. (2024). (For the uniquely mapped markers, these new annotations match the positions that I provide.) See Supplementary Table 2, whose columns are defined in Supplementary Table 3; the positions are in build GRCm38 (mm10). The Supplementary Table 2 CSV file can be downloaded directly from https://gsajournals.figshare.com/ndownloader/files/47717242
Mandy Chen asked me to take a look at the original MUGA array, using the annotations at UNC, http://csbio.unc.edu/MUGA/snps.muga.Rdata.
My analysis is at https://kbroman.org/MUGAarrays/muga_annotations.html
My annotation files are in the `UWisc` directory.
The code in this repository is released under the MIT License.