kbroman/MUGAarrays

Re-derive MegaMUGA and GigaMUGA annotation files

Update for GRCm39

Dan Gatti and Belinda Cornes have updated the annotation files for mouse genome build GRCm39. These also include sex-averaged genetic map positions for an updated version of the Cox et al. (2009) map; see CoxMapV3. (Note that the CoxMapV3 maps were corrected on 2023-03-17 using the original crimap software rather than the "improved" version of crimap, and that we've further "smoothed" the maps slightly to avoid segments with 0 recombination.)


I had identified a number of potential problems in the GigaMUGA annotation file, as well as discrepancies between the GigaMUGA and MegaMUGA files. I suspect that some of the columns in the GigaMUGA annotation file from UNC have been at least partially scrambled.

I emailed GeneSeek to get files with the probe sequences on the arrays, and on 2018-11-02 I received a .xlsx file by email from Ben Pejsar, Genomic Market Development Manager, Neogen GeneSeek Operations.

My goals were to:

  • blast the sequences for all markers in each of the arrays against the mouse genome

  • figure out which SNPs have a single hit in the mouse genome, and where

  • compare the sequences and probe locations, and the markers with multiple hits, to the UNC annotation file
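
The second goal can be sketched roughly as follows. This is a minimal Python illustration, not the repository's R code. It assumes blastn was run with tabular output (-outfmt 6, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and it assumes that a "perfect match" means a full-length alignment with no mismatches or gaps. The function names and the probe_lengths argument are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Default column order of blastn tabular output (-outfmt 6)
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def perfect_hits(blast_tsv, probe_lengths):
    """Collect perfect hits per marker.

    blast_tsv: text of blastn tabular output (tab-separated, no header).
    probe_lengths: dict of marker name -> probe length (assumed to be
        known from the probe sequence file).
    Returns a dict of marker -> list of (chromosome, start, end, strand).
    """
    hits = defaultdict(list)
    reader = csv.DictReader(io.StringIO(blast_tsv),
                            fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        marker = row["qseqid"]
        full_length = int(row["length"]) == probe_lengths[marker]
        exact = int(row["mismatch"]) == 0 and int(row["gapopen"]) == 0
        if full_length and exact:
            sstart, send = int(row["sstart"]), int(row["send"])
            # blastn reports minus-strand hits with sstart > send
            strand = "+" if sstart <= send else "-"
            hits[marker].append(
                (row["sseqid"], min(sstart, send), max(sstart, send), strand))
    return dict(hits)


def unique_locations(hits, markers):
    """Map each marker to its single perfect hit, or None if it has
    zero or several perfect hits."""
    return {m: hits[m][0] if len(hits.get(m, [])) == 1 else None
            for m in markers}
```

Markers that map to None here are the ones that would then be compared by hand against the UNC annotation file.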

Summary of findings:

  • The unique column in the UNC annotation file for the GigaMUGA array is not reliable.

  • We should use NA for the chromosome and position of markers whose probe does not have a single perfect match in the mouse genome assembly.

  • For a small number of markers (the transversions, with two-bead Illumina probes), the probe sequence in the GeneSeek file includes the SNP, and the SNP basepair positions in the UNC GigaMUGA file were off by 1.

  • For the markers with unique probes, the GigaMUGA annotation file has the correct chromosome and position (except for the off-by-1 cases), while the MegaMUGA annotation file has six markers with incorrect chromosome assignment.

  • A number of markers have different names but the same probe sequence. More troubling, there are 29 markers that are on both the MegaMUGA and GigaMUGA arrays but with different probes on the two arrays. These are switches from plus to minus strand but without changing the marker name, and for 8 of them, the sequence on one array is either not unique or has no perfect match in the genome.
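
Two of these findings lend themselves to a short illustration: setting chromosome and position to missing for markers without a single perfect match, and grouping markers that share a probe sequence. The sketch below is in Python rather than the repository's R, and the field names (marker, chr, pos, probe, n_perfect) are hypothetical; None stands in for R's NA.

```python
from collections import defaultdict


def mask_nonunique(annotations):
    """Return copies of the records with chr and pos set to None unless
    the probe has exactly one perfect match (n_perfect == 1)."""
    masked = []
    for rec in annotations:
        rec = dict(rec)  # copy so the input records are not modified
        if rec["n_perfect"] != 1:
            rec["chr"] = None
            rec["pos"] = None
        masked.append(rec)
    return masked


def shared_probes(annotations):
    """Group marker names by probe sequence (case-insensitive) and keep
    only the sequences used by more than one marker name."""
    groups = defaultdict(set)
    for rec in annotations:
        groups[rec["probe"].upper()].add(rec["marker"])
    return {seq: sorted(names) for seq, names in groups.items()
            if len(names) > 1}
```

Keeping the masking separate from the grouping means the shared-probe check can be run on the full table, including markers whose positions end up as missing.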

The following document describes what I've found:

The new annotation files are in the UWisc directory of this repository. This includes a file, mm_gm_commonmark_uwisc_v1.csv, indicating which markers are assaying common SNPs, within and between the two arrays.


Contents

  • UWisc - the new annotation files

  • Blast - includes R code for constructing fasta files with the array sequences, and for using blastn to map them to the mouse genome. The ReadMe file explains the sources of the mouse genome files and of the command-line blastn program. (I installed blastn on Linux with sudo apt install ncbi-blast+.)

  • GeneSeek - includes the .xlsx file with probe sequences, from Ben Pejsar at GeneSeek.

  • Python - xlsx2csv.py script for extracting a worksheet from a .xlsx file as a CSV file.

  • R - R code and R Markdown files with the analyses. new_annotations.Rmd is the key document.

  • UNC - the ReadMe file has URLs for the UNC annotation files.

  • GenMaps - raw genetic map files derived using the Mouse Map Converter.

  • docs - compiled RMarkdown files, available on the web:

  • Makefile - GNU Make file to automate/document the analyses.
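
As a rough illustration of what the Blast step involves, the following Python sketch writes probe sequences to a FASTA file and assembles a blastn command line. It is not the repository's R code: the function names, file names, database name, and choice of options are all assumptions. The options used (-query, -db, -outfmt, -out, -perc_identity) are standard blastn options, and the command is run only if blastn is found on the PATH.

```python
import shutil
import subprocess


def write_fasta(probes, path):
    """Write a dict of marker name -> probe sequence as a FASTA file."""
    with open(path, "w") as fh:
        for name, seq in probes.items():
            fh.write(f">{name}\n{seq.upper()}\n")


def blastn_command(query, db, out, perc_identity=100):
    """Build a blastn command that writes tabular output (-outfmt 6).

    db is the name of a nucleotide BLAST database for the mouse genome
    (hypothetical here; it would be built separately with makeblastdb).
    """
    return ["blastn",
            "-query", query,
            "-db", db,
            "-outfmt", "6",
            "-out", out,
            "-perc_identity", str(perc_identity)]


def run_blastn(query, db, out):
    """Run blastn, failing early with a hint if it is not installed."""
    if shutil.which("blastn") is None:
        raise RuntimeError(
            "blastn not found; on Debian/Ubuntu: sudo apt install ncbi-blast+")
    subprocess.run(blastn_command(query, db, out), check=True)
```

For example, write_fasta({"marker1": "ACGT"}, "probes.fa") followed by run_blastn("probes.fa", "mouse_genome", "hits.tsv") would produce a tab-separated file of hits, where the marker name, sequence, and all three file and database names are placeholders.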


MiniMUGA

Vivek Kumar asked me to take a look at the miniMUGA array, using an annotation file he got from Fernando Pardo Manuel de Villena.

The miniMUGA paper has now been published, with some additions to the array. Initially published at bioRxiv on 2020-03-14, it provides official annotations with the Supplemental material, as Table S2.

My original analysis is at https://kbroman.org/MUGAarrays/mini_annotations.html

But I've now added a comparison to the new annotations: https://kbroman.org/MUGAarrays/mini_revisited.html

My annotation files are in the UWisc directory, with the original ones labeled v1 and the ones based on the new array v2.

Note that new miniMUGA annotation information was provided with Blanchard et al. (2024). (For the uniquely mapped markers, these new annotations match the positions that I provide.) See Supplementary Table 2, whose columns are defined in Supplementary Table 3. The Supplementary Table 2 CSV file can be downloaded directly from https://gsajournals.figshare.com/ndownloader/files/47717242 (the positions are in build GRCm38, mm10).
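
A check of that agreement could look something like the sketch below. It is written in Python and assumes each annotation table has already been reduced to a mapping from marker name to (chromosome, position) in the same build, with None for markers that lack a unique position. The column layout of Supplementary Table 2 is not reproduced here, so loading the files is left out, and the function name is hypothetical.

```python
def compare_positions(mine, theirs):
    """Compare two marker -> (chr, pos) mappings on their shared markers.

    Returns a dict of category -> sorted list of marker names:
      match       same chromosome and position
      off_by_one  same chromosome, positions differ by exactly 1
      differ      any other disagreement
      skipped     position missing (None) in either table
    """
    result = {"match": [], "off_by_one": [], "differ": [], "skipped": []}
    for marker in sorted(set(mine) & set(theirs)):
        a, b = mine[marker], theirs[marker]
        if a is None or b is None:
            result["skipped"].append(marker)
        elif a == b:
            result["match"].append(marker)
        elif a[0] == b[0] and abs(a[1] - b[1]) == 1:
            result["off_by_one"].append(marker)
        else:
            result["differ"].append(marker)
    return result
```

The off_by_one category is kept separate because a one-basepair shift usually points to a systematic convention difference rather than a mapping error.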


Original MUGA

Mandy Chen asked me to take a look at the original MUGA array, using the annotations at UNC, http://csbio.unc.edu/MUGA/snps.muga.Rdata.

My analysis is at https://kbroman.org/MUGAarrays/muga_annotations.html

My annotation files are in the UWisc directory.


License

The code in this repository is released under the MIT License.