SLiMEnrich is a framework for assessing the enrichment of domain-motif interactions in protein-protein interaction (PPI) data, implemented as a Shiny App and a standalone R script. SLiMEnrich identifies or predicts Domain-Motif Interactions (DMIs) from PPI data and estimates the background distribution of DMIs expected by chance through a randomisation approach. Enrichment statistics and the known/predicted DMIs found in the PPI data are output to the user. A visual schematic of SLiMEnrich is provided below.
SLiMEnrich can be used to:
- Estimate the enrichment of SLiM-mediated DMI in a given PPI dataset.
- Generate DMI predictions from a PPI dataset. Predictions are based on known SLiM-Domain interactions.
- Estimate the False Positive Rate of those DMI predictions.
With a bit of imagination, SLiMEnrich can be adapted to generate and assess predictions for other kinds of interactions. See docs for details or contact us if you need help.
SLiMEnrich is available via the SLiMEnrich RShiny Webserver and can be run using Rscript from the command line. For command-line options, please run:
Rscript slimenrich.R -h
- Installation and Setup. How to run SLiMEnrich locally.
- Quick Tutorial. Start here. Analysis walkthrough using example data.
- SLiMEnrich Input. Input data and formatting requirements.
- Analysis and Outputs. Explanations of SLiMEnrich outputs.
- Randomisation and Statistics. Details of the SLiMEnrich calculations.
How SLiMEnrich works
A schematic representation of the main SLiMEnrich pipeline. SLiMEnrich takes four input files:
1. PPI data provided by the user as a set of pairwise putative motif-containing proteins ("mProteins") and their domain-containing interaction partners ("dProteins").
2. A file providing known or predicted motif occurrences within the mProtein sequences (e.g. known ELM instances (the default) or SLiMProb predictions).
3. A DMI file defining Motif-Domain interactions, relating to the DMI Strategy employed (by default, known ELM interactions are used).
4. A file that links dProteins to their domain composition (by default, human Pfam domains from UniprotKB are used).

Input data is combined to establish the complete set of known/predicted "potential DMIs", which depends on the DMI strategy selected (see DMI Strategies and Input files and formats for details):
- ELMi-Protein: for highest stringency, the DMI file directly links mProteins to known dProtein DMI partners (Motifs and Domains input not used).
- ELMc-Protein: for medium stringency, the DMI file links Motif classes to known dProtein DMI partners (Domains input not used).
- ELMc-Domain: for lowest stringency, the DMI file links Motif classes to known interacting Domains.

Potential DMIs are then mapped on to the input PPI to identify the "Predicted DMIs" in the real data. PPI data is randomised (shuffled) 1000 times and re-mapped to potential DMIs to determine the background distribution of predicted DMIs in the case of random association (see text for details). Finally, the "Random DMI" distribution is compared to the observed "Predicted DMIs" to determine DMI enrichment in the data. Results are output in the form of tables, a histogram of the Random DMI distribution with the observed count and empirical P-value marked, and an interactive network of the known/predicted DMIs found in the PPI data.
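The pipeline described above can be illustrated with a minimal sketch. This is not SLiMEnrich's own code (SLiMEnrich is written in R); it is a small Python illustration of the ELMc-Domain strategy under stated assumptions. The function names, the toy identifiers, the choice to shuffle by permuting the dProtein column, and the definition of the empirical P-value as the fraction of randomised datasets with at least as many DMIs as observed are all assumptions made for this example. See Randomisation and Statistics for the actual calculations.

```python
import random

def potential_dmis(motifs, dmi, domains):
    """Join the three annotation inputs (ELMc-Domain style, hypothetical layout).

    motifs:  (mProtein, motif_class) pairs
    dmi:     (motif_class, domain) pairs
    domains: (domain, dProtein) pairs
    Returns the set of (mProtein, dProtein) pairs that could form a DMI.
    """
    domains_by_motif = {}
    for motif, domain in dmi:
        domains_by_motif.setdefault(motif, set()).add(domain)
    dprots_by_domain = {}
    for domain, dprot in domains:
        dprots_by_domain.setdefault(domain, set()).add(dprot)
    pairs = set()
    for mprot, motif in motifs:
        for domain in domains_by_motif.get(motif, ()):
            for dprot in dprots_by_domain.get(domain, ()):
                pairs.add((mprot, dprot))
    return pairs

def count_dmis(ppi, potential):
    """Count PPI rows that match a potential DMI (duplicates are not merged)."""
    return sum(1 for pair in ppi if pair in potential)

def shuffle_ppi(ppi, rng):
    """Assumed randomisation: permute dProteins across the mProtein column."""
    mprots = [m for m, _ in ppi]
    dprots = [d for _, d in ppi]
    rng.shuffle(dprots)
    return list(zip(mprots, dprots))

def enrichment(ppi, potential, n_random=1000, seed=0):
    """Return the observed DMI count, the random counts, and an empirical P-value."""
    rng = random.Random(seed)
    observed = count_dmis(ppi, potential)
    random_counts = [
        count_dmis(shuffle_ppi(ppi, rng), potential) for _ in range(n_random)
    ]
    # Assumed definition: fraction of randomisations reaching the observed count.
    p_value = sum(c >= observed for c in random_counts) / n_random
    return observed, random_counts, p_value

# Toy data with made-up identifiers, for illustration only.
motifs = [("V1", "LIG_A"), ("V2", "LIG_B")]
dmi = [("LIG_A", "PF1"), ("LIG_B", "PF2")]
domains = [("PF1", "H1"), ("PF2", "H2"), ("PF1", "H3")]
ppi = [("V1", "H1"), ("V2", "H2"), ("V1", "H3"), ("V3", "H4")]

potential = potential_dmis(motifs, dmi, domains)
observed, random_counts, p_value = enrichment(ppi, potential)
print(observed, sum(random_counts) / len(random_counts), p_value)
```

For the ELMi-Protein and ELMc-Protein strategies, the join would be shorter because the DMI file already names the dProtein partners; only the construction of the potential DMI set changes, and the mapping and randomisation steps stay the same in this sketch.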
The SLiMEnrich paper is currently under submission. If you use SLiMEnrich in your research, please cite:
Idrees S, Pérez-Bercoff Å and Edwards RJ (2016). Predicting molecular mimicry in viruses using computational methods [version 1; not peer reviewed]. F1000Research 5:2599 (poster) (doi: 10.7490/f1000research.1113345.1)