Analysis and Outputs
Clicking the LOAD DATA button (or running
slimenrich.R) will load the four input files and parse the relevant fields, depending on the DMI strategy selected:
elmiprot links ELM instances directly to ELM interaction partners:
dProtein. (Motif and Domain file content is ignored. For custom DMI data, make sure the "Motif" field has
mProtein content and the "Domain" field has
elmcprot links ELM classes to their interacting proteins:
dProtein. (Domain file content is ignored. For custom DMI data, make sure the "Domain" field has
elmcdom links ELM classes to their interacting domains:
NOTE: Although SLiMEnrich uses ELM interactions by default, these can be replaced by uploading a custom DMI File.
Once loaded and parsed, all possible DMI are extracted and compared to loaded PPI data (
dProtein) to identify/predict DMIs in these data. Random PPI datasets are generated and used to establish the null distribution of expected DMI counts in the absence of real DMIs. Real and random data are compared and output as a histogram, along with a network of the identified/predicted DMIs in the data. These are discussed in more detail under the relevant output tab of the app, below. Outputs for the commandline version of SLiMEnrich are similar but less interactive.
SLiMEnrich Output tabs
The SLiMEnrich App has nine display tabs: Uploaded Data, Potential DMIs, Predicted DMIs, Summary, Histogram, Motifs, Domains, Network and Help. Each tab can be clicked to visualise the required results/information. Help text for the data tabs can be hidden by checking the Hide tab info text checkbox.
1- Uploaded Data
This tab shows the content of the four uploaded files used for analysis: PPIs, DMI, Motifs and Domains. (NOTE: for elmiprot and elmcprot strategies, the Motif and/or Domain file will have been made from the DMI file.) Unless user files are selected for upload, these will be the relevant files from
data/. For human PPI data, the user need only load their PPI data and press LOAD DATA. In the absence of a PPI file, the example PPI data from
data/ will be used. By default, the full file contents will be shown in the tables. Checking the Show parsed data columns box will display which data have been parsed from each input file for the later stages (e.g. what data is being used for
2- Potential DMIs
This tab shows all possible DMIs for the loaded PPI data by mapping all possible
dProtein links from the loaded data, where
dProtein are both found in the PPI data, but not necessarily interacting with each other. This represents the overall pool of the DMIs, given the proteins (but ignoring the actual interactions) in the PPI data. For more details of how mapping is done, refer to Input Data Mapping. By default, all possible links will be shown. To reduce to the set of non-redundant
dProtein pairs, check the Show NR potential DMI box.
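The mapping described for this tab can be sketched as follows. This is a minimal illustration in Python, not SLiMEnrich's own R code: the function names, the tuple layouts and the assumption that Motif and Domain tables link proteins to motifs and domains (as in the elmcdom strategy) are all hypothetical choices made for the example.

```python
from itertools import product

def potential_dmis(motifs, domains, dmi, ppi_proteins):
    """Return all potential DMI links as (mProtein, Motif, Domain, dProtein).

    Assumed (hypothetical) input layouts:
      motifs       -- list of (mProtein, Motif) pairs
      domains      -- list of (Domain, dProtein) pairs
      dmi          -- set of known (Motif, Domain) interactions
      ppi_proteins -- set of every protein identifier seen in the PPI data
    """
    links = []
    for (mprot, motif), (domain, dprot) in product(motifs, domains):
        # Both proteins must occur in the PPI data, but they do not
        # need to interact with each other at this stage.
        if (motif, domain) in dmi and mprot in ppi_proteins and dprot in ppi_proteins:
            links.append((mprot, motif, domain, dprot))
    return links

def non_redundant(links):
    """Collapse links to the sorted set of unique (mProtein, dProtein) pairs."""
    return sorted({(mprot, dprot) for mprot, _motif, _domain, dprot in links})
```

Under these assumptions, `non_redundant` corresponds to what the Show NR potential DMI box displays: one row per protein pair, regardless of how many motif and domain combinations link them.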
3- Predicted DMIs
This tab shows the actual DMIs that are identified (when restricting to known ELM interactions) or predicted in the PPI dataset through mapping
dProtein PPI pairs to
dProtein pairs in the set of Potential DMIs (i.e. all possible DMIs). For more details of how mapping is done, refer to Input Data Mapping. By default, all possible links will be shown. To reduce to the set of non-redundant
dProtein pairs used for calculations, check the Show NR predicted DMI box.
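As a rough sketch of this step, predicted DMIs are the potential links whose protein pair is also present in the PPI data. The Python below is illustrative only and is not taken from SLiMEnrich; it assumes potential links are supplied as hypothetical (mProtein, Motif, Domain, dProtein) tuples and that PPI pairs are ordered as (mProtein, dProtein).

```python
def predicted_dmis(ppi_pairs, potential_links):
    """Keep the potential DMI links that are supported by an observed PPI.

    ppi_pairs       -- iterable of (mProtein, dProtein) interactions (assumed ordered)
    potential_links -- iterable of (mProtein, Motif, Domain, dProtein) tuples
    """
    ppi = set(ppi_pairs)
    return [link for link in potential_links if (link[0], link[3]) in ppi]

def non_redundant_pairs(links):
    """Unique (mProtein, dProtein) pairs, the unit assumed here for counting."""
    return {(link[0], link[3]) for link in links}
```

The size of `non_redundant_pairs(predicted_dmis(...))` would then be the observed DMI count that is compared against the randomised datasets.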
4- Summary
This tab gives a graphical representation of the unique number of
dProtein identifiers involved in the predicted DMI dataset. Numbers can be visualised upon mouse hover on the bars. Colours match the Network output.
5- Histogram
This tab is the main SLiMEnrich output. Clicking on this tab will trigger the PPI randomisation and generate the expected distribution of predicted DMIs based on purely chance associations between PPI
dProteins. In brief, the input PPI dataset is shuffled without replacement (i.e. keeping the original number of interacting partners per protein) and the number of predicted DMIs is calculated for the shuffled data. By default, this is done 1000 times, which can be changed using the Number of randomisations setting (or
--random=INTEGER on the commandline).
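One simple way to shuffle without replacement is to permute one column of the PPI table, so every protein keeps the same number of rows it had originally. The Python sketch below illustrates that idea under stated assumptions; it is not SLiMEnrich's R implementation, and the function names, the ordered (mProtein, dProtein) pair layout and the `seed` parameter are hypothetical.

```python
import random

def shuffle_ppi(ppi_pairs, rng):
    """Permute the dProtein column, keeping each protein's row count."""
    mprots = [m for m, _ in ppi_pairs]
    dprots = [d for _, d in ppi_pairs]
    rng.shuffle(dprots)  # in-place permutation: no protein is added or lost
    return list(zip(mprots, dprots))

def random_dmi_counts(ppi_pairs, potential_pairs, n_random=1000, seed=None):
    """Count predicted DMIs in each of n_random shuffled PPI datasets.

    potential_pairs -- set of non-redundant (mProtein, dProtein) potential DMIs
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_random):
        shuffled = set(shuffle_ppi(ppi_pairs, rng))
        counts.append(len(shuffled & potential_pairs))
    return counts
```

Note that a permutation can produce the same pair twice; this sketch collapses such duplicates when counting, which may differ from how the app handles them.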
The histogram shows the expected distribution of predicted DMIs from these randomised PPI data, and marks the observed number of predicted DMIs in the real data. The following values are then calculated and displayed:
- P-value: this is an empirical p-value based on the proportion of randomised PPI datasets that equal or exceed the number of DMI observed in the real data.
- Enrichment: this is the ratio of observed non-redundant DMI to the mean of the random non-redundant DMI.
- FDR: this is the estimated proportion of observed DMI that are false positives, based on the mean random DMI count, excluding any random datasets exceeding the observed predicted DMI count.
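The three values above can be sketched in a few lines. This Python example is an interpretation of the descriptions in the list, not the app's actual code: the function name and return format are hypothetical, and the FDR in particular follows one reading of "excluding any random datasets exceeding the observed predicted DMI count" (those datasets are dropped before taking the mean).

```python
def enrichment_stats(observed, random_counts):
    """Summarise observed vs random DMI counts.

    observed      -- number of non-redundant DMIs predicted in the real PPI data
    random_counts -- DMI counts from the randomised PPI datasets
    """
    n = len(random_counts)
    mean_random = sum(random_counts) / n

    # Empirical p-value: share of random datasets with at least as many DMIs.
    p_value = sum(1 for c in random_counts if c >= observed) / n

    # Enrichment: observed count relative to the mean random count.
    enrichment = observed / mean_random if mean_random > 0 else float("inf")

    # FDR (assumed reading): mean random count, ignoring random datasets
    # that exceed the observed count, as a fraction of the observed count.
    kept = [c for c in random_counts if c <= observed]
    fdr = (sum(kept) / len(kept)) / observed if kept and observed > 0 else None

    return {"p_value": p_value, "enrichment": enrichment, "fdr": fdr}
```

For instance, with an observed count of 10 and random counts of 2, 4, 6 and 12, one of four random datasets reaches the observed value, the mean random count is 6, and the mean of the three retained datasets is 4.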
Clicking SETTINGS will open options for customising the histogram. The bin size (default 1) and x-axis extension can also be set for the commandline version (
-x, respectively). Note that the histogram x-axis will not truncate before the maximum real or random DMI count and any Extend X-axis End setting below this number will be ignored.
- Normalise DMI counts will convert all DMI counts to a value relative to the mean random DMI count (e.g. divide all values by mean random DMI) to enable comparisons between datasets of very different sizes.
- Convert to distribution of estimated real DMI will subtract the random DMI counts from the predicted DMI to generate a distribution of estimated real DMI.
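The two transformations above amount to simple arithmetic on the counts. The following Python sketch shows one plausible reading; the function names are hypothetical and the app's exact behaviour may differ.

```python
def normalise_counts(observed, random_counts):
    """Express all counts relative to the mean random DMI count."""
    mean_random = sum(random_counts) / len(random_counts)
    if mean_random == 0:
        raise ValueError("mean random DMI count is zero; cannot normalise")
    return observed / mean_random, [c / mean_random for c in random_counts]

def estimated_real_dmi(observed, random_counts):
    """Subtract each random count from the observed count of predicted DMIs."""
    return [observed - c for c in random_counts]
```

With an observed count of 10 and random counts of 2, 4 and 6, normalisation gives 2.5 for the observed value and 0.5, 1.0 and 1.5 for the random values, while the estimated real DMI distribution is 8, 6 and 4.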
6- Motifs
This shows the frequency of each
Motif involved in the predicted DMIs and the total proportion of predicted
dProtein DMIs that were linked via that motif. Frequencies can be visualised in a bar chart by clicking INTERACTIVE VIEW.
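A minimal sketch of how such a table could be derived is given below, assuming predicted DMIs are available as hypothetical (mProtein, Motif, Domain, dProtein) tuples. It is written in Python for illustration and is not the app's own code; the same idea applies to domains by grouping on the third field instead.

```python
from collections import defaultdict

def motif_frequencies(predicted_links):
    """Map each motif to (number of protein pairs, proportion of all pairs).

    predicted_links -- iterable of (mProtein, Motif, Domain, dProtein) tuples
    """
    pairs_by_motif = defaultdict(set)
    all_pairs = set()
    for mprot, motif, _domain, dprot in predicted_links:
        pairs_by_motif[motif].add((mprot, dprot))
        all_pairs.add((mprot, dprot))
    total = len(all_pairs)
    # A protein pair can be linked via several motifs, so proportions
    # are per motif and need not sum to 1.
    return {motif: (len(pairs), len(pairs) / total)
            for motif, pairs in pairs_by_motif.items()}
```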
7- Domains
This shows the frequency of each
Domain involved in the predicted DMIs and the total proportion of predicted
dProtein DMIs that were linked via that domain. Frequencies can be visualised in a bar chart by clicking INTERACTIVE VIEW.
8- Network
This tab generates a visualisation of the predicted DMI network. Different network layouts can be selected and the network can be explored by dragging and moving nodes. Colours match those used in the Summary tab.
NOTE: SLiMEnrich will display whatever identifiers have been used for DMI mapping. (ELM, Pfam and Uniprot by default.) For accessible network labels, please use accessible input data, e.g. HGNC gene symbols for
dProtein fields. Due to the inherent risks of errors and failing to keep up-to-date, SLiMEnrich does not perform any identifier mapping.
9- Help
The final tab shows the README and summarises the other tab contents, even when Hide tab info text is checked.
Details of the commandline version of the app will be added here. Outputs are similar to those of the Shiny version, and will be generated in an output directory (
--output FILE, default
STDOUT will report the outputs as the program runs.