Analysis and Outputs

SobiaIdrees edited this page Jul 26, 2018 · 33 revisions

Analysis Overview

Clicking the LOAD DATA button (or running slimenrich.R) will load the four input files and parse the relevant fields, depending on the DMI strategy selected:

  • elmiprot links ELM instances directly to ELM interaction partners: mProtein-dProtein. (Motif and Domain file content is ignored. For custom DMI data, make sure the "Motif" field has mProtein content and the "Domain" field has dProtein content.)
  • elmcprot links ELM classes to their interacting proteins: mProtein-Motif-dProtein. (Domain file content is ignored. For custom DMI data, make sure the "Domain" field has dProtein content.)
  • elmcdom links ELM classes to their interacting domains: mProtein-Motif-Domain-dProtein.

NOTE: Although SLiMEnrich uses ELM interactions by default, these can be replaced by uploading a custom DMI File.

Once loaded and parsed, all possible DMIs are extracted and compared to the loaded PPI data (mProtein-dProtein) to identify/predict DMIs in these data. Random PPI datasets are generated and used to establish the null distribution of expected DMI counts in the absence of real DMIs. Real and random data are compared and output as a histogram, along with a network of the identified/predicted DMIs in the data. These are discussed in more detail under the relevant output tab of the app, below. Outputs for the commandline version of SLiMEnrich are similar but less interactive.

SLiMEnrich Output tabs

The SLiMEnrich App has nine display tabs: Uploaded Data, Potential DMIs, Predicted DMIs, Summary, Histogram, Motifs, Domains, Network and Help. Each tab can be clicked to visualise the required results/information. Help text for the data tabs can be hidden by checking the Hide tab info text checkbox.

1- Uploaded Data

This tab shows the content of the four uploaded files used for analysis: PPIs, DMI, Motifs and Domains. (NOTE: for elmiprot and elmcprot strategies, the Motif and/or Domain file will have been made from the DMI file.) Unless user files are selected for upload, these will be the relevant files from data/. For human PPI data, the user need only load their PPI data and press LOAD DATA. In the absence of a PPI file, the example PPI data from data/ will be used. By default, the full file contents will be shown in the tables. Checking the Show parsed data columns box will display which data have been parsed from each input file for the later stages (e.g. what data is being used for mProtein, Motif, Domain and dProtein).

[Figure: Uploaded Data tab]

2- Potential DMIs

This tab shows all possible DMIs for the loaded PPI data by mapping all possible mProtein-Motif-Domain-dProtein links from the loaded data, where mProtein and dProtein are both found in the PPI data but are not necessarily interacting with each other. This represents the overall pool of DMIs, given the proteins (but ignoring the actual interactions) in the PPI data. For more details of how mapping is done, refer to Input Data Mapping. By default, all possible links will be shown. To reduce to the set of non-redundant mProtein-dProtein pairs, check the Show NR potential DMI box.
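The mapping described above can be sketched as a chain of joins. This is a minimal Python illustration for the elmcdom-style case, not the SLiMEnrich code itself; the function names, the tuple layouts and the toy identifiers are all assumptions made for the example.

```python
def potential_dmis(motifs, dmi, domains, ppi_proteins):
    """Map mProtein-Motif-Domain-dProtein links (hypothetical helper).

    motifs:       iterable of (mProtein, Motif) pairs
    dmi:          iterable of (Motif, Domain) pairs
    domains:      iterable of (Domain, dProtein) pairs
    ppi_proteins: set of protein identifiers present in the PPI data
    """
    # Index Motif -> Domains and Domain -> dProteins for the joins.
    domains_by_motif = {}
    for motif, domain in dmi:
        domains_by_motif.setdefault(motif, set()).add(domain)
    dprots_by_domain = {}
    for domain, dprot in domains:
        dprots_by_domain.setdefault(domain, set()).add(dprot)

    links = set()
    for mprot, motif in motifs:
        if mprot not in ppi_proteins:
            continue  # mProtein must occur somewhere in the PPI data
        for domain in domains_by_motif.get(motif, ()):
            for dprot in dprots_by_domain.get(domain, ()):
                if dprot in ppi_proteins:  # so must the dProtein
                    links.add((mprot, motif, domain, dprot))
    return sorted(links)


def non_redundant_pairs(links):
    """Collapse full links to unique mProtein-dProtein pairs."""
    return sorted({(mprot, dprot) for mprot, _motif, _domain, dprot in links})


# Toy data (invented identifiers, for illustration only).
motifs = [("P1", "LIG_A"), ("P1", "LIG_B"), ("P2", "LIG_A")]
dmi = [("LIG_A", "PF1"), ("LIG_B", "PF1")]
domains = [("PF1", "P3"), ("PF1", "P9")]
ppi_proteins = {"P1", "P2", "P3"}

links = potential_dmis(motifs, dmi, domains, ppi_proteins)
nr_pairs = non_redundant_pairs(links)
```

In this toy case P9 is dropped because it is absent from the PPI data, and the two P1 links via different motifs collapse to a single non-redundant P1-P3 pair.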

[Figure: Potential DMIs tab]

3- Predicted DMIs

This tab shows the actual DMIs that are identified (when restricting to known ELM interactions) or predicted in the PPI dataset by mapping mProtein and dProtein PPI pairs to mProtein-dProtein pairs in the set of Potential DMIs (i.e. all possible DMIs). For more details of how mapping is done, refer to Input Data Mapping. By default, all possible links will be shown. To reduce to the set of non-redundant mProtein-dProtein pairs used for calculations, check the Show NR predicted DMI box.
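As a rough illustration of this step, predicted DMIs can be viewed as the potential links whose mProtein-dProtein pair also occurs in the PPI data. The sketch below is in Python rather than the app's own code; the helper name and toy identifiers are hypothetical, and it assumes PPI pairs are compared in the (mProtein, dProtein) orientation given in the PPI file.

```python
def predicted_dmis(ppi_pairs, potential_links):
    """Keep potential links whose (mProtein, dProtein) pair is a PPI.

    ppi_pairs:       iterable of (mProtein, dProtein) pairs
    potential_links: iterable of (mProtein, Motif, Domain, dProtein)
    Assumption: pairs are matched as written, without swapping partners.
    """
    ppi = set(ppi_pairs)
    return [link for link in potential_links if (link[0], link[3]) in ppi]


# Toy data (invented identifiers, for illustration only).
ppi_pairs = [("P1", "P3"), ("P2", "P4")]
potential_links = [
    ("P1", "LIG_A", "PF1", "P3"),
    ("P1", "LIG_B", "PF1", "P3"),
    ("P2", "LIG_A", "PF1", "P3"),
]

predicted = predicted_dmis(ppi_pairs, potential_links)
# Non-redundant mProtein-dProtein pairs, as used for the calculations.
nr_predicted = {(link[0], link[3]) for link in predicted}
```

Here two links are predicted (both for the P1-P3 interaction), but they count as one non-redundant DMI.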

[Figure: Predicted DMIs tab]

4- Summary

This tab gives a graphical representation of the unique number of mProtein, Motif, Domain and dProtein identifiers involved in the predicted DMI dataset. Numbers can be visualised upon mouse hover on the bars. Colours match the Network output.
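The counts behind such a chart amount to the number of unique identifiers per field. A minimal Python sketch follows, assuming predicted DMIs are held as (mProtein, Motif, Domain, dProtein) tuples; the function name and data are hypothetical.

```python
def summary_counts(predicted_links):
    """Count unique identifiers in each field of the predicted DMIs."""
    fields = ("mProtein", "Motif", "Domain", "dProtein")
    return {
        name: len({link[i] for link in predicted_links})
        for i, name in enumerate(fields)
    }


# Toy data (invented identifiers, for illustration only).
predicted_links = [
    ("P1", "LIG_A", "PF1", "P3"),
    ("P1", "LIG_B", "PF1", "P3"),
    ("P2", "LIG_A", "PF1", "P3"),
]
counts = summary_counts(predicted_links)
```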

[Figure: Summary tab]

5- Histogram

This tab is the main SLiMEnrich output. Clicking on this tab will trigger the PPI randomisation and generate the expected distribution of predicted DMIs based purely on chance associations between PPI mProteins and dProteins. In brief, the input PPI dataset is shuffled without replacement (i.e. keeping the original number of interacting partners per protein) and the number of predicted DMIs is calculated for the shuffled data. By default, this is done 1000 times; the number can be changed using the Number of randomisations setting (or --random=INTEGER on the commandline).
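One way to read "shuffled without replacement" is as a permutation of the dProtein column against a fixed mProtein column, which keeps the number of partners of every protein unchanged. The Python sketch below follows that reading; it is an interpretation rather than the SLiMEnrich implementation, and all function names and identifiers are hypothetical. It does not try to avoid duplicate pairs created by the shuffle.

```python
import random


def shuffle_ppi(ppi_pairs, rng):
    """Permute dProteins across mProteins, preserving per-protein degree."""
    mprots = [m for m, _ in ppi_pairs]
    dprots = [d for _, d in ppi_pairs]
    rng.shuffle(dprots)  # permutation, so no dProtein is added or lost
    return list(zip(mprots, dprots))


def count_nr_dmi(ppi_pairs, potential_pairs):
    """Number of non-redundant PPI pairs that are also potential DMI pairs."""
    return len(set(ppi_pairs) & potential_pairs)


def random_dmi_counts(ppi_pairs, potential_pairs, n_random=1000, seed=None):
    """Predicted DMI count for each of n_random shuffled PPI datasets."""
    rng = random.Random(seed)
    return [
        count_nr_dmi(shuffle_ppi(ppi_pairs, rng), potential_pairs)
        for _ in range(n_random)
    ]


# Toy data (invented identifiers, for illustration only).
ppi_pairs = [("P1", "P3"), ("P1", "P4"), ("P2", "P5"), ("P6", "P3")]
potential_pairs = {("P1", "P3"), ("P2", "P3")}

observed = count_nr_dmi(ppi_pairs, potential_pairs)
random_counts = random_dmi_counts(ppi_pairs, potential_pairs, n_random=100, seed=42)
```

Fixing the seed makes the random distribution reproducible, which is convenient when comparing runs.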

The histogram shows the expected distribution of predicted DMIs from these randomised PPI data, and marks the observed number of predicted DMIs in the real data. The following values are then calculated and displayed:

  • P-value: this is an empirical p-value based on the proportion of randomised PPI datasets that equal or exceed the number of DMI observed in the real data.
  • Enrichment: this is the ratio of observed non-redundant DMI to the mean of the random non-redundant DMI.
  • FDR: this is the estimated proportion of observed DMI that are false positives, based on the mean random DMI count, excluding any random datasets exceeding the observed predicted DMI count.
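The three values can be sketched directly from their definitions above. This Python example is an illustration under stated assumptions, not the app's code: the function name is hypothetical, and the FDR is read as the mean of the random counts that do not exceed the observed count, divided by the observed count.

```python
from statistics import mean


def enrichment_stats(observed, random_counts):
    """Empirical p-value, enrichment and FDR from random DMI counts."""
    n = len(random_counts)
    # Proportion of random datasets that equal or exceed the observed count.
    p_value = sum(1 for c in random_counts if c >= observed) / n
    mean_random = mean(random_counts)
    enrichment = observed / mean_random if mean_random > 0 else float("inf")
    # Assumed FDR reading: drop random datasets exceeding the observed count.
    capped = [c for c in random_counts if c <= observed]
    if capped and observed > 0:
        fdr = mean(capped) / observed
    else:
        fdr = float("nan")
    return {"p_value": p_value, "enrichment": enrichment, "fdr": fdr}


# Toy numbers (invented, for illustration only).
stats = enrichment_stats(observed=10, random_counts=[2, 4, 4, 6, 12])
```

With these toy numbers one of five random datasets reaches the observed count, giving p = 0.2. The mean random count is 5.6, so enrichment is about 1.79, and the FDR is 4 / 10 = 0.4. An empirical p-value of 0 only indicates p < 1/n for n randomisations.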

Clicking SETTINGS will open options for customising the histogram. The bin size (default 1) and x-axis extension can also be set for the commandline version (-b and -x, respectively). Note that the histogram x-axis will not truncate before the maximum real or random DMI count and any Extend X-axis End setting below this number will be ignored.

  • Normalise DMI counts will convert all DMI counts to a value relative to the mean random DMI count (i.e. divide all values by the mean random DMI count) to enable comparisons between datasets of very different sizes.
  • Convert to distribution of estimated real DMI will subtract the random DMI counts from the predicted DMI to generate a distribution of estimated real DMI.
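Both options are simple transformations of the observed and random counts. The Python sketch below shows one plausible reading of each; the function names are hypothetical and the numbers are invented.

```python
from statistics import mean


def normalise_counts(observed, random_counts):
    """Express counts relative to the mean random DMI count."""
    mean_random = mean(random_counts)
    return observed / mean_random, [c / mean_random for c in random_counts]


def estimated_real_dmi(observed, random_counts):
    """Subtract each random count from the observed predicted DMI count."""
    return [observed - c for c in random_counts]


# Toy numbers (invented, for illustration only).
observed = 10
random_counts = [2, 4, 6]

norm_observed, norm_random = normalise_counts(observed, random_counts)
estimated_real = estimated_real_dmi(observed, random_counts)
```

After normalisation the random distribution is centred on 1, so the normalised observed value (2.5 here) can be read as a fold-enrichment.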

[Figure: Histogram tab]

6- Motifs

This tab shows the frequency of each Motif involved in the predicted DMIs and the proportion of all predicted mProtein-dProtein DMIs that were linked via that motif. Frequencies can be visualised in a bar chart by clicking INTERACTIVE VIEW.
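A minimal Python sketch of such a frequency table is given below. It assumes that "frequency" means the number of non-redundant mProtein-dProtein pairs linked via a motif, which may differ from how the app counts; the function name and data are hypothetical. The same approach would apply to the Domain field.

```python
from collections import defaultdict


def motif_frequencies(predicted_links):
    """Per motif: (number of NR pairs linked, proportion of all NR pairs)."""
    pairs_by_motif = defaultdict(set)
    all_pairs = set()
    for mprot, motif, _domain, dprot in predicted_links:
        pairs_by_motif[motif].add((mprot, dprot))
        all_pairs.add((mprot, dprot))
    total = len(all_pairs)
    return {
        motif: (len(pairs), len(pairs) / total)
        for motif, pairs in sorted(pairs_by_motif.items())
    }


# Toy data (invented identifiers, for illustration only).
predicted_links = [
    ("P1", "LIG_A", "PF1", "P3"),
    ("P1", "LIG_B", "PF1", "P3"),
    ("P2", "LIG_A", "PF1", "P3"),
]
freqs = motif_frequencies(predicted_links)
```

Because one protein pair can be linked via several motifs, the proportions need not sum to 1.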

[Figure: Motifs tab]

7- Domains

This tab shows the frequency of each Domain involved in the predicted DMIs and the proportion of all predicted mProtein-dProtein DMIs that were linked via that domain. Frequencies can be visualised in a bar chart by clicking INTERACTIVE VIEW.

[Figure: Domains tab]

8- Network

This tab generates a visualisation of the predicted DMI network. Different network layouts can be selected and the network can be explored by dragging and moving nodes. Colours match those used in the summary tab.

[Figure: Network tab]

NOTE: SLiMEnrich will display whatever identifiers have been used for DMI mapping. (ELM, Pfam and Uniprot by default.) For accessible network labels, please use accessible input data, e.g. HGNC gene symbols for mProtein and dProtein fields. Due to the inherent risks of errors and failing to keep up-to-date, SLiMEnrich does not perform any identifier mapping.

9- Help

The final tab shows the README and summarises the other tab contents, even when Hide tab info text is checked.

Commandline output

Details of the commandline version of the app will be added here. The outputs are similar to those of the Shiny version and will be generated in an output directory (--output FILE, default ./output/). STDOUT will report the outputs as the program runs.

SLiMEnrich Documentation
