GitHub - herrsalmi/IntOmics: Multi omics data integration tool

Multi omics data integration tool

IntOmics is a tool for integrating secretomics and transcriptomics data in order to uncover paracrine cell crosstalk mechanisms.

Usage

java -jar intOmics.jar -p <file> -g <file> [options]*

Arguments

Option	Description
`-p <file>`	Text file containing secreted proteins
`-g <file>`	Text file containing gene differential expression data
`-f <string>`	Output format: html, tsv or fwf. Default: html
`-db <string>`	Pathway database: [KEGG, WIKIPATHWAYS, REACTOME]. Default: KEGG
`-s <int>`	Minimum score for PPI (range from 0 to 1000). Default: 900
`-fc <double>`	Fold change cutoff. Default: 1.5
`-pv <double>`	P-value cutoff. Default: 0.05
`-gpv <double>`	P-value cutoff for GSEA. Default: 0.05
`-t <int>`	Number of threads to use. Default: 4 or max available if less
`-d <string>`	Custom separator for input files: Default: ';'
`--species <string>`	Species for your data. Default: human
`--no-cached-sets`	Pull an up-to-date list of pathways
`--ignore-check`	Ignore checks when pulling updated pathways
`--no-cached-ppi`	Disable usage of cached PPI data
`-h`	Print the help screen

Input files

Input files should be in CSV format and can have a header line starting with #. The default column separator is ;, but a different one can be specified using option -d.

Secreted proteins

Text file containing protein names or corresponding Entrez gene id, each one on a separate line.

Differential expression data

Text file in CSV format with three columns: gene name, p value and fold change.

Supported species

The tool currently supports four vertebrate species: Human, Mouse, Rat, Cow. If you're working with another species please open a new issue, and I'll be sure to address it.

Outputs

There are two main output files:

A file either in HTML, TSV or FWF format containing:
- Protein: secreted proteins symbol.
- Protein description: full name of the protein.
- Gene: symbol corresponding to membrane protein-coding gene.
- Gene description: full name of the gene.
- I score: interaction score between the protein and the receptor.
- Pathways: list of pathways with enrichment scores and p-values.
An HTML file representing the network of interactions between secreted proteins and cell receptors.

Gene set enrichment analysis

The GSEA implemented in this tool is slightly different from the on proposed by Subramanian et al. (2005).

Gene sets are defined as pathways from either KEGG, WIKIPATHWAYS or REACTOME. KEGG is chosen by default if option -db is not specified. This tool has prebuilt WIKIPATHWAYS and KEGG sets for the human genome, but an up-to-date version can be rebuilt by using option --no-cached-sets and stored in sets/ folder for future use. Note that if no new pathways exist, the prebuilt version will be used. This argument though has no effect when using REACTOME as no prebuilt sets are available, and the online service is always queried.

Protein-protein interactions

Protein-protein interactions data from StringDB is used to establish a link between secreted proteins and surface receptors. Interaction scores rank from 0 to 1000, and they do not indicate the strength or the specificity of the interaction. Instead, they are indicators of confidence. A score of 500 would indicate that roughly every second interaction might be erroneous (i.e., a false positive).

score	confidence
x > 900	highest confidence
x > 700	high confidence
x > 400	medium confidence
x > 150	low confidence

A cached network of human PPI is used when the interaction score threshold is greater than 700. You can override this behavior by using option --no-cached-ppi.

Sample data

Secreted proteins and DE testing results sample data are provided for testing purpose. To run the example use the following command:

java -jar intOmics.jar -p secreted.csv -g de_testing.csv

Name		Name	Last commit message	Last commit date
Latest commit History 107 Commits
.idea		.idea
sample		sample
src		src
target/classes		target/classes
.gitignore		.gitignore
IntOmics.iml		IntOmics.iml
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml
workflow.png		workflow.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi omics data integration tool

Usage

Arguments

Input files

Secreted proteins

Differential expression data

Supported species

Outputs

Gene set enrichment analysis

Protein-protein interactions

Sample data

About

Releases 2

Packages

Contributors 2

Languages

License

herrsalmi/IntOmics

Folders and files

Latest commit

History

Repository files navigation

Multi omics data integration tool

Usage

Arguments

Input files

Secreted proteins

Differential expression data

Supported species

Outputs

Gene set enrichment analysis

Protein-protein interactions

Sample data

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 2

Languages

Packages