inteGREAT is an algorithm for differential and non-differential integration of multiple genomic data sets.

faryabib/inteGREAT

inteGREAT

See https://github.com/GregorySchwartz/integreat for the latest version.

Description

inteGREAT is an algorithm for integrating multiple data sources. inteGREAT can be easily extended to differential integration by including samples from multiple phenotypes, such as disease and control.

Installation

stack install integreat

Sample usage

Integration with sample abundances

integreat --dataInput in.csv --alignmentMethod CosineSimilarity --edgeMethod SpearmanCorrelation
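As a rough illustration of what in.csv might contain, the sketch below writes a small file in the dataLevel,dataReplicate,vertex,intensity layout described under --dataInput in the Documentation section. The sample values, the helper names, and the presence of a header row are assumptions made for this example, not confirmed behavior of integreat.

```python
import csv
import io

# Hypothetical measurements: (dataLevel, dataReplicate, vertex, intensity).
# The level and replicate names follow the naming pattern in the help text.
ROWS = [
    ("RNA_cancer", "RNA_cancer_1", "ARG29", 10.5),
    ("RNA_cancer", "RNA_cancer_2", "ARG29", 11.2),
    ("RNA_cancer", "RNA_cancer_1", "TP53", 3.1),
    ("RNA_cancer", "RNA_cancer_2", "TP53", 2.8),
    ("proteomic_cancer", "proteomic_cancer_1", "ARG29", 0.9),
    ("proteomic_cancer", "proteomic_cancer_2", "ARG29", 1.4),
    ("proteomic_cancer", "proteomic_cancer_1", "TP53", 7.7),
    ("proteomic_cancer", "proteomic_cancer_2", "TP53", 6.9),
]


def write_data_input(rows, handle):
    """Write rows as CSV. The header row is an assumption of this sketch."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["dataLevel", "dataReplicate", "vertex", "intensity"])
    writer.writerows(rows)


def levels(rows):
    """Return the distinct data levels; the help text requires at least two."""
    return sorted({row[0] for row in rows})


buffer = io.StringIO()
write_data_input(ROWS, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a file on disk, pass an open file handle (for example one opened with newline="") in place of the StringIO buffer.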

Integration with pre-calculated network

integreat --dataInput in.csv --premade --alignmentMethod CosineSimilarity
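The commands above select CosineSimilarity for alignment, and the first also selects SpearmanCorrelation for edges. The following is a minimal sketch of how those two ideas could fit together: build one Spearman co-expression network per level, then compare each vertex's row of correlations across levels with cosine similarity. It is an interpretation of the option descriptions, not integreat's actual implementation. All function names and toy data are hypothetical, and it assumes both levels contain the same vertices.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(xs, ys):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def network(level):
    """Map each vertex to its correlations with every vertex (sorted order)."""
    names = sorted(level)
    return {a: [spearman(level[a], level[b]) for b in names] for a in names}


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy intensities per vertex across four replicates (hypothetical values).
rna = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 9], "C": [4, 3, 2, 1]}
protein = {"A": [1, 2, 3, 4], "B": [1, 3, 5, 8], "C": [1, 2, 3, 5]}

rna_net = network(rna)
protein_net = network(protein)
scores = {v: cosine(rna_net[v], protein_net[v]) for v in sorted(rna)}
print(scores)
```

In this toy data, vertex C trends opposite to A and B at the RNA level but with them at the protein level, so it receives the lowest score of the three.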

Documentation

integreat, Gregory W. Schwartz. Integrate data from multiple sources to find
consistent (or inconsistent) entities.

Usage: integreat [--dataInput STRING] [--vertexInput STRING] [--entityDiff TEXT]
                 [--alignmentMethod STRING] [--edgeMethod STRING]
                 [--walkerRestart DOUBLE] [--steps INT] [--premade] [--test]
                 [--entityFilter INT] [--entityFilterStdDev DOUBLE]
                 [--permutations INT]

Available options:
  -h,--help                Show this help text
  --dataInput STRING       ([STDIN] | FILE) The input file containing the data
                           intensities. Follows the format:
                           dataLevel,dataReplicate,vertex,intensity. dataLevel
                           is the level (the base level for the experiment, like
                           "proteomic_cancer" or "RNA_cancer" for instance,
                           requires at least two levels), dataReplicate is the
                           replicate in that experiment that the entity is from
                           (the name of that data set with the replicate name,
                           like "RNA_cancer_1"), and vertex is the name of the
                           entity (must match those in --vertexInput), and
                           the intensity is the value of this entity in this
                           data set.
  --vertexInput STRING     ([Nothing] | FILE) The input file containing
                           similarities between entities. Follows the format:
                           vertexLevel1,vertexLevel2,
                           vertex1,vertex2,similarity. vertexLevel1 is the level
                           (the base title for the experiment, "data set") that
                           vertex1 is from, vertexLevel2 is the level that
                           vertex2 is from, and the similarity is a number
                           representing the similarity between those two
                           entities. If not specified, then the same entity
                           (determined by vertex in --dataInput) will have a
                           similarity of 1, different entities will have a
                           similarity of 0.
  --entityDiff TEXT        ([Nothing] | STRING) When comparing entities that are
                           the same, ignore the text after this separator. Used
                           for comparing phosphorylated positions with another
                           level. For example, if we have the strings ARG29 and
                           ARG29_7 that we want to compare, we want to say that
                           their value is the highest in correlation, so this
                           string would be "_"
  --alignmentMethod STRING ([CosineSimilarity] | RandomWalker | RandomWalkerSim)
                           The method to get integrated vertex similarity
                           between levels. CosineSimilarity uses the cosine
                           similarity of each vertex in each network compared to
                           the other vertices in other networks. RandomWalker
                           uses a random walker with restart based network
                           alignment algorithm in order to get similarity.
                           RandomWalkerSim uses a random walker with restart and
                           actually simulates the walker to get a stochastic
                           result.
  --edgeMethod STRING      ([SpearmanCorrelation] | PearsonCorrelation) The
                           method to use for the edges between entities in the
                           coexpression matrix.
  --walkerRestart DOUBLE   ([0.25] | PROBABILITY) For the random walker
                           algorithm, the probability of making a jump to a
                           random vertex. Recommended to be the ratio of the
                           total number of vertices in the top 99% smallest
                           subnetworks to the total number of nodes in the
                           reduced product graph (Jeong, 2015).
  --steps INT              ([100] | STEPS) For the random walker algorithm, the
                           number of steps to take before stopping.
  --premade                ([False] | BOOL) Whether the input data (dataInput)
                           is a pre-made network of the format "[(["VERTEX"],
                           [("SOURCE", "DESTINATION", WEIGHT)])]", where VERTEX,
                           SOURCE, and DESTINATION are of type INT starting at
                           0, in order, and WEIGHT is a DOUBLE representing the
                           weight of the edge between SOURCE and DESTINATION.
  --test                   ([False] | BOOL) Whether the input data from premade
                           is from a test run. If supplied, the output is
                           changed to an accuracy measure. In this case, we get
                           the total rank below the number of permuted vertices
                           divided by the theoretical maximum (so if there were
                           five changed vertices out of 10 and two were rank 8
                           and 10 while the others were in the top five, we
                           would have (1 - ((3 + 5) / (10 + 9 + 8 + 7 + 6))) as
                           the accuracy).
  --entityFilter INT       ([Nothing] | INT) The minimum number of samples an
                           entity must appear in, otherwise the entity is
                           removed from the analysis.
  --entityFilterStdDev DOUBLE
                           ([Nothing] | DOUBLE) Remove entities that have less
                           than this value for their standard deviation among
                           all samples.
  --permutations INT       ([1000] | INT) The number of permutations for the
                           cosine similarity permutation test or bootstrap.
                           Currently only the bootstrap is performed, and only
                           the first comparison is reported when there are
                           multiple comparisons.
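The accuracy measure described under --test can be worked through in a few lines. The sketch below follows one reading of the help text: each changed vertex ranked below the top k positions (k being the number of changed vertices) contributes its rank minus k, and the total is divided by the sum of the k worst possible ranks. The function name and the ranks chosen for the three top-five vertices are hypothetical.

```python
def rank_accuracy(changed_ranks, total_vertices):
    """Accuracy as read from the --test help text (an interpretation)."""
    k = len(changed_ranks)
    # How far each changed vertex falls below the top-k positions.
    penalty = sum(rank - k for rank in changed_ranks if rank > k)
    # Sum of the k largest ranks, e.g. 10 + 9 + 8 + 7 + 6 for k=5 of 10.
    worst = sum(range(total_vertices - k + 1, total_vertices + 1))
    return 1 - penalty / worst


# Five changed vertices out of 10: two at ranks 8 and 10, three in the top five.
accuracy = rank_accuracy([1, 3, 4, 8, 10], 10)
print(accuracy)  # 1 - (3 + 5) / (10 + 9 + 8 + 7 + 6)
```

With these inputs the penalty is 3 + 5 = 8 and the denominator is 40, which matches the expression given in the help text.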
