See https://github.com/GregorySchwartz/integreat for the latest version.
inteGREAT is an algorithm for integrating multiple data sources. inteGREAT can
be easily extended to differential integration by including samples from
multiple phenotypes, such as disease and control.
stack install integreat
integreat --dataInput in.csv --alignmentMethod CosineSimilarity --edgeMethod SpearmanCorrelation
integreat --dataInput in.csv --premade --alignmentMethod CosineSimilarity
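The in.csv passed to --dataInput follows the dataLevel,dataReplicate,vertex,intensity layout given in the help text. As a rough sketch, such a file could be generated as follows. The header row, the entity names geneA and geneB, and the intensity values are assumptions made for illustration; they are not taken from the inteGREAT documentation.

```python
import csv
import io

# Column order taken from the --dataInput help text.
FIELDS = ["dataLevel", "dataReplicate", "vertex", "intensity"]

# Hypothetical toy measurements: two levels (the documented minimum),
# two replicates per level, two entities per replicate.
ROWS = [
    ("RNA_cancer", "RNA_cancer_1", "geneA", 1.2),
    ("RNA_cancer", "RNA_cancer_1", "geneB", 3.4),
    ("RNA_cancer", "RNA_cancer_2", "geneA", 1.5),
    ("RNA_cancer", "RNA_cancer_2", "geneB", 3.1),
    ("proteomic_cancer", "proteomic_cancer_1", "geneA", 0.7),
    ("proteomic_cancer", "proteomic_cancer_1", "geneB", 2.9),
    ("proteomic_cancer", "proteomic_cancer_2", "geneA", 0.9),
    ("proteomic_cancer", "proteomic_cancer_2", "geneB", 2.6),
]


def to_csv(rows, header=True):
    """Serialize rows to CSV text. Writing a header row is an assumption."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if header:
        writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the CSV so it can be redirected to a file such as in.csv.
    print(to_csv(ROWS), end="")
```

Each vertex name is repeated across levels so that the same entity can be matched between them when no --vertexInput file is given.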
integreat, Gregory W. Schwartz. Integrate data from multiple sources to find
consistent (or inconsistent) entities.

Usage: integreat [--dataInput STRING] [--vertexInput STRING] [--entityDiff TEXT]
                 [--alignmentMethod STRING] [--edgeMethod STRING]
                 [--walkerRestart DOUBLE] [--steps INT] [--premade] [--test]
                 [--entityFilter INT] [--entityFilterStdDev DOUBLE]
                 [--permutations INT]

Available options:
  -h,--help
      Show this help text
  --dataInput STRING
      ([STDIN] | FILE) The input file containing the data intensities. Follows
      the format: dataLevel,dataReplicate,vertex,intensity. dataLevel is the
      level (the base level for the experiment, like "proteomic_cancer" or
      "RNA_cancer" for instance, requires at least two levels), dataReplicate
      is the replicate in that experiment that the entity is from (the name of
      that data set with the replicate name, like "RNA_cancer_1"), vertex is
      the name of the entity (must match those in vertexInput), and intensity
      is the value of this entity in this data set.
  --vertexInput STRING
      ([Nothing] | FILE) The input file containing similarities between
      entities. Follows the format:
      vertexLevel1,vertexLevel2,vertex1,vertex2,similarity. vertexLevel1 is the
      level (the base title for the experiment, "data set") that vertex1 is
      from, vertexLevel2 is the level that vertex2 is from, and similarity is a
      number representing the similarity between those two entities. If not
      specified, then the same entity (determined by vertex in dataInput) will
      have a similarity of 1, and different entities will have a similarity
      of 0.
  --entityDiff TEXT
      ([Nothing] | STRING) When comparing entities that are the same, ignore
      the text after this separator. Used for comparing phosphorylated
      positions with another level. For example, if we have the strings ARG29
      and ARG29_7 that we want to compare, we want to say that their value is
      the highest in correlation, so this string would be "_".
  --alignmentMethod STRING
      ([CosineSimilarity] | RandomWalker | RandomWalkerSim) The method to get
      integrated vertex similarity between levels. CosineSimilarity uses the
      cosine similarity of each vertex in each network compared to the other
      vertices in other networks. RandomWalker uses a random walker with
      restart based network alignment algorithm in order to get similarity.
      RandomWalkerSim uses a random walker with restart and actually simulates
      the walker to get a stochastic result.
  --edgeMethod STRING
      ([SpearmanCorrelation] | PearsonCorrelation) The method to use for the
      edges between entities in the coexpression matrix.
  --walkerRestart DOUBLE
      ([0.25] | PROBABILITY) For the random walker algorithm, the probability
      of making a jump to a random vertex. Recommended to be the ratio of the
      total number of vertices in the top 99% smallest subnetworks to the total
      number of nodes in the reduced product graph (Jeong, 2015).
  --steps INT
      ([100] | STEPS) For the random walker algorithm, the number of steps to
      take before stopping.
  --premade
      ([False] | BOOL) Whether the input data (dataInput) is a pre-made network
      of the format "[(["VERTEX"], [("SOURCE", "DESTINATION", WEIGHT)])]",
      where VERTEX, SOURCE, and DESTINATION are of type INT starting at 0, in
      order, and WEIGHT is a DOUBLE representing the weight of the edge between
      SOURCE and DESTINATION.
  --test
      ([False] | BOOL) Whether the input data from premade is from a test run.
      If supplied, the output is changed to an accuracy measure. In this case,
      we get the total rank below the number of permuted vertices divided by
      the theoretical maximum (so if there were five changed vertices out of 10
      and two were rank 8 and 10 while the others were in the top five, we
      would have (1 - ((3 + 5) / (10 + 9 + 8 + 7 + 6))) as the accuracy).
  --entityFilter INT
      ([Nothing] | INT) The minimum number of samples an entity must appear in,
      otherwise the entity is removed from the analysis.
  --entityFilterStdDev DOUBLE
      ([Nothing] | DOUBLE) Remove entities that have less than this value for
      their standard deviation among all samples.
  --permutations INT
      ([1000] | INT) The number of permutations for cosine similarity
      permutation test or bootstrap. Right now just does bootstrap and only
      shows the first comparison if there are multiple comparisons.
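
The accuracy reported with --test can be worked through by hand. The following is a minimal sketch of that arithmetic as read from the help text above. The name rank_accuracy and its arguments are hypothetical, and the actual implementation in inteGREAT may differ.

```python
def rank_accuracy(changed_ranks, n_vertices):
    """Accuracy measure as interpreted from the --test help text.

    changed_ranks: 1-based ranks assigned to the changed (permuted) vertices.
    n_vertices: total number of vertices that were ranked.
    """
    k = len(changed_ranks)
    if k == 0:
        raise ValueError("at least one changed vertex is required")
    # Penalty: how far each changed vertex falls below rank k.
    # Vertices ranked within the top k contribute nothing.
    penalty = sum(max(0, rank - k) for rank in changed_ranks)
    # Denominator as written in the help text example: the sum of the
    # k largest ranks, n + (n - 1) + ... + (n - k + 1).
    maximum = sum(range(n_vertices - k + 1, n_vertices + 1))
    return 1 - penalty / maximum


if __name__ == "__main__":
    # Five changed vertices out of 10: three in the top five, two at ranks 8 and 10.
    # Penalty is (8 - 5) + (10 - 5) = 8 and the denominator is 40.
    print(rank_accuracy([1, 2, 3, 8, 10], 10))
```

With the numbers from the help text this evaluates 1 - ((3 + 5) / (10 + 9 + 8 + 7 + 6)), that is, 1 - 8 / 40. A run in which every changed vertex lands in the top five gives an accuracy of 1.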