protein_complex_maps

# Scripts for handling protein complex map data

## Elution correlation

### Correlation matrices for each experiment, each species, and all experiments concatenated

python ./protein_complex_maps/external/infer_complexes/score.py

input: tab-separated wide elution profile: prot_ids [tab] total_spectral_count [tab] frac1_spectral_count [tab] ...

output: corr_poisson

The output is a large all-by-all matrix of correlation coefficients.

Example

python ./protein_complex_maps/external/infer_complexes/score.py ./examples/Hs_helaN_ph_hcw120_2_psome_exosc_randos.txt poisson
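To illustrate what an all-by-all correlation matrix looks like, here is a minimal sketch that computes plain Pearson correlation between every pair of elution profiles. It is not the code in score.py and does not reproduce whatever the `poisson` option does; the function names `pearson` and `all_by_all` and the toy profiles are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat flat profiles as uncorrelated
    return cov / sqrt(var_x * var_y)

def all_by_all(profiles):
    """profiles: dict of protein id -> list of per-fraction spectral counts."""
    ids = sorted(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in ids] for a in ids]
    return ids, matrix

# Toy data (hypothetical): three proteins, four fractions each.
profiles = {"P1": [0, 2, 4, 0], "P2": [0, 1, 2, 0], "P3": [4, 0, 0, 2]}
ids, matrix = all_by_all(profiles)
for protein, row in zip(ids, matrix):
    print(protein, *(f"{value:.2f}" for value in row), sep="\t")
```

The matrix has one row and one column per protein, which is why it grows quickly with the number of proteins.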

### Reformat all-by-all matrix to tidy (3-column) format

python ./protein_complex_maps/features/convert_correlation.py

input: corr_poisson

output: corr_poisson.pairs

Each row has the form P1 P2 correlation_coefficient, with one row for every protein pair.

Example

python ./protein_complex_maps/features/convert_correlation.py --input_correlation_matrix ./examples/Hs_helaN_ph_hcw120_2_psome_exosc_randos.txt.corr_poisson --input_elution_profile ./examples/Hs_helaN_ph_hcw120_2_psome_exosc_randos.txt --output_file ./examples/Hs_helaN_ph_hcw120_2_psome_exosc_randos.txt.corr_poisson_tidy
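The reshaping step can be sketched as follows. This is a minimal illustration, not the logic of convert_correlation.py: it assumes the matrix is symmetric, so only the upper triangle is emitted and self-pairs are skipped. The name `matrix_to_pairs` and the toy values are hypothetical.

```python
from itertools import combinations

def matrix_to_pairs(protein_ids, matrix):
    """Return (P1, P2, coefficient) rows for every unordered protein pair."""
    rows = []
    for i, j in combinations(range(len(protein_ids)), 2):
        rows.append((protein_ids[i], protein_ids[j], matrix[i][j]))
    return rows

# Toy symmetric matrix (hypothetical values).
ids = ["A", "B", "C"]
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
for p1, p2, coeff in matrix_to_pairs(ids, corr):
    print(p1, p2, coeff, sep="\t")
```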

### Feature matrix

A feature is any value that can be assigned to a pair of proteins.

python ./protein_complex_maps/features/build_feature_matrix.py

input: all .corr_poisson.pairs

output: feature_matrix.txt

Note: this is the point at which to add other features, such as AP-MS, as long as each feature describes a pair of proteins.

pairs Feature1 Feature2 Feature3
P1 P2 value1 value2 value3
... ... ... ...
PN PN-1 value4 value5 value6

The matrix is n x m, where n = (number of proteins choose 2) and m = number of features.

Example

python ./protein_complex_maps/features/build_feature_matrix.py --input_pairs_files ./examples/Hs_helaN_ph_hcw120_2_psome_exosc_randos.txt.corr_poisson_tidy --output_file ./examples/Hs_helaN_ph_hcw120_2_psome_exosc_randos.txt.corr_poisson_tidy.featmat
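As a rough sketch of the idea, each tidy pairs file contributes one column, keyed by the unordered protein pair. This is not build_feature_matrix.py itself: the function name, the use of `frozenset` keys, and filling missing values with `None` are all assumptions made for illustration.

```python
from math import comb

def build_feature_matrix(feature_tables):
    """feature_tables: dict of feature name -> {frozenset({P1, P2}): value}."""
    names = sorted(feature_tables)
    all_pairs = set()
    for table in feature_tables.values():
        all_pairs.update(table)
    matrix = {}
    for pair in all_pairs:
        # One row per pair, one column per feature; None marks a missing value.
        matrix[pair] = [feature_tables[name].get(pair) for name in names]
    return names, matrix

# Toy input (hypothetical): two experiments scored on overlapping pairs.
features = {
    "exp1": {frozenset("AB"): 0.9, frozenset("AC"): 0.1},
    "exp2": {frozenset("AB"): 0.5},
}
names, feature_matrix = build_feature_matrix(features)

# Upper bound on the number of rows for 3 proteins: 3 choose 2.
max_rows = comb(3, 2)
```

Keying rows by an unordered pair means that P1 P2 and P2 P1 land on the same row regardless of the order used in each input file.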

### Format CORUM into test and training sets

Remove redundancy from CORUM (merge similar clusters)

python ./protein_complex_maps/complex_merge.py

input: nonredundant_allComplexesCore_mammals.txt

output: nonredundant_allComplexesCore_mammals_merged06.txt

Randomly split the CORUM complexes into training and test sets

python ./protein_complex_maps/features/split_complexes.py

input: complexes nonredundant_allComplexesCore_mammals_merged06.txt

output:

  • [input_basename].test.txt
  • [input_basename].train.txt
  • [input_basename].test_ppis.txt
  • [input_basename].train_ppis.txt
  • [input_basename].neg_test_ppis.txt
  • [input_basename].neg_train_ppis.txt

The script takes every PPI that appears in both the train and test sets and randomly removes it from either test or train. For example, if complex 1 = AB, AC, BC and complex 2 = AB, AC, AD, BC, BD, then the shared pairs AB, AC and BC are each kept in only one of the two, giving for instance complex 1 = AB, BC and complex 2 = AC, AD, BD. It also makes sure that the complexes in training and test are completely separated.

Example

python ./protein_complex_maps/complex_merge.py --cluster_filename ./examples/allComplexesCore_geneid.txt --output_filename ./examples/allComplexesCore_geneid_merged06.txt --merge_threshold 0.6
python ./protein_complex_maps/features/split_complexes.py --input_complexes ./examples/allComplexesCore_geneid_merged06.txt
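The overlap-removal rule described above can be sketched like this. It is only an illustration under the assumption that each shared pair is dropped from one side with equal probability; split_complexes.py may choose differently. The function name `remove_overlap` is hypothetical.

```python
import random

def remove_overlap(train_pairs, test_pairs, rng):
    """Drop each shared pair from exactly one side, chosen at random."""
    train, test = set(train_pairs), set(test_pairs)
    # Sort the shared pairs so that a seeded rng gives a repeatable result.
    for pair in sorted(train & test, key=sorted):
        if rng.random() < 0.5:
            train.discard(pair)
        else:
            test.discard(pair)
    return train, test

# The example from the text: pairs written as two-letter strings.
train_ppis = {frozenset(p) for p in ["AB", "AC", "BC"]}
test_ppis = {frozenset(p) for p in ["AB", "AC", "AD", "BC", "BD"]}

train_clean, test_clean = remove_overlap(train_ppis, test_ppis, random.Random(0))
print(sorted("".join(sorted(p)) for p in train_clean))
print(sorted("".join(sorted(p)) for p in test_clean))
```

After the call no pair is shared, and every original pair still appears on exactly one side.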

### Make feature matrix with labels from CORUM

python ./protein_complex_maps/features/add_label.py

input: feature_matrix.txt

output: corum_train_labeled.txt

(These are the possible labels)

  • +1 positive label = pair is co-complex in CORUM
  • -1 negative label = both proteins are in CORUM, but not in the same complex
  • 0 = at least one protein in the pair is not in CORUM
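The three labels can be expressed as a small function. This is a sketch of the rule as stated in the list above, not the code in add_label.py; `label_pair` and the toy complexes are hypothetical.

```python
def label_pair(p1, p2, complexes):
    """Return +1, -1 or 0 for a protein pair given a list of complexes (sets)."""
    in_corum = set().union(*complexes) if complexes else set()
    if p1 not in in_corum or p2 not in in_corum:
        return 0  # at least one protein is not in the reference set
    if any(p1 in members and p2 in members for members in complexes):
        return 1  # the pair shares at least one complex
    return -1  # both proteins are known, but never in the same complex

# Toy reference set (hypothetical).
complexes = [{"A", "B"}, {"C", "D"}]
print(label_pair("A", "B", complexes))
print(label_pair("A", "C", complexes))
print(label_pair("A", "Z", complexes))
```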

### Make input for the SVM

Convert the labeled training set to libsvm format. This strips out headers and other columns that libsvm does not read.

python ./protein_complex_maps/features/feature2libsvm.py

input: corum_train_labeled.txt

output: corum_train_labeled.libsvm1.txt, tab separated
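For orientation, a libsvm-format line is a label followed by `index:value` fields, with feature indices starting at 1. The sketch below shows that layout; it is not feature2libsvm.py. The `sep` parameter and the choice to skip `None` values are assumptions.

```python
def to_libsvm_line(label, values, sep=" "):
    """Format one labeled feature row as 'label index:value ...'."""
    fields = [str(label)]
    for index, value in enumerate(values, start=1):
        if value is None:
            continue  # assumption: missing features are simply left out
        fields.append(f"{index}:{value}")
    return sep.join(fields)

print(to_libsvm_line(1, [0.9, None, 0.3]))
print(to_libsvm_line(-1, [0.2, 0.4, 0.0], sep="\t"))
```

Because the format is sparse, omitting a feature is how a missing value is usually represented.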

SVMs are biased toward features with large numeric values. Scaling puts all features on the same scale.

$LIBSVM_HOME/svm-scale

input: corum_train_labeled.libsvm1.scale_parameters

output: corum_train_labeled.libsvm1.scale.txt
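The idea behind scaling, and behind file names such as scaleByTrain, is that the scaling parameters are learned from the training set and then reused on any other set. Below is a minimal min-max sketch of that idea, assuming a target range of [-1, 1]; it is not svm-scale, and the function names are hypothetical.

```python
def fit_scale(rows):
    """Record the (min, max) of every feature column in the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]

def apply_scale(rows, params, lower=-1.0, upper=1.0):
    """Rescale rows linearly using previously fitted (min, max) pairs."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, params):
            if hi == lo:
                new_row.append(lower)  # assumption: constant columns map to lower
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled

train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
params = fit_scale(train)
train_scaled = apply_scale(train, params)
# Other data reuses the training parameters, so values may fall outside [-1, 1].
other_scaled = apply_scale([[2.5, 40.0]], params)
```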

SVM training and parameter sweep to optimize C and gamma

The parameter sweep uses the training set: for each parameter combination it trains on 9/10 of the data and evaluates on the held-out 1/10.

python $LIBSVM_HOME/tools/grid.py

input: corum_train_labeled.libsvm1.scale.txt

output: corum_train_labeled.libsvm1.scale.txt.out

### Train classifier

Take the optimal C and gamma (c and g) found by the parameter sweep and train a classifier.

$LIBSVM_HOME/svm-train

input: corum_train_labeled.libsvm1.scale.txt

output: corum_train_labeled.libsvm1.scale.model_c_g (with c and g values)

Predict the unlabeled set, together with the test set, using the trained model

$LIBSVM_HOME/svm-predict

input: corum_train_labeled.libsvm0.scaleByTrain.txt, corum_train_labeled.libsvm1.scale.model_c_g

output: corum_train_labeled.libsvm0.scaleByTrain.resultsWprob

Make a probability-ordered list of pairs

python ./protein_complex_maps/features/svm_results2pairs.py

inputs: corum_train_labeled.txt, corum_train_labeled.libsvm0.scaleByTrain.resultsWprob

output: corum_train_labeled.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob.txt
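The output file name suggests that self-pairs and duplicate pairs are removed and that rows are ordered by probability. A minimal sketch of that kind of post-processing follows. It is not svm_results2pairs.py; keeping the highest probability for a duplicated pair is an assumption, and `rank_pairs` is a hypothetical name.

```python
def rank_pairs(scored_pairs):
    """Drop self-pairs, merge duplicates, and sort by probability (descending)."""
    best = {}
    for p1, p2, prob in scored_pairs:
        if p1 == p2:
            continue  # no self-interactions
        key = tuple(sorted((p1, p2)))  # A-B and B-A are the same pair
        if key not in best or prob > best[key]:
            best[key] = prob
    rows = [(a, b, prob) for (a, b), prob in best.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Toy predictions (hypothetical).
predictions = [("A", "A", 0.9), ("A", "B", 0.4), ("B", "A", 0.6), ("C", "D", 0.8)]
for p1, p2, prob in rank_pairs(predictions):
    print(p1, p2, prob, sep="\t")
```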

### Cluster PPIs

At this point, we want to find clusters (dense regions) among the predicted PPIs.

This is done with two-stage clustering.

python ./protein_complex_maps/features/clustering_parameter_optimization.py

inputs:

  • corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob.txt

  • nonredundant_allComplexesCore_mammals_merged06.train.txt

outputs:

  • corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob_combined.best_cluster_wOverlap_nr_allComplexesCore_mammals_psweep_clusterone_mcl.txt
  • corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob.best_cluster_wOverlap_nr_allComplexesCore_mammals_psweep_clusterone_mcl.out

Do a parameter sweep (about 1000 different possibilities)

  • PPI score threshold [1.0, 0.9, 0.8 ... 0.1]
  • Clusterone parameters
    • overlap (jaccard score) [0.8, 0.7, 0.6] -- merging complexes with overlap
    • density (threshold of total number of interactions vs. total possible interactions) unconnected -> fully connected
  • MCL inflation [1.2, 3, 4, 7]

Process: Run through clusterone, then run clusters from clusterone through MCL.

Output: one set of clusters for each parameter combination

Select the best set of clusters (usually a couple thousand) by comparing to the CORUM training complex set using the K-Cliques metric or another comparison metric
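The sweep described above amounts to enumerating every combination of parameters and keeping the one that scores best against the training complexes. The sketch below shows that structure only. The density values are hypothetical, because the text gives just a range (unconnected to fully connected), and the scoring function is a stand-in for the K-Cliques or other comparison metric. The `jaccard` helper illustrates the overlap score used when merging complexes.

```python
from itertools import product

def jaccard(a, b):
    """Jaccard overlap of two clusters: shared members / all members."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

ppi_thresholds = [round(1.0 - 0.1 * i, 1) for i in range(10)]  # 1.0 down to 0.1
overlaps = [0.8, 0.7, 0.6]
densities = [0.1, 0.3, 0.5]  # hypothetical values, not from the README
inflations = [1.2, 3, 4, 7]

# Every (threshold, overlap, density, inflation) combination.
grid = list(product(ppi_thresholds, overlaps, densities, inflations))

def best_parameters(grid, score):
    """Return the combination with the highest score; score is caller-supplied."""
    return max(grid, key=score)

# Stand-in score for demonstration only: prefers a threshold near 0.5.
chosen = best_parameters(grid, lambda params: -abs(params[0] - 0.5))
print(len(grid), chosen)
```

In the real pipeline each combination would be run through clusterone and then MCL before scoring, so the cost of the sweep is dominated by the clustering runs rather than by the enumeration.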

### Generate Cytoscape Network

Make clusters into pairs

python ./protein_complex_maps/util/cluster2pairwise.py

input: corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob_combined.best_cluster_wOverlap_nr_allComplexesCore_mammals_psweep_clusterone_mcl.[best].txt

output: corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob_combined.best_cluster_wOverlap_nr_allComplexesCore_mammals_psweep_clusterone_mcl.[best].pairsWclustID.txt
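Turning clusters into pairs means listing every within-cluster pair together with the cluster it came from. A minimal sketch, assuming the cluster ID is simply the position of the cluster in the input (cluster2pairwise.py may number clusters differently; `clusters_to_pairs` is a hypothetical name):

```python
from itertools import combinations

def clusters_to_pairs(clusters):
    """Return (P1, P2, cluster_id) for every pair of members in each cluster."""
    rows = []
    for cluster_id, members in enumerate(clusters):
        for p1, p2 in combinations(sorted(members), 2):
            rows.append((p1, p2, cluster_id))
    return rows

# Toy clusters (hypothetical).
clusters = [["A", "B", "C"], ["D", "E"]]
for p1, p2, cluster_id in clusters_to_pairs(clusters):
    print(p1, p2, cluster_id, sep="\t")
```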

Make clusters into node table

python ./protein_complex_maps/util/cluster2node_table.py

input: corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob_combined.best_cluster_wOverlap_nr_allComplexesCore_mammals_psweep_clusterone_mcl.[best].txt

output: corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob_combined.best_cluster_wOverlap_nr_allComplexesCore_mammals_psweep_clusterone_mcl.[best].nodeTable.txt

Make edge attribute table

python ./protein_complex_maps/util/pairwise2clusterid.py

inputs:

  • corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob.txt
  • corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob.best_cluster_wOverlap_nr_allComplexesCore_mammals_psweep_clusterone_mcl.[best].txt

output: corum_train_labeled.libsvm1.scale.libsvm0.scaleByTrain.resultsWprob_pairs_noself_nodups_wprob.best_cluster_wOverlap_nr_allComplexesCore_mammals_psweep_clusterone_mcl.[best].edgeAttributeWClusterid.txt

Load into Cytoscape

About

Public repository for building and analyzing protein complex maps