features and labels #4

Mohamed-ElhajAbdou · 2021-01-16T23:48:57Z

Hello, I have two questions and hope you can answer them.

1- First, as I understand it, the role of the sequence alignment is to extract a set of aligned subsequences that represent the first sequence.

2- Those alignments are then used to compute a matrix, called the covariance matrix, that measures the correlations between the alignments.

3- From what I understand, the protein contact map is derived from the distance matrix and serves as the label. For example, if the distance between the first amino acid in the first chain and the first amino acid in the second chain is 200 Å and we set a threshold of 8 Å, then the contact map entry for this pair is "not in contact", i.e. "False" or, in binary, "0". Am I right in that understanding?

My questions
First
1- What is the role of the covariance matrix?
2- What is the role of the protein contact map? Is it the label for the distance matrix? If so, what is the role of the covariance matrix?
3- What is the input to the neural network model?
A- What are the features? Are they the distance matrix? If yes, what is the role of the covariance matrix?
B- What are the labels for these features? Is the protein contact map the label (0s and 1s)?

Second
1- Could you kindly give me a hint or list the steps: which script should I run first, which second, and so on? I want to cite your paper, and I have been inspired by your great work.

Thanks in advance

DanBuchan · 2021-01-20T15:40:46Z

Hi, you've copied and pasted the exact same question to two of our repos. Can you clarify which package you actually want to know about?

Mohamed-ElhajAbdou · 2021-01-20T17:35:14Z

Hello Dan. I posted my question in both repos hoping that someone would answer. Let me ask the question for this repo.
First, from what I understood from the research paper, the input consists of 441 features, for example:
Contact map matrix, which represents the contact distances between each pair of residues
Secondary structure output file, and so on
My question:
After generating all of those features separately using the tools mentioned in the paper, what is the input to the neural network? Is each of those features fed to the network directly, with each feature represented as a separate vector?
For example:
Vector_1: protein contact map matrix
Vector_2: secondary structure output
And so on?
A second question: what are the labels for all of those features? Are the labels generated separately for each feature vector? For example:
Vector_1: contact map matrix; label_1: a matrix describing whether each distance is in contact or not (0s and 1s,
or True/False)
If so, and if my assumption is right, I don't know what the labels for the other vectors would be.
Lastly, what is the role of the covariance matrix? I understand that it measures the similarity between the aligned sequences, but is it treated as a feature vector too?
Thanks in advance

Mohamed-ElhajAbdou · 2021-01-20T17:46:06Z

Very sorry for the duplicated question. I hope to get an answer from you; it would be appreciated.

shaunmk · 2021-01-20T17:57:19Z

Hi,

The L x L x 441-dimensional input is the covariance matrix, not the inter-residue distances or contacts. Contacts are predicted as the output of the neural network.
The covariance matrix is combined (concatenated) in an appropriate fashion with the remaining features and fed as input to the neural network. You can see the steps in src/cov21stats.c (441 covariance features) and src/deepmetapsicov_makepredmap.c (other features). When predicting contacts, the neural network uses all the different input features at the same time, and there is only one type of label or output, i.e. binary contacts.
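
To make the idea of binary contact labels concrete, here is a minimal sketch of how such a label matrix could be derived from residue coordinates. It assumes the 8 Å cutoff mentioned in the original question; the function name `contact_labels`, the toy coordinates, and the choice of one representative point per residue are illustrative assumptions, not taken from this repository.

```python
import numpy as np


def contact_labels(coords, threshold=8.0):
    """Return an L x L binary contact map from an L x 3 array of coordinates.

    Hypothetical helper: the cutoff and the use of a single point per
    residue are assumptions made for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (L, L).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A pair is labelled 1 ("in contact") when its distance is below the cutoff.
    return (dist < threshold).astype(np.int64)


# Three made-up residue positions (in Angstroms) placed on a line.
toy_coords = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [200.0, 0.0, 0.0]])
toy_labels = contact_labels(toy_coords)
print(toy_labels)
```

In this toy case the first two residues are 5 Å apart and are labelled as a contact, while the third residue, roughly 200 Å away, is not in contact with either of them.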

Does that clear things up?

Mohamed-ElhajAbdou · 2021-01-20T18:05:02Z

Hi shaunmk, thank you for your answer. From your explanation,
the input to the neural network is the covariance matrix fused with all the other features (contact map, secondary structure, solvent accessibility, etc.).
So what kind of fusion is done?

shaunmk · 2021-01-20T19:15:52Z

It's a simple concatenation. For a given pair of residues, the inputs are a 501-dimensional vector. The first 441 elements of this vector are the covariance values for that residue pair. The remaining 60 elements are the other features.
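
As a rough illustration of this per-pair concatenation, the sketch below builds a 501-dimensional vector for every residue pair from random stand-in data. The array names, the sequence length, and the random values are hypothetical; only the 441 + 60 = 501 split comes from the description above.

```python
import numpy as np

L = 5  # hypothetical sequence length
rng = np.random.default_rng(0)

# Random stand-ins for the real features (assumption: values are arbitrary).
cov_features = rng.normal(size=(L, L, 441))    # covariance values per residue pair
other_features = rng.normal(size=(L, L, 60))   # remaining features per residue pair

# Concatenate along the last (feature) axis: 441 + 60 = 501 values per pair.
inputs = np.concatenate([cov_features, other_features], axis=-1)
print(inputs.shape)  # (5, 5, 501)

# The input vector for one residue pair (i, j).
i, j = 1, 3
pair_vector = inputs[i, j]
print(pair_vector.shape)  # (501,)
```

The first 441 entries of `pair_vector` are the covariance values for that pair and the last 60 are the other features, matching the ordering described above.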

Mohamed-ElhajAbdou · 2021-01-20T19:35:29Z

Is it like concatenating matrices in Python, as in this example?
covariance_matrix = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
feature_vector_1 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])  # contact map
...etc
Neural_network_input = np.concatenate((covariance_matrix, feature_vector_1, feature_vector_2, ..., feature_vector_66))
Also, as this is a supervised learning method, there should be labels.
Are the labels assigned to each dimension separately, or to all the dimensions at once, so that they match the length L?
And what kind of values do the labels take?

shaunmk · 2021-01-21T09:56:55Z

In the case of 2D matrices, it would be more like
np.stack((covariance_matrix, feature_vector_1), axis=0)

As I said above, the contact map is NOT included in the inputs to the model. The contact map forms the labels that are being predicted. The neural net takes input of dimensions L x L x 501, and returns an output (the contact map) of dimensions L x L x 1. Analogous to images (where there are 3 RGB channels in the image), here, our input has 501 channels.
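
To illustrate the image analogy, the following sketch stacks 501 hypothetical L x L feature maps into a single multi-channel array. Whether the model's code keeps the channel axis first or last is not stated in this thread, so both layouts are shown; all names and values are made up for the example.

```python
import numpy as np

L = 4             # hypothetical sequence length
n_channels = 501  # 441 covariance channels + 60 other feature channels

# One made-up L x L map per channel; channel c is filled with the value c.
channel_maps = [np.full((L, L), float(c)) for c in range(n_channels)]

# Stacking along a new leading axis gives a channels-first array.
channels_first = np.stack(channel_maps, axis=0)
print(channels_first.shape)  # (501, 4, 4)

# Moving the channel axis to the end gives the L x L x 501 layout.
channels_last = np.moveaxis(channels_first, 0, -1)
print(channels_last.shape)  # (4, 4, 501)

# The labels are a single-channel L x L map (placeholder zeros here).
labels = np.zeros((L, L, 1))
print(labels.shape)  # (4, 4, 1)
```

Either way, each residue pair (i, j) carries 501 input values and one output value, just as each pixel of an RGB image carries three.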

Mohamed-ElhajAbdou · 2021-02-19T06:10:15Z

Okay, I understand your idea well, but could you kindly provide just one sample from your dataset, so I can see the input sequence and the protein contact map labels? I only need a single sample so that I can create my own dataset and train my model on it.
Thanks in advance

shaunmk · 2021-02-19T10:57:22Z

The test/ directory has an example input sequence and multiple sequence alignment. A PSIBLAST PSSM is also provided for full reproducibility. The script test/test_DMP.sh shows how to use these files to produce the prediction. The expected output (the contact predictions) is also provided in test/example_con, though these are in CASP RR format.
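
For readers who want those predictions as a matrix, here is a minimal sketch of reading RR-style contact lines into an L x L array. It assumes each contact line has five whitespace-separated fields of the form `i j d_min d_max probability` with 1-based residue indices, and it skips any other lines; the function `parse_rr` and the example lines are hypothetical and are not copied from test/example_con.

```python
import numpy as np


def parse_rr(text, length):
    """Parse RR-style contact lines into a symmetric length x length matrix.

    Assumed line layout (not verified against this repository's files):
    "i j d_min d_max probability", with 1-based residue indices.
    """
    probs = np.zeros((length, length))
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip headers, sequence lines, and terminators
        try:
            i, j = int(fields[0]), int(fields[1])
            p = float(fields[4])
        except ValueError:
            continue  # skip lines that do not hold numeric contact data
        # Fill both triangles so the matrix is symmetric.
        probs[i - 1, j - 1] = p
        probs[j - 1, i - 1] = p
    return probs


# Made-up example lines for a 5-residue sequence.
example = """1 4 0 8 0.91
2 5 0 8 0.12
"""
rr_matrix = parse_rr(example, length=5)
print(rr_matrix)
```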

If you'd like to capture the PyTorch tensor that contains the predictions as a numeric matrix, you will need to modify deepmetapsicov_consens/pytorch_metacov_consenspred_030model.py to output the variable result after line 64, using something like:

result = (result + result.transpose(3,2))/2.0

and then print the contents of result.

Taking the mean of result and its transpose ensures that the output contact map is symmetric (the raw output of the neural net is close to symmetric but not exactly so). The same averaging is done on a per-element basis when writing the CASP-format contacts, on line 69.
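
The effect of this averaging can be checked with a small NumPy analogue of the expression above. The 4-D shape (batch, channel, L, L) is an assumption suggested by the `transpose(3,2)` call, and the random array merely stands in for the network output.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 6  # hypothetical sequence length

# Stand-in for the raw network output; assumed shape (batch, channel, L, L).
raw = rng.random(size=(1, 1, L, L))

# Swap the last two axes and average, mirroring
# (result + result.transpose(3,2))/2.0 from the comment above.
sym = (raw + np.swapaxes(raw, 3, 2)) / 2.0

# Drop the batch and channel axes to get the L x L contact map.
contact_map = sym[0, 0]
print(contact_map.shape)
print(np.allclose(contact_map, contact_map.T))
```

Each symmetrised entry is the mean of the raw scores for (i, j) and (j, i), so the diagonal is left unchanged.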

shaunmk · 2021-07-06T13:44:00Z

Closing due to inactivity; please reopen if you still have issues.

DanBuchan mentioned this issue Jan 20, 2021

features and labels psipred/DeepCov#5

Closed

shaunmk closed this as completed Jul 6, 2021
