features and labels #4

Closed
Mohamed-ElhajAbdou opened this issue Jan 16, 2021 · 11 comments

Comments

@Mohamed-ElhajAbdou

Hello, I have two questions and hope you can answer them.

1- First, as I understand it, the role of the sequence alignment is to extract chunks of subsequences that represent the first sequence.

2- Those alignments are then used to compute a covariance matrix, which measures the correlations between the alignments.

3- From what I understand, the protein contact map describes the distance matrix as labels. For example, if the distance between the first amino acid in the first chain and the first amino acid in the second chain is 200 Å and we set a threshold of 8 Å, the contact map entry for that distance will be "not in contact", i.e. "False" or, in binary, "0". Is my understanding right?

My questions
First
1- What is the role of the covariance matrix?
2- What is the role of the protein contact map? Are those the labels for the distance matrix? If so, what is the role of the covariance matrix?
3- What is the input to the neural network model?
A- What are the features? Are they the distance matrix? If yes, what is the role of the covariance matrix?
B- What are the labels for these features? Is the protein contact map the label (0's and 1's)?

Second
1- Could you kindly give me a hint or the steps: which script to use first, which second, and so on? I want to cite your paper, so I have started drawing inspiration from your great work.

Thanks in advance

@DanBuchan
Contributor

Hi, you've copied and pasted the exact same question to 2 of our repos. Can you clarify which package you actually want to know about?

@Mohamed-ElhajAbdou
Author

Hello Dan. I posted my question in both repos hoping that someone would answer it.
Let me ask the question for this repo.
First, from what I understood from the research paper, there are 441 input features, for example:
the contact map matrix, which represents the contact distances between each pair of residues,
the generated secondary structure file, and so on.
My question:
After generating all those features separately using the tools mentioned in the paper,
what is the input to the neural network? Is each of those features fed directly to the neural network, with each feature represented as a separate vector?
For example:
Vector_1: protein contact map matrix
Vector_2: secondary structure output
And so on?
A second question: what are the labels for all of those features? Are the labels generated separately for each feature vector? For example:
Vector_1: contact map matrix; label_1: a matrix describing whether each distance is in contact or not (0's and 1's,
or true/false).
If yes, I don't know what the labels for the other vectors would be, if my assumption is right.
Lastly, what is the role of the covariance matrix? Yes, it measures the similarity between the aligned sequences, but is it treated as a feature vector too?
Thanks in advance

@Mohamed-ElhajAbdou
Author

Mohamed-ElhajAbdou commented Jan 20, 2021

Very sorry for the duplicated question. I hope to get answers from you; it would be appreciated.

@shaunmk
Member

shaunmk commented Jan 20, 2021

Hi,

The LxLx441-dimensional input is the covariance matrix, not the inter-residue distances or contacts. Contacts are predicted as the output of the neural network.
The covariance matrix is combined (concatenated) in an appropriate fashion with the remaining features and fed as input to the neural network. You can see the steps in src/cov21stats.c (441 covariance features) and src/deepmetapsicov_makepredmap.c (other features). When predicting contacts, the neural network uses all the different input features at the same time, and there is only one type of label or output, i.e. binary contacts.

Does that clear things up?
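
For readers wondering where the number 441 comes from: a plausible reading is 21 x 21, i.e. one covariance value per pair of symbols (20 amino acids plus a gap) at two alignment columns. The NumPy sketch below illustrates that idea under this assumption only. It is not a port of src/cov21stats.c, which may apply sequence weighting or other corrections; the alphabet ordering, the handling of unknown characters, and the function name covariance_features are all hypothetical.

```python
import numpy as np

# Assumed 21-symbol alphabet: 20 amino acids plus a gap character.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"


def covariance_features(msa):
    """Return an (L, L, 441) array of unweighted covariances for an alignment.

    msa is a list of equal-length aligned sequences (hypothetical input format).
    """
    index = {c: i for i, c in enumerate(ALPHABET)}
    n_seq, length = len(msa), len(msa[0])

    # One-hot encode the alignment: shape (n_seq, L, 21).
    onehot = np.zeros((n_seq, length, 21))
    for s, seq in enumerate(msa):
        for i, c in enumerate(seq):
            # Unknown characters are treated as gaps (an assumption).
            onehot[s, i, index.get(c, 20)] = 1.0

    single = onehot.mean(axis=0)  # f_i(a), shape (L, 21)
    pair = np.einsum("sia,sjb->ijab", onehot, onehot) / n_seq  # f_ij(a, b)

    # cov_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b)
    cov = pair - single[:, None, :, None] * single[None, :, None, :]
    return cov.reshape(length, length, 21 * 21)


msa = ["ARND", "ARNE", "CRND"]
features = covariance_features(msa)
print(features.shape)
```

Each residue pair (i, j) therefore carries its own 441-element vector, which is what makes the covariance input L x L x 441.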

@Mohamed-ElhajAbdou
Author

Hi Shaunmk, thank you for your answer. From your explanation,
the input to the neural network is the covariance matrix fused with all the other features, such as (contact map, secondary structure, solvent accessibility, etc.).
So what kind of fusion is done?

@shaunmk
Member

shaunmk commented Jan 20, 2021

It's a simple concatenation. For a given pair of residues, the inputs are a 501-dimensional vector. The first 441 elements of this vector are the covariance values for that residue pair. The remaining 60 elements are the other features.
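
A minimal NumPy sketch of that per-pair concatenation, using random placeholder values rather than real features. The array names and the sequence length are made up for illustration; only the sizes 441, 60 and 501 come from the description above.

```python
import numpy as np

L = 5  # hypothetical sequence length
rng = np.random.default_rng(0)

cov = rng.random((L, L, 441))   # placeholder covariance features per residue pair
other = rng.random((L, L, 60))  # placeholder for the remaining 60 features

# Concatenate along the last (feature) axis: every residue pair gets 441 + 60 values.
inputs = np.concatenate([cov, other], axis=-1)

pair_vector = inputs[1, 3]  # the 501-dimensional vector for residues 1 and 3
print(inputs.shape, pair_vector.shape)
```

The first 441 entries of pair_vector are the covariance values for that pair, and the last 60 are the other features.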

@Mohamed-ElhajAbdou
Author

Like Python matrix concatenation, as in this example?
covariance_matrix = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
feature_vector_1 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])  # contact map
...etc
Neural_network_input = np.concatenate((covariance_matrix, feature_vector_1, feature_vector_2, ..., feature_vector_66))
Also, as a supervised learning method, it should have labels.
Are the labels assigned for each dimension separately, or for all the dimensions at once, so that they match the length L?
And what kind of values do the labels take?

@shaunmk
Member

shaunmk commented Jan 21, 2021

In the case of 2D matrices, it would be more like
np.stack((covariance_matrix, feature_vector_1), axis=0)

As I said above, the contact map is NOT included in the inputs to the model. The contact map forms the labels that are being predicted. The neural net takes input of dimensions L x L x 501, and returns an output (the contact map) of dimensions L x L x 1. Analogous to images (where there are 3 RGB channels in the image), here, our input has 501 channels.
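
To make the shapes concrete, here is a small sketch with placeholder data. It stacks two L x L maps into channels, then builds a binary label map by thresholding pairwise distances. The 8 Å cutoff is the one mentioned in the question at the top of this thread; the random coordinates and all variable names are illustrative assumptions, not the project's actual data pipeline.

```python
import numpy as np

L = 4  # hypothetical sequence length
rng = np.random.default_rng(0)

# Two placeholder L x L input maps standing in for two of the 501 channels.
covariance_matrix = rng.random((L, L))
feature_map_1 = rng.random((L, L))

# Stacking on a new leading axis gives a channels-first array: (channels, L, L).
channels_first = np.stack((covariance_matrix, feature_map_1), axis=0)

# The same data in the L x L x channels layout described above.
channels_last = np.moveaxis(channels_first, 0, -1)

# Labels: one binary value per residue pair, from placeholder 3D coordinates.
coords = rng.random((L, 3)) * 20.0
distances = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
contact_map = (distances < 8.0).astype(np.int64)  # 1 = in contact, 0 = not

print(channels_first.shape, channels_last.shape, contact_map.shape)
```

Note that the labels are a single L x L map regardless of how many input channels there are.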

@Mohamed-ElhajAbdou
Author

Okay, I understand your idea well, but could you kindly provide just one sample from your dataset, so I can see the input sequence and the protein contact map labels? I just want the data for one sample so that I will be able to create my own dataset and train my model on it.
Thanks in advance

@shaunmk
Member

shaunmk commented Feb 19, 2021

The test/ directory has an example input sequence and multiple sequence alignment. A PSIBLAST PSSM is also provided for full reproducibility. The script test/test_DMP.sh shows how to use these files to produce the prediction. The expected output (the contact predictions) is also provided in test/example_con, though it is in CASP RR format.
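
If a numeric matrix is more convenient than the RR text, something like the following sketch could convert one to the other. It assumes the common CASP RR layout of five whitespace-separated fields per contact line (i, j, lower distance bound, upper distance bound, probability, with 1-based residue indices) and skips every other line; check this against the actual contents of test/example_con before relying on it. The function name rr_to_matrix and the example lines are hypothetical.

```python
import numpy as np


def rr_to_matrix(lines, length):
    """Build a symmetric (length, length) probability matrix from RR-style lines."""
    probs = np.zeros((length, length))
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # header, sequence, or END lines
        try:
            i, j = int(fields[0]), int(fields[1])
            p = float(fields[4])
        except ValueError:
            continue  # not a contact line after all
        # RR indices are assumed to be 1-based.
        probs[i - 1, j - 1] = p
        probs[j - 1, i - 1] = p
    return probs


example_lines = ["PFRMAT RR", "1 9 0 8 0.91", "2 7 0 8 0.15", "END"]
matrix = rr_to_matrix(example_lines, length=10)
print(matrix[0, 8], matrix[1, 6])
```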

If you'd like to capture the PyTorch tensor that contains the predictions as a numeric matrix, you will need to modify deepmetapsicov_consens/pytorch_metacov_consenspred_030model.py to output the variable result after line 64, using something like:

result = (result + result.transpose(3,2))/2.0

and then print the contents of result.

This operation of taking the mean of result and its transpose ensures that the output contact map is symmetric (this is because the raw output of the neural net is close to symmetric but not exactly so). This operation is done on a per-element basis when writing the CASP-format contacts, on line 69.
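
The effect of the averaging step can be checked on placeholder data. The sketch below uses NumPy as a stand-in for the PyTorch tensor and assumes a 4D layout of (batch, channel, L, L), which is what swapping dimensions 3 and 2 suggests; np.swapaxes plays the role of the tensor's transpose call. The random values and the variable name contact_probs are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the network output: assumed shape (batch, channel, L, L).
result = rng.random((1, 1, 6, 6))

# Average the map with its transpose over the last two axes.
result = (result + np.swapaxes(result, 3, 2)) / 2.0

# Drop the batch and channel axes to get a plain L x L matrix, then print it.
contact_probs = result[0, 0]
np.set_printoptions(precision=3, suppress=True)
print(contact_probs)
```

After the averaging, contact_probs[i, j] and contact_probs[j, i] are identical for every pair.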

@shaunmk
Member

shaunmk commented Jul 6, 2021

Closing due to inactivity; please reopen if you still have issues.

@shaunmk shaunmk closed this as completed Jul 6, 2021