What to do with the local_representations and global_representations #6
Hi @rdenise,
We will probably fix it at some point in the future by simply rerunning the entire pretraining process from scratch (this time without losing any files...). Unfortunately, I cannot guarantee when this will happen. Note that the ProteinBERT code itself works properly, so if you run the entire pretraining from scratch (which should take about a month on a single GPU to reach the same performance as the published model), you should end up with a model that provides GO annotations. I truly apologize for this...
Hello @nadavbra,

Flow: seq → local, global

```python
def encode_X(self, seqs, seq_len):
    return [
        tokenize_seqs(seqs, seq_len),
        np.zeros((len(seqs), self.n_annotations), dtype=np.int8)
    ]

encoded_x = input_encoder.encode_X(seqs, seq_len)
local_representations, global_representations = model.predict(encoded_x, batch_size=batch_size)
```

I read in other answers that the ProteinBERT model can use sequence data to predict the corresponding GO annotation information, which has left me a bit confused, since the annotation input above is all zeros. My goal is to treat ProteinBERT as an encoder: input a protein sequence and obtain the corresponding local and global feature representations for downstream tasks.

Sincerely,
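The `encode_X` contract above can be illustrated with a minimal, self-contained sketch. Everything below is a hypothetical stand-in (the real `tokenize_seqs` and vocabulary live in the ProteinBERT package): the point is only the shape contract — a pair of arrays, the tokenized sequences plus an all-zero annotation matrix meaning "no GO annotations supplied as input".

```python
import numpy as np

# Hypothetical stand-in for ProteinBERT's tokenizer: maps each amino acid
# to an integer id and pads/truncates every sequence to seq_len.
AA_VOCAB = {aa: i + 1 for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
PAD_ID = 0

def tokenize_seqs(seqs, seq_len):
    tokens = np.full((len(seqs), seq_len), PAD_ID, dtype=np.int32)
    for row, seq in enumerate(seqs):
        for col, aa in enumerate(seq[:seq_len]):
            tokens[row, col] = AA_VOCAB.get(aa, PAD_ID)
    return tokens

class DummyInputEncoder:
    """Mimics encode_X's shape contract: the zeroed second array means
    no GO annotations are provided as input."""
    def __init__(self, n_annotations):
        self.n_annotations = n_annotations

    def encode_X(self, seqs, seq_len):
        return [
            tokenize_seqs(seqs, seq_len),
            np.zeros((len(seqs), self.n_annotations), dtype=np.int8),
        ]

encoder = DummyInputEncoder(n_annotations=8943)
tokens, annotations = encoder.encode_X(["MKV", "ACDEFG"], seq_len=8)
print(tokens.shape)        # (2, 8)
print(annotations.shape)   # (2, 8943)
print(annotations.any())   # False: no annotations supplied as input
```

Feeding this pair to the pretrained model's `predict` then yields the local (per-residue) and global (per-sequence) representations discussed in this thread.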
Hi @yelou2022,

The global features predicted by ProteinBERT are GO annotations, whether or not it receives any as input. If it receives some GO annotations as input, it is easier for it to predict the other GO annotations, so you should expect more accurate predictions; but it will do its best to predict the GO annotations from the sequence alone even if you don't provide any annotations as input. I hope this clarifies things.
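Since the global output described above is a vector of per-annotation probabilities (one entry per GO term in the model's vocabulary), turning it into concrete GO predictions is just a thresholding step. A minimal sketch with dummy data — the GO-term list, the probability values, and the 0.5 cutoff are all hypothetical illustrations, not values shipped with ProteinBERT:

```python
import numpy as np

# Dummy global output for 2 sequences over a toy vocabulary of 5 GO terms.
# In the real model this axis has one entry per GO annotation it was trained on.
go_terms = ["GO:0003677", "GO:0005524", "GO:0008270", "GO:0016301", "GO:0046872"]
global_representations = np.array([
    [0.91, 0.12, 0.75, 0.05, 0.33],
    [0.08, 0.88, 0.10, 0.64, 0.02],
])

THRESHOLD = 0.5  # hypothetical cutoff; in practice you would tune this

def predict_go_annotations(global_repr, terms, threshold=THRESHOLD):
    """For each sequence, return the GO terms whose predicted
    probability exceeds the threshold."""
    return [
        [terms[j] for j in np.flatnonzero(row > threshold)]
        for row in global_repr
    ]

print(predict_go_annotations(global_representations, go_terms))
# [['GO:0003677', 'GO:0008270'], ['GO:0005524', 'GO:0016301']]
```

The mapping from vector index to GO term comes with the pretrained model's metadata; the sketch only shows the thresholding logic.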
I understood, thank you.
Hi @nadavbra,

The global representation vector that I am getting is not of size 8943, as mentioned above; instead it is of size 15599. Were there any changes made since this was discussed in this thread?
@srikrishnan-b I suppose that includes hidden layers via |
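For context on why the global vector can come out larger than the annotation count: if the model is built so that global outputs from the hidden layers are concatenated onto the annotation head, the final vector width is the sum of the individual widths. A generic numpy illustration — the two hidden-layer widths below are made up to sum to the reported size and are not ProteinBERT's actual dimensions:

```python
import numpy as np

n_seqs = 2
annotation_output = np.random.rand(n_seqs, 8943)  # per-annotation probabilities
hidden_global_1 = np.random.rand(n_seqs, 4000)    # made-up hidden width
hidden_global_2 = np.random.rand(n_seqs, 2656)    # made-up hidden width

# Concatenating hidden global states onto the annotation head widens the vector.
global_representations = np.concatenate(
    [annotation_output, hidden_global_1, hidden_global_2], axis=-1
)
print(global_representations.shape)  # (2, 15599)
```

If you need only the 8943 annotation probabilities, slice the first 8943 columns (assuming the annotation head comes first in the concatenation, which you should verify against the model you loaded).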
Hello everyone,
After running the model I have two arrays, the local_representations and global_representations.
But now I don't know what to do with them to obtain the GO annotations of my sequences.
All the best