Replace SMILES input to Coulomb matrix #56
The .h5 file is expected to have two groups: "data_train" and "data_test". You might want to do something like this, from the repo's
Then, using a chunking function we defined, we do:
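The chunking function mentioned here is not reproduced in this thread. As a rough stand-in, the hypothetical `write_in_chunks` helper below (not the repo's function) uses h5py to create a dataset up front and then fills it slice by slice. The default `chunk_size=1000` is an arbitrary assumption.

```python
import numpy as np
import h5py


def write_in_chunks(h5file, name, array, chunk_size=1000):
    """Hypothetical helper: create dataset `name` and fill it slice by slice.

    Each slice covers at most `chunk_size` samples along the first axis.
    Here the slices come from an array that is already in memory. In a real
    preprocessing loop you would compute each slice inside the loop instead.
    """
    dset = h5file.create_dataset(
        name,
        shape=array.shape,
        dtype=array.dtype,
        # HDF5 storage chunks: up to `chunk_size` samples, full extent otherwise.
        chunks=(min(chunk_size, array.shape[0]),) + array.shape[1:],
    )
    for start in range(0, array.shape[0], chunk_size):
        stop = min(start + chunk_size, array.shape[0])
        dset[start:stop] = array[start:stop]
    return dset
```

For example, `write_in_chunks(h5f, "data_train", x_train)` on an open, writable `h5py.File` writes `x_train` in blocks of 1000 samples.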
Thanks for your answer. And this part is why I failed to rename my dataset.
What I mean is that you have to split your dataset of Coulomb matrices into a training set and a test set. A convenient helper for that is the train_test_split function from sklearn.model_selection. Then you would write both splits into a single .h5 file.
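Here is a minimal sketch of that split-and-save step. It assumes the Coulomb matrices are in a NumPy array of shape (200, 29, 29) and the HOMO-LUMO gaps in an array of shape (200,). The names "data_train" and "data_test" come from the earlier reply. The helper `save_train_test_h5` and the "gap_train"/"gap_test" names are hypothetical. The repo's loader may also read other keys (for example a charset entry used for SMILES), so check how train.py opens the file.

```python
import numpy as np
import h5py
from sklearn.model_selection import train_test_split


def save_train_test_h5(path, matrices, gaps, test_size=0.2, seed=0):
    """Split Coulomb matrices and their HOMO-LUMO gaps, then save them to `path`.

    "data_train"/"data_test" are the names mentioned in the thread.
    "gap_train"/"gap_test" are hypothetical extra datasets for the targets.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        matrices, gaps, test_size=test_size, random_state=seed
    )
    with h5py.File(path, "w") as h5f:
        h5f.create_dataset("data_train", data=x_train)
        h5f.create_dataset("data_test", data=x_test)
        h5f.create_dataset("gap_train", data=y_train)
        h5f.create_dataset("gap_test", data=y_test)
    return x_train.shape, x_test.shape
```

With 200 samples and `test_size=0.2`, this gives 160 training matrices and 40 test matrices.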
That will create an .h5 file with your data split into train and test sets, as expected by train.py. However, be aware that train.py (and all the other scripts in the repo) use a particular network topology that probably won't work with the shape of your data. The model is defined at https://github.com/maxhodak/keras-molecules/blob/master/molecules/model.py. As you can see, the dimensions of the input tensors as defined are
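To see why the shapes clash, the sketch below compares a batch laid out like one-hot SMILES with a batch of Coulomb matrices. The values 120 (padded SMILES length) and 35 (charset size) are assumptions for illustration. They were not read from model.py. Both batches are rank 3 (samples, rows, columns), so the mismatch lies in two places: the per-sample shape, and the kind of values (one-hot rows versus real-valued entries). Character-level SMILES autoencoders typically end the decoder in a per-position softmax over the charset. If this model does the same, it would likely also need a different output layer and loss to reconstruct real-valued matrices.

```python
import numpy as np


def per_sample_shape(batch):
    """Shape of a single sample, i.e. everything after the batch axis."""
    return tuple(batch.shape[1:])


def is_one_hot(batch):
    """True if every row along the last axis has exactly one 1 and zeros elsewhere."""
    binary = np.all((batch == 0) | (batch == 1))
    return bool(binary and np.all(batch.sum(axis=-1) == 1))


# Hypothetical SMILES batch: 200 strings padded to an assumed length of 120,
# one-hot encoded over an assumed 35-character charset.
smiles_batch = np.zeros((200, 120, 35), dtype=np.float32)
smiles_batch[:, :, 0] = 1.0  # every position set to the first charset symbol

# Coulomb batch: 200 molecules, each a 29 x 29 real-valued matrix.
coulomb_batch = np.random.default_rng(0).random((200, 29, 29)).astype(np.float32)

print(per_sample_shape(smiles_batch), is_one_hot(smiles_batch))    # (120, 35) True
print(per_sample_shape(coulomb_batch), is_one_hot(coulomb_batch))  # (29, 29) False
```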
Oh, I think I get it.
Just a question: how was latent_dim chosen to be 292?
That was the latent dimension reported in the Gomez-Bombarelli
paper referenced in the README.
On Tue, Jan 24, 2017 at 4:37 AM, jeffrey9909 wrote:
Just a question, how is the latent_dim determined to be 292?
I am trying to modify the code myself at the moment, but I have no idea
about this...
Thanks.
I am working on changing the input from SMILES strings to Coulomb matrices. I have produced 200 Coulomb matrices (29×29 each), together with their HOMO-LUMO gaps, and saved them into an .h5 file.
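For reference, the sketch below computes a Coulomb matrix for one molecule. It uses the standard definition: the diagonal is 0.5 * Z_i^2.4 and each off-diagonal entry is Z_i * Z_j / |R_i - R_j|. It zero-pads the result to 29×29 to match the size mentioned here. This is not the poster's code, and `coulomb_matrix` is a hypothetical name. Coordinates can use any length unit, as long as it is applied consistently across molecules.

```python
import numpy as np


def coulomb_matrix(charges, coords, size=29):
    """Coulomb matrix of one molecule, zero-padded to `size` x `size`.

    charges: nuclear charges Z_i, shape (n_atoms,)
    coords:  Cartesian positions R_i, shape (n_atoms, 3)
    Diagonal:     M_ii = 0.5 * Z_i ** 2.4
    Off-diagonal: M_ij = Z_i * Z_j / |R_i - R_j|
    """
    z = np.asarray(charges, dtype=np.float64)
    r = np.asarray(coords, dtype=np.float64)
    n = z.shape[0]
    if n > size:
        raise ValueError(f"molecule has {n} atoms, more than size={size}")

    # Pairwise distances |R_i - R_j|, shape (n, n).
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    # Avoid dividing by zero on the diagonal; it is overwritten below.
    np.fill_diagonal(dist, 1.0)

    m = np.outer(z, z) / dist
    np.fill_diagonal(m, 0.5 * z ** 2.4)

    # Zero-pad so every molecule yields a matrix of the same shape.
    padded = np.zeros((size, size))
    padded[:n, :n] = m
    return padded
```

Stacking one such matrix per molecule with `np.stack` gives an array of shape (n_molecules, 29, 29).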
When I run train.py directly on the generated process.h5, I get an error.
I think the problem is that I save the file differently from the original preprocess.py. However, I don't understand how the original works, so I don't know how to modify my code.
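A quick way to check whether naming is the problem is to list what the file actually contains. You can then compare that list with the names train.py reads, which the reply above gives as "data_train" and "data_test". `missing_keys` is a hypothetical helper built on h5py.

```python
import h5py


def missing_keys(path, expected=("data_train", "data_test")):
    """Return the expected top-level names that are absent from the .h5 file."""
    with h5py.File(path, "r") as h5f:
        present = set(h5f.keys())
        print("top-level entries:", sorted(present))
    return [name for name in expected if name not in present]
```

Calling `missing_keys("process.h5")` prints what the file holds and returns any required names it lacks.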
The preprocess.py I am using is here
Apart from the naming problem I mentioned, I would also like to know: will the network work as expected if I feed the Coulomb matrices in directly instead of the SMILES strings? Are there any parts of the code I would need to modify? Thank you.
I know this is not an ideal way to ask, but I really need some help. Any help is appreciated. Thank you.