Inability to process SMILES with certain SMILES characters #2

Open
sohaibbu2015 opened this issue May 14, 2022 · 3 comments

Comments

@sohaibbu2015

sohaibbu2015 commented May 14, 2022

Hello,
My team and I are interested in using your package to generate new molecules that have the potential for a certain type of toxicity. To do that, we explored the package's ability to use a user-defined scoring function as the center of the training protocol. A new dataset is also required to complete the training process. Although the package can carry out training with user-defined functions, it seems to have trouble handling SMILES strings that contain certain characters. After some investigation, it appears that the package fails on SMILES containing the `/` and `\` characters, which indicate the cis/trans arrangement of substituents around a double bond. We were wondering whether there is an easy fix for this problem and, if so, what should be done to fix it.
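
To see how much of a dataset is affected before training, one option is to scan it for the two directional-bond characters. This is a minimal sketch in plain Python; the helper names `has_bond_stereo` and `find_bond_stereo` are hypothetical and not part of the package, and the check is a simple character test rather than a full SMILES parse.

```python
# Hypothetical helpers (not part of the package): flag SMILES strings that
# use the directional bond markers / and \ for cis/trans geometry.
BOND_STEREO_MARKERS = ("/", "\\")


def has_bond_stereo(smiles: str) -> bool:
    """Return True if the string contains either directional bond marker."""
    return any(marker in smiles for marker in BOND_STEREO_MARKERS)


def find_bond_stereo(smiles_list):
    """Return (index, smiles) pairs for every entry that uses / or a backslash."""
    return [(i, s) for i, s in enumerate(smiles_list) if has_bond_stereo(s)]


if __name__ == "__main__":
    dataset = ["CCO", "F/C=C\\F", "C/C=C/C", "c1ccccc1"]
    for index, smiles in find_bond_stereo(dataset):
        print(index, smiles)  # prints "1 F/C=C\F" then "2 C/C=C/C"
```

Running this over the training SMILES list gives the indices of the entries that would need attention before `data_processing` is called.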

This is the error it keeps showing:
```
data processing 0/44
Traceback (most recent call last):
  File "main.py", line 235, in <module>
    learn(mol_sml, args)
  File "main.py", line 121, in learn
    subgraph_set_init, input_graphs_dict_init = data_processing(smiles_list, args.GNN_model_path, args.motif)
  File "/home/qspt_user/data_efficient_grammar/grammar_generation.py", line 42, in data_processing
    subgraphs.append(SubGraph(subgraph_i_mapped, mapping_to_input_mol=subgraph_i_mapped, subfrags=list(cluster)))
  File "/home/qspt_user/data_efficient_grammar/private/molecule_graph.py", line 91, in __init__
    super(SubGraph, self).__init__(mol, is_subgraph=True, mapping_to_input_mol=mapping_to_input_mol)
  File "/home/qspt_user/data_efficient_grammar/private/molecule_graph.py", line 15, in __init__
    self.hypergraph = mol_to_hg(mol, kekulize=True, add_Hs=False)
  File "/home/qspt_user/data_efficient_grammar/private/hypergraph.py", line 744, in mol_to_hg
    bipartite_g = mol_to_bipartite(mol, kekulize)
  File "/home/qspt_user/data_efficient_grammar/private/hypergraph.py", line 692, in mol_to_bipartite
    mol = standardize_stereo(mol)
  File "/home/qspt_user/data_efficient_grammar/private/hypergraph.py", line 938, in standardize_stereo
    atom_idx_1 = each_bond.GetStereoAtoms()[0]
IndexError: Index out of range
```
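
The last frame indexes `GetStereoAtoms()[0]` without checking that the list is non-empty. In RDKit, `Bond.GetStereoAtoms()` returns the reference atoms of a stereo double bond, and that list can be empty; the `IndexError` shows it was empty for at least one bond here. Below is a minimal sketch of a defensive accessor, hypothetical and not taken from the repository, that returns `None` instead of raising. It assumes `bond` behaves like an RDKit `Bond`.

```python
# Hypothetical defensive accessor, not taken from data_efficient_grammar.
# The traceback fails on `each_bond.GetStereoAtoms()[0]`, which raises
# IndexError whenever the bond's stereo-atom list is empty.
from typing import Optional, Tuple


def stereo_atom_pair(bond) -> Optional[Tuple[int, int]]:
    """Return the two reference atom indices of a stereo bond, or None.

    `bond` is assumed to behave like an RDKit Bond, i.e. to provide
    GetStereoAtoms(). The list length is checked before indexing, so an
    empty or incomplete list yields None instead of an IndexError.
    """
    stereo_atoms = list(bond.GetStereoAtoms())
    if len(stereo_atoms) < 2:
        # Nothing usable to standardize: the caller could skip this bond.
        return None
    return stereo_atoms[0], stereo_atoms[1]
```

In a patched loop, a `None` result would be handled with `continue` before any atom indices are used. Whether dropping the geometry of such bonds is acceptable depends on the rest of `standardize_stereo`, which the traceback does not show.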

@gmh14
Owner

gmh14 commented May 16, 2022

Hey,

My current suggestion is to remove the `/`, `\`, or `*` characters from the SMILES strings, since we currently cannot handle any stereo/geometry information in molecular graphs.
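
Following that suggestion, here is a minimal preprocessing sketch that deletes the directional bond markers before the SMILES list reaches training. It assumes that losing cis/trans geometry is acceptable for the use case; `strip_bond_stereo` and `preprocess` are hypothetical names, and only `/` and `\` are removed.

```python
# Hypothetical preprocessing step (not part of the package), assuming that
# discarding cis/trans geometry is acceptable for the training data.
_DELETE_BOND_MARKERS = str.maketrans("", "", "/\\")


def strip_bond_stereo(smiles: str) -> str:
    """Delete the directional bond markers / and backslash from a SMILES.

    These characters only mark directional single bonds, so in typical
    inputs removing them keeps the connectivity and drops the geometry.
    """
    return smiles.translate(_DELETE_BOND_MARKERS)


def preprocess(smiles_list):
    """Strip bond stereo from every entry, preserving order."""
    return [strip_bond_stereo(s) for s in smiles_list]


if __name__ == "__main__":
    print(preprocess(["CCO", "F/C=C\\F", "C/C=C/C"]))
    # prints ['CCO', 'FC=CF', 'CC=CC']
```

Chirality markers such as `@` and `@@` are left untouched by this sketch. If RDKit is already in use, `Chem.RemoveStereochemistry(mol)` or `Chem.MolToSmiles(mol, isomericSmiles=False)` removes all stereo information from a parsed molecule instead of editing strings; note that `isomericSmiles=False` also drops isotope labels.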

@ABB-ADD

ABB-ADD commented Apr 10, 2023

Hello, we also ran into the same problem. Have you solved it?

@mahuahuahua

> Hello, we also ran into the same problem. Have you solved it?

Hello, did you solve this problem?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants