Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve / change products_single_atoms filter #3

Open
avaucher opened this issue Oct 3, 2022 · 0 comments
Open

Improve / change products_single_atoms filter #3

avaucher opened this issue Oct 3, 2022 · 0 comments

Comments

@avaucher
Copy link
Member

avaucher commented Oct 3, 2022

Imported from internal issue number 24 (Dec 2020).


Looking into the cases why 2000 reactions from [a project] were marked by this filter,

        if self.products_single_atoms(reaction):
            valid = False
            reasons.append("products_single_atoms")

it appears that actually most of them (maybe 90%) are actually rather "No product". This is mainly reactions that have the same starting material and product.

I would therefore:

  • Add a "no product" filter
  • Reformulate the products_single_atoms filter so that it does not return True in this case.

Thinking a bit further, I am not sure if we really need this "single atom" filter. The remaining 10% cases are reactions in which we had, in the beginning, sth like X>>X.Cl or X>>X.[NH4+]. We need to consider:

  • There are also cases in which the ion has more than one heavy atom (triflate, ...)
  • Mainly it's also about renaming the filter. I'd say that "single atoms" are the symptom, not the actual reason why we remove the reaction.

Similarly, products_subset_of_reactants currently returns True if there are no products - if we add the "no product" filter, this one is not needed anymore.

What may be to discuss is the following:
Currently, it looks to me like the molecules present on both sides are removed before entering the function with the filters - which leads to the problem(s) above. If I see it correctly, this is not necessary: products_subset_of_reactants would mark the reaction as invalid anyway and so it would be filtered afterwards anyway.
[@others] anything I am missing there?


Decision:

update implementations of product_single_atom and products_subset_of_reactants so that they are not tagged when the products are empty.

Add a column in csv, sth like rxn_before_smiles_processing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant