Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to get the chemical information for each rule environment with a given rule? #26

Closed
chemcomp-Jen opened this issue Feb 10, 2021 · 5 comments

Comments

@chemcomp-Jen
Copy link

Hi all,

I built a mmpDB using the example "test_data.smi" shown on the github page. Then, I run a query to grep all the rules from the schema as follows: (please correct me if I did something wrong with the query)

c = cursor.execute(
            "SELECT rule_environment.rule_id, from_smiles.smiles, from_smiles.num_heavies, to_smiles.smiles, to_smiles.num_heavies, "
            "          rule_environment.radius, "
            "          rule_environment_statistics.id, property_name_id, count, avg, std, kurtosis, skewness, min, q1, median, q3, max, paired_t, p_value "
            "  FROM rule, rule_environment, rule_environment_statistics, "
            "          rule_smiles as from_smiles, rule_smiles as to_smiles "
            " WHERE rule_environment.id = rule_environment_id "
            "   AND rule_environment_statistics.rule_environment_id = rule_environment_id "
            "   AND rule_environment.rule_id = rule.id "
            "   AND rule.from_smiles_id = from_smiles.id "
            "   AND rule.to_smiles_id = to_smiles.id ")

After that, I took a look at all rules. I found it is difficult to understand the rule environments for the same rule.

For example below, for the rule id 0, there are 11 environments with different rule-environment-id. But how could I get the local chemical information for each environment? That will help me understand the difference between those environments.

rule_id from_smiles from_smiles_nHeavies to_smiles to_smiles_nHeavies environ_radius rule_environ_id prop_id count avg
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 0 1 0 2 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 1 3 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 2 5 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 3 7 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 4 9 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 5 11 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 1 214 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 2 216 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 3 218 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 4 220 0 1 1
0 [*:1]c1ccccc1N 7 [*:1]c1ccccc1O 7 5 222 0 1 1

Thanks,
Jen

@KramerChristian
Copy link
Contributor

Dear Jen,

If I understand you correctly, you would like to see something like a SMILES or SMARTS for a given environment. This is currently not stored in the DB, so there is no direct way to access it.

A workaround could be that for a given rule_environment, you extract one sample pair and display this, together with the radius (as integer). If the from_smiles and to_smiles are known, the environment for a given radius can then be inferred from looking at the pair.

Please note that the environment is grouped via the Morgan fingerprint, rooted at the attachment atom. The Morgan Fingerprint for each atom amongst others features the number of neighboring atoms. Therefore, a radius 1 Fingerprint in reality is more a kind of radius 1.5 fingerprint. For this reason, if you wanted to depict the Morgan Fingerprint environment as a SMILES, this would get be rather complicated.

Does this answer your question?

@chemcomp-Jen
Copy link
Author

Hi Christian,

Thanks for the reply. Yes, I would like to see the SMILES or SMARTS for a given environment. Good to know there is a workaround. I think it should work.

I will give a try to extract one example pair/or all pairs (if possible) from a given rule_environment. Would you mind give an example of how to do that? Is there any existing functions in the package I can use?

Thanks,
Jen

@KramerChristian
Copy link
Contributor

Hi Jen,

there is no existing function that does exactly what you ask. But if you are coding already anyway, and you know the rule_environment_id for which you want to see a pair, you could try something like this:

sql = """
SELECT pair.id,
lhs_compound.clean_smiles, rhs_compound.clean_smiles,
lhs_compound.public_id, rhs_compound.public_id
FROM pair,
compound as lhs_compound, compound as rhs_compound
WHERE pair.rule_environment_id = ?
AND pair.compound1_id = lhs_compound.id
AND pair.compound2_id = rhs_compound.id
LIMIT 1
"""
c = cursor.execute(sql, (environment_id, ))

Disclaimer: The concept of the statement should work, but I have not tested it and my python has gotten a little rusty lately, you may need to fix some of the commas etc. Please let me know if this works.

@chemcomp-Jen
Copy link
Author

Hi Christians,

Thank you so much for the details. I will code the query and try to grab the pair information. I will keep you posted how it works.

Best, Jen

@chemcomp-Jen
Copy link
Author

Hi Christian, It works as expected!

Thanks, Jen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants