Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with non-reproducibility in the energy calculation #25

Open
alexandrefassio opened this issue Oct 5, 2023 · 2 comments
Open

Issue with non-reproducibility in the energy calculation #25

alexandrefassio opened this issue Oct 5, 2023 · 2 comments

Comments

@alexandrefassio
Copy link

Hi,

As a proof-of-concept, I've been trying to sample the H-CDR3 structure of a given antibody to see if your method would be able to reproduce the crystal structure conformation. For the designed structures I'm especially interested in identifying those conformations where the RMSD is <= 2 Å.

I'm not sure if my protocol is correct, but I've been running it as follows:

  • Step 1: run design_pdb.py with the config file 'strpred.yml'. I've modified it to sample only the H-CDR3.
  • Step 2: run diffab/tools/relax/run.py with the pipeline set to 'pyrosetta' to relax the sampled structures.
  • Step 3: run diffab/tools/eval/run.py to obtain RMSDs and energies.

With this protocol, I was able to identify some conformations where the RMSD was smaller than 2 Å.

However, I noticed that across different runs, the energy for the reference structure was changing, which would make my analysis not comparable and non-reproducible. For instance, the absolute difference between the dG_ref in two runs was 100, which is a lot.

So, I started debugging the code and found out that 'diffab/tools/relax/run.py' relaxes the reference and 'diffab/tools/eval/run.py' uses it as the reference to calculate the energy. That explains why across different runs I got different dG_refs. I then set PyRosetta to use the same seed in 'diffab/tools/relax/run.py' and I was able to get the same values across different calls of 'diffab/tools/eval/run.py'.

Despite that, I'm still getting minor differences for the reference within the same run. See an example below:

image

The difference is small, but it is strange to have different energies for the same complex. I tried to set PyRosetta's seed in 'diffab/tools/eval/run.py', but it didn't work.

Would you know what is going on here?

thank you for this very great tool and for your help.
bests

@userscy
Copy link

userscy commented Nov 15, 2023

Hi,
I have the same problem as you, and i wonder how did you set PyRosetta to use the same seed in 'diffab/tools/relax/run.py'?
I look forward to your prompt reply.

@alexandrefassio
Copy link
Author

alexandrefassio commented Nov 27, 2023

Hi @userscy

After debugging the code for a while I figured out two things about this issue.

  1. You need to modify line # 17 from diffab/utils/transforms.merge.py as shown below:

chains = {c: i for i, c in enumerate(sorted(chains))}

The original code doesn't have the sorted(), which causes the sampled structures (outputted PDB files) to have random chain orders. This does make a difference in reproducibility.

Finally, to set the seed in Pyrosetta, you need to add the flag -constant_seed into the init() method in diffab/tools/relax/pyrosetta_relaxer.py. It will look like this:

pyrosetta.init(' '.join([
    '-mute', 'all',
    '-use_input_sc',
    '-ignore_unrecognized_res',
    '-ignore_zero_occupancy', 'false',
    '-load_PDB_components', 'false',
    '-relax:default_repeats', '2',
    '-no_fconfig',
    '-constant_seed'
]))
  1. However, despite achieving reproducible runs with this strategy, I found out that comparison across different CDRs is not possible with the Pyrosetta protocol. That's because the protocol they've implemented in DiffAb minimizes only CDR residues and their immediate neighbors. That means the final minimized REF structure will be different across CDRs. More specifically, if you design sequences for CDR1 and CDR3, for instance, then you cannot compare the dG results between them.

It seems to me that the OpenMM protocol minimizes the whole structure, which makes more sense. So I've been using it instead.

Let me know if this works for you.

Have you achieved any interesting results on this matter as well?

bests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants