DiffFit: Visually-Guided Differentiable Fitting of Molecule Structures to Cryo-EM Map
IEEE VIS 2024 Submission arXiv preprint, Video, OSF repo
- Install
- Demo Usage Scenario 1: Fit a single structure
- Demo Usage Scenario 2: Composite multiple structures
- Demo Usage Scenario 3: Identify unknown densities
- Download the repository and unzip it to a path. This guide uses D:\GIT\DiffFitViewer as that path.
- Run the following ChimeraX command:
  devel build D:\GIT\DiffFitViewer; devel install D:\GIT\DiffFitViewer
- Open the system command-line shell and install PyTorch, Biopython, mrcfile, and scikit-learn into ChimeraX's Python.
  - Find ChimeraX's Python; you may find this guide useful. The following commands use
    C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe
  - Install PyTorch with the following command, or according to its official doc:
    C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  - Install Biopython, mrcfile, and scikit-learn with the following command:
    C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe -m pip install biopython mrcfile scikit-learn
DiffFit should now be fully installed. Launch it via Tools > Volume Data > DiffFit.
Right-click in the panel to access DiffFit's help page.
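As a quick check after the pip steps above, the following minimal sketch reports which of the four packages ChimeraX's Python still cannot find. It is an illustration, not part of DiffFit: the helper name missing_packages is hypothetical, and it only tests that each module can be located, not that the package (or CUDA) actually works.

```python
import importlib.util

# pip name -> import name (these differ for Biopython and scikit-learn).
REQUIRED = {
    "torch": "torch",
    "biopython": "Bio",
    "mrcfile": "mrcfile",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Save it to a file and run it with the same python.exe used for the pip commands, so that it inspects ChimeraX's Python rather than a system-wide one.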
- Download PDB-8JGF and EMD-36232.
  - Note the resolution, 2.7 Å, from the webpage.
  - Extract the map.
  - Put the files (8jgf.cif and emd_36232.map) under, for example, D:\GIT\DiffFitViewer\run\input\8JGF
- Drop both files into ChimeraX.
  - Note the pixel value from the log, which is the grid spacing of this volume (1.04 in this case).
  - Move and rotate the molecule, then save it (select it, choose "Save selected atoms only", uncheck "Use untransformed coordinates") as 8JGF_transformed.cif. This step is for demonstration only and is not needed in real use cases.
- Put 8JGF_transformed.cif under D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
- Simulate a map for the molecule.
  - Create two folders, subunits_mrc and subunits_npy, under D:\GIT\DiffFitViewer\run\input\8JGF\
  - Open a new ChimeraX session and run
    runscript "D:\GIT\DiffFitViewer\src\convert2mrc_npy.py" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_mrc" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_npy" 2.7 1.04
- Run DiffFit. Set the parameters as follows and hit Run!
  - Target volume: D:\GIT\DiffFitViewer\run\input\8JGF\emd_36232.map
  - Structures folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
  - Structures sim-map folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_mrc
  - Output folder: D:\GIT\DiffFitViewer\run\output\8JGF
  - Experiment name: fit_single_demo
  - Target surface threshold: 0.20, or use the author-recommended contour level 0.162. DiffFit is very robust to this parameter; any value between 0.02 and 0.4 is fine in this case.
  - Leave the rest as default and hit Run!
- After freezing for a few seconds (less than 15 seconds on one RTX 4090), ChimeraX should become responsive again. Click the View tab to examine the results.
  - Save the molecule if desired
  - You may take a look at the optimization steps
  - If you want to change the cluster tolerance, if you ran Compute on a cluster, or if you accidentally closed ChimeraX after the Compute run, you can View the results with the following parameter settings
    - Target volume: D:\GIT\DiffFitViewer\run\input\8JGF\emd_36232.map
    - Structures folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
    - Data folder: D:\GIT\DiffFitViewer\run\output\8JGF\fit_single_demo
    - Clustering - Shift Tolerance: 0.5, or the value you desire
    - Clustering - Angle Tolerance: 0.5, or the value you desire
    - Hit Load
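The pixel value noted from the ChimeraX log in the steps above can also be cross-checked directly against the map file. The sketch below reads the sampling and cell dimensions from the 1024-byte MRC/CCP4 header and divides them. It assumes a little-endian file, which is the common case but is not verified here, and the function name mrc_voxel_size is hypothetical. The mrcfile package installed earlier exposes the same quantity as the voxel_size attribute of an opened map.

```python
import struct


def mrc_voxel_size(path):
    """Return the (x, y, z) voxel size in Å from an MRC/CCP4 map header.

    Assumption: the header is little-endian; the machine stamp is not checked.
    """
    with open(path, "rb") as f:
        header = f.read(1024)
    if len(header) < 1024:
        raise ValueError("file is too short to contain an MRC header")

    # Header words 8-10 (byte offset 28): sampling MX, MY, MZ along each cell axis.
    mx, my, mz = struct.unpack_from("<3i", header, 28)
    # Header words 11-13 (byte offset 40): cell dimensions in Å.
    cell_x, cell_y, cell_z = struct.unpack_from("<3f", header, 40)

    if min(mx, my, mz) <= 0:
        raise ValueError("header has non-positive sampling values")
    return (cell_x / mx, cell_y / my, cell_z / mz)


# Example call (path taken from this guide):
# mrc_voxel_size(r"D:\GIT\DiffFitViewer\run\input\8JGF\emd_36232.map")
```

If the three values differ from each other, or from the pixel value in the log, prefer the value ChimeraX reports, since that is what this guide passes to the conversion script.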
- Download PDB-8SMK and EMD-40589.
  - Note the resolution, 3.5 Å, from the webpage.
  - Extract the map.
  - Put the files (8smk.cif and emd_40589.map) under, for example, D:\GIT\DiffFitViewer\run\input\8SMK
- Drop both files into ChimeraX.
  - Note the pixel value from the log, which is the grid spacing of this volume (0.835 in this case).
  - Move and rotate the molecule, then save it (select it, choose "Save selected atoms only", uncheck "Use untransformed coordinates") as 8SMK_transformed.cif. This step is for demonstration only and is not needed in real use cases.
- Create a folder subunits under D:\GIT\DiffFitViewer\run\input\8SMK
- Split the chains into individual .cif files and simulate a map for each chain.
  - Open a new ChimeraX session and run
    runscript "D:\GIT\DiffFitViewer\src\split_chains.py" "D:\GIT\DiffFitViewer\run\input\8SMK\8SMK_transformed.cif" "D:\GIT\DiffFitViewer\run\input\8SMK\subunits" 3.5 0.835
  - Put all generated .cif files under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
  - Put all generated .mrc files under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_mrc
  - Delete all generated .npy files, or put them under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_npy
  - Keep only the unique chains (A, B, C) in subunits_cif and subunits_mrc
- Run DiffFit. Set the parameters as follows and hit Run!
  - Target volume: D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589.map
  - Structures folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
  - Structures sim-map folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_mrc
  - Output folder: D:\GIT\DiffFitViewer\run\output\8SMK
  - Experiment name: round1
  - Target surface threshold: 0.8, or use the author-recommended contour level 5.0. DiffFit is very robust to this parameter; any value between 0.1 and 5.0 is fine in this case.
  - # shifts: 30
  - # quaternions: 300
  - Leave the rest as default and hit Run!
- After freezing for a few seconds (less than 30 seconds on one RTX 4090), ChimeraX should become responsive again. Click the View tab to examine the results.
  - Examine the fit, sort by a different metric
  - If you want to change the cluster tolerance, if you ran Compute on a cluster, or if you accidentally closed ChimeraX after the Compute run, you can View the results with the following parameter settings
    - Target volume: D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589.map
    - Structures folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
    - Data folder: D:\GIT\DiffFitViewer\run\output\8SMK\round1
    - Clustering - Shift Tolerance: 6, or the value you desire
    - Clustering - Angle Tolerance: 15, or the value you desire
    - Hit Load
  - Save a molecule if desired
  - Set the Resolution to 3.5 and click Simulate volume
  - Change the surface level threshold for the simulated volume if necessary
  - Click Zero density
  - Repeat the last 4 steps (Save, Simulate, Zero) for the same Mol Id at a different place, or for a different Mol Id, until there is no good fit
  - Save the last working volume via File > Save, with Files of type set to MRC and Map set to the desired one, under a new name, for example, emd_40589_round_1.mrc
- Repeat Steps 5-6 until satisfied with the whole composite.
  - Change the Target volume to D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589_round_1.mrc
  - If needed, take out the already fitted chains from subunits_cif and subunits_mrc
  - Give a new Experiment name: round2
  - You may lower the # shifts, for example, to 10, and the # quaternions to 100
  - Hit Run!
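The file-sorting part of the chain-splitting step above (moving the generated .cif, .mrc, and .npy files into their own folders) can be scripted. This is a minimal sketch under one assumption: that split_chains.py writes all of its outputs flat into the subunits folder, told apart only by extension. The function name sort_subunits is hypothetical and not part of DiffFit. Keeping only the unique chains (A, B, C) is still a manual step afterwards.

```python
from pathlib import Path
import shutil


def sort_subunits(subunits_dir, input_dir):
    """Move *.cif, *.mrc and *.npy files from subunits_dir into
    input_dir/subunits_cif, input_dir/subunits_mrc and input_dir/subunits_npy.

    Returns the number of files moved per extension.
    """
    subunits_dir = Path(subunits_dir)
    input_dir = Path(input_dir)
    moved = {}
    for ext in ("cif", "mrc", "npy"):
        target = input_dir / f"subunits_{ext}"
        target.mkdir(parents=True, exist_ok=True)
        # Collect the matches first so the listing is not affected by the moves.
        files = sorted(subunits_dir.glob(f"*.{ext}"))
        for f in files:
            shutil.move(str(f), str(target / f.name))
        moved[ext] = len(files)
    return moved


# Example call (paths taken from this guide):
# sort_subunits(r"D:\GIT\DiffFitViewer\run\input\8SMK\subunits",
#               r"D:\GIT\DiffFitViewer\run\input\8SMK")
```

Raw strings (the r prefix) keep the backslashes in Windows paths from being read as escape sequences.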
The whole procedure is the same as in Scenario 1: Fit a single structure, except that there are multiple structures under subunits_cif.
There is a demo data set with one volume map and three structures to search against.
If you have put DiffFit under D:\GIT\DiffFitViewer, you can just hit Run! in the Compute tab and then go to the View tab.
Otherwise, you only need to change the paths for the input and output data.
If you want to search against the whole candidate library for this case from DomainFit, you can either follow Steps 1-3 from its doc to generate the PDB files for the domains, or just download the ones generated by us from this Google Drive link. Note that following DomainFit's Steps 1-3 gave us 359 PDB files, rather than the 344 files mentioned.
The computing time for searching the whole candidate library on one RTX 4090 is about 10 minutes.