nanovis/DiffFitViewer

DiffFit: Visually-Guided Differentiable Fitting of Molecule Structures to Cryo-EM Map

IEEE VIS 2024 Submission: arXiv preprint, Video, OSF repo

YouTube tutorial videos (coming soon)

  1. Install
  2. Demo Usage Scenario 1: Fit a single structure
  3. Demo Usage Scenario 2: Composite multiple structures
  4. Demo Usage Scenario 3: Identify unknown densities

Install

  1. Download the repository and unzip it to a folder of your choice. This guide uses D:\GIT\DiffFitViewer.
  2. In ChimeraX, run the command devel build D:\GIT\DiffFitViewer; devel install D:\GIT\DiffFitViewer
  3. Open a system command-line shell and install PyTorch, Biopython, mrcfile, and scikit-learn into ChimeraX's Python:
    1. Find ChimeraX's Python; you may find this guide useful. The following commands use C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe.
    2. Install PyTorch with the following command (it selects the CUDA 12.1 build), or follow PyTorch's official documentation:
      C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
      
    3. Install Biopython, mrcfile, and scikit-learn with the following command:
      C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe -m pip install biopython mrcfile scikit-learn
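
To confirm that the packages landed in ChimeraX's Python rather than in another interpreter, a small check like the following may help. It is a sketch, not part of DiffFit: the function name and the script name check_deps.py are assumptions. It only tests whether each module can be located, using the import names (Bio for biopython, sklearn for scikit-learn).

```python
import importlib.util

# pip package name -> import name (the two differ for biopython and scikit-learn).
REQUIRED = {
    "torch": "torch",
    "biopython": "Bio",
    "mrcfile": "mrcfile",
    "scikit-learn": "sklearn",
}


def check_dependencies(required=REQUIRED):
    """Return {pip_name: bool}, True when the module can be located."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in required.items()
    }


if __name__ == "__main__":
    for pip_name, found in check_dependencies().items():
        print(f"{pip_name}: {'ok' if found else 'MISSING'}")
```

Save it as, for example, check_deps.py and run it with the same interpreter used for pip: C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe check_deps.py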
      

DiffFit should now be fully installed. Launch it via Tools > Volume Data > DiffFit.

Right-click in the panel to access DiffFit's help page.

Demo usage scenarios

Scenario 1: Fit a single structure

  1. Download PDB-8JGF and EMD-36232
    1. Note the resolution (2.7 Å) from the web page
    2. Extract the map
    3. Put the files (8jgf.cif and emd_36232.map) under, for example, D:\GIT\DiffFitViewer\run\input\8JGF
  2. Drop both files into ChimeraX
    1. Note the pixel value reported in the log (1.04 in this case); it is the grid spacing of this volume
    2. Move and rotate the molecule, then save it as 8JGF_transformed.cif (select it, choose "Save selected atoms only", and uncheck "Use untransformed coordinates"). This step is for demonstration only and is not needed in real use cases
  3. Put 8JGF_transformed.cif under D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
  4. Simulate a map for the molecule
    1. Create two folders, subunits_mrc and subunits_npy, under D:\GIT\DiffFitViewer\run\input\8JGF\
    2. Open a new ChimeraX session and run the following command, whose last two arguments are the resolution and the grid spacing noted above: runscript "D:\GIT\DiffFitViewer\src\convert2mrc_npy.py" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_mrc" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_npy" 2.7 1.04
  5. Run DiffFit with the following parameters:
    1. Target volume: D:\GIT\DiffFitViewer\run\input\8JGF\emd_36232.map
    2. Structures folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
    3. Structures sim-map folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_mrc
    4. Output folder: D:\GIT\DiffFitViewer\run\output\8JGF
    5. Experiment name: fit_single_demo
    6. Target surface threshold: 0.20, or the author-recommended contour level of 0.162. DiffFit is robust to this parameter; any value between 0.02 and 0.4 works in this case.
    7. Leave the rest at their defaults and hit Run!
  6. ChimeraX freezes while computing (less than 15 seconds on one RTX 4090) and then becomes responsive again. Click the View tab to examine the results.
    1. Save the molecule if desired
    2. You may take a look at the optimization steps
  7. To change the clustering tolerance, or to view results after running Compute on a compute cluster or after accidentally closing ChimeraX, load them in the View tab with the following settings:
    1. Target volume: D:\GIT\DiffFitViewer\run\input\8JGF\emd_36232.map
    2. Structures folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
    3. Data folder: D:\GIT\DiffFitViewer\run\output\8JGF\fit_single_demo
    4. Clustering - Shift Tolerance: 0.5, or any value you prefer
    5. Clustering - Angle Tolerance: 0.5, or any value you prefer
    6. Hit Load
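
The folder preparation and the runscript command from steps 3 and 4 can be generated instead of typed by hand, which helps avoid path typos. The sketch below is an illustration only: the function name prepare_single_fit is an assumption, and it does nothing beyond creating the three sub-folders and returning the command text to paste into ChimeraX.

```python
from pathlib import Path

SUBFOLDERS = ("subunits_cif", "subunits_mrc", "subunits_npy")


def prepare_single_fit(case_dir, resolution, grid_spacing,
                       script=r"D:\GIT\DiffFitViewer\src\convert2mrc_npy.py"):
    """Create the sub-folders and return the ChimeraX runscript command."""
    case_dir = Path(case_dir)
    folders = [case_dir / name for name in SUBFOLDERS]
    for folder in folders:
        # exist_ok keeps the call safe when a folder was already created by hand.
        folder.mkdir(parents=True, exist_ok=True)
    # Quote every path so that folders containing spaces survive command parsing.
    quoted = " ".join(f'"{folder}"' for folder in folders)
    return f'runscript "{script}" {quoted} {resolution} {grid_spacing}'
```

For example, prepare_single_fit(r"D:\GIT\DiffFitViewer\run\input\8JGF", 2.7, 1.04) is expected, on Windows, to return the same command as in step 4.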

Scenario 2: Composite multiple structures

  1. Download PDB-8SMK and EMD-40589
    1. Note the resolution (3.5 Å) from the web page
    2. Extract the map
    3. Put the files (8smk.cif and emd_40589.map) under, for example, D:\GIT\DiffFitViewer\run\input\8SMK
  2. Drop both files into ChimeraX
    1. Note the pixel value reported in the log (0.835 in this case); it is the grid spacing of this volume
    2. Move and rotate the molecule, then save it as 8SMK_transformed.cif (select it, choose "Save selected atoms only", and uncheck "Use untransformed coordinates"). This step is for demonstration only and is not needed in real use cases
  3. Create a folder subunits under D:\GIT\DiffFitViewer\run\input\8SMK
  4. Split the chains into individual .cif files and simulate a map for each chain
    1. Open a new ChimeraX session and run the following command, whose last two arguments are the resolution and the grid spacing noted above: runscript "D:\GIT\DiffFitViewer\src\split_chains.py" "D:\GIT\DiffFitViewer\run\input\8SMK\8SMK_transformed.cif" "D:\GIT\DiffFitViewer\run\input\8SMK\subunits" 3.5 0.835
    2. Put all generated .cif files under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
    3. Put all generated .mrc files under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_mrc
    4. Delete all generated .npy files, or put them under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_npy
    5. Keep only the unique chains (A, B, C) in subunits_cif and subunits_mrc
  5. Run DiffFit with the following parameters:
    1. Target volume: D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589.map
    2. Structures folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
    3. Structures sim-map folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_mrc
    4. Output folder: D:\GIT\DiffFitViewer\run\output\8SMK
    5. Experiment name: round1
    6. Target surface threshold: 0.8, or the author-recommended contour level of 5.0. DiffFit is robust to this parameter; any value between 0.1 and 5.0 works in this case.
    7. # shifts: 30
    8. # quaternions: 300
    9. Leave the rest as default and hit Run!
  6. ChimeraX freezes while computing (less than 30 seconds on one RTX 4090) and then becomes responsive again. Click the View tab to examine the results.
    1. Examine the fit; sort by a different metric if needed
    2. To change the clustering tolerance, or to view results after running Compute on a compute cluster or after accidentally closing ChimeraX, load them in the View tab with the following settings:
      1. Target volume: D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589.map
      2. Structures folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
      3. Data folder: D:\GIT\DiffFitViewer\run\output\8SMK\round1
      4. Clustering - Shift Tolerance: 6, or any value you prefer
      5. Clustering - Angle Tolerance: 15, or any value you prefer
      6. Hit Load
    3. Save a molecule if desired
    4. Set the Resolution as 3.5, and click Simulate volume
    5. Change the surface level threshold for the simulated volume if necessary
    6. Click Zero density
    7. Repeat the previous four steps (save, simulate, adjust the threshold, zero the density) for the same Mol Id at a different location, or for a different Mol Id, until no good fit remains
    8. Save the last working volume under a new name, for example emd_40589_round_1.mrc, via File > Save (set Files of type to MRC and choose the desired volume under Map)
  7. Repeat Steps 5-6 until you are satisfied with the whole composite
    1. Change the Target volume to: D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589_round_1.mrc
    2. If needed, take out the already fitted chains from subunits_cif and subunits_mrc
    3. Give a new Experiment name: round2
    4. You may lower the # shifts, for example, to 10, and the # quaternions to 100
    5. Hit Run!
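
Sorting the files written by split_chains.py into subunits_cif, subunits_mrc, and subunits_npy (step 4) can be scripted. The following is a minimal sketch under stated assumptions: the function name and the keep_stems parameter are hypothetical, and because the file names that split_chains.py produces are not documented here, you have to supply the stems (file names without extension) of the unique chains yourself.

```python
import shutil
from pathlib import Path

# Destination sub-folder for each file type found in the split output folder.
DESTINATIONS = {".cif": "subunits_cif", ".mrc": "subunits_mrc", ".npy": "subunits_npy"}


def sort_split_outputs(subunits_dir, case_dir, keep_stems=None):
    """Move files from subunits_dir into case_dir/subunits_* by extension.

    keep_stems: optional set of file stems. When given, .cif and .mrc files
    whose stem is not in the set stay where they are (.npy files always move).
    """
    subunits_dir, case_dir = Path(subunits_dir), Path(case_dir)
    moved = []
    # sorted() lists the folder once, before any file is moved.
    for path in sorted(subunits_dir.iterdir()):
        suffix = path.suffix.lower()
        target = DESTINATIONS.get(suffix)
        if target is None or not path.is_file():
            continue  # ignore unrelated files and sub-folders
        if keep_stems is not None and suffix != ".npy" and path.stem not in keep_stems:
            continue  # not one of the unique chains
        dest_dir = case_dir / target
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest_dir / path.name))
        moved.append(dest_dir / path.name)
    return moved
```

Calling it without keep_stems moves everything; the non-unique chains can then be removed from subunits_cif and subunits_mrc by hand as described in step 4.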

Scenario 3: Identify unknown densities

The procedure is the same as in Scenario 1: Fit a single structure, except that subunits_cif contains multiple structures.

There is a demo data set with one volume map and three structures to search against. If you have put DiffFit under D:\GIT\DiffFitViewer, you can just hit Run! in the Compute tab and then go to the View tab. Otherwise, change the input and output paths to match your location.

If you want to search against the whole candidate library for this case from DomainFit, you can either follow Steps 1-3 from its doc to generate the PDB files for the domains, or just download the ones generated by us from this Google Drive link. Note that following DomainFit's Steps 1-3 gave us 359 PDB files rather than the 344 mentioned there.

The computing time for searching the whole candidate library on one RTX 4090 is about 10 minutes.
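
Before starting a long search, it can be worth confirming that the candidate folder contains the expected number of structure files (359 PDB files in our case). The helper below is a generic sketch and not part of DiffFit; its name and default extensions are assumptions.

```python
from collections import Counter
from pathlib import Path


def count_structures(folder, extensions=(".pdb", ".cif")):
    """Return {extension: count} for the structure files directly inside folder."""
    counts = Counter(
        path.suffix.lower()
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )
    return dict(counts)
```

Extensions are compared in lower case, so files ending in .PDB are counted together with .pdb.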
