This repository contains the integrative models of the NuRD subcomplexes, based on data from negative stain EM, chemical crosslinking, X-ray crystallography, DIA-MS, SEC-MALLS and COSMIC (Cancer Mutations Database). It includes input data, modeling scripts, and results, including bead models and localization probability density maps. The modeling was performed using IMP (Integrative Modeling Platform).
These integrative structures were deposited in the PDB-Dev database with accession codes PDBDEV_00000152 (MHM), PDBDEV_00000154 (MHR), and PDBDEV_00000155 (NuDe). The negative-stain EM map used for modeling MHR was deposited in the EMDB database with accession code EMD-27557.
- inputs : contains the subdirectories for the input data used for modeling all the subcomplexes.
- scripts : contains all the scripts used for modeling and analysis of the models.
- results : contains the models and the localization probability densities of the top cluster of the subcomplexes.
- test : contains scripts for testing the sampling.
These are the independent simulations:
- Modeling of MHR subcomplex : mhr
- Modeling of MHM subcomplex : mhm
- Modeling of NuDe subcomplex : nude
- Modeling of MHR without using the EM data : mhr_xl_ctrl
To run the sampling, run the modeling scripts as follows:
for runid in `seq 1 NRUNS` ; do mpirun -np NCORES $IMP python scripts/sample/SUBCOMPLEXNAME_modeling.py prod $runid ; done
where
$IMP is the setup script corresponding to the IMP installation directory (omit for binary installation),
SUBCOMPLEXNAME is one of mhr, mhm, nude, or mhr_xl_ctrl,
NRUNS is the number of runs, and
NCORES is the number of cores on which replica exchange is to be carried out.
- For MHR: SUBCOMPLEXNAME = mhr, NCORES = 8 and NRUNS = 50
- For MHM: SUBCOMPLEXNAME = mhm, NCORES = 8 and NRUNS = 30
- For NuDe: SUBCOMPLEXNAME = nude, NCORES = 8 and NRUNS = 50
- For MHR without using the EM data: SUBCOMPLEXNAME = mhr_xl_ctrl, NCORES = 8 and NRUNS = 30
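The sampling loop and the per-simulation parameters listed above can also be expanded programmatically. The following is a minimal sketch and not part of the repository: it only builds the command strings (it does not launch anything), so they can be inspected or handed to a job scheduler. The function name build_sampling_commands and its imp_setup argument are hypothetical, and the script path is assumed to follow the SUBCOMPLEXNAME_modeling.py pattern for every simulation.

```python
# Run parameters taken from the list above: name -> (NCORES, NRUNS).
SIMULATIONS = {
    "mhr": (8, 50),
    "mhm": (8, 30),
    "nude": (8, 50),
    "mhr_xl_ctrl": (8, 30),
}


def build_sampling_commands(name, imp_setup=""):
    """Return one mpirun command string per run for the given subcomplex.

    imp_setup stands in for the $IMP setup script; leave it empty for a
    binary installation. This helper is hypothetical, for illustration only.
    """
    ncores, nruns = SIMULATIONS[name]
    commands = []
    for runid in range(1, nruns + 1):  # same run ids as `seq 1 NRUNS`
        parts = ["mpirun", "-np", str(ncores)]
        if imp_setup:
            parts.append(imp_setup)
        parts += ["python", f"scripts/sample/{name}_modeling.py", "prod", str(runid)]
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    # Print the first two commands for the MHM simulation.
    for cmd in build_sampling_commands("mhm")[:2]:
        print(cmd)
```

Building the commands as strings first keeps the sketch free of side effects; each string could then be executed from a shell or written into a submission script.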
Good scoring models were selected using pmi_analysis (please refer to the pmi_analysis tutorial for a more detailed explanation) along with our variable_filter_v1.py script. These scripts are run as described below:
- First, run run_analysis_trajectories.py as follows:
  $IMP run_analysis_trajectories.py modeling run_
  where
  $IMP is the setup script corresponding to the IMP installation directory (omit for binary installation),
  modeling is the directory containing all the runs, and
  run_ is the prefix for the names of individual run directories.
  Alternatively, one can run the submit_run_analysis_trajectories.sh script from the scripts/analysis/pmi_analysis directory.
- Then run variable_filter_v1.py on the major cluster obtained as follows:
  $IMP variable_filter_v1.py -c N -g MODEL_ANALYSIS_DIR
  where
  $IMP is the setup script corresponding to the IMP installation directory (omit for binary installation),
  N is the cluster number of the major cluster, and
  MODEL_ANALYSIS_DIR is the location of the directory containing the selected_models*.csv.
  This can also be run using the submit_variable_filter_v1.sh script from the scripts/analysis/pmi_analysis directory.
  Please also refer to the comments in variable_filter_v1.py for more details.
- The selected good scoring models were then extracted using run_extract_good_scoring_models.py as follows:
  $IMP python run_extract_good_scoring_models.py modeling run_ CLUSTER_NUMBER
  where
  $IMP is the setup script corresponding to the IMP installation directory (omit for binary installation),
  modeling is the path to the directory containing all the individual runs,
  run_ is the prefix for the names of individual run directories, and
  CLUSTER_NUMBER is the number of the major cluster to be extracted.
  This can also be run using the submit_run_extract_models.sh script from the scripts/analysis/pmi_analysis directory.
A separate directory named sampcon was created and a density.txt file was added to it. This file contains the details of the domains to be split for plotting the localization probability densities. Finally, sampling exhaustiveness tests were performed using imp-sampcon as shown in scripts/analysis/pmi_analysis/*_sampcon.sh, where * is the name of the complex.
- Crosslink violations were analyzed as follows:
  for xltype in adh bs3dss; do python get_xlink_viol_csv.py -c CLUSTER_NUMBER -m MODEL_ANALYSIS_DIR -r modeling -k $xltype -t 35.0 & done
  and
  python get_xlink_viol_csv.py -c CLUSTER_NUMBER -m MODEL_ANALYSIS_DIR -r modeling -k dmtmm -t 25.0
  One can also use the get_xl_viol_validation_set.py script from the scripts/analysis/xlviol directory, after changing the inputs section in the script, as follows:
  $IMP python get_xl_viol_validation_set.py -ia ../cluster.0.sample_A.txt -ib ../cluster.0.sample_B.txt -ra ../../MODEL_ANALYSIS_DIR/A_gsm_clust0.rmf3 -rb ../../MODEL_ANALYSIS_DIR/B_gsm_clust0.rmf3 -ta -ra ../../MODEL_ANALYSIS_DIR/A_gsm_clust0.txt -c ../cluster.0/cluster_center_model.rmf3 -x XL_FILE -t THRESHOLD
  where XL_FILE is a file containing the crosslinks to be analysed.
- The above scripts generate files listing the minimum distance for each crosslink. These files were then passed to xl_distance_hist_plotter.py as follows:
  python xl_distance_hist_plotter.py FILE_NAME XL_NAME THRESHOLD
  where
  FILE_NAME is the name of the file,
  XL_NAME is the name of the linker used, and
  THRESHOLD is the distance threshold for that linker.
  This script generates a histogram of the minimum distances spanned by the crosslinks.
- Then, the files obtained from the scripts in the first step (crosslink violations) were passed to binner_cx-circos.py as follows:
  python binner_cx-circos.py FILE_NAME
  where FILE_NAME is the name of the file.
  This script generates a binned version of the input file, which can then be used to make the crosslink plots using CIRCOS.
- Contact maps were plotted for the NuDe models as follows:
  scripts/analysis/cosmic_and_distance-maps/submit_contact_maps_all_pairs_surface.py
  This script calls the scripts/analysis/cosmic_and_distance-maps/contact_maps_all_pairs_surface.py script. Please run contact_maps_all_pairs_surface.py with --help for more details.
- Finally, COSMIC cancer mutations were annotated on the models as follows:
  python color_mutations/color_mutation.py -i cluster.0/cluster_center_model.rmf3 -r 10 -mf mutations.txt
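The crosslink steps in the list above come down to comparing the minimum distance of each crosslink with a linker-specific threshold (35.0 for adh and bs3dss, 25.0 for dmtmm in the commands shown) and then histogramming those distances. The sketch below illustrates that logic only. It does not reproduce get_xlink_viol_csv.py or xl_distance_hist_plotter.py: the function names, the input as a plain list of distances, the rule that a crosslink is satisfied at or below the threshold, and the bin width of 5.0 are all assumptions made for illustration.

```python
import math

# Distance thresholds per linker, as passed via -t in the commands above.
THRESHOLDS = {"adh": 35.0, "bs3dss": 35.0, "dmtmm": 25.0}


def violated_crosslinks(min_distances, linker):
    """Return indices of crosslinks whose minimum distance exceeds the threshold.

    Assumption: a crosslink counts as satisfied when its minimum distance
    is at or below the threshold for its linker.
    """
    threshold = THRESHOLDS[linker]
    return [i for i, d in enumerate(min_distances) if d > threshold]


def distance_histogram(min_distances, bin_width=5.0):
    """Count distances per bin; each key is the lower edge of a bin."""
    counts = {}
    for d in min_distances:
        lower = math.floor(d / bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    distances = [12.3, 24.9, 25.0, 31.7, 36.2]  # made-up example values
    print(violated_crosslinks(distances, "dmtmm"))
    print(distance_histogram(distances))
```

Keeping the thresholds in a single dictionary means the same distances can be checked against a different linker without touching the comparison logic.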
For each of the simulations, the following files are in the results directory:
- cluster_center_model.rmf3: representative bead model of the major cluster
- chimera_densities.py: to view the localization densities (.mrc files)
- xl_violations.txt: list of violated crosslinks
For the NuDe models, mutation_colored_model.rmf and Distance_Maps are also added.
Author(s): Shreyas Arvindekar, Matthew J. Jackman, Jason K.K. Low, Michael J. Landsberg, Joel P. Mackay, Shruthi Viswanath
Date: May 19th, 2022
License: CC BY-SA 4.0
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License.
Last known good IMP version:
Testable: Yes
Parallelizeable: Yes
Publications: Arvindekar, S, Jackman, MJ, Low, JKK, Landsberg, MJ, Mackay, JP, Viswanath, S. Molecular architecture of nucleosome remodeling and deacetylase sub-complexes by integrative structure determination. Protein Science. 2022; 31(9):e4387. DOI: 10.1002/pro.4387.

