diff --git a/docs/source/enm.rst b/docs/source/enm.rst index 103daee..658e5ff 100644 --- a/docs/source/enm.rst +++ b/docs/source/enm.rst @@ -15,7 +15,7 @@ Input of a raw ensemble (uniform populations) by reading a single PDB file. getpdb file Arguments - * ``file`` - file name + * ``file`` - file name or PDB identifier, must have extension '.pdb' if it specifies a file Remarks * the PDB file can contain several models (conformers) or a single one, by default, the first conformer is used diff --git a/docs/source/ensemble_analysis.rst b/docs/source/ensemble_analysis.rst index c95f5b6..e933ffa 100644 --- a/docs/source/ensemble_analysis.rst +++ b/docs/source/ensemble_analysis.rst @@ -172,9 +172,10 @@ Superposition of conformers in an ensemble .. code-block:: matlab - superimpose input [range [template [template_range [mode]]]] + superimpose output input [range [template [template_range [mode]]]] Arguments + * ``output`` - name of the output file, extension '.pdb' is appended, if none * ``input`` - identifier of the input ensemble * ``range`` - optional MMMx address that specifies only a range of a conformer for analysis, e.g. `(A)187-320` * ``template`` - template ensemble or structure (optional) @@ -184,6 +185,21 @@ Remarks * by default, superposition is to the first conformer of the input ensemble if no range is provided * if a template and central are specified, superposition is to central conformer of a superensemble consisting of input and template * the range argument '(*)' selects the complete structure + +``save`` +--------------------------------- + +Save ensemble to a single PDB file + +.. code-block:: matlab + + save output ensemble_id + +Arguments + * ``output`` - name of the output file, extension '.pdb' is appended, if none + * ``ensemble_id`` - identifier of the ensemble to be save +Remarks + * populations are stored in a REMARK 400 field ``compare`` --------------------------------- diff --git a/docs/source/experiment_design.rst b/docs/source/experiment_design.rst index 5be65d7..135c86d 100644 --- a/docs/source/experiment_design.rst +++ b/docs/source/experiment_design.rst @@ -181,6 +181,33 @@ Remarks * in the block you can specify as many site pairs and pair lists as you wish * labels specified in the pair list(s) prevail over labels specified in the command, if these are inconsistent, it is reported in the log file * if the same label is used at both sites, just provide this one; for orthogonal labeling, use the syntax ``label_1|label_2`` + * if residue addresses do not contain a conformer specification, the distribution corresponds to all conformers + +``trivariate`` +--------------------------------- + +Compute and save trivariate distance distributions and plot their 2D and 1D sum projections for site triples. This is a block key. + +.. code-block:: matlab + + trivariate labels entity outname + 'address_1' 'address_2' 'address_3' + [] + .trivariate + +Arguments + * ``labels`` - label or labels for specified site triples, see :ref:`Rotamer libraries ` for list of available labels + * ``entity`` - entity identifier specified in any of the input commands + * ``outname`` - basis file name for the output distributions and plots + * ``address_1`` - MMMx :ref:`residue address ` for first site + * ``address_2`` - MMMx :ref:`residue address ` for second site + * ``address_3`` - MMMx :ref:`residue address ` for third site +Remarks + * in the block you can specify as many site triples as you wish + * if the same label is used at both sites, just provide this one; for orthogonal labeling, use the syntax ``label_1|label_2|label_3`` + * output is large and thus in a binary Matlab file, variables are 'trivariate' for the 3D array and r_axis_1, 'r_axis_2', 'r_axis_3' for the three distance axes + * sequence of dimensions is 'site1-site2', 'site1-site3', 'site2-site3' + * use 'scipy.io.loadmat' from the SciPy library for importing to Python ``ENMpairs`` --------------------------------- diff --git a/docs/source/prepare.rst b/docs/source/prepare.rst index 72baab6..a2bcf25 100644 --- a/docs/source/prepare.rst +++ b/docs/source/prepare.rst @@ -37,7 +37,7 @@ Remarks ``getcyana`` --------------------------------- -Input of a raw ensemble (uniform populations) by reading a single pseudo-PDB file as generated by CYANA. +Input of a raw ensemble (uniform populations) by reading a single PDB file generated by CYANA. .. code-block:: matlab @@ -46,11 +46,12 @@ Input of a raw ensemble (uniform populations) by reading a single pseudo-PDB fil Arguments * ``file`` - file name * ``identifier`` - module-internal entity identifier - * ``maxchains`` - reading is stop after the specified number of chains, optional, defaults to all chains + * ``maxchains`` - reading is stopped after the specified number of chains, optional, defaults to all chains Remarks - * information on pseudo-atoms is removed + * information on pseudo-residues is removed * standard PDB residue names are set for nucleic acids - * parameter ``maxchains`` allows for skipping pseudo-chains that simulate labels + * parameter ``maxchains`` allows for skipping pseudo-chains that simulate only labels + * reside types CYSS and CYSM are converted to CYS, label atoms in CYSM are skipped ``save`` --------------------------------- diff --git a/source/distance_distribution.m b/source/distance_distribution.m index e725844..c9c2098 100644 --- a/source/distance_distribution.m +++ b/source/distance_distribution.m @@ -85,7 +85,7 @@ confstr = site1(si+1:ei-1); residue1 = [site1(1:si-1) site1(ei+1:end)]; else - confstr = '1'; + confstr = '*'; residue1 = site1; end if strcmp(strtrim(confstr),'*') @@ -149,7 +149,7 @@ confstr = site2(si+1:ei-1); residue2 = [site2(1:si-1) site2(ei+1:end)]; else - confstr = '1'; + confstr = '*'; residue2 = site2; end if strcmp(strtrim(confstr),'*') diff --git a/source/get_cyana_pdb.m b/source/get_cyana_pdb.m index 8332c85..5008b3f 100644 --- a/source/get_cyana_pdb.m +++ b/source/get_cyana_pdb.m @@ -1,17 +1,17 @@ function [entity,exceptions] = get_cyana_pdb(fname,options,entity) % -% GET_CYANA_PDB Load structure from CYANA pseudo-PDB file +% GET_CYANA_PDB Load structure from CYANA dialect PDB file % % [entity,exceptions] = GET_CYANA_PDB(fname) % Returns an (entity) structure in MMMx:atomic representation % % [entity,exceptions] = GET_CYANA_PDB(fname,options) % Reads topology and Cartesian coordinates of a biological entity -% from a CYANA pseudo-PDB file +% from a CYANA dialect PDB file % % [entity,exceptions] = GET_CYANA_PDB(fname,options,entity) % Adds Cartesian coordinates of a conformer to an input biological entity -% by reading from a CYANA pseudo-PDB file +% by reading from a CYANA dialect PDB file % % INPUT % ident PDB identifier, if four characters without extension, otherwise @@ -33,10 +33,28 @@ % errors but not for warnings, for warnings, only the last one % is reported, cell containing an empty array, if no exception % +% Realization: +% +% - the reader skips residue types that are introduced purely for +% restraint-based optimization purposes: +% PL, NL, ML, LL, LL2, LLL, LL5, LP, LN, LM,ION, ORI +% - the reader converts modified CYS residues to CYS: +% CYSS, CYSM +% - the reader converts non-standard nucleotide rseidue names to +% standard names: +% RADE to A +% RCYT to C +% RGUA to G +% URA to U +% ADE to DA +% CYT to DC +% GUA to DG +% THY to DT + % This file is a part of MMMx. License is MIT (see LICENSE.md). % -% G. Jeschke, 2021 +% G. Jeschke, 2021-2022 exceptions{1} = []; warnings = 0; % counter for warnings @@ -54,8 +72,6 @@ maxchain = options.maxch; end -skipmode = true; - if ~contains(fname,'.') ifid = fopen(strcat(fname,'.pdb')); else @@ -81,56 +97,58 @@ cres = -1000; chain = 0; atnum = 0; -plnum = 0; -ll5num = 0; -lnnum = 0; -orinum = 0; -nlnum = 0; -lpnum = 0; +is_atom_line = false; while 1 && chain <= maxchain tline = fgetl(ifid); if ~ischar(tline), break, end - if length(tline)<26 % catches too short end line of MolProbity files + if length(tline)<26 % catches too short lines + if length(tline) >= 6 + record = tline(1:6); + if strcmp(record,'MODEL ') + cres = -1000; + chain = 0; + atnum = 0; + end + end fprintf(ofid,'%s\n',tline); else - resnum = str2double(tline(23:26)); - if resnum > 531 - tline(22) = 'B'; - end record = tline(1:6); - if strcmp(record,'ATOM ') && skipmode - chainid = tline(22); - resnum = str2double(tline(23:26)); - skip = true; - if chainid == 'A' && resnum >= 58 && resnum <= 531 - skip = false; - end - if chainid == 'B' && resnum >= 288+315 && resnum <= 370+315 - skip = false; - end - if skip - continue; - end - end restype = tline(18:21); switch restype case ' PL ' skipped = true; - case 'LL5 ' - skipped = true; case ' LN ' skipped = true; + case ' ML ' + skipped = true; case ' NL ' skipped = true; + case 'LL2 ' + skipped = true; + case 'LL5 ' + skipped = true; + case 'LLL ' + skipped = true; case ' LP ' skipped = true; + case ' LL ' + skipped = true; + case ' LM ' + skipped = true; + case 'ION ' + skipped = true; case 'ORI ' skipped = true; otherwise skipped = false; end - % fprintf(1,'%s\n',record); + if strcmp(record,'MODEL ') + cres = -1000; + chain = 0; + atnum = 0; + end if strcmp(record,'ATOM ') || strcmp(record,'HETATM') + is_atom_line = true; resnum = str2double(tline(23:26)); if resnum ~= cres && resnum ~= cres + 1 && ~skipped if chain > 0 % insert chain termination record @@ -151,10 +169,6 @@ atnum = atnum + 1; tline(7:11) = sprintf('%5i',atnum); tline(22) = chain_tags(chain); - resnum0 = str2double(tline(24:26)); - if tline(22) == 'B' - tline(24:26) = sprintf('%i',resnum0-315); - end if strcmp(record,'HETATM') cres = resnum; fprintf(ofid,'%s\n',tline); @@ -162,24 +176,30 @@ restype = tline(18:21); switch restype case ' PL ' - plnum = plnum + 1; + atnum = atnum - 1; + case ' ML ' + atnum = atnum -1; + case ' NL ' + atnum = atnum -1; + case ' LL ' + atnum = atnum -1; + case 'LL2 ' + atnum = atnum - 1; + case 'LLL ' atnum = atnum - 1; case 'LL5 ' - ll5num = ll5num + 1; atnum = atnum - 1; case ' LN ' - lnnum = lnnum + 1; - atnum = atnum -1; - case ' NL ' - nlnum = nlnum + 1; atnum = atnum -1; case ' LP ' - lpnum = lpnum + 1; + atnum = atnum -1; + case ' LM ' + atnum = atnum -1; + case 'ION ' atnum = atnum -1; case 'ORI ' - orinum = orinum + 1; atnum = atnum -1; - case 'RGUA' + case 'RGUA' tline(18:21) = ' G '; cres = resnum; fprintf(ofid,'%s\n',tline); @@ -195,14 +215,44 @@ tline(18:21) = ' C '; cres = resnum; fprintf(ofid,'%s\n',tline); + case 'ADE ' + tline(18:21) = ' DA '; + cres = resnum; + fprintf(ofid,'%s\n',tline); + case 'CYT ' + tline(18:21) = ' DC '; + cres = resnum; + fprintf(ofid,'%s\n',tline); + case 'GUA ' + tline(18:21) = ' DG '; + cres = resnum; + fprintf(ofid,'%s\n',tline); + case 'THY ' + tline(18:21) = ' DT '; + cres = resnum; + fprintf(ofid,'%s\n',tline); + case {'CYSS','CYSM'} + tline(18:21) = 'CYS '; + cres = resnum; + switch tline(13:16) + case {' C ',' O ',' N ',' H ',' CA ',' HA ',' CB ',' HB2',' HB3',' SG '} + fprintf(ofid,'%s\n',tline); + otherwise + atnum = atnum - 1; + end otherwise cres = resnum; fprintf(ofid,'%s\n',tline); end end + else + is_atom_line = false; end end - if ~skipped + if strcmp(record,'TER ') + fprintf(ofid,'%s\n',tline); + end + if ~skipped && is_atom_line tline0 = tline; end end diff --git a/source/trivariate_distribution.m b/source/trivariate_distribution.m index bfa320b..600a574 100644 --- a/source/trivariate_distribution.m +++ b/source/trivariate_distribution.m @@ -85,7 +85,7 @@ confstr = site1(si+1:ei-1); residue1 = [site1(1:si-1) site1(ei+1:end)]; else - confstr = '1'; + confstr = '*'; residue1 = site1; end if strcmp(strtrim(confstr),'*') @@ -149,7 +149,7 @@ confstr = site2(si+1:ei-1); residue2 = [site2(1:si-1) site2(ei+1:end)]; else - confstr = '1'; + confstr = '*'; residue2 = site2; end if strcmp(strtrim(confstr),'*') @@ -206,14 +206,14 @@ conformers2 = conformers2(1:cpoi); end -% extract conformer selection of site32 if any, set default, if none +% extract conformer selection of site3 if any, set default, if none si = strfind(site3,'{'); ei = strfind(site3,'}'); if ~isempty(si) && ~isempty(ei) confstr = site3(si+1:ei-1); residue3 = [site3(1:si-1) site3(ei+1:end)]; else - confstr = '1'; + confstr = '*'; residue3 = site3; end if strcmp(strtrim(confstr),'*')