Skip to content

Commit

Permalink
Reader for CYANA-dialect PDB files extended and debugged, new default…
Browse files Browse the repository at this point in the history
… behaviour of distance distribution computation (whole ensemble) in ExperimentDesign
  • Loading branch information
gjeschke committed Apr 25, 2022
1 parent 0b6d8ca commit 639acc6
Show file tree
Hide file tree
Showing 7 changed files with 154 additions and 60 deletions.
2 changes: 1 addition & 1 deletion docs/source/enm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Input of a raw ensemble (uniform populations) by reading a single PDB file.
getpdb file
Arguments
* ``file`` - file name
* ``file`` - file name or PDB identifier, must have extension '.pdb' if it specifies a file
Remarks
* the PDB file can contain several models (conformers) or a single one, by default, the first conformer is used

Expand Down
18 changes: 17 additions & 1 deletion docs/source/ensemble_analysis.rst
Original file line number Diff line number Diff line change
Expand Up @@ -172,9 +172,10 @@ Superposition of conformers in an ensemble

.. code-block:: matlab
superimpose input [range [template [template_range [mode]]]]
superimpose output input [range [template [template_range [mode]]]]
Arguments
* ``output`` - name of the output file, extension '.pdb' is appended, if none
* ``input`` - identifier of the input ensemble
* ``range`` - optional MMMx address that specifies only a range of a conformer for analysis, e.g. `(A)187-320`
* ``template`` - template ensemble or structure (optional)
Expand All @@ -184,6 +185,21 @@ Remarks
* by default, superposition is to the first conformer of the input ensemble if no range is provided
* if a template and central are specified, superposition is to central conformer of a superensemble consisting of input and template
* the range argument '(*)' selects the complete structure

``save``
---------------------------------

Save ensemble to a single PDB file

.. code-block:: matlab
save output ensemble_id
Arguments
* ``output`` - name of the output file, extension '.pdb' is appended, if none
* ``ensemble_id`` - identifier of the ensemble to be save
Remarks
* populations are stored in a REMARK 400 field

``compare``
---------------------------------
Expand Down
27 changes: 27 additions & 0 deletions docs/source/experiment_design.rst
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,33 @@ Remarks
* in the block you can specify as many site pairs and pair lists as you wish
* labels specified in the pair list(s) prevail over labels specified in the command, if these are inconsistent, it is reported in the log file
* if the same label is used at both sites, just provide this one; for orthogonal labeling, use the syntax ``label_1|label_2``
* if residue addresses do not contain a conformer specification, the distribution corresponds to all conformers

``trivariate``
---------------------------------

Compute and save trivariate distance distributions and plot their 2D and 1D sum projections for site triples. This is a block key.

.. code-block:: matlab
trivariate labels entity outname
'address_1' 'address_2' 'address_3'
[]
.trivariate
Arguments
* ``labels`` - label or labels for specified site triples, see :ref:`Rotamer libraries <rotamer_concept>` for list of available labels
* ``entity`` - entity identifier specified in any of the input commands
* ``outname`` - basis file name for the output distributions and plots
* ``address_1`` - MMMx :ref:`residue address <MMMx_addresses>` for first site
* ``address_2`` - MMMx :ref:`residue address <MMMx_addresses>` for second site
* ``address_3`` - MMMx :ref:`residue address <MMMx_addresses>` for third site
Remarks
* in the block you can specify as many site triples as you wish
* if the same label is used at both sites, just provide this one; for orthogonal labeling, use the syntax ``label_1|label_2|label_3``
* output is large and thus in a binary Matlab file, variables are 'trivariate' for the 3D array and r_axis_1, 'r_axis_2', 'r_axis_3' for the three distance axes
* sequence of dimensions is 'site1-site2', 'site1-site3', 'site2-site3'
* use 'scipy.io.loadmat' from the SciPy library for importing to Python

``ENMpairs``
---------------------------------
Expand Down
9 changes: 5 additions & 4 deletions docs/source/prepare.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ Remarks
``getcyana``
---------------------------------

Input of a raw ensemble (uniform populations) by reading a single pseudo-PDB file as generated by CYANA.
Input of a raw ensemble (uniform populations) by reading a single PDB file generated by CYANA.

.. code-block:: matlab
Expand All @@ -46,11 +46,12 @@ Input of a raw ensemble (uniform populations) by reading a single pseudo-PDB fil
Arguments
* ``file`` - file name
* ``identifier`` - module-internal entity identifier
* ``maxchains`` - reading is stop after the specified number of chains, optional, defaults to all chains
* ``maxchains`` - reading is stopped after the specified number of chains, optional, defaults to all chains
Remarks
* information on pseudo-atoms is removed
* information on pseudo-residues is removed
* standard PDB residue names are set for nucleic acids
* parameter ``maxchains`` allows for skipping pseudo-chains that simulate labels
* parameter ``maxchains`` allows for skipping pseudo-chains that simulate only labels
* reside types CYSS and CYSM are converted to CYS, label atoms in CYSM are skipped

``save``
---------------------------------
Expand Down
4 changes: 2 additions & 2 deletions source/distance_distribution.m
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@
confstr = site1(si+1:ei-1);
residue1 = [site1(1:si-1) site1(ei+1:end)];
else
confstr = '1';
confstr = '*';
residue1 = site1;
end
if strcmp(strtrim(confstr),'*')
Expand Down Expand Up @@ -149,7 +149,7 @@
confstr = site2(si+1:ei-1);
residue2 = [site2(1:si-1) site2(ei+1:end)];
else
confstr = '1';
confstr = '*';
residue2 = site2;
end
if strcmp(strtrim(confstr),'*')
Expand Down
146 changes: 98 additions & 48 deletions source/get_cyana_pdb.m
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
function [entity,exceptions] = get_cyana_pdb(fname,options,entity)
%
% GET_CYANA_PDB Load structure from CYANA pseudo-PDB file
% GET_CYANA_PDB Load structure from CYANA dialect PDB file
%
% [entity,exceptions] = GET_CYANA_PDB(fname)
% Returns an (entity) structure in MMMx:atomic representation
%
% [entity,exceptions] = GET_CYANA_PDB(fname,options)
% Reads topology and Cartesian coordinates of a biological entity
% from a CYANA pseudo-PDB file
% from a CYANA dialect PDB file
%
% [entity,exceptions] = GET_CYANA_PDB(fname,options,entity)
% Adds Cartesian coordinates of a conformer to an input biological entity
% by reading from a CYANA pseudo-PDB file
% by reading from a CYANA dialect PDB file
%
% INPUT
% ident PDB identifier, if four characters without extension, otherwise
Expand All @@ -33,10 +33,28 @@
% errors but not for warnings, for warnings, only the last one
% is reported, cell containing an empty array, if no exception
%
% Realization:
%
% - the reader skips residue types that are introduced purely for
% restraint-based optimization purposes:
% PL, NL, ML, LL, LL2, LLL, LL5, LP, LN, LM,ION, ORI
% - the reader converts modified CYS residues to CYS:
% CYSS, CYSM
% - the reader converts non-standard nucleotide rseidue names to
% standard names:
% RADE to A
% RCYT to C
% RGUA to G
% URA to U
% ADE to DA
% CYT to DC
% GUA to DG
% THY to DT


% This file is a part of MMMx. License is MIT (see LICENSE.md).
%
% G. Jeschke, 2021
% G. Jeschke, 2021-2022

exceptions{1} = [];
warnings = 0; % counter for warnings
Expand All @@ -54,8 +72,6 @@
maxchain = options.maxch;
end

skipmode = true;

if ~contains(fname,'.')
ifid = fopen(strcat(fname,'.pdb'));
else
Expand All @@ -81,56 +97,58 @@
cres = -1000;
chain = 0;
atnum = 0;
plnum = 0;
ll5num = 0;
lnnum = 0;
orinum = 0;
nlnum = 0;
lpnum = 0;
is_atom_line = false;
while 1 && chain <= maxchain
tline = fgetl(ifid);
if ~ischar(tline), break, end
if length(tline)<26 % catches too short end line of MolProbity files
if length(tline)<26 % catches too short lines
if length(tline) >= 6
record = tline(1:6);
if strcmp(record,'MODEL ')
cres = -1000;
chain = 0;
atnum = 0;
end
end
fprintf(ofid,'%s\n',tline);
else
resnum = str2double(tline(23:26));
if resnum > 531
tline(22) = 'B';
end
record = tline(1:6);
if strcmp(record,'ATOM ') && skipmode
chainid = tline(22);
resnum = str2double(tline(23:26));
skip = true;
if chainid == 'A' && resnum >= 58 && resnum <= 531
skip = false;
end
if chainid == 'B' && resnum >= 288+315 && resnum <= 370+315
skip = false;
end
if skip
continue;
end
end
restype = tline(18:21);
switch restype
case ' PL '
skipped = true;
case 'LL5 '
skipped = true;
case ' LN '
skipped = true;
case ' ML '
skipped = true;
case ' NL '
skipped = true;
case 'LL2 '
skipped = true;
case 'LL5 '
skipped = true;
case 'LLL '
skipped = true;
case ' LP '
skipped = true;
case ' LL '
skipped = true;
case ' LM '
skipped = true;
case 'ION '
skipped = true;
case 'ORI '
skipped = true;
otherwise
skipped = false;
end
% fprintf(1,'%s\n',record);
if strcmp(record,'MODEL ')
cres = -1000;
chain = 0;
atnum = 0;
end
if strcmp(record,'ATOM ') || strcmp(record,'HETATM')
is_atom_line = true;
resnum = str2double(tline(23:26));
if resnum ~= cres && resnum ~= cres + 1 && ~skipped
if chain > 0 % insert chain termination record
Expand All @@ -151,35 +169,37 @@
atnum = atnum + 1;
tline(7:11) = sprintf('%5i',atnum);
tline(22) = chain_tags(chain);
resnum0 = str2double(tline(24:26));
if tline(22) == 'B'
tline(24:26) = sprintf('%i',resnum0-315);
end
if strcmp(record,'HETATM')
cres = resnum;
fprintf(ofid,'%s\n',tline);
else % wrong ATOM records still need to be fixed
restype = tline(18:21);
switch restype
case ' PL '
plnum = plnum + 1;
atnum = atnum - 1;
case ' ML '
atnum = atnum -1;
case ' NL '
atnum = atnum -1;
case ' LL '
atnum = atnum -1;
case 'LL2 '
atnum = atnum - 1;
case 'LLL '
atnum = atnum - 1;
case 'LL5 '
ll5num = ll5num + 1;
atnum = atnum - 1;
case ' LN '
lnnum = lnnum + 1;
atnum = atnum -1;
case ' NL '
nlnum = nlnum + 1;
atnum = atnum -1;
case ' LP '
lpnum = lpnum + 1;
atnum = atnum -1;
case ' LM '
atnum = atnum -1;
case 'ION '
atnum = atnum -1;
case 'ORI '
orinum = orinum + 1;
atnum = atnum -1;
case 'RGUA'
case 'RGUA'
tline(18:21) = ' G ';
cres = resnum;
fprintf(ofid,'%s\n',tline);
Expand All @@ -195,14 +215,44 @@
tline(18:21) = ' C ';
cres = resnum;
fprintf(ofid,'%s\n',tline);
case 'ADE '
tline(18:21) = ' DA ';
cres = resnum;
fprintf(ofid,'%s\n',tline);
case 'CYT '
tline(18:21) = ' DC ';
cres = resnum;
fprintf(ofid,'%s\n',tline);
case 'GUA '
tline(18:21) = ' DG ';
cres = resnum;
fprintf(ofid,'%s\n',tline);
case 'THY '
tline(18:21) = ' DT ';
cres = resnum;
fprintf(ofid,'%s\n',tline);
case {'CYSS','CYSM'}
tline(18:21) = 'CYS ';
cres = resnum;
switch tline(13:16)
case {' C ',' O ',' N ',' H ',' CA ',' HA ',' CB ',' HB2',' HB3',' SG '}
fprintf(ofid,'%s\n',tline);
otherwise
atnum = atnum - 1;
end
otherwise
cres = resnum;
fprintf(ofid,'%s\n',tline);
end
end
else
is_atom_line = false;
end
end
if ~skipped
if strcmp(record,'TER ')
fprintf(ofid,'%s\n',tline);
end
if ~skipped && is_atom_line
tline0 = tline;
end
end
Expand Down
8 changes: 4 additions & 4 deletions source/trivariate_distribution.m
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@
confstr = site1(si+1:ei-1);
residue1 = [site1(1:si-1) site1(ei+1:end)];
else
confstr = '1';
confstr = '*';
residue1 = site1;
end
if strcmp(strtrim(confstr),'*')
Expand Down Expand Up @@ -149,7 +149,7 @@
confstr = site2(si+1:ei-1);
residue2 = [site2(1:si-1) site2(ei+1:end)];
else
confstr = '1';
confstr = '*';
residue2 = site2;
end
if strcmp(strtrim(confstr),'*')
Expand Down Expand Up @@ -206,14 +206,14 @@
conformers2 = conformers2(1:cpoi);
end

% extract conformer selection of site32 if any, set default, if none
% extract conformer selection of site3 if any, set default, if none
si = strfind(site3,'{');
ei = strfind(site3,'}');
if ~isempty(si) && ~isempty(ei)
confstr = site3(si+1:ei-1);
residue3 = [site3(1:si-1) site3(ei+1:end)];
else
confstr = '1';
confstr = '*';
residue3 = site3;
end
if strcmp(strtrim(confstr),'*')
Expand Down

0 comments on commit 639acc6

Please sign in to comment.