
Processing AtDTX14 data (5Y50)

keitaroyam edited this page Jan 23, 2018 · 5 revisions

The following describes how A. thaliana DTX14 (AtDTX14) datasets can be processed using KAMO (documentation in Japanese / English).

References

  • Original paper
    • Miyauchi et al. (2017) "Structural basis for xenobiotic extrusion by eukaryotic MATE transporter." Nature Communications doi: 10.1038/s41467-017-01541-0 PDB: 5Y50

Raw data

  • Available on Zenodo (DOI)
  • Collected on BL32XU, SPring-8
  • MX225HS CCD detector (2x2 binning), 10×10 or 18×10 μm² beam, 1 Å wavelength
  • 5, 10, or 20°/dataset, 1°/frame (shutterless)
  • 288 datasets collected automatically (ZOO system) and 85 collected manually, from 10 and 22 cryoloops, respectively
  • P212121; a=52.8, b=86.8, c=116.4 Å

How data were processed in the original paper

The GUI command 'kamo' was run with exclude_resolution_range=4.55,4.4 to exclude a strong diffraction ring caused by lipid. XDS (ver. May 1, 2016 BUILT=20160617) was used for integration, and no prior crystal information was supplied.
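To make the meaning of the excluded range concrete, the following minimal sketch tests whether a given d-spacing falls inside it. The function name in_excluded_range is hypothetical and is not part of KAMO or XDS, and treating both edges as inclusive is an assumption made only for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of KAMO or XDS.
# Exit status 0 if the d-spacing $1 (in Å) lies inside the excluded range
# given by $2 (low-resolution edge) and $3 (high-resolution edge).
# Assumption: both edges are treated as inclusive.
in_excluded_range() {
    awk -v d="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !(d + 0 <= lo + 0 && d + 0 >= hi + 0) }'
}

# Report a few example d-spacings against the range used on this page.
for d in 5.00 4.50 3.90; do
    if in_excluded_range "$d" 4.55 4.4; then
        echo "d=$d: excluded"
    else
        echo "d=$d: kept"
    fi
done
```

Only reflections between the two edges are dropped; data at lower and higher resolution are unaffected.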

148 out of 373 datasets were indexed and integrated, and 139 datasets belonged to the largest group of consistent unit cells:

[ 1] 139 members:
 Averaged P1 Cell= 52.79 86.72 116.43 90.32 90.33 90.30
 Possible symmetries:
   freq symmetry     a      b      c     alpha  beta   gamma reindex
     23 P 1         52.79  86.72 116.43  90.32  90.33  90.30 a,b,c
      6 P 1 2 1     52.79  86.72 116.43  90.32  90.33  90.30 a,b,c
      8 P 1 2 1     86.72  52.79 116.43  89.67  90.32  89.70 b,-a,c
     12 P 1 2 1     52.79 116.43  86.72  89.68  90.30  89.67 a,-c,b
     90 P 2 2 2     52.79  86.72 116.43  90.32  90.33  90.30 a,b,c

Because P222 was the most frequent symmetry (90 of the 139 datasets), P222 was assumed and the XDS_ASCII files were re-indexed to P222 symmetry.
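The choice of the most frequent symmetry can be sketched as a small filter over the frequency table shown above. This is only an illustration of the reasoning, not KAMO's own logic: the function name most_frequent_symmetry is hypothetical, P 1 is skipped because it is compatible with every lattice, and summing the counts of different settings of the same symmetry (for example the three P 1 2 1 rows) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of KAMO.
# Reads a "Possible symmetries" table on stdin and prints the most frequent
# symmetry other than P 1, followed by its total count.
most_frequent_symmetry() {
    awk '
        $1 ~ /^[0-9]+$/ {
            # The symmetry name is every token between the count and the
            # first cell parameter (the first token containing a dot).
            sym = ""
            for (i = 2; i <= NF && $i !~ /\./; i++)
                sym = (sym == "" ? $i : sym " " $i)
            if (sym == "P 1") next
            count[sym] += $1    # assumption: settings of one symmetry are summed
        }
        END {
            best = ""
            for (s in count)
                if (best == "" || count[s] > count[best]) best = s
            if (best != "") print best, count[best]
        }
    '
}
```

Feeding the table rows to the function, for example by pasting them into a file and running `most_frequent_symmetry < table.txt`, reports the symmetry that would be assumed.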

filter_cell.R was used to remove outliers with extremely different unit cell parameters; 8 datasets were removed.
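The idea of rejecting unit cell outliers can be illustrated with a simple tolerance check. This sketch does not reproduce filter_cell.R, whose criteria are not described here. The function name filter_cells, the input format (one "name a b c" line per dataset), and the percentage tolerance are all assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical helper; this is NOT the algorithm of filter_cell.R.
# usage: filter_cells REF_A REF_B REF_C TOL_PERCENT < cells.txt
# Input lines: "name a b c". Prints the names of datasets whose a, b and c
# each deviate from the reference cell by at most TOL_PERCENT percent.
filter_cells() {
    awk -v ra="$1" -v rb="$2" -v rc="$3" -v tol="$4" '
        # Absolute relative deviation of x from ref, in percent.
        function reldev(x, ref,    d) {
            d = (x - ref) / ref * 100
            return d < 0 ? -d : d
        }
        NF >= 4 &&
        reldev($2, ra) <= tol &&
        reldev($3, rb) <= tol &&
        reldev($4, rc) <= tol { print $1 }
    '
}
```

For instance, `filter_cells 52.79 86.72 116.43 1 < cells.txt` would keep only datasets within 1% of the averaged cell reported above; the 1% threshold is arbitrary.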

Next, several clustering procedures were tested. The best result (the one with the largest CC1/2) was a subcluster, the second-largest cluster, in the CC-based clustering. It consisted of 100 datasets and is found in ccc_2.6A_framecc_b+B_goodcell/cluster_0129/run_03/XSCALE.LP:

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     7.78       18239     728       734       99.2%      13.0%     13.1%    18239   28.45     13.3%    99.7*    -4    0.838     464
     5.51       33853    1217      1217      100.0%      20.9%     19.3%    33853   19.22     21.3%    99.6*     2    0.885     946
     4.50       40157    1457      1537       94.8%      24.4%     20.6%    40152   18.11     24.9%    99.4*    -2    0.975    1182
     3.90       44566    1601      1776       90.1%      34.5%     30.3%    44558   13.83     35.1%    98.4*     1    0.928    1333
     3.49       57924    1996      1996      100.0%      53.9%     54.8%    57924    8.59     54.9%    97.4*    -1    0.802    1729
     3.18       66004    2257      2257      100.0%     105.1%    126.7%    66004    4.23    106.9%    93.4*    -1    0.683    1983
     2.95       68068    2329      2329      100.0%     182.8%    238.4%    68068    2.48    186.0%    87.4*    -1    0.632    2065
     2.76       73256    2528      2528      100.0%     309.6%    411.1%    73256    1.57    315.1%    80.8*     0    0.580    2260
     2.60       78809    2767      2767      100.0%     671.2%    910.8%    78809    0.90    683.3%    60.9*    -2    0.536    2494
    total      480876   16880     17141       98.5%      57.6%     66.4%   480863    7.79     58.7%    99.4*    -1    0.713   14456
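When comparing many such tables across clusters, it can help to pull out one number per table. The following minimal sketch reads the shell rows of a table like the one above and reports the highest-resolution shell whose CC(1/2) still meets a chosen threshold. The function name highest_shell_with_cc is hypothetical and not part of KAMO or XDS. The sketch assumes the 14-column layout shown above and a single table on stdin, so a full XSCALE.LP would first need to be cut down to the table of interest.

```shell
#!/bin/sh
# Hypothetical helper, not part of KAMO or XDS.
# usage: highest_shell_with_cc MIN_CC < table.txt
# Walks the resolution shells from low to high resolution and prints the
# resolution limit of the last shell before CC(1/2) drops below MIN_CC.
highest_shell_with_cc() {
    awk -v min="$1" '
        # Shell rows have 14 columns and start with a decimal resolution;
        # the "total" row and the header lines do not match.
        NF == 14 && $1 ~ /^[0-9]+\.[0-9]+$/ {
            cc = $11
            sub(/\*$/, "", cc)          # drop the significance marker
            if (cc + 0 < min + 0) exit  # stop at the first failing shell
            best = $1
        }
        END { if (best != "") print best }
    '
}
```

The threshold is a matter of judgment; the function only reports where a given cutoff would fall.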

The merging result above was produced by the following script:

#!/bin/sh
# settings
dmin=2.6 # resolution
clustering_dmin=3.0  # resolution for CC calculation
anomalous=false # true or false
lstin=formerge_goodcell.lst # list of XDS_ASCII.HKL files
use_ramdisk=true # set to false if memory or space in /tmp is limited
# _______/setting

kamo.multi_merge \
        workdir=ccc_${dmin}A_framecc_b+B_goodcell resolution.estimate=true \
        lstin=${lstin} d_min=${dmin} anomalous=${anomalous} \
        program=xscale xscale.reference=bmin \
        reject_method=framecc+lpstats rejection.lpstats.stats=em.b+bfactor \
        clustering=cc cc_clustering.d_min=${clustering_dmin} cc_clustering.b_scale=false cc_clustering.use_normalized=false \
        cc_clustering.min_cmpl=90 cc_clustering.min_redun=2 \
        xscale.use_tmpdir_if_available=${use_ramdisk} \
        batch.engine=sge batch.par_run=merging batch.nproc_each=8 nproc=8 batch.sge_pe_name=par
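Before launching a long merging job, it may be worth checking that every file named in the list exists. The following sketch is a hypothetical pre-flight check and not part of KAMO. It assumes, following the comment in the script above, that the list contains one XDS_ASCII.HKL path per line.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of KAMO.
# usage: check_lst LIST_FILE
# Prints every path listed in LIST_FILE that is not an existing regular file.
# Exit status is non-zero if any listed path is missing.
check_lst() {
    missing=0
    while IFS= read -r path; do
        [ -n "$path" ] || continue      # skip blank lines
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done < "$1"
    return "$missing"
}
```

For example, `check_lst formerge_goodcell.lst && sh merge.sh` would start merging only when all listed files are present; merge.sh is a placeholder name for the script above.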
