Skip to content

Processing rac COTE MicroED data

keitaroyam edited this page Mar 26, 2021 · 1 revision

Raw data

  • Available in Zenodo. DOI
  • Collected on Talos Arctica G2 (200 kV)
  • Ceta (indirect CMOS) detector
  • Images collected using Velox software (.emd format)
  • Stage tilt controlled using SerialEM

Preparation

Here I use DIALS-2.1.0. Please download and install DIALS first.

To process emd files using DIALS, you need to install the dxtbx class. Please download FormatEMD.py and execute

dxtbx.install_format -g FormatEMD.py

Specify -u instead of -g if you do not have permission to DIALS installation directory.

See also here for the tips for electron diffraction data processing.

Processing biotin data using DIALS+KAMO

1. Generate a mask

It is recommended to make a beam stop mask. You can draw polygon on the dials.image_viewer window. Here is the parameters I used.

dials.import microed_cotamd_200413/Camera_Nano_Ceta_0009_850_mm.emd
dials.generate_mask imported.expt use_trusted_range=false untrusted.polygon="0  964  950  962  986  934 1126  932 1174  972 1314 1006 1170 1056 1140 1084 1040 1098  976 1086  946 1068  938 1064    0 1070    0  964    0  964"

2. Prepare metadata

We need to check the useful frame range for each dataset because stage tilt and detector were not synchronized. Save the following script as check_valid_ranges.py

import h5py
import numpy
import json

def get_metadata(metadata):
    mds = []
    for i in xrange(metadata.shape[1]):
        metadata_array = metadata[:, i].T
        mdata_string = metadata_array.tostring().decode("utf-8")
        mds.append(json.loads(mdata_string.rstrip('\x00')))

    return mds
# get_metadata()

def analyse_angle(metadata):
    alphas = []
    
    for i, md in enumerate(metadata):
        alpha = numpy.rad2deg(float(md["Stage"]["AlphaTilt"]))
        alphas.append(alpha)

    d_alphas = numpy.diff(alphas)
    if len(d_alphas) == 0: return [-1,-1], None, None
    q25, q50, q75 = numpy.percentile(d_alphas, [25, 50, 75])
    iqr = q75-q25
    iqrc = 1.5
    lowlim, highlim = q25 - iqrc*iqr, q75 + iqrc*iqr
    d_alphas2 = d_alphas[numpy.where(numpy.logical_and(d_alphas>lowlim, d_alphas<highlim))] # outlier rejected
    d_alpha_z = abs(d_alphas-numpy.mean(d_alphas2))/numpy.std(d_alphas2)

    valid_range = [0, len(metadata)-1]
    for i in xrange(len(metadata)-1):
        if d_alpha_z[i] < 3: break
        valid_range[0] = i+1

    for i in reversed(xrange(len(metadata)-1)):
        if d_alpha_z[i] < 3: break
        valid_range[1] = i

    if valid_range[0] > valid_range[1]:
        valid_range = [0, len(metadata)-1] # reset
        
    mean_alpha_step = (alphas[valid_range[1]] - alphas[valid_range[0]])/(valid_range[1]-valid_range[0])

    return valid_range, mean_alpha_step, alphas[valid_range[0]]
# analyse_angle()


if __name__ == "__main__":
    import sys
    for f in sys.argv[1:]:
        try:
            h = h5py.File(f, "r")
            image_path = h["/Data/Image"]
            k = image_path.keys()[0]
            metadata = get_metadata(h["/Data/Image/%s/Metadata" % k])
            valid_range, mean_alpha_step, alpha_start = analyse_angle(metadata)
            print f, "%s % 7.4f % 7.4f" %(",".join(map(lambda x:str(x+1), valid_range)), alpha_start, mean_alpha_step)
        except:
            print f, "0,0 nan nan"

and run

python ./check_valid_ranges.py `find microed_cotamd_200413 -name \*.emd` > valid_ranges.txt

3. Indexing and integration

Here is the script to process all datasets (assuming you have *.emd files in the same directory):

root=microed_cotamd_200413
mask=pixels.mask
for n in 0009 0012 0016 0019 0022 0025 0028 0031 0035 0039 0042 0046 0049 0052 0056 0059 0102 0106 0110 0113 0116 0119 0123 0126 0130 0133 0137 0140 0144 0147 0151 0154 0158 0201 0204 0208 0211 0215 0218 0222 0225 0228 0231 0
235 0238 0242 0246 0249 0252 0256 0300 0305 0307 0310 0314 0317 0321 0324 0327 0331 0335 0338 0341 0345 0349 0353 0356 0400 0403 0407 0412 0414 0418 0421 0425 0428 0432 0435 0439 0442 0446 0449 0453 0457 0500 0504 0507 0511 0
514 0518 0522 0526 0529 0533 0537 0541 0544 0548 0552 0555 0600 0602 0606 0609 0613 0616 0619 0623 0627 0629 0632 0635 0638 0641 0644 0649 0651 0654 0657 0700 0703 0705 0708 0711 0714 0717 0720 0724 0727 0730 0733 0735 0738 0
742 0745 0748 0751 0754 0805 0808 0811 0814 0818 0821 0824 0827 0831 0834 0838 0841 0844 0848 0851 0856 0858 0901 0905 0909 0912 0916 0919 0923 0926 0930 0933 1125 1129 1137 1140 1143 1147 1150 1153 1156 1200 1203 1206 1209 1
213 1216 1331 1340 1344 1347 1351 1354 1357 1401 1404 1408 1411 1415 1419 1422 1426 1429 1432 1436 1439 1442 1445 1448 1452 1455 1458 1502 1505 1508 1512 1515 1519 1522 1526 1530 1534 1537 1541 1545 1548 1551 1554 1557 1600 1
612 1616 1619 1623 1626 1630 1633 1636 1639 1643 1647 1651 1654 1657 1700 1704 1708 1711 1715 1718 1722 1726 1729 1733 1737 1740 1744 1748 1751 1754 1757 1800 1804 1807 1810 1813 1816 1820 1823 1826 1830 1834 1838 1842 1845 1
849 1853 1857 1900 1904 1908 1911 1914 1917 1920 1924 1927 1930 1934 1937 1941 1944 1947 1951 1955 1958 2002 2005 2009 2012 2016 2020 2023 2027 2030 2034 2037 2040 2043 2046 2049 2053 2057 2101 2104 2108 2112 2115 2118 2121 2
125 2129 2132 2137 2140 2143 2146 2150 2153 2156 2200 2204 2208 2211 2215 2218 2222 2225 2229 2232 2236 2240 2243 2247 2250 2254 2257 2301 2304 2308 2311 2315 2318 2322 2326 2329 2333 2336 2340 2344 2347 2351 2355 2358
do
 f=Camera_Nano_Ceta_${n}_850_mm.emd
 vr=`awk '/'$f'/{print $2}' ../valid_ranges.txt`
 test "$vr" == "0,0" && echo "skipping $f" && continue
 test -e dials_$n/imported.expt && continue
 mkdir -pv dials_$n
 qsub -S /bin/bash -V -wd $PWD/dials_$n -j y -pe smp 4 <<+
dials.import $root/Camera_Nano_Ceta_${n}_850_mm.emd geometry.detector.panel.origin=-29.43864,28.53116,-819.71 image_range=$vr mask=$mask goniometer.axes=1,0,0 slow_fast_beam_centre=1024,1055
DXTBX_EMD_OFFSET=200 dials.find_spots nproc=\$NSLOTS threshold.dispersion.global_threshold=200  d_max=8 imported.expt
#dials.search_beam_position imported.expt strong.refl nproc=\$NSLOTS
test -e optimised.expt && expt=optimised.expt || expt=imported.expt
orgd=\$PWD
for meth in fft1d fft3d
do
 for ameth in simple local
 do
  cd \$orgd
  suf=\${meth}_\${ameth}
  dials.index \$expt strong.refl detector.fix=distance index_assignment.method=\$ameth indexing.method=\$meth max_lattices=1 \
              output.experiments=indexed_\${suf}.expt output.reflections=indexed_\${suf}.refl output.log=dials.index_\${suf}.log || continue
  nlatt=\`grep -c "__id__.*crystal" indexed_\${suf}.expt\`
  for i in \`seq 1 \$nlatt\`
  do
   mkdir refine_bravais_settings_\${suf}
   dials.refine_bravais_settings indexed_\${suf}.expt indexed_\${suf}.refl crystal_id=\$i detector.fix=distance output.prefix=\${i}_ output.directory=refine_bravais_settings_\${suf} log=refine_bravais_settings_\${suf}/\${i}_d
ials.refine_bravais_settings.log
  done

  mkdir \$suf
  cd \$suf
  dials.refine ../indexed_\${suf}.expt ../indexed_\${suf}.refl detector.fix=distance scan_varying=true 
  test -e refined.expt && prefix=refined || prefix=../indexed_\${suf}
  DXTBX_EMD_OFFSET=200 dials.integrate \${prefix}.expt \${prefix}.refl nproc=\$NSLOTS profile.gaussian_rs.min_spots.per_degree=0 profile.gaussian_rs.min_spots.overall=0 || continue
  
  mkdir exported_0
  cd exported_0
  dials.export ../integrated.expt ../integrated.refl  format=xds_ascii 
  echo setting c2 | pointless DIALS.HKL > pointless.log
  cd ..
  for i in \`seq 0 \$((nlatt-1))\`
  do
   test -e exported_0/DIALS_\${i}.HKL || continue
   mkdir exported_\$i
   ln -s ../exported_0/DIALS_\${i}.HKL exported_\$i/DIALS.HKL
   echo setting c2 | pointless exported_\$i/DIALS.HKL > exported_\$i/pointless.log
  done
 done
done
+

done

4. Merging

I decided to use results from fft1d_simple. Let us prepare files for merging.

kamo.multi_prep_merging workdir=kamo_1_fft1d_simple xdsdir=dials_\*/fft1d_simple/exported_0/

It should say:

Making groups from 338 results

[ 1] 324 members:
 Averaged P1 Cell= 11.95 12.27 14.42 110.82 100.67 94.80
 Possible symmetries:
   freq symmetry     a      b      c     alpha  beta   gamma reindex
    324 P 1         11.95  12.27  14.42 110.82 100.67  94.80 a,b,c
      0 C 1 2 1     27.02  12.27  11.95  94.80 103.62  85.70 -b-2*c,-b,-a

[ 2] 8 members:
 Averaged P1 Cell= 9.13 14.91 16.03 114.24 81.21 100.85
 Possible symmetries:
   freq symmetry     a      b      c     alpha  beta   gamma reindex
      8 P 1          9.13  14.91  16.03  65.76  81.21  79.15 -a,b,-c
      0 C 1 2 1     29.51   9.13  16.83  91.22 119.61  96.84 -a-2*b,a,b+c
      0 C 1 2 1     18.97  25.75   9.13  81.35 117.54  89.71 -a-b-c,a+b-c,a
      0 C 1 2 1     26.00  16.83   9.13  88.78 101.67  85.47 -b+c,b+c,-a
      0 I 2 2 2      9.13  16.83  25.75  95.01  81.35  91.22 -a,-b-c,-a-b+c
...

So I chose group 1 and P1 symmetry.

Then go to kamo_1_fft1d_simple/ directory and edit merge_blend.sh. I edited like this:

#!/bin/sh
# settings
dmin=0.8 # resolution
anomalous=false # true or false
lstin=formerge.lst # list of XDS_ASCII.HKL files
use_ramdisk=true # set false if there is few memory or few space in /tmp
# _______/setting

kamo.multi_merge \
        workdir=blend_${dmin}A_framecc_b+B \
        lstin=${lstin} d_min=${dmin} anomalous=${anomalous} \
        space_group=None reference.data=None \
        program=xscale xscale.reference=bmin xscale.degrees_per_batch=1 \
        reject_method=framecc+lpstats rejection.lpstats.stats=em.b+bfactor \
        clustering=blend blend.min_cmpl=80 blend.min_redun=8 blend.max_LCV=None blend.max_aLCV=None \
        max_clusters=None xscale.use_tmpdir_if_available=${use_ramdisk} \
        batch.engine=sge batch.par_run=merging batch.nproc_each=8 nproc=8 batch.sge_pe_name=smp

I took the second largest cluster based on the CC1/2. Here is the snippet of cluster_summary.dat:

     cluster    ClH  LCV aLCV run ds.all ds.used  Cmpl Redun I/sigI Rmeas CC1/2 Cmpl.ou Red.ou I/sig.ou Rmeas.ou CC1/2.ou Cmpl.in Red.in I/sig.in Rmeas.in CC1/2.in SigAno.in CCano.in WilsonB Aniso.bst Aniso.wst dmin.est
cluster_0323 101.72  4.7  0.7   1    324     324  95.9 205.2  29.89  55.3  99.5    90.7  204.1     0.00    -99.9     11.6    97.2  187.8   175.64      7.3    100.0       1.2    -23.0    3.27      0.79      1.65 0.87
cluster_0323 101.72  4.7  0.7   2    324     324  94.0 183.2  30.32  34.0  99.5    89.1  180.9     3.15    847.1      9.6    95.1  171.5   175.33      6.6     99.9       1.1    -21.0    3.20      0.79      2.15 0.84
cluster_0323 101.72  4.4  0.7   3    324     279  96.0 166.6  27.02  57.7  98.9    96.4  166.2     1.67    882.0     40.5    93.7  154.6   174.55      6.3    100.0       1.1    -23.0    1.21      0.63      2.14 0.77
cluster_0322  61.64  3.3  0.5   1    236     236  96.5 149.4  24.51  59.4  99.5    92.2  149.0     0.00    -99.9     22.4    97.2  137.9   143.68      7.5     99.9       1.1    -19.0    3.23      0.84      7.12 0.87
cluster_0322  61.64  3.3  0.5   2    236     236  97.6 133.0  23.72  36.2  99.5    97.5  132.3     1.73   1095.2     23.3    95.5  125.7   144.13      6.8     99.9       1.1    -22.0    3.15      0.75      2.07 0.82
cluster_0322  61.64  3.3  0.5   3    236     202  96.5 119.4  22.08  53.3  99.0    96.8  119.1     1.35   1061.2     52.5    94.4  111.7   141.02      6.5    100.0       1.1    -20.0    1.60      0.61      4.67 0.78
cluster_0321  49.56  3.1  0.5   1    162     162  98.1 100.7  19.01  64.1  98.3    96.4  100.6     0.00    -99.9      3.4    97.2   93.0   115.72      7.6     99.9       1.1    -22.0    3.25      0.82      1.77 0.87
cluster_0321  49.56  3.1  0.5   2    162     162  97.0  89.7  18.96  37.8  99.5    95.6   89.3     1.25   1527.7      3.1    93.7   86.3   119.97      6.9     99.9       1.0    -25.0    3.29      0.80      1.74 0.85
cluster_0321  49.56  2.8  0.5   3    162     140  95.3  82.3  17.86  58.4  98.5    95.5   82.2     0.96   1598.7     17.7    93.0   77.4   119.27      6.6     99.9       1.1    -23.0    1.63      0.65      1.71 0.80
...

Here is the statistics from blend_0.8A_framecc_b+B/cluster_0322/run_03/.

 UNIT_CELL_CONSTANTS=    11.95    12.27    14.41 110.734 100.784  94.739

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     2.40       30150     270       286       94.4%       6.4%      5.2%    30147  141.02      6.5%   100.0*   -20    1.123     256
     1.70       60819     514       536       95.9%      12.2%      9.7%    60811   82.10     12.3%    99.8*   -28    0.910     491
     1.39       79594     659       681       96.8%      21.6%     20.5%    79588   44.19     21.7%    99.6*   -11    0.849     635
     1.20       93670     785       818       96.0%      34.6%     38.7%    93663   27.14     34.7%    97.6*     5    0.801     755
     1.07      113132     949       982       96.6%      44.0%     55.8%   113118   19.66     44.2%    97.0*     8    0.754     909
     0.98      114970     961       994       96.7%      95.4%    143.6%   114955    8.55     95.8%    90.1*     8    0.617     917
     0.91      123713    1022      1057       96.7%     183.4%    296.3%   123704    4.66    184.3%    68.7*     7    0.551     984
     0.85      139906    1167      1204       96.9%     315.5%    528.2%   139884    2.89    316.9%    49.0*     6    0.485    1111
     0.80      149358    1254      1295       96.8%    1056.3%   1792.5%   149338    1.35   1061.2%    52.5*    10    0.430    1203
    total      905312    7581      7853       96.5%      53.1%     77.2%   905208   22.08     53.3%    99.0*     1    0.651    7261

Anisotropy:
           direction B_eigen Resol(CC1/2=0.5)
0.91a*-0.39b*-0.16c*    0.99 0.61
0.30a*+0.86b*-0.41c*    0.50 0.77
    0.06a*-0.07b*+c*    4.10 4.67

Note that actual space group is P-1.

Clone this wiki locally