# Preprocess ICRP-107 to create dataset for radioactivedecay

### Introduction

Decay datasets for radioactivedecay are saved as three files:
- `c.npz`: pre-calculated sparse matrix *C* (Amaku et al. (2010)) (NPZ NumPy compressed array format)
- `cinverse.npz`: pre-calculated sparse matrix *C<sup>-1</sup>* (inverse of *C*) (NPZ NumPy compressed array format)
- `decay_data.npz`: NumPy arrays containing the radionuclide names and decay constants, the progeny, branching fractions and decay modes of each radionuclide, and the days-per-year conversion factor (NPZ NumPy compressed array format)

This notebook creates input files using the data in <a href="http://www.icrp.org/publication.asp?id=ICRP%20Publication%20107">ICRP 107: Nuclear Decay Data for Dosimetric Calculations</a>.

### Initial set up and read in ICRP-107 data into a DataFrame

First load the necessary Python modules.

In [1]:
import pandas as pd
import numpy as np
from scipy import sparse
import fortranformat as ff
import re
import requests, zipfile, io, shutil

Now we need to download and read in the data from the ICRP-07.NDX data file provided as a supplement to ICRP 107. First read a prepared CSV file listing all elements, their symbols and atomic numbers.

In [2]:
elements = pd.read_csv("element_list.csv", index_col="Symbol")[["Element","Z"]]
elements.head()

Symbol,Element,Z
H,Hydrogen,1
He,Helium,2
Li,Lithium,3
Be,Beryllium,4
B,Boron,5


Define functions to
* convert half-life in units of μs, ms, m, h, d, y into seconds;
* return atomic number and mass number from a radionuclide string.

In [3]:
year = 365.2422 # days in year used for conversion by ICRP 107 (see JAERI 1347 & JAEA-Data/Code 2007-021)
num_nuclides = 1252

def convert_half_life(halflife, unit):
    units = {"us":1.0E-6, "ms":1.0E-3, "s":1.0, "m":60.0, "h":60.0*60.0,
             "d":60.0*60.0*24.0, "y":60.0*60.0*24.0*year}
    return float(halflife)*units[unit]

def get_Z_A(radionuclide):
    [Z, A] = radionuclide.split("-")
    Z = elements.loc[Z, "Z"]
    if A[-1].isalpha():
        A = A[:-1]
    return Z, int(A)
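
As a quick check of the two helpers, the sketch below re-implements them against a small stand-in dictionary so that it runs without `element_list.csv`. The names `ELEMENT_Z`, `half_life_in_seconds` and `z_and_a` are hypothetical and exist only for this illustration.

```python
# Stand-in for the `elements` DataFrame: symbol -> atomic number (hypothetical subset).
ELEMENT_Z = {"Tc": 43, "Ac": 89}

YEAR_DAYS = 365.2422  # same days-per-year value as `year` in the cell above

SECONDS_PER_UNIT = {
    "us": 1.0e-6, "ms": 1.0e-3, "s": 1.0, "m": 60.0, "h": 3600.0,
    "d": 86400.0, "y": 86400.0 * YEAR_DAYS,
}

def half_life_in_seconds(value, unit):
    """Convert a half-life (string or number) with the given unit into seconds."""
    return float(value) * SECONDS_PER_UNIT[unit]

def z_and_a(radionuclide):
    """Split e.g. 'Tc-99m' into (Z, A), dropping a trailing isomer letter."""
    symbol, mass = radionuclide.split("-")
    if mass[-1].isalpha():
        mass = mass[:-1]
    return ELEMENT_Z[symbol], int(mass)

print(half_life_in_seconds("10", "m"))  # 600.0
print(z_and_a("Tc-99m"))                # (43, 99)
print(z_and_a("Ac-225"))                # (89, 225)
```

The isomer letter is discarded here because only Z and A are needed to infer the decay mode; the notebook stores the letter separately in the `Metastable_state` column.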

Prepare a pandas DataFrame for the ICRP 107 decay data.

In [4]:
icrp_col_names = ["Radionuclide", "Element", "Z", "A", "Metastable_state", "Half-life_s",
                  "Num_decay_modes", "Mode_1", "Fraction_1", "Progeny_1", "Mode_2",
                  "Fraction_2", "Progeny_2", "Mode_3", "Fraction_3", "Progeny_3",
                  "Mode_4", "Fraction_4", "Progeny_4"]
icrp = pd.DataFrame(columns=icrp_col_names)

Download ICRP 107 Supplemental Material and read data from ICRP-07.NDX file line by line into the DataFrame.

In [5]:
url = "https://journals.sagepub.com/doi/suppl/10.1177/ANIB_38_3/suppl_file/P107JAICRP_38_3_Nuclear_Decay_Data_suppl_data.zip"
r = requests.get(url)
r.raise_for_status()  # stop early if the download failed
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

file_NDX = open("P 107 JAICRP 38(3) Nuclear Decay Data for Dosimetric Calculations(supplementary data)/ICRP-07.NDX", "r")
file_NDX.readline()  # skip the first line of the file before reading the nuclide records

# fortran format of data in the ICRP 107 Index File (Table 1 footnote, ICRP107)
ffline = ff.FortranRecordReader("(a7,a8,a2,a8,3i7,i6,1x,3(a7,i6,e11.0,1x),a7,i6,e11.0,f7.0,2f8.0,3i4,i5,i4,e11.0,e10.0,e9.0)")
rows = []
for i in range(0, num_nuclides):
    line = ffline.read(file_NDX.readline())
    line = [i.strip() if isinstance(i,str) else i for i in line]
    add = {"Radionuclide": line[0]}
    add["Element"] = add["Radionuclide"].split("-")[0]
    add["Z"], add["A"] = get_Z_A(add["Radionuclide"])
    if add["Radionuclide"][-1].isalpha(): 
        add["Metastable_state"] = add["Radionuclide"][-1]
    add["Half-life_s"] = convert_half_life(line[1],line[2])
    
    # parse decay modes and progeny
    modes = re.findall(r"(A|B\-|ECB\+|EC|IT|SF)",line[3])
    j=0
    while j < 4 and line[8+j*3] != "":
        add["Fraction_" + str(j+1)] = line[8+2+j*3]
        add["Progeny_" + str(j+1)] = line[8+j*3]
        if add["Progeny_" + str(j+1)] == "SF": 
            add["Mode_" + str(j+1)] = "SF"
        else:
            Z, A = get_Z_A(add["Progeny_" + str(j+1)])
            if add["Z"] == Z and add["A"] == A: add["Mode_" + str(j+1)] = "IT"
            elif add["Z"] - 2 == Z and add["A"] - 4 == A: add["Mode_" + str(j+1)] = "α"
            elif add["Z"] + 1 == Z and add["A"] == A: add["Mode_" + str(j+1)] = "β-"
            elif add["Z"] - 1 == Z and add["A"] == A: 
                if "EC" in modes: add["Mode_" + str(j+1)] = "EC"
                else: add["Mode_" + str(j+1)] = "β+ & EC"
        j += 1
    add["Num_decay_modes"] = j
    
    rows.append(add)
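# DataFrame.append was removed in pandas 2.0, so build the DataFrame from the list of dicts
icrp = pd.DataFrame(rows, columns=icrp_col_names, dtype=object)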
file_NDX.close()

shutil.rmtree("P 107 JAICRP 38(3) Nuclear Decay Data for Dosimetric Calculations(supplementary data)")
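
The first four descriptors of the format string (`a7,a8,a2,a8`) describe fixed-width character fields. As a rough illustration of what `FortranRecordReader` does for those fields, the sketch below slices a record by column width in plain Python. The sample record is made up for this example (built to agree with the Ac-226 half-life of 105732 s that appears in the table below), not copied from ICRP-07.NDX, and `split_fixed_width` is a hypothetical helper.

```python
import re

def split_fixed_width(record, widths):
    """Cut `record` into consecutive fields of the given widths and strip padding."""
    fields, start = [], 0
    for width in widths:
        fields.append(record[start:start + width].strip())
        start += width
    return fields

# Hypothetical record: nuclide (a7), half-life (a8), unit (a2), decay modes (a8).
sample = f"{'Ac-226':<7}{'29.37':>8}{'h':<2}{'B-ECA':<8}"

nuclide, half_life, unit, mode_string = split_fixed_width(sample, [7, 8, 2, 8])

# Same regular expression as in the cell above. "ECB+" is listed before "EC"
# so that the longer alternative is tried first.
modes = re.findall(r"(A|B\-|ECB\+|EC|IT|SF)", mode_string)

print(nuclide, half_life, unit, modes)
```

Keeping the half-life as a string until the unit is known mirrors the notebook, where `convert_half_life` performs the conversion to seconds.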

Remove NaN values, set DataFrame index to the radionuclide string, and check completed DataFrame. Export completed DataFrame to CSV file for analysis elsewhere.

In [6]:
icrp = icrp.replace(np.nan, "", regex=True)
icrp.set_index("Radionuclide", inplace=True)
icrp.to_csv("icrp.csv", index=True)
icrp.head()

Radionuclide,Element,Z,A,Metastable_state,Half-life_s,Num_decay_modes,Mode_1,Fraction_1,Progeny_1,Mode_2,Fraction_2,Progeny_2,Mode_3,Fraction_3,Progeny_3,Mode_4,Fraction_4,Progeny_4
Ac-223,Ac,89,223,,126.0,1,α,0.99,Fr-219,,,,,,,,,
Ac-224,Ac,89,224,,10008.0,2,EC,0.909,Ra-224,α,0.091,Fr-220,,,,,,
Ac-225,Ac,89,225,,864000.0,1,α,1.0,Fr-221,,,,,,,,,
Ac-226,Ac,89,226,,105732.0,3,β-,0.83,Th-226,EC,0.17,Ra-226,α,6e-05,Fr-222,,,
Ac-227,Ac,89,227,,687057400.0,2,β-,0.9862,Th-227,α,0.0138,Fr-223,,,,,,


### Order ICRP DataFrame so progeny always come below their parent

The radionuclides need to be ordered so that the progeny (daughters) are always lower in the DataFrame than their parent. This is so the subsequent matrices that we create are lower triangular.

To achieve this we first count how many times each radioactive decay process occurs in the ICRP 107 dataset.

In [7]:
print("β+ or electron capture:", icrp.stack().value_counts()["β+ & EC"]
      + icrp.stack().value_counts()["EC"])
print("β-:", icrp.stack().value_counts()["β-"])
print("α:", icrp.stack().value_counts()["α"])
print("Isomeric Transition (IT):", icrp.stack().value_counts()["IT"])
print("Spontaneous Fission (SF):", (icrp.stack().value_counts()["SF"]/2).astype(np.int64))

β+ or electron capture: 684
β-: 539
α: 183
Isomeric Transition (IT): 178
Spontaneous Fission (SF): 28


The outcomes of these decay processes are as follows:
- β+ or electron capture (EC): $\mathrm{^{A}_{Z}X} \rightarrow \mathrm{^{A}_{Z-1}Y}$
- β- decay: $\mathrm{^{A}_{Z}X} \rightarrow \mathrm{^{A}_{Z+1}Y}$
- α decay: $\mathrm{^{A}_{Z}X} \rightarrow \mathrm{^{A-4}_{Z-2}Y}$
- IT decay: $\mathrm{^{Am}_{Z}X} \rightarrow \mathrm{^{A}_{Z}X}$ or $\mathrm{^{An}_{Z}X} \rightarrow \mathrm{^{A}_{Z}X}$
- SF decay: The ICRP-107 dataset does not contain data for the outcomes (progeny) from spontaneous fission decays
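
These rules are what allow a decay mode to be inferred from the parent and progeny alone. A minimal sketch, assuming each nuclide is given as a `(Z, A)` tuple; `classify_decay` is a hypothetical name, and the labels are simplified compared with the parsing cell above, which also consults the mode string to separate EC from β+ & EC.

```python
def classify_decay(parent, progeny):
    """Infer the decay mode from (Z, A) of parent and progeny; None if no rule applies."""
    (z_p, a_p), (z_d, a_d) = parent, progeny
    if (z_d, a_d) == (z_p, a_p):
        return "IT"          # same Z and A: isomeric transition
    if (z_d, a_d) == (z_p - 2, a_p - 4):
        return "α"
    if (z_d, a_d) == (z_p + 1, a_p):
        return "β-"
    if (z_d, a_d) == (z_p - 1, a_p):
        return "β+ or EC"
    return None

# Examples taken from the Ac rows of the table above.
print(classify_decay((89, 225), (87, 221)))  # Ac-225 -> Fr-221
print(classify_decay((89, 227), (90, 227)))  # Ac-227 -> Th-227
print(classify_decay((89, 224), (88, 224)))  # Ac-224 -> Ra-224
```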

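We order by decreasing mass number (A), then by decreasing atomic number (Z), and finally by decreasing isomer index (n, m, ground state). Decreasing Z is chosen because β+ and EC decays, which lower Z by one, outnumber β- decays, so this choice places more progeny below their parents from the start.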

In [8]:
icrp.sort_values(by=["A", "Z", "Metastable_state"], inplace=True, ascending=[False, False, False])
icrp.head()

Radionuclide,Element,Z,A,Metastable_state,Half-life_s,Num_decay_modes,Mode_1,Fraction_1,Progeny_1,Mode_2,Fraction_2,Progeny_2,Mode_3,Fraction_3,Progeny_3,Mode_4,Fraction_4,Progeny_4
Fm-257,Fm,100,257,,8683200.0,2,α,0.9979,Cf-253,SF,0.0021,SF,,,,,,
Fm-256,Fm,100,256,,9456.0,2,α,0.081,Cf-252,SF,0.919,SF,,,,,,
Es-256,Es,99,256,,1524.0,1,β-,1.0,Fm-256,,,,,,,,,
Fm-255,Fm,100,255,,72252.0,2,α,1.0,Cf-251,SF,2.3e-07,SF,,,,,,
Es-255,Es,99,255,,3438720.0,3,β-,0.92,Fm-255,α,0.08,Bk-251,SF,4.5e-05,SF,,,


Now it is necessary to correct the positions of the remaining radionuclides that are still incorrectly ordered. This is achieved by looping over all the radionuclides in the table, and checking if their progeny are located lower in the table or not. If not, the parent and progeny row positions are switched. This takes a few passes until all progeny are correctly located below their parents.

In [9]:
nuclide_list = list(icrp.index)
swapping = 1
while swapping >= 1:
    swaps = 0
    for parent in nuclide_list:
        for i in range(0, icrp.at[parent, "Num_decay_modes"]):
            if (icrp.at[parent, "Mode_" + str(i+1)] in ["stable", "SF"]): continue
            progeny = icrp.at[parent, "Progeny_" + str(i+1)]
            if (progeny not in nuclide_list): continue
            j = nuclide_list.index(parent)
            k = nuclide_list.index(progeny)
            if  j > k:
                nuclide_list[j], nuclide_list[k] = nuclide_list[k], nuclide_list[j]
                icrp = icrp.reindex(index=nuclide_list)
                
                swaps +=1
    print("Iteration", swapping, "number of swaps:", swaps)
    swapping += 1
    if swaps == 0: swapping = 0

Iteration 1 number of swaps: 265
Iteration 2 number of swaps: 81
Iteration 3 number of swaps: 22
Iteration 4 number of swaps: 4
Iteration 5 number of swaps: 0
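
To see why several passes can be needed, the sketch below applies the same swap-until-stable idea to a three-member chain (Bi-210 → Po-210 → Pb-206) that starts in the worst possible order. It is a simplified stand-in that works on plain lists and a dictionary rather than the DataFrame; `order_parents_first` and `progeny_of` are hypothetical names.

```python
def order_parents_first(order, progeny_of):
    """Swap parent/progeny pairs until every parent precedes all of its progeny."""
    order = list(order)
    passes = 0
    while True:
        swaps = 0
        for parent in list(order):          # iterate over a snapshot of the order
            for child in progeny_of.get(parent, []):
                if child not in order:
                    continue
                j, k = order.index(parent), order.index(child)
                if j > k:                   # parent sits below its progeny: swap
                    order[j], order[k] = order[k], order[j]
                    swaps += 1
        passes += 1
        if swaps == 0:
            return order, passes

progeny_of = {"Bi-210": ["Po-210"], "Po-210": ["Pb-206"]}
ordered, passes = order_parents_first(["Pb-206", "Po-210", "Bi-210"], progeny_of)
print(ordered, passes)
```

A swap that fixes one parent/progeny pair can break another, which is why the loop repeats until a full pass produces no swaps.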


The sorted DataFrame looks like this. Note this is just one of many possible solutions for sorting the DataFrame.

In [10]:
icrp.head()

Radionuclide,Element,Z,A,Metastable_state,Half-life_s,Num_decay_modes,Mode_1,Fraction_1,Progeny_1,Mode_2,Fraction_2,Progeny_2,Mode_3,Fraction_3,Progeny_3,Mode_4,Fraction_4,Progeny_4
Fm-257,Fm,100,257,,8683200.0,2,α,0.9979,Cf-253,SF,0.0021,SF,,,,,,
Es-256,Es,99,256,,1524.0,1,β-,1.0,Fm-256,,,,,,,,,
Fm-256,Fm,100,256,,9456.0,2,α,0.081,Cf-252,SF,0.919,SF,,,,,,
Cf-255,Cf,98,255,,5100.0,1,β-,1.0,Es-255,,,,,,,,,
Es-255,Es,99,255,,3438720.0,3,β-,0.92,Fm-255,α,0.08,Bk-251,SF,4.5e-05,SF,,,


### Make the *&Lambda;* matrix

Now we make the sparse lower triangular matrix *&Lambda;*, which captures the decay pathways and branching relations between the radionuclides. _&Lambda;_ is set up based on Eq. (6) in Amaku et al. (2010). The diagonal elements are all *-&lambda;<sub>jj</sub>*, i.e. the negative of the decay constant of each radionuclide. The off-diagonal elements are all of the form *BF<sub>ij</sub>&times;&lambda;<sub>jj</sub>* for *i* > *j*, where *BF<sub>ij</sub>* is the branching fraction from parent *j* to progeny *i*. The non-zero elements beneath the *jj*<sup>th</sup> element in each column therefore correspond to the direct (first-generation) progeny of radionuclide *j*.

In [11]:
# np.int and np.float were removed in NumPy 1.24; use explicit dtypes instead
rows = np.array([], dtype=np.int64)
cols = np.array([], dtype=np.int64)
data = np.array([], dtype=np.float64)
lambdas = []
ln2 = np.log(2)

for parent in nuclide_list:
    j = nuclide_list.index(parent)
    rows = np.append(rows, [j])
    cols = np.append(cols, [j])
    lambd = ln2/icrp.at[parent, "Half-life_s"]
    data = np.append(data, -lambd)
    lambdas = np.append(lambdas, lambd)
    for d in range(0, icrp.at[parent, "Num_decay_modes"]):
        if (icrp.at[parent, "Mode_" + str(d+1)] in ["stable", "SF"]): continue
        progeny = icrp.at[parent, "Progeny_" + str(d+1)]
        if (progeny not in nuclide_list): continue
        i = nuclide_list.index(progeny)
        rows = np.append(rows, [i])
        cols = np.append(cols, [j])
        data = np.append(data, [lambd*icrp.at[parent, "Fraction_" + str(d+1)]])

lambda_mat = sparse.csc_matrix((data, (rows, cols)))
print(lambda_mat)

  (0, 0)	-7.982623693568561e-08
  (11, 0)	7.965860183812067e-08
  (1, 1)	-0.00045482098461938666
  (2, 1)	0.00045482098461938666
  (2, 2)	-7.330236681048491e-05
  (14, 2)	5.937491711649278e-06
  (3, 3)	-0.00013591121187449907
  (4, 3)	0.00013591121187449907
  (4, 4)	-2.0157127668433178e-07
  (5, 4)	1.8544557454958525e-07
  (18, 4)	1.6125702134746543e-08
  (5, 5)	-9.593467039804369e-06
  (19, 5)	9.593467039804369e-06
  (6, 6)	-4.899259121854292e-06
  (8, 6)	4.801273939417206e-06
  (9, 6)	3.723436932609262e-09
  (23, 6)	1.5677629189933735e-08
  (7, 7)	-2.9098791483628596e-08
  (8, 7)	5.063189718151376e-14
  (23, 7)	2.9098791483628596e-08
  (8, 8)	-5.942619860767707e-05
  (24, 8)	5.939113715049854e-05
  (9, 9)	-1.326039142485356e-07
  (22, 9)	4.110721341704604e-10
  (10, 10)	-2.6741789373454678e-06
  :	:
  (1229, 1229)	-0.004624680948491762
  (1230, 1230)	-0.0017610446660567716
  (1231, 1231)	-9.205875375992048e-06
  (1232, 1231)	9.205875375992048e-06
  (1232, 1232)	-0.005154123766098162
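
As a small worked example of the layout, the sketch below builds the non-zero entries of *&Lambda;* for just two radionuclides, using the Es-256 and Fm-256 half-lives and branching fractions from the table rows shown earlier. It stores the entries in a plain dictionary instead of a SciPy sparse matrix, which is a simplification made for this illustration.

```python
import math

# Half-lives (s) and branches taken from the Es-256 and Fm-256 rows shown earlier.
half_life = {"Es-256": 1524.0, "Fm-256": 9456.0}
branches = {
    "Es-256": [("Fm-256", 1.0)],     # β-
    "Fm-256": [("Cf-252", 0.081)],   # α; Cf-252 is left out of this toy list
}
order = ["Es-256", "Fm-256"]         # parents listed before their progeny

entries = {}                         # (row, column) -> value
for j, parent in enumerate(order):
    decay_const = math.log(2) / half_life[parent]
    entries[(j, j)] = -decay_const   # diagonal: negative decay constant
    for progeny, fraction in branches[parent]:
        if progeny not in order:     # progeny outside the list are skipped
            continue
        i = order.index(progeny)
        entries[(i, j)] = fraction * decay_const

for key in sorted(entries):
    print(key, entries[key])
```

The two diagonal values agree, to the displayed precision, with the `(1, 1)` and `(2, 2)` entries in the output above, where Es-256 and Fm-256 occupy rows 1 and 2.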


### Calculate the matrices *C* and *C<sup>-1</sup>*

We now need to make the sparse matrices *C* and *C<sup>-1</sup>*, which are given by Eqs. (10) and (13) in Amaku et al. (2010), respectively. The diagonal elements of both matrices are 1. *C* and *C<sup>-1</sup>* differ from *&Lambda;* in that there are non-zero elements beneath the *jj*<sup>th</sup> element in each column for all progeny of *j*, i.e. everything in its full decay chain, not just the immediate daughters.

Therefore we have to find all the progeny in the decay chain of each radionuclide. We do this by looping backwards over each column in *&Lambda;* to build up lists of the radionuclides in the decay chain of each parent. We then set up the basic structure (i.e. define the non-zero elements) of sparse matrices *C* and *C<sup>-1</sup>*.

In [12]:
rows_dict = {}
for i in range(num_nuclides-1, -1, -1):
    a,_ = lambda_mat[:,i].nonzero()
    b = a
    for j in a: 
        if j > i: 
            b = np.unique(np.concatenate((b,rows_dict[j])))
    rows_dict[i] = b

# np.int was removed in NumPy 1.24; use an explicit integer dtype
rows_C = np.array([], dtype=np.int64)
cols_C = np.array([], dtype=np.int64)
for i in range(0, num_nuclides):
    rows_C = np.concatenate((rows_C,rows_dict[i]))
    cols_C = np.concatenate((cols_C,np.array([i]*len(rows_dict[i]))))

C = sparse.csc_matrix((np.array([0.0]*rows_C.size, dtype=np.float64), (rows_C, cols_C)))
inv_C = sparse.csc_matrix((np.array([0.0]*rows_C.size, dtype=np.float64), (rows_C, cols_C)))
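
The backwards loop works because every progeny has a larger index than its parent, so the full chain of each progeny is already known by the time its parent is processed. A minimal sketch of that idea with plain Python sets, assuming the direct progeny are given as lists of row indices (`full_chains` and `direct_progeny` are hypothetical names):

```python
def full_chains(direct_progeny):
    """Return, for each nuclide index, the sorted indices of itself and all descendants.

    direct_progeny[i] lists the indices of the immediate progeny of nuclide i.
    Assumes every progeny index is larger than the index of its parent.
    """
    n = len(direct_progeny)
    chains = {}
    for i in range(n - 1, -1, -1):       # last nuclide first
        members = {i}
        for j in direct_progeny[i]:
            members |= chains[j]         # chains[j] already contains j itself
        chains[i] = members
    return {i: sorted(chains[i]) for i in range(n)}

# Toy network: 0 -> 1, 1 -> 2 and 3, 3 -> 4.
chains = full_chains([[1], [2, 3], [], [4], []])
print(chains)
```

Each list plays the role of one `rows_dict` entry: it names the rows that may be non-zero in the corresponding column of *C* and *C<sup>-1</sup>*.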

Now calculate *C* and *C<sup>-1</sup>*. Note that only the non-zero elements of *C<sub>kj</sub>* and *C<sup>-1</sup><sub>kj</sub>* need to be considered for the sums in Eqs. (10) and (13) of Amaku et al. (2010).
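
For reference, the recurrences implemented by the next two cells can be written as follows. They are transcribed from the code in this notebook rather than quoted from the paper, so the notation may differ from that of Amaku et al. (2010). Both matrices have ones on the diagonal, and for *i* > *j*:

$$C_{ij} = \frac{\sum_{k=j}^{i-1} \Lambda_{ik}\, C_{kj}}{\Lambda_{jj} - \Lambda_{ii}}, \qquad C_{ii} = 1$$

$$C^{-1}_{ij} = -\sum_{k=j}^{i-1} C_{ik}\, C^{-1}_{kj}, \qquad C^{-1}_{ii} = 1$$

The first expression divides by the difference of two diagonal elements of *&Lambda;*, so it requires the decay constants of a parent and any of its progeny to be distinct.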

In [13]:
for index in range(0, rows_C.size):
    i = rows_C[index]
    j = cols_C[index]
    if i == j: C[i,i] = 1.0
    else:
        sigma = 0.0
        for k in rows_dict[j]:
            if k == i: break
            sigma += lambda_mat[i,k]*C[k,j]
        C[i,j] = sigma/(lambda_mat[j,j]-lambda_mat[i,i])
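        # report (progeny, parent) pairs whose decay constants differ by less than 1%,
        # since the small denominator above may lose floating-point precision
        if abs((lambda_mat[j,j]-lambda_mat[i,i])/lambda_mat[j,j]) < 1E-2:
            print(nuclide_list[i], nuclide_list[j])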

print(C)

Tb-149m Dy-149
Tc-94m Ru-94
  (0, 0)	1.0
  (11, 0)	0.2149304510823558
  (12, 0)	0.3092549887757277
  (26, 0)	1.6673333823311864e-06
  (27, 0)	-2.1886219841968786
  (28, 0)	0.6670578044856869
  (44, 0)	-8.221228332117224e-09
  (45, 0)	-0.0005132711188135064
  (63, 0)	1.6940313550395857e-08
  (64, 0)	-3.25029328435887e-10
  (84, 0)	5.729981712733799e-16
  (85, 0)	1.9839845345547896e-13
  (107, 0)	9.337894514086377e-21
  (108, 0)	-3.4800312259121484e-20
  (125, 0)	6.015080863868941e-26
  (140, 0)	3.925030408278183e-31
  (141, 0)	2.925325789105694e-31
  (155, 0)	9.954570634613105e-35
  (168, 0)	1.0936484105239762e-38
  (180, 0)	9.263620642376127e-34
  (181, 0)	1.3926316472183365e-42
  (204, 0)	9.177379854562548e-37
  (205, 0)	3.971300363297629e-33
  (1, 1)	1.0
  (2, 1)	-1.1921331316187593
  :	:
  (1229, 1229)	1.0
  (1230, 1230)	1.0
  (1231, 1231)	1.0
  (1232, 1231)	0.0017893143431099263
  (1232, 1232)	1.0
  (1233, 1233)	1.0
  (1234, 1234)	1.0
  (1235, 1235)	1.0
  (1236, 1235)	-1.0037800841

In [14]:
for index in range(0, rows_C.size):
    i = rows_C[index]
    j = cols_C[index]
    if i == j: inv_C[i,i] = 1.0
    else:
        sigma = 0.0
        for k in rows_dict[j]:
            if k == i: break
            sigma -= C[i,k]*inv_C[k,j]
        inv_C[i,j] = sigma 

print(inv_C)

  (0, 0)	1.0
  (11, 0)	-0.2149304510823558
  (12, 0)	-1.9581224669792963
  (26, 0)	3.4400701404568286e-09
  (27, 0)	1.616743290326295
  (28, 0)	1.0015587901034935
  (44, 0)	-3.535069754207345e-17
  (45, 0)	1.0410539561107375
  (63, 0)	7.881397760887759e-05
  (64, 0)	-0.2951840014198052
  (84, 0)	-1.1638463530363777e-21
  (85, 0)	1.002264400186415
  (107, 0)	-2.6469779601696886e-23
  (108, 0)	-0.08498326646636585
  (125, 0)	-0.001172501058839104
  (140, 0)	-4.1359030627651384e-24
  (141, 0)	2.650354207567864e-24
  (155, 0)	1.5311182909296246e-28
  (168, 0)	7.642026357090839e-32
  (180, 0)	5.6100437910958565e-27
  (181, 0)	-7.653252456969543e-36
  (204, 0)	-1.100276461233485e-29
  (205, 0)	4.7696652034816774e-26
  (1, 1)	1.0
  (2, 1)	1.1921331316187593
  :	:
  (1229, 1229)	1.0
  (1230, 1230)	1.0
  (1231, 1231)	1.0
  (1232, 1231)	-0.0017893143431099263
  (1232, 1232)	1.0
  (1233, 1233)	1.0
  (1234, 1234)	1.0
  (1235, 1235)	1.0
  (1236, 1235)	1.0037800841012794
  (1236, 1236)	1.0
  (1237, 
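
To check the two loops on something small enough to inspect by hand, the sketch below applies the same recurrences to a three-member chain 0 → 1 → 2 with made-up decay constants and branching fractions of 1. It uses dense nested lists instead of sparse matrices, which is a simplification for illustration only; `build_C` and `build_inv_C` are hypothetical names.

```python
def build_C(lam):
    """Column by column: C[i][j] = sum_k lam[i][k]*C[k][j] / (lam[j][j] - lam[i][i])."""
    n = len(lam)
    C = [[0.0] * n for _ in range(n)]
    for j in range(n):
        C[j][j] = 1.0
        for i in range(j + 1, n):
            sigma = sum(lam[i][k] * C[k][j] for k in range(j, i))
            C[i][j] = sigma / (lam[j][j] - lam[i][i])
    return C

def build_inv_C(C):
    """Column by column: inv[i][j] = -sum_k C[i][k]*inv[k][j] for k = j .. i-1."""
    n = len(C)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0
        for i in range(j + 1, n):
            inv[i][j] = -sum(C[i][k] * inv[k][j] for k in range(j, i))
    return inv

# Made-up decay constants (1/s) for a chain 0 -> 1 -> 2, branching fraction 1 each.
lam = [[-0.5, 0.0, 0.0],
       [0.5, -0.2, 0.0],
       [0.0, 0.2, -0.1]]

C_toy = build_C(lam)
inv_toy = build_inv_C(C_toy)

# The product of the two matrices should be the identity matrix.
product = [[sum(C_toy[i][k] * inv_toy[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
for row in product:
    print(row)
```

Multiplying the two results together is a cheap sanity check that could also be applied to the full sparse matrices before saving them.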

### Save the outputs

Save *C* and *C<sup>-1</sup>* in SciPy sparse NPZ format (`c.npz` and `cinverse.npz`). Then write `decay_data.npz`, which contains NumPy arrays with the radionuclide names and the decay constants (s<sup>-1</sup>), one dictionary per radionuclide mapping each progeny to its branching fraction and decay mode, and the number of days per year used for unit conversion.

In [15]:
sparse.save_npz("./c.npz", C)
sparse.save_npz("./cinverse.npz", inv_C)

prog_bfs_modes = np.array([{}]*len(nuclide_list))
for i in range(0, len(nuclide_list)):
    bfs = {}
    modes = {}
    for d in range(0, icrp.at[nuclide_list[i], "Num_decay_modes"]):
        progeny = icrp.at[nuclide_list[i], "Progeny_" + str(d+1)]
        bfs[progeny] = icrp.at[nuclide_list[i], "Fraction_" + str(d+1)]
        modes[progeny] = icrp.at[nuclide_list[i], "Mode_" + str(d+1)]
    bfs = {key: value for key, value in sorted(bfs.items(), key=lambda x: x[1], reverse=True)}
    
    prog_bfs_modes[i] = {progeny: [bf, modes[progeny]] for progeny, bf in bfs.items()}

np.savez_compressed("./decay_data.npz", radionuclides=np.array(nuclide_list),
                    decay_consts=lambdas, prog_bfs_modes=prog_bfs_modes,
                    year_conv=year)
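
The notebook stops at writing the files and does not show how the matrices are used. As a hedged illustration, and assuming the relation *N(t) = C e<sup>&Lambda;<sub>d</sub>t</sup> C<sup>-1</sup> N(0)* from Amaku et al. (2010), where *&Lambda;<sub>d</sub>* is the diagonal part of *&Lambda;*, the sketch below evaluates a two-member chain with made-up decay constants and compares the result with the textbook Bateman solution. All names and numbers here are assumptions for the example.

```python
import math

lam_parent, lam_progeny = 0.30, 0.05   # made-up decay constants (1/s)
t = 4.0                                # decay time (s)
n0 = [1000.0, 0.0]                     # initial numbers of parent and progeny atoms

# For a two-member chain with branching fraction 1 the recurrences reduce to:
c10 = lam_parent / (lam_progeny - lam_parent)
C = [[1.0, 0.0], [c10, 1.0]]
inv_C = [[1.0, 0.0], [-c10, 1.0]]
decay = [math.exp(-lam_parent * t), math.exp(-lam_progeny * t)]

def matvec(matrix, vector):
    """Multiply a dense matrix (nested lists) by a vector."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]

# N(t) = C * exp(Lambda_d * t) * C^-1 * N(0), applied from right to left.
step = matvec(inv_C, n0)
step = [decay[i] * step[i] for i in range(2)]
n_t = matvec(C, step)

# Bateman solution for the progeny of a two-member chain, for comparison.
bateman = (n0[0] * lam_parent / (lam_progeny - lam_parent)
           * (math.exp(-lam_parent * t) - math.exp(-lam_progeny * t)))

print(n_t, bateman)
```

Because the exponential acts only on a diagonal matrix, the time dependence costs one `exp` per radionuclide, which is the reason for pre-computing *C* and *C<sup>-1</sup>* once.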