# Estimating molecular volumes to aid in powder X-ray diffraction indexing
> An overview of using database-derived atomic volumes to aid PXRD indexing.

- toc: true
- badges: false
- comments: true
- categories: [PXRD, Indexing]
- author: Mark Spillman
- image: images/Volumes.png

# Introduction

An [article](http://scripts.iucr.org/cgi-bin/paper?S0108768101021814) published in 2001 by D. W. M. Hofmann describes how crystallographic databases can be used to derive the average volume occupied by atoms of each element in crystal structures. Using his tabulated values, it's possible to rapidly estimate the volume occupied by a given molecule, and to use this to aid indexing of powder diffraction data. This is particularly useful for laboratory diffraction data, which is generally associated with lower figures of merit such as de Wolff's $M_{20}$ and Smith and Snyder's $F_N$, making it more challenging to discriminate between alternative indexing solutions. Other volume estimation methods, notably the 18 Å³ rule, are also commonly used, though Hofmann's volumes generally give more accurate results.

I've put together a freely available web-app, *HofCalc*, which should display reasonably well on mobile devices as well as PCs/laptops. You can access it at the following address:

[https://hofcalc.herokuapp.com](https://hofcalc.herokuapp.com)

![](images/HofCalc.png)

This post will explain how it works, and will look at some examples of how it can be used in practice. I'm grateful to Norman Shankland who provided invaluable feedback and assistance with debugging of the app.

# Hofmann volumes

After applying various filters to crystal structures deposited in the CSD, Hofmann ended up with a dataset comprising 182239 structures. He considered only the elements up to atomic number 100 (fermium), and assumed that the volume of the unit cell is given by:

$$V_{est} = \sum\limits_{i=1}^{100} n_i\bar{v_i}(1+\bar{\alpha}T) = \mathbf{n}\cdot\bar{\mathbf{v}}\,(1+\bar{\alpha}T)$$

Where $n_i$ is the number of atoms of element $i$ in the unit cell, and $\bar{v_i}$ is the average volume occupied by an atom of element $i$. This equation also assumes that the atomic volume varies linearly with temperature.

He split the dataset into 20 subsets, then applied linear regression to solve the above equation for each subset. This allowed him to find the average volume occupied by atoms of each element and, because the data were split into subsets, also their standard deviations. His use of a temperature parameter allowed him to provide volumes at 298 K for all of the elements represented in the CSD.
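
To make the fitting procedure more concrete, here is a minimal sketch of the idea using synthetic data rather than CSD structures. The "true" volumes, the noise level and the use of plain least squares without an intercept are all assumptions made for illustration, and the temperature term is left out (as if every structure had been measured at the same temperature). It is not a reproduction of Hofmann's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" atomic volumes (Å^3) for C, H, N, O, used only to build toy data
true_v = np.array([13.87, 5.08, 11.8, 11.39])

# Toy dataset: atom counts per unit cell for 2000 made-up structures
n_structures, n_subsets = 2000, 20
counts = rng.integers(1, 40, size=(n_structures, true_v.size)).astype(float)

# Simulated cell volumes: n . v with 2 % multiplicative noise
cell_volumes = counts @ true_v * rng.normal(1.0, 0.02, size=n_structures)

# Fit the atomic volumes separately on each of the 20 random subsets
fits = []
for idx in np.array_split(rng.permutation(n_structures), n_subsets):
    v_fit, *_ = np.linalg.lstsq(counts[idx], cell_volumes[idx], rcond=None)
    fits.append(v_fit)
fits = np.array(fits)

# The spread between subsets gives a standard deviation for each element
mean_v = fits.mean(axis=0)
std_v = fits.std(axis=0, ddof=1)
for symbol, m, s in zip("CHNO", mean_v, std_v):
    print(f"{symbol}: {m:.2f} ± {s:.2f} Å^3")
```

The fitted means should land close to the assumed input volumes, with the subset-to-subset scatter playing the role of the standard deviations reported in the paper.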

You can download a `.json` file containing his volumes [here](https://github.com/mspillman/blog/blob/master/_notebooks/files/Hofmann-volumes.json).

The coefficient of thermal expansion, $\bar{\alpha}$, was found to be $0.95 \times 10^{-4}\ \mathrm{K}^{-1}$.

# Comparison to other atomic volumes

Let's compare Hofmann's volumes to those obtained from other sources. To do that, I downloaded the [atomic radii data](https://en.wikipedia.org/wiki/Atomic_radii_of_the_elements_(data_page)) from Wikipedia and saved it as an Excel spreadsheet, which you can download [here](https://github.com/mspillman/blog/blob/master/_notebook/files/wikipedia_radii.xlsx). We'll convert these radii into volumes, treating each atom as a sphere, and plot them alongside Hofmann's volumes.

The chart below will let you highlight the different types of volumes.

```python
#collapse-hide
import json
import pandas as pd
import numpy as np
import altair as alt

# The with-block closes the file on exit, so no explicit close() is needed
with open("files/Hofmann-volumes.json") as hv:
    hofmann_volumes = json.load(hv)

# Atomic numbers are taken from the key order, which is assumed to run
# from hydrogen upwards
vols = []
for i, key in enumerate(hofmann_volumes.keys()):
    vols.append([i+1, key, hofmann_volumes[key]])

df = pd.DataFrame(vols, columns=["Atomic number", "Element", "Hofmann"])
df.replace("N/A", np.nan, inplace=True)

wikiradii = pd.read_excel("files/wikipedia_radii.xlsx")
wikiradii.replace("", np.nan, inplace=True)

radtype = ["Empirical","Calculated","vdW","Covalent-single","Covalent-triple",
        "Metallic"]
for r in radtype:
    # Radii are in pm so /100 to convert to angstroms, then V = (4/3)*pi*r^3
    df[r] = (4*np.pi/3)*(wikiradii[r].values.astype(float)/100)**3


# Convert our dataframe to long-form as this is what is expected by altair.
# Keeping "Element" as an id column carries the symbols into the long frame.
dflong = df.melt(id_vars=["Atomic number", "Element"], var_name="Volume",
                value_vars=["Hofmann"] + radtype)

# Select tool modifies opacity of plotted points
# (selection_single and add_selection belong to the Altair 4 API)
selection = alt.selection_single(
    name='Select type of',
    fields=['Volume'],
    init={'Volume': "Hofmann"},
    bind={'Volume': alt.binding_select(options=["Hofmann"] + radtype)}
)

# scatter plot, modify opacity based on selection;
# the x axis is sorted by listing the element symbols in atomic-number order
alt.Chart(dflong).mark_point().add_selection(
    selection
).encode(
    x=alt.X('Element:N', sort=df["Element"].tolist()),
    y=alt.Y("value:Q", axis=alt.Axis(title='Volume / Å³')),
    tooltip=['Element', 'Volume:N', 'value'],
    opacity=alt.condition(selection, alt.value(1.0), alt.value(.1)),
    color="Volume:N"
).properties(width=850, height=500).configure_axis(
    grid=False
).configure_view(
    strokeWidth=0
)
```


As can be seen, for most of the elements, Hofmann's CSD-derived values are fairly different to the other sources.

# HofCalc - using the web app

The web app is available at [http://hofcalc.herokuapp.com](http://hofcalc.herokuapp.com).

HofCalc makes use of two key Python libraries to [process chemical formulae](https://github.com/xnx/pyvalem) and [resolve chemical names](https://github.com/mcs07/PubChemPy) prior to processing. This provides a really convenient interface for users who, as you'll see below, can easily mix and match between these different formats in order to obtain the information they need.

## Formulae and names

### Basic use
The simplest option is to enter the chemical formula or name of the material of interest. Names are resolved by querying [PubChem](https://pubchem.ncbi.nlm.nih.gov/), so common abbreviations for solvents, e.g. DMF, can often be used.
Note that formulae can be prefixed with a multiple, e.g. 2H2O.
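
As a rough illustration of what happens to a formula once it has been entered, the sketch below counts the atoms in a simple formula and sums the corresponding volumes. The regex-based `count_atoms` helper is a hypothetical stand-in written for this post: HofCalc itself relies on pyvalem, which handles far more cases (brackets, charges and so on). The four volumes are a small subset chosen because they are consistent with the ethanol and water figures quoted in this post; check them against the `.json` file before relying on them.

```python
import re

# Volumes in Å^3 at 298 K for four elements (illustrative subset,
# consistent with the ethanol and water values quoted in this post)
HOFMANN_V = {"H": 5.08, "C": 13.87, "N": 11.8, "O": 11.39}


def count_atoms(formula):
    """Count atoms in a simple formula such as 'CH3CH2OH' or '2H2O'.

    Hypothetical helper: handles an optional leading multiple, but not
    brackets, charges or isotopes.
    """
    multiple, body = re.fullmatch(r"(\d+(?:\.\d+)?)?(.*)", formula.strip()).groups()
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", body)
    # Reject anything the simple element/count pattern does not fully cover
    if not tokens or "".join(sym + num for sym, num in tokens) != body:
        raise ValueError(f"Cannot parse formula: {formula!r}")
    counts = {}
    for symbol, number in tokens:
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    factor = float(multiple) if multiple else 1
    return {symbol: n * factor for symbol, n in counts.items()}


def hofmann_volume(formula):
    """Sum n_i * v_i over the elements in the formula (298 K)."""
    return sum(HOFMANN_V[el] * n for el, n in count_atoms(formula).items())


for term in ["CH3CH2OH", "H2O", "2H2O"]:
    print(term, round(hofmann_volume(term), 2))
```

With these values the three formulae evaluate to 69.61, 21.55 and 43.1 Å³ respectively.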

| Search term |   Type  | $V_{Hofmann}$ |
|:-----------:|:-------:|:-------------:|
| ethanol     | name    | 69.61         |
| CH3CH2OH    | formula | 69.61         |
| water       | name    | 21.55         |
| 2H2O        | formula | 43.10         |


### Multiple search terms

It is also possible to search for multiple items simultaneously, and to mix and match names and formulae, by separating individual components with a semicolon. This means that, for example, 'amodiaquine dihydrochloride dihydrate' can also be entered as 'amodiaquine; 2HCl; 2H2O'.
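
The bookkeeping behind a multi-component query can be sketched as follows. This is an illustration only, not HofCalc's code: the compositions are typed in by hand (in the app, names would be resolved via PubChem), `combine` is a hypothetical helper, and the chlorine volume of 25.8 Å³ is included on the assumption that it matches Hofmann's table, since it is consistent with the totals quoted in this post.

```python
from collections import Counter

# Volumes in Å^3 at 298 K (illustrative subset; verify against the .json file)
HOFMANN_V = {"H": 5.08, "C": 13.87, "N": 11.8, "O": 11.39, "Cl": 25.8}


def combine(components):
    """Add up atom counts for a list of (multiple, composition) pairs."""
    total = Counter()
    for multiple, atoms in components:
        for element, n in atoms.items():
            total[element] += multiple * n
    return dict(total)


# Hand-entered compositions standing in for resolved search terms
amodiaquine = {"C": 20, "H": 22, "Cl": 1, "N": 3, "O": 1}
hcl = {"H": 1, "Cl": 1}
water = {"H": 2, "O": 1}

# Equivalent to the query 'amodiaquine; 2HCl; 2H2O'
salt_hydrate = combine([(1, amodiaquine), (2, hcl), (2, water)])
volume = sum(HOFMANN_V[el] * n for el, n in salt_hydrate.items())
print(salt_hydrate, round(volume, 2))

# Fractional multiples work the same way, e.g. 'codeine; phosphoric acid; 0.5H2O'
codeine = {"C": 18, "H": 21, "N": 1, "O": 3}
phosphoric_acid = {"H": 3, "O": 4, "P": 1}
hemihydrate = combine([(1, codeine), (1, phosphoric_acid), (0.5, water)])
print(hemihydrate)
```

The first query sums to C20 H28 Cl3 N3 O3 with a volume of 566.61 Å³, and the second to C18 H25 N1 O7.5 P1, the composition used for the hemihydrate examples later in this post.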

|              Search term              | Total $V_{Hofmann}$ |
|:-------------------------------------:|:-------------------:|
| carbamazepine; L-glutamic acid        | 497.98              |
| zopiclone; 2H2O                       | 496.02              |
| C15H12N2O; CH3CH2COO-; Na+            | 419.79              |
| sodium salicylate; water              | 204.21              |
| amodiaquine dihydrochloride dihydrate | 566.61              |
| amodiaquine; 2HCl; 2H2O               | 566.61              |


### More complex examples - hemihydrates

In cases where fractional multiples of search components are required, such as with hemihydrates, care should be taken to check the evaluated chemical formula for consistency with the expected formula.

| Search term | Evaluated as | $V_{Hofmann}$ | Divide by | Expected Volume |
|:-----------------------------------------------------------:|:--------------------:|:-----------------:|:---------:|:---------------:|
| Calcium sulfate hemihydrate                                 | Ca2 H2 O9 S2         | 253.07            | 2         | 126.53          |
| calcium; calcium; sulfate; sulfate; water                   | Ca2 H2 O9 S2         | 253.07            | 2         | 126.53          |
| calcium; sulfate; 0.5H2O                                    | Ca1 H1.0 O4.5 S1     | 126.53            | -         | 126.53          |
| Codeine phosphate hemihydrate                               | C36 H50 N2 O15 P2    | 1006.77           | 2         | 503.38          |
| codeine; codeine; phosphoric acid; phosphoric acid; water   | C36 H50 N2 O15 P2    | 1006.77           | 2         | 503.38          |
| codeine; phosphoric acid; 0.5H2O                            | C18 H25.0 N1 O7.5 P1 | 503.38            | -         | 503.38          |

### Charged species in formulae

Charges could potentially interfere with the parsing of chemical formulae. For example, two ways of representing an oxide ion are evaluated differently:

| Search term | Evaluated as |
|:-----------:|:------------:|
| O-2         | 1 x O        |
| O2-         | 2 x O        |

Whilst it is recommended that charges be omitted from HofCalc queries, if you do include charges, check that the correct number of atoms has been determined in the displayed atom counts or the downloadable summary file. For more information on formatting formulae, see the pyvalem documentation (link in references).


## Temperature

The temperature, $T$ (in kelvin), is automatically included in the volume calculation via the following equation:

$$V = \sum{n_{i}v_{i}}(1 +  \alpha(T - 298))$$

Where $n_{i}$ and $v_{i}$ are the number and Hofmann volume (at 298 K) of the $i$th element in the chemical formula, and $\alpha = 0.95 \times 10^{-4}\ \mathrm{K}^{-1}$.
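
As a small worked example of this correction, the sketch below evaluates the equation for carbamazepine (C15H12N2O) at 293 K. The function name is hypothetical, and the four atomic volumes are an illustrative subset that is consistent with the figures quoted in this post.

```python
ALPHA = 0.95e-4  # thermal expansion coefficient, K^-1

# Volumes in Å^3 at 298 K (illustrative subset; verify against the .json file)
HOFMANN_V = {"H": 5.08, "C": 13.87, "N": 11.8, "O": 11.39}


def volume_at_temperature(v_298, temperature):
    """Scale a 298 K volume to another temperature (in kelvin)."""
    return v_298 * (1 + ALPHA * (temperature - 298))


carbamazepine = {"C": 15, "H": 12, "N": 2, "O": 1}
v_298 = sum(HOFMANN_V[el] * n for el, n in carbamazepine.items())
v_293 = volume_at_temperature(v_298, 293)

print(f"V(298 K) = {v_298:.2f} Å^3")
print(f"V(293 K) = {v_293:.2f} Å^3")
```

This gives 304.00 Å³ at 298 K and 303.86 Å³ at 293 K, so cooling by 5 K shrinks the estimate by only about 0.05 %.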


## Unit cell volume

If the volume of a unit cell is supplied, then the unit cell volume divided by the estimated molecular volume will also be shown.
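
In practice, this ratio is usually read as an estimate of the number of formula units in the unit cell, so a value close to an integer supports a candidate cell from indexing. A minimal sketch of that check is given below; the function name is hypothetical, and rounding to the nearest integer is a simplification that takes no account of what is plausible for a given space group.

```python
def formula_units_per_cell(v_cell, v_formula_unit):
    """Return the raw volume ratio and the nearest whole number of formula units."""
    ratio = v_cell / v_formula_unit
    return ratio, round(ratio)


# (label, unit cell volume / Å^3, Hofmann volume / Å^3)
examples = [
    ("zopiclone dihydrate", 1874.61, 496.02),
    ("verapamil hydrochloride", 1382.06, 667.57),
]

for name, v_cell, v_hofmann in examples:
    ratio, z = formula_units_per_cell(v_cell, v_hofmann)
    print(f"{name}: ratio = {ratio:.2f}, nearest integer = {z}")
```

A ratio that sits far from any integer is a hint that either the candidate cell or the assumed cell contents should be reconsidered.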

|   Search term   | $V_{cell}$       | $V_{Hofmann}$  | $\frac{V_{cell}}{V_{Hofmann}}$ |
|:---------------:|:----------------:|:--------------:|:------------------------------:|
| zopiclone; 2H2O | 1874.61          | 496.02         | 3.78                           |
| verapamil; HCl  | 1382.06          | 667.57         | 2.07                           |


## Summary Files

Each time HofCalc is used, a downloadable summary file is produced. It is designed to serve both as a record of the query for future reference and also as a method to sense-check the interpretation of the entered terms, with links to the PubChem entries where relevant.
An example of the contents of the summary file for the following search terms is given below.

 - carbamazepine; indomethacin with T = 293 K and unit cell volume = 2921.6 Å³

```json
{
    "combined": {
        "C": 34,
        "H": 28,
        "N": 3,
        "O": 5,
        "Cl": 1
    },
    "individual": {
        "carbamazepine": {
            "C": 15,
            "H": 12,
            "N": 2,
            "O": 1
        },
        "indomethacin": {
            "C": 19,
            "H": 16,
            "Cl": 1,
            "N": 1,
            "O": 4
        }
    },
    "user_input": [
        "carbamazepine",
        "indomethacin"
    ],
    "PubChem CIDs": {
        "carbamazepine": 2554,
        "indomethacin": 3715
    },
    "PubChem URLs": {
        "carbamazepine": "https://pubchem.ncbi.nlm.nih.gov/compound/2554",
        "indomethacin": "https://pubchem.ncbi.nlm.nih.gov/compound/3715"
    },
    "individual_volumes": {
        "carbamazepine": 303.86,
        "indomethacin": 427.77
    },
    "V_Cell / V_Hofmann": 3.99,
    "Temperature": 293,
    "Hofmann Volume": 731.62,
    "Hofmann Density": 1.35
}
```
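
The derived quantities in the summary can be sense-checked by hand. The sketch below uses a trimmed copy of the summary above and assumes that the "Hofmann Density" is simply the mass of the combined formula divided by the Hofmann volume, converted to g cm⁻³. That is my reading of the field rather than something stated in the file, and the standard atomic masses are typed in here for illustration.

```python
from collections import Counter

DA_PER_A3_TO_G_PER_CM3 = 1.66054  # 1 Da Å^-3 expressed in g cm^-3
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}

# Trimmed copy of the summary file shown above
summary = {
    "combined": {"C": 34, "H": 28, "N": 3, "O": 5, "Cl": 1},
    "individual": {
        "carbamazepine": {"C": 15, "H": 12, "N": 2, "O": 1},
        "indomethacin": {"C": 19, "H": 16, "Cl": 1, "N": 1, "O": 4},
    },
    "Hofmann Volume": 731.62,
}
v_cell = 2921.6  # Å^3, as entered in the query

# The combined atom counts should equal the sum over the individual components
total = Counter()
for atoms in summary["individual"].values():
    total.update(atoms)
counts_agree = dict(total) == summary["combined"]

# Assumed density definition: formula mass / Hofmann volume
mass = sum(ATOMIC_MASS[el] * n for el, n in summary["combined"].items())
density = mass * DA_PER_A3_TO_G_PER_CM3 / summary["Hofmann Volume"]
ratio = v_cell / summary["Hofmann Volume"]

print(counts_agree, round(density, 2), round(ratio, 2))
```

Under these assumptions the density and volume ratio round to 1.35 and 3.99, matching the values reported in the summary.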

# Conclusions

Hofmann's volumes generally give more accurate estimates of molecular volumes in crystals than the 18 Å³ rule, and should be used in preference to it where possible.

To make this easier for people, I've made a web-app that can be used to very rapidly and conveniently obtain these estimates.