# X-ray diffraction (XRD) indexing calculations

*Author: Enze Chen (University of California, Berkeley)*

![Powder XRD spectra](https://raw.githubusercontent.com/enze-chen/learning_modules/master/fig/XRD_unlabeled.png)

This Jupyter notebook is meant to introduce MSE students **with little to no Python experience** to a programmatic way of indexing an XRD spectrum. 
It walks through the process step by step, so it is quite long. 
I tried to include ample explanations of the scientific computing techniques and how they relate to math and MSE concepts. 
If you already know another language like MATLAB or R, you should see some similarities. 
At the end, your results will be shown in a **table**, not on a spectrum like the image above.

## Prerequisites

To get the most out of this notebook, you should already have:

* Familiarity with how XRD physically works and how to use Bragg's law to index peaks by hand.

## Learning goals

By the end of this notebook, you should be able to:

* *Identify* basic scientific computing libraries and operations in Python.
* *Write* Python code to index peaks given your own data.

## How to run this notebook

If you are viewing this notebook on [Google Colaboratory](https://colab.research.google.com/github/enze-chen/learning_modules/blob/master/mse/XRD_indexing.ipynb), then all the software is already set up for you (hooray). If you want to run the notebook locally, make sure all the Python libraries in the [`requirements.txt`](https://github.com/enze-chen/learning_modules/blob/master/requirements.txt) file are installed.

For pedagogical reasons, there are a few places for you to fill in your own code in order to make the notebook fully functional. These are delineated with the dashed lines as follows, and you should **only change what's inside**. You don't have to edit the text or code anywhere else. I've also included "**TODO**" to separate the background context from the actual instructions.

```python
# ----------  YOUR CODE HERE  ---------- #

# -------------------------------------- #
```

To execute each cell of the notebook and automatically advance to the next cell, press `Shift+Enter`. If you edit the code in a cell, just press `Shift+Enter` to run it again. You have to execute **all** the code cells in this notebook in order from top to bottom (so don't skip around). A number `[#]` will appear to the left of the code cell once it's done executing.

When done successfully, you'll be able to generate a table with all the relevant calculations at the end.

------------------------------------------------------------------------

## Introduction and motivation

### Why should I learn this?

These were my motivations in creating this notebook:   
* Computational tools are becoming **increasingly pervasive** in all sub-fields of MSE.
    * In characterization, experimental data from high-energy physics and 4D-STEM are being generated at the rate of hundreds of GBs per second. The **bottleneck to scientific discovery** is now in processing and analyzing this data.
* Both academia and industry are beginning to view programming as a **core competency**, even if you are "not a programmer."
* Indexing powder XRD patterns is a **routine procedure with several repetitive tasks**, making it a great candidate for programmatic solutions. Write once, run anytime.
* All the buzz around computer science can make these topics appear overwhelming and exclusive. Hopefully this notebook **lowers that barrier** by a teeny amount.
* The choice of Python (over MATLAB, for example) is because it's **open source**, [beginner-friendly](https://xkcd.com/353/), easily integrated with [Jupyter](https://jupyter.org/), and **[insanely popular](https://149351115.v2.pressablecdn.com/wp-content/uploads/2017/09/projections-1-1024x878.png)**.
    * If this notebook gets you excited, there are plenty of great resources and courses out there (e.g., CS 61) for learning pure Python.
    * MSE 215: Computational Materials Science at UC Berkeley is a course that uses Python.
    * DATA 100 and PHYSICS 188 also teach you Python and applied mathematics / data science.

### Important equations
The most important equation is **Bragg's law**, given by 

$$ n\lambda = 2d \sin(\theta) \tag{1} $$

where $n$ is the order (typically $1$), $\lambda$ is the wavelength, $d$ is the interplanar spacing, and $\theta$ is the Bragg angle. 

At the very end, if you're interested in finding the lattice constant to identify the element, you will have to relate the lattice constant to the interplanar spacing. The formula depends on the crystal system, and for cubic systems it is given by

$$ d = \frac{a}{\sqrt{h^2 + k^2 + l^2}} \tag{2} $$

where $a$ is the lattice constant and $h,k,l$ are the Miller indices for the plane.

For more information, please reference [Elements of X-Ray Diffraction (3rd) - Cullity and Stock](https://www.pearson.com/us/higher-education/program/Cullity-Elements-of-X-Ray-Diffraction-3rd-Edition/PGM113710.html).

## Python library imports
These are all the required Python libraries. Like in many other languages (Java, C++, Julia, R), you have to import any special libraries before you can use their functions in your code.

* [NumPy](https://numpy.org/) is a popular scientific computing library in Python.

* [pandas](https://pandas.pydata.org/) is a popular Python library for working with tabular data. Parts of it are built with NumPy. It's pronounced exactly how you think it's pronounced but unfortunately has no relationship to the cute bear.

**TODO**: You have to execute the following code cell with `Shift+Enter` before any of the others.

```python
# In-line comments in Python start with pound signs
# We assign aliases to save characters/space when we call them later; this is standard practice
import numpy as np
import pandas as pd
```

## Index the peaks and identify the structure from powder XRD data

Typically in an XRD experiment, you will measure a spectrum that features peaks at specific $2\theta$ values. 
Along with the X-ray wavelength ($\lambda$), these are all you have (and need) to index the peaks and identify the crystal structure, possibly even the material (e.g., if it's a single element).

### 1. Inputs / experimental data

**TODO**: Start by typing in your known values:
* Wavelength: A **float** (decimal) in *nanometers*, most likely `0.154` corresponding to $\text{Cu-K}\alpha$.
* Angles: Values are $2\theta$ in *degrees*. The angles should be stored in a **list** (array) in the form 
```python
angles = [1.23, 4.56, 7.89]  
```
for however many $2\theta$ values you've measured. Don't forget to execute the code cell (`Shift+Enter`) when you're done.

If you don't have your own data, you can use the sample data [in this file](https://github.com/enze-chen/learning_modules/blob/master/data/xrd_peaks_CuKa.csv).

```python
# ----------  YOUR CODE HERE  ---------- #

wavelength =    # TODO: wavelength in nm as a float
angles =        # TODO: 2theta angles in degrees

# -------------------------------------- #
```

### 2. Creating our first DataFrame

In pandas, tabular data is stored in [**DataFrame**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) objects. 
We can create a DataFrame object using its constructor, which takes many possible forms. 
The constructor used here takes in a **dictionary** made up of several `key:value` pairs as follows:

```python
pd.DataFrame({name_col_1:data_col_1, name_col_2:data_col_2, ...})
```

where `name_col_#` is the name of the column and `data_col_#` are the values in that column. 
Note how we use the `pd` alias to reference pandas and call the DataFrame constructor.
We save this DataFrame as the `df` variable. 
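Here is a small, self-contained illustration of the dictionary form of the constructor. The column names and values are hypothetical and have nothing to do with XRD; the point is only that each key becomes a column name and each list supplies that column's values, so lists in the same DataFrame should have the same length.

```python
import pandas as pd

# Hypothetical example data: each dictionary key becomes a column name,
# and each list supplies that column's values (one value per row).
example = pd.DataFrame({'Element': ['Al', 'Cu', 'Ni'],
                        'Atomic number': [13, 29, 28]})

print(example)        # a table with 3 rows and 2 columns
print(example.shape)  # (rows, columns)
```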

We want to have a column for the X-ray wavelength and a second column for the $2\theta$ angles. The following code has a column for the wavelength already specified, where the name of the column is `'Wavelength'` and the values are the `wavelength` values from the previous cell.

**TODO**: Add another column in the constructor for the angles. Let's name the new column `'2Theta'` and map it to our `angles` variable that we created.

A handy feature of Jupyter notebooks is that if you write a variable on the **last line** of any code block, that variable will automatically be displayed in a nice format. This is very helpful for debugging and visualizations. No need for `print(df)`.
* *Note*: This only works in Jupyter notebooks (and IPython). In a regular Python script, a bare variable on the last line is evaluated but nothing is displayed, so you would need `print(df)` there.

```python
# ----------  YOUR CODE HERE  ---------- #

df = pd.DataFrame({'Wavelength':wavelength, })   # TODO: add an entry for angles

# -------------------------------------- #
df  # you should see two columns with your data!
```

If you're astute, you'll notice that pandas automatically duplicated the wavelength (a scalar) for each angle (stored in a list). This is called [array broadcasting](https://numpy.org/doc/stable/user/theory.broadcasting.html#array-broadcasting-in-numpy) in NumPy, and it should be used with care: when shapes don't line up the way you expect, NumPy can silently return a result with an unexpected shape instead of raising an error.
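The sketch below shows both the convenient and the risky side of broadcasting. The angle values are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 2theta values (degrees) and a Cu-K-alpha wavelength (nm)
angles = [38.5, 44.7, 65.1]
wavelength = 0.154

# pandas: the scalar wavelength is repeated once per row
demo = pd.DataFrame({'Wavelength': wavelength, '2Theta': angles})
print(demo)

# NumPy: the scalar 0.5 is applied to every element (each angle is halved)
half = np.array(angles) * 0.5

# The risky case: shapes (3,) and (3, 1) broadcast to (3, 3) without any error,
# which is usually not the element-wise product you intended.
row = np.array(angles)
col = row.reshape(3, 1)
print((row * col).shape)  # (3, 3)
```

A quick `print(x.shape)` after a calculation is a cheap way to catch this kind of surprise.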

### 3. Numerical calculations
Note that if we assume $n=1$ in Bragg's law, then only three variables remain: $\lambda$, $d$, and $\theta$. 
In our case, we know the wavelength and angle, so we can solve for the interplanar spacing:

$$ d = \frac{\lambda}{2 \sin (\theta)} \tag{3} $$

We will build this up one step at a time, storing each intermediate result as a new column in the DataFrame. 
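As a compact reference, and a way to double-check your DataFrame results later, Equation (3) can also be written as a single vectorized NumPy function. The name `d_spacing` is our own choice, not part of any library, and the sketch assumes $n = 1$, a wavelength in nm, and $2\theta$ in degrees.

```python
import numpy as np

def d_spacing(wavelength, two_theta_deg):
    """Interplanar spacing from Bragg's law with n = 1 (Equation 3).

    wavelength    : X-ray wavelength (d comes out in the same units, e.g. nm)
    two_theta_deg : scalar or array of 2theta values in degrees
    """
    # 2theta -> theta, then degrees -> radians, before taking the sine
    theta = np.radians(np.asarray(two_theta_deg, dtype=float) / 2)
    return wavelength / (2 * np.sin(theta))

# Sanity checks: at 2theta = 60 deg, sin(30 deg) = 0.5, so d equals the wavelength;
# at 2theta = 180 deg, d = wavelength / 2, the smallest spacing that can diffract.
print(d_spacing(0.154, [60.0, 180.0]))   # approximately [0.154, 0.077]
```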

#### 3.1 Find $\theta$ 

First, we need to get the $\theta$ values. 
We can do this by creating a new column in the DataFrame whose entries are computed by dividing the existing column of $2\theta$ values by 2. 
To select an existing column from the DataFrame, put its name in square brackets:

```python
df['2Theta']
```

To perform division, we can type `/ 2` after the column and store the result into a new `'Theta'` column in the DataFrame. By writing 

```python
df['Theta']
```

on the left-hand side of the `=` sign, we're automatically creating a new column that we're about to assign values to.
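Here is the same idea on a tiny hypothetical DataFrame. To avoid giving away the TODO below, it multiplies instead of divides, and `'Doubled'` is just an illustrative column name.

```python
import pandas as pd

# Hypothetical DataFrame with one column of angles in degrees
df_demo = pd.DataFrame({'2Theta': [40.0, 60.0, 90.0]})

selected = df_demo['2Theta']     # selecting a column gives a pandas Series
print(type(selected))

# Assigning to a column name that doesn't exist yet creates that column
df_demo['Doubled'] = df_demo['2Theta'] * 2
print(df_demo)
```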

**TODO**: Finish the right-hand side of the code below.

```python
# ----------  YOUR CODE HERE  ---------- #

df['Theta'] =    # TODO: calculate theta

# -------------------------------------- #
df   # you should now have a third column called 'Theta' that's been added
```

#### 3.2 Take the sine

Next, we'll compute the sine of the $\theta$ values using a function from the NumPy library. 
Specifically, it is the `np.sin()` function, whose argument inside the parentheses **must be in radians**. 
If you apply the function to an array of values, it evaluates the sine of each element individually. 
Luckily, there is also a `np.radians()` function that will convert an array from degrees to radians for us. 
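A quick check of why the degrees-to-radians conversion matters, using a few illustrative angles:

```python
import numpy as np

degrees = np.array([0.0, 30.0, 90.0])

# Correct: convert to radians first, then take the sine element by element
good = np.sin(np.radians(degrees))   # approximately [0.0, 0.5, 1.0]

# Common mistake: np.sin interprets 30 and 90 as radians, not degrees
bad = np.sin(degrees)                # approximately [0.0, -0.988, 0.894]
print(good, bad)
```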

**TODO**: Write one line of code to add a column named `'Sine'` to your DataFrame. 
Your code should resemble the following, where you'll have to fill in the blank with the appropriate column from your existing DataFrame.

```python
df['Sine'] = np.sin(np.radians(______))
```

```python
# ----------  YOUR CODE HERE  ---------- #

# TODO: Add a column for the sine of the angle

# -------------------------------------- #
df   # you should now have a fourth column called 'Sine'
```

#### 3.3 Calculate the interplanar spacing, $d$
Now you have all the pieces in place for calculating $d$. What's dope about pandas is that you can divide two columns of the same length element by element with the division operator, `/`. 

**TODO**: Create a new column `df['Distance']` that does exactly what **Equation (3)** says. `df['Wavelength']` and `df['Sine']` should both appear on the right hand side. Don't forget to include parentheses when you multiply the denominator by 2!
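To see why the parentheses matter, here is a sketch with two hypothetical columns `'A'` and `'B'`. Python applies `/` and `*` left to right, so `A / 2 * B` means `(A / 2) * B`.

```python
import pandas as pd

# Hypothetical numeric columns of equal length
demo = pd.DataFrame({'A': [1.0, 4.0, 9.0], 'B': [2.0, 2.0, 3.0]})

demo['A / B'] = demo['A'] / demo['B']               # row by row: 1/2, 4/2, 9/3
demo['A / (2B)'] = demo['A'] / (2 * demo['B'])      # denominator doubled first
demo['A / 2 * B'] = demo['A'] / 2 * demo['B']       # parsed as (A / 2) * B, not A / (2B)
print(demo)
```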

```python
# ----------  YOUR CODE HERE  ---------- #

# TODO: Calculate the interplanar spacing


# -------------------------------------- #
df   # you should now have a fifth column called 'Distance'
```

#### 3.4 Take the ratio of $d^2$ values

At this point, you would normally take the ratio of the first distance to each measured distance. 
However, I think there's a small "hack" to this. 
If you do it naively, you'll get numbers like `1.000`, `1.414`, `1.732`, and the like. 
Unless you know your square root approximations, these can be tricky to decipher. 
I prefer to **square** the distances first and then take the ratio. 
I've done the first part below and stored the squared distances in a sixth column. 

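**TODO**: Now create a new column called `'Ratio'` whose values are *the first element* of `'Distance^2'` divided by *the entire column* `'Distance^2'`. One way to access the first element in a column is

```python
df['Distance^2'][0]
```

This looks up the row *labeled* `0`, which is the first row here because pandas labels rows `0, 1, 2, ...` by default; `df['Distance^2'].iloc[0]` selects the first row by position regardless of the labels.
The resulting column should have values that are close to integers or multiples of $\frac{1}{3}$.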
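To see why squaring helps, consider a cubic crystal. From Equation (2), $d^2 = a^2/(h^2+k^2+l^2)$, so the ratio $d_1^2/d_i^2$ equals $(h_i^2+k_i^2+l_i^2)/(h_1^2+k_1^2+l_1^2)$ and the lattice constant cancels. The sketch below assumes a hypothetical lattice constant and uses the first five allowed reflections of an FCC lattice.

```python
import numpy as np

a = 0.36                               # hypothetical lattice constant (nm); it cancels out
s = np.array([3, 4, 8, 11, 12])        # h^2+k^2+l^2 for FCC (111), (200), (220), (311), (222)
d = a / np.sqrt(s)                     # Equation (2)

plain_ratio = d[0] / d                 # approximately [1.0, 1.155, 1.633, 1.915, 2.0]
squared_ratio = d[0]**2 / d**2         # approximately [1.0, 1.333, 2.667, 3.667, 4.0]
print(np.round(plain_ratio, 3))
print(np.round(squared_ratio, 3))
```

The squared ratios are simply $s/3$, which is where the multiples of $\frac{1}{3}$ come from for FCC.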

```python
df['Distance^2'] = df['Distance']**2   # Creates a sixth column

# ----------  YOUR CODE HERE  ---------- #

# TODO: Take the ratio

# -------------------------------------- #
df   # you should now have a seventh column called 'Ratio'
```

#### 3.5 Figure out the crystal structure and index the peaks

OK! At this point, the ratios should be enough for you to deduce which crystal structure you have on your hands. 
This will allow you to index the peaks accordingly. 
You can probably do this part by hand. 

## Extensions

### Solving for the crystal structure

To programmatically solve for the crystal structure, you will have to pattern-match the ratios you obtained against the known ratios for the simple crystal structures.
This is slightly tedious, so I've omitted it here, but it's not any more difficult than what we've already done.
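For readers who want to try, here is one possible sketch, not a definitive implementation. It generates the allowed $h^2+k^2+l^2$ values for simple cubic (SC), BCC, FCC, and diamond cubic from the standard extinction rules, normalizes them by the first value, and picks the structure whose sequence has the smallest sum of squared differences from your `'Ratio'` column. The names `RULES`, `allowed_sums`, and `identify_cubic` are hypothetical. One caveat: SC and BCC give identical ratios $(1, 2, 3, 4, 5, 6)$ for the first six peaks, so you need at least seven peaks to tell them apart.

```python
import itertools
import numpy as np

# Extinction rules for common cubic structures (structure-factor selection rules)
RULES = {
    'SC':      lambda h, k, l: True,
    'BCC':     lambda h, k, l: (h + k + l) % 2 == 0,
    'FCC':     lambda h, k, l: h % 2 == k % 2 == l % 2,
    'Diamond': lambda h, k, l: (h % 2 == k % 2 == l % 2)
                               and (h % 2 == 1 or (h + k + l) % 4 == 0),
}

def allowed_sums(rule, max_index=6):
    """Sorted, distinct values of h^2+k^2+l^2 permitted by a selection rule."""
    sums = set()
    for h, k, l in itertools.product(range(max_index + 1), repeat=3):
        if (h, k, l) != (0, 0, 0) and rule(h, k, l):
            sums.add(h * h + k * k + l * l)
    return sorted(sums)

def identify_cubic(ratios):
    """Compare measured d1^2/di^2 ratios with each structure's expected ratios.

    Returns the best-matching structure name and the sum of squared
    errors for every candidate structure.
    """
    ratios = np.asarray(ratios, dtype=float)
    errors = {}
    for name, rule in RULES.items():
        sums = np.array(allowed_sums(rule)[:len(ratios)], dtype=float)
        expected = sums / sums[0]
        errors[name] = float(np.sum((ratios - expected) ** 2))
    best = min(errors, key=errors.get)
    return best, errors
```

For example, `identify_cubic(df['Ratio'])` would return the best guess along with the errors, so you can see how clearly one structure wins.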


### Lattice constants

If you want elemental identification, you will need to calculate the lattice constant. 
This should be fairly simple since you can pick any of the peaks from your final results and just apply **Equation (2)** from above. 
This takes only a couple of seconds on a calculator. 
But if you do it in code for every peak, you can then compute statistics (such as the mean and standard deviation of $a$) to quantify the variability.
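A minimal sketch of that idea, assuming a cubic crystal whose peaks you have already indexed. The helper `lattice_constants` is hypothetical, and the input data are synthetic: the $2\theta$ values are generated from a made-up FCC crystal with $a = 0.36$ nm so that the result can be checked.

```python
import numpy as np
import pandas as pd

def lattice_constants(wavelength, two_theta_deg, hkl):
    """Per-peak cubic lattice constant: Bragg's law (n = 1) for d, then Equation (2) solved for a."""
    out = pd.DataFrame({'2Theta': two_theta_deg})
    out['Distance'] = wavelength / (2 * np.sin(np.radians(out['2Theta'] / 2)))
    out['hkl'] = [f'({h}{k}{l})' for h, k, l in hkl]
    out['h2+k2+l2'] = [h**2 + k**2 + l**2 for h, k, l in hkl]
    out['a'] = out['Distance'] * np.sqrt(out['h2+k2+l2'])   # a = d * sqrt(h^2+k^2+l^2)
    return out

# Synthetic check (not real data): 2theta values from a hypothetical FCC crystal
wavelength, a_true = 0.154, 0.36
hkl = [(1, 1, 1), (2, 0, 0), (2, 2, 0), (3, 1, 1), (2, 2, 2)]
s = np.array([h**2 + k**2 + l**2 for h, k, l in hkl])
two_theta = 2 * np.degrees(np.arcsin(wavelength * np.sqrt(s) / (2 * a_true)))

result = lattice_constants(wavelength, two_theta, hkl)
print(result)
print('mean a =', result['a'].mean(), 'nm; std =', result['a'].std(), 'nm')
```

With real measurements, the standard deviation will be nonzero and gives a rough sense of the experimental uncertainty in $a$.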


### TEM diffraction

How might you modify the code to index a ring pattern from polycrystalline diffraction? What about a spot pattern for single crystals?


### Peak fitting/identification

Technically, the output of a powder XRD experiment is an entire spectrum, not just a list of angles where the peaks are. 
It's stored as a column of $2\theta$ values followed by a column of $\text{Intensity}$ values. 
How might code help you extract the positions of the peaks?
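One simple approach, sketched below under the assumption that the spectrum is smooth (noisy data would need smoothing first), is to flag points that are higher than their neighbors and above an intensity threshold. The spectrum is synthetic (three Gaussian peaks at hypothetical positions), and `find_peak_angles` is a hypothetical helper, not a library function.

```python
import numpy as np

def find_peak_angles(two_theta, intensity, min_height):
    """Return the 2theta values of local intensity maxima above min_height."""
    x = np.asarray(two_theta, dtype=float)
    y = np.asarray(intensity, dtype=float)
    # Interior point i is a peak if y[i] > y[i-1], y[i] >= y[i+1], and y[i] > min_height
    is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:]) & (y[1:-1] > min_height)
    return x[1:-1][is_peak]

# Synthetic spectrum: three Gaussian peaks (width 0.2 deg) on a 0.01 deg grid
two_theta = np.linspace(20, 80, 6001)
intensity = sum(h * np.exp(-(two_theta - c) ** 2 / (2 * 0.2 ** 2))
                for c, h in [(40.0, 1.0), (50.0, 0.5), (70.0, 0.8)])

print(find_peak_angles(two_theta, intensity, min_height=0.1))   # approximately [40. 50. 70.]
```

For real data, SciPy's `scipy.signal.find_peaks` (with arguments such as `height` and `prominence`) is a more robust option, if SciPy is installed in your environment.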

----------------------------------

## Conclusion

Congratulations on making it to the end! 
This short notebook doesn't do NumPy or pandas justice, but hopefully it gives you a sneak peek at the power of scientific computing and the impact it can have in materials characterization and MSE more broadly. 
Or maybe the only utility for you is helping you complete your homework assignments, and that's fine too. 
If you have any remaining questions or ideas for this and other modules, please don't hesitate to reach out.

## Acknowledgements

I thank Han-Ming Hau for helpful discussions and my Stanford undergraduate instructors [Prof. Renee Sher](https://www.wesleyan.edu/academics/faculty/msher/profile.html) and [Dr. Arturas Vailionis](https://profiles.stanford.edu/arturas-vailionis) for teaching me XRD. 
I also thank my advisor [Prof. Mark Asta](https://mse.berkeley.edu/people_new/asta/) for his unwavering encouragement for my education-related pursuits. 
This interactive project is generously hosted on [GitHub](https://github.com/enze-chen/learning_modules) and [Google Colaboratory](https://colab.research.google.com/github/enze-chen/learning_modules/blob/master/mse/XRD_indexing.ipynb).