# X-ray diffraction (XRD) indexing calculations

*Author: Enze Chen (University of California, Berkeley)*

This Jupyter notebook is meant to expose MSE students with little to no Python experience to a programmatic way of indexing an XRD spectrum. It walks through the process step-by-step, so I apologize if it appears quite long. I tried to include ample explanations for the scientific computing techniques and how they relate to math and MSE concepts they might otherwise be more familiar with.

## Usage
For pedagogical reasons, there are a few places for the students to fill in their own code in order to make the notebook fully functional. These are delineated with the dotted lines as follows, and you should **only change what's inside**. You don't have to edit the text or code anywhere else.
```python
# ---------------------- #
# YOUR CODE HERE

# ---------------------- #
```
To execute each cell of the notebook and automatically advance to the next cell, press `Shift+Enter`. If you edit the code in a cell, just press `Shift+Enter` to run it again. You have to execute **all** the code cells in this notebook from top to bottom (so don't skip around). A number `[#]` will appear to the left of the code cell once it's done executing.

## But why should I learn this?
Good question! These were my motivations in creating this notebook:   
* Computational tools are becoming **increasingly pervasive** in all sub-fields of MSE.
    * In characterization, experimental data from high-energy physics and 4D-STEM are being generated at the rate of hundreds of GBs per second. The **bottleneck to scientific discovery** is now in processing and analyzing this data.
* Both academia and industry are beginning to view programming as a **core competency**, even if you are "not a programmer."
* Indexing powder XRD patterns is a **routine procedure with several repetitive tasks**, making it a great candidate for programmatic solutions. Write once, run anytime.
    * This also makes powder XRD a **great teaching example** for computationalists as well as experimentalists.
* All the buzz around computer science can make these topics appear overwhelming and exclusive. Hopefully this notebook **lowers that barrier** by a teeny amount.
* The choice of Python (over MATLAB, for example) is because it's **open source**, beginner-friendly, easily integrated with [Jupyter](https://jupyter.org/), and **[insanely popular](https://149351115.v2.pressablecdn.com/wp-content/uploads/2017/09/growth_major_languages-1-1400x1200.png)**.
    * If this notebook gets you excited, there are plenty of great resources and courses out there for learning Python. 
    

## Acknowledgements
I thank my Stanford undergraduate instructors [Prof. Renee Sher](https://www.wesleyan.edu/academics/faculty/msher/profile.html) and [Dr. Arturas Vailionis](https://profiles.stanford.edu/arturas-vailionis) for teaching me XRD. I also thank my advisor [Prof. Mark Asta](https://mse.berkeley.edu/people_new/asta/) for his unwavering encouragement for my education-related pursuits. You can find an interactive version of this notebook on [Google Colaboratory](https://colab.research.google.com/github/enze-chen/enze-chen.github.io/blob/master/files/XRD_indexing.ipynb).

## Important equations
The most important equation is **Bragg's law**, given by 

$$ n\lambda = 2d \sin(\theta) \tag{1} $$

where $n$ is the order (typically $1$), $\lambda$ is the wavelength, $d$ is the interplanar spacing, and $\theta$ is the Bragg angle. 

At the very end, if you're interested in finding the lattice constant to identify the element, you will have to relate the lattice constant to the interplanar spacing. The formula depends on the crystal system, and for cubic systems it is given by

$$ d = \frac{a}{\sqrt{h^2 + k^2 + l^2}} \tag{2} $$

where $a$ is the lattice constant and $h,k,l$ are the Miller indices for the plane.
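
As a quick sanity check of Equations (1) and (2), here is a minimal sketch for a single peak using only Python's built-in `math` module. The peak position (38.5°) and the (111) assignment are made-up values for illustration, not data from this notebook.

```python
import math

wavelength_nm = 0.154      # Cu K-alpha wavelength, in nanometers
two_theta_deg = 38.5       # hypothetical peak position, in degrees
h, k, l = 1, 1, 1          # hypothetical Miller indices for this peak

theta_rad = math.radians(two_theta_deg / 2)          # Bragg angle, in radians
d_nm = wavelength_nm / (2 * math.sin(theta_rad))     # Equation (1) with n = 1
a_nm = d_nm * math.sqrt(h**2 + k**2 + l**2)          # Equation (2) solved for a

print(f"d = {d_nm:.4f} nm, a = {a_nm:.4f} nm")
```

The rest of the notebook does the same arithmetic, but for every peak at once.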

For more information, please reference [Elements of X-Ray Diffraction (3rd) - Cullity and Stock](https://www.pearson.com/us/higher-education/program/Cullity-Elements-of-X-Ray-Diffraction-3rd-Edition/PGM113710.html).

## Python library imports
These are all the required Python libraries. Like in many other languages (Java, C++, Julia, R), you have to import any special libraries that you want to use in Python. 

[NumPy](https://numpy.org/) is a popular scientific computing library in Python.

[pandas](https://pandas.pydata.org/) is a popular Python library for working with tabular data. Parts of it are built with NumPy. It's pronounced exactly how you think it's pronounced but unfortunately has no relationship to the cute bear.

**TODO**: You have to execute the following code cell with `Shift+Enter` before any of the others.

In [None]:
# In-line comments in Python start with pound signs
# We give these libraries aliases to spare our fingers when we call them later
# The abbreviations shown here are standard in the scientific computing community
import pandas as pd
import numpy as np

## Index the peaks and identify the structure from powder XRD data
Typically in an XRD experiment, you will measure a spectrum with peaks and corresponding $2\theta$ values. Along with the X-ray wavelength $\lambda$, these are all you have (and need) to index the peaks and identify the crystal structure, possibly even the material (e.g. if it's a single element).

### 1. Inputs / experimental data
**TODO**: Start by typing in your known values:
* Wavelength: A **float** (decimal) in nanometers, most likely `0.154` corresponding to $\text{Cu-K}\alpha$.
* Angles: Values are $2\theta$ in degrees. The angles should be stored in a **list** (array) in the form 
```python
angles = [1.23, 4.56, 7.89]  
```
for however many $2\theta$ values you've measured. Don't forget to execute the code cell (`Shift+Enter`) when you're done.

In [None]:
# ---------------------- #
# YOUR CODE HERE
wavelength = 
angles = 
# ---------------------- #

### 2. Creating our first DataFrame
The way pandas organizes tabular data is by storing it inside [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) objects. We can create a DataFrame object using its constructor, which takes many possible forms. The constructor used here takes in a **dictionary** made up of several `key:value` pairs as follows:
```python
pd.DataFrame({name_col1:data_col1, name_col2:data_col2,})
```
where `name_colX` is the name of the column and `data_colX` are the values in that column. We save this DataFrame as the `df` variable. 
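
To see the constructor in isolation, here is a small sketch with toy data that has nothing to do with XRD. The variable name `elements` and the column names are made up for this example.

```python
import pandas as pd

# Each key is a column name; each value is the list of entries in that column
elements = pd.DataFrame({'Element': ['Al', 'Cu', 'Fe'],
                         'Atomic number': [13, 29, 26]})
print(elements)
```

Each list becomes one column, and the rows are numbered automatically starting from 0.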

We want to have a column for the X-ray wavelength and a second column for the $2\theta$ angles. The following code has a column for the wavelength already specified, where the name of the column is `'Wavelength'` and the values are the `wavelength` values from the previous cell.

**TODO**: Add another column in the constructor for the angles. Let's name the new column `'2Theta'` and map it to our `angles` variable that we created.

A handy feature of Jupyter notebooks is that if you write a variable on the **last line** of any code block, that variable will automatically be displayed in a nice format. This is very helpful for debugging and visualizations. No need for `print(df)`.
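* *Note*: This automatic display only happens in Jupyter notebooks. In a regular Python script, a bare variable on the last line runs without error but displays nothing, so you would need `print(df)` there.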

In [None]:
# ---------------------- #
# YOUR CODE HERE
df = pd.DataFrame({'Wavelength':wavelength, })
# ---------------------- #
df  # you should see two columns with your data!

If you're astute, you'll notice that pandas automatically duplicated the wavelength (a scalar) for each angle (stored in an array). This is called [array broadcasting](https://numpy.org/doc/stable/user/theory.broadcasting.html#array-broadcasting-in-numpy) in NumPy and generally must be exercised with extreme caution!
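
A short sketch of broadcasting, including one case that shows why caution is warranted. All names and numbers here are hypothetical.

```python
import numpy as np
import pandas as pd

# A scalar paired with a list: pandas repeats the scalar in every row
toy = pd.DataFrame({'Scalar': 2.5, 'Values': [10, 20, 30]})

# NumPy does the same when a scalar meets an array
doubled = np.array([10, 20, 30]) * 2

# The cautionary case: a (3, 1) column plus a (3,) row silently gives a 3x3 grid
column = np.array([[1], [2], [3]])
row = np.array([10, 20, 30])
grid = column + row

print(toy)
print(doubled)
print(grid.shape)
```

No error is raised in the last case, which is exactly why a shape mismatch can go unnoticed.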

### 3. Numerical calculations
Note that if we assume $n=1$ in Bragg's law, then three variables remain: $\lambda$, $d$, and $\theta$. Knowing any two lets us solve for the third. In our case, we know the wavelength and angle, so we can find the interplanar spacing as follows

$$ d = \frac{\lambda}{2 \sin (\theta)} \tag{3} $$

We will try to do this in a very principled fashion. 

#### 3.1 Theta
First, we need to get the $\theta$ values. We can do this by creating a new column in the DataFrame whose entries are computed by dividing the existing column of $2\theta$ values by 2. To select a column in the DataFrame, use its name as follows:
```python
df['2Theta']
```
To perform division, we can type `/ 2` after the column and store the result into a new `'Theta'` column in the DataFrame. 

**TODO**: Finish the right-hand side of the code below.

In [None]:
# ---------------------- #
# YOUR CODE HERE
df['Theta'] = 
# ---------------------- #
df   # you should now have a third column called 'Theta' that's been added

#### 3.2 Sine
Next, we'll compute the sine of the angle using a function from the NumPy library. Specifically, it is the `np.sin()` function, whose argument inside the parentheses **must be in radians**. If you apply the function on an array of values, it will know to evaluate the sine of each element individually. Fortunately, there is also a `np.radians()` function that will convert an array from degrees to radians for us. 
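
To see why the radians conversion matters, here is a tiny sketch with made-up angles (the array `angles_deg` is not part of the notebook's data).

```python
import numpy as np

angles_deg = np.array([0.0, 30.0, 90.0])   # hypothetical angles, in degrees

wrong = np.sin(angles_deg)                 # treats 30 as 30 radians
right = np.sin(np.radians(angles_deg))     # converts to radians first

print(wrong)
print(right)
```

Only the second result contains the familiar values 0, 0.5, and 1.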

**TODO**: Write one line of code to add a column named `'Sine'` to your DataFrame. Your code should resemble the following, where you'll have to fill in the blank with the appropriate column from your existing DataFrame.
```python
df['Sine'] = np.sin(np.radians(______))
```

In [None]:
# ---------------------- #
# YOUR CODE HERE

# ---------------------- #
df   # you should now have a fourth column called 'Sine'

#### 3.3 Interplanar spacing
Now you have all the pieces in place for calculating $d$. What's dope about pandas is that you can perform element-wise division of two columns by use of the division operator, `/`. 
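
Here is a sketch of element-wise column division on an unrelated toy table. The DataFrame `samples`, its column names, and its numbers are all hypothetical.

```python
import pandas as pd

samples = pd.DataFrame({'Mass': [27.0, 54.0, 81.0],      # hypothetical, in grams
                        'Volume': [10.0, 20.0, 30.0]})   # hypothetical, in cm^3

# Each row's mass is divided by the same row's volume
samples['Density'] = samples['Mass'] / samples['Volume']

# Parentheses control what ends up in the denominator
samples['Half density'] = samples['Mass'] / (2 * samples['Volume'])

print(samples)
```

Without the parentheses, `samples['Mass'] / 2 * samples['Volume']` would divide by 2 and then multiply by the volume, because `/` and `*` are evaluated left to right.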

**TODO**: Create a new column `df['Distance']` that does exactly what **Equation (3)** says. `df['Wavelength']` and `df['Sine']` should both appear on the right-hand side. Don't forget to include parentheses when you multiply the denominator by 2!

In [None]:
# ---------------------- #
# YOUR CODE HERE

# ---------------------- #
df   # you should now have a fifth column called 'Distance'

#### 3.4 Take the ratio
At this point, you know to take the ratio between the first distance and each distance measured. However, I think there's a small "hack" to this. If you do it naively, you'll get numbers like `1.000`, `1.414`, `1.732`, and the like. Unless you know your square root approximations, this can be tricky to decipher. I prefer to **square** the distances first, and then take the ratio. I've done the first part below and stored the squared distances into a sixth column. 
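
The effect of squaring is easy to see on idealized numbers. The sketch below builds exact spacings from Equation (2) for a hypothetical simple cubic lattice (the lattice constant `a` is made up) and compares the two kinds of ratio.

```python
import numpy as np

a = 0.4                          # hypothetical lattice constant, in nm
s = np.array([1, 2, 3, 4])       # h^2 + k^2 + l^2 for (100), (110), (111), (200)
d = a / np.sqrt(s)               # Equation (2)

plain_ratio = d[0] / d           # square roots: 1.000, 1.414, 1.732, 2.000
squared_ratio = d[0]**2 / d**2   # approximately the integers 1, 2, 3, 4

print(plain_ratio)
print(squared_ratio)
```

The squared ratios land on recognizable numbers, up to floating-point rounding.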

**TODO**: Now you have to create a new column called `'Ratio'` whose values are *the first element* of `'Distance^2'` divided by the entire column `'Distance^2'`. The pandas syntax to access the first element in a column is
```python
df['Distance^2'][0]
```
The resulting column should have integral values or multiples of $\frac{1}{3}$.

In [None]:
df['Distance^2'] = df['Distance']**2   # Creates a sixth column
# ---------------------- #
# YOUR CODE HERE

# ---------------------- #
df   # you should now have a seventh column called 'Ratio'

#### 3.5 Figure out the crystal structure and index the peaks
OK! At this point, the ratios should be enough for you to deduce which cubic crystal structure you have on your hands. This will allow you to index the peaks accordingly. You can probably do this by hand. 

----------------------------------------

If you want to try it programmatically, you can read on. The following cells are completed for you, so you just have to run them, assuming you used the same variable names I suggested. There are probably smarter programmatic ways to do this, but I felt hacky, so we're going to create the following variables to store some constants.

In [None]:
fcc_ratios = np.array([3/3, 4/3, 8/3, 11/3, 12/3, 16/3, 19/3, 20/3])
bcc_ratios = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
sc_ratios = np.array([1, 2, 3, 4, 5, 6, 8, 9, 10])
dc_ratios = np.array([3/3, 8/3, 11/3, 16/3, 19/3, 24/3])
s_to_hkl = {1:'(100)', 2:'(110)', 3:'(111)', 4:'(200)', 5:'(210)', 6:'(211)', 
            8:'(220)', 9:'(221),(300)', 10:'(310)', 11:'(311)', 12:'(222)', 13:'(320)', 14:'(321)',
            16:'(400)', 17:'(410),(322)', 18:'(411),(330)', 19:'(331)', 20:'(420)',
            24:'(422)'}   # s = 24 is needed for the sixth diamond cubic peak
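
The ratio arrays above do not have to be typed by hand. As a rough sketch, they can be regenerated from the standard reflection conditions for each cubic structure. This notebook does not derive those conditions, so treat them as an outside assumption: every reflection is allowed for simple cubic, $h+k+l$ must be even for BCC, $h,k,l$ must be all odd or all even for FCC, and diamond cubic keeps the FCC reflections that are either all odd or have $h+k+l$ divisible by 4. The helper `allowed_s` and the `rules` dictionary are names made up for this illustration.

```python
from itertools import product

def allowed_s(rule, max_index=4):
    """Sorted unique s = h^2 + k^2 + l^2 for reflections that pass `rule`."""
    values = set()
    for h, k, l in product(range(max_index + 1), repeat=3):
        if (h, k, l) != (0, 0, 0) and rule(h, k, l):
            values.add(h*h + k*k + l*l)
    return sorted(values)

def same_parity(h, k, l):
    """True when h, k, l are all even or all odd."""
    return h % 2 == k % 2 == l % 2

rules = {
    'sc':  lambda h, k, l: True,
    'bcc': lambda h, k, l: (h + k + l) % 2 == 0,
    'fcc': same_parity,
    'dc':  lambda h, k, l: same_parity(h, k, l)
                           and (h % 2 == 1 or (h + k + l) % 4 == 0),
}

s_values = {name: allowed_s(rule) for name, rule in rules.items()}
for name, values in s_values.items():
    print(name, values[:8])
```

Dividing each list by its first entry gives the ratio arrays defined above.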

We'll use the theoretical `ratios` for comparison by finding the difference with our computed ratios and taking the norm of the resulting array. For each structure, we use the `s_to_hkl` dictionary to add the planes to our DataFrame, where 

$$ s \stackrel{\text{def}}{=} h^2 + k^2 + l^2 $$

is what Cullity calls the **quadratic form** of the Miller index. Based on the crystal system, we know exactly what factor to multiply our `df['Ratio']` by, namely the $s$ of the first diffraction plane. (Why?)
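
To make the comparison step concrete, here is a small sketch with made-up ratios. The arrays `measured`, `fcc_reference`, and `bcc_reference` and the name `demo_tol` are hypothetical and separate from the notebook's variables.

```python
import numpy as np

measured = np.array([1.000, 1.334, 2.665])    # hypothetical measured ratios
fcc_reference = np.array([3/3, 4/3, 8/3])     # first three FCC ratios
bcc_reference = np.array([1, 2, 3])           # first three BCC ratios
demo_tol = 1e-2

# The norm collapses all the per-peak differences into a single error value
err_fcc_demo = np.linalg.norm(measured - fcc_reference)
err_bcc_demo = np.linalg.norm(measured - bcc_reference)

print(err_fcc_demo < demo_tol)   # True: the ratios match FCC within tolerance
print(err_bcc_demo < demo_tol)   # False: they do not match BCC
```

A small norm means every ratio is close to its reference; one badly mismatched peak is enough to push the norm over the tolerance.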

In [None]:
n = len(angles)   # How many angles did we actually collect?
tol = 1e-2        # ALWAYS compare with a numerical tolerance; never == 0

def ratio_error(reference):
    # Compare our n measured ratios against the first n reference ratios.
    # If we measured more peaks than the reference lists, the arrays would
    # have different lengths, so we report "no match" instead of crashing.
    if n > len(reference):
        return np.inf
    return np.linalg.norm(df['Ratio'] - reference[:n])

err_fcc = ratio_error(fcc_ratios)
err_bcc = ratio_error(bcc_ratios)
err_sc  = ratio_error(sc_ratios)
err_dc  = ratio_error(dc_ratios)

if err_fcc < tol:
    print('Structure is likely face-centered cubic.')
    df['s'] = round(df['Ratio'] * 3)   # we use round() to get the nearest integer
    df['Plane'] = [s_to_hkl[s] for s in df['s']]   # the RHS is called a list comprehension
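elif err_bcc < tol:
    # The first six BCC and SC ratios are identical (1 through 6), so with fewer
    # than seven peaks a simple cubic pattern would also land in this branch.
    print('Structure is likely body-centered cubic.')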
    df['s'] = round(df['Ratio'] * 2)
    df['Plane'] = [s_to_hkl[s] for s in df['s']]
elif err_sc < tol:
    print('Structure is likely simple cubic.')
    df['s'] = round(df['Ratio'])
    df['Plane'] = [s_to_hkl[s] for s in df['s']]
elif err_dc < tol:
    print('Structure is likely diamond cubic.')
    df['s'] = round(df['Ratio'] * 3)
    df['Plane'] = [s_to_hkl[s] for s in df['s']]
else:
    print('Structure unclear.',
          'Maybe a non-cubic crystal system or a multi-component basis.')
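
If list comprehensions are new to you, this sketch shows one next to the equivalent `for` loop. The dictionary `lookup` and the list `s_values` are small hypothetical stand-ins for `s_to_hkl` and `df['s']`.

```python
lookup = {3: '(111)', 4: '(200)', 8: '(220)'}
s_values = [3, 4, 8]

# Loop version: build the list one entry at a time
planes_loop = []
for s in s_values:
    planes_loop.append(lookup[s])

# List comprehension: the same result in a single line
planes = [lookup[s] for s in s_values]

print(planes)
```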

And just for good measure, let's show our final DataFrame with all of our calculations.

In [None]:
df  # ta-da!

## Conclusion
Congratulations on making it to the end! This short notebook doesn't do NumPy or pandas justice, but hopefully it gives you a sneak peek at the power of scientific computing and the impact it can have in materials characterization and MSE more broadly. Or maybe the only utility for you is helping you complete your homework assignments, and that's fine too. If you have any remaining questions or ideas for this and other modules, please don't hesitate to reach out.

## Extensions

### Lattice constants
If you want elemental identification, you will need to calculate the lattice constant. This should be fairly simple since you can pick any of the peaks from your final results and just apply **Equation (2)** from above. This only takes a couple seconds on any calculator. But if you do it with code for all the peak calculations, you can then run some statistics to quantify the variability.
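
A minimal sketch of that idea, assuming you already have the interplanar spacings and the $s$ value of each indexed peak. The numbers below are hypothetical, not measured data.

```python
import numpy as np

# Hypothetical indexed peaks: spacings (nm) and their s = h^2 + k^2 + l^2
d_nm = np.array([0.2338, 0.2025, 0.1432, 0.1221])
s = np.array([3, 4, 8, 11])

a_nm = d_nm * np.sqrt(s)       # Equation (2) solved for a, one value per peak
a_mean = a_nm.mean()
a_std = a_nm.std(ddof=1)       # sample standard deviation across the peaks

print(f"a = {a_mean:.4f} +/- {a_std:.4f} nm")
```

In the notebook itself, the same calculation could use `df['Distance']` and `df['s']` in place of the two arrays.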

### TEM diffraction
How might you modify the code to index a ring pattern from polycrystalline diffraction? What about a spot pattern for single crystals?

### Peak fitting/identification
Technically, the output of a powder XRD experiment is a spectrum, not a set of angles. It's stored as a column of $2\theta$ values followed by a column of $\text{Intensity}$ values. How might code help you extract the positions of the peaks?
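
One possible starting point is sketched below, assuming noise-free synthetic data: a point counts as a peak if it is higher than its neighbors and above a threshold. The peak centers, widths, and the 10% threshold are all made up for illustration. Real data is noisy and would likely need smoothing or a dedicated routine such as SciPy's `scipy.signal.find_peaks`.

```python
import numpy as np

# Synthetic pattern: three Gaussian peaks at hypothetical positions
centers = [38.5, 44.7, 65.1]
two_theta = np.linspace(30.0, 90.0, 6001)      # 0.01 degree steps
intensity = sum(100.0 * np.exp(-(two_theta - c)**2 / (2 * 0.1**2))
                for c in centers)

threshold = 0.1 * intensity.max()              # ignore anything below 10% of max
mid = intensity[1:-1]                          # every point except the two ends
is_peak = (mid > intensity[:-2]) & (mid >= intensity[2:]) & (mid > threshold)
peak_angles = two_theta[1:-1][is_peak]

print(peak_angles)
```

The resulting `peak_angles` array could then be used as the `angles` input at the top of this notebook.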