In [None]:
!pip install pandas matplotlib scipy

# Example: MP588 (Radiation Production and Detection)

---

## Spectrum Visualization and Finding Peaks
Python has powerful tools for efficiently manipulating, analyzing, and visualizing data. We'll illustrate some of these capabilities with a common task in your medical physics coursework: visualizing and analyzing spectral data.


#### Import useful packages

In [None]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

#### Read in data from .csv file
[**Pandas**](https://pandas.pydata.org/) is a Python package for working with tabular data. **Pandas** stores data in a structure called a ```DataFrame```, which is essentially a table with rows and columns. This is the main package we will use to interact with spreadsheets (Excel files, CSV files, etc.) in Python.

We can use the ```pd.read_csv``` function to read our CSV file into a pandas DataFrame. Calling the ```.head()``` method on this DataFrame returns its first 5 rows, which we can then print. Similarly, the ```.tail()``` method returns the last 5 rows.

In [None]:
path = 'data/spectrum.csv'

## If you are using Google Colab, uncomment this:
# from google.colab import drive
# drive.mount('/content/drive')
# %cd /content/drive/MyDrive/med-phys-python-bootcamp

spectrum = pd.read_csv(path)
print(f"The spectrum variable has a type: {type(spectrum)}")

print(f"\nThe first 5 rows of the spectrum data are: \n{spectrum.head()}")
print(f"\nThe last 5 rows of the spectrum data are: \n{spectrum.tail()}")
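
Before working with a freshly loaded table, it can help to check its size and column names. The sketch below reads a tiny CSV from an in-memory string instead of `data/spectrum.csv`. The column names match the ones used in this notebook, but the four rows of values and the variable name `demo` are invented for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for the contents of a spectrum CSV file (values are illustrative only)
csv_text = """Energy (keV),Counts
10,5
20,50
30,20
40,7
"""

# pd.read_csv accepts a file path or a file-like object such as StringIO
demo = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = demo.shape            # (number of rows, number of columns)
column_names = demo.columns.tolist()   # column labels as a plain Python list

print(f"Rows: {n_rows}, columns: {n_cols}")
print(f"Column names: {column_names}")
print(demo.head(2))                    # head/tail accept the number of rows to show
```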

To retrieve a specific column from a pandas DataFrame, we can index the DataFrame with the column name in square brackets. The result is a pandas ```Series```.

In [None]:
all_counts = spectrum['Counts']

# The .max() method gives us the maximum value of a column
max_count = all_counts.max()

print(f"The maximum number of counts recorded in a single channel is: {max_count}")
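
A natural follow-up question is which energy that maximum belongs to. One way, sketched here on an invented four-row table rather than the real spectrum, is `idxmax()`, which returns the index label of the largest value, combined with `.loc` to look up that row.

```python
import pandas as pd

# Invented example data; only the column names come from this notebook
demo = pd.DataFrame({
    'Energy (keV)': [10, 20, 30, 40],
    'Counts': [5, 50, 20, 7],
})

# idxmax() gives the index label of the row holding the largest count
max_row_label = demo['Counts'].idxmax()

# .loc[row_label, column_name] looks up a single value
energy_at_max = demo.loc[max_row_label, 'Energy (keV)']

print(f"The highest count sits at {energy_at_max} keV")
```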

## Data visualization
We can use the [**matplotlib**](https://matplotlib.org/) library to visualize our spectrum data. We've imported the **matplotlib.pyplot** module under the alias **plt**, so we can call its functions using **plt**, as shown in the cells below.

First, let's use the ```help``` function to see the documentation for ```plt.plot()```, one of matplotlib's main functions for generating plots.

In [None]:
# Show the documentation for plt.plot()
help(plt.plot)

In [None]:
# Generate a matplotlib figure
plt.figure(figsize=(10, 6))

# Plot spectral counts vs. energy
plt.plot(spectrum['Energy (keV)'], spectrum['Counts'])
plt.xlabel('Energy (keV)')
plt.ylabel('Counts')
plt.title('My Spectrum')

plt.show()

Great! We have a plot of the data in our CSV file. However, we would like to show only the most relevant energy range. Let's constrain the x-axis to between 0 and 1000 keV by calling the ```plt.xlim``` function.

In [None]:
# Show the documentation for plt.xlim()
help(plt.xlim)

In [None]:
# Generate a matplotlib figure
plt.figure(figsize=(10, 6))

# Plot spectral counts vs. energy
plt.plot(spectrum['Energy (keV)'], spectrum['Counts'])
plt.xlabel('Energy (keV)')
plt.ylabel('Counts')
plt.title('My Spectrum')

plt.xlim(0, 1000)

plt.show()
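
Note that `plt.xlim` only changes the view. The DataFrame still holds every row. If later calculations should also be limited to that range, one option is to filter the rows themselves. This sketch uses made-up values and the `Series.between` method, which includes both endpoints by default.

```python
import pandas as pd

# Invented example data; only the column names come from this notebook
demo = pd.DataFrame({
    'Energy (keV)': [0, 500, 1000, 1500, 2000],
    'Counts': [3, 40, 12, 90, 1],
})

# Boolean mask: True where the energy lies in [0, 1000] keV
in_range = demo['Energy (keV)'].between(0, 1000)

# Indexing with the mask keeps only the rows marked True
demo_in_range = demo[in_range]

print(demo_in_range)
print(f"Maximum counts within 0-1000 keV: {demo_in_range['Counts'].max()}")
```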

Fantastic. Now, we would like to change the scale on the y-axis. A log scale would better show the distribution of count values. We do this in matplotlib using the ```plt.semilogy``` function instead of ```plt.plot```.

In [None]:
# Show the documentation for plt.semilogy()
help(plt.semilogy)

In [None]:
# Generate a matplotlib figure
plt.figure(figsize=(10, 6))

# Plot spectral counts vs. energy
plt.semilogy(spectrum['Energy (keV)'], spectrum['Counts'])
plt.xlabel('Energy (keV)')
plt.ylabel('Counts')
plt.title('My Spectrum')

plt.xlim(0, 1000)

plt.show()
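
`plt.semilogy` is shorthand for an ordinary line plot drawn on a logarithmic y-axis. The same result can be reached with matplotlib's object-oriented interface, which some people find easier to manage once a figure has several panels. The values below are invented. Only the column names are taken from this notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented example data spanning several orders of magnitude
demo = pd.DataFrame({
    'Energy (keV)': [100, 300, 500, 700, 900],
    'Counts': [12000, 900, 4000, 60, 300],
})

# plt.subplots() returns the figure and an Axes object to draw on
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(demo['Energy (keV)'], demo['Counts'])
ax.set_yscale('log')            # switch the y-axis to a log scale
ax.set_xlabel('Energy (keV)')
ax.set_ylabel('Counts')
ax.set_title('My Spectrum')
ax.set_xlim(0, 1000)

plt.show()
```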

Great! Now that our spectrum is displayed properly, let's use thresholding to find its peaks. Upon visual inspection, we see that there are 4 major peaks. Let's use a threshold of $10^3$ counts from 0 to 600 keV and a threshold of $10^2$ counts from 600 to 1000 keV. We will use another package, **scipy**, to find the local maxima that exceed these thresholds. First, let's draw the two thresholds on the spectrum as dashed red lines.

In [None]:
# Generate a matplotlib figure
plt.figure(figsize=(10, 6))

# Plot spectral counts vs. energy
plt.semilogy(spectrum['Energy (keV)'], spectrum['Counts'])
plt.xlabel('Energy (keV)')
plt.ylabel('Counts')
plt.title('My Spectrum')

plt.xlim(0, 1000)

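# axhline takes xmin/xmax as fractions of the axis width (0 to 1), not in keV,
# so we divide each energy by the 1000 keV span set with plt.xlim
plt.axhline(y=10**3, xmin=0/1000, xmax=600/1000, linestyle='--', color='red')
plt.axhline(y=10**2, xmin=600/1000, xmax=1000/1000, linestyle='--', color='red')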

plt.show()
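
Because `plt.axhline` positions its endpoints relative to the axis width, the dashed lines above end at 600 keV only while the x-limits stay at 0 to 1000. An alternative that takes positions directly in data units is `hlines`. The sketch below draws the same two thresholds on an otherwise empty axis. The y-limits are an arbitrary choice made for this example.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
ax.set_yscale('log')
ax.set_xlim(0, 1000)
ax.set_ylim(1, 10**5)   # arbitrary y-range chosen so both thresholds are visible

# hlines takes xmin and xmax in data units (keV here), not axis fractions
low_range_line = ax.hlines(y=10**3, xmin=0, xmax=600, linestyles='--', colors='red')
high_range_line = ax.hlines(y=10**2, xmin=600, xmax=1000, linestyles='--', colors='red')

ax.set_xlabel('Energy (keV)')
ax.set_ylabel('Counts')

plt.show()
```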

In [None]:
# Import the find_peaks function from scipy
from scipy.signal import find_peaks

# First, filter our original dataframe into two dataframes, one per energy range.
# The second range starts strictly above 600 keV so that no row lands in both dataframes.
first_energy_range_df = spectrum[(spectrum['Energy (keV)'] > 0) & (spectrum['Energy (keV)'] <= 600)].reset_index()
second_energy_range_df = spectrum[(spectrum['Energy (keV)'] > 600) & (spectrum['Energy (keV)'] <= 1000)].reset_index()

# Next, we will use scipy to find the peaks in each of our new dataframes
first_thresh_indices_scipy, _ = find_peaks(first_energy_range_df['Counts'], height=10**3)
second_thresh_indices_scipy, _ = find_peaks(second_energy_range_df['Counts'], height=10**2)

# find_peaks returns integer positions, so we select the matching rows by position
first_range_thresh_rows = first_energy_range_df.iloc[first_thresh_indices_scipy]
second_range_thresh_rows = second_energy_range_df.iloc[second_thresh_indices_scipy]

# Finally, we can combine the thresholded rows into a single dataframe
threshold_spectrum = pd.concat([first_range_thresh_rows, second_range_thresh_rows])

print(f"The peak values that meet our threshold criteria are:\n{threshold_spectrum}")
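
`find_peaks` also accepts an array for `height`, with one threshold per sample. That makes it possible to apply both thresholds in a single call without splitting the table first. The sketch below tries this on a synthetic, noise-free spectrum built from Gaussian peaks. The peak positions, widths, and amplitudes are invented and do not represent the data in `data/spectrum.csv`.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic spectrum: flat background of 10 counts plus five Gaussian peaks.
# Centers (keV) and amplitudes (counts) are invented for illustration.
energy = np.arange(0, 1001)  # 1 keV per channel
peak_params = [(100, 5000), (300, 3000), (450, 200), (700, 500), (900, 300)]
counts = np.full(energy.shape, 10.0)
for center, amplitude in peak_params:
    counts += amplitude * np.exp(-0.5 * ((energy - center) / 5.0) ** 2)

# One threshold per channel: 10^3 counts up to 600 keV, 10^2 counts above
threshold = np.where(energy <= 600, 10**3, 10**2)

# A single call now applies the energy-dependent threshold
peak_positions, _ = find_peaks(counts, height=threshold)
peak_energies = energy[peak_positions]

print(f"Peaks found at (keV): {peak_energies.tolist()}")
```

In this synthetic case the bump at 450 keV (about 210 counts) is rejected because it falls under the $10^3$ threshold that applies below 600 keV, while the smaller peaks at 700 and 900 keV pass the $10^2$ threshold.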

In [None]:
# Generate a matplotlib figure
plt.figure(figsize=(10, 6))

# Plot spectral counts vs. energy
plt.semilogy(spectrum['Energy (keV)'], spectrum['Counts'])
plt.xlabel('Energy (keV)')
plt.ylabel('Counts')
plt.title('My Spectrum')

plt.xlim(0, 1000)

# Visualize the peaks
plt.semilogy(threshold_spectrum['Energy (keV)'], threshold_spectrum['Counts'], 'ro')

plt.show()
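
A height threshold accepts every local maximum that rises above it, so on a noisy spectrum the red markers can include many small fluctuations next to each real peak. `find_peaks` has additional optional parameters, such as `prominence` and `distance`, that can help reject these. The sketch below compares a height-only search with a prominence-based one on synthetic data. The alternating ±5 count ripple is a crude, deterministic stand-in for noise, and all numbers are invented.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic data: background of 50 counts, two Gaussian peaks, and a
# deterministic +/-5 count ripple standing in for noise (all values invented)
channel = np.arange(200)
ripple = 5.0 * (-1.0) ** channel
counts = (
    50.0
    + ripple
    + 1000.0 * np.exp(-0.5 * ((channel - 60) / 4.0) ** 2)
    + 400.0 * np.exp(-0.5 * ((channel - 140) / 4.0) ** 2)
)

# Height only: every ripple crest above 20 counts qualifies
height_only, _ = find_peaks(counts, height=20)

# Prominence: a peak must stand at least 100 counts above its surroundings
prominent, _ = find_peaks(counts, prominence=100)

print(f"Height-only search found {len(height_only)} maxima")
print(f"Prominence search found peaks at channels {prominent.tolist()}")
```

Here the prominence search keeps only the two Gaussian peaks at channels 60 and 140. A suitable prominence value for real data depends on the noise level and would need to be chosen by inspecting the spectrum.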