<center><a href="https://www.nvidia.com/en-us/training/"><img src="https://dli-lms.s3.amazonaws.com/assets/general/DLI_Header_White.png" width="400" height="186" /></a></center>

# 1b. Exploring Modalities

In the last lab, we learned about two different modes of data: LiDAR data and RGB data. In this lab, we will explore other kinds of data. Multimodal modeling is a large field of study, and practicing with a variety of data types will make it easier to work with a new one.

#### Learning Objectives

The goals of this notebook are to:
* Explore audio data
* Explore CT data

In [None]:
from scipy.io import wavfile
from scipy import fft
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import nibabel as nib
import numpy as np

import utils

import IPython
from IPython.display import HTML

## 1.1 Audio Data

The first type of data we will explore is audio data. Interestingly, we can analyze audio with the same neural network techniques we use to analyze images. Let's see how. We'll use SciPy's [wavfile.read](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.io.wavfile.read.html) function to import the data.

In [None]:
rate, data = wavfile.read('data/cat_example-1.wav')

The rate is the number of samples per second in the audio file, measured in hertz (Hz). Sound is an analog wave, so the higher the sampling rate, the more faithfully its discrete digital representation reflects the original. For example, the highest frequency that can be reliably captured is half the sampling frequency (the Nyquist limit). Learn more about it [here](https://en.wikipedia.org/wiki/Sampling_(signal_processing)).
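As a rough illustration of the Nyquist limit, the sketch below samples two pure tones at a made-up rate of 100 Hz, one below the 50 Hz limit and one above it. The names `demo_rate`, `sample_sine`, and `dominant_frequency` are hypothetical helpers for this example, not part of the lab's code.

```python
import numpy as np


def sample_sine(tone_hz, rate, seconds=1.0):
    """Sample a pure sine tone of `tone_hz` at `rate` samples per second."""
    t = np.arange(int(rate * seconds)) / rate
    return np.sin(2 * np.pi * tone_hz * t)


def dominant_frequency(signal, rate):
    """Return the frequency (Hz) of the strongest non-constant component."""
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0  # ignore the constant (0 Hz) term
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    return freqs[np.argmax(spectrum)]


demo_rate = 100            # hypothetical sampling rate in Hz
nyquist = demo_rate / 2    # highest frequency that can be captured reliably

below_limit = dominant_frequency(sample_sine(30, demo_rate), demo_rate)
above_limit = dominant_frequency(sample_sine(70, demo_rate), demo_rate)

print(f"Nyquist limit: {nyquist} Hz")
print(f"30 Hz tone is measured at {below_limit} Hz")
print(f"70 Hz tone is measured at {above_limit} Hz")
```

The 70 Hz tone lies above the limit, so its samples are indistinguishable from those of a 30 Hz tone. This effect is called aliasing.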

In [None]:
rate  # Hz

Our `data` has two dimensions. The first is the total number of samples taken. The second is the number of channels. Since our sample file is stereo, it has `2` channels: one for each ear.
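To make the `(samples, channels)` layout concrete, here is a small sketch that builds a synthetic stereo array rather than reading the lab's file. The rate, tone frequencies, and variable names are assumptions chosen for illustration.

```python
import numpy as np

demo_rate = 8000   # hypothetical sampling rate in Hz
seconds = 2
t = np.arange(demo_rate * seconds) / demo_rate

# One tone per channel, stacked as columns: shape (samples, channels)
left = np.sin(2 * np.pi * 220 * t)
right = np.sin(2 * np.pi * 440 * t)
stereo = np.stack([left, right], axis=1)

n_samples, n_channels = stereo.shape
duration = n_samples / demo_rate   # length of the clip in seconds
mono = stereo.mean(axis=1)         # average both channels into one

print(stereo.shape, duration, mono.shape)
```

Dividing the number of samples by the rate gives the clip length in seconds, and selecting a column such as `stereo[:, 0]` picks out a single channel.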

In [None]:
data.shape

The [WAV file format](https://en.wikipedia.org/wiki/WAV) captures a sound wave's amplitude. Because we have an amplitude sampled at a fixed interval of time, we can calculate the frequencies found in the audio by performing a [fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform) on small windows of data. Once we have the frequencies, we can create a [spectrogram](https://en.wikipedia.org/wiki/Spectrogram).

In [None]:
_ = plt.specgram(data[:, 0], Fs=rate)

Here, the horizontal axis represents time, the vertical axis represents frequency, and brightness represents the frequency's amplitude. Because we have an image, we can now use a convolutional neural network to analyze it.
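The windowed Fourier transform behind a spectrogram can be sketched by hand. The version below is a simplified illustration, not how `plt.specgram` is implemented. The function name `simple_spectrogram`, the window size, and the hop length are assumptions, and it runs on a synthetic 1000 Hz tone.

```python
import numpy as np


def simple_spectrogram(signal, rate, window_size=256, hop=128):
    """Return (freqs, times, magnitudes) from FFTs of overlapping windows."""
    window = np.hanning(window_size)  # taper each window to reduce leakage
    starts = np.arange(0, len(signal) - window_size + 1, hop)
    frames = np.stack([signal[s:s + window_size] * window for s in starts])
    # Rows become frequency bins and columns become time steps
    magnitudes = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(window_size, d=1.0 / rate)
    times = (starts + window_size / 2) / rate
    return freqs, times, magnitudes


demo_rate = 8000  # hypothetical sampling rate in Hz
t = np.arange(demo_rate) / demo_rate
tone = np.sin(2 * np.pi * 1000 * t)

freqs, times, mags = simple_spectrogram(tone, demo_rate)
peak_freq = freqs[np.argmax(mags.mean(axis=1))]
print(mags.shape, peak_freq)
```

Each column of `mags` is the frequency content of one short window, so plotting the array with `plt.imshow` would give an image with frequency on one axis and time on the other.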

By the way, we can write a NumPy array into a `.wav` file using [wavfile.write](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html). If you're in a classroom setting, please lower your volume to be respectful of the people around you. Can you recognize the sound? The answer is given after the cell below. Try changing the `speed` value to see how it changes the output.

In [None]:
speed = 1
new_rate = int(rate * speed)
wavfile.write('data/temp.wav', new_rate, data)
IPython.display.Audio('data/temp.wav')

Answer:
It's a cat purring. Thanks to [Mysid](https://en.wikipedia.org/wiki/File:Purr.ogg) for making this available.
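To see why changing `speed` alters the playback, note that the samples stay the same and only the declared rate changes. The sketch below is a minimal illustration using a synthetic tone and an in-memory buffer instead of the lab's file. The helper `duration_at_speed` and the tone parameters are hypothetical.

```python
import io

import numpy as np
from scipy.io import wavfile

demo_rate = 8000  # hypothetical sampling rate in Hz
t = np.arange(demo_rate) / demo_rate
# A one-second 440 Hz tone stored as 16-bit integers
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)


def duration_at_speed(samples, rate, speed):
    """Write samples at a scaled rate, read them back, return seconds."""
    buffer = io.BytesIO()
    wavfile.write(buffer, int(rate * speed), samples)
    buffer.seek(0)
    read_rate, read_samples = wavfile.read(buffer)
    return len(read_samples) / read_rate


normal = duration_at_speed(tone, demo_rate, 1)
fast = duration_at_speed(tone, demo_rate, 2)
print(normal, fast)
```

Doubling the rate halves the duration, and because the same waveform is played back twice as fast, every frequency in it doubles as well.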

## 1.2 CT Scan Data

The second type of data we will explore is CT data, also known as CAT (pronounced "cat") scan data. A [CT scan](https://www.mayoclinic.org/tests-procedures/ct-scan/about/pac-20393675) is an imaging technique that combines X-ray measurements taken from many positions around the body into cross-sectional images. This kind of data is often stored in the [NIfTI](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/nifti/format.html) file format. The [NiBabel](https://github.com/nipy/nibabel/tree/master) library provides useful tools for viewing this data. Let's start by looking at the `header` of an example file.

In [None]:
path = "data/cat_example-2.nii"

ct_file = nib.load(path)
print(ct_file.header)

The header contains information such as the position offset, the data type, and the size of the data array. Learn more about it [here](https://brainder.org/2012/09/23/the-nifti-file-format/).

The imaging data itself is a 3-dimensional matrix:

In [None]:
ct_matrix = ct_file.get_fdata()
ct_matrix.shape

Let's take a slice of data and view it with [Matplotlib](https://matplotlib.org/):

In [None]:
plt.imshow(ct_matrix[:,:,0], cmap="Greys")

It may be hard to tell, but this is a cross-section of a person's torso. The "C"-shaped object in the image is the bed of the CT machine, so let's rotate this person so they're comfortably lying right side up.
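Rotating a 3D volume in the plane of its first two axes can be previewed on a tiny array. This is a sketch with made-up data, and `demo_volume` and `demo_rotated` are hypothetical names.

```python
import numpy as np

# A 2 x 3 volume with 4 slices along the last axis
demo_volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Rotate 90 degrees in the plane of axes 0 and 1
demo_rotated = np.rot90(demo_volume, k=1, axes=(0, 1))

print(demo_volume.shape, "->", demo_rotated.shape)
print(demo_volume[:, :, 0])
print(demo_rotated[:, :, 0])
```

The first two axes swap lengths, while the slice axis is left untouched: every slice is rotated in exactly the same way as a single 2D image would be.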

In [None]:
ct_matrix90 = np.rot90(ct_matrix, k=1, axes=(0, 1))
plt.imshow(ct_matrix90[:,:,0], cmap="Greys")

Much better! Next, we can animate each frame of the CT scan so we can better identify medical anomalies. Let's define a function, `animate_ct_scan`, that loops through each slice along a specified data axis.

In [None]:
def animate_ct_scan(axis):
    # Use one animation frame per slice along the chosen axis
    frames = ct_matrix90.shape[axis]
    fig, ax = plt.subplots()

    # Define the animation function
    def update(frame):
        ax.clear()
        ct_slice = np.take(ct_matrix90, frame, axis=axis)
        ax.imshow(ct_slice, cmap="Greys")

    ani = animation.FuncAnimation(fig, update, frames=frames, interval=75)

    # Render the animation as interactive HTML for display in the notebook
    return HTML(ani.to_jshtml())
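The `np.take` call inside `update` selects one 2D slice along whichever axis is requested. A minimal sketch on a tiny made-up array (the names here are hypothetical) shows how the slice shape depends on the axis:

```python
import numpy as np

demo_volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)

slice_axis0 = np.take(demo_volume, 1, axis=0)  # same as demo_volume[1, :, :]
slice_axis1 = np.take(demo_volume, 1, axis=1)  # same as demo_volume[:, 1, :]
slice_axis2 = np.take(demo_volume, 1, axis=2)  # same as demo_volume[:, :, 1]

print(slice_axis0.shape, slice_axis1.shape, slice_axis2.shape)
```

Using `np.take` rather than hard-coded indexing lets a single function animate the volume along any of its three axes.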

In [None]:
animate_ct_scan(2)

This animation moves from the top of this person's head down to the base of their spine. The two white areas that appear are their lungs.

Because the data is 3D, we can move along a different axis. The cell below traverses the CT scan from the left arm to the right.

In [None]:
animate_ct_scan(1)

To complete the set, let's see the data from top to bottom:

In [None]:
animate_ct_scan(0)

Biomedical analysis is one of the largest applications of multimodal models. Because of its 3-dimensional shape, CT scan data is often analyzed with a neural network architecture called a [U-Net](https://arxiv.org/abs/1505.04597). U-Nets are an evolution of convolutional neural networks and are used to highlight potentially anomalous regions in an image.
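The "U" in U-Net refers to an encoder that shrinks the image, a decoder that grows it back, and skip connections that join the two at matching resolutions. The sketch below only traces array shapes through that pattern with NumPy. It is not a working U-Net: average pooling and nearest-neighbour upsampling stand in for learned convolutions, and all names and sizes are assumptions.

```python
import numpy as np


def downsample(x):
    """2x2 average pooling on a (height, width, channels) array."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample(x):
    """Nearest-neighbour 2x upsampling of height and width."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


image = np.random.default_rng(0).random((8, 8, 1))

# Encoder path: resolution halves at each step
encoded = downsample(image)        # (4, 4, 1)
bottleneck = downsample(encoded)   # (2, 2, 1)

# Decoder path: upsample, then concatenate the matching encoder output
decoded = np.concatenate([upsample(bottleneck), encoded], axis=-1)  # (4, 4, 2)
restored = np.concatenate([upsample(decoded), image], axis=-1)      # (8, 8, 3)

print(bottleneck.shape, decoded.shape, restored.shape)
```

The output has the same height and width as the input, which is what allows such a network to produce a per-pixel map of regions of interest.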

## Next

Congratulations on finishing this lab! There is a lot of interesting data in the world, and we hope this has piqued your interest in learning more. Hopefully, the experiments from the previous lab are now complete, so please go back and check them out. Before you do, please run the cell below to free up computational resources.

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)
