# CryoEM Example - Synthetic and EMPIAR

### Context

Electron microscopy aims to image particles (e.g., proteins, molecules) at near-atomic resolution. However, high resolution comes at the cost of a low signal-to-noise ratio. Computer vision models therefore need to be trained on noisy datasets in order to detect objects.

The structure of the objects being imaged is calculated by averaging hundreds, if not thousands, of frames of the object in different orientations, rotations and positions. The resulting dataset is consequently large. Usually, the raw images (or images that have undergone some processing, such as motion correction) are uploaded to an online database called EMPIAR.
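
To illustrate why averaging many frames helps, here is a minimal sketch using only the Python standard library. It is a toy model, not the reconstruction procedure used in cryo-EM: it assumes a single pixel with a fixed true value and independent Gaussian noise in every frame, and it ignores the alignment of orientations and positions. The function names and parameter values are illustrative. Under these assumptions, the noise remaining in the average should fall roughly as one over the square root of the number of frames.

```python
import random
import statistics


def averaged_pixel(true_value, noise_sigma, n_frames, rng):
    """Average n_frames noisy observations of a single pixel."""
    samples = [true_value + rng.gauss(0.0, noise_sigma) for _ in range(n_frames)]
    return sum(samples) / n_frames


def residual_noise(n_frames, trials=2000, true_value=1.0, noise_sigma=5.0, seed=0):
    """Empirical standard deviation of the averaged pixel over many trials."""
    rng = random.Random(seed)
    averages = [
        averaged_pixel(true_value, noise_sigma, n_frames, rng)
        for _ in range(trials)
    ]
    return statistics.pstdev(averages)


# Compare the noise left after averaging 1 frame versus 100 frames.
for n in (1, 100):
    print(n, round(residual_noise(n), 2))
```

With a per-frame noise level of 5.0, averaging 100 frames should leave a residual close to 0.5, which is why large numbers of micrographs are needed.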

In this notebook, we use scivision to load memory-friendly synthetic data (AlphabetSoup with a noise filter) and real data from the online database of EM images (EMPIAR), and run them through our pre-trained Object Detection in Images with Noise (odin) model.

## Import and configure packages

In [1]:
import matplotlib.pyplot as plt
import scivision

from matplotlib.colors import LogNorm
from matplotlib.patches import Rectangle

In [None]:
# matplotlib settings
plt.rcParams["figure.figsize"] = (16,16)
plt.rcParams["image.cmap"] = "gray"

## Load the pretrained ODIN model

The model for object detection in noisy images is built with a FastRCNN and pre-trained on the synthetic dataset. With Scivision, we can load this pretrained model in a single line:

In [None]:
# TODO: make odin an installable package
model = scivision.load_pretrained_model("https://github.com/alan-turing-institute/odin")

## Synthetic data example

The repository containing this notebook itself provides a **Scivision datasource**, describing some synthetic data, which we use in the first example.

Load the datasource contained in this directory:

In [None]:
cat = scivision.load_dataset("./")

Inspecting its contents, we see that there are two entries. For this first example, we will use `synthetic_soup`:

In [None]:
list(cat.keys())

In [None]:
cat.synthetic_soup.description

In [None]:
images, bounding_boxes, labels = cat.synthetic_soup.read()

In [None]:
plt.imshow(images[0])

Highlight bounding boxes (ground truth):

In [None]:
def box_to_patch(xmin, ymin, xmax, ymax):
    return Rectangle(
        xy=(xmin, ymin),
        height=ymax - ymin,
        width=xmax - xmin,
        linewidth=1,
        edgecolor=(0,1,0),
        facecolor='none',
    )


plt.imshow(images[0])

ax = plt.gca()
for bbox in bounding_boxes[0]:
    ax.add_patch(box_to_patch(*bbox))

In [None]:
bounding_boxes_synthetic_pred = model.predict(images[0])
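
One common way to compare predictions against the ground truth is intersection over union (IoU). The sketch below assumes the predicted boxes use the same `(xmin, ymin, xmax, ymax)` convention as `box_to_patch`. The return format of `model.predict` is not described in this notebook, so its output may need converting first. The helpers `iou` and `best_match` are illustrative and are not part of scivision or odin.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def best_match(truth_box, predicted_boxes):
    """Highest IoU between one ground-truth box and any predicted box."""
    return max((iou(truth_box, pred) for pred in predicted_boxes), default=0.0)
```

Given predictions in that format, `best_match(bbox, predictions)` could be evaluated for each `bbox` in `bounding_boxes[0]` to see how many ground-truth objects were recovered.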

## EMPIAR example

We now look at the second entry in the catalog, `one_empiar_dataset`, which points to an entry in the [EMPIAR](https://www.ebi.ac.uk/empiar/) archive.

In this example, we load the entry 10050, which has the micrographs (i.e., the EM-acquired images) of the small protein complex Prx3.

![10050-l.gif](attachment:10050-l.gif)
**Figure 1.** Structure of the protein complex Prx3 obtained after averaging the raw data present in the EMPIAR 10050 entry.

This entry in turn contains several sets of images, one of which we load:

In [None]:
keys = list(cat.one_empiar_dataset.keys())
keys

In [None]:
empiar_10050 = cat.one_empiar_dataset["VPP_Prx3_7.3res"]
empiar_10050

For our example, we want to load just one image from the dataset. This is done with [`read_partition()`](https://intake.readthedocs.io/en/latest/roadmap.html?highlight=read_partition#reader-api).

In [None]:
image = empiar_10050.read_partition(0)

In [None]:
plt.imshow(image.sel(frame=2), norm=LogNorm())

In [None]:
bounding_boxes_empiar_pred = model.predict(image)

In [None]:
# TODO plot bounding boxes
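
As a possible starting point for the plotting step, the sketch below filters detections by confidence before they are drawn. It rests on an assumption that this notebook does not confirm: that the predictions can be arranged as a list of `(xmin, ymin, xmax, ymax)` boxes with a parallel list of confidence scores. The helper `boxes_above_threshold` and the example values are hypothetical and are not output from the odin model.

```python
def boxes_above_threshold(boxes, scores, threshold=0.5):
    """Keep the boxes whose confidence score is at least `threshold`."""
    return [box for box, score in zip(boxes, scores) if score >= threshold]


# Made-up values for illustration only; not real model output.
example_boxes = [(10, 20, 50, 80), (5, 5, 15, 15)]
example_scores = [0.9, 0.2]

kept = boxes_above_threshold(example_boxes, example_scores)
print(kept)
```

Each box that survives the filter could then be passed to `box_to_patch(*box)` and added to the current axes with `ax.add_patch`, in the same way as the ground-truth boxes in the synthetic example.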