[piezo]: https://en.wikipedia.org/wiki/Piezoelectric_sensor
[LTS5]: https://lts5www.epfl.ch/
[antenna arrays]: https://en.wikipedia.org/wiki/Phased_array
[DAS]: https://www.semanticscholar.org/paper/Coherent-array-imaging-using-phased-subarrays.-Part-Johnson-Karaman/f2f05db5d4ad3635e8744381df45cacfb97453b0/figure/0
[Tanter and Fink]: https://ieeexplore.ieee.org/ielx7/58/6689765/06689779.pdf?tp=&arnumber=6689779&isnumber=6689765&ref=aHR0cHM6Ly93d3cuZ29vZ2xlLmNoLw==
[directivity]: https://en.wikipedia.org/wiki/Directivity
[Ronneberger et al.]: https://arxiv.org/abs/1505.04597
[Perdios et al.]: https://ieeexplore.ieee.org/abstract/document/8580183
[residual]: https://ieeexplore.ieee.org/ielx7/42/8124116/07947200.pdf?tp=&arnumber=7947200&isnumber=8124116&ref=aHR0cHM6Ly9pZWVleHBsb3JlLmllZWUub3JnL2Fic3RyYWN0L2RvY3VtZW50Lzc5NDcyMDA=
[Montaldo et al.]: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4816058

## Introduction
In our research at [LTS5][LTS5] we investigate deep learning for medical
ultrasound (US) image reconstruction, with a strong emphasis on convolutional
neural networks (NNs). In this hands-on session we will share some of our
insights and results, while also discussing some important and, hopefully,
useful general basics of deep learning.

In the next few paragraphs we provide some context about ultrasound image
reconstruction and explain why we think deep learning might be very useful for
it. Ultrasound experts and deep learning purists can skip all paragraphs of the
introduction except for the last one, where we describe the task of the
hands-on session.

### Ultrasound basics

US imaging is a widely used medical imaging modality, which is relatively
cheap, highly flexible (potentially portable) and usually non-invasive.
Additionally, in strong contrast to other widely used medical imaging
modalities such as MRI and CT, it is real-time capable.

To reconstruct a US image, pulse-echo measurements are conducted: US waves are
transmitted into the tissue of interest and the reflections caused by local
acoustic impedance inhomogeneities are measured. The time of flight between
sending a wave and receiving its echo provides information about the position
of the reflector, and the intensity of the reflection gives information about
the reflector's acoustic properties (its acoustic impedance, i.e. density times
speed of sound, relative to that of the surrounding tissue).
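
As a small numerical illustration of these two relations, the sketch below converts a round-trip time of flight into reflector depth and computes the normal-incidence pressure reflection coefficient between two media. It assumes a constant speed of sound of 1540 m/s (a value commonly assumed for soft tissue); the function names and the impedance values are illustrative only.

```python
SPEED_OF_SOUND = 1540.0  # m/s, assumed constant (common soft-tissue value)


def reflector_depth(time_of_flight, c=SPEED_OF_SOUND):
    """Depth (m) of a reflector from the round-trip time of flight (s).

    The wave travels to the reflector and back, hence the factor 1/2.
    """
    return c * time_of_flight / 2.0


def reflection_coefficient(z1, z2):
    """Pressure reflection coefficient at normal incidence, going from a medium
    with acoustic impedance z1 into one with impedance z2 (Z = density * c)."""
    return (z2 - z1) / (z2 + z1)


# An echo arriving 100 microseconds after transmission
print(f"depth: {reflector_depth(100e-6) * 100:.1f} cm")  # depth: 7.7 cm
# Two illustrative impedances (MRayl): the larger the mismatch, the stronger the echo
print(f"R = {reflection_coefficient(1.70, 1.38):.3f}")
```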

For transmitting and receiving waves, US probes (transceivers) are used. They
consist of an array of [piezoelectric][piezo] transducer elements, which
generate pressure waves (sound) from electrical signals and also perform the
reciprocal conversion. Arrays of piezoelectric elements are used for
beamforming, i.e. to focus on specific points inside the tissue and to steer
the waves along different angles, analogous to [antenna arrays][antenna arrays].
Amongst the most widely used US probe types are the linear array, the convex
array and the phased array.

![](figures/transducers.jpg)
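
To make "focus" and "steer" concrete, here is a minimal NumPy sketch of the per-element transmit delays of a linear array. It assumes point-like elements on the line z = 0, a constant speed of sound of 1540 m/s and a hypothetical 128-element probe with 0.3 mm pitch. Steering a plane wave by an angle theta delays element i by x_i sin(theta) / c; focusing on a point delays each element so that all wavefronts arrive there at the same time.

```python
import numpy as np


def element_positions(n_elements, pitch):
    """Lateral positions (m) of a linear array centred at x = 0."""
    return (np.arange(n_elements) - (n_elements - 1) / 2) * pitch


def steering_delays(x_elem, angle, c=1540.0):
    """Transmit delays (s) that launch a plane wave tilted by `angle` (rad)."""
    tau = x_elem * np.sin(angle) / c
    return tau - tau.min()  # shift so that the earliest element fires at t = 0


def focusing_delays(x_elem, x_focus, z_focus, c=1540.0):
    """Transmit delays (s) so that all wavefronts arrive at (x_focus, z_focus) together."""
    dist = np.hypot(x_elem - x_focus, z_focus)
    return (dist.max() - dist) / c  # the farthest element fires first


x = element_positions(128, 0.3e-3)         # assumed: 128 elements, 0.3 mm pitch
tx_steer = steering_delays(x, np.deg2rad(10.0))
tx_focus = focusing_delays(x, 0.0, 30e-3)  # focus at 30 mm depth on the array axis
print(f"max steering delay: {tx_steer.max() * 1e6:.2f} us")
print(f"max focusing delay: {tx_focus.max() * 1e6:.2f} us")
```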

Traditionally, a focused-wave scheme is used to sample the full region of
interest (ROI): the ROI is split into sub-regions and a focused pulse-echo
measurement is conducted for each of them.

![](figures/conventional_trans.png)![](figures/conventional_rec.png)

After $N$ pulse-echo measurements (one for each sub-region) we end up with $N$
voltage vs. time series at each transducer element. To reconstruct an image
from these measurements, the [delay-and-sum][DAS] (DAS) method is usually used.
It reconstructs each point of the image by first shifting each element's
voltage vs. time series in time (delay) by the travel time of the transmitted
wave to that point inside the tissue plus the travel time of the echo from that
point back to said element. The shifted time series of all transducer elements
are then summed to get the value of the image point. In the example US image
below a convex array was used, which is why the image has the characteristic
fan (sector) shape.

![](figures/echographybaby.jpg)
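
The delay-and-sum step for a single image point can be sketched as follows. This is a simplified illustration, not our beamformer: it assumes the elements lie on the line z = 0, a constant speed of sound, linear interpolation between samples, and that the transmit travel time `t_tx` to the point is supplied by the caller (it depends on the transmit scheme).

```python
import numpy as np


def das_point(rf, fs, x_elem, x, z, t_tx, c=1540.0, t0=0.0):
    """Delay-and-sum value of one image point (x, z).

    rf     : (n_elements, n_samples) received voltage vs. time series
    fs     : sampling frequency (Hz)
    x_elem : (n_elements,) lateral element positions (m), elements at z = 0
    t_tx   : travel time (s) of the transmitted wave to the point (x, z)
    t0     : time (s) of the first recorded sample relative to transmission
    """
    sample_axis = np.arange(rf.shape[1])
    t_rx = np.hypot(x - x_elem, z) / c              # point -> each element
    delays = (t_tx + t_rx - t0) * fs                 # fractional sample indices
    # "Delay": read each trace at its own delay (linear interpolation)
    delayed = np.array([np.interp(d, sample_axis, trace, left=0.0, right=0.0)
                        for d, trace in zip(delays, rf)])
    # "Sum": add up the aligned contributions of all elements
    return delayed.sum()
```

A full image is obtained by evaluating this for every image point; in practice the computation is vectorized rather than looped point by point.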
  
### Ultrafast ultrasound imaging

In ultrafast US imaging ([Tanter and Fink][Tanter and Fink]) an 
unfocused plane or diverging wave (PW/DW) is sent into the tissue, which 
thereby insonifies the entire ROI at once. The reflections coming from all 
over the ROI are measured simultaneously using the transducer elements and are 
then combined using the delay-and-sum (DAS) method to form a full US image from 
a single pulse-echo measurement.

![](figures/ultrafast_trans.png)![](figures/ultrafast_rec.png)
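
For a single plane wave, the transmit travel time to a pixel has a closed form: for a PW steered by an angle theta, t_tx = (x sin(theta) + z cos(theta)) / c. The sketch below beamforms a whole image from one transmission with this formula. It assumes that t = 0 is the instant the wavefront passes the array centre (x = 0, z = 0), elements at z = 0 and a constant c; it is an illustrative NumPy version, not an optimized beamformer.

```python
import numpy as np


def das_plane_wave(rf, fs, x_elem, x_grid, z_grid, angle=0.0, c=1540.0, t0=0.0):
    """Beamform a full image from ONE plane-wave transmission.

    rf : (n_elements, n_samples) real RF or complex IQ data
    Returns an array of shape (len(z_grid), len(x_grid)).
    Assumed convention: t = 0 when the plane wavefront passes (x, z) = (0, 0).
    """
    X, Z = np.meshgrid(x_grid, z_grid)                       # pixel coordinates
    t_tx = (X * np.sin(angle) + Z * np.cos(angle)) / c       # transmit path (closed form)
    sample_axis = np.arange(rf.shape[1])
    image = np.zeros(X.shape, dtype=np.result_type(rf.dtype, np.float64))
    for xe, trace in zip(x_elem, rf):                        # accumulate element by element
        t_rx = np.hypot(X - xe, Z) / c                       # receive path
        idx = (t_tx + t_rx - t0) * fs                        # fractional sample indices
        image += np.interp(idx.ravel(), sample_axis, trace,
                           left=0.0, right=0.0).reshape(X.shape)
    return image
```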
 
Thus, compared to conventional imaging, where many pulse-echo measurements
have to be conducted to reconstruct an image, only a single measurement is
necessary to form a full US image. This has the potential to drastically
increase the achievable frame rate (remember: US imaging is real-time) and the
temporal resolution (potentially highly accurate velocity measurements), and to
lower the energy consumption. Entirely new imaging modes have been unlocked by
this approach, such as:
 
 - Shear wave elastography
 - Functional US imaging
 - High-sensitivity vector-flow imaging

Unfortunately, mainly because a much smaller overall amount of energy is used
to insonify the tissue, an image reconstructed from a single ultrafast US
measurement is usually of rather low quality (LQ). Below we show some simulated
LQ example images reconstructed using single PW ultrafast imaging, compared to
their high-quality (HQ) counterparts.
 
 ![](figures/lq_examples.png)
 
Most of the typical artifacts found in US images are strongly related to the
[directivity][directivity] of the probe array used:
 
 - Higher side lobes -> lower contrast
 - Broader main lobe -> lower resolution
 - Potential grating lobes (due to spatial undersampling by the array) -> can devastate entire image regions
   - Ghost artifacts -> can lead to misinterpretations
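
The link between array geometry and these artifacts can be explored with the far-field array factor of a uniform linear array. The sketch below makes simplifying assumptions (point-like elements, continuous-wave single-frequency excitation, no apodization, an assumed 5 MHz centre frequency), so it only approximates the behaviour of a real pulsed probe. With a pitch of lambda/2 only the main lobe and side lobes appear; with a pitch of 2 lambda, a grating lobe as strong as the main lobe appears at sin(theta) = lambda / pitch = 0.5, i.e. at 30 degrees.

```python
import numpy as np


def array_factor(theta, n_elements, pitch, wavelength, steer=0.0):
    """Normalized far-field array factor |AF(theta)| of a uniform linear array.

    Assumptions: point-like elements, continuous-wave (single-frequency)
    excitation, uniform weighting, beam steered towards `steer` (rad).
    """
    k = 2 * np.pi / wavelength                                  # wavenumber
    x = (np.arange(n_elements) - (n_elements - 1) / 2) * pitch  # element positions
    phase = k * np.outer(np.sin(theta) - np.sin(steer), x)      # (n_theta, n_elements)
    return np.abs(np.exp(1j * phase).sum(axis=1)) / n_elements


wavelength = 1540.0 / 5e6                          # assumed: 5 MHz, c = 1540 m/s
off_axis = np.deg2rad(np.linspace(10, 90, 1601))   # directions away from the main lobe
for pitch in (wavelength / 2, 2 * wavelength):
    peak = array_factor(off_axis, 64, pitch, wavelength).max()
    print(f"pitch = {pitch / wavelength:.1f} lambda: strongest off-axis lobe "
          f"{20 * np.log10(peak):.1f} dB")
```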
 
One image feature that may be viewed as an image artifact is the intensity
fluctuation pattern called "speckle", which is very common in ultrasound
images. However, unlike noise, speckle actually contains significant
information about sub-resolution scatterers and can thus be used for
post-processing, such as blood flow measurement. Therefore, it is desirable
for image reconstruction methods to preserve this pattern.
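
The origin of speckle can be illustrated with a classic statistical model (an idealization, not our simulator): within each resolution cell, the echoes of many randomly placed sub-resolution scatterers add up with random phases. The resulting envelope is approximately Rayleigh distributed, with a mean-to-standard-deviation ratio of sqrt(pi / (4 - pi)), about 1.91. The number of scatterers per cell and the Gaussian amplitude model below are assumptions.

```python
import numpy as np


def speckle_envelope(n_pixels, scatterers_per_cell, rng):
    """Envelope of simulated fully developed speckle (idealized model).

    Each pixel's echo is the sum of many sub-resolution scatterer echoes with
    random amplitudes and uniformly distributed random phases.
    """
    amplitudes = rng.standard_normal((n_pixels, scatterers_per_cell))
    phases = rng.uniform(0.0, 2 * np.pi, (n_pixels, scatterers_per_cell))
    echo = (amplitudes * np.exp(1j * phases)).sum(axis=1)  # complex sum per pixel
    return np.abs(echo)                                    # envelope detection


rng = np.random.default_rng(0)
envelope = speckle_envelope(50_000, 50, rng)
snr = envelope.mean() / envelope.std()
print(f"speckle SNR: {snr:.2f} (Rayleigh theory: {np.sqrt(np.pi / (4 - np.pi)):.2f})")
```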
 
### From LQ to HQ image

There are several methods to improve the quality of the LQ images obtained
by ultrafast US imaging. The state-of-the-art method is called coherent
compounding ([Montaldo et al.][Montaldo et al.]), where several pulse-echo
measurements are conducted instead of a single one. The PW of each pulse-echo
measurement is steered at a different angle (using the array's beamforming
capability) to lower the influence of side lobes. The resulting images are
summed coherently, i.e. before envelope detection, so that phase information
is preserved. Using a sufficient number of PWs (usually >30) results in
high-quality images.
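
The compounding step itself is a complex average over the beamformed images of all steering angles. In the toy example below (an illustration under strong assumptions, not real data), a true scatterer produces the same phase in every angle, while off-axis clutter such as side-lobe contributions is modeled as having a random phase per angle and pixel; averaging keeps the scatterer and suppresses the clutter roughly by the square root of the number of angles. Real side lobes are not random, but they change with the steering angle, which is what compounding exploits.

```python
import numpy as np


def coherent_compounding(iq_images):
    """Coherently compound beamformed images acquired with different PW angles.

    iq_images : (n_angles, nz, nx) complex beamformed (IQ) images.
    The complex images are averaged BEFORE envelope detection.
    """
    return np.abs(np.mean(iq_images, axis=0))


rng = np.random.default_rng(1)
n_angles, nz, nx = 31, 64, 64
# Toy data: clutter of amplitude 0.3 with a random phase per angle and pixel ...
iq = 0.3 * np.exp(1j * rng.uniform(0, 2 * np.pi, (n_angles, nz, nx)))
# ... plus one true scatterer that adds up in phase for every angle
iq[:, nz // 2, nx // 2] = 1.0
single = np.abs(iq[0])
compounded = coherent_compounding(iq)
print(f"median clutter, single PW: {np.median(single):.3f}")
print(f"median clutter, {n_angles} PWs:   {np.median(compounded):.3f}")
```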

While the number of transmissions used in coherent compounding is
significantly lower than in conventional US imaging, it still sacrifices frame
rate for image quality. This works against the original strength of ultrafast
US imaging. So, clearly, it would be interesting to find a method that improves
image quality without requiring a large number of transmissions.
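
The trade-off can be quantified with a back-of-the-envelope calculation: before the next transmission, one has to wait at least until the echoes from the deepest point have returned, i.e. 2 * depth / c. The sketch below assumes an imaging depth of 5 cm, c = 1540 m/s and 128 focused lines for the conventional scheme, and ignores processing time, so the numbers are upper bounds. Under these assumptions it gives roughly 120 Hz for conventional imaging, about 497 Hz for 31-angle compounding and 15.4 kHz for a single PW.

```python
def max_frame_rate(n_transmissions, depth=0.05, c=1540.0):
    """Upper bound on the frame rate (Hz) for n pulse-echo measurements per image."""
    round_trip = 2 * depth / c  # time (s) for the deepest echo to return
    return 1.0 / (n_transmissions * round_trip)


schemes = {
    "conventional (128 focused lines)": 128,  # assumed line count
    "coherent compounding (31 PWs)": 31,
    "single plane wave": 1,
}
for name, n_tx in schemes.items():
    print(f"{name:<34} {max_frame_rate(n_tx):8.0f} Hz")
```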

### About deep learning

When should we use it? The first question to ask is: do I really need deep
learning, i.e. is there no state-of-the-art method that can provide what I
want? Because of the "black-box" nature of deep learning, it comes with a huge
need for validation and testing, which is generally more demanding than for
classical alternatives. Secondly, deep learning always requires a large,
high-quality dataset, which is often difficult or even impossible to acquire.

In biomedical imaging this is probably even more difficult than in other
application areas of deep learning. A proper dataset not only has to be large
(generally, the more samples the better), it also needs to be of good quality.
This means that the samples in the dataset must have a high diversity, such
that they properly represent the distribution of all possible samples.

A first approach would be to build an in-vivo dataset, i.e. real US data
gathered from patients. This requires the proper equipment, a large number of
voluntary participants and trained personnel who have the time to acquire
thousands of images. It is imperative to include many different patients to
achieve high diversity in the dataset.

Secondly, one could think about an in-vitro dataset, i.e. a dataset constructed
by imaging phantoms in the laboratory. This comes with an even higher need for
equipment (various phantoms), but does not require any participants. Again,
achieving high diversity is quite hard, because a plethora of different
in-vitro phantoms must be used for data generation.

The last approach is to simulate the data. The huge advantage here is that,
if a proper simulator is at hand, one can generate a number of images that is
only limited by the available time. Additionally, both equipment requirements
and working hours can be kept minimal. High diversity in simulated data can
easily be achieved by introducing randomness in the generated images. However,
one has to be extra careful when using simulated data in deep learning and
needs to constantly test the performance of trained networks on in-vivo and
in-vitro test data, to be sure that no errors are introduced by the simulation.
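
As an example of how randomness can provide diversity, the sketch below draws a random numerical phantom described by scatterer positions and amplitudes, with a random number of circular inclusions of random size and echogenicity. It is a hypothetical illustration: the function `random_phantom`, its parameters and value ranges are assumptions, and it is not the in-house simulator used in this session. Such a phantom would still have to be passed to an acoustic simulator to obtain pulse-echo data.

```python
import numpy as np


def random_phantom(rng, n_scatterers=20_000, width=0.04, depth=(0.005, 0.05),
                   max_inclusions=3):
    """Hypothetical random numerical phantom (positions in m, relative amplitudes).

    Background scatterers are uniformly distributed in the region of interest;
    a random number of circular inclusions with random radius and echogenicity
    then scale the amplitudes of the scatterers they contain.
    """
    x = rng.uniform(-width / 2, width / 2, n_scatterers)
    z = rng.uniform(depth[0], depth[1], n_scatterers)
    amp = rng.standard_normal(n_scatterers)
    for _ in range(rng.integers(0, max_inclusions + 1)):
        cx = rng.uniform(-width / 2, width / 2)
        cz = rng.uniform(depth[0], depth[1])
        radius = rng.uniform(0.002, 0.008)
        gain = 10 ** (rng.uniform(-30.0, 10.0) / 20)  # -30 dB (nearly anechoic) to +10 dB
        inside = (x - cx) ** 2 + (z - cz) ** 2 < radius ** 2
        amp[inside] *= gain
    return x, z, amp


rng = np.random.default_rng(42)
phantoms = [random_phantom(rng) for _ in range(3)]  # every call yields a new phantom
```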

### Task description

To summarize, we saw that ultrafast US imaging allows us to obtain a US image
from a single pulse-echo measurement. Unfortunately, such images are of rather
low quality (LQ). While methods exist to improve the image quality of ultrafast
US images, they always compromise on one of its key strengths, e.g. its high
frame rate in the case of coherent compounding. Obviously, we'd love to find a
method with which we can improve the achieved image quality without having to
perform additional measurements.

In this session we will use deep learning, specifically a convolutional NN,
to improve the quality of said LQ images obtained by ultrafast US imaging.
For training we rely purely on simulated US images, generated using an
in-house simulator. The inputs to our network are the LQ images. As reference
(label) images we use HQ images obtained from synthetic aperture (SA)
beamforming, which is an extreme variant of the coherent compounding method
described earlier.

The network architecture ([Perdios et al.]) we will use is a variant of the 
famous U-Net ([Ronneberger et al.][Ronneberger et al.]), adapted for image 
reconstruction.

 ![](figures/res_unet.png)
 
At this point we want to emphasize two main differences between our
architecture and the conventional U-Net. The first difference is that we use
"same" padding in all convolutional layers instead of "valid" padding. The
reason is simple: with "valid" padding every convolution shrinks the feature
maps, so the network output only covers the central part of its input. While
this may be acceptable in segmentation (the original use of the U-Net), it is
not for image reconstruction, since a large part (the border) of the network
input image would be lost.
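
The effect of the padding choice can be checked with a small size calculation. The sketch below assumes the layout of the original U-Net (two 3x3 convolutions per level, 2x2 max pooling, 2x2 up-convolutions, four down-sampling steps); it is not a description of our exact network. With "valid" padding it reproduces the shrinkage from a 572-pixel input to a 388-pixel output reported for the original U-Net, whereas with "same" padding the output has the size of the input.

```python
def unet_output_size(input_size, depth=4, convs_per_level=2, kernel=3, padding="valid"):
    """Spatial output size of a U-Net with 2x2 pooling and 2x2 up-convolutions."""
    shrink = 0 if padding == "same" else kernel - 1  # pixels lost per "valid" conv
    size = input_size
    for _ in range(depth):                # contracting path
        size -= convs_per_level * shrink
        if size % 2:
            raise ValueError("feature map size must be even before pooling")
        size //= 2
    size -= convs_per_level * shrink      # bottleneck
    for _ in range(depth):                # expanding path
        size = size * 2 - convs_per_level * shrink
    return size


print(unet_output_size(572))                  # 388, as in the original U-Net
print(unet_output_size(512, padding="same"))  # 512
```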
 
The second significant difference is the use of a residual connection, which
consists of adding the input image to the output of the last convolutional
layer inside the network. This way, our network only has to learn the
difference between the input and the reference (label) image. This is a very
useful technique for cases where network input and output are inherently
similar (e.g. denoising tasks). Residual connections are a well-known tool to
improve the learning performance of NNs (e.g. [Hu Chen et al.][residual]).
They can, however, only be applied if the input and output of the network have
the same dimensions.
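
A residual connection can be illustrated with a one-layer toy "network" in NumPy: the output is the input plus a learned correction. The single 3x3 convolution here is a hypothetical stand-in for the whole network body, not our architecture. It shows one intuition for why residual learning helps when input and output are similar: with all weights at zero, the network already reproduces its input. The shape requirement is visible as well, since the addition only works because "same" padding keeps the output as large as the input.

```python
import numpy as np


def conv2d_same(image, kernel):
    """2D cross-correlation (the "convolution" of CNNs) with zero "same" padding.

    Assumes an odd-sized kernel so that the output has the shape of the input.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out


def residual_net(lq_image, kernel):
    """Toy residual network: HQ estimate = LQ input + learned correction."""
    correction = conv2d_same(lq_image, kernel)
    return lq_image + correction  # requires identical input/output shapes


rng = np.random.default_rng(0)
lq = rng.standard_normal((64, 64))
untrained = np.zeros((3, 3))  # all weights zero
print(np.allclose(residual_net(lq, untrained), lq))  # True: starts as the identity
```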
 
Now you should know everything you need to get started with the actual hands-on part.

Enjoy!


### Data exploration awaits you in the [next notebook][next].

[next]: dlssus_data_exploration.ipynb

Or you can go back to the [outline].

[outline]: dlssus_main.ipynb