In [1]:
from imgseries import ImgSeries
from imgseries.analysis import Analysis, Formatter
from imgseries.analysis import PandasFormatter, PandasTsvJsonResults
from imgseries.viewers import AnalysisViewer

import pandas as pd
import matplotlib.pyplot as plt

%matplotlib tk

# Create custom analysis

Here, we show how to create a custom analysis tool for image series.
We will use a minimal example, where the analysis simply consists of extracting the minimum and maximum pixel values in each image.

Two classes have to be created, deriving from the following classes:
- `Analysis`: defines the calculation that is made on each image,
- `Formatter`: defines how to format and store the analysis (e.g. in a pandas dataframe).

There are also optional classes that can be subclassed:
- `Results`: gives access to the results and analysis metadata, and can store them into (and load them from) files,
- `AnalysisViewer`: displays the analysis live and allows inspecting the results afterwards.

Below we will show step by step how to construct the analysis classes, using a simple image sequence to analyze:

In [2]:
images = ImgSeries('../data/img1')
images

ImgSeries, data length [30]
-- corrections: []
-- transforms: []
from FileSeries in . / ['../data/img1'], 30 files]

## 1) Define how the analysis is made (on a single image): `Analysis`

In [3]:
class BasicMinMax(Analysis):
    """Analysis of min and max pixel values in an image series"""

    # If results are independent (results for one num do not depend on the
    # analysis of other nums), there is no need to redo the analysis when
    # asking for the same num twice, and parallel computing is possible
    independent_results = True

    def _analyze(self, img):
        """What to do on the image. Must return a dict of data"""
        val_min = img.min()
        val_max = img.max()
        return {'min': val_min, 'max': val_max}

Now the analysis can be tested on any image of the image sequence, identified by its number (index) in the sequence (`num`):

In [4]:
minmax = BasicMinMax(images)
minmax.analyze(num=20)

{'min': 27, 'max': 255, 'num': 20}

Note that the image number (`num`) was automatically added to the data dictionary.
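
To see what `_analyze()` computes without loading any files, here is a minimal sketch on a synthetic image. It assumes that images are passed as NumPy arrays; the `fake_analyze()` function and the way it adds `num` are hypothetical stand-ins, not the library's actual code.

```python
import numpy as np

# Synthetic 8-bit grayscale image (assumption: images arrive as NumPy arrays)
img = np.array([[27, 120, 255],
                [64, 200, 31]], dtype=np.uint8)


def fake_analyze(img, num):
    """Hypothetical stand-in mimicking the dict returned by analyze()."""
    data = {'min': img.min(), 'max': img.max()}  # same computation as _analyze()
    data['num'] = num                            # image number added afterwards
    return data


result = fake_analyze(img, num=20)
print(result)
```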

## 2) Define how to store the sequence of results in a table or data structure: `Formatter`

### 2.1) General formatter

This is the role of the `Formatter` class. This class must define three obligatory methods and two optional ones:

**Obligatory**
- `_prepare_data_storage()`: How to create the data structure
- `_store_data()`: How to include the raw analysis data generated by `_analyze()` (see above) in the data structure
- `_to_results_data()`: return final data structure that will be stored in an `analysis.Results` object.

**Optional**
- `_regenerate_analysis_data()`: Basically the inverse function to `_store_data()` (useful for live viewing, see Viewers below)
- `_to_metadata()`: return metadata to be saved

In [5]:
class MinMaxFormatter(Formatter):

    # -------------------------- Obligatory methods --------------------------

    def _prepare_data_storage(self):
        """Prepare structure(s) that will hold the analyzed data"""
        self.data = pd.DataFrame(columns=('min', 'max'))
        self.data.index.name = 'num'  # convenient to use image# as index

    def _store_data(self, data):
        """How to store data generated by analysis on a single image.

        Input
        -----
        data is a dictionary, output of Analysis.analyze()
        """
        self.data.loc[data['num']] = (data['min'], data['max'])

    def _to_results_data(self):
        """How to pass stored data into an AnalysisResults class/subclass.

        For most simple cases, just store the final version of your data
        structure in results.data
        """
        return self.data

    # --------------------------- Optional methods ---------------------------

    def _regenerate_analysis_data(self, num):
        """OPTIONAL, how to move back from data structure to data dict.

        Basically the inverse of _store_data()
        (num is added automatically in the data afterwards)
        """
        data = {}
        data['min'] = self.analysis.results.data.loc[num, 'min']
        data['max'] = self.analysis.results.data.loc[num, 'max']
        return data

    def _to_metadata(self):
        """Metadata saving (dict); typically analysis parameters"""
        return {'info': 'Add your metadata here'}
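
The relation between `_store_data()` and `_regenerate_analysis_data()` can be checked on a plain pandas dataframe, outside of any formatter. The sketch below only reproduces the pandas operations used in the two methods; the variable names (`table`, `data_in`, `data_out`) are illustrative.

```python
import pandas as pd

# Same storage layout as in _prepare_data_storage()
table = pd.DataFrame(columns=('min', 'max'))
table.index.name = 'num'

# Same operation as in _store_data(): one row per analyzed image
data_in = {'min': 27, 'max': 255, 'num': 20}
table.loc[data_in['num']] = (data_in['min'], data_in['max'])

# Same operation as in _regenerate_analysis_data(): rebuild the dict from the row
num = 20
data_out = {'min': table.loc[num, 'min'], 'max': table.loc[num, 'max']}

print(data_out)
```

The regenerated dict contains the same `min` and `max` values that were stored, which is what the viewers rely on when inspecting results after the analysis.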

The formatter class needs to be included as a class attribute in the analysis class:

In [6]:
class MinMax(BasicMinMax):
    Formatter = MinMaxFormatter

Now you can run the analysis on a (sub-)sequence of the images and see the results:

In [7]:
minmax = MinMax(images)
minmax.run(skip=2)
minmax.results.data.head()

100%|██████████| 15/15 [00:00<00:00, 297.51it/s]


| num | min | max |
|-----|-----|-----|
| 0 | 26 | 255 |
| 2 | 27 | 255 |
| 4 | 28 | 255 |
| 6 | 28 | 255 |
| 8 | 29 | 255 |
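
The progress bar reports 15 iterations for a series of 30 images, and the analyzed numbers are 0, 2, 4, and so on. This is consistent with `skip` acting like the step of a Python `range`; under that assumption, the analyzed image numbers can be listed as follows.

```python
n_images = 30   # length of the image series above
skip = 2        # assumed to act like the step of range()

nums = list(range(0, n_images, skip))
print(len(nums))   # 15
print(nums[:5])    # [0, 2, 4, 6, 8]
```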


### 2.2) Pandas formatter

When the data is easily stored in a single pandas dataframe, it is often more convenient to use a pre-defined pandas formatter, which takes care of most of the logic above.

The methods that need to be overridden are now:

**Obligatory**
- `_column_names()`: indicate the name of the columns for the data (excluding num, file, folder etc.)
- `_data_to_results_row()`: generate an iterable of data that fits in the defined columns

**Optional**
- `_results_row_to_data()`: Basically the inverse function to `_data_to_results_row()` [optional, see Viewer section below]
- `_to_metadata()`: return metadata to be saved

As can be seen below, the formatter is now much more compact!

In [8]:
class PandasMinMaxFormatter(PandasFormatter):

    # --------------------------- Required methods ---------------------------

    def _column_names(self):
        """Columns of the analysis data (iterable)"""
        return ('min', 'max')

    def _data_to_results_row(self, data):
        """Generate iterable of data that fits in the defined columns."""
        return (data['min'], data['max'])

    # --------------------------- Optional methods ---------------------------

    def _results_row_to_data(self, row):
        """Go from row of data to raw data (useful for post-analysis inspection)"""
        return {'min': row['min'], 'max': row['max']}

    def _to_metadata(self):
        """Metadata saving (dict); typically analysis parameters"""
        return {'info': 'Add your metadata here'}

Again, include the formatter as a class attribute:

In [9]:
class MinMax(BasicMinMax):
    Formatter = PandasMinMaxFormatter

In [10]:
minmax = MinMax(images)
minmax.run(skip=2)
minmax.results.data.head()

100%|██████████| 15/15 [00:00<00:00, 291.19it/s]


| num | folder | filename | time (unix) | min | max |
|-----|--------|----------|-------------|-----|-----|
| 0 | ../data/img1 | img-00610.png | 1696408000.0 | 26 | 255 |
| 2 | ../data/img1 | img-00612.png | 1696408000.0 | 27 | 255 |
| 4 | ../data/img1 | img-00614.png | 1696408000.0 | 28 | 255 |
| 6 | ../data/img1 | img-00616.png | 1696408000.0 | 28 | 255 |
| 8 | ../data/img1 | img-00618.png | 1696408000.0 | 29 | 255 |


Note that in the results above, the unix time is extracted automatically from the image files. To import real time data of the images (if available), see `ImgSeries.load_times()`.
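
The `time (unix)` column holds seconds since the Unix epoch, which pandas can convert to readable timestamps. The sketch below uses one value copied from the table above (which appears rounded in the display) and is independent of imgseries.

```python
import pandas as pd

# One value taken from the 'time (unix)' column (seconds since the epoch)
times = pd.Series([1696408000.0], name='time (unix)')

# Convert to (timezone-naive, UTC-based) timestamps
timestamps = pd.to_datetime(times, unit='s')
print(timestamps.iloc[0])   # 2023-10-04 08:26:40
```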

Finally, once you have decided on a formatter, you can set it as the default formatter of your analysis class through the `Formatter` class attribute (as done with `MinMax` above), so that it does not have to be specified every time.

## 3) How to save/load results and metadata to/from files: `Results` [optional]

In order to be able to use `save()` / `load()` with actual data, either:
- subclass the `Results` class to define the `_save_data()` and `_load_data()` methods (automatically called by `save()` and `load()`), or
- use a pre-defined `Results` subclass (e.g. `PandasTsvJsonResults`, which saves pandas data to .tsv files)

In both cases, it is also possible to set a class attribute `default_filename` that sets the filename (without extension) that is used when calling `load()` or `save()` without arguments; the filename impacts both the data file (e.g. TSV) and the metadata file (JSON, see above).
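
To give an idea of what a data file plus metadata file pair looks like, here is a minimal sketch that writes a dataframe to a TSV file and a dict to a JSON file, then reads both back. It uses only pandas and the standard library; the file layout and extensions are assumptions for illustration, not the actual implementation of `PandasTsvJsonResults`.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

data = pd.DataFrame({'min': [26, 27], 'max': [255, 255]}, index=[0, 2])
data.index.name = 'num'
metadata = {'info': 'Add your metadata here'}

with tempfile.TemporaryDirectory() as folder:
    # Common base name without extension (same role as default_filename)
    basename = Path(folder) / 'MinMax_Results'

    # Save: one tab-separated data file and one JSON metadata file
    data.to_csv(basename.with_suffix('.tsv'), sep='\t')
    with open(basename.with_suffix('.json'), 'w') as f:
        json.dump(metadata, f, indent=4)

    # Load both files back
    loaded_data = pd.read_csv(
        basename.with_suffix('.tsv'), sep='\t', index_col='num'
    )
    with open(basename.with_suffix('.json')) as f:
        loaded_metadata = json.load(f)

print(loaded_data)
print(loaded_metadata)
```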

Below is an example of use of `PandasTsvJsonResults` as the results class.

In [11]:
class MinMaxResults(PandasTsvJsonResults):
    """Results class that uses pandas to save to .tsv files"""
    default_filename = 'MinMax_Results'


class MinMax(BasicMinMax):
    """Analysis class which uses the above Results class"""
    Formatter = PandasMinMaxFormatter
    Results = MinMaxResults

In [12]:
minmax = MinMax(images, savepath='../data/untracked_data')
minmax.run(skip=2)
minmax.results.data.head()

100%|██████████| 15/15 [00:00<00:00, 309.53it/s]


| num | folder | filename | time (unix) | min | max |
|-----|--------|----------|-------------|-----|-----|
| 0 | ../data/img1 | img-00610.png | 1696408000.0 | 26 | 255 |
| 2 | ../data/img1 | img-00612.png | 1696408000.0 | 27 | 255 |
| 4 | ../data/img1 | img-00614.png | 1696408000.0 | 28 | 255 |
| 6 | ../data/img1 | img-00616.png | 1696408000.0 | 28 | 255 |
| 8 | ../data/img1 | img-00618.png | 1696408000.0 | 29 | 255 |


In [13]:
minmax.results.save()









## 4) How to view and inspect results: `AnalysisViewer` [optional]

It is often convenient to view the analysis in real time, or inspect the results afterward in an interactive manner. In order to do so, it is possible to subclass `AnalysisViewer`.

**IMPORTANT NOTE**: if the live view does not appear, try calling `plt.ion()` beforehand, or `plt.show()` after the commands that require interactive matplotlib graphs. You can also try changing matplotlib's backend.
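
A quick way to diagnose display problems is to check which backend is active and whether interactive mode is on. The sketch below uses only standard matplotlib calls; in a notebook, the backend itself is usually selected with the `%matplotlib` magic, as done with `%matplotlib tk` in the first cell.

```python
import matplotlib
import matplotlib.pyplot as plt

# Name of the backend currently in use (depends on your installation)
print(matplotlib.get_backend())

plt.ion()                    # switch pyplot to interactive mode
print(plt.isinteractive())   # True
```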

### *Live view of analysis*

Here is a minimal example in which we plot the images and the detected minimum of each image in real time during the analysis.

A viewer must define the following methods:

**Obligatory**:
- `_create_figure()`
- `_first_plot()`
- `_update_plot()`

If `_create_figure()` creates (empty) axes called `self.ax_img` to display the images from the image series, one can directly use
- `_create_image()`
- `_update_image()`

to manage the initial image plot and its updates.

In [24]:
class MinMaxViewer(AnalysisViewer):

    def _create_figure(self):
        """Must define self.fig and self.axs"""
        self.fig, self.axs = plt.subplots(2, 1)
        self.ax_img, self.ax_analysis = self.axs

    def _first_plot(self, data):
        """What to do when the first frame is displayed
        --> create curves and image objects etc.

        data is what comes out of Analysis.analyze() (dict);

        the 'image' and 'num' keys are automatically added by
        Analysis.analyze()
        (compared to the raw results of _analyze()).

        Must define self.updated_artists as an iterable of
        matplotlib artists that will be updated in subsequent
        frames.
        """
        self._create_image(data)

        # current analysis data
        # For the moment, we do not store any value in the analysis pt
        # It will be added by _update_plot() called just below
        # This is useful to just indicate graphical formatting
        self.pt, = self.ax_analysis.plot([], [], 'o', c='b')

        # Previous analyzed data stored in analysis.results (static)
        if self.analysis.results.data is not None:
            self.ax_analysis.plot(
                self.analysis.results.data.index,
                self.analysis.results.data['min'],
            )

        # all live analysis data : here we do a simple dict to store data in the
        # viewer, but it can be better to get directly the data from the
        # formatter's data instead of storing it again here.
        # Same as above, actual data will be plotted by _update_plot()
        self.analysis_data = {}
        self.curve, = self.ax_analysis.plot([], [], '.b', alpha=.5)

        # This is required due to the strategy above to not plot things
        # immediately but set the curves to have no data.
        self._update_plot(data)

        # Useful only if blitting is used
        self.updated_artists = (self.pt, self.curve, self.imshow)

    def _update_plot(self, data):
        """What to do upon iterations of the plot after the first time."""
        # Update displayed image and image number
        self._update_image(data)

        # Update plot with current data
        # One has to manage the case where data does not contain analysis
        # data (e.g. because one asks to display a num not previously analyzed)
        min_val = data.get('min', None)
        if min_val is None:
            # One can hide the point to show that no analysis is available
            self.pt.set_visible(False)
            return

        # Update current analysis point
        num = data['num']
        self.pt.set_data((num,), (min_val,))
        self.pt.set_visible(True)

        # Update existing analysis points
        self.analysis_data[num] = min_val
        nums = list(self.analysis_data.keys())
        vals = list(self.analysis_data.values())
        self.curve.set_data((nums, vals))

        # Adapt analysis axes to fit new data
        self._autoscale(self.ax_analysis)
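
The strategy used in the viewer above (create artists without data, then fill them on each update) can be tried in isolation with plain matplotlib. In this sketch, `ax.relim()` followed by `ax.autoscale_view()` is assumed to play a role similar to the viewer's `_autoscale()` method, and the `Agg` backend is chosen only so that the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Create the artist empty: only the graphical formatting is set at this point
curve, = ax.plot([], [], '.b', alpha=0.5)

analysis_data = {}
for num, min_val in [(0, 26), (2, 27), (4, 28)]:
    # Accumulate the data and push it to the existing artist
    analysis_data[num] = min_val
    curve.set_data(list(analysis_data.keys()), list(analysis_data.values()))

    # Rescale the axes to fit the new data (plain-matplotlib equivalent,
    # assumed similar in spirit to _autoscale())
    ax.relim()
    ax.autoscale_view()

print(ax.get_xlim(), ax.get_ylim())
```

Updating existing artists instead of re-plotting keeps the number of matplotlib objects constant, which is also what makes blitting possible.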

Again, the viewer must be included as a class attribute:

In [25]:
class MinMax_WithViewer(MinMax):
    """Analysis class with live view option"""
    Formatter = PandasMinMaxFormatter
    Results = MinMaxResults
    Viewer = MinMaxViewer

In [30]:
minmax = MinMax_WithViewer(images, savepath='../data/untracked_data')

# Analysis with live view (analysis stops and records data if window is closed)
minmax.animate(live=True, save=True)

<matplotlib.animation.FuncAnimation at 0x2a005c070>

The analysis from the live view is directly available after using the animate function with `live=True` enabled:

In [27]:
minmax.results.data.tail()

| num | folder | filename | time (unix) | min | max |
|-----|--------|----------|-------------|-----|-----|
| 12 | ../data/img1 | img-00622.png | 1696408000.0 | 29 | 255 |
| 13 | ../data/img1 | img-00623.png | 1696408000.0 | 26 | 255 |
| 14 | ../data/img1 | img-00624.png | 1696408000.0 | 27 | 255 |
| 15 | ../data/img1 | img-00625.png | 1696408000.0 | 28 | 255 |
| 16 | ../data/img1 | img-00626.png | 1696408000.0 | 29 | 255 |


### *Interactive inspection after analysis*

In order to be able to use the `analysis.show()`, `analysis.inspect()` and `analysis.animate()` tools after the analysis has run, the `Formatter` used by the analysis must define the `_regenerate_analysis_data()` method (or `_results_row_to_data()` for a `PandasFormatter`; see above). This method creates a dict of data similar to that made by `analysis.analyze()`, but from stored data instead of live analysis data.

The viewer is the same as the one used for the live view of the analysis (see above).

In [31]:
minmax.show(num=10)

array([<Axes: title={'center': 'img #10'}>, <Axes: >], dtype=object)

In [29]:
minmax.animate()

<matplotlib.animation.FuncAnimation at 0x29ee0d780>

In [None]:
minmax.inspect()

<filo.viewers.KeyPressSlider at 0x2a03be320>


**NOTE**: it is possible to load results and inspect them directly by using `analysis.regenerate()`, see examples done in the Contour Tracking and Grey Level Analysis notebooks.