# PVInspect meta data plugins

PVInspect comes with a simple plugin mechanism for customizing how meta data is loaded and saved along with image data.

In [None]:
!git clone https://github.com/ma0ho/pvinspect.git
%cd pvinspect
!git checkout rework
!pip install .

In [None]:
import pvinspect as pv
import pandas as pd
from pvinspect.data.meta import MetaDriver
from pathlib import Path

Let's first download some demo data:

In [None]:
!gdown --id 16O4Wf_aGNuiiUw8qDFYKQgqmtXqEvPx4
!unzip pvinspect_demo_images

This demo dataset provides its labels in a `labels.csv` file:

In [None]:
pd.read_csv("images/labels.csv", sep=";").head()

Unnamed: 0,filename,defect probability,wafer,crack,inactive,blob,finger,testset
0,cell0001.png,1.0,mono,1,0,0,0,0
1,cell0002.png,1.0,mono,1,0,0,1,0
2,cell0003.png,1.0,mono,1,0,0,0,0
3,cell0004.png,0.0,mono,0,0,0,0,0
4,cell0005.png,1.0,mono,0,0,1,1,0
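
The `Unnamed: 0` column appears because the file apparently stores the row index in a first column without a header name. A minimal sketch, using a two-row in-memory stand-in for `labels.csv` (the sample text is made up to mirror the output above, not read from the real file), shows how `index_col=0` would turn that column into the index instead:

```python
import io
import pandas as pd

# Hypothetical stand-in for the first lines of images/labels.csv;
# the leading ";" in the header means the first column has no name.
sample = io.StringIO(
    ";filename;defect probability;wafer;crack\n"
    "0;cell0001.png;1.0;mono;1\n"
    "1;cell0004.png;0.0;mono;0\n"
)

# Default behaviour: pandas names the header-less column "Unnamed: 0".
default = pd.read_csv(sample, sep=";")

# Rewind the buffer and read again, using the first column as the index.
sample.seek(0)
indexed = pd.read_csv(sample, sep=";", index_col=0)

print(list(default.columns))
print(list(indexed.columns))
```

The rest of this notebook keeps the default, which is why `Unnamed: 0` also shows up in the loaded meta data.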


We now want to load the images along with the meta data from the CSV file. To this end, we implement a custom `MetaDriver` that controls how the meta data is loaded:

In [None]:
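class CSVMetaDriver(MetaDriver):
    """Read meta data from a `labels.csv` file stored next to the images."""

    def read_sequence_meta(self, path):
        # `path` is the image directory; the filename column is renamed to
        # `original_filename`, which `read_images` uses to find the images
        meta = pd.read_csv(path / "labels.csv", sep=";")
        return meta.rename(columns={"filename": "original_filename"})

    def read_image_meta(self, path):
        # `path` points to a single image; look up its row in the sequence meta
        meta = self.read_sequence_meta(path.parent)
        # boolean indexing avoids quoting problems with unusual file names;
        # `.iloc[0]` raises an IndexError if no row matches
        return meta[meta["original_filename"] == path.name].iloc[0]

    def save_sequence_meta(self, path, sequence):
        # saving is not implemented in this example
        raise NotImplementedError()

    def save_image_meta(self, path, image):
        raise NotImplementedError()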
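
The two read steps can be tried in isolation, without PVInspect or any files on disk. The following is only a sketch: the CSV text is a hypothetical stand-in for `labels.csv`, reduced to a few columns, and the lookup uses plain boolean indexing to find the row for one file name.

```python
import io
import pandas as pd

# Hypothetical stand-in for labels.csv (values chosen for illustration).
csv_text = (
    ";filename;wafer;crack\n"
    "0;cell0001.png;mono;1\n"
    "1;cell0002.png;mono;1\n"
    "2;cell2620.png;poly;0\n"
)

# Step 1 (as in read_sequence_meta): load the table and rename the column.
meta = pd.read_csv(io.StringIO(csv_text), sep=";").rename(
    columns={"filename": "original_filename"}
)

# Step 2 (as in read_image_meta): pick the first row matching one file name.
name = "cell2620.png"
row = meta[meta["original_filename"] == name].iloc[0]

print(row["wafer"], row["crack"])
```

The `CSVMetaDriver` applies the same two steps to the `labels.csv` file found in the image directory.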

The driver can be used immediately to load the data:

In [None]:
seq = pv.data.io.read_images(Path("images"), with_meta=True, meta_driver=CSVMetaDriver(), lazy=True)

Under the hood, PVInspect passes the path argument of `read_images` to the `read_sequence_meta` method of the custom meta driver. The method returns a `pd.DataFrame` in which the `original_filename` column holds the name of the corresponding image file and the remaining columns carry the additional meta data. `read_images` then uses `original_filename` from the data frame to load the images. Since `lazy` is set, the image data is only loaded when it is actually needed. The meta data, however, is available right away:

In [None]:
seq.meta

Unnamed: 0,original_filename,defect probability,wafer,crack,inactive,blob,finger,testset
0,cell0001.png,1.0,mono,1,0,0,0,0
1,cell0002.png,1.0,mono,1,0,0,1,0
2,cell0003.png,1.0,mono,1,0,0,0,0
3,cell0004.png,0.0,mono,0,0,0,0,0
4,cell0005.png,1.0,mono,0,0,1,1,0
...,...,...,...,...,...,...,...,...
2619,cell2620.png,0.0,poly,0,0,0,0,0
2620,cell2621.png,0.0,poly,0,0,0,0,0
2621,cell2622.png,0.0,poly,0,0,0,0,0
2622,cell2623.png,0.0,poly,0,0,0,0,0
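
Assuming `seq.meta` behaves like the ordinary `pd.DataFrame` displayed above, the labels can be explored with standard pandas indexing. The sketch below works on a small hand-made frame with the same column names (the rows are illustrative; how a filtered table maps back to images in PVInspect is not covered here):

```python
import pandas as pd

# Hand-made stand-in for seq.meta with a subset of its columns.
meta = pd.DataFrame({
    "original_filename": ["cell0001.png", "cell0004.png", "cell0005.png", "cell2620.png"],
    "defect probability": [1.0, 0.0, 1.0, 0.0],
    "wafer": ["mono", "mono", "mono", "poly"],
    "crack": [1, 0, 0, 0],
})

# Select rows with wafer == "mono" that have the crack label set.
cracked_mono = meta[(meta["wafer"] == "mono") & (meta["crack"] == 1)]

# Count rows per wafer type.
counts = meta["wafer"].value_counts()

print(cracked_mono["original_filename"].tolist())
print(counts.to_dict())
```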
