# Tutorial 6. Subclassing the `DataLoader` object
## We need to populate the `Layer` object!
In this tutorial, we will learn how to subclass the `DataLoader` class. This is usually not necessary, since `XYDLoader` and `ImageLoader` already cover most scientific data formats (about 95%, in the author's estimate). Still, as an exercise we will implement an `EBSDLoader` class that loads EBSD data more conveniently.

This library is highly modular, so it needs a central class that combines the functionality offered by the other classes. The `Layer` object does exactly that.

Let's start with the most important detail:
"The provided data is loaded into the container by the dataloader. The sampling distortion of the loaded data is corrected by the transformer."

At the moment, the only available container is `Container2D`; a dedicated container for 3D-sampled data will be released later. Our goal is therefore to populate `Container2D` correctly. To do this, you need a `DataLoader` that converts the provided data into the format the container expects.

## First, let's visit the `Container2D` object!
The `Container2D` class is initialized with x, y, and data columns. x and y are 1-dimensional array-like objects, while the data is either a structured array or a 2-dimensional array. If the data is not a structured array, column names are generated automatically as `Channel_0`, `Channel_1`, and so on.

`Container2D` is a subclass of the NumPy structured array, so you can use any NumPy function that works with structured arrays.

In [None]:
# Build a Container2D from x, y, and a plain 2-D data array
from pyxc.core.container import Container2D
import numpy as np

x = np.linspace(0, 1, 10)
y = np.linspace(2, 10, 10)
data = np.random.random((10, 3))
example_container = Container2D(x_raw=x, y_raw=y, data=data)

As you can see, example_container is now initialized correctly.

In [None]:
example_container

We did not provide a structured array, so the column names for the data were generated automatically: `Channel_0`, `Channel_1`, and `Channel_2`.

In [None]:
example_container.dtype.names
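
To illustrate what "works with structured arrays" means in practice, here is a minimal sketch that uses plain NumPy only, with no `pyxc` objects involved. The `Channel_<i>` names are set by hand to mirror the automatic naming described above; how `Container2D` generates them internally is not shown here.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Plain 2-D array: 10 samples, 3 channels (same shape as the example above)
rng = np.random.default_rng(0)
plain = rng.random((10, 3))

# Convert to a structured array with explicit field names.
# The names imitate the tutorial's automatic naming; here they are set manually.
names = [f"Channel_{i}" for i in range(plain.shape[1])]
structured = rfn.unstructured_to_structured(plain, names=names)

# Field access by name returns that column as a 1-D array
first_channel = structured["Channel_0"]

# A boolean mask built from one field selects whole records
bright = structured[structured["Channel_0"] > 0.5]

print(structured.dtype.names)
print(first_channel.shape, bright.shape)
```

The same field access and masking should carry over to any subclass of the structured array, which is why they are useful to know before working with the container.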

## Subclassing the `DataLoader` class to load special data
Okay, we've been using EBSD data as an example for a while. Let's write a data loader class that loads EBSD data directly. We will assume we need only X, Y, and Euler 1-3.

First, we do this with the default `XYDLoader`. Note that we use `column_parser` to extract the x, y, and data columns.

In [None]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography

EBSD = np.genfromtxt(
    "./data/SiC_in_NiSA.ctf", dtype=float, skip_header=15, delimiter="\t", names=True
)

layer_ebsd = Layer(
    data=column_parser(EBSD, format_string="dxy__ddd", return_unspecified=False),
    container=Container2D,
    dataloader=XYDLoader,
    transformer=Homography,
)
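
The format string `"dxy__ddd"` deserves a closer look. The sketch below shows one plausible reading of it in plain NumPy: each character labels the column at the same position, with `x` and `y` marking coordinates, `d` marking data, and `_` marking a skipped column. This reading is an assumption, `split_by_format` is a hypothetical helper and not `column_parser` itself, and the column names are illustrative only.

```python
import numpy as np


def split_by_format(table, format_string):
    """Hypothetical helper: assign one format character to each leading column.

    'x' and 'y' mark coordinate columns, 'd' marks data columns, and '_'
    skips a column. Columns past the end of the format string are ignored.
    """
    x_name = y_name = None
    data_names = []
    for char, name in zip(format_string, table.dtype.names):
        if char == "x":
            x_name = name
        elif char == "y":
            y_name = name
        elif char == "d":
            data_names.append(name)
        elif char != "_":
            raise ValueError(f"unknown format character: {char!r}")
    if x_name is None or y_name is None:
        raise ValueError("format string must contain both 'x' and 'y'")
    # Indexing with a list of field names keeps only those fields
    return table[x_name], table[y_name], table[data_names]


# Toy structured array with illustrative column names (an assumption)
columns = ["Phase", "X", "Y", "Bands", "Error", "Euler1", "Euler2", "Euler3", "MAD"]
toy = np.zeros(4, dtype=[(name, float) for name in columns])
toy["X"] = [0.0, 1.0, 0.0, 1.0]
toy["Y"] = [0.0, 0.0, 1.0, 1.0]

x, y, data = split_by_format(toy, "dxy__ddd")
print(data.dtype.names)
```

Under this reading, the first column is also kept as data because the string starts with `d`, and the trailing column is dropped because the format string is shorter than the table.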

We can also move the whole parsing logic into the `DataLoader` subclass itself: implement it in the `prep()` method.

In [None]:
from pyxc.core.loader import DataLoaderBase
from pyxc.core.processor.arrays import xyd_splitter, column_parser


class EBSDLoader(DataLoaderBase):
    """A subclass of DataLoaderBase for loading and preprocessing EBSD data.
    The column parsing is done in prep(), so the raw EBSD array can be passed in directly.
    """

    def prep(self, data):
        x_serial, y_serial, prepped_data = xyd_splitter(
            column_parser(data, "dxy__ddd", return_unspecified=False)
        )
        return x_serial.flatten(), y_serial.flatten(), prepped_data

Now you can see that the EBSDLoader class can directly handle the EBSD data.

In [None]:
EBSDLoader(Container2D, EBSD)()
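
To see why overriding a single `prep()` method can be enough, here is a minimal sketch of the hook pattern with stand-in classes. Nothing in it is `pyxc` code: `SketchLoaderBase`, `TableLoader`, and `dict_container` are hypothetical names, and only the call shape (construct with a container and data, then call the instance) is modeled on the cell above. How `DataLoaderBase` actually works internally is not covered by this tutorial.

```python
import numpy as np


class SketchLoaderBase:
    """Hypothetical stand-in for a loader base class (not pyxc's DataLoaderBase)."""

    def __init__(self, container, data):
        self.container = container
        self.data = data

    def prep(self, data):
        # Subclasses override this hook and return (x, y, data)
        raise NotImplementedError

    def __call__(self):
        # The base class owns the workflow; only prep() differs per format
        x, y, prepped = self.prep(self.data)
        return self.container(x, y, prepped)


class TableLoader(SketchLoaderBase):
    """Treats column 0 as x, column 1 as y, and the remaining columns as data."""

    def prep(self, data):
        data = np.asarray(data, dtype=float)
        return data[:, 0], data[:, 1], data[:, 2:]


def dict_container(x, y, data):
    # Stand-in container: any callable that accepts (x, y, data) works here
    return {"x": x, "y": y, "data": data}


table = np.arange(12.0).reshape(3, 4)
loaded = TableLoader(dict_container, table)()
print(loaded["x"], loaded["data"].shape)
```

The design choice is that format-specific knowledge lives in one small method, so supporting a new file format means writing a new `prep()` rather than a new loading workflow.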