# Look inside ImageDataLoaders
> What is it? What's in it? Let's dig into this handy class by looking at the actual source code.
- author: "Chansung Park"
- image: images/ImageDataLoaders/from_path_func.png
- image: images/disney/td.png
- toc: true
- comments: true
- categories: [image, data, fastai]
- permalink: /image_data_loaders/
- badges: false
- search_exclude: true

![from_path_func](ImageDataLoaders/from_path_func.png)
[ ImageDataLoaders.from_path_func ]

# Let's see the definition
`ImageDataLoaders` is a basic wrapper around several **`DataLoader`**s with factory methods for computer vision problems. Don't worry about the **`delegates`** decorator for now; I will explain what it is later.

In a nutshell, `ImageDataLoaders` provides a set of handy class methods for defining the datasets to be fed into a model. As the plural form of the name indicates, it holds more than one dataset, so the training, validation, and test datasets can be managed in one place.

```python
class ImageDataLoaders(DataLoaders):
    @classmethod
    @delegates(DataLoaders.from_dblock)
    def from_folder(...)

    @classmethod
    @delegates(DataLoaders.from_dblock)
    def from_path_func(...)

    @classmethod
    def from_name_func(...)

    @classmethod
    def from_path_re(...)
    
    @classmethod
    @delegates(DataLoaders.from_dblock)
    def from_name_re(...)

    @classmethod
    @delegates(DataLoaders.from_dblock)
    def from_df(...)

    @classmethod
    def from_csv(...)

    @classmethod
    @delegates(DataLoaders.from_dblock)
    def from_lists(...)
```


The names of the class methods are pretty much self-explanatory. Just remember that the suffix **`_xxx`** tells you how the label of each item is defined. For instance, **`from_folder`** defines the label of each item by looking up the name of its folder. **`from_path_func`** is more flexible than **`from_folder`**: instead of relying on the folder name, we can write a function that extracts which part of the path should be used as the label.
- With **`from_folder`**, the directory structure must strictly follow the layout below (the folder names for training/validation can be changed).
```
  - top_directory - training
                  - validation
```
- When the directory structure looks like the one below, **`from_folder`** can't be used. But you can still pick the part of the path to be used as the label via **`from_path_func`**.
```
  - top_directory - training - training
                  - validation - validation
```
  - This example looks silly, but you will soon realize that many datasets are structured this way. You could either move the files into the parent directory or simply use **`from_path_func`**.

Another cool method is **`from_path_re`**. It lets you define labels by leveraging the power of regular expressions. Even though you could implement your own regex parsing logic in **`from_path_func`**, **`from_path_re`** saves you from the somewhat annoying boilerplate of setting up the regex yourself.
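To make the regex idea concrete, here is a minimal sketch using only Python's `re` module. It is not fastai's implementation. The pattern `pat`, the helper `regex_label`, and the file names are all hypothetical and only illustrate how a capture group can pull a label out of a path.

```python
import re

# Hypothetical pattern: capture everything in the file name before "_<number>.jpg"
pat = r'([^/]+)_\d+\.jpg$'

def regex_label(path_str, pat=pat):
    """Return the first capture group of `pat` found in `path_str`."""
    match = re.search(pat, path_str)
    if match is None:
        raise ValueError(f"pattern {pat!r} did not match {path_str!r}")
    return match.group(1)

# Hypothetical file names, with the label encoded in the name itself
print(regex_label('images/great_pyrenees_12.jpg'))  # great_pyrenees
print(regex_label('images/Bengal_7.jpg'))           # Bengal
```

With **`from_path_re`** you would only supply the pattern and leave the matching to the library.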

## Each method

The list below gives the description of each class method, taken from the official documentation.

- **from_folder(...)**
  - Create from imagenet style dataset in `path` with `train` and `valid` subfolders (or provide `valid_pct`)
- **from_path_func(...)**
  - Create from list of `fnames` in `path`s with `label_func`
- **from_name_func(...)**
  - Create from the name attrs of `fnames` in `path`s with `label_func`
- **from_path_re(...)**
  - Create from list of `fnames` in `path`s with re expression `pat`
- **from_name_re(...)**
  - Create from the name attrs of `fnames` in `path`s with re expression `pat`
- **from_df(...)**
  - Create from `df` using `fn_col` and `label_col`
- **from_csv(...)**
  - Create from `path/csv_fname` using `fn_col` and `label_col`
- **from_lists(...)**
  - Create from list of `fnames` and `labels` in `path`

## Example usage with `from_path_func`
The example below is borrowed from the [official fastai documentation](https://docs.fast.ai/vision.data.html#ImageDataLoaders.from_path_func).

```python
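from fastai.vision.all import ImageDataLoaders

path = 'top directory'    # placeholder: root directory of the dataset
fnames = 'list of files'  # placeholder: a list of image file paths (pathlib.Path objects)

def label_func(x):
    # x is the path of a single file; the name of its parent folder becomes the label
    return x.parent.name

dls = ImageDataLoaders.from_path_func(path=path,
                                      fnames=fnames,
                                      label_func=label_func)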
```

The example passes three arguments to **`from_path_func`**. **`path`** is the path of the root directory of the project. **`fnames`** is a list containing all the data files; it is fine for the list to include files stored in different sub-directories. Which label each file belongs to is determined by the **`label_func`** function.

Let's look inside the **`label_func`** function. It simply returns the name of the parent directory. For instance, if the path of a file is **`datasets/train/image1.png`**, **`label_func`** returns **`train`** as the label for **`image1.png`**.
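The behavior of `label_func` can be checked without fastai, since it only relies on `pathlib`. The snippet below is a small sketch; apart from the path mentioned above, the file paths are made up for illustration.

```python
from pathlib import Path

def label_func(x):
    # The name of the parent folder becomes the label
    return x.parent.name

# The first path is the one from the text; the other two are hypothetical
fnames = [Path('datasets/train/image1.png'),
          Path('datasets/cat/image2.png'),
          Path('datasets/pets/dog/image3.png')]

labels = [label_func(f) for f in fnames]
print(labels)  # ['train', 'cat', 'dog']
```

Note that only the immediate parent folder matters, no matter how deeply the file is nested.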

# Let's look inside one of them, `from_folder(...)`

```python
    @classmethod
    @delegates(DataLoaders.from_dblock)
    def from_folder(cls, path, 
                    train='train', valid='valid', 
                    valid_pct=None, seed=None, vocab=None, 
                    item_tfms=None, batch_tfms=None, **kwargs):
        
        splitter = GrandparentSplitter(train_name=train, valid_name=valid) \
                   if valid_pct is None \
                   else RandomSplitter(valid_pct, seed=seed)
        
        get_items = get_image_files \
                    if valid_pct \
                    else partial(get_image_files, folders=[train, valid])
                
        dblock = DataBlock(blocks=(ImageBlock, CategoryBlock(vocab=vocab)),
                           get_items=get_items,
                           splitter=splitter,
                           get_y=parent_label,
                           item_tfms=item_tfms,
                           batch_tfms=batch_tfms)
        
        return cls.from_dblock(dblock, path, path=path, **kwargs)
```

## GrandparentSplitter function

```python
def _grandparent_idxs(items, name):
    def _inner(items, name): 
        return mask2idxs(Path(o).parent.parent.name == name for o in items)
    return [i for n in L(name) for i in _inner(items,n)]

def GrandparentSplitter(train_name='train', valid_name='valid'):
    "Split `items` from the grand parent folder names (`train_name` and `valid_name`)."
    def _inner(o):
        return _grandparent_idxs(o, train_name),_grandparent_idxs(o, valid_name)
    return _inner
```

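`GrandparentSplitter` is a function that defines a nested function, `_inner()`, and returns it. When `_inner()` is later called with the list of items, it returns a tuple of two lists of indices: the first marks the items whose grandparent folder is named `train_name`, and the second marks the items whose grandparent folder is named `valid_name`.

Both lists are built by `_grandparent_idxs`, so we should look inside that function as well. For every item, it checks whether `Path(o).parent.parent.name` equals the given name, and it passes the resulting boolean mask to `mask2idxs`, which turns the mask into the indices of the matching items. Because `name` is wrapped in `L(...)`, a list of folder names works too.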
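The same idea can be sketched in plain Python. This is a simplified stand-in, not the fastai code: it uses a list comprehension instead of `mask2idxs`, handles a single folder name only, and the names `grandparent_idxs`, `grandparent_splitter`, and the file list are hypothetical.

```python
from pathlib import Path

def grandparent_idxs(items, name):
    # Indices of the items whose grandparent folder is called `name`
    return [i for i, o in enumerate(items) if Path(o).parent.parent.name == name]

def grandparent_splitter(train_name='train', valid_name='valid'):
    # Returns a function that splits `items` into (train indices, valid indices)
    def _inner(items):
        return grandparent_idxs(items, train_name), grandparent_idxs(items, valid_name)
    return _inner

# Hypothetical files laid out as <split>/<label>/<file>
items = ['data/train/cat/1.png', 'data/valid/dog/2.png',
         'data/train/dog/3.png', 'data/valid/cat/4.png']

train_idxs, valid_idxs = grandparent_splitter()(items)
print(train_idxs, valid_idxs)  # [0, 2] [1, 3]
```

The splitter never looks at the label folder (`cat`, `dog`); it only cares about the folder two levels above each file.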


`@delegates` is a decorator from the `fastcore` package. It replaces `**kwargs` in the signature of the decorated function with the keyword parameters (those with default values) of the function specified in `(...)`.
- If you print out the signature of `from_folder`, the parameters hidden behind `**kwargs` are revealed.
- Please look at the picture below, borrowed from [fastcore: An Underrated Python Library](https://fastpages.fast.ai/fastcore/) by [Hamel Husain](https://twitter.com/HamelHusain).

![delegates](ImageDataLoaders/delegates.png)
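To get a feel for what such a decorator does, here is a simplified sketch built on the standard `inspect` module. It is not fastcore's implementation, and `delegates_sketch`, `from_dblock_like`, and `from_folder_like` are hypothetical names used only for illustration.

```python
import inspect

def delegates_sketch(to):
    """Simplified stand-in for `delegates`: expose `to`'s defaulted
    parameters in the decorated function's signature instead of **kwargs."""
    def decorator(f):
        sig = inspect.signature(f)
        params = dict(sig.parameters)
        var_kw = [n for n, p in params.items()
                  if p.kind is inspect.Parameter.VAR_KEYWORD]
        if not var_kw:
            return f  # nothing to replace
        params.pop(var_kw[0])  # drop the **kwargs entry
        for name, p in inspect.signature(to).parameters.items():
            # Only copy parameters that have a default and are not already present
            if p.default is not inspect.Parameter.empty and name not in params:
                params[name] = p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
        f.__signature__ = sig.replace(parameters=list(params.values()))
        return f
    return decorator

def from_dblock_like(source, bs=64, shuffle=True):  # hypothetical target
    return source, bs, shuffle

@delegates_sketch(from_dblock_like)
def from_folder_like(path, train='train', **kwargs):  # hypothetical caller
    return from_dblock_like(path, **kwargs)

print(inspect.signature(from_folder_like))
# (path, train='train', *, bs=64, shuffle=True)
```

Only the reported signature changes; the function body still receives the extra arguments through `**kwargs` and forwards them as before.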