## KMNIST Dataset
- <https://github.com/rois-codh/kmnist>
- <https://pytorch.org/vision/stable/datasets.html#kmnist>
- <https://www.tensorflow.org/datasets/catalog/kmnist>

In [6]:
from pathlib import Path
from torchvision.datasets import KMNIST
from torchvision.transforms import ToTensor

In [9]:
type(KMNIST)

type

```python
KMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)
```

This call does the following:

- It runs `KMNIST.__init__()`, which creates the folder given by `root` if it does not already exist and, because `download=True`, downloads the KMNIST files into it.
- `train=True` makes the returned object the training set; `train=False` returns the test set instead.
- `transform=ToTensor()` is applied to each image when a sample is accessed: it converts the image to a `float32` tensor of shape `(1, 28, 28)` with values scaled to `[0, 1]`.
- A single download fetches both the training and the test sets.
  - So when you later load the test set by calling the same class with `train=False`, you can pass `download=False`.
  - If you pass `download=True` anyway, nothing is downloaded again as long as `root` already holds a copy.


In [10]:
path_pytorch_dataset = Path.home() / "datasets/pytorch"
train_data = KMNIST(
    root=path_pytorch_dataset,
    train=True,
    download=True,
    transform=ToTensor(),
)
train_data, type(train_data)

(Dataset KMNIST
     Number of datapoints: 60000
     Root location: /home/phunc20/datasets/pytorch
     Split: Train
     StandardTransform
 Transform: ToTensor(),
 torchvision.datasets.mnist.KMNIST)

In [25]:
test_data = KMNIST(
    root=path_pytorch_dataset,
    train=False,
    download=True,
    transform=ToTensor(),
)
test_data, type(test_data)

(Dataset KMNIST
     Number of datapoints: 10000
     Root location: /home/phunc20/datasets/pytorch
     Split: Test
     StandardTransform
 Transform: ToTensor(),
 torchvision.datasets.mnist.KMNIST)

In [14]:
def non_dunder(obj):
    return [s for s in dir(obj) if not s.startswith("__")]

In [15]:
non_dunder(train_data)

['_check_exists',
 '_format_transform_repr',
 '_repr_indent',
 'class_to_idx',
 'classes',
 'data',
 'download',
 'extra_repr',
 'processed_folder',
 'raw_folder',
 'resources',
 'root',
 'target_transform',
 'targets',
 'test_data',
 'test_file',
 'test_labels',
 'train',
 'train_data',
 'train_labels',
 'training_file',
 'transform',
 'transforms']

In [20]:
train_data.data

tensor([[[  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         ...,
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0]],

        [[  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         ...,
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0]],

        [[  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         ...,
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0]],

        ...,

        [[  0,   0,   0,  ...,   0,   0,   0],
         [  0,   0,   0,  ...,   0,   0,   0]

In [26]:
train_data.data.shape, test_data.data.shape

(torch.Size([60000, 28, 28]), torch.Size([10000, 28, 28]))
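
These shapes have no channel dimension because `.data` holds the raw images, before any `transform` is applied. As a rough sketch of the relationship, assuming a synthetic `uint8` batch `raw` in place of `train_data.data`, adding a channel axis and dividing by 255 mirrors what `ToTensor()` yields for each image:

```python
import torch

# Hypothetical stand-in for `train_data.data`: 4 raw images, uint8, no channel axis.
raw = torch.randint(0, 256, (4, 28, 28), dtype=torch.uint8)

# Add a channel axis and scale to [0, 1], mirroring ToTensor() on each image.
scaled = raw.unsqueeze(1).to(torch.float32) / 255

print(raw.shape, raw.dtype)
print(scaled.shape, scaled.dtype)
```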