EMNIST dataset splits: labels do not match images for some splits, and one split raises an index out-of-range error. #2630

@kh-mo

🐛 Bug

Running the same code on each EMNIST split gives different results:

  • 'byclass', 'bymerge', 'balanced': the labels do not match the images.
  • 'letters': raises an index out-of-range error on the class list.
  • 'digits', 'mnist': work as expected.

To Reproduce

import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision
from torchvision import transforms

def imshow(img):
    # Convert the CxHxW tensor to HxWxC, the layout matplotlib expects.
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

def show_img_with_gt(split):
    batch_size = 16
    # Rotate and flip each image so the characters are displayed upright.
    dset_tr = torchvision.datasets.EMNIST(root="./data", split=split, download=True, train=True,
                                          transform=transforms.Compose([lambda img: torchvision.transforms.functional.rotate(img, -90),
                                                                        transforms.RandomHorizontalFlip(p=1),
                                                                        transforms.ToTensor()]))
    dset_loader = torch.utils.data.DataLoader(dset_tr, batch_size=batch_size)
    # Take the first batch only.
    image, label = next(iter(dset_loader))
    imshow(torchvision.utils.make_grid(image))
    # Look up each label in dset_tr.classes; this lookup is where the mismatch or error shows up.
    print('GroundTruth: ', ' '.join('%5s' % dset_tr.classes[label[j]] for j in range(batch_size)))

show_img_with_gt("byclass")
show_img_with_gt("bymerge")
show_img_with_gt("balanced")
show_img_with_gt("letters")
show_img_with_gt("digits")
show_img_with_gt("mnist")
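
To narrow down the 'letters' failure, it may help to compare the label range against the length of the class list. The following is only a diagnostic sketch: `check_label_range` is a hypothetical helper (not part of torchvision), it runs on toy data, and it assumes, without confirmation in this report, that the error comes from labels running past the end of `classes`.

```python
def check_label_range(labels, classes):
    """Report whether every integer label is a valid index into classes.

    Hypothetical helper for diagnosis; not part of torchvision.
    """
    lo, hi = min(labels), max(labels)
    return {
        "min_label": lo,
        "max_label": hi,
        "num_classes": len(classes),
        "in_range": 0 <= lo and hi < len(classes),
    }

# Toy data standing in for a real split (values are illustrative only).
toy_classes = list("abc")
ok = check_label_range([0, 1, 2, 1], toy_classes)
bad = check_label_range([1, 2, 3], toy_classes)  # labels overrun the class list

print(ok)
print(bad)
```

On a real split, the same check could be run as `check_label_range(dset_tr.targets.tolist(), dset_tr.classes)`. Note that it only detects out-of-range labels; it would not reveal a class list that has the right length but the wrong order.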

[Six attached images]

cc @pmeier
