Omniglot Dataset #323
This is a loader for the Omniglot dataset.
One use case for this dataset is one-shot learning, where we sample pairs of images from the dataset and train a neural network to learn a similarity metric over each pair.
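As a rough illustration of the pair-sampling described above (not the code from this PR — the `class_to_images` layout and `sample_pair` helper are hypothetical), a pair is labeled 1 when both images come from the same character class and 0 otherwise:

```python
import random

def sample_pair(class_to_images, same_class):
    """Sample an image pair for one-shot/similarity training.

    class_to_images: dict mapping class name -> list of images
                     (a hypothetical layout, for illustration only).
    Returns ((img_a, img_b), label) with label 1 for a same-class
    pair and 0 for a different-class pair.
    """
    classes = list(class_to_images)
    if same_class:
        # Two distinct images drawn from one randomly chosen class.
        c = random.choice(classes)
        a, b = random.sample(class_to_images[c], 2)
        return (a, b), 1
    # One image each from two distinct classes.
    c1, c2 = random.sample(classes, 2)
    return (random.choice(class_to_images[c1]),
            random.choice(class_to_images[c2])), 0
```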
P.S.: It is amazing how simple it is to write data loaders!
Thanks for the PR - I've left some inline comments
Thanks for the updates, I've left some more comments inline, mostly around the
@alykhantejani Writing my comments on the approach here because, weirdly, I couldn't find my earlier response to your review anywhere.
I believe this way, it is certainly harder to arrive at a collision when
The problem with this approach is that I would not be staying true to what I wanted to achieve with the class. The total number of combinatorially possible pairs is huge, and that is why I introduced a
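To put a number on "huge": Omniglot contains 1,623 characters with 20 drawings each, so enumerating every ordered pair of images is clearly infeasible as an explicit list (the arithmetic below is illustrative; it is not code from this PR):

```python
# Omniglot: 1,623 characters x 20 drawings per character.
num_images = 1623 * 20            # 32,460 images in total
ordered_pairs = num_images ** 2   # every ordered (a, b) pair of images

# Over a billion pairs -- far too many to materialize eagerly,
# which is why some sampling (or on-the-fly indexing) scheme is needed.
print(num_images, ordered_pairs)
```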
Hi @activatedgeek , sorry for the delay in replying.
So, my first thought about the
We introduce a new `MultiDataset` class:
```python
class MultiDataset(object):
    def __init__(self, dataset, num_outputs=1, transforms=None):
        self.dataset = dataset
        self.num_outputs = num_outputs
        self.transforms = transforms

    def __getitem__(self, idx):
        # here comes the logic to convert a 1d index into
        # self.num_outputs indices, each in range(len(self.dataset))
        individual_idx = []
        for i in range(self.num_outputs):
            individual_idx.append(idx % len(self.dataset))
            idx = idx // len(self.dataset)
        result = []
        for i in reversed(individual_idx):
            result.append(self.dataset[i])
        if self.transforms is not None:
            result = self.transforms(result)
        return result

    def __len__(self):
        return len(self.dataset) ** self.num_outputs
```
This way, you generate on the fly an arbitrarily large dataset that can accommodate pairs/triplets/etc. of elements of the same dataset. Plus, the logic for combining the different targets of each element becomes something the user decides (via the `transforms` in the `MultiDataset`).
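The index conversion in `__getitem__` above is just mixed-radix decoding: the flat index is read as a `num_outputs`-digit number in base `len(dataset)`. A minimal standalone sketch of that decoding step (hypothetical helper name, not from the PR):

```python
def decode_index(idx, base, num_outputs):
    # Interpret `idx` as a num_outputs-digit number in base `base`,
    # yielding one sub-index per output, most significant digit first.
    individual = []
    for _ in range(num_outputs):
        individual.append(idx % base)
        idx //= base
    return list(reversed(individual))

# With a 5-element dataset and pairs (num_outputs=2), there are
# 5 ** 2 == 25 flat indices; index 7 == 1*5 + 2 maps to elements 1 and 2.
print(decode_index(7, 5, 2))  # [1, 2]
```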
This is just a rough idea, but let me know what you think.
@fmassa That is a great idea. I was in fact wondering the same thing, because I recently came across a similar requirement for the ImageNet/Mini-ImageNet datasets as well. It didn't feel right to keep creating custom rules every now and then.
So here is what I will do - the
I will take up a
Does that sound good?