
Error while creating data #1

Closed
saurabh502 opened this issue Dec 15, 2018 · 7 comments

Comments

@saurabh502

Hi, I am getting the error below when running the following code.

code:

data = (
    ImageItemList
        .from_folder('data/whale/input/train')
        .random_split_by_pct(seed=SEED)
        .label_from_func(lambda path: fn2label[path.name])
        .add_test(ImageItemList.from_folder('data/whale/input/test'))
        .transform(get_transforms(do_flip=False, max_zoom=1, max_warp=0, max_rotate=2), size=SZ, resize_method=ResizeMethod.SQUISH)
        .databunch(bs=BS, num_workers=NUM_WORKERS, path='data/whale/input')
)

error:

KeyError Traceback (most recent call last)
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process_one(self, item)
277 def process_one(self,item):
--> 278 try: return self.c2i[item] if item is not None else None
279 except:

KeyError: 'w_d8a08f8'

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
in
3 .from_folder('data/whale/input/train')
4 .random_split_by_pct(seed=SEED)
----> 5 .label_from_func(lambda path: fn2label[path.name])
6 .add_test(ImageItemList.from_folder('data/whale/input/test'))
7 .transform(get_transforms(do_flip=False, max_zoom=1, max_warp=0, max_rotate=2), size=SZ, resize_method=ResizeMethod.SQUISH)

~\fastai1\fastai\courses\dl1\fastai\data_block.py in _inner(*args, **kwargs)
391 self.valid = fv(*args, **kwargs)
392 self.__class__ = LabelLists
--> 393 self.process()
394 return self
395 return _inner

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self)
438 "Process the inner datasets."
439 xp,yp = self.get_processors()
--> 440 for i,ds in enumerate(self.lists): ds.process(xp, yp, filter_missing_y=i==0)
441 return self
442

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, xp, yp, filter_missing_y)
563 def process(self, xp=None, yp=None, filter_missing_y:bool=False):
564 "Launch the processing on self.x and self.y with xp and yp."
--> 565 self.y.process(yp)
566 if filter_missing_y and (getattr(self.x, 'filter_missing_y', None)):
567 filt = array([o is None for o in self.y])

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, processor)
66 if processor is not None: self.processor = processor
67 self.processor = listify(self.processor)
---> 68 for p in self.processor: p.process(self)
69 return self
70

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, ds)
284 ds.classes = self.classes
285 ds.c2i = self.c2i
--> 286 super().process(ds)
287
288 def __getstate__(self): return {'classes':self.classes}

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, ds)
36 def __init__(self, ds:Collection=None): self.ref_ds = ds
37 def process_one(self, item:Any): return item
---> 38 def process(self, ds:Collection): ds.items = array([self.process_one(item) for item in ds.items])
39
40 class ItemList():

~\fastai1\fastai\courses\dl1\fastai\data_block.py in <listcomp>(.0)
36 def __init__(self, ds:Collection=None): self.ref_ds = ds
37 def process_one(self, item:Any): return item
---> 38 def process(self, ds:Collection): ds.items = array([self.process_one(item) for item in ds.items])
39
40 class ItemList():

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process_one(self, item)
278 try: return self.c2i[item] if item is not None else None
279 except:
--> 280 raise Exception("Your validation data contains a label that isn't present in the training set, please fix your data.")
281
282 def process(self, ds):

Exception: Your validation data contains a label that isn't present in the training set, please fix your data.

Thanks in advance for your help!

@radekosmulski
Owner

Did you change the SEED value? I wonder if this might be because you have a different directory structure than the one in the readme. Could you try running this with a directory structure as in the readme?

@josemontiel

I'm new to fastai, so thanks for sharing this repo! I'm having the same issue; I changed my file structure to match yours, but the error persists.

@radekosmulski
Owner

I won't have access to my computer until Monday. If no one finds a solution to this by then, I will probably just move the creation of a better validation set from the later NBs to the first one.

In the meantime, if you'd like to play with this, you could jump over to the later NBs, or try moving the way the validation set is created there to here.

The other validation set is much less forgiving than this one so don't worry if you get a much poorer score locally.

@josemontiel

@radekosmulski playing with it a bit, and with the help of the fastai documentation, it works if you call no_split instead of random_split_by_pct. So it seems to me that the issue might be caused by whales that have only a single image: that one image can fall into the validation set instead of the training set, hence the missing label. Thoughts?
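A quick way to check this hypothesis is to count how many labels occur only once in the training data. This is a sketch in plain Python; the real notebook builds fn2label (filename → whale id) from train.csv, so the toy dict below is a stand-in:

```python
from collections import Counter

# Toy stand-in for the notebook's fn2label mapping (filename -> whale id).
fn2label = {
    'a.jpg': 'w_1', 'b.jpg': 'w_1',
    'c.jpg': 'w_2',               # w_2 has only a single image
    'd.jpg': 'w_3', 'e.jpg': 'w_3',
}

counts = Counter(fn2label.values())
singletons = [whale for whale, n in counts.items() if n == 1]

# Any random split can send a singleton's only image to the validation
# set, leaving that label absent from training -> the KeyError above.
print(singletons)  # -> ['w_2']
```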

@radekosmulski
Owner

You are right, that is exactly what is happening. Some whales have only a single image in the train set, and this error message implies that there exists a whale all of whose images (be that one or more) got assigned to the validation set.
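One possible workaround (a sketch, not the notebook's actual code) is to draw validation indices only from whales that have at least two images, so every class keeps at least one image in training. The helper below is hypothetical; in fastai v1 the resulting indices could then be passed to the data block via a split-by-index method instead of random_split_by_pct, though wiring it into this exact notebook is an assumption:

```python
import random
from collections import Counter

def safe_valid_idxs(labels, valid_pct=0.2, seed=42):
    """Pick validation indices only among labels with >= 2 samples,
    so no class loses all of its images to the validation set."""
    counts = Counter(labels)
    eligible = [i for i, lab in enumerate(labels) if counts[lab] > 1]
    rng = random.Random(seed)
    rng.shuffle(eligible)
    n_valid = int(valid_pct * len(labels))
    return sorted(eligible[:n_valid])

# Toy labels: 'w_2' (index 2) appears once, so it can never be chosen.
labels = ['w_1', 'w_1', 'w_2', 'w_3', 'w_3']
val_idxs = safe_valid_idxs(labels, valid_pct=0.4)
assert 2 not in val_idxs
```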

@radekosmulski
Owner

I understand why the issue is happening, but I do not know why, despite having the same seed and folder structure, some people are getting the error.

My only thought is that maybe they removed some files or are using data from the playground competition.

I started to change the first_submission notebook to sample the validation set like in the later notebooks, but this makes the first_submission notebook overly complex.

Either way, I am not going to make the changes; I will add a note to the readme instead. If anyone is encountering the issue, please skip to only_known_train.ipynb. By running all cells in that notebook you should get to ~0.760 on the LB. All other notebooks should work for you even if you are having issues with the first_submission one.

I think there is more value in keeping the first_submission NB simple and showing the natural evolution of code as I work towards solving a classification problem rather than backporting the creation of the validation set.

@radekosmulski
Owner

Additional reasoning behind not making this change is posted on Kaggle here.
