Error while creating data #1
Comments
Did you change the SEED value? I wonder if this might be because you have a different directory structure than in the readme. Could you try running this with the directory structure from the readme?
I'm new to fastai, so thanks for sharing this repo! I'm having the same issue: I changed my file structure to match yours, but the error persists.
I won't have access to my computer until Monday. If no one finds a solution by then, I will probably just move the creation of a better validation set from the later NBs to the first one. In the meantime, if you'd like to play with this, you could jump over to the later NBs, or try moving how the validation set is created there to here. The other validation set is much less forgiving than this one, so don't worry if you get a much poorer score locally.
@radekosmulski playing a bit with it and with the help of the fastai documentation, it works if you call |
You are right, that is exactly what is happening. Some whales have only a single image in the train set, and this error message implies that there is a whale all of whose images (one or more) were assigned to the validation set.
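To make the failure mode concrete, here is a minimal sketch in plain Python (no fastai involved). It assumes a `fn2label` dict mapping file names to whale IDs, as in the notebook; the helper `labels_only_in_valid` and the toy file names are made up for illustration and are not part of fastai. It lists the labels that appear in a validation split but not in the matching training split, which is the condition the exception complains about.

```python
def labels_only_in_valid(train, valid, fn2label):
    """Return the labels that occur in `valid` but never in `train`."""
    train_labels = {fn2label[f] for f in train}
    valid_labels = {fn2label[f] for f in valid}
    return sorted(valid_labels - train_labels)

# Toy data (hypothetical): whale 'w_b' has a single image.
fn2label = {'a1.jpg': 'w_a', 'a2.jpg': 'w_a', 'a3.jpg': 'w_a', 'b1.jpg': 'w_b'}

# A random split that happens to put the only 'w_b' image in validation.
train = ['a1.jpg', 'a2.jpg']
valid = ['a3.jpg', 'b1.jpg']

print(labels_only_in_valid(train, valid, fn2label))  # ['w_b']
```

Running a check like this on your own split should show which whale IDs trigger the error.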
I understand why the issue is happening, but I do not know why some people are getting the error despite having the same seed and folder structure. My only thought is that maybe they removed some files or are using data from the playground competition. I started to change the first_submission notebook to sample the validation set as in the later notebooks, but this makes the first_submission notebook overly complex. Either way, I am not going to make the changes; I will add a note to the readme. If anyone is encountering the issue, please skip over to […]. I think there is more value in keeping the first_submission NB simple and showing the natural evolution of code as I work towards solving a classification problem, rather than backporting the creation of the validation set.
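For anyone who wants a local workaround in the meantime, one possible approach is sketched below. This is not the sampling used in the later notebooks, just an illustration under the assumption that `fn2label` maps file names to whale IDs. The function name `split_keeping_every_label_in_train` is made up. It reserves one image per whale for training and draws the validation set only from the remaining images, so validation can never contain a label that training lacks.

```python
import random
from collections import defaultdict

def split_keeping_every_label_in_train(fn2label, valid_pct=0.2, seed=None):
    """Split file names so that every label keeps at least one training image."""
    rng = random.Random(seed)

    # Group file names by label (sorted for reproducibility with a fixed seed).
    by_label = defaultdict(list)
    for fn in sorted(fn2label):
        by_label[fn2label[fn]].append(fn)

    train, candidates = [], []
    for label in sorted(by_label):
        files = by_label[label]
        rng.shuffle(files)
        train.append(files[0])        # reserve one image per label for training
        candidates.extend(files[1:])  # the rest may go to validation

    # Draw validation images only from the non-reserved candidates.
    rng.shuffle(candidates)
    n_valid = min(len(candidates), int(valid_pct * len(fn2label)))
    valid = candidates[:n_valid]
    train.extend(candidates[n_valid:])
    return train, valid

# Toy data (hypothetical): whale 'w_b' has a single image.
fn2label = {'a1.jpg': 'w_a', 'a2.jpg': 'w_a', 'a3.jpg': 'w_a',
            'b1.jpg': 'w_b', 'c1.jpg': 'w_c', 'c2.jpg': 'w_c'}
train, valid = split_keeping_every_label_in_train(fn2label, valid_pct=0.5, seed=0)
```

Note the trade-off: whales with a single image never appear in validation, so a local score from such a split is likely to be optimistic compared with a less forgiving validation set.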
Additional reasoning behind not making this change is on Kaggle here.
Hi, I am getting the error below while running the code below.
code:
error:
KeyError Traceback (most recent call last)
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process_one(self, item)
277 def process_one(self,item):
--> 278 try: return self.c2i[item] if item is not None else None
279 except:
KeyError: 'w_d8a08f8'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
in
3 .from_folder('data/whale/input/train')
4 .random_split_by_pct(seed=SEED)
----> 5 .label_from_func(lambda path: fn2label[path.name])
6 .add_test(ImageItemList.from_folder('data/whale/input/test'))
7 .transform(get_transforms(do_flip=False, max_zoom=1, max_warp=0, max_rotate=2), size=SZ, resize_method=ResizeMethod.SQUISH)
~\fastai1\fastai\courses\dl1\fastai\data_block.py in _inner(*args, **kwargs)
391 self.valid = fv(*args, **kwargs)
392 self.__class__ = LabelLists
--> 393 self.process()
394 return self
395 return _inner
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self)
438 "Process the inner datasets."
439 xp,yp = self.get_processors()
--> 440 for i,ds in enumerate(self.lists): ds.process(xp, yp, filter_missing_y=i==0)
441 return self
442
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, xp, yp, filter_missing_y)
563 def process(self, xp=None, yp=None, filter_missing_y:bool=False):
564 "Launch the processing on `self.x` and `self.y` with `xp` and `yp`."
--> 565 self.y.process(yp)
566 if filter_missing_y and (getattr(self.x, 'filter_missing_y', None)):
567 filt = array([o is None for o in self.y])
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, processor)
66 if processor is not None: self.processor = processor
67 self.processor = listify(self.processor)
---> 68 for p in self.processor: p.process(self)
69 return self
70
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, ds)
284 ds.classes = self.classes
285 ds.c2i = self.c2i
--> 286 super().process(ds)
287
288 def __getstate__(self): return {'classes':self.classes}
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, ds)
36 def __init__(self, ds:Collection=None): self.ref_ds = ds
37 def process_one(self, item:Any): return item
---> 38 def process(self, ds:Collection): ds.items = array([self.process_one(item) for item in ds.items])
39
40 class ItemList():
~\fastai1\fastai\courses\dl1\fastai\data_block.py in <listcomp>(.0)
36 def __init__(self, ds:Collection=None): self.ref_ds = ds
37 def process_one(self, item:Any): return item
---> 38 def process(self, ds:Collection): ds.items = array([self.process_one(item) for item in ds.items])
39
40 class ItemList():
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process_one(self, item)
278 try: return self.c2i[item] if item is not None else None
279 except:
--> 280 raise Exception("Your validation data contains a label that isn't present in the training set, please fix your data.")
281
282 def process(self, ds):
Exception: Your validation data contains a label that isn't present in the training set, please fix your data.
Thanks in advance for your help!