
Exception occured in ProgressCallback when calling event after_batch - preventing from running even the basic fastai tutorials #3809

Closed
prePhilip opened this issue Sep 29, 2022 · 6 comments

prePhilip commented Sep 29, 2022

Please confirm you have the latest versions of fastai, fastcore, and nbdev prior to reporting a bug: YES
fastai version = '2.7.9'
fastcore version = '1.5.27'
torch version = '1.13.0.dev20220928'
CUDA version = 11.7

Running on Windows 11 Pro, WSL2

Describe the bug
When running the basic cat vs. dog classification tutorial, the following error occurs after calling `learn.fine_tune(1)`:

```
TypeError: Exception occured in `ProgressCallback` when calling event `after_batch`:
	unsupported format string passed to TensorBase.__format__
```

To Reproduce
Steps to reproduce the behavior:
```python
from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, num_workers=0, device="cuda", item_tfms=Resize(224))

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```

See the notebook at https://github.com/prePhilip/fastaitest/blob/main/test.ipynb for full details.

Expected behavior
`learn.fine_tune` proceeds without any error.

Error with full stack trace

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [9], line 9
      4 dls = ImageDataLoaders.from_name_func(
      5     path, get_image_files(path), valid_pct=0.2, seed=42,
      6     label_func=is_cat, num_workers=0, device ="cuda", item_tfms=Resize(224))
      8 learn = vision_learner(dls, resnet34, metrics=error_rate)
----> 9 learn.fine_tune(1)

File ~/mambaforge/lib/python3.10/site-packages/fastai/callback/schedule.py:165, in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
    163 "Fine tune with `Learner.freeze` for `freeze_epochs`, then with `Learner.unfreeze` for `epochs`, using discriminative LR."
    164 self.freeze()
--> 165 self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    166 base_lr /= 2
    167 self.unfreeze()

File ~/mambaforge/lib/python3.10/site-packages/fastai/callback/schedule.py:119, in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt, start_epoch)
    116 lr_max = np.array([h['lr'] for h in self.opt.hypers])
    117 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    118           'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 119 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd, start_epoch=start_epoch)

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:256, in Learner.fit(self, n_epoch, lr, wd, cbs, reset_opt, start_epoch)
    254 self.opt.set_hypers(lr=self.lr if lr is None else lr)
    255 self.n_epoch = n_epoch
--> 256 self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:193, in Learner._with_events(self, f, event_type, ex, final)
    192 def _with_events(self, f, event_type, ex, final=noop):
--> 193     try: self(f'before_{event_type}');  f()
    194     except ex: self(f'after_cancel_{event_type}')
    195     self(f'after_{event_type}');  final()

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:245, in Learner._do_fit(self)
    243 for epoch in range(self.n_epoch):
    244     self.epoch=epoch
--> 245     self._with_events(self._do_epoch, 'epoch', CancelEpochException)

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:193, in Learner._with_events(self, f, event_type, ex, final)
    192 def _with_events(self, f, event_type, ex, final=noop):
--> 193     try: self(f'before_{event_type}');  f()
    194     except ex: self(f'after_cancel_{event_type}')
    195     self(f'after_{event_type}');  final()

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:239, in Learner._do_epoch(self)
    238 def _do_epoch(self):
--> 239     self._do_epoch_train()
    240     self._do_epoch_validate()

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:231, in Learner._do_epoch_train(self)
    229 def _do_epoch_train(self):
    230     self.dl = self.dls.train
--> 231     self._with_events(self.all_batches, 'train', CancelTrainException)

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:193, in Learner._with_events(self, f, event_type, ex, final)
    192 def _with_events(self, f, event_type, ex, final=noop):
--> 193     try: self(f'before_{event_type}');  f()
    194     except ex: self(f'after_cancel_{event_type}')
    195     self(f'after_{event_type}');  final()

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:199, in Learner.all_batches(self)
    197 def all_batches(self):
    198     self.n_iter = len(self.dl)
--> 199     for o in enumerate(self.dl): self.one_batch(*o)

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:227, in Learner.one_batch(self, i, b)
    225 b = self._set_device(b)
    226 self._split(b)
--> 227 self._with_events(self._do_one_batch, 'batch', CancelBatchException)

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:195, in Learner._with_events(self, f, event_type, ex, final)
    193 try: self(f'before_{event_type}');  f()
    194 except ex: self(f'after_cancel_{event_type}')
--> 195 self(f'after_{event_type}');  final()

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:171, in Learner.__call__(self, event_name)
--> 171 def __call__(self, event_name): L(event_name).map(self._call_one)

File ~/mambaforge/lib/python3.10/site-packages/fastcore/foundation.py:156, in L.map(self, f, gen, *args, **kwargs)
--> 156 def map(self, f, *args, gen=False, **kwargs): return self._new(map_ex(self, f, *args, gen=gen, **kwargs))

File ~/mambaforge/lib/python3.10/site-packages/fastcore/basics.py:840, in map_ex(iterable, f, gen, *args, **kwargs)
    838 res = map(g, iterable)
    839 if gen: return res
--> 840 return list(res)

File ~/mambaforge/lib/python3.10/site-packages/fastcore/basics.py:825, in bind.__call__(self, *args, **kwargs)
    823     if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    824 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 825 return self.func(*fargs, **kwargs)

File ~/mambaforge/lib/python3.10/site-packages/fastai/learner.py:175, in Learner._call_one(self, event_name)
    173 def _call_one(self, event_name):
    174     if not hasattr(event, event_name): raise Exception(f'missing {event_name}')
--> 175     for cb in self.cbs.sorted('order'): cb(event_name)

File ~/mambaforge/lib/python3.10/site-packages/fastai/callback/core.py:62, in Callback.__call__(self, event_name)
     60     try: res = getcallable(self, event_name)()
     61     except (CancelBatchException, CancelBackwardException, CancelEpochException, CancelFitException, CancelStepException, CancelTrainException, CancelValidException): raise
---> 62     except Exception as e: raise modify_exception(e, f'Exception occured in `{self.__class__.__name__}` when calling event `{event_name}`:\n\t{e.args[0]}', replace=True)
     63 if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
     64 return res

File ~/mambaforge/lib/python3.10/site-packages/fastai/callback/core.py:60, in Callback.__call__(self, event_name)
     58 res = None
     59 if self.run and _run: 
---> 60     try: res = getcallable(self, event_name)()
     61     except (CancelBatchException, CancelBackwardException, CancelEpochException, CancelFitException, CancelStepException, CancelTrainException, CancelValidException): raise
     62     except Exception as e: raise modify_exception(e, f'Exception occured in `{self.__class__.__name__}` when calling event `{event_name}`:\n\t{e.args[0]}', replace=True)

File ~/mambaforge/lib/python3.10/site-packages/fastai/callback/progress.py:33, in ProgressCallback.after_batch(self)
     31 def after_batch(self):
     32     self.pbar.update(self.iter+1)
---> 33     if hasattr(self, 'smooth_loss'): self.pbar.comment = f'{self.smooth_loss:.4f}'

File ~/mambaforge/lib/python3.10/site-packages/torch/_tensor.py:850, in Tensor.__format__(self, format_spec)
    848 def __format__(self, format_spec):
    849     if has_torch_function_unary(self):
--> 850         return handle_torch_function(Tensor.__format__, (self,), self, format_spec)
    851     if self.dim() == 0 and not self.is_meta and type(self) is Tensor:
    852         return self.item().__format__(format_spec)

File ~/mambaforge/lib/python3.10/site-packages/torch/overrides.py:1535, in handle_torch_function(public_api, relevant_args, *args, **kwargs)
   1529     warnings.warn("Defining your `__torch_function__ as a plain method is deprecated and "
   1530                   "will be an error in future, please define it as a classmethod.",
   1531                   DeprecationWarning)
   1533 # Use `public_api` instead of `implementation` so __torch_function__
   1534 # implementations can do equality/identity comparisons.
-> 1535 result = torch_func_method(public_api, types, args, kwargs)
   1537 if result is not NotImplemented:
   1538     return result

File ~/mambaforge/lib/python3.10/site-packages/fastai/torch_core.py:376, in TensorBase.__torch_function__(cls, func, types, args, kwargs)
    374 if cls.debug and func.__name__ not in ('__str__','__repr__'): print(func, types, args, kwargs)
    375 if _torch_handled(args, cls._opt, func): types = (torch.Tensor,)
--> 376 res = super().__torch_function__(func, types, args, ifnone(kwargs, {}))
    377 dict_objs = _find_args(args) if args else _find_args(list(kwargs.values()))
    378 if issubclass(type(res),TensorBase) and dict_objs: res.set_meta(dict_objs[0],as_copy=True)

File ~/mambaforge/lib/python3.10/site-packages/torch/_tensor.py:1273, in Tensor.__torch_function__(cls, func, types, args, kwargs)
   1270     return NotImplemented
   1272 with _C.DisableTorchFunction():
-> 1273     ret = func(*args, **kwargs)
   1274     if func in get_default_nowrap_functions():
   1275         return ret

File ~/mambaforge/lib/python3.10/site-packages/torch/_tensor.py:853, in Tensor.__format__(self, format_spec)
    851 if self.dim() == 0 and not self.is_meta and type(self) is Tensor:
    852     return self.item().__format__(format_spec)
--> 853 return object.__format__(self, format_spec)

TypeError: Exception occured in `ProgressCallback` when calling event `after_batch`:
	unsupported format string passed to TensorBase.__format__
```


prePhilip changed the title from "Typeerror preventing from running even the basic fastai tutorials" to "Exception occured in ProgressCallback when calling event after_batch - preventing from running even the basic fastai tutorials" on Sep 29, 2022
prePhilip (Author) commented Sep 29, 2022

Fixed by adding `learn.remove_cb(ProgressCallback)` before fine-tuning. But can someone look into this bug? Without the progress bar it is hard to figure out what is happening.
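
For anyone else hitting this, a minimal sketch of the workaround in context (assumes `dls` and `learn` were built as in the reproduction above):

```python
from fastai.vision.all import *

# ... build dls and learn as in the reproduction above, then:
learn.remove_cb(ProgressCallback)  # after_batch no longer formats smooth_loss
learn.fine_tune(1)                 # trains, but without the per-batch loss display
```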

johnomics commented

I had the same problem with fastai 2.7.9, fastcore 1.5.27, PyTorch 1.13.0 on macOS Ventura 13.0. The problem is caused by this:

```
File ~/mambaforge/lib/python3.10/site-packages/fastai/callback/progress.py:33, in ProgressCallback.after_batch(self)
     31 def after_batch(self):
     32     self.pbar.update(self.iter+1)
---> 33     if hasattr(self, 'smooth_loss'): self.pbar.comment = f'{self.smooth_loss:.4f}'
```

The `smooth_loss` value cannot be converted properly in the format string (because it looks like the tensor subclass doesn't support `__format__`):

```python
>>> learn.smooth_loss
TensorBase(0.0595)
>>> f'{learn.smooth_loss:.4f}'
...
TypeError: unsupported format string passed to TensorBase.__format__
```

It can be fixed by extracting the value of `smooth_loss` first with `.item()`:

```python
>>> f'{learn.smooth_loss.item():.4f}'
'0.0595'
```

Hacking site-packages/fastai/callback/progress.py with this change fixed the problem.
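
A sketch of the same edit applied as a runtime monkeypatch instead of editing site-packages (the body mirrors the `after_batch` shown above; only the `.item()` call is new):

```python
from fastai.callback.progress import ProgressCallback

def after_batch(self):
    self.pbar.update(self.iter+1)
    # .item() pulls the Python float out of the 0-dim TensorBase before formatting
    if hasattr(self, 'smooth_loss'): self.pbar.comment = f'{self.smooth_loss.item():.4f}'

ProgressCallback.after_batch = after_batch
```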

daavoo added a commit to iterative/dvclive that referenced this issue Oct 31, 2022
* fastai: Remove ProgressCallback in tests.

Per fastai/fastai#3809

* Pin PyTorch version.

Per pytorch/pytorch#85427
isuru-c-p commented

Looks like this is a regression from the following pytorch commit:
pytorch/pytorch@3c2c2cc

Tensor.__format__ previously handled formatting correctly for all non-meta 0-dimensional tensor instances:

```python
if self.dim() == 0 and not self.is_meta:
    return self.item().__format__(format_spec)
```

However, an additional check was added, preventing this path from being exercised for instances of subclasses of Tensor (such as fastai.torch_core.TensorBase):

```python
if self.dim() == 0 and not self.is_meta and type(self) is Tensor:
    return self.item().__format__(format_spec)
```
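
The regression is easy to reproduce outside fastai (a minimal sketch; assumes torch 1.13.0, where the `type(self) is Tensor` check is present, and `MyTensor` is a hypothetical stand-in for fastai's `TensorBase`):

```python
import torch

class MyTensor(torch.Tensor):  # stand-in for fastai.torch_core.TensorBase
    pass

t = torch.tensor(0.0595).as_subclass(MyTensor)  # 0-dim tensor subclass
print(f'{t.item():.4f}')  # '0.0595' - formatting the extracted float works
print(f'{t:.4f}')         # TypeError: unsupported format string passed to MyTensor.__format__
```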

ima747 commented Nov 1, 2022

Sounds like this is an underlying pytorch issue, is there an open bug or PR on pytorch anyone is aware of? I just picked up fastai the other day and ran face first into this.

The `learn.remove_cb(ProgressCallback)` workaround allows training to at least proceed, though I don't know what it's doing. But printing formatted outputs from the tutorials also needs to be modified to just dump the tensor object instead of formatting it, so I'm just playing whack-a-mole to get output.
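
Until it's fixed upstream, user code that formats tensors can convert to a Python scalar first (a sketch; any 0-dim tensor works the same way):

```python
loss = learn.smooth_loss            # a 0-dim TensorBase
print(f'loss = {float(loss):.4f}')  # float() (or .item()) sidesteps TensorBase.__format__
```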

isuru-c-p commented

There's an existing pytorch issue for this here:
pytorch/pytorch#82764

johnomics commented

A workaround for this has been merged: #3828
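
For reference, one way a tensor subclass can restore the pre-1.13 formatting behavior itself (a sketch of the general approach, not necessarily the exact change merged in #3828):

```python
import torch

class TensorBase(torch.Tensor):
    # Delegate 0-dim formatting to the underlying Python scalar, as
    # Tensor.__format__ did before the `type(self) is Tensor` check.
    def __format__(self, format_spec):
        if self.dim() == 0 and not self.is_meta:
            return self.item().__format__(format_spec)
        return super().__format__(format_spec)
```

With an override like this, `f'{t:.4f}'` on a 0-dim instance formats the extracted scalar as before.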
