Skip to content

Conversation

chm90
Copy link
Contributor

@chm90 chm90 commented Jan 29, 2020

Fixes #
returning the model with the lowest priority of the saved models
Description:
The self._saved list is always sorted in an ascending order(ensured by sorting it each time a new element is added). Hence, returning self._saved[0] in last_checkpoint actually yields the oldest checkpoint of the saved ones, or the worst if a score function is provided. Changing the above toself._saved[-1] yields the expected result, i.e, the model with the highest priority.

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • [X ] Documentation is updated (if required)

@chm90 chm90 requested a review from vfdev-5 January 29, 2020 15:12
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jan 29, 2020

@chm90 thanks for the PR! I need to recheck that. I have a llttle doubt on that.
Edit: Yes, you are correct!
Anyway, I think we should have some explicit tests on that.

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chm90 could you please add an explicit test for checking that. Thanks!

@vfdev-5 vfdev-5 added the bug label Jan 29, 2020
- minor improvement of typing
@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jan 29, 2020

@chm90 as it is a bug, I added tests by myself. So, thanks again for catching it !

@chm90
Copy link
Contributor Author

chm90 commented Jan 29, 2020

Thanks @vfdev-5, I didn't have time to implement tests yet. Thank's for implementing

@chm90
Copy link
Contributor Author

chm90 commented Jan 29, 2020

@vfdev-5, do I need to do anything else for you to be able to merge the pr?

Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chm90 thanks again for the catch!

@vfdev-5 vfdev-5 merged commit c295330 into pytorch:master Jan 29, 2020
@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jan 30, 2020

Hi @vfdev-5 ,

Have you added this bug fix to nightly build?
I faced this same issue before and manually get last_checkpoint in my project.
If you updated, I can update my code immediately.
Thanks.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jan 30, 2020

Hi @Nic-Ma

This commit will be normally added to the nighlty build today (30/01/2020).
Cron job that built nightly release for 29/01/2020 is up to Minor docstring fix (#743) commit.
https://travis-ci.org/pytorch/ignite/builds/643449008

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jan 31, 2020

Hi @vfdev-5 ,

Thanks for your information.
I already updated to your nightly build, but faced another issue:
File "/usr/local/lib/python3.5/dist-packages/ignite/__init__.py", line 1, in <module> import ignite.engine File "/usr/local/lib/python3.5/dist-packages/ignite/engine/__init__.py", line 4, in <module> from ignite.engine.engine import Engine File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 15, in <module> from ignite._utils import _to_hours_mins_secs File "/usr/local/lib/python3.5/dist-packages/ignite/_utils.py", line 4, in <module> from ignite.utils import convert_tensor, apply_to_tensor, apply_to_type, to_onehot File "/usr/local/lib/python3.5/dist-packages/ignite/utils.py", line 44, in <module> input_type: Union[Type, Tuple[Type[Any], Any]], File "/usr/lib/python3.5/typing.py", line 552, in __getitem__ dict(self.__dict__), parameters, _root=True) File "/usr/lib/python3.5/typing.py", line 512, in __new__ for t2 in all_params - {t1} if not isinstance(t2, TypeVar)): File "/usr/lib/python3.5/typing.py", line 512, in <genexpr> for t2 in all_params - {t1} if not isinstance(t2, TypeVar)): File "/usr/lib/python3.5/typing.py", line 1077, in __subclasscheck__ if super().__subclasscheck__(cls): File "/usr/lib/python3.5/abc.py", line 225, in __subclasscheck__ for scls in cls.__subclasses__(): TypeError: descriptor '__subclasses__' of 'type' object needs an argument

Is it compatible with Python 3.5?
Thanks.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jan 31, 2020

@vfdev-5 ,

And BTW, I found some small bugs in current checkpoint handler:

  1. The latest code added logic: suffix = "{}{:.4f}", which forces the "priority" to be float number.
  2. There is some mixed use of " " and ' ', which causes the saved filename includes a ' '.

For example, the saved model filename for epoch 2 will be:
'checkpoint_epoch=2.0000.pth'

Could you please have a look?
Thanks.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jan 31, 2020

@Nic-Ma thanks for the feedback, seems like it is a regression we introduced with typing. Sorry about that. We will fix it ASAP.

About the suffix, yes that's right, it should be related to another PR where such case was somehow out of scope. I'll take a look on that. Thanks again for reporting these things!

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jan 31, 2020

@vfdev-5

Cool, thanks for your quick fix.
I also faced other issues even when I updated to python 3.6:
If I extended torch.DataLoader and defined my own dataloader, it will crash when Ignite replaces it with torch.Dataloader for reproduction feature(I already named all other memeber vars start with "_").
The exception is: init error, unexpected keyname "dataset" when re-construct torch.DataLoader object.

Anyway, we will still use ignite 0.3.0 until the 0.4.0 version is stable and officially released.
Thanks.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jan 31, 2020

If I extended torch.DataLoader and defined my own dataloader, it will crash when Ignite replaces it with torch.Dataloader for reproduction feature(I already named all other memeber vars start with "_").

@Nic-Ma probably this should be related with an issue and its fix here:

return type(dataloader)(**params)

So, I would say that in python 3.7 you most probably should not have the issue. But, it would depend on what kind of extension you made in torch.DataLoader.

Could you please open a separate issue and put a small snippet to reproduce the problem from our side. Or pointout an example if it is in your MONAI project. Thanks!

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Jan 31, 2020

Hi @vfdev-5 ,

Thanks for your tips.
We are using python 3.6 for MONAI project, and I encountered this issue when I tested locally.
We developed many arch design and haven't been in public repo yet.
It's not very easy to reproduce this issue because I ran in our NVIDIA public PyTorch docker env
https://ngc.nvidia.com/catalog/containers/nvidia:pytorch and extended DataLoader.
I think 0.3.0 is enough for our current features, will continue to develop things until 0.4.0 released,
then evaluate new features to use.
Thanks.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jan 31, 2020

@Nic-Ma I see, thanks. If you could however share some more details on your extended DataLoader (e.g. a snippet or just a description of what kind of extension you made) such that we could create some tests for that and ensure that ignite works for various cases. Thanks

I think 0.3.0 is enough for our current features, will continue to develop things until 0.4.0 released, then evaluate new features to use.

Certainly, we will release several 0.3.X versions before 0.4.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants