[discussion] Recommend a different file extension for models (.PTH is a special extension for Python) #14864

Open
vadimkantorov opened this issue Dec 7, 2018 · 17 comments
Labels
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@vadimkantorov
Contributor

*.pth files are used by Python to list additional package search paths: https://docs.python.org/3/library/site.html

The .pth files are loaded as text files by the Python interpreter. At one point I had a PyTorch model .pth file placed alongside the sources, and it caused Python to hang at startup (it was trying to parse the large binary file as a list of paths).

Maybe just *.pt?
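
For context, the mechanism described above can be reproduced with the standard library alone. The sketch below is a minimal illustration, not the original failure case: it calls `site.addsitedir` on a temporary directory, and the file name `demo_extra.pth` and the function name are made up for the example. It shows that a `.pth` file found in a site directory is read as text, one path per line.

```python
import os
import site
import sys
import tempfile


def pth_adds_path():
    """Return True if a directory listed in a *.pth file lands on sys.path."""
    saved_path = list(sys.path)  # restore sys.path afterwards
    try:
        with tempfile.TemporaryDirectory() as site_dir, \
                tempfile.TemporaryDirectory() as extra_dir:
            # Hypothetical file name; any non-hidden *.pth in a site dir is processed.
            pth_file = os.path.join(site_dir, "demo_extra.pth")
            with open(pth_file, "w", encoding="utf-8") as f:
                f.write(extra_dir + "\n")
            # Adds site_dir to sys.path and processes its *.pth files.
            site.addsitedir(site_dir)
            return os.path.abspath(extra_dir) in sys.path
    finally:
        sys.path[:] = saved_path


print(pth_adds_path())
```

A large binary checkpoint with the same extension in such a directory would go through the same text parsing, which is consistent with the hang reported above.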

@t-vi
Collaborator

t-vi commented Dec 7, 2018

As far as I can tell, .pt is used in many bits anyway, e.g. https://pytorch.org/tutorials/advanced/cpp_export.html , even if I have seen .pth (or even .pth.tar, when it wasn't a tar) in the wild.
But yes, I agree that standardizing on something not colliding with basic Python functionality is a good thing.

@vadimkantorov
Contributor Author

I think the downloadable torchvision models have the .pth extension.

@vadimkantorov
Contributor Author

e.g. Intel's distiller uses the strange .pth.tar as well: https://nervanasystems.github.io/distiller/model_zoo/index.html

@vadimkantorov vadimkantorov changed the title [discussion] Recommend a different file extension for models (.PTH is a special for Python) [discussion] Recommend a different file extension for models (.PTH is a special extension for Python) Dec 8, 2018
@vadimkantorov
Contributor Author

I think this is especially pertinent given the new announcement of Torch Hub:

https://pytorch.org/docs/master/hub.html and https://github.com/pytorch/vision/blob/master/hubconf.py both mention *.pth files

@soumith

@soumith
Member

soumith commented Dec 9, 2018

sure, we can change our models to .pt, I have no reservations.
Do you mind sending PRs, or pointing out all the places where you noticed the .pth recommendations so that we can change them?

@vadimkantorov
Contributor Author

@soumith Sure! I'll find all occurrences and paste the pointers here :)

@vadimkantorov
Contributor Author

vadimkantorov commented Dec 12, 2018

A list, incomplete so far (searched the pytorch, torchvision, and examples repositories on GitHub):

  1. model_urls = {
    'alexnet': 'https://download.pytorch.org/models/alexnet-owt-4df8aa71.pth',
    'dcgan_b': 'https://s3.amazonaws.com/pytorch/test_data/export/netG_bedroom_epoch_1-0649e76b.pth',
    'dcgan_f': 'https://s3.amazonaws.com/pytorch/test_data/export/netG_faces_epoch_49-d86035a6.pth',
    'densenet121': 'https://download.pytorch.org/models/densenet121-d66d3027.pth',
    'inception_v3_google': 'https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'srresNet': 'https://s3.amazonaws.com/pytorch/demos/srresnet-e10b2039.pth',
    'super_resolution': 'https://s3.amazonaws.com/pytorch/test_data/export/superres_epoch100-44c6958e.pth',
    'squeezenet1_0': 'https://download.pytorch.org/models/squeezenet1_0-a815701f.pth',
    'squeezenet1_1': 'https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
    }

  2. # matches bfd8deac from resnet18-bfd8deac.pth
    HASH_REGEX = re.compile(r'-([a-f0-9]*)\.')

    def load_url(url, model_dir=None, map_location=None, progress=True):
        r"""Loads the Torch serialized object at the given URL.

        If the object is already present in `model_dir`, it's deserialized and
        returned. The filename part of the URL should follow the naming convention
        ``filename-<sha256>.ext`` where ``<sha256>`` is the first eight or more
        digits of the SHA256 hash of the contents of the file. The hash is used to
        ensure unique names and to verify the contents of the file.

        The default value of `model_dir` is ``$TORCH_HOME/models`` where
        ``$TORCH_HOME`` defaults to ``~/.torch``. The default directory can be
        overridden with the ``$TORCH_MODEL_ZOO`` environment variable.

        Args:
            url (string): URL of the object to download
            model_dir (string, optional): directory in which to save the object
            map_location (optional): a function or a dict specifying how to remap storage locations (see torch.load)
            progress (bool, optional): whether or not to display a progress bar to stderr

        Example:
            >>> state_dict = torch.utils.model_zoo.load_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth')
        """

  3. https://github.com/pytorch/pytorch/blob/5734e9677564743fc4000cfb955fb42046689be9/docs/source/hub.rst

  4. https://github.com/pytorch/vision/blob/8f943d4e0c380cb0a5587b6e0e032932576fabea/torchvision/models/vgg.py#L12-L19

  5. https://github.com/pytorch/vision/blob/71182bc1ea27652f9952f6d60d8b27e408fc940e/torchvision/models/resnet.py#L10-L14

  6. https://github.com/pytorch/vision/blob/c7e9bd3006b0144fd1a94724f08122f673fe3587/hubconf.py#L48-L62

  7. https://github.com/pytorch/vision/blob/d5637696eba298f96a5fda44c6462f97ad1f987c/torchvision/models/densenet.py#L12-L15

  8. https://github.com/pytorch/vision/blob/dc0238b82f0df5c44ec9878cb41011d1852a7afd/torchvision/models/squeezenet.py#L11-L12

  9. https://github.com/pytorch/vision/blob/1fb0ccf71620d113cb72696b2eb8317b3e252cbb/torchvision/models/alexnet.py#L9

  10. https://github.com/pytorch/vision/blob/85369e3a315697be7e167f303d44f6b69d46c8ee/torchvision/models/inception.py#L12

  11. https://github.com/pytorch/examples/blob/29c2ed8ca6dc36fc78a3e74a5908615619987863/dcgan/README.md

  12. https://github.com/pytorch/examples/blob/29c2ed8ca6dc36fc78a3e74a5908615619987863/super_resolution/README.md

  13. https://github.com/pytorch/examples/blob/2fc0211d30b808f049ab7e7f4990858cf2ac471f/fast_neural_style/neural_style/neural_style.py#L107-L217

  14. https://github.com/pytorch/examples/blob/64f829ce495dad43392451c7431ae26eeee39bad/dcgan/main.py#L260-L261

  15. https://github.com/pytorch/examples/blob/29c2ed8ca6dc36fc78a3e74a5908615619987863/super_resolution/main.py#L75

  16. https://github.com/pytorch/examples/blob/29c2ed8ca6dc36fc78a3e74a5908615619987863/fast_neural_style/README.md

  17. https://github.com/pytorch/examples/blob/15e27719d75e35358555a27215665c797999740f/imagenet/main.py#L349-L352
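
As far as the regex in item 2 is concerned, the naming convention `filename-<sha256>.ext` does not depend on the extension, so a `.pt` name would be parsed the same way as a `.pth` one. A minimal sketch of that check, reusing `HASH_REGEX` from the excerpt; the helper names `hash_prefix` and `content_matches_name` are hypothetical and not part of `torch.utils.model_zoo`:

```python
import hashlib
import re

# Same pattern as in the excerpt: hex digits between a '-' and the next '.'.
HASH_REGEX = re.compile(r'-([a-f0-9]*)\.')


def hash_prefix(filename):
    """Return the hash prefix embedded in the file name, or None if absent."""
    match = HASH_REGEX.search(filename)
    if match is None or not match.group(1):
        return None
    return match.group(1)


def content_matches_name(data, filename):
    """Check that the SHA256 of `data` starts with the prefix in `filename`."""
    prefix = hash_prefix(filename)
    if prefix is None:
        return False
    return hashlib.sha256(data).hexdigest().startswith(prefix)


print(hash_prefix("resnet18-bfd8deac.pth"))
print(hash_prefix("resnet18-bfd8deac.pt"))
```

Both calls extract the same prefix, since the pattern only looks at the text between the dash and the first following dot.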

@vadimkantorov
Contributor Author

If *.pt is reserved for zipballs from saved JIT'ted models, it may be necessary to recommend a different extension for raw saved tensors (preferably not *.pth or a fake *.tar).

@nzmora
Contributor

nzmora commented Mar 25, 2019

Hi,
Any updates on this? It's a rather trivial issue, but it would be nice to have a "standard" and meaningful file extension for the PyTorch checkpoint files.
Thanks!

@soumith
Member

soumith commented Mar 26, 2019

we can go with *.ptc. We haven't had time to actually do the task though.

@vadimkantorov
Contributor Author

@soumith *.ptc for both pickle format (from torch.save and state_dict) and zip format from JIT?

@soumith
Member

soumith commented Mar 27, 2019

maybe .pt for pickle format and .ptc (pytorch compiled) for JIT
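
Until a convention is settled, the extension alone does not say which container a file uses (a `.pth.tar` that was not a tar is mentioned earlier in this thread). A minimal sketch of checking the container from the file contents with the standard library only; the function name `sniff_container` is hypothetical, and the sketch deliberately does not try to identify pickle data or any PyTorch-specific layout:

```python
import tarfile
import zipfile


def sniff_container(path):
    """Return 'zip', 'tar', or 'other' based on file contents, not the name."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    # Anything else (for example a raw pickle stream) is reported as 'other'.
    return "other"
```

Calling `sniff_container("checkpoint.pth.tar")` on a file that is really a pickle stream would return `'other'`, regardless of the name.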

@vadimkantorov
Contributor Author

One alternative, more verbose option: *.torch.pkl, *.torch.zip, *.torch.h5

@soumith
Member

soumith commented Mar 28, 2019

i think that's too long

@vadimkantorov
Contributor Author

@soumith Another option: *.pt and *.ptz (hints that it is a collection of multiple things, like npz).

@ain-soph
Contributor

ain-soph commented Dec 14, 2020

Hi, any update on this?
My current library still uses .pth to save models and .pt to save tensors. Let me know the standard once it's finally determined, so that I can apply it in my library.

I don't recommend .ptz unless we have a torch.savez function in the same style as NumPy's np.savez(file_path, key1=value1, key2=value2). Do we plan to have it?
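
For readers unfamiliar with the NumPy style referred to here, a minimal sketch of `np.savez` with keyword-named arrays, written to an in-memory buffer. The array names `weights` and `bias` and the function name are made up for the example, and no `torch.savez` is assumed to exist:

```python
import io

import numpy as np


def savez_roundtrip():
    """Save two named arrays with np.savez and read them back by name."""
    buffer = io.BytesIO()
    # Each keyword becomes the name of one array inside the .npz archive.
    np.savez(buffer, weights=np.arange(3), bias=np.zeros(2))
    buffer.seek(0)
    with np.load(buffer) as archive:
        names = sorted(archive.files)
        weights = archive["weights"].tolist()
    return names, weights


print(savez_roundtrip())
```

The archive holds several arrays addressed by name, which is the "collection of multiple things" that a `.ptz` extension would hint at.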

@KOLANICH

👍 for chained extensions that do not conceal the underlying format. A bare .zip or .tar does not convey enough information about what is inside (it could be pickle, and for me pickle === "I cannot accept that").
