Skip to content

Conversation

datumbox
Copy link
Contributor

@datumbox datumbox commented Apr 20, 2022

It's quite likely that on future versions of TorchVision, we will have to modify the meta-data stored along with the weights. I believe that those meta-data are documentation and not code and as a result we should be able to modify them (add, delete, edit fields) on the future.

This PR adds a note on the documentation to clarify that there are no BC guarantees on meta-data.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @datumbox

Should we rename the meta field into _meta as well?

@datumbox
Copy link
Contributor Author

@NicolasHug We definitely can. I hesitated doing this because we use a dataclass for Weights and this would likely make it look ugly on the constructor call.

I decided to bring a PR and decide on the spot with you and @jdsgomes. I would rather avoid renaming meta to _meta but I don't really have a strong concern here. Thoughts?

@NicolasHug
Copy link
Member

My concern is that documenting that something is private is usually not enough for users not to use it. On top of that, I'm not sure these docs will actually get rendered anywhere (will they?).
Unless we add a leading underscore we can be 99% sure that users will depend on it; a different but relevant instance is the recent #5836.

I'm personally not too concerned about using _meta, but if we want to keep it clean perhaps we can just make Weights a regular class instead of a DataClass?

class Weights:
    def __init__(self, url, transforms, meta):
        self.url = url
        self.transforms = transforms
        self._meta = meta

Unless there's some killer feature from dataclasses that I'm missing?

@datumbox
Copy link
Contributor Author

datumbox commented Apr 20, 2022

@NicolasHug I think the use of dataclasses is from the time when the enums stored the data directly in them. I don't think there is a specific feature we need. Let me try make the change here and see if all work well.

@datumbox datumbox requested a review from NicolasHug April 20, 2022 16:00
@datumbox datumbox force-pushed the docs/meta_data branch 2 times, most recently from 913b3fd to f3a1f51 Compare April 20, 2022 16:07
@datumbox
Copy link
Contributor Author

It's been a while since I've worked on this part of the code-base so I had to refresh my memory about it.

The meta-data as they are now store two types of info:

  1. Meta fields with information crucial for the construction of the model. These can be guaranteed to be BC and can be accessed programmatically. Examples of these fields are: categories, keypoint_names, backend (for quantization) and size.
  2. Documentation fields that just provide context / info about the weights. These are only there for documentation purposes and have no BC guarantees. Examples of these fields are: recipe, num_params, accuracy metrics such as acc@1/`acc@5 etc

One clean option is to separate the two and provide different guarantees for each. A bit of an overkill but I don't see a better way if we want to be super consistent. Thoughts?

@datumbox
Copy link
Contributor Author

After sleeping over it, my preferred solution is not to generate a bunch of different doc fields but instead clean up the current ones (remove unnecessary fields see #5768) and restructure existing fields such as metrics. This way we can still offer a great level of BC, excluding things like URLs that are purely informational and can change / move on the future. Thoughts?

@NicolasHug
Copy link
Member

One clean option is to separate the two and provide different guarantees for each. A bit of an overkill but I don't see a better way if we want to be super consistent. Thoughts?

Something like a meta BC field and a private _doc field would be fine, I think.

I'm not sure I fully get your last comment, happy to chat more :)

@datumbox
Copy link
Contributor Author

@NicolasHug Sorry for not being very clear.

I'm proposing cleaning up the meta-data fields to keep only the bare minimum for which we can offer BC compatibility. With the exclusion of recipe URLs which can change to reflect the most updated info, all other fields like categories, num_params etc can be considered frozen and BC.

@datumbox
Copy link
Contributor Author

I had an offline chat with @NicolasHug and we agreed to strip the meta field down to the bare minimum fields (PR #5848 already is going towards that direction) and maintain BC.

I'm closing this PR and I'll follow up with a new one that further cleans up the fields and restructures them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants