-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
alpaca.py -> _alpaca.py. slimorca.py -> _slimorca.py #407
Conversation
✅ Deploy Preview for torchtune-preview ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
@@ -11,7 +11,7 @@ | |||
from tests.test_utils import get_assets_path | |||
|
|||
from torchtune import datasets | |||
from torchtune.datasets.alpaca import CROSS_ENTROPY_IGNORE_IDX | |||
from torchtune.datasets._alpaca import CROSS_ENTROPY_IGNORE_IDX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the underscore: we're accessing private APIs in _alpaca
now. And that's completely OK to do so in tests files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Is it ok to access private fields in tests as well? I've been doing this but leaving apologetic comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean private fields like private attributes of torchtune classes and things like that? Yes of course, you should be testing those as well.
If you mean private fields as in "private stuff from other libraries" then HELL NO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I meant the former - private fields from TorchTune classes. Ok thats awesome!
@@ -61,7 +61,7 @@ To run the recipe without any changes on 4 GPUs, launch a training run using Tun | |||
Dataset | |||
------- | |||
|
|||
In this example, we use the `Alpaca Dataset <https://github.com/pytorch-labs/torchtune/blob/main/torchtune/datasets/alpaca.py>`_ | |||
In this example, we use :class:`~torchtune.datasets.AlpacaDataset` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!!
@NicolasHug This PR is awesome and the explanation is very clear. Thank you! Just making sure I understand correctly:
If this makes sense, then when do we NOT have an |
In general, yes. There might be case-by-case rare exceptions.
Yes although since the file has a
Yes
No, the whole There are cases where it make sense to prefix an entire folder with |
@NicolasHug very clear. Thank you! |
Thanks for putting this together @NicolasHug , I'm largely onboard with this but had one question.
Why is this not BC-breaking but |
There's no BC-break for private APIs. BC only applies to public APIs.
Just to clarify, we're not concerned about what users do with these APIs: they can't remove anything from torchtune anyway, and even if they were to modify |
This PR implements #25 (comment) for the
torchtune.datasets
namespace. This is just an example of what should be done, I'm not signing up to address the rest of the code-base :) (but I can still help if needed)changes
alpaca.py
into_alpaca.py
and update usages accordingly.slimorca.py
test plan
CI +
git grep
:Did it do any good???
CC @kartikayk @RdoubleA @rohan-varma @joecummings @ebsmothers hopefully this can clarify a few things:
Yes. Take a look at the current
alpaca.py
file:https://github.com/pytorch-labs/torchtune/blob/44d829e3a28a7e663a3089524afae3d8d1946c63/torchtune/datasets/alpaca.py#L16
Do you need
CROSS_ENTROPY_IGNORE_IDX
to be public? Probably not. Well, inmain
, users can access it without any underscore viatorchtune.datasets.alpaca.CROSS_ENTROPY_IGNORE_IDX
, and removing it or even changing its value would effectively be BC-breaking. That's really annoying, isn't it?Now that
alpaca.py
is_alpaca.py
, you don't need to care about it. ThatCROSS_ENTROPY_IGNORE_IDX
variable is private because its path is nowtorchtune.datasets._alpaca.CROSS_ENTROPY_IGNORE_IDX
i.e. it can't be accessed without an underscore. So you can remove / change it as you please, that's not a BC-break.To exemplify @RdoubleA 's question in yesterday's meeting: is renaming
alpaca
into_alpaca
equivalent to just putting underscores in front of the private stuff inalpaca
, i.e. could we just put an underscore in front of_CROSS_ENTROPY_IGNORE_IDX
and keepalpaca.py
? Yes, the end result is largely equivalent, but putting the underscore at the file makes it a lot easier for reviewers to not miss anytyhing. Case in point,_CROSS_ENTROPY_IGNORE_IDX
was made explicitly private inslimorca
but it was missed in reviews foralpaca
. Ifalpaca
had been_alpaca
, the fact thatCROSS_ENTROPY_IGNORE_IDX
was "missed" would have been a non-issue (there's no "missing" it when it's already private anyway).