
New repo layout based on RFC #70

Merged: 13 commits merged from phil-repo-structure into main on Dec 12, 2023

Conversation

@pbontrager (Contributor) commented Dec 11, 2023

New repo layout to match https://github.com/pytorch-labs/torchtune/pull/54/files?short_path=dda34b2#diff-dda34b2e50075ce560d9f896ac2834b74c9e851a94f35fdfec031da03efa22c3

As this is file structure changes and the only code changes are imports, it might be easier to evaluate by checking out the branch and viewing the folders

@facebook-github-bot added the "CLA Signed" label on Dec 11, 2023
@joecummings (Contributor) commented:

What are blocks? That's different from the RFC.

@pbontrager (Contributor, Author) commented Dec 11, 2023

> What are blocks? That's different from the RFC.

In the RFC the folders are all under torchtune/, so the folder called tune would result in imports like torchtune.tune.models, which I think is fairly clunky. In this PR I changed torchtune to tune and tune to blocks, though I'm open to other suggestions. This results in tune.blocks.models. "blocks" is meant to evoke building blocks.

@rohan-varma (Member) left a comment:

I'm still a fan of keeping the directory and import name as torchtune, and doing imports such as torchtune.llm, etc. If users feel strongly about the shorthand, they can simply import torchtune as tune.

It also looks like not everything is under the tune directory, so some imports will be like from torchtune import blah and others will be like from torchtune.tune import blah, and users will be quite confused.

This also follows the convention of other domain libraries, e.g. torchaudio and torchrec.
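
For illustration (the submodule paths below are hypothetical, nothing is decided), the two options look roughly like:

```python
# Option in this PR: package renamed to `tune`, components grouped under `blocks`
# (illustrative path only, not a real module)
from tune.blocks.models import llama2

# Suggested alternative: keep the `torchtune` package name; users who want the
# shorthand can alias it themselves at import time
import torchtune as tune
```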

As @joecummings mentioned, the blocks directory is also something I'm not sure about - we should group things into general subdirectories where things make sense of course, but datasets, models, trainer, and utils are only loosely related and can probably all live under torchtune.

Also, we should consider whether we really need the trainer subdirectory. It might (1) imply to users that it's a full-service trainer offering, which we're not planning on building, and (2) it is currently not used.

Thanks!

@pbontrager (Contributor, Author) commented:

> I'm still a fan of keeping the directory and import name as torchtune, and doing imports such as torchtune.llm and etc. If users feel strongly about the shorthand, they can simply import torchtune as tune. This also follows other domain libraries as a convention - i.e. torchaudio and torchrec.

I can concede on the tune vs torchtune point, though I prefer when packages have shorter import names. Also, you're right that it should match the other domain library pattern.

> It also looks like not everything is under the tune directory so some imports will be like from torchtune import blah and others will be like from torchtune.tune import blah and users will be quite confused.

The goal was that everything from the library would be under tune (or torchtune), and the other code that users would use from the repo goes in the top-level directory, so you'd never import it. I can move the utils folder into the tune (torchtune) folder if it's expected to be available from the pip package.

> As @joecummings mentioned, the blocks directory is also something I'm not sure about - we should group things into general subdirectories where things make sense of course, but datasets, models, trainer, and utils are only loosely related and can probably all live under torchtune.

I'm not sold on the name blocks, so I'm open to suggestions, but the idea is that 'blocks' is the library within the library: all the building blocks that the recipes rely on. You can think of it as user 1 mostly using the defaults folder, user 2 working with and modifying the recipes, and user 3 just using the blocks they need.

> Also, we should consider whether we really need the trainer subdirectory. It might imply to users that it's a full-service trainer offering which we're not planning on building, and (2) is currently not used.

That's my fear with 'trainer' too. train_utils didn't seem like a great name for a folder, though maybe something along those lines is better.

@ebsmothers (Contributor) commented:

So one fundamental set of questions that I think will inform a lot of how we set things up:

What are we building/packaging/running CI on?

Basically if we are packaging everything under torchtune (or tune as in this PR), then we should probably have CI set up there. But this includes recipes, which I'm not sure we want to run CI on (I think the previous suggestion was that we will have integration tests; while those are potentially amenable to CI I imagine we would want it decoupled from core components). For that reason I would suggest separate test directories for e.g. tune/blocks and tune/recipes.

To weigh in on the usage of blocks: I like what you're going for here. We want some way for a user to say "here are all the importable components in this offering". The problem is that this is defined differently for every user: some want modeling components only, some want a dataset, some want training components, etc. Imo better to just be explicit and skip the extra directory. Combining this with the previous point, my rec would be to move recipes up a level. This helps with discoverability, since I imagine recipes will be a very natural starting point for most people and will point users towards importable components. I think we should still be able to support the type 1 users described in your RFC by just adding either these recipes or some general script supporting a suite of recipes as an entry_point for our distribution.

A couple other miscellaneous thoughts:

- Why the top-level utils directory? I assume these files will not be part of the package, but will they have any tests? Also, maybe just personal preference, but imo it feels janky to have a top-level directory called utils; maybe we can call it scripts or something instead.

- Also, what is up with examples/inference.py? The naming seems to imply that this is a recipe, but really this is just a parity check, right? I think the naming does not indicate that clearly, and idk why this couldn't go in e.g. tests/blocks/llama2/scripts/...

@pbontrager (Contributor, Author) commented Dec 12, 2023

@ebsmothers thanks for your helpful comments here. I will admit I'm not an expert on Python packaging. The intent here is that the library provides a set of building blocks that we then use ourselves and also offer to the user. I was not aware of package entry points; given that, I think it would make sense for the current tune directory to act as blocks does now, and then we'd move the recipes and the current defaults directory up a level.

Two additional points to that though:

  1. I think the recipes should be part of the CI in a limited form; for example, finetune_llm.py can have a test config that launches it with a dummy model and dataset for a single step through the loop.
  2. We should build a console_script that allows launching our recipes with a config from the command line (rough sketch below).

On the top-level utils: those are meant to be user scripts we offer, so I guess they should be available from the entry_point too. I just moved examples to the top level because I didn't really know its purpose at the moment.
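
For (2), a rough sketch of what the console_script could look like (the script name `tune` and the module path `recipes.cli:main` are placeholders, nothing is decided here):

```python
# setup.py (sketch only; package layout, names, and CLI flags are assumptions)
from setuptools import setup, find_packages

setup(
    name="torchtune",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # would let users run e.g. `tune --config <config.yaml>` after pip install
            "tune = recipes.cli:main",
        ],
    },
)
```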

@ebsmothers (Contributor) commented:

@pbontrager I agree on both (1) and (2). For (1) we just need to make a call on whether to couple recipe tests to component tests; it sounds like you're saying we shouldn't (assuming I'm understanding correctly). I'm in agreement with this; in practice we could have, e.g., a nightly job that runs only recipe tests instead of one that runs on every PR like the core library components.
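
Something like this is what I have in mind for a recipe-level smoke test (the script path, config path, and --config flag are assumptions about how finetune_llm.py would be invoked):

```python
# tests/recipes/test_finetune_llm.py (sketch only; paths and flags are assumed)
import subprocess
import sys


def test_finetune_llm_single_step():
    # Launch the recipe as a subprocess with a dummy config that runs a tiny
    # model and dataset for a single step through the training loop.
    result = subprocess.run(
        [
            sys.executable,
            "recipes/finetune_llm.py",
            "--config",
            "tests/assets/dummy_finetune_llm.yaml",
        ],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr
```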

@pbontrager (Contributor, Author) commented:

Based on the comments so far, I made the following changes:

  1. Renamed defaults -> configs
  2. Moved recipes + configs to top level
  3. Renamed tune -> torchtune
  4. Removed blocks and put all its contents in torchtune
  5. Renamed top level utils -> scripts
  6. Moved examples -> scripts/llama2_inference

@rohan-varma self-requested a review on December 12, 2023
@rohan-varma (Member) left a comment:

Synced up over VC and resolved the remaining questions. Looks great - please rebase on latest and test comprehensively before landing. Thanks!

@pbontrager (Contributor, Author) commented:

All the unit tests pass, and finetune_llm.py was able to start and resolve all imports correctly.

@pbontrager merged commit c6a60cf into main on Dec 12, 2023 (1 check passed)
@pbontrager deleted the phil-repo-structure branch on December 12, 2023