Remove GUI and Update Docs #48

Merged
merged 4 commits into main from rm-gui
Apr 22, 2024
Conversation

hamelsmu
Collaborator

This PR is in lieu of #43. I am also removing the GUI component, per the discussion in Slack.

This PR closes #45

(reopening this from a branch instead of a fork).

@hamelsmu hamelsmu mentioned this pull request Apr 22, 2024
@hamelsmu
Collaborator Author

hamelsmu commented Apr 22, 2024

Ok, looks like an error with wandb ... 👀

EDIT: I see this was already fixed in #46.

@hamelsmu hamelsmu requested a review from mwaskom April 22, 2024 06:09
@charlesfrye charlesfrye merged commit 1280fa1 into main Apr 22, 2024
5 checks passed
@mwaskom
Collaborator

mwaskom commented Apr 22, 2024

Can we please not vary parameters across base models where we don't mean to communicate that the parameterizations are model-specific? I.e., if we're going to switch to DeepSpeed ZeRO-1, let's do it for all of the models, not just Mistral.

Collaborator

@mwaskom mwaskom left a comment


Some nits on the new README text.

Some important caveats about the `train` command:

- The `--data` flag is used to pass your dataset to axolotl. This dataset is then written to the `datasets.path` specified in your config file. If you already have a dataset at `datasets.path`, be careful to also pass that same path to `--data` so the dataset is loaded correctly (a concrete sketch follows these caveats).
- Unlike axolotl, you cannot pass additional flags to the `train` command. However, you can specify all your desired options in the config file instead.
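A hypothetical sketch of the first caveat above (the file names, the dataset `type`, and the exact `train` invocation are illustrative assumptions, not taken verbatim from this repo):

# config.yml (excerpt): datasets.path is where the --data payload gets written
datasets:
  - path: data/my_dataset.jsonl
    type: alpaca

# pass the same path to --data so the config picks up the dataset you intended
train --data=data/my_dataset.jsonl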
Collaborator


We already say this above; we probably only need to say it once (most people coming to this repo will not be very experienced axolotlians).

Collaborator Author


My observations:

  • I think this repo is pretty difficult to reason about if you aren't familiar with axolotl. Like, what are these configs? How do they work? How are my prompts assembled, exactly? What does the dataset format need to be? Are there other dataset formats? How do I check the prompt construction? Etc. I was actually assuming that the user is indeed familiar with axolotl.
  • Even if you are very familiar with axolotl, this --data flag is really confusing: a key parameter in my config that I am used to using is completely ignored, with an extra layer of indirection. I actually got stuck on this personally as an experienced axolotl user, so I felt the need to provide these two caveats.

cc: @charlesfrye @winglian curious what you think

@hamelsmu
Collaborator Author

hamelsmu commented Apr 22, 2024

@mwaskom, about your question:

Can we please not vary parameters across base models where we don't mean to communicate that the parameterizations are model-specific? I.e., if we're going to switch to DeepSpeed ZeRO-1, let's do it for all of the models, not just Mistral.

I checked, and all the other example configs in this repo either do not use any DeepSpeed config or are Mixtral 8x7B models that are too big to use ZeRO-1. DeepSpeed parameterizations are indeed model-specific, IMO, in that they are specific to the model's size relative to the GPUs being used.

  • codellama: no DeepSpeed config was present
  • llama-2: no DeepSpeed config was present
  • mistral: already had a DeepSpeed config, but I think it was incorrect; that is why I changed it. I don't think we need ZeRO-3 and all the additional communication overhead that comes with it for a 7B model on an H100 GPU (see the sketch at the end of this comment).
  • mixtral (there are two configs): both are OK in that they use ZeRO-2
  • pythia: no DeepSpeed config was present

I was inferring from this prior structure that we didn't always assume multi-GPU training, and that it's just the example in the README.

Let me know if I'm misunderstanding or missing something! cc: @charlesfrye
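
For concreteness, a hypothetical sketch of the mistral change I'm describing (only the zero3_bf16.json line comes from this repo; the zero1.json path is an assumption about what axolotl ships under deepspeed_configs):

# config/mistral.yml, before: ZeRO-3 shards parameters, gradients, and optimizer
# state, with communication overhead a 7B model on an H100 doesn't need
deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json

# after: ZeRO-1 shards only optimizer state, which is plenty at this scale
deepspeed: /root/axolotl/deepspeed_configs/zero1.json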

@mwaskom
Collaborator

mwaskom commented Apr 22, 2024

Hm are we looking in different places?

$ git grep zero config
config/codellama.yml:deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
config/mistral.yml:deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
config/mixtral.yml:deepspeed: /root/axolotl/deepspeed_configs/zero2.json
config/mixtral_out_of_box.yml:deepspeed: /root/axolotl/deepspeed_configs/zero2.json

That said, it's already kind of a mess, I guess.

@mwaskom
Collaborator

mwaskom commented Apr 22, 2024

But as I mentioned above, I think that "what DeepSpeed config to use" is a tricky question. I.e., I think the original intention here was to demonstrate that ZeRO-3 works on Modal even if it's not strictly necessary for the model size we include in the quickstart. But I appreciate that many users will just pick up that config and run with it. So it's hard: a broader question about the point of this repo, IMO.

@hamelsmu
Collaborator Author

Hm are we looking in different places?

I was looking at an old commit on GitHub, sorry about that; not sure how that happened.

@hamelsmu
Collaborator Author

But I appreciate that many users will just pick up that config and run with it. So it's hard: a broader question about the point of this repo, IMO.

That is an interesting point; perhaps we should huddle and discuss who we think the audience is, what we assume they come with, etc. I probably have a much different mental model. I think it would be good to understand that so I can make better choices!

@hamelsmu
Collaborator Author

re: DeepSpeed configs

To resolve the gridlock, I can just make everything ZeRO-3 for now; it's not like it's going to kill anyone, even if it is suboptimal.

@mwaskom
Collaborator

mwaskom commented Apr 22, 2024

it's not like it's going to kill anyone, even if it is suboptimal

Famous last words in LLM finetuning ;)

@hamelsmu
Collaborator Author

hamelsmu commented Apr 22, 2024

Famous last words in LLM finetuning ;)

I'll actually just revert that one specific change :) I don't want to screw up the other configs; they can be quite fiddly, and I'm not sure about bf16, etc.

@mwaskom mwaskom deleted the rm-gui branch April 22, 2024 19:23