LoRA Fine Tuning #82
base: main
Commits on Mar 4, 2024
This is just the first draft so we can start building this feature.
- Added dataloader.py, which loads data for training
- Added train.py, with the current training loop
- Added lora.py, the LoRA wrapper for the stage 1 Transformer
- Added a dummy_dataset folder with 25 data samples to work with when testing (VCTK --> p311)
- Commented out the initial inference code that runs when the stage 1 model is built

There is no batch processing in the training loop currently (I was getting some dimension mismatches in KVCache.update). The dataloader works fine, but everything else needs some work. This is just an initial draft so we can start working on this thing together! :-)
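For reference, the core idea behind the lora.py wrapper can be sketched as a low-rank update on top of a frozen linear layer. This is a minimal illustration of the LoRA technique, not the actual lora.py API; the class and parameter names here are made up for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update:
    y = base(x) + (alpha / r) * x @ A^T @ B^T
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only the low-rank factors train.
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        # B starts at zero, so the wrapped layer is initially identical to base.
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Because `lora_B` is zero-initialized, the wrapper is a no-op at the start of training, which is the standard LoRA initialization.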
(commit 53011d7)
Commits on Mar 6, 2024
Cleared everything except stage 1
- Switched from Adam to the SGD optimizer.
- Modified the DataLoader to return the first two encodec token hierarchies as a flattened, interleaved tensor (let me know if that looks OK to you).
- Modified the LoRA wrapper to fine-tune only the speaker_cond_pos layer. In nanoGPT-LoRA only the causal attention layer is fine-tuned (https://github.com/danielgrittner/nanoGPT-LoRA/blob/master/model.py#L128). Would it be worth trying something similar?
- Modified the training loop to forward-pass entire batches at a time.

Loss calculation doesn't work yet; I need to match the GT labels with the generated probabilities. Need some direction here.
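To make the "flattened interleaved tensor" concrete, here is one way the first two token hierarchies could be woven into a single sequence. This is a sketch of the interleaving idea under the assumption that the two hierarchies are parallel, equal-length sequences; the function name is illustrative, not the DataLoader's actual API.

```python
import numpy as np

def interleave_hierarchies(h1: np.ndarray, h2: np.ndarray) -> np.ndarray:
    """Flatten two parallel encodec token hierarchies of shape (T,) into one
    interleaved sequence of shape (2*T,): [h1[0], h2[0], h1[1], h2[1], ...]."""
    assert h1.shape == h2.shape, "hierarchies must be the same length"
    # Stacking along axis 1 pairs up corresponding tokens; reshape flattens
    # the pairs in order, producing the interleaved layout.
    return np.stack([h1, h2], axis=1).reshape(-1)
```

For example, `interleave_hierarchies(np.array([1, 2, 3]), np.array([4, 5, 6]))` yields `[1, 4, 2, 5, 3, 6]`.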
(commit da6e475)
Add accelerate, new training loop
It almost trains; I've just made a mistake somewhere that causes: RuntimeError: Trying to backward through the graph a second time. I'm guessing it's because we need to do all the preprocessing in the dataloader rather than in the training loop. Let me know any thoughts. It's getting close :-)
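That error typically means a tensor built once (outside the loop, still attached to the model's graph) is consumed by `backward()` in more than one iteration. A minimal, self-contained reproduction and fix, unrelated to the actual train.py code:

```python
import torch

w = torch.ones(3, requires_grad=True)

# Reproduce the error: `cached` is built once, so the first backward() frees
# its graph and a second backward cannot traverse it.
cached = w * w
cached.sum().backward()
try:
    cached.sum().backward()
    raised = False
except RuntimeError as err:
    raised = "backward through the graph" in str(err)

# Fix: rebuild the graph every iteration, i.e. keep graph-building
# preprocessing inside the training step (or detach anything computed once).
for step in range(2):
    loss = (w * w).sum()  # fresh graph each iteration
    loss.backward()
    w.grad = None         # stand-in for optimizer.zero_grad()
```

So moving preprocessing into the dataloader would fix it exactly when that preprocessing currently produces graph-attached tensors that outlive one iteration.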
(commit 22de9d6)
move loss to model + format similar to nanoGPT
Moved the loss calculation into the LoRA wrapper model. Modified the training loop to be similar to nanoGPT's. This involves using a sliding window for prompts and labels, which should more accurately replicate what the model actually produces at the logits level. If my intuition is wrong about this, please correct me.
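The nanoGPT-style sliding window pairs each input window with the same window shifted by one token, so the label at position t is the token the model should predict at t. A small sketch of that pairing (illustrative helper, not the actual train.py code):

```python
import numpy as np

def get_batch(tokens: np.ndarray, block_size: int, offsets):
    """nanoGPT-style windows: inputs are tokens[i : i+block_size], targets the
    same window shifted right by one, so position t predicts token t+1."""
    x = np.stack([tokens[i : i + block_size] for i in offsets])
    y = np.stack([tokens[i + 1 : i + 1 + block_size] for i in offsets])
    return x, y
```

With `tokens = np.arange(10)` and `offsets = [0, 3]`, the first input window is `[0, 1, 2, 3]` and its labels are `[1, 2, 3, 4]`.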
(commit c23d878)
Commits on Mar 7, 2024
Get model to train by clearing cache between iters
The model is fine-tuning now. Not correctly, but it is fine-tuning. Next, the data must be prepared correctly. But first, Attention and KVCache must be modified to support batches with batch size > 1.
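The cache-clearing fix can be pictured with a toy key/value cache: an inference-style cache appends per-step entries, so during training it must be emptied between iterations or the cached sequence (and any attached graph) grows across steps. This is an illustrative stand-in, not the repo's actual KVCache implementation:

```python
import torch

class KVCache:
    """Toy key/value cache: update() appends per-step K/V and returns the
    concatenated history; clear() must run between training iterations."""
    def __init__(self):
        self.k, self.v = [], []

    def update(self, k: torch.Tensor, v: torch.Tensor):
        self.k.append(k)
        self.v.append(v)
        # Concatenate along the sequence dimension (batch, seq, head_dim).
        return torch.cat(self.k, dim=1), torch.cat(self.v, dim=1)

    def clear(self):
        self.k.clear()
        self.v.clear()

cache = KVCache()
for step in range(3):
    k = torch.zeros(1, 2, 4)        # (batch, seq, head_dim) stand-in
    keys, values = cache.update(k, k)
    cache.clear()                    # without this, `keys` grows every iter
```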
(commit fecd0ac)
Cleaned up training loop, issue with data
The training loop is pretty clean now, and all data preparation is done in dataloader.py. Loss becomes NaN when entries in a batch vary a lot in length (e.g. one entry had to be padded heavily during collation because of a big difference in prompt lengths, encodec-token lengths, or both). The issue could perhaps be solved by grouping data points of similar length together, so batches need little padding. Would love to hear any thoughts here.
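The grouping idea above is usually called length bucketing: sort samples by length, then slice consecutive runs into batches so each batch pads only to the length of its longest (similar-length) member. A minimal sketch, with an invented helper name:

```python
def bucket_by_length(samples, batch_size):
    """Sort samples by length, then slice into batches so each batch holds
    similar-length entries and needs minimal padding during collation."""
    ordered = sorted(samples, key=len)
    return [ordered[i : i + batch_size] for i in range(0, len(ordered), batch_size)]
```

A common refinement is to shuffle the resulting batches (rather than individual samples) each epoch, so training order still varies while batches stay length-homogeneous.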
(commit d473493)
Implement sliding window data loading
Data loading should now be more memory-efficient, and it runs. I'm going to run some tests to make sure this trains the LoRA correctly. Might want to test LoRAs on different layers.
(commit 198fd9d)
Commits on Mar 8, 2024
Corrected a mistake in data preparation. Will start training some LoRAs now to see if this fine-tuning code is set up correctly.
(commit 7c4b93e)
(commit e7d27e8)
Commits on Mar 10, 2024
Fix dtypes and copy nanoGPT-LoRA training params
Disabled Accelerate for now. Properly aligned all dtypes between the model and the dataloader; previously, loaded speaker embeddings were not converted to the correct dtype. Copied most of the nanoGPT-LoRA training parameters, and renamed "epochs" to "iters" as in nanoGPT.
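The speaker-embedding fix amounts to casting loaded tensors to the model's compute dtype before they enter the forward pass. A hedged sketch; the function name and the choice of bfloat16 are assumptions for illustration, not the repo's actual settings:

```python
import torch

MODEL_DTYPE = torch.bfloat16  # assumed compute dtype; check the model config

def prepare_speaker_emb(spk_emb: torch.Tensor, device: str = "cpu") -> torch.Tensor:
    """Embeddings loaded from disk typically come back as float32; cast them
    to the model's dtype so matmuls against model weights don't mismatch."""
    return spk_emb.to(device=device, dtype=MODEL_DTYPE)
```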
(commit 42c4b92)
load pretrained weights in lora layers
Updated to further mimic the nanoGPT-LoRA training process.
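Loading pretrained weights into the LoRA-wrapped layers means the frozen base of each wrapped layer starts from the checkpoint, with the zero-initialized low-rank delta on top, so the wrapped model is initially identical to the pretrained one. A sketch of that idea with invented names, not the actual commit's code:

```python
import torch
import torch.nn as nn

def wrap_with_pretrained(base: nn.Linear, r: int = 8):
    """Copy a pretrained Linear into a frozen base layer and attach trainable
    low-rank factors; the wrapped layer starts out identical to `base`."""
    wrapped = nn.Linear(base.in_features, base.out_features,
                        bias=base.bias is not None)
    wrapped.load_state_dict(base.state_dict())  # carry over pretrained weights
    for p in wrapped.parameters():
        p.requires_grad_(False)                 # only the LoRA factors train
    lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
    lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
    return wrapped, lora_A, lora_B
```

Since `lora_B` starts at zero, `wrapped(x) + x @ lora_A.T @ lora_B.T` reproduces the pretrained layer's output exactly at initialization.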
(commit 5c6c7a8)