
Train won't start for custom dataset #296

Open
ArtBreguez opened this issue Aug 14, 2023 · 2 comments

Comments

@ArtBreguez

ArtBreguez commented Aug 14, 2023

I'm trying to train Llama for customer support. I'm using a custom dataset with this shape:

[
    {
        "query": "My order hasn't arrived yet.",
        "response": "We apologize for the inconvenience. Can you please provide your order number so we can investigate?"
    },
    {
        "query": "I received a damaged product.",
        "response": "We apologize for the inconvenience. Can you please provide a photo of the damaged product so we can assist you further?"
    },
    {
        "query": "I need to return an item.",
        "response": "Certainly. Please provide your order number and reason for return, and we will provide you with instructions on how to proceed."
    },
    {
        "query": "I want to change my shipping address.",
        "response": "No problem. Can you please provide your order number and the new shipping address you'd like to use?"
    }
]
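If the data went through the TinyStories preprocessing, note that (as far as I can tell) that script reads a `story` field from each JSON object, so query/response records have to be flattened into plain text before tokenizing. A minimal sketch, assuming a hypothetical `Customer:`/`Agent:` template and assuming the pretokenizer wants `{"story": ...}` objects; neither is confirmed by this thread:

```python
import json
from typing import Dict, List

# Hypothetical template: any consistent format works for a plain
# next-token-prediction model, this one is just an example.
TEMPLATE = "Customer: {query}\nAgent: {response}"


def records_to_texts(records: List[Dict[str, str]]) -> List[str]:
    """Flatten each query/response pair into a single training string."""
    texts = []
    for rec in records:
        query = rec["query"].strip()
        response = rec["response"].strip()
        texts.append(TEMPLATE.format(query=query, response=response))
    return texts


def to_story_records(texts: List[str]) -> List[Dict[str, str]]:
    """Wrap each text as {"story": ...} (assumed field name, see above)."""
    return [{"story": t} for t in texts]


if __name__ == "__main__":
    raw = '[{"query": "I need to return an item.", "response": "Certainly."}]'
    texts = records_to_texts(json.loads(raw))
    print(json.dumps(to_story_records(texts), indent=2))
```

The output file can then be tokenized the same way the TinyStories shards are.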

I've already tokenized it, but when I run train.py the terminal freezes at:

tokens per iteration will be: 131,072
breaks down as: 4 grad accum steps * 1 processes * 128 batch size * 256 max seq len
Initializing a new model from scratch
num decayed parameter tensors: 43, with 15,187,968 parameters
num non-decayed parameter tensors: 13, with 3,744 parameters
using fused AdamW: False
Created a PretokDataset with rng seed 42
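The first two log lines are just the config multiplied out: 4 × 1 × 128 × 256 = 131,072 tokens per optimizer step. That number is worth comparing against the size of a small custom dataset, since one step may already cover the whole dataset several times. A small sketch of the arithmetic; the 50,000-token dataset size is a made-up example, not a figure from this issue:

```python
def tokens_per_iter(grad_accum_steps: int, world_size: int,
                    batch_size: int, max_seq_len: int) -> int:
    """Tokens consumed per optimizer step, as printed by the training log."""
    return grad_accum_steps * world_size * batch_size * max_seq_len


def iters_per_epoch(dataset_tokens: int, per_iter: int) -> float:
    """How many optimizer steps one pass over the dataset corresponds to."""
    return dataset_tokens / per_iter


per_iter = tokens_per_iter(4, 1, 128, 256)
print(per_iter)  # 131072
# Hypothetical 50k-token dataset: well under one step per epoch.
print(round(iters_per_epoch(50_000, per_iter), 2))  # 0.38
```

With a dataset that small, lowering `batch_size` or `max_seq_len` keeps each step from repeating the same few examples many times over.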

What could be the reason? When training on TinyStories it works fine.
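One plausible cause, assuming train.py's dataset follows the pattern I believe upstream llama2.c uses (hold shard 0 out for validation, train on the remaining `.bin` shards, then loop `while True` over them): if tokenization produced only one shard, or the glob does not find the custom `.bin` files, the training shard list is empty. A `while True` over an empty list never yields a batch and never raises, which looks exactly like a freeze right after the seed message. The sketch below mimics that pattern with hypothetical names (`split_shards`, `iter_shards`) and adds the guard that turns the silent hang into an error:

```python
import random
from typing import Iterator, List, Sequence


def split_shards(shards: Sequence[str], split: str) -> List[str]:
    # Assumed split rule: shard 0 is held out for "val",
    # every other shard is used for "train".
    return list(shards[1:]) if split == "train" else list(shards[:1])


def iter_shards(shards: Sequence[str], split: str, seed: int = 42) -> Iterator[str]:
    rng = random.Random(seed)
    chosen = split_shards(shards, split)
    # Guard: without this check, the loop below spins forever on an
    # empty list without yielding or raising, i.e. it "freezes".
    if not chosen:
        raise ValueError(
            f"no shards for split={split!r} ({len(shards)} found in total); "
            "need one .bin shard for val plus at least one more for train"
        )
    while True:
        rng.shuffle(chosen)
        for shard in chosen:
            yield shard
```

If this is the cause, writing the tokenized data into at least two shards (or changing the split so train and val do not depend on the shard count) should get training going.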

@twobob

twobob commented Aug 16, 2023

Yeah, I'm in the same spot. I figured it might be a me problem; still undecided.

I tried to pin down exactly where the process was by running the whole thing in Colab and stepping into the running process, but I didn't get any further than seeing that the "training has started" message I had injected fired...

I'll have to revisit this.
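A lighter way to see where a seemingly frozen run is stuck, without a debugger, is the standard-library `faulthandler` module: arm a timer before the first batch is fetched, and if the process is still running when it expires, the Python stack of every thread gets dumped. The helper names below are hypothetical wrappers; `dump_traceback_later` and `cancel_dump_traceback_later` are the real stdlib calls:

```python
import faulthandler
import sys
from typing import IO, Optional


def arm_hang_watchdog(seconds: float, file: Optional[IO] = None) -> None:
    """If the process is still running `seconds` from now, dump the Python
    stack of every thread to `file` (stderr by default). Does not kill it."""
    faulthandler.dump_traceback_later(
        seconds,
        repeat=False,
        file=file if file is not None else sys.stderr,
        exit=False,
    )


def disarm_hang_watchdog() -> None:
    """Cancel a pending dump, e.g. once the first training step completes."""
    faulthandler.cancel_dump_traceback_later()
```

For example, call `arm_hang_watchdog(120)` just before the first batch is requested and `disarm_hang_watchdog()` after the first step. `faulthandler` writes to a real file descriptor, so in a notebook it may be safer to pass a file opened with `open(...)` instead of relying on the notebook's `sys.stderr`.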

@madroidmaq
Contributor

@ArtBreguez you can try this: #311 (comment). It worked for me.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants