Train won't start for custom dataset #296

ArtBreguez · 2023-08-14T20:24:31Z

I'm trying to train Llama for customer support. I'm using a custom dataset with this shape:

[
    {
        "query": "My order hasn't arrived yet.",
        "response": "We apologize for the inconvenience. Can you please provide your order number so we can investigate?"
    },
    {
        "query": "I received a damaged product.",
        "response": "We apologize for the inconvenience. Can you please provide a photo of the damaged product so we can assist you further?"
    },
    {
        "query": "I need to return an item.",
        "response": "Certainly. Please provide your order number and reason for return, and we will provide you with instructions on how to proceed."
    },
    {
        "query": "I want to change my shipping address.",
        "response": "No problem. Can you please provide your order number and the new shipping address you'd like to use?"
    }
]
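A minimal sketch of flattening these records into one text string per example before tokenization. It assumes the pretokenization step was adapted from the TinyStories one, which encodes a single text field per JSON record. The `Customer:`/`Agent:` template and the `to_training_text` name are hypothetical choices, not part of llama2.c.

```python
import json

# Hypothetical prompt template; any consistent format works, as long as
# training and inference use the same one.
TEMPLATE = "Customer: {query}\nAgent: {response}"


def to_training_text(records):
    """Turn [{"query": ..., "response": ...}, ...] into one string per example."""
    texts = []
    for rec in records:
        # Strip stray whitespace so every example starts and ends cleanly.
        texts.append(TEMPLATE.format(query=rec["query"].strip(),
                                     response=rec["response"].strip()))
    return texts


if __name__ == "__main__":
    raw = '[{"query": "I need to return an item.", "response": "Certainly."}]'
    for text in to_training_text(json.loads(raw)):
        print(text)
```

Each resulting string would then be encoded and written to `.bin` shards the same way the TinyStories stories are.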

I've already tokenized it, but when I run train.py the terminal freezes at:

tokens per iteration will be: 131,072
breaks down as: 4 grad accum steps * 1 processes * 128 batch size * 256 max seq len
Initializing a new model from scratch
num decayed parameter tensors: 43, with 15,187,968 parameters
num non-decayed parameter tensors: 13, with 3,744 parameters
using fused AdamW: False
Created a PretokDataset with rng seed 42
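The first two log lines are just the product of the batch settings. A small sketch for comparing that figure with the size of a custom dataset; the function names and the `20_000` token count below are made-up illustration values, not measurements from this issue.

```python
def tokens_per_iteration(grad_accum_steps, processes, batch_size, max_seq_len):
    # Same product train.py prints: accum steps * processes * batch size * seq len
    return grad_accum_steps * processes * batch_size * max_seq_len


def iterations_per_epoch(dataset_tokens, tokens_per_iter):
    # How many optimizer steps one pass over the data covers (can be < 1).
    return dataset_tokens / tokens_per_iter


if __name__ == "__main__":
    tpi = tokens_per_iteration(4, 1, 128, 256)
    print(f"{tpi:,}")  # 131,072, matching the log
    # Hypothetical small support dataset of about 20k tokens:
    print(iterations_per_epoch(20_000, tpi))
```

With only a handful of short examples, a single optimizer step already samples more tokens than the whole dataset contains, so a smaller `batch_size` or `max_seq_len` may be worth considering.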

What could be the reason? Training on TinyStories works fine.
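One plausible cause, consistent with the related #311 and #319: the loader prints the rng-seed line and then enters an endless `while True` loop over the shard list for the current split. If that list is empty, either because no `.bin` files were found in the directory the loader searches, or because a TinyStories-style split (shard 0 for validation, the rest for training) is applied to data tokenized into a single shard, the loop spins forever without yielding a batch or raising an error. The sketch below reproduces that logic in plain Python; `split_shards`, `iterate_shards` and the file name are simplified, hypothetical stand-ins rather than the actual llama2.c code, and `max_passes` exists only so the demo terminates.

```python
import random


def split_shards(shard_filenames, split):
    # TinyStories-style split: shard 0 is validation, everything else is train.
    return shard_filenames[1:] if split == "train" else shard_filenames[:1]


def iterate_shards(shards, seed=42, max_passes=None):
    """Yield shard names forever (or for max_passes passes), reshuffling each pass."""
    rng = random.Random(seed)
    shards = list(shards)
    passes = 0
    while max_passes is None or passes < max_passes:
        rng.shuffle(shards)
        # With an empty list this inner loop does nothing, so the outer
        # loop spins without ever yielding: the "frozen" terminal.
        for shard in shards:
            yield shard
        passes += 1


if __name__ == "__main__":
    one_shard = ["data00.bin"]  # hypothetical: all custom data in one shard
    train = split_shards(one_shard, "train")
    print(train)  # []
    print(list(iterate_shards(train, max_passes=1000)))  # [] after 1000 empty passes
```

With `max_passes=None`, which corresponds to the real loader's `while True`, `next(iterate_shards(train))` would never return.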


twobob · 2023-08-16T13:30:31Z

Yeah, I'm kind of in this situation too. I figured it might be a problem on my end; still undecided.

I did try to find exactly where the process was stuck by implementing the entire thing in Colab and then stepping into the running process, but I still didn't figure it out beyond seeing that the "training has started" message I had injected fired.

I'll have to revisit it.

madroidmaq · 2023-08-18T01:31:43Z

@ArtBreguez you can try this: #311 (comment). It worked for me.

madroidmaq mentioned this issue on Aug 17, 2023 from #311, "Stuck on training: Created a PretokDataset with rng seed 42" (Open)

RahulSChand mentioned this issue on Aug 19, 2023 from #319, "Give better error message in Tinystories data loader" (Merged)
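Until the loader in use has an explicit check (the point of #319), a quick pre-flight script can catch these cases before launching train.py. A minimal sketch, assuming the shards are raw `uint16` token files (2 bytes per token) as in the TinyStories pipeline, that the first shard is reserved for validation, and that each shard needs at least `2 * max_seq_len` tokens to give one full batch once the last partial batch is dropped. `check_shards` and the default directory are hypothetical.

```python
import glob
import os


def check_shards(bin_dir, max_seq_len=256):
    """Return a list of human-readable problems found in bin_dir (empty if OK)."""
    problems = []
    shards = sorted(glob.glob(os.path.join(bin_dir, "*.bin")))
    if not shards:
        problems.append(f"no .bin files found in {bin_dir}")
        return problems
    if len(shards) < 2:
        # Shard 0 goes to validation, so training would get nothing.
        problems.append(f"only {len(shards)} shard; the train split would be empty")
    for path in shards:
        n_tokens = os.path.getsize(path) // 2  # uint16 = 2 bytes per token
        if n_tokens < 2 * max_seq_len:
            problems.append(f"{os.path.basename(path)}: {n_tokens} tokens, "
                            f"need at least {2 * max_seq_len}")
    return problems


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "data/tok_custom"  # hypothetical path
    for problem in check_shards(target) or ["looks OK"]:
        print(problem)
```

Running it against the directory the loader actually reads (which depends on the vocab source settings) is what matters; pointing it at the wrong folder reproduces the "no .bin files" case.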
