creating a NeMo model #8601

Closed
ShabnamRA opened this issue Mar 6, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ShabnamRA

I am trying to learn NeMo from "tutorials/01_NeMo_Models.ipynb".

At the end of the notebook, after creating the NeMoGPTv2 class, I try to create a model:
model = NeMoGPTv2(cfg=cfg.model)

and I get the following error:

  File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-67-1b7caab869c2>", line 1, in <module>
    model = NeMoGPTv2(cfg=cfg.model)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "<ipython-input-31-f04b7157a9ba>", line 3, in __init__
    super().__init__(cfg=cfg, trainer=trainer)
  File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/nemo/core/classes/modelPT.py", line 154, in __init__
    self.setup_multiple_validation_data(val_data_config=cfg.validation_ds)
  File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/nemo/core/classes/modelPT.py", line 539, in setup_multiple_validation_data
    model_utils.resolve_validation_dataloaders(model=self)
  File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/nemo/utils/model_utils.py", line 293, in resolve_validation_dataloaders
    model.setup_validation_data(cfg.validation_ds)
  File "<ipython-input-66-0c8f18429ac6>", line 23, in setup_validation_data
    vocab = f.read().split('')[:-1]  # the -1 here is for the dangling  token in the file
            ^^^^^^^^^^^^^^^^^^
ValueError: empty separator
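
The error itself is plain Python behavior and can be reproduced outside NeMo. The snippet below is a minimal sketch, using a made-up three-token string rather than the real vocab file, that shows what `str.split` does with an empty separator, with an explicit separator, and with no separator at all:

```python
text = "a\nb\nc\n"  # made-up stand-in for the contents of a vocab file

# An empty string is not a valid separator for str.split.
try:
    text.split('')
except ValueError as err:
    message = str(err)

# An explicit separator leaves a trailing empty string after the final "\n",
# which is the kind of dangling entry a [:-1] slice trims off.
by_newline = text.split('\n')

# No separator: split on runs of whitespace; no empty strings are produced.
by_whitespace = text.split()

print(message)
print(by_newline)
print(by_whitespace)
```

Note that only the explicit-separator form produces a trailing empty entry, so a `[:-1]` slice is only appropriate there.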
@ShabnamRA ShabnamRA added the bug Something isn't working label Mar 6, 2024
@ShabnamRA
Author

In the modified version below, split() is called without a separator, so it splits on any run of whitespace (spaces, tabs, or newlines). This resolves the ValueError caused by the empty separator. The tutorial needs to be modified as follows:

from omegaconf import OmegaConf


class NeMoGPTv2(NeMoGPT):  # NeMoGPT is defined earlier in the tutorial notebook

    def setup_training_data(self, train_data_config: OmegaConf):
        # Reset the vocab; this assumes _setup_data_loader rebuilds self.vocab,
        # as it does in the tutorial.
        self.vocab = None
        self._train_dl = self._setup_data_loader(train_data_config)

        # Save the vocab into a text file for now, one token per line,
        # so that split() can separate the tokens again when reading.
        with open('vocab.txt', 'w') as f:
            for token in self.vocab:
                f.write(f"{token}\n")

        # This is going to register the file into .nemo!
        # When you later use .save_to(), it will copy this file into the tar file.
        self.register_artifact('vocab_file', 'vocab.txt')


    def setup_validation_data(self, val_data_config: OmegaConf):
        vocab_file = self.register_artifact('vocab_file', 'vocab.txt')

        with open(vocab_file, 'r') as f:
            # split() with no separator splits on runs of whitespace and yields
            # no empty trailing entry, so no [:-1] slice is needed.
            # Caveat: tokens that are themselves whitespace are lost this way.
            self.vocab = f.read().split()

        self._validation_dl = self._setup_data_loader(val_data_config)


    def setup_test_data(self, test_data_config: OmegaConf):
        # This is going to try to find the same file, and if it fails,
        # it will use the copy in .nemo
        vocab_file = self.register_artifact('vocab_file', 'vocab.txt')

        with open(vocab_file, 'r') as f:
            # Same whitespace-based split as in setup_validation_data.
            self.vocab = f.read().split()

        self._test_dl = self._setup_data_loader(test_data_config)
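
One caveat with splitting on whitespace: any token that is itself a space or a newline disappears when the file is read back, which matters if the vocab is character-level. The sketch below shows one possible alternative, assuming an explicit separator string that never occurs inside a token. The names `SEP`, `save_vocab`, and `load_vocab` are hypothetical helpers for illustration, not part of NeMo, and the sample vocab is made up:

```python
import os
import tempfile

SEP = "<SEP>"  # hypothetical separator; must not appear inside any token


def save_vocab(path, vocab):
    # newline="" disables newline translation so "\n" tokens survive as-is.
    with open(path, "w", encoding="utf-8", newline="") as f:
        for token in vocab:
            f.write(f"{token}{SEP}")


def load_vocab(path):
    with open(path, "r", encoding="utf-8", newline="") as f:
        # The file ends with SEP, so split() leaves one trailing empty
        # string; [:-1] removes exactly that entry.
        return f.read().split(SEP)[:-1]


vocab = ["a", " ", "\n", "b"]  # made-up vocab with whitespace tokens

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    save_vocab(path, vocab)
    restored = load_vocab(path)

# For comparison: one token per line, read back with a whitespace split.
whitespace_roundtrip = "".join(f"{t}\n" for t in vocab).split()

print(restored == vocab)
print(whitespace_roundtrip)
```

With an explicit separator the `[:-1]` slice is correct again, because the written file always ends with one dangling separator.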
