
Add support to finetune with use_distributed_optimizer #68

Closed

Conversation

@dumpmemory (Contributor) commented Sep 18, 2023

Fix issues when finetuning with the --use-distributed-optimizer option.

@martinjaggi (Contributor)

Could you comment on what issue is solved by this fix (compared to the finetuning code and scripts we provide)?

@dumpmemory (Contributor, Author)

Could you comment on what issue is solved by this fix (compared to the finetuning code and scripts we provide)?

Yes. It fixes the functions that are missing when you add the --use-distributed-optimizer argument to the finetuning scripts.

@dumpmemory (Contributor, Author)

also fix #67 (comment)

@dumpmemory (Contributor, Author)

Any update?

@kylematoba (Collaborator)

Hi, sorry, no update: the whole team is working on a big run right now, and changing the function signature for checkpoint loading is not something we're keen to do at the moment. We should be done in about a month.

@mynewstart

Hi @dumpmemory, if I use --use_checkpoint_args and --use_distributed_optimizer together, an assertion error is raised in checkpointing.py because mpu is not initialized:

optim_name = os.path.join(
            common_path + "_%03d" % mpu.get_data_parallel_rank(),
            "optim.pt") 

The root cause is that _finish_mpu_init() is called after load_args_from_checkpoint(args) in initialize.py; the code is as follows:

def initialize_megatron(extra_args_provider=None,
                        args_defaults={}):
    """Set global variables, initialize distributed, and
    set autoresume and random seeds.
    `allow_no_cuda` should not be set unless using megatron for cpu only 
    data processing. In general this arg should not be set unless you know 
    what you are doing.
    """

    # Make sure cuda is available.
    assert torch.cuda.is_available(), 'Megatron requires CUDA.'

    # Parse arguments
    args = megatron.arguments.parse_args(extra_args_provider)

    if args.use_checkpoint_args or args_defaults.get('use_checkpoint_args', False):
        assert args.load is not None, '--use-checkpoints-args requires --load argument'
        load_args_from_checkpoint(args)

    megatron.arguments.validate_args(args, args_defaults)
        
    # set global args, build tokenizer, and set adlr_autoresume,
    # tensorboard-writer, and timers.
    set_global_variables(args)

    # torch.distributed initialization
    def _finish_mpu_init():
        _initialize_distributed(args)
        
        # Random seeds for reproducibility.
        if args.rank == 0:
            print('> setting random seeds to {} ...'.format(args.seed))
        _set_random_seed(args.seed, args.data_parallel_random_init)

    # Megatron's MPU is the master. Complete initialization right away.
    _finish_mpu_init()
    _init_autoresume()
    # _compile_dependencies(args)

    # No continuation function
    return None
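
For readers hitting the same crash: the failing line needs mpu.get_data_parallel_rank(), which is only available after _finish_mpu_init(), while load_args_from_checkpoint() runs first. Below is a minimal sketch of one possible workaround; the helper name, the rank-0 fallback, and the example paths are illustrative assumptions, not the actual change in this PR.

import os

def optimizer_checkpoint_path(common_path, data_parallel_rank=None):
    """Build the per-rank optimizer checkpoint path used with --use-distributed-optimizer.

    The optimizer state is sharded across data-parallel ranks, so the file
    lives in a directory suffixed with the data-parallel rank. If mpu is not
    initialized yet (as when load_args_from_checkpoint() runs before
    _finish_mpu_init()), fall back to rank 0 so an args-only load does not crash.
    """
    if data_parallel_rank is None:
        data_parallel_rank = 0  # assumption: mpu unavailable, args-only load
    return os.path.join(common_path + "_%03d" % data_parallel_rank, "optim.pt")

# Illustrative paths (the directory layout here is assumed, not taken from the repo):
print(optimizer_checkpoint_path("checkpoints/iter_0000100/mp_rank_00", 3))
# checkpoints/iter_0000100/mp_rank_00_003/optim.pt
print(optimizer_checkpoint_path("checkpoints/iter_0000100/mp_rank_00"))
# checkpoints/iter_0000100/mp_rank_00_000/optim.pt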

@dumpmemory (Contributor, Author)


I have updated the code; this is now fixed.

@dumpmemory (Contributor, Author)


Please try the updated version.

@kylematoba (Collaborator)

Hello @dumpmemory, we're working on clearing the open issues and will get to this one soon. Thank you for your patience.

@kylematoba (Collaborator)

Thank you for your contribution, @dumpmemory. We won't merge this, to keep our own complexity down. Sorry if this wasn't clear, but this repo is meant more as replication code for an upcoming paper than as a long-lived fork of NVIDIA's Megatron, and we are not keen to allocate time to features that we're not using. I'll add a note to the docs saying this :).

@kylematoba closed this on Nov 6, 2023
@mynewstart

@kylematoba So does that mean the main branch doesn't support use_distributed_optimizer?
