Data parallelism【multi-gpu train】+pure ViT work + small modify #150

Merged
merged 21 commits into lukas-blecher:main from data-parallelism on May 20, 2022

Conversation

@TITC (Collaborator) commented May 17, 2022

pure ViT structure

We discussed pure ViT structure at #131 .

  1. Initially, I used a pure ViT (6ecc3f4). But the encoder was just not performing very well. The model produced latex code but it has nothing to do with the input image.

I got the same result: the model couldn't converge. In fact, I had hoped that a larger pure ViT could reach higher performance, and it really frustrated me. But in recent days, #147 (comment) gave me an idea: the training loss curve there looks very similar to the pure ViT training curve, so I think the reason the pure ViT can't fit may be the batch size.

I took models.py from 844bc21 and modified it.

Here is the good news: it's working.

image

How to use

# for vit
python -m pix2tex.train --config model/settings/config-vit.yaml --structure vit
# for hybrid (the default)
python -m pix2tex.train --config model/settings/config.yaml --structure hybrid
python -m pix2tex.train --config model/settings/config.yaml

Data parallelism

I think multi-GPU training can save time and allow a larger batch size, so I referred to some documents and blog posts and made the corresponding changes (a rough sketch follows the references below).
It is also compatible with a single GPU.

How to use

# for one GPU
export CUDA_VISIBLE_DEVICES=6
python -m pix2tex.train --config model/settings/config-vit.yaml --structure vit
# for multiple GPUs
export CUDA_VISIBLE_DEVICES=6,7
python -m pix2tex.train --config model/settings/config-vit.yaml --structure vit

References:

  1. Technique 1: Data Parallelism
  2. data_parallel_tutorial.ipynb
  3. AttributeError: 'DataParallel' object has no attribute 'train_model' jytime/Mask_RCNN_Pytorch#2 (comment)
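
Roughly, the core of the change is to wrap the encoder and decoder in nn.DataParallel only when more than one GPU is visible. A minimal sketch (illustrative only, not the exact code in this PR):

import torch
import torch.nn as nn

def maybe_parallelize(module: nn.Module) -> nn.Module:
    """Wrap a module in nn.DataParallel only when several GPUs are visible."""
    if torch.cuda.device_count() > 1:
        # nn.DataParallel splits each input batch along dim 0 across the visible GPUs
        return nn.DataParallel(module)
    return module

# hypothetical usage inside the training setup:
# encoder = maybe_parallelize(encoder).to(device)
# decoder = maybe_parallelize(decoder).to(device)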

Small modifications

Since both the hybrid and the pure ViT structures now work, why not put them together? So I created a folder named structures.


The branch is based on 720978d. Please feel free to correct any inappropriate code. 😁

@lukas-blecher (Owner) commented May 17, 2022

That's great news!
I've restructured the models code a little bit so that it is easier to call and removed some duplicate code.
Feel free to take a look, and tell me if I messed up somewhere.
I've moved the structure argument into the config and combined everything in the new models module.
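
Conceptually the combined models module just dispatches on that structure value. A tiny self-contained sketch with stand-in encoders (not the actual pix2tex classes):

import torch.nn as nn

class TinyViTEncoder(nn.Module):
    # stand-in for the pure ViT encoder: patchify with a strided convolution
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Conv2d(1, dim, kernel_size=16, stride=16)

    def forward(self, x):
        return self.proj(x).flatten(2).transpose(1, 2)  # (B, num_patches, dim)

class TinyHybridEncoder(nn.Module):
    # stand-in for the CNN-backbone + transformer hybrid encoder
    def __init__(self, dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1))

    def forward(self, x):
        return self.backbone(x).flatten(2).transpose(1, 2)

def get_encoder(structure: str) -> nn.Module:
    if structure == "vit":
        return TinyViTEncoder()
    if structure == "hybrid":
        return TinyHybridEncoder()
    raise ValueError(f"unknown encoder structure: {structure}")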

@lukas-blecher added the enhancement label on May 17, 2022
@TITC (Collaborator, Author) commented May 17, 2022

Thanks a lot, I benefit greatly from every pull request you review.


I tested again with both vit and hybrid; only the problem below remains.

Traceback (most recent call last):                                                                   
  File "/root/anaconda3/envs/latex_ocr_test/lib/python3.7/runpy.py", line 193, in _run_module_as_main 
    "__main__", mod_spec)                                                                                                                                                                                  
  File "/root/anaconda3/envs/latex_ocr_test/lib/python3.7/runpy.py", line 85, in _run_code                                                                                                                 
    exec(code, run_globals)                                                                          
  File "/yuhang/LaTeX-OCR/pix2tex/train.py", line 102, in <module>                                                                                                                                         
    train(args)                                                                                                                                                                                            
  File "/yuhang/LaTeX-OCR/pix2tex/train.py", line 27, in train                              
    model = get_model(args, training=True)                                                           
  File "/yuhang/LaTeX-OCR/pix2tex/models/utils.py", line 49, in get_model                            
    en_attn_layers = encoder.module.attn_layers if available_gpus > 1 else encoder.attn_layers                                                                                                             
  File "/root/anaconda3/envs/latex_ocr_test/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1186, in __getattr__                                                                             
    type(self).__name__, name))
AttributeError: 'CustomVisionTransformer' object has no attribute 'attn_layers'

If there is a better way to re-write 0aefdbf, please modify it.
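
For reference, one common way to guard that kind of attribute access is to unwrap conditionally (just a sketch, not necessarily what we should do here):

import torch.nn as nn

def unwrap(module: nn.Module) -> nn.Module:
    # return the underlying module whether or not it is wrapped in nn.DataParallel
    return module.module if isinstance(module, nn.DataParallel) else module

# en_attn_layers = unwrap(encoder).attn_layers  # works for wrapped and plain encoders alike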

@lukas-blecher (Owner)

Sorry, my fault. I didn't try with wandb: True

Why don't you just watch the model = Model(encoder, decoder, args)?
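
Something along these lines, for example (the project name, stand-in model and log settings here are placeholders):

import torch.nn as nn
import wandb

# stand-in for Model(encoder, decoder, args); illustration only
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))

wandb.init(project="LaTeX-OCR", mode="offline")    # project name and mode are assumptions
wandb.watch(model, log="gradients", log_freq=100)  # log gradient histograms while training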

@TITC (Collaborator, Author) commented May 17, 2022

Because I didn't know what it was used for before, like the random-seed stuff, haha.

I read the docs and found it is related to gradients; maybe it can give some further clues about fitting the model.
image

Thanks for your guidance.

@TITC (Collaborator, Author) commented May 18, 2022

Based on this Stack Overflow Q&A, it seems the git pull command fetched 6c53105 and automatically merged it locally, then created commit 4ffbeca. That was a bit beyond my expectation; I thought it would only commit 5cbbcb9.

@lukas-blecher (Owner)

Before I merge this: something went wrong along the way.
When I try to load the pretrained model, the decoder architecture has changed:

        Missing key(s) in state_dict: "decoder.net.attn_layers.layers.0.1.to_out.weight", "decoder.net.attn_layers.layers.0.1.to_out.bias", "decoder.net.attn_layers.layers.1.1.to_out.weight", "decoder.net.attn_layers.layers.1.1.to_out.bias", "decoder.net.attn_layers.layers.2.1.net.0.0.weight", "decoder.net.attn_layers.layers.2.1.net.0.0.bias", "decoder.net.attn_layers.layers.3.1.to_out.weight", "decoder.net.attn_layers.layers.3.1.to_out.bias", "decoder.net.attn_layers.layers.4.1.to_out.weight", "decoder.net.attn_layers.layers.4.1.to_out.bias", "decoder.net.attn_layers.layers.5.1.net.0.0.weight", "decoder.net.attn_layers.layers.5.1.net.0.0.bias", "decoder.net.attn_layers.layers.6.1.to_out.weight", "decoder.net.attn_layers.layers.6.1.to_out.bias", "decoder.net.attn_layers.layers.7.1.to_out.weight", "decoder.net.attn_layers.layers.7.1.to_out.bias", "decoder.net.attn_layers.layers.8.1.net.0.0.weight", "decoder.net.attn_layers.layers.8.1.net.0.0.bias", "decoder.net.attn_layers.layers.9.1.to_out.weight", "decoder.net.attn_layers.layers.9.1.to_out.bias", "decoder.net.attn_layers.layers.10.1.to_out.weight", "decoder.net.attn_layers.layers.10.1.to_out.bias", "decoder.net.attn_layers.layers.11.1.net.0.0.weight", "decoder.net.attn_layers.layers.11.1.net.0.0.bias".
        Unexpected key(s) in state_dict: "decoder.net.attn_layers.layers.0.1.to_out.0.weight", "decoder.net.attn_layers.layers.0.1.to_out.0.bias", "decoder.net.attn_layers.layers.1.1.to_out.0.weight", "decoder.net.attn_layers.layers.1.1.to_out.0.bias", "decoder.net.attn_layers.layers.2.1.net.0.proj.weight", "decoder.net.attn_layers.layers.2.1.net.0.proj.bias", "decoder.net.attn_layers.layers.3.1.to_out.0.weight", "decoder.net.attn_layers.layers.3.1.to_out.0.bias", "decoder.net.attn_layers.layers.4.1.to_out.0.weight", "decoder.net.attn_layers.layers.4.1.to_out.0.bias", "decoder.net.attn_layers.layers.5.1.net.0.proj.weight", "decoder.net.attn_layers.layers.5.1.net.0.proj.bias", "decoder.net.attn_layers.layers.6.1.to_out.0.weight", "decoder.net.attn_layers.layers.6.1.to_out.0.bias", "decoder.net.attn_layers.layers.7.1.to_out.0.weight", "decoder.net.attn_layers.layers.7.1.to_out.0.bias", "decoder.net.attn_layers.layers.8.1.net.0.proj.weight", "decoder.net.attn_layers.layers.8.1.net.0.proj.bias", "decoder.net.attn_layers.layers.9.1.to_out.0.weight", "decoder.net.attn_layers.layers.9.1.to_out.0.bias", "decoder.net.attn_layers.layers.10.1.to_out.0.weight", "decoder.net.attn_layers.layers.10.1.to_out.0.bias", "decoder.net.attn_layers.layers.11.1.net.0.proj.weight", "decoder.net.attn_layers.layers.11.1.net.0.proj.bias".

I'll look into it

@lukas-blecher (Owner)

One other thing I just remembered:

What if you train your model on multiple GPUs and then try to finetune or evaluate on a single-GPU machine?
Won't there be an error because of the nn.DataParallel wrapper? I think the state dict should be saved without the wrapper, which makes saving and loading a bit more complicated.
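
One common pattern is to unwrap before saving, roughly like this (a sketch, not a prescription for this PR):

import torch
import torch.nn as nn

def save_checkpoint(model: nn.Module, path: str) -> None:
    # strip the nn.DataParallel wrapper so the checkpoint has no "module." prefix
    state = model.module.state_dict() if isinstance(model, nn.DataParallel) else model.state_dict()
    torch.save(state, path)

# a single-GPU (or CPU) machine can then load it without renaming any keys:
# model.load_state_dict(torch.load(path, map_location="cpu"))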

@TITC (Collaborator, Author) commented May 18, 2022

I will test that situation right now.

What if you train your model on multiple GPUs and then try to finetune or evaluate on a single-GPU machine?

There is no rush to merge this PR, I think we have plenty of time to test and troubleshoot the situation.

@lukas-blecher (Owner)

Before I merge this: something went wrong along the way. When I try to load the pretrained model, the decoder architecture has changed [state_dict key mismatch quoted above].

I'll look into it

Solved in ff2641c

@TITC (Collaborator, Author) commented May 18, 2022

You are right, I am dealing with it.

What if you train your model on multiple GPUs and then try to finetune or evaluate on a single-GPU machine?

@TITC (Collaborator, Author) commented May 18, 2022

I referenced nn.DataParallel.forward and the parallelism_tutorial, and the solution is gradually emerging: write a function that only uses multiple GPUs when it is called, so there is no need for the wrapper anymore, and no other code has to change to be compatible with nn.DataParallel.

But I hit an AssertionError after some steps; I hope I can solve it in the next few days.

Loss: 0.9900:   4%|█████▍                                                                                                                                               | 289/7891 [01:35<41:53,  3.03it/s]
Traceback (most recent call last):
  File "/root/anaconda3/envs/latex_ocr_test/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/latex_ocr_test/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/yuhang/LaTeX-OCR/pix2tex/train.py", line 123, in <module>
    train(args)
  File "/yuhang/LaTeX-OCR/pix2tex/train.py", line 75, in train
    encoded = data_parallel(encoder,inputs=im[j:j+microbatch].to(device), device_ids=[0,1,2])
  File "/yuhang/LaTeX-OCR/pix2tex/train.py", line 29, in data_parallel
    outputs = nn.parallel.parallel_apply(replicas, inputs,kwargs)
  File "/root/anaconda3/envs/latex_ocr_test/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 40, in parallel_apply
    assert len(modules) == len(kwargs_tup)
AssertionError
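
For reference, the helper is roughly along these lines, adapted from the PyTorch parallelism tutorial (illustrative, not the exact code in this branch). Note that parallel_apply expects one input chunk, and one kwargs dict if any, per replica; a length mismatch there is exactly the kind of thing behind the AssertionError above.

import torch.nn as nn

def data_parallel(module, inputs, device_ids, output_device=None):
    # one forward pass split across several GPUs, without keeping an nn.DataParallel wrapper;
    # `module` is assumed to already live on device_ids[0]
    if not device_ids or len(device_ids) == 1:
        return module(inputs)
    if output_device is None:
        output_device = device_ids[0]
    replicas = nn.parallel.replicate(module, device_ids)  # one copy of the module per GPU
    scattered = nn.parallel.scatter(inputs, device_ids)   # split the batch along dim 0
    replicas = replicas[:len(scattered)]                  # a small last batch may use fewer GPUs
    outputs = nn.parallel.parallel_apply(replicas, scattered)
    return nn.parallel.gather(outputs, output_device)     # concatenate the results on one device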

@TITC (Collaborator, Author) commented May 19, 2022

Which way do you think is better for switching GPUs? @lukas-blecher

  1. Use an environment variable to specify the GPU indices

# multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m pix2tex.train --config model/settings/config-vit.yaml
# one GPU
export CUDA_VISIBLE_DEVICES=0
python -m pix2tex.train --config model/settings/config-vit.yaml

  2. Write the GPU indices into config.yaml

# multi-GPU
gpu_indices: [0,1,2,3,4,5,6,7]
python -m pix2tex.train --config model/settings/config-vit.yaml
# one GPU
gpu_indices: [0]
python -m pix2tex.train --config model/settings/config-vit.yaml
# or default to one GPU with gpu_indices set to null
gpu_indices: null

@lukas-blecher (Owner)

Option 1 is better.
I think selecting the GPUs with that option is straightforward.
This is how I believe it works (not 100% sure): say

export CUDA_VISIBLE_DEVICES=2,3

In Python, cuda:0 and cuda:1 will then correspond to physical devices 2 and 3.

Also, you don't need to change the yaml settings when running the script on another machine or multiple times at the same time.
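
A quick way to check the renumbering (assuming CUDA_VISIBLE_DEVICES=2,3 was exported before launching Python):

import os
import torch

print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # "2,3"
print(torch.cuda.device_count())               # 2 -- PyTorch only sees the visible cards
# inside this process, cuda:0 and cuda:1 refer to physical GPUs 2 and 3 respectively
x = torch.zeros(1, device="cuda:0")            # allocated on physical GPU 2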

@TITC (Collaborator, Author) commented May 19, 2022

Agreed, one command and the problem is solved.

After a couple of hours of use, though, I found that when the GPUs on a server are shared with other people, some of the cards already have part of their memory occupied. Then I have to check the shell history or search Google to copy the right export CUDA_VISIBLE_DEVICES=xxxxx and switch to the GPUs with free memory; doing that once is convenient, but doing it frequently gets kind of annoying.

So I set option 1 as the default and kept compatibility with option 2 for this situation. = ̄ω ̄=
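
Roughly, the selection logic I have in mind looks like this (a sketch; the function name and the gpu_indices key are placeholders):

import os
from typing import List, Optional

def resolve_gpu_indices(config_indices: Optional[List[int]] = None) -> List[int]:
    # option 1: prefer CUDA_VISIBLE_DEVICES if it is set
    env = os.environ.get("CUDA_VISIBLE_DEVICES")
    if env:
        # PyTorch renumbers the visible cards from 0, so only the count matters here
        return list(range(len([d for d in env.split(",") if d.strip()])))
    # option 2: fall back to a gpu_indices entry in the config
    if config_indices:
        return config_indices
    return [0]  # default: a single GPU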


Code tested with:

  • a model trained in a one-GPU environment and loaded in a multi-GPU environment (ViT)
  • a model trained in a multi-GPU environment and loaded in a one-GPU environment (ViT)
  • training on multiple GPUs, both ViT and hybrid
  • training on one GPU, both ViT and hybrid

@lukas-blecher (Owner)

I've never seen nn.parallel used directly before. Did you notice performance differences between the nn.DataParallel class and the new method?

@TITC (Collaborator, Author) commented May 19, 2022

The PyTorch library first imports DataParallel and data_parallel from .data_parallel in nn/parallel/__init__.py, then does from .parallel import DataParallel in nn/__init__.py.

nn.DataParallel is nn.parallel.data_parallel.DataParallel, so it is basically part of nn.parallel.
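
A quick check that they are the same object:

import torch.nn as nn
from torch.nn.parallel import DataParallel, data_parallel  # class and functional form

# nn.DataParallel is just the class from nn.parallel re-exported at the torch.nn level
assert nn.DataParallel is DataParallel is nn.parallel.DataParallel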


I think the method is almost the same as DataParallel.forward, but I am not quite sure.

Did you notice performance differences between the nn.DataParallel class and the new method?

I am testing it.


The reason I haven't used nn.DataParallel is that it only parallelizes self.module. If I wrap the whole model in nn.DataParallel, it will not cover the encoder and decoder, which the training loop calls separately; if I wrap the encoder and decoder directly, I haven't found a solution for loading the weights.

An alternative solution that came to mind is to rewrite model.forward to include encoder.forward and decoder.forward, so a model wrapped in nn.DataParallel could directly call model(xxxx) to run the forward pass in multi-GPU mode (I guess; I haven't tried it).

But I noticed you already wrote generate as model.forward.

@lukas-blecher (Owner)

Ok I see, you basically reimplemented that function. Looks fine, should also have the same performance.

Another way would have been to do something like this, when saving:

from collections import OrderedDict

new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # strip the leading `module.` prefix added by nn.DataParallel
    new_state_dict[name] = v

It's quite hacky. I like your solution better.

@TITC (Collaborator, Author) commented May 19, 2022

batchsize: 512
micro_batchsize: 128
gpu: RTX3090*3
encoder: vit

Each was trained for 2 epochs:

  1. this method
BLEU: 0.637, ED: 3.04e-01, ACC: 0.292:   0%|                                                                                                                                              | 0/389 [00:00<?, ?it/s]
Loss: 0.0629:  95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌       | 1016/1067 [26:33<01:19,  1.57s/it]
BLEU: 0.599, ED: 3.38e-01, ACC: 0.321:   3%|███▍                                                                                                                                 | 10/389 [00:16<10:07,  1.60s/it]
Loss: 0.0806:  96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████       | 1019/1067 [23:02<01:05,  1.36s/it]
  2. nn.DataParallel

I created a branch from ff2641c:

git checkout -b nn_parallel ff2641cd4bddaa672e22823bb0a405f8e87bcf15

then

BLEU: 0.069, ED: 2.81e+00, ACC: 0.039:   0%|                                                                                                                                              | 0/389 [00:08<?, ?it/s]
Loss: 0.3394:  95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌       | 1016/1067 [22:50<01:08,  1.35s/it]
BLEU: 0.340, ED: 7.07e-01, ACC: 0.192:   3%|███▍                                                                                                                                 | 10/389 [00:22<14:18,  2.26s/it]
Loss: 0.1870:  96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████       | 1019/1067 [23:06<01:05,  1.36s/it]

@TITC (Collaborator, Author) commented May 19, 2022

Another way would have been to do something like this

Got it, will try this tomorrow.

@lukas-blecher (Owner)

Another way would have been to do something like this

Got it, will try this tomorrow.

I don't think this will be necessary. You found a better solution already.

An alternative solution that came to mind is to rewrite model.forward to include encoder.forward and decoder.forward,

This. That's the way to go: handle the data-parallel logic in the forward method of Model.
Move the current forward to generate or something similar and replace all model calls in eval and cli with the generate function.
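
Roughly like this (a sketch with hypothetical encoder/decoder interfaces, not the exact pix2tex signatures):

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, images: torch.Tensor, tgt_seq: torch.Tensor) -> torch.Tensor:
        # encoder and decoder both run inside forward, so wrapping the whole Model
        # (or passing it to a data_parallel helper) splits the full training step across GPUs
        context = self.encoder(images)
        return self.decoder(tgt_seq, context=context)

    @torch.no_grad()
    def generate(self, images: torch.Tensor, temperature: float = 0.2):
        # the old forward (autoregressive sampling) moves here; eval and cli then call
        # model.generate(...) instead of model(...)
        context = self.encoder(images)
        return self.decoder.generate(context, temperature=temperature)  # hypothetical decoder API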

@TITC (Collaborator, Author) commented May 19, 2022

Gotcha, I will implement it ASAP.

@TITC (Collaborator, Author) commented May 20, 2022

Tested again; the results seem about the same as 2. nn.DataParallel.

BLEU: 0.069, ED: 2.81e+00, ACC: 0.039:   0%|                                                                                                                                              | 0/389 [00:08<?, ?it/s]
Loss: 0.3394:  95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌       | 1016/1067 [21:28<01:04,  1.27s/it]
BLEU: 0.341, ED: 7.05e-01, ACC: 0.192:   3%|███▍                                                                                                                                 | 10/389 [00:22<14:31,  2.30s/it]
Loss: 0.1870:  96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████       | 1019/1067 [21:29<01:00,  1.27s/it]

@lukas-blecher (Owner)

The generate function doesn't exist for the LatexOCR class. I've corrected this from a previous commit, replacing the relevant lines of code with the new generate function of Model.

I also moved the data_parallel handling into the Model class.

@TITC (Collaborator, Author) commented May 20, 2022

Should I test again, or is there anything else we need to do?

@lukas-blecher (Owner)

This shouldn't have broken anything. But maybe check whether it runs on multiple GPUs at all. If it does, there is a good chance it still works as before :)

I also think this PR is almost ready to merge.

How does the ViT performance compare to the hybrid?

@TITC (Collaborator, Author) commented May 20, 2022

Oops, same config as before, but something happened.

Loss: 0.4742:  94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 999/1067 [21:24<01:27,  1.29s/it]
Traceback (most recent call last):
  File "/root/anaconda3/envs/pix2tex/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/pix2tex/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/yuhang/LaTeX-OCR/pix2tex/train.py", line 116, in <module>
    train(args)
  File "/yuhang/LaTeX-OCR/pix2tex/train.py", line 81, in train
    bleu_score, edit_distance, token_accuracy = evaluate(model, valdataloader, args, num_batches=int(args.valbatches*e/args.epochs), name='val')
  File "/root/anaconda3/envs/pix2tex/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/yuhang/LaTeX-OCR/pix2tex/eval.py", line 54, in evaluate
    dec = model.generate(im.to(device), temperature=args.get('temperature', .2))
  File "/root/anaconda3/envs/pix2tex/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/yuhang/LaTeX-OCR/pix2tex/models/utils.py", line 37, in generate
    eos_token=self.args.eos_token, context=self.encoder(x), temperature=temperature)
  File "/root/anaconda3/envs/pix2tex/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/yuhang/LaTeX-OCR/pix2tex/models/transformer.py", line 40, in generate
    out = torch.cat((out, sample), dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 20 for tensor number 1 in the list.

I am looking into it; do you have any idea what might cause it?


How does the ViT performance compare to the hybrid?

Up to now, the best result is a BLEU of 0.8 with the default config; if there is any news I will share it with you.

@lukas-blecher (Owner)

Should be fine now

@TITC (Collaborator, Author) commented May 20, 2022

Yeah, it works fine.

BLEU: 0.180, ED: 1.12e+00, ACC: 0.092:   0%|                                                                                                                                              | 0/389 [00:01<?, ?it/s]
Loss: 0.3501:  95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌       | 1016/1067 [23:01<01:09,  1.36s/it]
Loss: 0.4497:  30%|██████████████████████████████████████████████▋                                                                                                             | 319/1067 [07:17<11:13,  1.11it/s]

@lukas-blecher lukas-blecher merged commit 06b7a9a into lukas-blecher:main May 20, 2022
@TITC TITC deleted the data-parallelism branch May 21, 2022 01:11