No need to call xm.mark_step() explicitly (#4)
For gradient accumulation we accumulate over batches yielded by a
`ParallelLoader` instance, which already marks the step itself on each
`next()`, so the explicit `xm.mark_step()` call is redundant.
jysohn23 committed Nov 21, 2019
1 parent 6ef1edd commit 3129ad3
Showing 1 changed file with 0 additions and 1 deletion.
1 change: 0 additions & 1 deletion examples/run_glue_tpu.py
@@ -150,7 +150,6 @@ def train(args, train_dataset, model, tokenizer, disable_logging=False):
 loss = outputs[0]  # model outputs are always tuple in transformers (see doc)

 if args.gradient_accumulation_steps > 1:
-    xm.mark_step()  # Mark step to evaluate graph so far or else graph will grow too big and OOM.
     loss = loss / args.gradient_accumulation_steps

 loss.backward()
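For context, a minimal sketch of the surrounding training loop, assuming torch_xla's ParallelLoader API; the device setup, loader construction, and optimizer handling below are illustrative and not part of the diff. The per-device loader returned by ParallelLoader marks the XLA step each time it yields a batch, which is why the explicit xm.mark_step() inside the gradient-accumulation branch can be dropped.

    # Sketch only (not the actual run_glue_tpu.py code); assumes torch_xla's
    # ParallelLoader, xm.xla_device(), and xm.optimizer_step() APIs.
    import torch_xla.core.xla_model as xm
    import torch_xla.distributed.parallel_loader as pl

    device = xm.xla_device()
    para_loader = pl.ParallelLoader(train_dataloader, [device])

    for step, batch in enumerate(para_loader.per_device_loader(device)):
        outputs = model(**batch)
        loss = outputs[0]
        if args.gradient_accumulation_steps > 1:
            # No xm.mark_step() needed here: the per-device loader already
            # marked the step when it yielded this batch, so the XLA graph
            # stays bounded without an extra call.
            loss = loss / args.gradient_accumulation_steps
        loss.backward()
        if (step + 1) % args.gradient_accumulation_steps == 0:
            xm.optimizer_step(optimizer)
            optimizer.zero_grad()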
