
[Train] Update docstring and user guides for train_loop_config #43691

Merged (9 commits) on Mar 8, 2024
9 changes: 8 additions & 1 deletion doc/source/train/getting-started-pytorch-lightning.rst
@@ -23,7 +23,7 @@ For reference, the final code is as follows:
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_func(config):
@woshiyyya (Member, Author) commented on Mar 6, 2024:

Not showing the config argument here, since we didn't specify train_loop_config in the TorchTrainer in this code snippet. Otherwise users would be confused about where to pass the train_func arguments.

woshiyyya marked this conversation as resolved.
def train_func():
# Your PyTorch Lightning training code here.

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
@@ -190,6 +190,13 @@ Begin by wrapping your code in a :ref:`training function <train-overview-trainin

Each distributed training worker executes this function.

You can specify the input argument for `train_func` via the Trainer's `train_loop_config` parameter.
A contributor commented:

Optionally, we could extract this section into a separate file and include it, similar to what's being done here.

In the future we may just have a full separate user guide for this.

@woshiyyya (Member, Author) replied:

Good idea. I've extracted the common paragraph into a separate doc.


.. note::

Avoid passing large data objects through `train_loop_config` to reduce
serialization and deserialization overhead. Instead, initialize large
objects (e.g. datasets, models) directly in `train_func`.
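
A minimal sketch of how this fits together (the keys ``lr`` and ``max_epochs`` are
illustrative hyperparameter names, not part of the Ray API):

.. code-block:: python

    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_func(config):
        # `config` is the dict passed as `train_loop_config` below.
        lr = config["lr"]
        max_epochs = config["max_epochs"]
        # Build your LightningModule and Lightning Trainer here using these values.
        ...

    trainer = TorchTrainer(
        train_func,
        train_loop_config={"lr": 1e-3, "max_epochs": 10},
        scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
    )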

Ray Train sets up your distributed process group on each worker. You only need to
make a few changes to your Lightning Trainer definition.
10 changes: 9 additions & 1 deletion doc/source/train/getting-started-pytorch.rst
@@ -24,7 +24,7 @@ For reference, the final code will look something like the following:
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_func(config):
def train_func():
# Your PyTorch training code here.
...

@@ -195,6 +195,14 @@ Begin by wrapping your code in a :ref:`training function <train-overview-trainin

Each distributed training worker executes this function.

You can specify the input argument for `train_func` via the Trainer's `train_loop_config` parameter.

.. note::

Avoid passing large data objects through `train_loop_config` to reduce
serialization and deserialization overhead. Instead, initialize large
objects (e.g. datasets, models) directly in `train_func`.
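
As a sketch of the pattern the note above recommends (``batch_size`` is an
illustrative key, not a required name), keep ``train_loop_config`` small and build
heavy objects inside the training function:

.. code-block:: python

    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_func(config):
        # Only small values (hyperparameters) arrive through `config`.
        batch_size = config["batch_size"]
        # Construct large objects (datasets, models) here on the worker,
        # rather than passing them through `train_loop_config`.
        ...

    trainer = TorchTrainer(
        train_func,
        train_loop_config={"batch_size": 64},
        scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
    )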

Set up a model
^^^^^^^^^^^^^^

14 changes: 11 additions & 3 deletions doc/source/train/getting-started-transformers.rst
@@ -22,7 +22,7 @@ For reference, the final code follows:
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_func(config):
def train_func():
# Your Transformers training code here.

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
@@ -212,9 +212,17 @@ You can begin by wrapping your code in a :ref:`training function <train-overview
def train_func(config):
# Your Transformers training code here.

This function executes on each distributed training worker. Ray Train sets up the distributed
process group on each worker before entering this function.
This function executes on each distributed training worker.

You can specify the input argument for `train_func` via the Trainer's `train_loop_config` parameter.

.. note::

Avoid passing large data objects through `train_loop_config` to reduce
serialization and deserialization overhead. Instead, initialize large
objects (e.g. datasets, models) directly in `train_func`.
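
For instance (a sketch only; the keys shown are illustrative), hyperparameters passed
through ``train_loop_config`` can be forwarded into your ``TrainingArguments``:

.. code-block:: python

    from transformers import TrainingArguments

    def train_func(config):
        # `config` is the dict given to the Trainer's `train_loop_config`.
        training_args = TrainingArguments(
            output_dir="output",
            learning_rate=config["learning_rate"],
            num_train_epochs=config["num_epochs"],
        )
        # Build the datasets, model, and transformers.Trainer here.
        ...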

Ray Train sets up the distributed process group on each worker before entering this function.
Put all the logic into this function, including dataset construction and preprocessing,
model initialization, Transformers trainer definition, and more.

4 changes: 3 additions & 1 deletion python/ray/train/torch/torch_trainer.py
@@ -146,7 +146,9 @@ def train_loop_per_worker(config):
:ref:`Ray Train Loop utilities <train-loop-api>`.
train_loop_config: A configuration ``Dict`` to pass in as an argument to
``train_loop_per_worker``.
This is typically used for specifying hyperparameters.
This is typically used for specifying hyperparameters. Passing large
datasets via `train_loop_config` is not recommended and may introduce
significant serialization and deserialization overhead and unexpected issues.
torch_config: The configuration for setting up the PyTorch Distributed backend.
If set to None, a default configuration will be used in which
GPU training uses NCCL and CPU training uses Gloo.