diff --git a/docs/source/multi_gpu.rst b/docs/source/multi_gpu.rst
index def47810504d6..b3e0b905f27f4 100644
--- a/docs/source/multi_gpu.rst
+++ b/docs/source/multi_gpu.rst
@@ -663,7 +663,7 @@ It is highly recommended to use Sharded Training in multi-GPU environments where
 A technical note: as batch size scales, storing activations for the backwards pass becomes the bottleneck in training. As a result, sharding optimizer state and gradients becomes less impactful.
 Work within the future will bring optional sharding to activations and model parameters to reduce memory further, but come with a speed cost.
 
-To use Sharded Training, you need to first install FairScale using the command below or install all extras using ``pip install pytorch-lightning["extra"]``.
+To use Sharded Training, you need to first install FairScale using the command below.
 
 .. code-block:: bash