diff --git a/docs/source/accelerators.rst b/docs/source/accelerators.rst
index ee801f2dee28b..bc9abebcd90d8 100644
--- a/docs/source/accelerators.rst
+++ b/docs/source/accelerators.rst
@@ -21,7 +21,7 @@ To link up arbitrary hardware, implement your own Accelerator subclass
     class MyAccelerator(Accelerator):
         def __init__(self, trainer, cluster_environment=None):
             super().__init__(trainer, cluster_environment)
-            self.nickname = 'my_accelator'
+            self.nickname = 'my_accelerator'
 
         def setup(self):
             # find local rank, etc, custom things to implement
diff --git a/docs/source/asr_nlp_tts.rst b/docs/source/asr_nlp_tts.rst
index 06353aee85413..a5f1ac59bf696 100644
--- a/docs/source/asr_nlp_tts.rst
+++ b/docs/source/asr_nlp_tts.rst
@@ -324,13 +324,13 @@ that are included with NeMo:
 - `Language Modeling (BERT Pretraining) `_
 - `Question Answering `_
 - `Text Classification `_ (including Sentiment Analysis)
-- `Token Classifcation `_ (including Named Entity Recognition)
+- `Token Classification `_ (including Named Entity Recognition)
 - `Punctuation and Capitalization `_
 
 Named Entity Recognition (NER)
 ------------------------------
 
-NER (or more generally token classifcation) is the NLP task of detecting and classifying key information (entities) in text.
+NER (or more generally token classification) is the NLP task of detecting and classifying key information (entities) in text.
 This task is very popular in Healthcare and Finance. In finance, for example, it can be important to identify
 geographical, geopolitical, organizational, persons, events, and natural phenomenon entities.
 See this `NER notebook `_
@@ -435,7 +435,7 @@ Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trai
 Tokenizers
 ----------
 
-Tokenization is the process of converting natural langauge text into integer arrays
+Tokenization is the process of converting natural language text into integer arrays
 which can be used for machine learning.
 For NLP tasks, tokenization is an essential part of data preprocessing.
 NeMo supports all BERT-like model tokenizers from
@@ -462,7 +462,7 @@ Much of the state-of-the-art in natural language processing is achieved
 by fine-tuning pretrained language models on the downstream task.
 
 With NeMo, you can either `pretrain `_
-a BERT model on your data or use a pretrained lanugage model from `HuggingFace Transformers `_
+a BERT model on your data or use a pretrained language model from `HuggingFace Transformers `_
 or `NVIDIA Megatron-LM `_.
 
 To see the list of language models available in NeMo:
diff --git a/docs/source/bolts.rst b/docs/source/bolts.rst
index 2c069e0f47481..9133176cab912 100644
--- a/docs/source/bolts.rst
+++ b/docs/source/bolts.rst
@@ -46,7 +46,7 @@ Example 1: Pretrained, prebuilt models
 Example 2: Extend for faster research
 -------------------------------------
 Bolts are contributed with benchmarks and continuous-integration tests. This means
-you can trust the implementations and use them to bootstrap your resarch much faster.
+you can trust the implementations and use them to bootstrap your research much faster.
 
 .. code-block:: python
 
diff --git a/docs/source/loggers.rst b/docs/source/loggers.rst
index a3b85450e6233..b74fe292b251b 100644
--- a/docs/source/loggers.rst
+++ b/docs/source/loggers.rst
@@ -10,7 +10,7 @@ Loggers
 *******
 
 Lightning supports the most popular logging frameworks (TensorBoard, Comet, etc...). TensorBoard is used by default,
-but you can pass to the :class:`~pytorch_lightning.trainer.trainer.Trainer` any combintation of the following loggers.
+but you can pass to the :class:`~pytorch_lightning.trainer.trainer.Trainer` any combination of the following loggers.
 
 .. note::
 
diff --git a/docs/source/lr_finder.rst b/docs/source/lr_finder.rst
index 2521ec73e3afe..fbeb1f5fd959d 100755
--- a/docs/source/lr_finder.rst
+++ b/docs/source/lr_finder.rst
@@ -102,7 +102,7 @@ method of the trainer. A typical example of this would look like
     trainer.fit(model)
 
 The figure produced by ``lr_finder.plot()`` should look something like the figure
-below. It is recommended to not pick the learning rate that achives the lowest
+below. It is recommended to not pick the learning rate that achieves the lowest
 loss, but instead something in the middle of the sharpest downward slope (red point).
 This is the point returned py ``lr_finder.suggestion()``.
 
diff --git a/docs/source/metrics.rst b/docs/source/metrics.rst
index ee141dc74a679..387cbc3bd7482 100644
--- a/docs/source/metrics.rst
+++ b/docs/source/metrics.rst
@@ -17,7 +17,7 @@ common metric implementations.
 
 The metrics API provides ``update()``, ``compute()``, ``reset()`` functions to the user. The metric base class inherits
 ``nn.Module`` which allows us to call ``metric(...)`` directly. The ``forward()`` method of the base ``Metric`` class
-serves the dual purpose of calling ``update()`` on its input and simultanously returning the value of the metric over the
+serves the dual purpose of calling ``update()`` on its input and simultaneously returning the value of the metric over the
 provided input.
 
 These metrics work with DDP in PyTorch and PyTorch Lightning by default. When ``.compute()`` is called in
diff --git a/docs/source/trainer.rst b/docs/source/trainer.rst
index 99f93bd02f0b4..0748302f30613 100644
--- a/docs/source/trainer.rst
+++ b/docs/source/trainer.rst
@@ -224,7 +224,7 @@ The accelerator backend to use (previously known as distributed_backend).
 - (```ddp```) is DistributedDataParallel (each gpu on each node trains, and syncs grads)
 - (```ddp_cpu```) is DistributedDataParallel on CPU (same as `ddp`, but does not use GPUs.
   Useful for multi-node CPU training or single-node debugging. Note that this will **not** give
-  a speedup on a single node, since Torch already makes effient use of multiple CPUs on a single
+  a speedup on a single node, since Torch already makes efficient use of multiple CPUs on a single
   machine.)
 - (```ddp2```) dp on node, ddp across nodes. Useful for things like increasing
     the number of negative samples
@@ -982,7 +982,7 @@ Number of processes to train with. Automatically set to the number of GPUs
 when using ``accelerator="ddp"``. Set to a number greater than 1 when
 using ``accelerator="ddp_cpu"`` to mimic distributed training on a machine
 without GPUs. This is useful for debugging, but **will not** provide
-any speedup, since single-process Torch already makes effient use of multiple
+any speedup, since single-process Torch already makes efficient use of multiple
 CPUs.
 
 .. testcode::
diff --git a/docs/source/training_tricks.rst b/docs/source/training_tricks.rst
index 0de18a1b7f16c..6ff9dfd0a30d3 100644
--- a/docs/source/training_tricks.rst
+++ b/docs/source/training_tricks.rst
@@ -110,11 +110,11 @@ The algorithm in short works by:
 2. Iteratively until convergence or maximum number of tries `max_trials` (default 25) has been reached:
     - Call `fit()` method of trainer. This evaluates `steps_per_trial` (default 3) number of
       training steps. Each training step can trigger an OOM error if the tensors
-      (training batch, weights, gradients ect.) allocated during the steps have a
+      (training batch, weights, gradients, etc.) allocated during the steps have a
       too large memory footprint.
     - If an OOM error is encountered, decrease batch size else increase it.
-      How much the batch size is increased/decreased is determined by the choosen
-      stratrgy.
+      How much the batch size is increased/decreased is determined by the chosen
+      strategy.
 3. The found batch size is saved to either `model.batch_size` or `model.hparams.batch_size`
 4. Restore the initial state of model and trainer