Commit

Load weights to cpu with PretrainedModelInitializer (allenai#4712)
* load weights to cpu with PretrainedModelInitializer

* changelog
eladsegal committed Oct 7, 2020
1 parent 327188b commit 40bb47a
Showing 2 changed files with 2 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -40,6 +40,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
and they both now return the results in bytes as integers. Also, the `peak_gpu_memory` function now utilizes PyTorch functions to find the memory
usage instead of shelling out to the `nvidia-smi` command. This is more efficient and also more accurate because it only takes
into account the tensor allocations of the current PyTorch process.
+ - Make sure weights are first loaded to the cpu when using PretrainedModelInitializer, preventing wasted GPU memory.

### Removed

2 changes: 1 addition & 1 deletion allennlp/nn/initializers.py
@@ -384,7 +384,7 @@ class PretrainedModelInitializer(Initializer):
      def __init__(
          self, weights_file_path: str, parameter_name_overrides: Dict[str, str] = None
      ) -> None:
-         self.weights: Dict[str, torch.Tensor] = torch.load(weights_file_path)
+         self.weights: Dict[str, torch.Tensor] = torch.load(weights_file_path, map_location="cpu")
          self.parameter_name_overrides = parameter_name_overrides or {}

      @overrides
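The effect of the one-line change above can be seen in a minimal, self-contained sketch. Without `map_location="cpu"`, `torch.load` restores each tensor on the device it was saved from, so a checkpoint written on a GPU would allocate GPU memory just to read the weights; passing `map_location="cpu"` remaps everything to the CPU first. The file path and parameter name below are illustrative, not from the commit:

```python
import os
import tempfile

import torch

# Save a small state dict to a temporary file (stands in for weights_file_path).
path = os.path.join(tempfile.mkdtemp(), "weights.th")
torch.save({"linear.weight": torch.randn(4, 4)}, path)

# The fix: force all tensors onto the CPU regardless of the device
# they were saved from, avoiding any GPU allocation during the load.
weights = torch.load(path, map_location="cpu")

print(weights["linear.weight"].device.type)  # cpu
```

The loaded tensors can later be copied into model parameters that already live on the GPU, so nothing is lost by staging the checkpoint on the CPU; only the redundant GPU copy of the file's contents is avoided.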
