[FSDP] ssd_offload fixing backward path (grad_fn) for SsdFlatParameter and SsdFlatParameterView (#974)

* [FSDP] fixing backward path for SsdFlatParameter and SsdFlatParameterView when overriding .data

* Get ssd_offload unit tests passing

* [FSDP] get all test_fsdp_offload tests passing w/ ssd_offload on

* Update changelog
another-pjohnson committed May 2, 2022
1 parent 58ccb16 commit 51b53dd
Showing 6 changed files with 770 additions and 104 deletions.
35 changes: 34 additions & 1 deletion CHANGELOG.md
@@ -12,7 +12,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- FSDP: Add pickle/unpickle support for SsdTensorHandle (and derived classes),
verified that FSDP models w/ ssd_offload enabled can correctly call model.state_dict()
and model.load_state_dict(...) and thus successfully checkpoint and recover parameters
stored as SsdFlatParameters. [#964]
- FSDP Ssd Offload: [#974]
* Add a __setattr__ function to SsdTensorHandle/SsdFlatParameter to capture when .data is
overridden and perform the necessary checks/updates before letting the tensor metadata be
updated (see the first sketch after this list).
* There was a bug in pytorch core that caused re-assigning a TensorSubclass's .data to
disable the __torch_dispatch__ mechanism and effectively revert the subclass back to a
normal tensor. Since the ssd_offload feature uses __torch_dispatch__ extensively and FSDP
overrides .data, ssd_offload can now only be used with pytorch version 1.12.0 or later
(currently the pytorch-nightly release). On earlier versions the pytest tests are disabled,
and trying to import ssd_offload or use FSDP with ssd_offload enabled raises an ImportError.
* Split the storage_state ON_CPU into ON_CPU_CLEAN and ON_CPU_DIRTY. ON_CPU_CLEAN
indicates that the .tensor value and the value stored on disk are identical, so no
write is needed when calling .to_file(). ON_CPU_DIRTY indicates that .tensor does not
match the value stored on disk, so a write is necessary. This is disabled in FSDP, as
FSDP does not currently flush variables to disk as soon as the optimizer modifies
values. (The first sketch after this list also illustrates this state split.)
* Fix detection in SsdTensorHandle's __torch_dispatch__ of cases where the handle is used
in an in-place operator or as an output, calling mark_dirty() when that happens.
* Fix grad_fn on SsdFlatParameterViews so that each ssd_flat_param.views[i].grad_fn points
to a split and view of the parent ssd_flat_parameter; when X.backward() is called,
ssd_flat_param.grad is properly updated in the backward pass (see the second sketch after
this list).
* Add unit tests to verify that grad_fn and backward hooks are called appropriately on
SsdFlatParameters/SsdFlatParameterViews.
* Implement the SsdFlatParameter.from_tensor() direct_to_file option. This allows creating
an SsdFlatParameter and writing it directly to disk, rather than creating a new tensor
first. That avoids instantiating another copy of the tensors and can save memory when
creating very large SsdFlatParameters.
* Add SsdFlatParameterViewProperty for overriding a parameter in an nn.Module, similar to
pytorch core's parametrization code path. This allows SsdFlatParameterViews to be updated
as SsdFlatParameter.data is overridden, by replacing Layer.weight with a property whose
getter returns ssd_flat_param.views[view_id] (see the third sketch after this list).
* Add an example, SsdFlatParameterViewParameterization, showing how the existing
parametrization method could be used instead of SsdFlatParameterViewProperty. This
method, however, results in memory inefficiencies because parametrization always keeps
a copy of the original parameter internally.

### Fixed

