Implement simple model checkpointing #37

meffmadd · 2022-11-04T09:23:23Z

No description provided.

Add workaround for model saving with hydra

nmichlo · 2022-11-04T10:40:49Z

Hi @meffmadd, thank you so much for this contribution. It looks great! 😁🎉

Before we merge, could you please update the checkpointing settings in the configs? Some tests are currently failing, and the configs also serve as documentation.

This should go in the main config (possibly defaulting to false to maintain current behavior):

disent/experiment/config/config.yaml

Lines 44 to 47 in 8dee583

    # TODO: https://pytorch-lightning.readthedocs.io/en/stable/common/weights_loading.html
    #  checkpoint:
    #    load_checkpoint: NULL   # NULL or string
    #    save_checkpoint: FALSE  # boolean, save at end of run -- more advanced checkpointing can be done with a callback!

It should also be copied to the corresponding location in the test config (defaulting to true so we know the checkpoint behavior works):
- https://github.com/nmichlo/disent/blob/8dee583993d582729928984640930a4e92e65de6/experiment/config/config_test.yaml
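
A minimal sketch of how the two keys could be read with those defaults, assuming the commented-out block is enabled as written. The helper `resolve_checkpoint_cfg` and the plain-dict configs are hypothetical stand-ins for illustration only; they are not part of disent or hydra.

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_checkpoint_cfg(cfg: Mapping[str, Any]) -> Tuple[Optional[str], bool]:
    """Return (load_checkpoint, save_checkpoint), falling back to the proposed defaults."""
    # hypothetical helper: a missing `checkpoint` section means "do nothing"
    section = cfg.get("checkpoint") or {}
    load_path = section.get("load_checkpoint")           # None (YAML NULL) or a path string
    save_at_end = section.get("save_checkpoint", False)  # boolean, save at end of run
    if load_path is not None and not isinstance(load_path, str):
        raise TypeError("checkpoint.load_checkpoint must be NULL or a string")
    if not isinstance(save_at_end, bool):
        raise TypeError("checkpoint.save_checkpoint must be a boolean")
    return load_path, save_at_end


# main config: nothing is loaded or saved, so current behavior is kept
main_cfg = {"checkpoint": {"load_checkpoint": None, "save_checkpoint": False}}
# test config: saving is switched on so the checkpoint path gets exercised
test_cfg = {"checkpoint": {"load_checkpoint": None, "save_checkpoint": True}}
```

Keeping the main default at false means existing runs behave exactly as before unless a user opts in.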

On that note, we might want to just add the checkpointing to the PyTorch Lightning train step in the framework tests, using the same hook, and then possibly load the checkpoint at the very end to make sure it works.

disent/tests/test_frameworks.py

Lines 98 to 126 in 8dee583

    @pytest.mark.parametrize(['Framework', 'cfg_kwargs', 'Data'], _TEST_FRAMEWORKS)
    def test_frameworks(Framework, cfg_kwargs, Data):
        DataSampler = {
            1: GroundTruthSingleSampler,
            2: GroundTruthPairSampler,
            3: GroundTruthTripleSampler,
        }[Framework.REQUIRED_OBS]
        data = XYObjectData() if (Data is None) else Data()
        dataset = DisentDataset(data, DataSampler(), transform=ToImgTensorF32())
        dataloader = DataLoader(dataset=dataset, batch_size=4, shuffle=True, num_workers=0)
        framework = Framework(
            model=AutoEncoder(
                encoder=EncoderLinear(x_shape=data.x_shape, z_size=6, z_multiplier=2 if issubclass(Framework, Vae) else 1),
                decoder=DecoderLinear(x_shape=data.x_shape, z_size=6),
            ),
            cfg=Framework.cfg(**cfg_kwargs)
        )
        # test pickling before training
        pickle.dumps(framework)
        # train!
        trainer = pl.Trainer(logger=False, checkpoint_callback=False, max_steps=256, fast_dev_run=True)
        trainer.fit(framework, dataloader)
        # test pickling after training, something may have changed!
        pickle.dumps(framework)
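
The "load the checkpoint at the end" idea can be sketched without Lightning. This is only an illustration under stated assumptions: `TinyModel`, `save_checkpoint`, `load_checkpoint` and `roundtrip_ok` are hypothetical stand-ins built on `pickle`, not the Lightning or disent API. A real test would save through the trainer and restore the framework from that file.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical stand-in for a trained framework: it only holds named weights."""

    def __init__(self, weights):
        self.weights = dict(weights)

    def state_dict(self):
        return dict(self.weights)

    def load_state_dict(self, state):
        self.weights = dict(state)


def save_checkpoint(model, path):
    # write the weights to disk, as would happen at the end of a run
    with open(path, "wb") as f:
        pickle.dump({"state_dict": model.state_dict()}, f)


def load_checkpoint(model, path):
    # restore the weights into a freshly constructed model
    with open(path, "rb") as f:
        model.load_state_dict(pickle.load(f)["state_dict"])
    return model


def roundtrip_ok(model):
    """Save, reload into an empty model, and compare the restored weights."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "last.ckpt")
        save_checkpoint(model, path)
        restored = load_checkpoint(TinyModel({}), path)
    return restored.state_dict() == model.state_dict()
```

Comparing the restored weights against the originals, rather than only checking that loading does not raise, catches the case where a checkpoint is written but is missing state.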

meffmadd · 2022-11-07T01:38:02Z

Hi, I couldn't work on it this weekend, but I will start now and fix the configs so that the tests pass again. I will also add a test case that covers the checkpointing behavior.

Added save_checkpoint to experiment configs Added tests for checkpointing

codecov · 2022-11-07T03:41:17Z

Codecov Report

Base: 70.01% // Head: 70.04% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (28d0e95) compared to base (8dee583).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #37      +/-   ##
==========================================
+ Coverage   70.01%   70.04%   +0.02%     
==========================================
  Files         135      135              
  Lines        7531     7538       +7     
==========================================
+ Hits         5273     5280       +7     
  Misses       2258     2258

Impacted Files                    Coverage Δ
disent/frameworks/_ae_mixin.py    90.00% <100.00%> (+1.32%) ⬆️


disent/frameworks/vae/_unsupervised__vae.py

experiment/run.py

nmichlo · 2022-11-07T04:33:27Z

Great work! Thank you so much for making these changes!

meffmadd and others added 3 commits November 3, 2022 03:00

Fix x_shape config value in norm_conv64.yaml

22c1fc5

Merge branch 'nmichlo:main' into main

e36d266

Implement simple model checkpointing

a3c19b1

Add workaround for model saving with hydra

meffmadd mentioned this pull request Nov 4, 2022

[FEATURE]: Model Saving and Checkpointing #28

Closed

Moved on_save_checkpoint to _AeAndVaeMixin

28d0e95

Added save_checkpoint to experiment configs Added tests for checkpointing

nmichlo approved these changes Nov 7, 2022

nmichlo merged commit 950ba81 into nmichlo:main Nov 7, 2022
