This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Update local pytorch-lightning master (#1)
* Add hint in docs for how to use shared memory (#6036)

* Prevent flickering progress bar (#6009)

* add padding

* fix

* fix

* Update pytorch_lightning/callbacks/progress.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* updated based on suggestion

* changelog

* add test

* fix pep8

* resolve test

* fix code format

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>

* Fix Wrapping optimizers upon assignment (#6006)

* Update properties.py

* pep8

* [Bugfix] Apply untoggle_optimizer when result is None (#5983)

* update changelog

* apply untoggle_optimizer when result is None

* update tests

* still return loss sometimes

* Update CHANGELOG.md

Co-authored-by: deng-cy <dcy1996@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* remove outdated info (#6032)

* DeepSpeed Integration (#5954)

* Add initial deepspeed changes

* Address code review

* Move static method outside of function

* Fixes

* Add missing annotation

* Remove seed setting

* Doc changes

* Doc changes, add address reviews

* Fix docs

* Try fixing issue by moving to torch adam

* Clean up check

* Changes, better APIs!

* Add wrapper, swap to git install revision

* Add special test

* Add warning

* Address review

* Add better disclaimer

* Turn off ZeRO for testing due to compilation

* Add description on modifying parameters via the plugin

* Doc strings clear

* Small doc fixes

* Fix hash, reduce test

* Added CI change

* Move to azure pipeline

* Fix test name

* Add missing flag

* Remove sudo...

* Try conda instead

* Swap to conda base

* Try suggested install

* Apply suggestions from code review

* Apply suggestions from code review

* Revert "Apply suggestions from code review"

This reverts commit 41cca05a

* Revert "Apply suggestions from code review"

This reverts commit e06ec29e

* Remove setter

* Address most review

* Move out function, remove DeepSpeed from requirements

* Install deepspeed/mpi4py within container

* Use special tests, move to master commit for deepspeed

* Export path

* Force compile to happen first

* Remove!

* Debugging ninja

* Fix error in optimizer step logic

* Attempt to fix symbolic link

* Reverse to aid debugging

* Export path again

* Clean up mess

* var

* Revert "var"

This reverts commit 3450eaca

* Address review, add todo

* Add note about unsupported functionality

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

* Trainer only references accelerator (#6039)

* Trainer only references accelerator where it can

* Move teardown to the trainer, as it is responsible for the accelerator

* Address code review for deepspeed (#6042)

* [feat] Add Trainer(stochastic_weight_avg=True/False) (#6038)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* [CI] Move DeepSpeed into CUDA image, remove DeepSpeed install from azure (#6043)

* Move to CUDA image

* Remove deepspeed install as deepspeed now in the cuda image

* Remove path setting, as ninja should be in the container now

* drop deprecated result object 1/n (#5005)

* ro1

* ro2

* Add option for weight tying on TPUs (#5441)

* added on_post_move_to_device

* added tests

* docs and refactors

* Update tests/backends/test_tpu_backend.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* moved weight sharing module back to test

updated tpu available

* add count to warning

* fix doctest

* import trainer in doctest

* import trainer in doctest

* do not test code as no TPU device

* param count to layer count

* formatting

* update docs

* update import

* update

* resolve tests

* remove legacy accelerator

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Your Name <you@example.com>

* Delete tests.helpers.TrialMNISTDataModule (#5999)

* Remove TrialMNISTDataModule

* Allow using TrialMNIST in the MNISTDataModule

* Update tests/helpers/datasets.py

* Fix: Allow hashing of metrics with lists in their state (#5939)

* Fix: Allow hashing of metrics with lists in their state

* Add test case and modify semantics of Metric __hash__ in order to be compatible with structural equality checks

* Fix pep8 style issue

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
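
A minimal sketch of the idea behind the metric hashing fix above, assuming a toy class rather than the library's actual `Metric`. `ToyMetric` and its `_state` attribute are hypothetical names used only for illustration; the point is that list-valued state is converted to a tuple before hashing, and that `__hash__` stays consistent with structural equality.

```python
from typing import Any, Dict


class ToyMetric:
    """Hypothetical stand-in for a metric; not the library's Metric class."""

    def __init__(self, **state: Any) -> None:
        # State values may be scalars or flat lists (lists are unhashable as-is).
        self._state: Dict[str, Any] = dict(state)

    def __hash__(self) -> int:
        hash_vals = [self.__class__.__name__]
        for name in sorted(self._state):
            value = self._state[name]
            if isinstance(value, list):
                # Expand list state into a tuple so that it can be hashed.
                hash_vals.append((name, tuple(value)))
            else:
                hash_vals.append((name, value))
        return hash(tuple(hash_vals))

    def __eq__(self, other: object) -> bool:
        # Structural equality: same class and same state.
        return isinstance(other, ToyMetric) and self._state == other._state
```

With this sketch, two instances holding the same state hash identically, so they collapse to one entry in a set or dict. Nested lists are not handled here.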

* et al. (#6050)

* et al.

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>

* [ModelPruning] Add missing attribute with use_global_unstructured=False and verbose (#6045)

* fix/test quant (#6040)

* fix/test quant

* ...

* ---

* Add descriptions to accelerator broadcast function/clean up all_gather (#6044)

* Add descriptions to accelerator broadcast function/clean up all_gather

* Remove todo

* Add before_batch_transfer and after_batch_transfer hooks (#3671)

* add hooks

* comment

* docs

* add tests

* make it private

* fix tests

* docs

* chlog

* testcode

* codefactor

* fix doctest

* fix doctest

* suggestions

* is always overridden

* pep and BoringModel

* BoringModel

* docs

* docs

* docs

* fix

* rebase

* rebase

* suggestions

* docs

* suggestions

* try fix docs

* docs

* update name

* yapf

* docs

* rebase

* yapf
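
The hook ordering introduced by the batch transfer change above can be sketched roughly as follows. This is an illustration with a toy class, not the library's implementation: the hook names follow the ones mentioned in this log, but the simplified signatures, `ToyModule`, and `move_batch` are assumptions.

```python
from typing import Any, Dict, List


class ToyModule:
    """Hypothetical module that records the order in which hooks run."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def on_before_batch_transfer(self, batch: List[int]) -> List[int]:
        # Runs while the batch is still on the host, e.g. for cheap preprocessing.
        self.calls.append("before")
        return [x * 2 for x in batch]

    def transfer_batch_to_device(self, batch: List[int], device: str) -> Dict[str, Any]:
        # Stand-in for the real device move; here we only tag the batch.
        self.calls.append("transfer")
        return {"device": device, "data": batch}

    def on_after_batch_transfer(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        # Runs once the batch lives on the target device.
        self.calls.append("after")
        batch["data"] = [x + 1 for x in batch["data"]]
        return batch


def move_batch(module: ToyModule, batch: List[int], device: str) -> Dict[str, Any]:
    """Apply the three hooks in order: before, transfer, after."""
    batch = module.on_before_batch_transfer(batch)
    moved = module.transfer_batch_to_device(batch, device)
    return module.on_after_batch_transfer(moved)
```

Calling `move_batch(ToyModule(), [1, 2], "cuda:0")` runs the hooks in the order before, transfer, after.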

* Make parallel devices optional across all plugins (#6051)

* Make parallel devices optional across all plugins so that they can be instantiated

* Add any to types to capture vars passed in

* clarify gpu / process (#6049)

* Fix docs typo (#6055)

Put .test() in code blocks

* Docs for Pruning, Quantization, and SWA (#6041)

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Replace .get_model() with explicit .lightning_module (#6035)

* rename get_model -> lightning_module

* update references to get_model

* pep8

* add proper deprecation

* remove outdated _get_reference_model

* fix cyclic import
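
One common way to do a rename with "proper deprecation", as in the `get_model` to `lightning_module` change above, is to keep the old method as a thin wrapper that warns and delegates. The sketch below assumes a hypothetical `ToyTrainer`; the warning text and class layout are illustrative and not taken from the library.

```python
import warnings
from typing import Any


class ToyTrainer:
    """Hypothetical trainer-like object; only the renaming pattern matters here."""

    def __init__(self, module: Any) -> None:
        self._module = module

    @property
    def lightning_module(self) -> Any:
        # New, explicit accessor.
        return self._module

    def get_model(self) -> Any:
        # Old accessor kept for backward compatibility; it warns and delegates.
        warnings.warn(
            "get_model() is deprecated; use the lightning_module property instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.lightning_module
```

Existing callers keep working while they migrate, and `stacklevel=2` makes the warning point at the caller's line rather than at the wrapper.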

* rename accelerator_backend -> accelerator (#6034)

* rename accelerator backend

* rename new additions from master

* add proper deprecation

* pep8

* warning match

* add missing warning type

* fix flake8 for new plugins (#5951)

* flake8

* fix cyclic import

* isort

* fix docs links (#6057)

* Add warnings to on_before/after_batch_transfer hooks (#6059)

* Add warnings to hooks

* Add default idx to prevent signature change in the future

* Nothing to see here

* Add default val to transfer_batch_to_device hook

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Revert "Add default val to transfer_batch_to_device hook"

This reverts commit 5c6a68f2

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* v1.2.0rc2 (#6063)

* v1.2.0rc2

* chlogs

* chlogs

* format

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update auto-opt docs (#6037)

* fix docs

* update on comments

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* rm comment

* Update docs/source/common/lightning_module.rst

Co-authored-by: chaton <thomas@grid.ai>

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>

* Raise AttributeError in lightning_getattr and lightning_setattr when attribute not found (#6024)

* Empty commit

* Raise AttributeError instead of ValueError

* Make functions private

* Update tests

* Add match string

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* lightning to Lightning

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
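
The behaviour described above (raise `AttributeError` rather than `ValueError` when a lookup fails) can be sketched with a simplified helper. `toy_getattr` is a hypothetical name, and the lookup order shown here (the model first, then `model.hparams` as a dict or namespace) is an assumption for illustration, not the library's exact logic.

```python
from types import SimpleNamespace
from typing import Any


def toy_getattr(model: Any, name: str) -> Any:
    """Look up `name` on the model, then in model.hparams (dict or namespace)."""
    if hasattr(model, name):
        return getattr(model, name)
    hparams = getattr(model, "hparams", None)
    if isinstance(hparams, dict):
        if name in hparams:
            return hparams[name]
    elif hparams is not None and hasattr(hparams, name):
        return getattr(hparams, name)
    # AttributeError matches what the built-in getattr() raises on a miss.
    raise AttributeError(f"{name!r} was not found on the model or in its hparams")


# Example: one value lives on the model, the other in its hparams dict.
model = SimpleNamespace(lr=0.1, hparams={"batch_size": 32})
lr = toy_getattr(model, "lr")
batch_size = toy_getattr(model, "batch_size")
```

Raising `AttributeError` lets callers use the same `try`/`except AttributeError` handling they would use around the built-in `getattr`.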

* default sched (#6062)

* v1.2.0 (#6065)

* v1.2.0

* docs

* add Azure tags trigger (#6066)

* add Azure tags trigger

* fix

* mnodes

* pypi azure badges - tags (#6068)

* pypi azure badges - tags

* pep8

* id

* continue towards 1.3 (#6069)

* Fix amp autocast (#6080)

* precision fixes

* add amp test model

* fix test

* revert

* move assert to training step

* fix test

* fix test

* remove unrelated changes

* add changelog

* remove unused import

* add sanity check on nb available GPUs (#6092)

* consistent behavior for reduce method across all Plugins (#6011)

* reduction docs

* docs for abstract base method

* make mean the default

* add preliminary chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
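
To illustrate what "make mean the default" implies for a reduce method, here is a small sketch over plain Python floats. `toy_reduce` and the accepted `reduce_op` strings are assumptions for illustration; the real plugins operate on tensors across processes.

```python
from typing import Sequence


def toy_reduce(values: Sequence[float], reduce_op: str = "mean") -> float:
    """Reduce per-process values; mean is the default, sum is opt-in."""
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    total = sum(values)
    if reduce_op in ("mean", "avg"):
        return total / len(values)
    if reduce_op == "sum":
        return total
    raise ValueError(f"unsupported reduce_op: {reduce_op!r}")
```

With a shared default, calling the reduce without arguments gives the same result whichever plugin is active.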

* [Hot Fix] Give priority to plugins to set distributed mode, and then accelerator (#6089)

* Give priority to plugins to set distributed mode, and then accelerator

* Add CHANGELOG.md

* Update CHANGELOG.md

* Remove very scary line

* Ensure we set cluster environment after slurm configured if necessary

* Simplify the fix with a reset

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Enable ZeRO tests for CI, fix to/half function calls (#6070)

* Enable ZeRO optimization, and make sure that the lightning module hook is called when we move to half precision

* Added test, update to function

* Expose DeepSpeed FP16 parameters due to loss instability (#6115)

* Expose deepspeed config parameters to init function due to instability in parameters

* See if tests can run on normal CI, without special tests

* Add changelog

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Collapse 2 DeepSpeed tests (#6108)

* fix amp/apex misconfiguration error for cpu (#6107)

* fix weird test

* fix apex plugin test

* fix raise

* cpu test

* fix type

* add changelog

* Update Contributing Guide (#6118)

* Update Contributing Guide

* update docs

* Minor fixes/improvements in Metric docs (#6114)

* Fix wrong render

* Improve classification metrics docs

* Improve other domain metrics docs

* Change the structure level in the docs

* Avoid printing ModelCheckpoint log with monitor=None and verbose=True (#6109)

* Feature/5275 clean progress bar print (#5470)

* Trainer.test should return only test metrics (#5214)

* resolve bug

* merge tests

* Fix metric state reset (#5273)

* Fix metric state reset

* Fix test

* Improve formatting

Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>

* print() method added to ProgressBar

* printing alongside progress bar added to LightningModule.print()

* LightningModule.print() method documentation updated

* ProgressBarBase.print() stub added

* stub

* add progress bar tests

* fix isort

* Progress Callback fixes

* test_metric.py duplicate DummyList removed

* PEP and isort fixes

* CHANGELOG updated

* test_progress_bar_print win linesep fix

* test_progress_bar.py remove whitespaces

* Update CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Tadej Svetina <tadej.svetina@gmail.com>
Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
Co-authored-by: Alexander Snorkin <Alexander.Snorkin@acronis.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* mini refactor for _running_stage access (#5724)

* running stage

* circular import

* running stage cleanup

* fix unused import

* fix running stage access

* add return type

* Revert "add return type"

This reverts commit 65b0fe269c6547213e34b6a88b97bee31cdfe8c7.

* try fix typing

* Add specifics around DeepSpeed docs (#6142)

* Be more specific with DeepSpeed compatibility

* Better wording

* Ensure accelerator is valid if running interactively (#5970)

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

* fixing misleading tested acc values (#5876)

* fixing tested values

* .

* tests

* yapf

* softmax

* hvd

* rename

* lr

* duplicate

* drop

* classif

* rm EvalModel

* Revert "rm EvalModel"

This reverts commit 6c3fb39ebe0c4bfb52357bccfd050438f2c0f31c.

* update tests

* fix

* azure

* azure

* self

* cpu

* Apply suggestions from code review

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>

* Update CHANGELOG (#6156)

* prune deprecated profiler as bool (#6164)

* prune profiler

* chlog

* prune deprecated Trainer arg `enable_pl_optimizer` (#6163)

* prune enable_pl_optimizer

* prune automatic_optimization

* Prune deprecated metrics for 1.3 (#6161)

* prune deprecated metrics for 1.3

* isort / yapf

* [Bugfix] Fixed epoch level schedulers not being called when val_check_interval < 1.0 (#6075)

* fix bug

* fix tests

* changelog

* fix pep8

* fix tests

* fix and add some tests

* add test for rlop

* chlog

* Update CHANGELOG.md

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>

* Prune deprecated checkpoint arguments (#6162)

* prune prefix

* prune mode=auto

* chlog

* Prune deprecated EarlyStopping(mode='auto') (#6167)

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Fix typo (#6178)

* Update issue template to use discussions for questions (#6155)

* add issue config

* remove question template

* update URL

* Update README.md

* Update README.md

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update .github/ISSUE_TEMPLATE/config.yml

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update with GitHub Discussions (#6186)

* Update gpu warning (#6181)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik Bokka <kaushikbokka@gmail.com>

* type accelerators (#6148)

* Fix for multiple callbacks (#6197)

* Fix for multiple callbacks

* Add CHANGELOG.md

* Remove old params

* Skip tests on windows using ddp

* Change name of the variable to not clash with should stop, which is separate

* Apply suggestions from code review

* Fix params

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Add checkpoint parameter to on_save_checkpoint (#6072)

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Document exceptions in loggers (#6171)

* Document exceptions in loggers

* minor formatting

* docstring changed in comet.py

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Prune deprecated Trainer(checkpoint_callback=ModelCheckpoint()) (#6166)

* fix parallel devices return type & add copyright (#6215)

* Add mypy typing to precision plugins. (#6149)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* apply_func.py: from torchtext.legacy.data import Batch (#6211)

* Update apply_func.py

The name Batch is no longer located under torchtext.data
--Error message--
File "/home/daniel/py38/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 25, in <module>
    from torchtext.data import Batch
ImportError: cannot import name 'Batch' from 'torchtext.data' (/home/daniel/py38/lib/python3.8/site-packages/torchtext/data/__init__.py)
You can fix this by changing line 28 to:
    from torchtext.legacy.data import Batch

* Update apply_func.py

* Update apply_func.py

* Update apply_func.py

* Update apply_func.py

* Update apply_func.py
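
A hedged sketch of one way to tolerate both import locations mentioned in the torchtext fix above. The helper `import_first_available` is a hypothetical name, and this is not necessarily how `apply_func.py` resolves the import; it simply tries each candidate location in order and falls back to `None`.

```python
import importlib
from typing import Any, Optional, Sequence, Tuple


def import_first_available(candidates: Sequence[Tuple[str, str]]) -> Optional[Any]:
    """Return the first attribute that can be imported, or None if none can."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            # Module missing or name moved: try the next location.
            continue
    return None


# Prefer the newer location and fall back to the old one.
# Batch is None when torchtext is not installed at all.
Batch = import_first_available([
    ("torchtext.legacy.data", "Batch"),
    ("torchtext.data", "Batch"),
])
```

Code that depends on `Batch` can then check `Batch is not None` before using it in `isinstance` checks.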

* fix(wandb): prevent WandbLogger from dropping values (#5931)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Prune deprecated hparams setter (#6207)

* document exceptions for metrics/regression (#6202)

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Prajakta Phadke <pphadke@iu.edu>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* simplify skip-if tests >> 0/n (#5920)

* skipif + yapf + isort

* tests

* docs

* pp

* update (#6237)

* Document Exceptions in profilers (#6229)

* docstring changes in profilers

* minor changes in profilers.py

* Call `optimizer.zero_grad()` before backward inside closure in AutoOpt (#6147)

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
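
The ordering described above (zero the gradients inside the closure, before the backward pass) can be shown with a toy optimizer that only records calls. `ToyOptimizer` and `make_closure` are hypothetical names, and no real tensors or gradients are involved.

```python
from typing import Callable, List


class ToyOptimizer:
    """Hypothetical optimizer that logs calls instead of updating parameters."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def zero_grad(self) -> None:
        self.log.append("zero_grad")

    def step(self, closure: Callable[[], float]) -> float:
        # The closure computes the loss (and gradients) before the update.
        loss = closure()
        self.log.append("step")
        return loss


def make_closure(optimizer: ToyOptimizer, log: List[str]) -> Callable[[], float]:
    def closure() -> float:
        optimizer.zero_grad()   # clear stale gradients first
        log.append("forward")   # stand-in for the training step
        loss = 1.0
        log.append("backward")  # stand-in for loss.backward()
        return loss

    return closure
```

Running `optimizer.step(make_closure(optimizer, log))` records `zero_grad`, `forward`, `backward`, `step` in that order, so gradients are cleared just before they are recomputed.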

* Fix for incorrect usage of detach(), cpu(), to() (#6216)

* Fix for incorrect detach/cpu calls (#6214)

* Fix incorrect use of detach(), to(), and cpu(), #6214

* Fix incorrect use of detach() and cpu(), #6214

* update pr

* add typing

* chlog

* more...

* revert on module

* update on comments

* revert changes on model

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

* add skipif wrapper (#6258)

* cleaning SWA (#6259)

* rename

* if

* test

* chlog

* Remove opt from manual_backward in docs (#6267)

* switch agents pool (#6270)

* docstring changes in tuner (#6264)

* docstring changes in tuner

* added full stop

* Disable CPU Offload as default for DeepSpeed (#6262)

* Change default for CPU offload to false for best throughput/memory efficiency

* Add changelog

* default

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* split profilers (#6261)

* Refactor: skipif for multi - gpus 1/n (#6266)

* ngpus

* gpu

* isort

* pt

* flake8

* Improved EarlyStopping.patience documentation (#6278)

* Improved early stopping documentation

* Changed to 120 column format

* doc

* doc

* doc

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

* Refactor: skipif for Windows 2/n (#6268)

* win

* isort

* flake8

* fix duplicate console logging bug v2 (#6275)

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Refactor: skipif for AMPs 3/n (#6293)

* args

* native

* apex

* isort

* [fix] Ensure we check deepspeed/sharded in multinode DDP (#6297)

* Ensure we check deepspeed/sharded in multinode

* Add CHANGELOG.md

* Add CHANGELOG.md

* Drop mock, use actual multi-gpu node

* unfreeze torchtext version (#6302)

* Add possibility for custom naming when using multiple dataloaders (#6274)

* try to fix imports for parsing (#6256)

* try to fix imports

* legacy 1.2.1

* Refactor: Runif for TPU and Horovod 5/n (#6301)

* TPU

* horovod

* extra

* fix

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* doc

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Refactor: runif for spec 6/6 (#6307)

* special

* rpc

* Add fairscale & deepspeed to skipif 4/n (#6281)

* add fairscale & windows to skipif

* add deepspeed to runif

* fairscale

* deepspeed

* flake8

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

* [bugfix] TPU test hangs to barrier on 1 process (#6272)

* update

* resolve flake8

* update

* update

* update changelog

* update

* resolve flake8

Co-authored-by: Your Name <you@example.com>

* prune duplicate test in optim (#6312)

* Simplify test for AMP plugins (#6311)

* AMP

* fuse

* yapf

* Fix ModelPruning(make_pruning_permanent=True) buffers getting removed when saved during training (#6073)

Co-authored-by: chaton <thomas@grid.ai>

* [bugfix] TPU + all_gather + SingleTPU shouldn't call xm.all_gather (#6296)

* resolve an issue with TPU

* update

* add changelog

* drop unused variable in API (#6308)

* drop unused pl model in ckpt

* irrelevant

* on_evaluation_batch_start

* evaluation_epoch_end

* attach_datamodule

* hotfix for PT1.6 and torchtext (#6323)

* ci: azure reinstall torchtext

* move

* todos

* 0.6.0

* skip examples

* formatter

* skip

* todo

* Apply suggestions from code review

* [fix] Use training type plugin hook when saving (FSDP 1/n) (#6321)

* Rely on training type plugin when saving

* Add better typing to training type plugin

* leaving lezwon (#6347)

* Add `tests/utilities/test_parsing.py` (#4460)

* Create branch tests/4400_parsing

* Rename test file for parsing.py

* Fix lightning_hasattr

* Fix lightning_hasattr

* Fix lightning_setattr

* Add empty lines and remove rubbish spaces

* Raise AttributeError not ValueError

* Use getattr in hasattr

* Remove rubbish spaces

* Fix getattr

* Fix by flake8

* Add tests for str_to_bool_or_str

* Fix by flake8

* Add tests for str_to_bool

* Add tests for is_picklable

* Add tests for clean_namespace

* Fix typo

* Fix lightning_getattr

* Add tests for AttributeDict

* Add tests for flatten_dict

* Fix by flake8

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply isort

* Revert "Apply suggestions from code review"

* Define unpicklable_function outside

* Add comment to test_clean_namespace

* Add tests for parse_class_init_keys

* Add tests for get_init_args and collect_init_args

* Share objects across the tests

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Add ignore param to save_hyperparameters (#6056)

* add ignore param to save_hyperparameters

* add docstring for ignore

* add type for frame object

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* fix whitespace

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Parametrize tests

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* seq

* fix docs

* Update lightning.py

* Update lightning.py

* fix docs errors

* add example keyword

* update docstring

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
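
The effect of an `ignore` parameter like the one added to `save_hyperparameters` above can be sketched with a plain function that filters constructor arguments. `collect_hparams` is a hypothetical helper; treating `ignore` as either a single name or a sequence of names is an assumption for illustration.

```python
from typing import Any, Dict, Optional, Sequence, Union


def collect_hparams(
    init_args: Dict[str, Any],
    ignore: Optional[Union[str, Sequence[str]]] = None,
) -> Dict[str, Any]:
    """Return the init arguments minus any names listed in `ignore`."""
    if ignore is None:
        ignored = set()
    elif isinstance(ignore, str):
        ignored = {ignore}
    else:
        ignored = set(ignore)
    return {name: value for name, value in init_args.items() if name not in ignored}


# Example: keep scalar settings, skip a large object such as a backbone network.
args = {"lr": 0.01, "hidden": 128, "backbone": object()}
saved = collect_hparams(args, ignore="backbone")
```

Skipping such arguments keeps large or unpicklable objects out of the saved hyperparameters.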

* Fix _stable_1d_sort to work when n >= N (#6177)

* Fix _stable_1d_sort to work when n >= N

* Apply suggestions

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

* Update docs on arg train_dataloader in fit (#6076)

* add to docs

* update docs

* Apply suggestions from code review

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* nested loaders

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* shorten text length

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* missing tests default_root_dir=tmpdir (#6314)

* default_root_dir=tmpdir

* miss

* Document exception for metrics/classification (#6190)

* document exception for metrics/classification

* minor formatting fixes

* fix trailing whitespaces

* document exception for metrics

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* [Fix] Call clip gradients if clip val greater than 0 (#6330)

* Call clip gradients if clip val greater than 0

* format

* Format

* Move to top of file
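
The guard named in the gradient clipping fix above (clip only when the clip value is greater than 0) can be sketched over a plain list of floats. `maybe_clip_by_norm` is a hypothetical helper; clipping by the global L2 norm is an assumption chosen for illustration.

```python
import math
from typing import List, Optional


def maybe_clip_by_norm(grads: List[float], clip_val: Optional[float]) -> List[float]:
    """Scale grads so their L2 norm is at most clip_val; no-op if clip_val <= 0."""
    if clip_val is None or clip_val <= 0:
        # Clipping is disabled: return the gradients unchanged.
        return list(grads)
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= clip_val:
        return list(grads)
    scale = clip_val / norm
    return [g * scale for g in grads]
```

A value of `0` or `None` disables clipping entirely, so the clipping path is only entered for positive values.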

* [bugfix] Check LightningOptimizer doesn't delete optimizer hooks (#6305)

* update

* resolve bug

* docstring changes in accelerators (#6327)

* docstring changes in accelerators

* docstrings moved

* whitespaces removed

* PEP8 correction[1]

* [bugfix] Perform reduction for dict in training_step and DP (#6324)

* fix

* update

* update

* add changelog

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/accelerators/test_dp.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update changelog

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* introduce default cluster environment for lightning-specific ddp (#5915)

* handle distributed_sampler_kwargs

* move emptying cache to accelerator

* fix a few tests

* restoring the result from subprocess

* fix queue.get() order for results

* add missing "block_backward_sync" context manager

* add missing "block_backward_sync" context manager

* fix sync_batchnorm

* fix supported gpu-ids for tuple

* fix clip gradients and inf recursion

* accelerator selection: added cluster_environment plugin

* fix torchelastic test

* fix reduce early stopping decision for DDP

* fix tests: callbacks, conversion to lightning optimizer

* fix lightning optimizer does not pickle

* fix setting benchmark and deterministic option

* fix slurm amp test

* fix prepare_data test and determine node_rank

* fix retrieving last path when testing

* remove obsolete plugin argument

* fix test: test_trainer_config

* fix torchscript tests

* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resolve a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* abstract the cluster plugins

* default plugin

* integrate default environment

* fix property

* adapt tests

* adjust test

* fix world size access

* base cluster env

* revert rebase errors

* revert rebase errors

* missing import

* revert unrelated change

* remove unused cluster local rank

* remove unrelated changes

* fix unrelated changes

* fix pep8

* remove unused var

* reset permissions

* yapf

* test default environment

* test torchelastic environment

* world size as int

* tests for slurm environment

* changelog

* test comments

* remove unintended change

* keep master port fixed after it is generated

* test random master port

* yapf

* add missing default environment

* move helper function

* rename default environment

* rename

* rename

* yapf

* Update pytorch_lightning/plugins/environments/lightning_environment.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* spawn -> create

Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* [bugfix] Resolve memory leak for evaluation (#6326)

* resolve bug

* resolve flake8

* revert name

* Update changelog for v1.2.2 (#6325)

* update changelog for v1.2.2

* ckpr 1.2.2

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

* CI: fix examples - patch download MNIST (#6357)

* patch download

* CI

* isort

* extra

* [bug] Fix Pytorch profiler with emit_nvtx (#6260)

* resolve bug

* update changelog

* Update tests/trainer/test_trainer.py

* Update pytorch_lightning/profiler/profilers.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* resolve comments

* resolve flake8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix importing torchtext batch (#6365)

* copy torchtext batch

* update

* rev

* rev

* give a more complete GAN example (#6294)

* Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945)

* Update code

Co-authored-by: EliaCereda

* More property updates

* Move properties. Introduce trainer._fitting

* Use trainer.fitting

* Fix reset dataloaders

* Unused code

* RunningStage.SANITY_CHECKING

* Use setters

* Fix bugs

* Fix bugs

* TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}

* Fix bugs

* Fix bugs

* Fix tests

* Update CHANGELOG. Add deprecation warning. Fix tests

* Unused imports

* Optional trainer

* More deprecation. More refactoring

* Correct version

* Use properties

* Address comments

* flake8

* Missed renamings

* Typo

* is -> ==

It is recommended to use `is` for Enums since they are singletons; however, since LightningEnum subclasses str, that is not a good idea in case a user sets the state/stage with a str

* Also for tests

* Typo

* Address @tchaton's comments

* PEP8

* Correct property

* Update CHANGELOG

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Remove called sanity check

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
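
The `is -> ==` note in #4945 above can be illustrated with a small sketch. It assumes a hypothetical `Stage` enum that mixes in `str`, standing in for a `LightningEnum` subclass; the member names and values are made up for illustration and are not taken from the library.

```python
from enum import Enum


class Stage(str, Enum):
    """Hypothetical stand-in for a str-based enum (not the library's class)."""

    FITTING = "fit"
    TESTING = "test"


# A plain string built at runtime, as a user-supplied state/stage might be.
user_value = "".join(["f", "i", "t"])

# Equality compares by value, because the enum member is also a str.
equal_by_value = Stage.FITTING == user_value

# Identity compares objects: the plain str is not the enum member itself.
same_object = Stage.FITTING is user_value

# Between two enum members, identity still holds (members are singletons).
members_identical = Stage("fit") is Stage.FITTING

print(equal_by_value, same_object, members_identical)
```

With a plain `Enum` (no `str` mixin), `is` would be the usual choice; the `==` comparison only matters here because a raw string may end up where a member is expected.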

* require: adjust versions (#6363)

* adjust versions

* release

* manifest

* pep8

* CI

* fix

* build

* Use f-"""-string in a Trainer comment (#6377)

* Use f-"""-string

* Add r

* Use Trainer.

* r -> noqa: W605

* Remove no return warning from val/test step (#6139)

* remove warning

* auto_opt

* chlog

* auto_opt

* no_warning_call

* rm old code

* add warning for predict

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Fix manual optimization in pl_example (#6373)

* Fix automatic_optimization

* Fix automatic_optimization

* Uncomment fairscale

* Update Sharded test with RunIf (#6384)

* Remove optimizer_idx arg in manual optimization (#6093)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>

* [doc] Improve Multiple Val/Test Dataloaders with simultaneous batches option (#6320)

* improve doc to describe how to combine batches of multiple test and val dataloaders simultaneously

* fix typo

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* use paramref

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* [doc] Fix closure in manual optimization (#6374)

* Fix manual optimization docs

* Fix typo. Thanks @import-antigravity

* Fix ModelCheckpoint(monitor=None, save_last=True) not saving checkpoints (#6136)

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update TBLogger docs (#6315)

* Update tensorboard.py

* Update logging.rst

* pep8

* Update logging.rst

* Update logging.rst

* Apply suggestions from code review

* add code sample

* Update logging.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Fix trainer not resetting lightning_optimizers (#6372)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update python version (#6399)

* Fix AttributeError: 'NoneType' object has no attribute 'finalize'  on TPU (#6221)

* Fix bug

Fix AttributeError: 'NoneType' object has no attribute 'finalize'

* Update CHANGELOG.md

* deleted a period

* Update CHANGELOG.md

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Update CHANGELOG.md

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Run CI (#6402)

* Pass {fit,validate,test,predict} to setup() and teardown() (#6386)

* fix dp reduction test (#6404)

* fix

* update

* fix

* move the class outside

* Add check for verbose attribute of ModelCheckpoint (#6419)

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fixed bug where tuner would not tune lr if also tuning batch_size (#4688)

* fixed bug where tuner would not tune lr if also tuning batch_size

* added a '+1' to computing the smoothed loss. This maintains the behavior for the smoothed loss as before the bug fix

* pep8 fix

* add changelog

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update (#6403)

* fix logger creating directory structure too early in DDP (#6380)

* fix

* add simple test

* fix imports

* add changelog

* tighter test with on_fit_start hook closer to the dispatch call

* move class inside test f unction

* add a comment

* Typing for tests 1/n (#6313)

* typing

* yapf

* typing

* [changelog] Update Changelog on release v1.2.3 (#6444)

* update changelog

* legacy 1.2.3

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

* Improve DummyLogger (#6398)

* fix dummy logger

* docs

* update docs

* add changelog

* add none return annotation

* return empty string for name, version

* Raise an exception if check_val_every_n_epoch is not an integer (#6411)

* raise an exception if check_val_every_n_epoch is not an integer

* remove unused object

* add type hints

* add return type

* update exception message

* update exception message

* Set find unused parameters to True by default to fix breaking compatibility (#6438)

* Set find unused parameters to True by default to fix breaking models, add suggestion to re-enable

* Add changelog

* [bug] All_gather support tensor on cpu (#6416)

* add test

* update changelog

* update

* rename function

* [Fix] Ensure we set the default device before initializing deepspeed (#6460)

* Ensure we set the default device before initializing deepspeed

* Add CHANGELOG.md

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Remove redundant test (#6466)

* Add Trainer.validate(…) method to run one validation epoch (#4948)

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Allow user to disable the automatic formatting of checkpoint file names. (#6277)

* cleaning SWA (#6259)

* rename

* if

* test

* chlog

* Remove opt from manual_backward in docs (#6267)

* switch agents pool (#6270)

* Allow user to disable the automatic formatting of checkpoint file names.

* Added changelog entry.

* Made flake8 happy.

* Applied review suggestion: quotes for special characters in docstring

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Fixed example in docstring.

* Fixed syntax error in docstring.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Hotfix for torchvision (#6476)

* cover subproc coverage (#6477)

* argparse: Add use_argument_group=True (#6088)

* argparse: Add inplace option

Replicate in GAN model

* datamodule: Deduplicate logic w/ argparser utilities

* Update pl_examples/domain_templates/generative_adversarial_net.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Keep docstrings

* Correct name

* Whitespace

* Consistency

* fix weird type stuff

* try alt - use_argument_group

* fix syntax + lint

* fix ci errs

* fix ci

* change examples... still failing w/ "unrecognized arguments: --batch_size"

* address review

* mnist_datamodule: add some docstrings

* argparse: check cls or cls.__init__ for param

didn't capture issue, but meh

* fix lint

* fix no-doc edge case

* address review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

* Disable batch transfer in DP mode (#6098)

* add exceptions and test

* hook

* fix

* clean up

* clean up

* regex

* regex

* docs

* rev

* comment and docs

* chlog

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

Co-authored-by: chaton <thomas@grid.ai>

* Monkey-patch device count

* docs

* pep

* api_change

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>

* remove obsolete todo in pl_examples (#6475)

* [feat] Support iteration-based checkpointing in model checkpoint callback (#6146)

* Update model_checkpoint.py

* add tests

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* fix tests

* every_n_batches

* Update test_model_checkpoint.py

* defaults

* rm tests

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* Prune deprecated metrics for 1.3 (#6161)

* prune deprecated metrics for 1.3

* isort / yapf

* Update model_checkpoint.py

* add tests

* defaults

* Update CHANGELOG.md

* pre-commit

* Update model_checkpoint.py

* update defaults

* Update test_remove_1-5.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* fix tests

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* ckpt-callback

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* validation-end

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* clarify-names

- Make names explicit as to which hooks they apply to
- Use step instead of batch for consistency with global step

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* mutual-exclusive

Make every_n_train_steps and every_n_val_epochs mutually exclusive

* fix-default-0

* Update CHANGELOG.md

* formatting

* make-private

make attributes private to the class

* rebase

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
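
The "mutual-exclusive" step in #6146 above says that `every_n_train_steps` and `every_n_val_epochs` cannot both be active. A minimal sketch of such a check follows. The helper name `resolve_checkpoint_triggers`, the use of `ValueError`, and the convention that `None` or `0` means "disabled" are all assumptions made for illustration, not the callback's actual implementation.

```python
from typing import Optional, Tuple


def resolve_checkpoint_triggers(
    every_n_train_steps: Optional[int] = None,
    every_n_val_epochs: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: validate the two checkpoint triggers.

    Assumption: None or 0 disables a trigger; a positive value enables it.
    """
    steps = every_n_train_steps or 0
    epochs = every_n_val_epochs or 0

    if steps < 0 or epochs < 0:
        raise ValueError("Trigger intervals must be non-negative.")

    # Only one of the two triggers may be enabled at a time.
    if steps > 0 and epochs > 0:
        raise ValueError(
            "every_n_train_steps and every_n_val_epochs are mutually exclusive; "
            "set at most one of them to a positive value."
        )
    return steps, epochs


print(resolve_checkpoint_triggers(every_n_train_steps=100))
```

Failing at construction time rather than at the first checkpoint keeps the misconfiguration visible before any training work is done.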

* update xla version (#6464)

* Remove unused mixin attributes (#6487)

* Remove unused mixin attributes

* Missing import

* [doc] Update the order of zero_grad and backward (#6478)

* Fix zero_grad in docs

* Fix zero_grad in docs

* Fix tuner.scale_batch_size not finding batch size attribute when using datamodule (#5968)

* Update docs for limit_predict_batches (#6507)

* add docs and minor updates

* docs

* fraction

* [bug] Update broadcast + reduce decision ModelCheckpoint] (#6410)

* resolve bug

* update

* update changelog

* update PR

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add todo

* resolve issues

* resolve flake8

* update

* add coverage for reduce

* wip

* restore back to broadcast

* remove test.py

* resolve flake8

* update

* check world size

* resolve test

* update

* use pytorch version when defined

* update on comments

* update on comments

* flake8

* resolve bugs

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update

* remove test

* update

* resolve flake8

* update

* update

* update

* proxy

* update

* update

* resolve typo

* prune

* update parallel

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Handle torch.jit scripted modules in layer summary (#6511)

* CI: resume testing with py3.8 (#6516)

* testing on python 3.8

* req

* document exceptions for metrics/functional (#6273)

* document exceptions for metrics/functional

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Mean Average Precision metric for Information Retrieval (1/5) (#5032)

* init information retrieval metrics

* changed retrieval metrics names, expanded arguments and fixed typo

* added 'Retrieval' prefix to metrics and fixed conflict with already-present 'average_precision' file

* improved code formatting

* pep8 code compatibility

* features/implemented new Mean Average Precision metrics for Information Retrieval + doc

* fixed pep8 compatibility

* removed threshold parameter and fixed typo on types in RetrievalMAP and improved doc

* improved doc, put first class-specific args in RetrievalMetric and transformed RetrievalMetric in abstract class

* implemented tests for functional and class metric. fixed typo when input tensors are empty or when all targets are False

* fixed typos in doc and changed torch.true_divide to torch.div

* fixed typos pep8 compatibility

* fixed types in long division in ir_average_precision and example in mean_average_precision

* RetrievalMetric states are not lists and _metric method accepts predictions and targets for easier extension

* updated CHANGELOG file

* added '# noqa: F401' flag to not used imports

* added double space before '# noqa: F401' flag

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* change get_mini_groups in get_group_indexes

* added checks on target inputs

* minor refactoring for code cleanness

* split tests over exception raising in separate function && refactored test code into multiple functions

* fixed pep8 compatibility

* implemented suggestions of @SkafteNicki

* fixed imports for isort and added type annotations to functions in test_map.py

* isort on test_map and fixed typing

* isort on retrieval and on __init__.py and utils.py in metrics package

* fixed typo in pytorch_lightning/metrics/__init__.py regarding code style

* fixed yapf compatibility

* fixed yapf compatibility

* fixed typo in doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* CI: Azure publish results (#6514)

* deprecate metrics pkg (#6505)

* deprecate metrics

* examples

* req

* docs

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* pep8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* [test] lr_find with bs_scale (#6422)

* init test: test_lr_find_with_bs_scale

* Update test_lr_finder.py

* remove gpu req

* try boring model

* custom boring model

* pep8

* fix typo

* Update test_lr_finder.py

* typo

* typo

* Update DeepSpeed docs (#6528)

* Clean up docs and add some explicitness around stages

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* fix attribute access in LightningModule.toggle_optimizer (#6513)

* Update hook lifecycle (#6538)

* Update hook lifecycle

* Update docs/source/common/lightning_module.rst

* Prune metrics base classes 2/n (#6530)

* base class

* extensions

* chlog

* _stable_1d_sort

* _check_same_shape

* _input_format_classification_one_hot

* utils

* to_onehot

* select_topk

* to_categorical

* get_num_classes

* reduce

* class_reduce

* tests

* Custom Plugin is_distributed (#6537)

* return from plugin

* dont return for tpu

* refactor reading env defaults (#6510)

* change tests

* fix

* test

* _defaults_from_env_vars

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Prune metric: helpers and inputs 3/n (#6547)

* _basic_input_validation

* _check_shape_and_type_consistency

* _check_num_classes_binary

* _check_num_classes_mc

* _check_num_classes_ml

* _check_top_k

* _check_classification_inputs

* _input_format_classification

* _reduce_stat_scores

* DataType

* rest

* flake8

* chlog

* prune warning & deprecation wrapper (#6540)

* docs

* wrapper

* test

* count

* flake8

* Add outputs param for `on_val/test_epoch_end` hooks (#6120)

* add outputs param for on_val/test_epoch_end hooks

* update changelog

* fix warning message

* add custom call hook

* cache logged metrics

* add args to docstrings

* use warning cache

* add utility method for param in sig check

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update docstring

* add test for eval epoch end hook

* add types and replace model ref

* add deprecation test

* fix test fx name

* add model hooks warning

* add old signature model to tests

* add clear warning cache

* support args param

* update tests

* add tests for model hooks

* code suggestions

* add signature utils

* fix pep8 issues

* fix pep8 issues

* fix outputs issue

* fix tests

* code fixes

* fix validate test

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* [doc] Add Zero Grad `set_to_none=True` trick (#6548)

* add trick to doc

* update

* update path

* Update docs/source/benchmarking/performance.rst

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* fix deprecation wrapper & tests (#6553)

* fix deprecation wrapper & tests

* flake8

* prune metric: accuracy 4/n (#6515)

* prune accuracy

* chlog

* flake8

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* wrap

* test

* test

* fix

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Prune metrics: AUC & AUROC (#6572)

* class: AUC AUROC

* func: auc auroc

* format

* tests

* [doc] Update Dict Train Loader doc.  (#6579)

* update doc

* update example

* Prune metrics: precision & recall 6/n (#6573)

* avg precision

* precision

* recall

* curve

* tests

* chlog

* isort

* fix

* Update Changelog for v1.2.4 (#6581)

* Update changelog for v1.2.4

* legacy v1.2.4

* prune duplicates from changelog

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

* [Fix] Move init dist connection into the setup function (#6506)

* Move connection setup into the setup function. Call setup hook after we set up the accelerator

* Added CHANGELOG.md

* fix setup order in callback test

* fix input arguments in test

* Mock distributed function, remove protection to turn into training type hook

* Remove import

* Add missing mock, ensure custom plugin does not create children process

* Skip test on windows

* Update deepspeed to init connection in setup

* Do not initialize distributed module

* Move DeepSpeed tests to special tests since dist communication is being set up

* Special the test to see if this fixes CI

* Delete accelerator connector test to see if its causing build to fail

* Delete deepspeed test

* Revert "Delete accelerator connector test to see if its causing build to fail"

This reverts commit edde60b8

* Revert "Delete deepspeed test"

This reverts commit 9d317429

* Reverse hook

* Reverse setup hooks to debug again

* Add todo so i know where i left off

* For single device move in pre_dispatch after setup function

* Add additional model to device hook if any additional parameters have been set

* See if we can enable deepspeed tests

* Revert "See if we can enable deepspeed tests"

This reverts commit b5450def

* See if this hook approach works

* Introduce new granular hooks

* Remove import, fix tpu spawn by moving the function to setup

* Added missing special test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Fix all_gather for tpu_cores=8 (#6587)

* Update Gradient Clipping for TPU Accelerator (#6576)

* NGC container PoC (#6187)

* add NVIDIA flows

* push

* pull

* ...

* extras

* ci prune

* fix

* tag

* .

* list

* Automatically set sync_batchnorm for training_type_plugin (#6536)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: Kaushik Bokka <kaushikbokka@gmail.com>

* Prune metrics: other classification 7/n (#6584)

* confusion_matrix

* iou

* f_beta

* hamming_distance

* stat_scores

* tests

* flake8

* chlog

* fixing examples (#6600)

* try Azure

* -e

* path

* Add AMP for validation, prediction and testing (#6565)

* Add Tests for val and test-steps

* Add native AMP

* pep8 tests

* pep8 plugin

* changelog

* Add trainer.predict config validation (#6543)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Add DDP Spawn being default for Multi GPUs (#6292)

* Move profiler tests (#6619)

* drop mypy from .pre-commit-config.yaml (#6542)

* Clean utilities/argparse and add missing tests (#6607)

* Allow training type plugin to delay optimizer creation (FSDP 2/n) (#6331)

* Allow training_type_plugin to delay optimizer configure

* Add missing references to trainer, add a CPU accelerator based test

* Add teardown method to BaseProfiler. (#6370)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* refactoring setup (#6590)

* refactoring setup

* .

* docs

* flake8

* hotfix: mock examples (#6632)

* mock examples

* drop from GA

* [refactor] Add setup to profilers + _run_stage_setup to trainer 2/5 (#6633)

* add setup

* update

* updates on comment

* Minor changes

* Extra import

* Docs

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>

* fix comparing versions (#6434)

* fix comparing versions

* chlog

* .

* ...

* datasets

* Prune metrics: regression 8/n (#6636)

* explained_variance

* tests

* mean_absolute_error

* mean_squared_error

* mean_relative_error

* mean_squared_log_error

* chlog

* Prune metrics: regression 9/n (#6637)

* psnr

* r2score

* ssim

* chlog

* Refactor base profilers 3/5 (#6621)

Co-authored-by: tchaton <thomas@grid.ai>

* prune metrics: info retrieval (#6649)

* Flash predict step (#6577)

* add predict_step

* Update predict_loop.py

* Update trainer.py

* Update trainer.py

* resolve bugs

* update

* update

* update

* resolve bug

* resolve some failing tests

* update tests

* update

* resolve tests

* add a test

* remove typo

* add a test for attachment

* update

* changed to on_train_dataloader

* remove __flash_special_attr__

* resolve tests

* update

* update

* update

* update on comments

* Update pytorch_lightning/trainer/data_loading.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix back-compatibility for Accel (#6655)

* Refactor PyTorch profiler 4/5 (#6349)

Co-authored-by: thomas chaton <thomas@grid.ai>

* Add PyTorch 1.8 Profiler 5/5 (#6618)

* Refactor profilers

* Update PassThrough

* WIP - This is broken and will change

* Update pytorch_lightning/profiler/pytorch.py

Co-authored-by: thomas chaton <thomas@grid.ai>

* resolve tests

* resolve tests

* find output

* try something

* update

* add support for test and predict

* update

* update

* use getattr

* test

* test

* update

* tests

* update

* update

* update

* update

* update

* remove file

* update

* update

* update

* update

* update

* test

* update

* update

* update tests

* update

* add support for 1.8

* rename records

* add support for 1.8

* update

* resolve flake8

* resolve test

* Refactor basic profilers

* Fixes

* Unused import

* Introduce setup

* Profile on all ranks. Print to stdout on 0

* Introduce dirpath + filename

* CHANGELOG

* Add tests. Address comments

* add `on_run_stage_setup`

* add on_run_stage_setup function

* update

* add test for RegisterRecordFunction

* update lightning flow direction

* move variable to private

* remove trace

* Undo code that should be in 3/4

* Multi-stage multi-rank

* 2/5 changes

* Pass stage in __del__

* Remove TODOs

* Describe on_evaluation_end. Add tests

* Typo

* Address comments

* deepcopy tests

* Advanced teardown

* Fix teardown test

* Fix tests

* Minor change

* Update CHANGELOG.md

* Fix test

* Quick fixes

* Fix 6522

* resolve ddp tests

* resolve tests

* resolve some tests
…
Showing 652 changed files with 84,171 additions and 20,943 deletions.
260 changes: 101 additions & 159 deletions .circleci/config.yml 100755 → 100644
@@ -1,191 +1,133 @@
# Python CircleCI 2.0 configuration file
#
# Check https://circleci.com/docs/2.0/language-python/ for more details
#
version: 2.0
# Python CircleCI 2.1 configuration file.
version: 2.1
orbs:
gcp-gke: circleci/gcp-gke@1.0.4
go: circleci/go@1.3.0
codecov: codecov/codecov@1.1.0

references:

install_deps: &install_deps
make_docs: &make_docs
run:
name: Install Dependences
name: Make Documentation
command: |
sudo apt-get update && sudo apt-get install -y cmake
pip install "$TORCH_VERSION"
pip install -r requirements.txt -q
sudo pip install pytest pytest-cov pytest-flake8 -q
pip install -r ./tests/requirements-devel.txt -q
tests: &tests
# First run the same pipeline as Read-The-Docs
# apt-get update && apt-get install -y cmake
# using: https://hub.docker.com/r/readthedocs/build
# we need to use py3.7 ot higher becase of an issue with metaclass inheritence
pyenv global 3.7.3
python --version
pip install -r requirements/docs.txt
pip list
cd docs
make clean
make html --jobs 2 SPHINXOPTS="-W"
checkout_ml_testing: &checkout_ml_testing
run:
name: Testing
name: Checkout ml-testing-accelerators
command: |
python --version ; pip --version ; pip list
py.test pytorch_lightning tests -v --doctest-modules --junitxml=test-reports/pytest_junit.xml
no_output_timeout: 30m
git clone https://github.com/GoogleCloudPlatform/ml-testing-accelerators.git
cd ml-testing-accelerators
git fetch origin 5e88ac24f631c27045e62f0e8d5dfcf34e425e25:stable
git checkout stable
cd ..
examples: &examples
run:
name: PL Examples
command: |
pip install -r ./pl_examples/requirements.txt --user
python --version ; pip --version ; pip list
py.test pl_examples -v --doctest-modules --junitxml=test-reports/pytest_junit.xml
no_output_timeout: 20m

install_pkg: &install_pkg
build_push_docker: &build_push_docker
run:
name: Install package
name: Build and push Docker image
command: |
virtualenv vEnv ; source vEnv/bin/activate
pip install --editable . ; cd .. & python -c "import pytorch_lightning ; print(pytorch_lightning.__version__)"
deactivate ; rm -rf vEnv
create_pkg: &create_pkg
run:
name: Create package
command: |
sudo pip install twine==1.13.0
python setup.py sdist
twine check dist/*
python setup.py clean
format: &format
gcloud --quiet auth configure-docker
#cd dockers/tpu-tests
export PYTHON_VER=$(python -c "import random ; print('3.6' if random.random() > 0.5 else '3.7')" 2>&1)
echo $PYTHON_VER
docker build --tag "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" -f ./dockers/tpu-tests/Dockerfile --build-arg "PYTHON_VERSION=$PYTHON_VER" --build-arg "PYTORCH_VERSION=$XLA_VER" .
docker push "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID"
deploy_cluster: &deploy_cluster
run:
name: Formatting
name: Deploy the job on the kubernetes cluster
command: |
python --version ; pip --version
sudo pip install flake8 -q
pip list
flake8 .
make_docs: &make_docs
go get github.com/google/go-jsonnet/cmd/jsonnet
export PATH=$PATH:$HOME/go/bin
python -c "fname = 'dockers/tpu-tests/tpu_test_cases.jsonnet' ; fff = open(fname).read().replace('pytorch-VERSION', 'pytorch-$XLA_VER') ; open(fname, 'w').write(fff)"
job_name=$(jsonnet -J ml-testing-accelerators/ dockers/tpu-tests/tpu_test_cases.jsonnet --ext-str image=$GCR_IMAGE_PATH --ext-str image-tag=$CIRCLE_WORKFLOW_JOB_ID | kubectl create -f -)
job_name=${job_name#job.batch/}
job_name=${job_name% created}
echo "Waiting on kubernetes job: $job_name"
i=0 && \
# MAX_CHECKS checks spaced CHECK_SPEEP seconds apart (240 x 5s = 1200s with the values set for the TPU-tests job).
status_code=2 && \
# Check on the job periodically. Set the status code depending on what
# happened to the job in Kubernetes. If we try MAX_CHECKS times and
# still the job hasn't finished, give up and return the starting
# non-zero status code.
printf "Waiting for job to finish: " && \
while [ $i -lt $MAX_CHECKS ]; do ((i++)); if kubectl get jobs $job_name -o jsonpath='Failed:{.status.failed}' | grep "Failed:1"; then status_code=1 && break; elif kubectl get jobs $job_name -o jsonpath='Succeeded:{.status.succeeded}' | grep "Succeeded:1" ; then status_code=0 && break; else printf "."; fi; sleep $CHECK_SPEEP; done && \
echo "Done waiting. Job status code: $status_code" && \
pod_name=$(kubectl get po -l controller-uid=`kubectl get job $job_name -o "jsonpath={.metadata.labels.controller-uid}"` | awk 'match($0,!/NAME/) {print $1}') && \
echo "GKE pod name: $pod_name" && \
kubectl logs -f $pod_name --container=train > /tmp/full_output.txt
if grep -q '<?xml version="1.0" ?>' /tmp/full_output.txt ; then csplit /tmp/full_output.txt '/<?xml version="1.0" ?>/'; else mv /tmp/full_output.txt xx00; fi && \
# First portion is the test logs. Print these to Github Action stdout.
cat xx00 && \
echo "Done with log retrieval attempt." && \
gcloud container images delete "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" --force-delete-tags && \
exit $status_code
stats: &stats
run:
name: Make Documentation
name: Statistics
command: |
# sudo apt-get install pandoc
sudo apt-get update && sudo apt-get install -y cmake
pip install -r requirements.txt --user
sudo pip install -r docs/requirements.txt
pip install -r requirements-extra.txt --user # for doctesting loggers etc.
# sphinx-apidoc -o ./docs/source ./pytorch_lightning **/test_* --force --follow-links
cd docs; make clean; make html --debug --jobs 2 SPHINXOPTS="-W"
make doctest; make coverage
mv ./xx01 coverage.xml
# TODO: add human readable report
cat coverage.xml
sudo pip install pycobertura
pycobertura show coverage.xml
jobs:

Build-Docs:
docker:
- image: circleci/python:3.7
steps:
- checkout
- *make_docs
- store_artifacts:
# allows us to preview the generated html pages
path: docs/build/html/
destination: html

Formatting:
TPU-tests:
docker:
- image: circleci/python:3.7
environment:
- TORCH_VERSION: "torch"
- XLA_VER: 1.7
- MAX_CHECKS: 240
- CHECK_SPEEP: 5
steps:
- checkout
- *format
- go/install
- *checkout_ml_testing
- gcp-gke/install
- gcp-gke/update-kubeconfig-with-credentials:
cluster: $GKE_CLUSTER
perform-login: true
- setup_remote_docker
- *build_push_docker
- *deploy_cluster
- *stats
- codecov/upload:
file: coverage.xml
flags: tpu,pytest
upload_name: TPU-coverage

PyTorch:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch"
steps: &steps
- checkout
#- restore_cache:
# keys:
# # when lock file changes, use increasingly general patterns to restore cache
# - pip-packages--{{ .Environment.CIRCLE_JOB }}
# - pip-packages--
- *install_deps
#- save_cache:
# key: pip-packages--{{ .Environment.CIRCLE_JOB }}
# paths:
# # this path depends on where pipenv creates a virtualenv
# - "~/.cache/pip"
# - "/usr/local/lib/python3.6/site-packages"
# - "/usr/local/lib/site-python"
- *tests
- store_test_results:
path: test-reports
- store_artifacts:
path: test-reports

PyTorch-v1_1:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.1, <1.2"
steps: *steps

PyTorch-v1_2:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.2, <1.3"
steps: *steps

PyTorch-v1_3:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.3, <1.4"
steps: *steps

PyTorch-v1_4:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.4, <1.5"
steps: *steps

PyTorch-v1_5:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.5, <1.6"
steps: *steps
path: coverage.xml

Examples:
build-Docs:
docker:
- image: circleci/python:3.7
environment:
- TORCH_VERSION: "torch"
steps:
- checkout
- *install_deps
- *examples

Install-pkg:
docker:
- image: circleci/python:3.7
- image: readthedocs/build:latest
steps:
- checkout
- *create_pkg
- *install_pkg

#orbs:
# python: circleci/python@0.2.1
- *make_docs
- store_artifacts:
# allows us to preview the generated html pages
path: docs/build/html/
destination: html

workflows:
version: 2
build:
tpu-tests:
jobs:
- Formatting
- Build-Docs
- PyTorch-v1_1
- PyTorch-v1_2
- PyTorch-v1_3
- PyTorch-v1_4
- PyTorch-v1_5
- Install-pkg
- Examples
- build-Docs
- TPU-tests
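
The `deploy_cluster` step above strips the `job.batch/` prefix and ` created` suffix from the `kubectl create` output, then polls the job up to `MAX_CHECKS` times, ending with status code 1 on failure, 0 on success, and the starting code 2 if the job never finishes. The same control flow can be sketched in Python as below. This is only an illustration and not part of the commit: `parse_job_name`, `wait_for_job`, and the `get_status` callback are hypothetical names, and the callback stands in for the two `kubectl get jobs ... | grep` checks.

```python
from typing import Callable, Optional


def parse_job_name(create_output: str) -> str:
    """Mimic ${job_name#job.batch/} and ${job_name% created} from the shell step."""
    name = create_output.strip()
    if name.startswith("job.batch/"):
        name = name[len("job.batch/"):]
    if name.endswith(" created"):
        name = name[: -len(" created")]
    return name


def wait_for_job(
    get_status: Callable[[], Optional[str]],  # hypothetical stand-in for the kubectl checks
    max_checks: int,
    sleep: Callable[[], None] = lambda: None,  # the shell loop sleeps CHECK_SPEEP seconds
) -> int:
    """Poll until the job reports 'failed' or 'succeeded', or give up."""
    status_code = 2  # starting non-zero code, returned if the job never finishes
    for _ in range(max_checks):
        status = get_status()
        if status == "failed":
            status_code = 1
            break
        if status == "succeeded":
            status_code = 0
            break
        sleep()  # only reached while the job is still running
    return status_code
```

Passing the status check and the sleep in as callables keeps the sketch testable without a cluster; in the real step both are inline shell commands.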
21 changes: 18 additions & 3 deletions .codecov.yml
@@ -1,3 +1,17 @@
# Copyright The PyTorch Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see https://docs.codecov.io/docs/codecov-yaml
# Validation check:
# $ curl --data-binary @.codecov.yml https://codecov.io/validate
@@ -9,8 +23,10 @@ codecov:
strict_yaml_branch: "yaml-config"
require_ci_to_pass: yes
notify:
# after_n_builds: 2
after_n_builds: 23
wait_for_ci: yes
# https://docs.codecov.io/docs/codecov-yaml#section-expired-reports
max_report_age: off

coverage:
precision: 0 # 2 = xx.xx%, 0 = xx%
@@ -48,5 +64,4 @@ comment:
layout: header, diff
require_changes: false
behavior: default # update if exists else create new
# branches: *

after_n_builds: 23
58 changes: 0 additions & 58 deletions .drone.yml

This file was deleted.

