Better iterators #122

bengioe · 2024-03-01T01:00:21Z

This PR refactors the SamplingIterator class into a DataSource class, where much of the existing functionality is decomposed into methods that can be combined with more clarity. Instead of interacting with SamplingIterator through a long list of sometimes unclear arguments, DataSource works by creating and combining iterators that perform more specific jobs.
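
To make the "creating and combining iterators" idea concrete, here is a minimal sketch in plain Python. It is not the actual `DataSource` API: `ToyDataSource`, `add_source`, `add_hook`, and the generator functions are hypothetical names chosen for illustration, and rewards are stood in for by plain floats.

```python
import random
from typing import Callable, Dict, Iterator, List, Tuple

Batch = List[float]


class ToyDataSource:
    """Hypothetical stand-in for a DataSource-like class (not the gflownet API)."""

    def __init__(self) -> None:
        self._sources: List[Callable[[], Iterator[Batch]]] = []
        self._hooks: List[Callable[[Batch], Dict[str, float]]] = []

    def add_source(self, make_iter: Callable[[], Iterator[Batch]]) -> "ToyDataSource":
        # Each source is a factory that returns a fresh iterator of partial batches.
        self._sources.append(make_iter)
        return self  # returning self allows chained calls

    def add_hook(self, hook: Callable[[Batch], Dict[str, float]]) -> "ToyDataSource":
        # Hooks inspect an assembled batch and return values to log.
        self._hooks.append(hook)
        return self

    def __iter__(self) -> Iterator[Tuple[Batch, Dict[str, float]]]:
        iterators = [make() for make in self._sources]
        # zip stops as soon as the shortest source is exhausted.
        for parts in zip(*iterators):
            batch = [x for part in parts for x in part]
            info: Dict[str, float] = {}
            for hook in self._hooks:
                info.update(hook(batch))
            yield batch, info


def on_policy_rewards(num_batches: int, batch_size: int, seed: int = 0) -> Iterator[Batch]:
    # Placeholder for sampling new trajectories from the current policy.
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [rng.random() for _ in range(batch_size)]


def offline_rewards(data: Batch, batch_size: int) -> Iterator[Batch]:
    # Placeholder for reading from a fixed dataset, one full chunk at a time.
    for i in range(0, len(data) - batch_size + 1, batch_size):
        yield data[i:i + batch_size]


def avg_reward_hook(batch: Batch) -> Dict[str, float]:
    return {"avg_reward": sum(batch) / len(batch)}


source = (
    ToyDataSource()
    .add_source(lambda: on_policy_rewards(num_batches=3, batch_size=2))
    .add_source(lambda: offline_rewards([1.0, 1.0, 1.0, 1.0], batch_size=2))
    .add_hook(avg_reward_hook)
)
batches = list(source)
```

Each source does one job, and the combining class only concatenates their outputs and runs hooks, which is the design direction the description argues for over a single iterator configured through many arguments.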

Changes:

SamplingIterator is now DataSource
refactors device handling: code that needs a device should now call get_worker_device(). This avoids passing a device object around (which, because of workers, can get confusing)
adds an is_eval member to algorithms, and shifts the responsibility of determining random action probabilities to GFNAlgorithm
adds SQLiteLogHook to generalize logging as a hook that's added to DataSource instances
adds AvgRewardHook, a simple sampling hook to report average reward
fixes minor MOO incongruities
fixes "Nested dataclasses do not reinitialize" (#123), whereby nested dataclasses would only be initialized once
fixes a timeout condition typo in MultiObjectiveStatsHook
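
For readers unfamiliar with the nested-dataclass item: a default value on a dataclass field is evaluated once, when the class is defined, so a nested config created as a plain default can end up shared across instances. Below is a minimal sketch of the usual remedy, `field(default_factory=...)`. The class names are hypothetical, and whether this matches the exact fix made in this PR is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class InnerConfig:  # hypothetical nested config
    batch_size: int = 64


@dataclass
class OuterConfig:  # hypothetical top-level config
    # default_factory builds a new InnerConfig for every OuterConfig instance,
    # instead of reusing one object created at class-definition time.
    inner: InnerConfig = field(default_factory=InnerConfig)


a = OuterConfig()
b = OuterConfig()
a.inner.batch_size = 8  # must not leak into b
```

With this pattern, changing `a.inner` leaves `b.inner` at its default.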

TODO:

address TODO items in this draft PR
complete transition to get_worker_device and remove device objects being passed around
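
The idea behind replacing passed-around device objects with a lookup can be sketched as follows. This is a pure-Python illustration under stated assumptions: devices are plain strings, the worker id is set by hand through the hypothetical `set_current_worker`, and none of the helpers below are taken from the gflownet code base. Only the name `get_worker_device` comes from this PR, and its real implementation may differ.

```python
from typing import Dict, Optional

# Hypothetical module-level state; real code would store actual device objects.
_main_device: str = "cpu"
_worker_devices: Dict[int, str] = {}
_current_worker_id: Optional[int] = None  # None means "main process"


def set_main_device(device: str) -> None:
    global _main_device
    _main_device = device


def set_worker_device(worker_id: int, device: str) -> None:
    _worker_devices[worker_id] = device


def set_current_worker(worker_id: Optional[int]) -> None:
    # Stand-in for whatever tells a process which data-loading worker it is.
    global _current_worker_id
    _current_worker_id = worker_id


def get_worker_device() -> str:
    """Return the device for the current process, falling back to CPU for unknown workers."""
    if _current_worker_id is None:
        return _main_device
    return _worker_devices.get(_current_worker_id, "cpu")


set_main_device("cuda:0")
set_worker_device(0, "cpu")
device_in_main = get_worker_device()

set_current_worker(0)
device_in_worker = get_worker_device()
set_current_worker(None)
```

Callers ask for the device at the point of use, so function signatures no longer need a `device` argument whose correct value depends on which process is running.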

julienroyd

Taking a first look. Great refactor, I think it is much clearer now! Left a few comments (probably more to come).

src/gflownet/data/data_source.py

src/gflownet/trainer.py

julienroyd

Looks good to me! And a much-needed refactor, very nice result!

Small suggestion: could add just a few words about DataSource in implementation_notes.md, e.g.:

## Data sources

The data used for training GFlowNets can come from a variety of sources. `DataSource` implements these different use-cases as individual iterators that collectively assemble the training batches before passing them to the trainer. Some of these use-cases include:
- Generating new trajectories on-policy
- Sampling trajectories from past policies from a replay buffer
- Sampling trajectories from a fixed, offline dataset

`DataSource` also covers validation sets, including cases such as:
- Generating new trajectories (w.r.t. a fixed dataset of conditioning goals)
- Evaluating the model's likelihood on trajectories from a fixed, offline dataset

src/gflownet/__init__.py

src/gflownet/algo/config.py

src/gflownet/trainer.py

src/gflownet/algo/config.py

src/gflownet/data/data_source.py

bengioe added 5 commits February 28, 2024 15:11

first throw at refactoring SamplingIterator

7dbca12

Merge branch 'trunk' into bengioe-better-iterators

939cb56

changed all iterators to DataSource

dfba1ca

lots of little fixes, tested all tasks, better device management

e5239fb

style

43dfc2b

julienroyd reviewed Mar 1, 2024

src/gflownet/data/data_source.py (resolved)

src/gflownet/trainer.py (outdated, resolved)

src/gflownet/trainer.py (outdated, resolved)

bengioe added 7 commits March 7, 2024 08:25

change batch size hyperparameters + fix nested dataclasses

279ecfc

Merge branch 'trunk' into bengioe-better-iterators

2ba251a

move things around & prevent circular import

282bbfb

tox

c3bc6d0

fix imports

b1c5630

replace device references with get_worker_device

a64a639

little fixes

28bcc59

bengioe marked this pull request as ready for review March 7, 2024 16:42

a few more stragglers

4811e7c

julienroyd approved these changes Mar 8, 2024

typo + impl notes

c6d5613

bengioe merged commit 9bf35cd into trunk Mar 11, 2024
4 checks passed

bengioe deleted the bengioe-better-iterators branch March 11, 2024 17:23
