
Scheduler causes cycle in execution DAG? #122

Closed · darsnack opened this issue Jun 2, 2022 · 4 comments · Fixed by #123

Comments

darsnack commented Jun 2, 2022

I have the following script:

# assumed imports for this snippet (Step comes from ParameterSchedulers.jl);
# m, trainloader, valloader, and accuracy are defined elsewhere
using Flux, FluxTraining
using ParameterSchedulers: Step

lossfn = Flux.Losses.logitcrossentropy

# define schedule and optimizer
initial_lr = 0.1
schedule = Step(initial_lr, 0.5, 20)
optim = Flux.Optimiser(Momentum(initial_lr), WeightDecay(1e-3))

# callbacks
logger = TensorBoardBackend("tblogs")
schcb = Scheduler(LearningRate => schedule)
hlogcb = LogHyperParams(logger)
mlogcb = LogMetrics(logger)
valcb = Metrics(Metric(accuracy; phase = TrainingPhase, name = "train_acc"),
                Metric(accuracy; phase = ValidationPhase, name = "val_acc"))

# setup learner object
learner = Learner(m, lossfn;
                  data = (trainloader, valloader),
                  optimizer = optim,
                  callbacks = [ToGPU(), mlogcb, valcb])

Any time I add schcb to the list of callbacks passed to the Learner, I get an error from FluxTraining that there is a cycle in the DAG. This did not happen in previous versions of FluxTraining (though I haven't been able to bisect the change yet).
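For reference, the failing variant is the same Learner call with schcb added to the callback list; as described above, it is including this callback that triggers the error:

# identical setup, but with the scheduler callback included;
# this is the variant that raises the cycle error
learner = Learner(m, lossfn;
                  data = (trainloader, valloader),
                  optimizer = optim,
                  callbacks = [ToGPU(), schcb, mlogcb, valcb])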

lorenzoh commented Jun 2, 2022

Since what version?

Can you give a stacktrace and dump learner.callbacks.cbs?

darsnack commented Jun 2, 2022

Here is the trace:

ERROR: The input graph contains at least one loop.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] topological_sort_by_dfs(::Type{Graphs.IsDirected{Graphs.SimpleGraphs.SimpleDiGraph{Int64}}}, g::Graphs.SimpleGraphs.SimpleDiGraph{Int64})
    @ Graphs ~/.julia/packages/Graphs/zrMoC/src/traversals/dfs.jl:65
  [3] topological_sort_by_dfs(g::Graphs.SimpleGraphs.SimpleDiGraph{Int64})
    @ Graphs ~/.julia/packages/SimpleTraits/l1ZsK/src/SimpleTraits.jl:331
  [4] (::FluxTraining.var"#16#17"{Learner})()
    @ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/callbacks/execution.jl:9
  [5] ignore
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/utils.jl:25 [inlined]
  [6] handle(runner::FluxTraining.LinearRunner, event::FluxTraining.Events.EpochBegin, phase::TrainingPhase, learner::Learner)
    @ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/callbacks/execution.jl:8
  [7] (::FluxTraining.var"#handlefn#77"{Learner, TrainingPhase})(e::FluxTraining.Events.EpochBegin)
    @ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/training.jl:102
  [8] runepoch(epochfn::FluxTraining.var"#67#68"{Learner, TrainingPhase, DataLoaders.BufferGetObsParallel{NamedTuple{(:image, :label), Tuple{Array{Float32, 4}, Matrix{Bool}}}, BatchView{ObsView{MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{Pad{4}, Crop{2, DataAugmentation.FromRandom}, Rotate{Distributions.Uniform{Float64}}, Crop{2, DataAugmentation.FromCenter}, DataAugmentation.OneOfProjective{DataAugmentation.ProjectiveTransform, Distributions.Categorical{Float64, Vector{Float64}}}, ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}, UnitRange{Int64}}, MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{Pad{4}, Crop{2, DataAugmentation.FromRandom}, Rotate{Distributions.Uniform{Float64}}, Crop{2, DataAugmentation.FromCenter}, DataAugmentation.OneOfProjective{DataAugmentation.ProjectiveTransform, Distributions.Categorical{Float64, Vector{Float64}}}, ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}}}}, learner::Learner, phase::TrainingPhase)
    @ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/training.jl:104
  [9] epoch!
    @ ~/.julia/packages/FluxTraining/bday3/src/training.jl:22 [inlined]
 [10] fit!(learner::Learner, nepochs::Int64, ::Tuple{DataLoaders.BufferGetObsParallel{NamedTuple{(:image, :label), Tuple{Array{Float32, 4}, Matrix{Bool}}}, BatchView{ObsView{MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{Pad{4}, Crop{2, DataAugmentation.FromRandom}, Rotate{Distributions.Uniform{Float64}}, Crop{2, DataAugmentation.FromCenter}, DataAugmentation.OneOfProjective{DataAugmentation.ProjectiveTransform, Distributions.Categorical{Float64, Vector{Float64}}}, ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}, UnitRange{Int64}}, MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{Pad{4}, Crop{2, DataAugmentation.FromRandom}, Rotate{Distributions.Uniform{Float64}}, Crop{2, DataAugmentation.FromCenter}, DataAugmentation.OneOfProjective{DataAugmentation.ProjectiveTransform, Distributions.Categorical{Float64, Vector{Float64}}}, ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}}}, DataLoaders.BufferGetObsParallel{NamedTuple{(:image, :label), Tuple{Array{Float32, 4}, Matrix{Bool}}}, BatchView{ObsView{MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}, UnitRange{Int64}}, MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}}}})
    @ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/training.jl:168
 [11] fit!(learner::Learner, nepochs::Int64)
    @ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/training.jl:174
 [12] top-level scope
    @ ~/test-cifar/test-kyle.jl:120

And the output of learner.callbacks.cbs:

8-element Vector{FluxTraining.SafeCallback}:
 ToDevice(Flux.gpu, Flux.gpu)
 Scheduler(LearningRate)
 LogMetrics((TensorBoardBackend(/home/daruwalla/test-cifar/tblogs),))
 Metrics(Loss(), Metric(train_acc), Metric(val_acc))
 ProgressPrinter()
 MetricsPrinter()
 StopOnNaNLoss()
 Recorder()

I'll try to bisect which version introduced the change later.

lorenzoh commented Jun 3, 2022

You can also visualize the dependency graph using

using GraphPlot
gplot(learner.callbacks.graph, nodelabel = learner.callbacks.cbs, layout = stressmajorize_layout)

That, together with FluxTraining.stateaccess.(learner.callbacks.cbs), should give a better picture of where the conflict occurs.
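For example, a small sketch along these lines (just a loop over the public FluxTraining.stateaccess function; the exact printing is up to you) puts each callback's declared permissions side by side:

# print each callback's declared state access so conflicting
# read/write permissions are easy to spot
for cb in learner.callbacks.cbs
    println(nameof(typeof(cb)), " => ", FluxTraining.stateaccess(cb))
end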

lorenzoh commented Jun 10, 2022

I found the problem: since #115, Scheduler writes to learner.optimizer (Optimisers.jl optimisers are immutable, so the optimizer has to be replaced rather than mutated in place), which creates the following cyclic dependency (a standalone sketch follows the list):

  • Recorder (reads step) depends on ToGPU (which modifies step)
  • ToGPU (reads optimizer) depends on Scheduler (which modifies optimizer)
  • Scheduler (reads history) depends on Recorder (which modifies history)
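
Here is a minimal standalone sketch of that loop using Graphs.jl directly (this is not FluxTraining's internal graph-construction code, and the node numbering is only for illustration): each "X modifies s / Y reads s" pair becomes an ordering edge, and the three edges above close a cycle that topological_sort_by_dfs rejects with exactly the error from the stacktrace.

using Graphs

g = SimpleDiGraph(3)          # 1 = ToGPU, 2 = Recorder, 3 = Scheduler
add_edge!(g, 1, 2)            # ToGPU modifies step, Recorder reads it: ToGPU before Recorder
add_edge!(g, 3, 1)            # Scheduler modifies optimizer, ToGPU reads it: Scheduler before ToGPU
add_edge!(g, 2, 3)            # Recorder modifies history, Scheduler reads it: Recorder before Scheduler

topological_sort_by_dfs(g)    # ERROR: The input graph contains at least one loop.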
