Scheduler causes cycle in execution DAG? #122
Since what version? Can you give a stacktrace and a dump of the callbacks?
Here is the trace:

ERROR: The input graph contains at least one loop.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] topological_sort_by_dfs(::Type{Graphs.IsDirected{Graphs.SimpleGraphs.SimpleDiGraph{Int64}}}, g::Graphs.SimpleGraphs.SimpleDiGraph{Int64})
@ Graphs ~/.julia/packages/Graphs/zrMoC/src/traversals/dfs.jl:65
[3] topological_sort_by_dfs(g::Graphs.SimpleGraphs.SimpleDiGraph{Int64})
@ Graphs ~/.julia/packages/SimpleTraits/l1ZsK/src/SimpleTraits.jl:331
[4] (::FluxTraining.var"#16#17"{Learner})()
@ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/callbacks/execution.jl:9
[5] ignore
@ ~/.julia/packages/Zygote/DkIUK/src/lib/utils.jl:25 [inlined]
[6] handle(runner::FluxTraining.LinearRunner, event::FluxTraining.Events.EpochBegin, phase::TrainingPhase, learner::Learner)
@ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/callbacks/execution.jl:8
[7] (::FluxTraining.var"#handlefn#77"{Learner, TrainingPhase})(e::FluxTraining.Events.EpochBegin)
@ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/training.jl:102
[8] runepoch(epochfn::FluxTraining.var"#67#68"{Learner, TrainingPhase, DataLoaders.BufferGetObsParallel{NamedTuple{(:image, :label), Tuple{Array{Float32, 4}, Matrix{Bool}}}, BatchView{ObsView{MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{Pad{4}, Crop{2, DataAugmentation.FromRandom}, Rotate{Distributions.Uniform{Float64}}, Crop{2, DataAugmentation.FromCenter}, DataAugmentation.OneOfProjective{DataAugmentation.ProjectiveTransform, Distributions.Categorical{Float64, Vector{Float64}}}, ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}, UnitRange{Int64}}, MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{Pad{4}, Crop{2, DataAugmentation.FromRandom}, Rotate{Distributions.Uniform{Float64}}, Crop{2, DataAugmentation.FromCenter}, DataAugmentation.OneOfProjective{DataAugmentation.ProjectiveTransform, Distributions.Categorical{Float64, Vector{Float64}}}, ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}}}}, learner::Learner, phase::TrainingPhase)
@ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/training.jl:104
[9] epoch!
@ ~/.julia/packages/FluxTraining/bday3/src/training.jl:22 [inlined]
[10] fit!(learner::Learner, nepochs::Int64, ::Tuple{DataLoaders.BufferGetObsParallel{NamedTuple{(:image, :label), Tuple{Array{Float32, 4}, Matrix{Bool}}}, BatchView{ObsView{MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{Pad{4}, Crop{2, DataAugmentation.FromRandom}, Rotate{Distributions.Uniform{Float64}}, Crop{2, DataAugmentation.FromCenter}, DataAugmentation.OneOfProjective{DataAugmentation.ProjectiveTransform, Distributions.Categorical{Float64, Vector{Float64}}}, ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}, UnitRange{Int64}}, MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{Pad{4}, Crop{2, DataAugmentation.FromRandom}, Rotate{Distributions.Uniform{Float64}}, Crop{2, DataAugmentation.FromCenter}, DataAugmentation.OneOfProjective{DataAugmentation.ProjectiveTransform, Distributions.Categorical{Float64, Vector{Float64}}}, ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}}}, DataLoaders.BufferGetObsParallel{NamedTuple{(:image, :label), Tuple{Array{Float32, 4}, Matrix{Bool}}}, BatchView{ObsView{MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, 
Vector{Int64}}, false}}}}, UnitRange{Int64}}, MLUtils.MappedData{Base.Fix1{typeof(apply_augmenation), DataAugmentation.Sequence{Tuple{ImageToTensor{Float32}, Normalize{3}}}}, NamedTuple{(:image, :label), Tuple{ObsView{MLUtils.MappedData{typeof(DataAugmentation.tensortoimage), Array{Float32, 4}}, Vector{Int64}}, SubArray{Bool, 2, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}}}}}}})
@ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/training.jl:168
[11] fit!(learner::Learner, nepochs::Int64)
@ FluxTraining ~/.julia/packages/FluxTraining/bday3/src/training.jl:174
[12] top-level scope
@ ~/test-cifar/test-kyle.jl:120

And here is the output of the callback dump, an 8-element Vector{FluxTraining.SafeCallback}:
ToDevice(Flux.gpu, Flux.gpu)
Scheduler(LearningRate)
LogMetrics((TensorBoardBackend(/home/daruwalla/test-cifar/tblogs),))
Metrics(Loss(), Metric(train_acc), Metric(val_acc))
ProgressPrinter()
MetricsPrinter()
StopOnNaNLoss()
Recorder()

I'll try to bisect which version later.
You can also visualize the dependency graph:

using GraphPlot
gplot(learner.callbacks.graph, nodelabel = learner.callbacks.cbs, layout = stressmajorize_layout)
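For background on the error itself: Graphs.jl's topological_sort_by_dfs walks the callback dependency graph depth-first and raises "The input graph contains at least one loop." the moment it revisits a vertex that is still on the current DFS path. The sketch below (in Python, not the actual Graphs.jl or FluxTraining source) illustrates that mechanism; the function name and graph encoding are mine, chosen only for the illustration.

```python
# Sketch of DFS-based topological sorting with cycle detection, analogous to
# what Graphs.jl's topological_sort_by_dfs does on the callback graph.
# WHITE = unvisited, GRAY = on the current DFS path, BLACK = finished.
# Reaching a GRAY vertex means we found a back edge, i.e. a cycle.

def topological_sort(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n
    order = []

    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:
                # Back edge: v is an ancestor on the current path -> cycle.
                raise ValueError("The input graph contains at least one loop.")
            if color[v] == WHITE:
                visit(v)
        color[u] = BLACK
        order.append(u)  # postorder

    for u in range(n):
        if color[u] == WHITE:
            visit(u)
    return order[::-1]  # reverse postorder = topological order

# A proper DAG sorts fine:
print(topological_sort(3, [(0, 1), (1, 2)]))  # [0, 1, 2]

# A mutual dependency between two callbacks cannot be ordered:
try:
    topological_sort(2, [(0, 1), (1, 0)])
except ValueError as e:
    print(e)  # The input graph contains at least one loop.
```

So if adding the Scheduler introduces a dependency edge back into a callback that already (transitively) depends on it, the sort fails exactly like the stacktrace shows, and the gplot visualization above should make the offending edge visible.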
I found the problem: it has been present since #115.
I have the following script:
Any time I add schcb to the list of callbacks passed to the Learner, I get an error from FluxTraining that there is a cycle in the DAG. This did not happen in previous versions of FluxTraining (though I haven't been able to bisect the change yet).