Integration with crew #1044
Conversation
With the enhancement in this PR, `targets` can use `crew` as a high-performance computing backend. Additionally, if you supply a controller group, …
@shikokuchuo, I am trying to integrate `crew` into `targets`, and the checks hang on GitHub Actions. I tried to isolate the problem using https://github.com/ropensci/targets/tree/753-debug. In a simple pipeline of 100 independent targets, execution quickly reaches a point where the `mirai` tasks stay unresolved. Is there something about GitHub Actions that could cause this? Full session below:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(targets)
> pipeline <- targets:::pipeline_init(
+ lapply(
+ seq_len(100),
+ function(index) {
+ tar_target_raw(
+ name = paste0("task_", index),
+ command = quote(stats::rnorm(1000))
+ )
+ }
+ )
+ )
> targets:::tar_runtime$set_fun("tar_make")
> controller <- crew::crew_controller_local()
> out <- targets:::crew_init(
+ pipeline = pipeline,
+ controller = controller,
+ reporter = "timestamp"
+ )
> out$run()
• +0000 UTC 2023-04-10 02:09 38.19 start target task_60
• +0000 UTC 2023-04-10 02:09 38.37 start target task_61
• +0000 UTC 2023-04-10 02:09 38.57 start target task_62
• +0000 UTC 2023-04-10 02:09 38.66 start target task_63
• +0000 UTC 2023-04-10 02:09 38.75 start target task_64
• +0000 UTC 2023-04-10 02:09 38.85 start target task_65
• +0000 UTC 2023-04-10 02:09 38.95 start target task_66
• +0000 UTC 2023-04-10 02:09 39.80 start target task_67
• +0000 UTC 2023-04-10 02:09 39.90 start target task_68
• +0000 UTC 2023-04-10 02:09 39.99 start target task_69
• +0000 UTC 2023-04-10 02:09 40.08 start target task_1
• +0000 UTC 2023-04-10 02:09 40.17 start target task_2
name
1 task_60
command
1 targets::target_run_worker(target = target, envir = envir, path_store = path_store, \n fun = fun, options = options, envvars = envvars)
result seconds seed error traceback warnings
1 done 0.024 857953997 <NA> <NA> <NA>
launcher worker
1 9aaadb8ba81c51e5122bb6ff503bee81f6909ed1 1
instance
1 53db515f68c47be2cb0291d02bc0ca70716397b4
• +0000 UTC 2023-04-10 02:09 40.60 built target task_60 [0.009 seconds]
• +0000 UTC 2023-04-10 02:09 40.60 start target task_3
• +0000 UTC 2023-04-10 02:09 40.70 start target task_4
• +0000 UTC 2023-04-10 02:09 40.80 start target task_5
• +0000 UTC 2023-04-10 02:09 41.01 start target task_6
name
1 task_61
command
1 targets::target_run_worker(target = target, envir = envir, path_store = path_store, \n fun = fun, options = options, envvars = envvars)
result seconds seed error traceback warnings
1 done 0.002 244508592 <NA> <NA> <NA>
launcher worker
1 9aaadb8ba81c51e5122bb6ff503bee81f6909ed1 1
instance
1 53db515f68c47be2cb0291d02bc0ca70716397b4
• +0000 UTC 2023-04-10 02:09 41.15 built target task_61 [0 seconds]
• +0000 UTC 2023-04-10 02:09 41.15 start target task_7
• +0000 UTC 2023-04-10 02:09 41.24 start target task_50
• +0000 UTC 2023-04-10 02:09 41.33 start target task_40
• +0000 UTC 2023-04-10 02:09 41.42 start target task_8
• +0000 UTC 2023-04-10 02:09 41.51 start target task_51
• +0000 UTC 2023-04-10 02:09 41.61 start target task_41
• +0000 UTC 2023-04-10 02:09 41.69 start target task_9
• +0000 UTC 2023-04-10 02:09 41.79 start target task_52
• +0000 UTC 2023-04-10 02:09 41.88 start target task_42
• +0000 UTC 2023-04-10 02:09 41.97 start target task_53
• +0000 UTC 2023-04-10 02:09 42.06 start target task_43
• +0000 UTC 2023-04-10 02:09 42.15 start target task_54
• +0000 UTC 2023-04-10 02:09 42.24 start target task_44
• +0000 UTC 2023-04-10 02:09 42.33 start target task_55
• +0000 UTC 2023-04-10 02:09 42.42 start target task_45
• +0000 UTC 2023-04-10 02:09 42.51 start target task_56
• +0000 UTC 2023-04-10 02:09 42.60 start target task_46
• +0000 UTC 2023-04-10 02:09 42.70 start target task_57
• +0000 UTC 2023-04-10 02:09 42.79 start target task_47
• +0000 UTC 2023-04-10 02:09 42.88 start target task_58
• +0000 UTC 2023-04-10 02:09 42.97 start target task_48
• +0000 UTC 2023-04-10 02:09 43.06 start target task_59
• +0000 UTC 2023-04-10 02:09 43.15 start target task_49
• +0000 UTC 2023-04-10 02:09 43.25 start target task_30
• +0000 UTC 2023-04-10 02:09 43.34 start target task_31
• +0000 UTC 2023-04-10 02:09 43.43 start target task_32
• +0000 UTC 2023-04-10 02:09 43.53 start target task_33
• +0000 UTC 2023-04-10 02:09 43.62 start target task_34
• +0000 UTC 2023-04-10 02:09 43.72 start target task_35
• +0000 UTC 2023-04-10 02:09 43.81 start target task_36
• +0000 UTC 2023-04-10 02:09 43.90 start target task_37
• +0000 UTC 2023-04-10 02:09 43.99 start target task_38
• +0000 UTC 2023-04-10 02:09 44.09 start target task_39
• +0000 UTC 2023-04-10 02:09 44.18 start target task_100
• +0000 UTC 2023-04-10 02:09 44.27 start target task_20
• +0000 UTC 2023-04-10 02:09 44.39 start target task_21
• +0000 UTC 2023-04-10 02:09 44.49 start target task_22
• +0000 UTC 2023-04-10 02:09 44.58 start target task_23
• +0000 UTC 2023-04-10 02:09 44.67 start target task_24
• +0000 UTC 2023-04-10 02:09 44.77 start target task_25
• +0000 UTC 2023-04-10 02:09 44.86 start target task_26
• +0000 UTC 2023-04-10 02:09 44.96 start target task_27
• +0000 UTC 2023-04-10 02:09 45.05 start target task_28
• +0000 UTC 2023-04-10 02:09 45.15 start target task_29
• +0000 UTC 2023-04-10 02:09 45.24 start target task_10
• +0000 UTC 2023-04-10 02:09 45.33 start target task_11
• +0000 UTC 2023-04-10 02:09 45.43 start target task_12
• +0000 UTC 2023-04-10 02:09 45.52 start target task_13
• +0000 UTC 2023-04-10 02:09 45.61 start target task_14
• +0000 UTC 2023-04-10 02:09 45.70 start target task_15
• +0000 UTC 2023-04-10 02:09 45.80 start target task_16
• +0000 UTC 2023-04-10 02:09 45.90 start target task_17
• +0000 UTC 2023-04-10 02:09 45.99 start target task_90
• +0000 UTC 2023-04-10 02:09 46.09 start target task_18
• +0000 UTC 2023-04-10 02:09 46.19 start target task_91
• +0000 UTC 2023-04-10 02:09 46.28 start target task_19
• +0000 UTC 2023-04-10 02:09 46.37 start target task_92
• +0000 UTC 2023-04-10 02:09 46.47 start target task_93
• +0000 UTC 2023-04-10 02:09 46.57 start target task_94
• +0000 UTC 2023-04-10 02:09 46.66 start target task_95
• +0000 UTC 2023-04-10 02:09 46.75 start target task_96
• +0000 UTC 2023-04-10 02:09 46.85 start target task_97
• +0000 UTC 2023-04-10 02:09 46.94 start target task_98
• +0000 UTC 2023-04-10 02:09 47.03 start target task_99
• +0000 UTC 2023-04-10 02:09 47.13 start target task_80
• +0000 UTC 2023-04-10 02:09 47.22 start target task_81
• +0000 UTC 2023-04-10 02:09 47.32 start target task_82
• +0000 UTC 2023-04-10 02:09 47.41 start target task_83
• +0000 UTC 2023-04-10 02:09 47.51 start target task_84
• +0000 UTC 2023-04-10 02:09 47.60 start target task_85
• +0000 UTC 2023-04-10 02:09 47.70 start target task_86
• +0000 UTC 2023-04-10 02:09 47.79 start target task_87
• +0000 UTC 2023-04-10 02:09 47.89 start target task_88
• +0000 UTC 2023-04-10 02:09 47.98 start target task_89
• +0000 UTC 2023-04-10 02:09 48.08 start target task_70
• +0000 UTC 2023-04-10 02:09 48.18 start target task_71
• +0000 UTC 2023-04-10 02:09 48.27 start target task_72
• +0000 UTC 2023-04-10 02:09 48.37 start target task_73
• +0000 UTC 2023-04-10 02:09 48.47 start target task_74
• +0000 UTC 2023-04-10 02:09 48.57 start target task_75
• +0000 UTC 2023-04-10 02:09 48.66 start target task_76
• +0000 UTC 2023-04-10 02:09 48.76 start target task_77
• +0000 UTC 2023-04-10 02:09 48.86 start target task_78
• +0000 UTC 2023-04-10 02:09 48.96 start target task_79
Error in Exception(...) :
reached elapsed time limit [cpu=300s, elapsed=300s]
$connections
[1] 1
$daemons
online instance
ws://10.1.0.128:44051/53db515f68c47be2cb0291d02bc0ca70716397b4 1 1
assigned
ws://10.1.0.128:44051/53db515f68c47be2cb0291d02bc0ca70716397b4 2
complete
ws://10.1.0.128:44051/53db515f68c47be2cb0291d02bc0ca70716397b4 2
[1] 9923
[1] "dispatcher is running"
[1] "queue"
[1] 98
[1] "results"
[1] 0
[1] "pending tasks"
'unresolved' logi NA
(… 97 more identical 'unresolved' logi NA lines omitted, one per stuck task …)
[1] "finished tasks"
• +0000 UTC 2023-04-10 02:14 43.42 end pipeline [5.13 minutes]
> unlink("_targets", recursive = TRUE)
> targets:::tar_runtime$unset_fun()
> controller$terminate()
>
> proc.time()
user system elapsed
255.944  19.628 308.907
This is likely new - I have not seen GitHub Actions hanging on Ubuntu before for `mirai`. So many unresolved tasks at once …
It must be because GHA is using `rcmdcheck` rather than `R CMD check`. If you can replicate locally using `rcmdcheck`, …
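For anyone following along, the check can be run locally through the `rcmdcheck` package to mimic what GitHub Actions does (a minimal invocation; the exact arguments the workflow uses are an assumption):

# Run R CMD check via the rcmdcheck package, as the GHA workflow does
rcmdcheck::rcmdcheck(path = ".", args = c("--no-manual", "--as-cran"))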
Thanks for the ideas. At https://github.com/ropensci/targets/actions/runs/4657411118, I reproduced the same problem with raw `R CMD check`.
This is useful. There are 3 assigned, 2 completed. This means the computation is stuck at the server (or it has crashed or ended after 2 tasks - are you able to poll it independently, e.g. using the `processx` handle?). As there is only one server, it is expected for …
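The assigned/complete counters come from `mirai`'s daemons status, which can be polled directly. A sketch against the `mirai` API of the time, where a no-argument `daemons()` call returned the status printed in the log earlier (treat the exact call signatures as assumptions):

library(mirai)
daemons(n = 1)        # start one local daemon with a dispatcher
status <- daemons()   # no-argument call returns the current status
status$connections    # number of dispatcher connections
status$daemons        # per-URL online / instance / assigned / complete
daemons(0)            # shut everything down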
Yes, I polled the processx handle. When the pipeline reaches my manual timeout of 5 minutes, the server process is alive and sleeping.
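Polling the server process from R can be sketched as follows, assuming `handle` is the `processx::process` object for the server (the variable name is illustrative):

handle$is_alive()                               # TRUE while the process exists
ps::ps_status(ps::ps_handle(handle$get_pid()))  # e.g. "sleeping" on Linux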
Interesting - 'sleeping' means sleeping on a wait or poll rather than computing anything. Are you able to print the results of the 2 successful cases to check they are what we expect? The most likely hypothesis at the moment is that it somehow gets stuck sending back the evaluation result.
The other easy thing that can be done is to manually set the URL to, say, "abstract://mirai" - if this works, then it could be some network issue specific to GitHub Actions. But then you say it works outside of `R CMD check`, so I don't think it is this.
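For reference, switching to an abstract socket instead of TCP might look like the sketch below, written against the `mirai` API of the time (argument names and the round-trip check are assumptions, not what was actually run):

library(mirai)
daemons(n = 1, url = "abstract://mirai")  # abstract sockets avoid the network stack (Linux)
m <- mirai(1 + 1)                         # send a trivial task
call_mirai(m)$data                        # should resolve if the transport is healthy
daemons(0)                                # clean up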
I will try. The last run completed no tasks within 5 minutes.
Maybe this will be possible, although I am currently trying a spoofed version of the pipeline where …
The spoofed workflow at https://github.com/ropensci/targets/actions/runs/4658550150/jobs/8244363706 completed without the original problem. So it must be something to do with the work that happens inside a task rather than the way tasks are submitted. Meanwhile, I have been debugging on Windows, and I figured out that … Are there sensitive environment variables in …?
Neither. … The only thing …
Just in case, I thought I should mention that `mirai` now supports working directly with language objects.
Yes, thanks, I did see that. It will definitely be helpful to work directly with language objects. Sorry I have not responded to your original comment, I have been stuck on this bug for the last few days.
No need to explain - the integration with `targets` …
Some progress: I was finally able to reproduce this using `mirai` alone.
After isolating the problem in shikokuchuo/mirai#53, I think it may be reasonable in the meantime to skip most of the affected tests on GitHub Actions.
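One conventional way to do that skipping, sketched with testthat's built-in helpers (this is a sketch, not necessarily what was merged):

library(testthat)
test_that("crew pipeline runs end to end", {
  skip_on_ci()  # skips on GitHub Actions and other CI services via the CI env var
  # ... run the full crew-backed pipeline here ...
})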
Prework
Related GitHub issues and pull requests
Summary
This PR integrates `targets` with `crew`. `crew` already has machinery for auto-scaling and heterogeneous workers, and through its launcher plugin interface, its ecosystem will eventually support distributed computing from SLURM to AWS Batch. `crew` will become the main high-performance computing backend in `targets` when it is more mature.
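From a user's perspective, enabling `crew` in a pipeline might eventually look like the sketch below (this reflects the API as later released; at the time of this PR it should be read as an assumption):

# _targets.R
library(targets)
tar_option_set(controller = crew::crew_controller_local(workers = 2))
list(
  tar_target(task, stats::rnorm(1000))
)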