[Pre-pull request] Adding jobs to an existing registry #209

Open
cfhammill opened this issue Nov 9, 2018 · 6 comments

Comments

@cfhammill
Contributor

I frequently run a set of jobs in parallel with batchMap, only to realize later that I forgot to include an interesting case in the input list. Historically I've either made a new registry (ugh) or deleted and re-run everything (more ugh). Today I really didn't want to do either, so I figured out how to add jobs to an existing registry.

Is this something you'd consider adding if I put together a PR? I suspect the answer is probably "you should be using the experiment abstraction", but I think enough people run into this problem that it would be beneficial to add. I've included rough code at the bottom for doing it manually. If it's going to be part of the package, I'd write something like batchUpdateMap, which assembles the new param list for the user.

Example code for doing it manually, in case anyone needs it in the meantime:

library(batchtools)
library(data.table)

# "path/to/registry" is a placeholder for the registry's file.dir
reg <- loadRegistry("path/to/registry", writeable = TRUE)

previous_max_id <- max(reg$status$job.id)
new_id <- previous_max_id + 1L # keep the id an integer, like the existing ids
new_params <- list(some = pars) # placeholder; get skeleton from reg$defs$job.pars[[1]]

# Add row to job definitions
# (compare str(reg$defs$job.pars) before and after to confirm the nesting matches)
reg$defs <-
  rbind(reg$defs
      , data.table(def.id = new_id
                 , job.pars = list(list(new_params))))

setkey(reg$defs, "def.id") # reset data.table key

# Add row to status table
reg$status <-
  rbind(reg$status
      , data.table(job.id = new_id, def.id = new_id, submitted = NA_real_,
                   started = NA_real_, done = NA_real_, error = NA_character_,
                   mem.used = NA_real_, resource.id = NA_integer_, batch.id = NA_character_,
                   log.file = NA_character_, job.hash = NA_character_, job.name = NA_character_,
                   key = "job.id"))

setkey(reg$status, "job.id") # reset data.table key

saveRegistry(reg) # save our updates

Obviously this can be generalized for adding more than one job.
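
A rough sketch of what that generalization might look like, written against small stand-in tables rather than a real registry so it runs on its own. The helper name append_jobs is hypothetical and not part of batchtools; the column names follow the snippet above. It relies on rbindlist(fill = TRUE) to fill the remaining status columns with NA instead of spelling each one out.

```r
library(data.table)

# Hypothetical helper (not a batchtools function): returns copies of the
# defs and status tables with one new row per element of `new_pars`.
append_jobs <- function(defs, status, new_pars) {
  # Continue numbering after the highest existing job id
  start <- if (nrow(status) > 0L) max(status$job.id) else 0L
  new_ids <- start + seq_along(new_pars)

  # One row per new job; job.pars is a list column
  new_defs <- data.table(def.id = new_ids, job.pars = new_pars)
  new_status <- data.table(job.id = new_ids, def.id = new_ids)

  # fill = TRUE sets status columns missing from new_status to NA
  defs2 <- rbindlist(list(defs, new_defs), use.names = TRUE, fill = TRUE)
  status2 <- rbindlist(list(status, new_status), use.names = TRUE, fill = TRUE)

  # rbindlist drops keys, so set them again
  setkeyv(defs2, "def.id")
  setkeyv(status2, "job.id")
  list(defs = defs2, status = status2)
}

# Stand-in tables with only a few of the real columns
defs <- data.table(def.id = 1:2,
                   job.pars = list(list(x = 1), list(x = 2)),
                   key = "def.id")
status <- data.table(job.id = 1:2, def.id = 1:2, submitted = NA_real_,
                     key = "job.id")

out <- append_jobs(defs, status, list(list(x = 3), list(x = 4)))
```

With a real registry, the returned tables would be assigned back to reg$defs and reg$status before calling saveRegistry(reg), as in the single-job version.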

@tdhock

tdhock commented Nov 9, 2018

that's funny I was doing pretty much the same hack yesterday

+1 for adding jobs to an existing registry

@mllg
Owner

mllg commented Nov 12, 2018

I can include something like this. What do you want the interface to look like? Re-running batchMap() or something like addJobs(params = list())?

@mllg mllg mentioned this issue Nov 12, 2018
@cfhammill
Contributor Author

I'd be interested in something along the lines of re-running batchMap, but with a different name e.g. batchMapAddition.

Originally I was thinking that the function should be required to use the same function as the original batchMap, but maybe that constraint isn't particularly useful.

@cfhammill
Contributor Author

Also, as mentioned in the title I'm happy to write it, but if you'd like more control over the implementation and want to write it yourself just let me know.

@tdhock

tdhock commented Nov 15, 2018

my use case for adding jobs to an existing registry involves dependencies #204 between jobs that each have different functions, so it would be useful if each job could have its own function

@mllg
Owner

mllg commented Nov 16, 2018

> my use case for adding jobs to an existing registry involves dependencies #204 between jobs that each have different functions, so it would be useful if each job could have its own function

Lifting all restrictions is probably better than only allowing more jobs to be added for the same function.
However, this requires extensive refactoring and is not easy to implement in a backward-compatible fashion. I can give it a shot, but I'm currently quite busy with other projects, so this will probably not get done before January. 😞

If one of you wants to start a PR, here are the most important steps to consider:

  • In batchMap, the tuple of user function and more args must be stored using a unique file name (using their hash), and the hash must be stored in reg$defs.
  • batchMap just needs to append jobs. If you provide a different function or different more.args, it will automatically only be used for the new jobs. Adding the possibility to "patch" a function for already defined jobs can be added later.
  • JobCollections must store the hash to identify the function to load on the slave.
  • Job$fun() and Job$pars() must read from the new locations.
  • The update routine has to adjust old registries to the new file system structure on first load.
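
To make the first and third steps more concrete, here is a minimal base-R sketch of hash-keyed storage for the function and more.args tuple. Everything in it is an assumption for illustration: store_fun, load_fun, the "fun-<hash>.rds" file layout, and the fun.hash column are hypothetical names, not the batchtools implementation, and the hash is taken over the function's deparsed source plus more.args, ignoring the closure's environment.

```r
# Hypothetical helper: save list(fun, more.args) under a file name derived
# from a hash of its content, and return that hash.
store_fun <- function(dir, fun, more.args = list()) {
  # Hash the function's source text plus more.args (simplification:
  # the function's enclosing environment is not part of the hash)
  key <- serialize(list(deparse(fun), more.args), connection = NULL)
  tmp <- tempfile(tmpdir = dir)
  writeBin(key, tmp)
  hash <- unname(tools::md5sum(tmp))
  unlink(tmp)

  # Identical tuples map to the same file, so only write it once
  path <- file.path(dir, sprintf("fun-%s.rds", hash))
  if (!file.exists(path)) {
    saveRDS(list(fun = fun, more.args = more.args), path)
  }
  hash
}

# Hypothetical helper: look the tuple up again by its hash
load_fun <- function(dir, hash) {
  readRDS(file.path(dir, sprintf("fun-%s.rds", hash)))
}

dir <- tempfile("funs")
dir.create(dir)

# Two batches of jobs with different functions
h_old <- store_fun(dir, function(x, y) x + y, list(y = 1))
h_new <- store_fun(dir, function(x, y) x * y, list(y = 1))

# Each job definition records which stored function it belongs to
defs <- data.frame(def.id = 1:3, fun.hash = c(h_old, h_old, h_new),
                   stringsAsFactors = FALSE)

# What a worker would do for job 3: load the tuple by hash and call it
job <- load_fun(dir, defs$fun.hash[3])
result <- do.call(job$fun, c(list(x = 5), job$more.args))
```

Because the file name depends only on the content, appending jobs with a new function or new more.args never touches the files that existing jobs refer to.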
