Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

looper documentation; complete(?) todo list #254

Closed
17 tasks done
stolarczyk opened this issue Apr 29, 2020 · 25 comments
Closed
17 tasks done

looper documentation; complete(?) todo list #254

stolarczyk opened this issue Apr 29, 2020 · 25 comments

Comments

@stolarczyk
Copy link
Member

stolarczyk commented Apr 29, 2020

looper has changed a lot. This is a list of new features that we will need to describe in the documentation before the release:

  • --settings arg can be used to point to YAML file with template variables.
  • compute settings priority order:
  1. CLI (--settings YAML file specification < --compute itemized specification)
  2. config
  3. pipeline interface
  4. divcfg
  • .looper.yaml in the project directory can be used to pre-set CLI options
  • arguments priotity order:
  1. CLI
  2. looper dotfile
  3. argparser defaults
  • looper init pep.yaml can be used to produce a ready to use .looper.yaml file
  • looper runp, project_pipeline concept, how a set of pipelines to run on the project level is determined: pipeline interface specification #244 (comment)
  • --compute can be used to specify arguments other than resources (as previously)
  • --command-extra is the only way to append strings to the pipeline command
  • command_template section in the pipeline interface is a Jinja2 template; variable namespaces availability
  • pipeline input and output schemas
  • size_dependent_variables moved from pipleline interface and is a TSV file that pipeline interface points to
  • dynamic_variables_command_template is a Jinja2 template to construct a system command run in a subprocess. It has to return a JSON object, which is then used to populate submission templates, but see "compute settings priority order"
  • sample YAML representation destination can be set in the pipeline interface
  • use of eido for PEP validation
  • required and all input files are set in the pipeline input schema with required_inputs_attr and all_inputs_attr, respectively
  • pipeline interfaces can be provided on CLI.
  • update changelog

already opened issues:

@stolarczyk stolarczyk added this to the 1.1 milestone Apr 29, 2020
@stolarczyk stolarczyk mentioned this issue Apr 29, 2020
14 tasks
@nsheff
Copy link
Contributor

nsheff commented May 4, 2020

can you clarify compute settings priority order? Shouldn't it be:

  1. Looper CLI (--compute)
  2. PEP config, project.looper.compute section
  3. Pipeline interface, pipeline.compute section
  4. Activated divvy compute package (--settings or --package)

@stolarczyk
Copy link
Member Author

your order reflects current implementation, except for --settings.

I thought that --settings is a YAML version of --compute. If that statement is correct, it does not make sense to me to make it not overwrite settings specified in the config files (2. and 3.).

@nsheff
Copy link
Contributor

nsheff commented May 4, 2020

Ok, you could think of it that way. the other way to think of it is that --settings and --package are both divvy modifiers; so we let divvy handle them however it does. We then do what we're going to do after that.

@nsheff
Copy link
Contributor

nsheff commented May 4, 2020

but wait, I guess --compute is also a divvy modifier, isn't it? So, yeah.

@nsheff
Copy link
Contributor

nsheff commented May 4, 2020

the question is: is settings more like package or more like compute. I guess I thought of it as similar to package.

@stolarczyk
Copy link
Member Author

I implemented this based on this sentence from #245 :

-s SETTINGS, --settings SETTINGS -- YAML file with job settings to populate the template.

so it was just my interpretation. But I think I still don't understand your intent; why would you to need to specify --package in a YAML file? It does not spare any typing since you'd need to provide a path to the file anyway.

Is this the YAML file you'd use?

package: local

I thought we want:

mem: 10000
cores: 2
jobname: test

@nsheff
Copy link
Contributor

nsheff commented May 4, 2020

the second example is correct.

the settings is just between package and compute. It's legitimate to put it in either place, I think. I just don't think I would use settings for on-the-fly stuff... I'd use it more like an ad-hoc package. that's why I put it lower priority. you think of it as on-the-fly, so lumped it with compute for priority. it can be used in either way.

I think the way you did it is fine.

nsheff added a commit that referenced this issue May 4, 2020
@nsheff
Copy link
Contributor

nsheff commented May 5, 2020

For the input schema section --

the input schema format is based on the extended PEP JSON-schema validation framework, but adds some additional looper-specific capabilities. The available extended sections are:

which is true of input schemas:

  1. Looper uses exactly eido's functionality for validating input schema
  2. the input schema extends eido's functionality

Then, I have the same question for output schemas -- but this, I'm pretty sure all the extra stuff extends outside the scope of eido, right? Because it's not used for validation, but for summarizer stuff.

@stolarczyk
Copy link
Member Author

2 is true

eido is not aware of required_intput_attrs and input_attrs

Then, I have the same question for output schemas -- but this, I'm pretty sure all the extra stuff extends outside the scope of eido, right? Because it's not used for validation, but for summarizer stuff.

yes, eido is not used at all with respect to output schema

@nsheff
Copy link
Contributor

nsheff commented May 5, 2020

eido is not aware of required_intput_attrs and input_attrs

Wait -- that's not what we said here:

@nsheff
Copy link
Contributor

nsheff commented May 5, 2020

Ok, I guess the question I have is this: Does it makes sense to move required_intput_attrs and input_attrs into eido? That's how I thought it was implemented; it makes sense to me that way.

If the other way makes more sense, why?

@stolarczyk
Copy link
Member Author

that's not what we said here:

I confused eido functionality with general PEP schema functionality, sorry..

inputs validation is currently a Sample class method, so it is implemented in peppy. This location is relic of the previous input handling with was a looper.Sample method. Even though the looper.Sample no longer exists and the method itself has been completely rewritten it stayed a Sample method. Maybe because it sticks the determined input sizes as a Sample attribute..

I agree, it makes much more sense to move it to eido.

@nsheff
Copy link
Contributor

nsheff commented May 8, 2020

I agree, it makes much more sense to move it to eido.

Ok I've updated the docs to reflect this; looper now just refers to the eido docs, and the required_intput_attrs and input_attrs stuff is all described in the eido docs. I agree it makes more sense this way.

@stolarczyk
Copy link
Member Author

great, I'll move the input validation to eido today

@nsheff
Copy link
Contributor

nsheff commented May 11, 2020

if you have a moment, can you look through the latest dev version of the docs and see if we're missing anything?

@stolarczyk
Copy link
Member Author

found an inconsistency in the docs in https://github.com/pepkit/looper/blob/dev/docs/defining-a-project.md#2-add-a-looper-section-to-your-pep related to #253

in the current implementationcommand_extra key in Project.looper will not be picked up by looper unless a dotfile is used. According to your suggestion the dotfile is a "project config" importing the main project config. Subsequently, the values from the dofile (enriched with the Project.looper section data) are used to pre set the CLI options.

The rest of the keys mentioned in the docs are valid, because I made the system backwards compatible. So for example the following scenario works: no dotfile, but Project.output_dir set in the config

@stolarczyk
Copy link
Member Author

I guess we could make it possible, but this would require some retooling

@nsheff
Copy link
Contributor

nsheff commented May 20, 2020

@stolarczyk do you consider the docs complete now? can close this issue?

@stolarczyk
Copy link
Member Author

stolarczyk commented May 20, 2020

after recent round of changes related to looper section in the PEP the pipeline interaface selection strategy has changed. I don't know what is the current status of the docs regarding that.

The current implementation reflects what I wrote in the last comment in #244 minus the "behind the scenes" sample-based project pipeline interface selection.

Additionally, the command_template namespaces might be out of date

@stolarczyk
Copy link
Member Author

I can read through the docs tonight and see if they reflect the reality ;)

@nsheff
Copy link
Contributor

nsheff commented May 20, 2020

I think I finished updating the pipeline interface docs...

@stolarczyk
Copy link
Member Author

stolarczyk commented May 21, 2020

I've read through the docs and made some changes, I think they are ready. But after mkdocs serve I get:

WARNING -  A relative path to 'hello-world.md' is included in the 'nav' configuration, which is not found in the documentation files 
WARNING -  Documentation file 'README.md' contains a link to 'hello-world.md' which is not found in the documentation files.

The hello world jupyter notebook is missing so the markdown file is no longer created, but linked in 2 places. We lost the entire docs_jupyter directory from the repo. Was that intentional?

@nsheff
Copy link
Contributor

nsheff commented May 21, 2020

Was that intentional?

No, let me look into that.

@nsheff
Copy link
Contributor

nsheff commented May 21, 2020

I deleted them in this commit somehow: 27ca124

I got it back with:

git checkout b99edb46a056c7021b01edeae3b1aa0c20f2bcf9 docs_jupyter/hello-world.ipynb
git checkout b99edb46a056c7021b01edeae3b1aa0c20f2bcf9 docs_jupyter/build/.gitignore

Now the docs build again.

@nsheff
Copy link
Contributor

nsheff commented May 21, 2020

I'll declare these docs complete enough, then, and close this issue.

@nsheff nsheff closed this as completed May 21, 2020
nsheff pushed a commit to pepkit/eido that referenced this issue Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants