Add option to customise names of grid tasks. #647

Closed
vgn1 opened this issue Mar 10, 2022 · 8 comments
Labels: good first issue (Good for newcomers)

Comments

@vgn1

vgn1 commented Mar 10, 2022

When running tasks in a grid, a number is appended to the end of each task's product names. For example, the following pipeline.yaml file:

- source: example_task.py
  name: example-task
  product:
    data: output/output_dataframe.csv
  grid:
    input_dataframe: ['birds.csv', 'fish.csv', 'flowers.csv']

would result in 3 products: output_dataframe-1.csv, output_dataframe-2.csv and output_dataframe-3.csv.

I would like to have an option to replace -1, -2 and -3 in the filenames with, for example, -birds, -fish and -flowers.

idomic added the good first issue label on Mar 10, 2022
@edublancas
Contributor

edublancas commented Mar 10, 2022

Thanks for your feedback!

Any suggestions on how this could look? I'm thinking of having a product_suffix like this:

- source: example_task.py
  name: example-task
  product:
    data: output/output_dataframe.csv
  grid:
    input_dataframe: ['birds.csv', 'fish.csv', 'flowers.csv']
  product_suffix: '_{{input_dataframe}}'

Then, the outputs would be 'output/output_dataframe_birds.csv', etc.
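
For illustration, here is a minimal Python sketch of how a suffix template like this could be rendered. It is not Ploomber's implementation: render_suffix and add_suffix are hypothetical helpers, and the sketch assumes the extension is dropped from the grid value ('birds.csv' becomes 'birds'), which is what the expected output above implies.

```python
import re
from pathlib import PurePosixPath


def render_suffix(template, params):
    """Replace each {{name}} in template with the stem of params[name]."""
    def substitute(match):
        value = str(params[match.group(1)])
        # Assumption: drop the extension so 'birds.csv' becomes 'birds'
        return PurePosixPath(value).stem
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def add_suffix(product, suffix):
    """Insert suffix between the product's file name and its extension."""
    path = PurePosixPath(product)
    return str(path.with_name(path.stem + suffix + path.suffix))


grid = {"input_dataframe": ["birds.csv", "fish.csv", "flowers.csv"]}
products = [
    add_suffix(
        "output/output_dataframe.csv",
        render_suffix("_{{input_dataframe}}", {"input_dataframe": value}),
    )
    for value in grid["input_dataframe"]
]
print(products)
```

Stripping the extension is only one possible convention; grid values that are not file names (for example, decimal numbers) would need a different rule.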

@vgn1
Author

vgn1 commented Mar 10, 2022

Yes! Something like that would be great.

I forgot to mention this in the original issue description, but I think the suffix should also (optionally) be added to the task name, so that the task names match the product names. So maybe task_suffix or task_name_suffix would be a better name. I also think the suffix field should be able to take a separate list, like:
task_name_suffix: ['birds', 'fish', 'flowers']

Edit: Perhaps a task_name_prefix field could be added as well.

@hornste

hornste commented Mar 11, 2022

I like the product_suffix as well. It would be nice to have a task_suffix that works the same way.

That way if you had something like:

grid:
   param1: [1,2,3,4]
   param2: [a,b,c,d]

you could do product_suffix: '-{{param1}}-{{param2}}'
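
A rough Python sketch of what a two-parameter suffix would produce. It assumes the grid expands into every combination of the listed values (a Cartesian product); expand_grid and render are hypothetical helpers written for this example, not Ploomber functions.

```python
import re
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the grid's values (Cartesian product)."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def render(template, params):
    """Replace each {{name}} in template with str(params[name])."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(params[match.group(1)]),
        template,
    )


grid = {"param1": [1, 2, 3, 4], "param2": ["a", "b", "c", "d"]}
suffixes = [render("-{{param1}}-{{param2}}", params) for params in expand_grid(grid)]
print(len(suffixes))  # 16 combinations
print(suffixes[:5])   # ['-1-a', '-1-b', '-1-c', '-1-d', '-2-a']
```

Because every combination yields a distinct suffix, the generated product names would not collide as long as all grid parameters appear in the template.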

@edublancas
Contributor

Hi all,

I started working on this and wanted to bounce an idea off you.

To have a more flexible naming logic, I was thinking something like this:

- source: example_task.py
  name: example-task
  product:
    data: output/[[input_dataframe]]
  grid:
    input_dataframe: ['birds.csv', 'fish.csv', 'flowers.csv']

This would output: output/birds.csv, etc. Having the placeholders anywhere in the product path will allow you to organize products by folder and is a lot more customizable (the initial idea I had with product_suffix is kind of limited).

Note that this would use a special [[placeholder]] notation to prevent clashing with the regular {{placeholder}} notation from the env.yaml.

Thoughts?
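
To make the proposal concrete, here is a minimal Python sketch of expanding [[placeholder]] tags while leaving {{placeholder}} tags alone. It is an illustration under assumptions, not the actual implementation: expand_grid_placeholders is a hypothetical helper, and the {{root}} tag in the last call is an invented env.yaml placeholder.

```python
import re

GRID_PLACEHOLDER = re.compile(r"\[\[\s*(\w+)\s*\]\]")


def expand_grid_placeholders(text, params):
    """Replace [[name]] with params[name]; leave {{name}} untouched."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"[[{name}]] is not a grid parameter")
        return str(params[name])
    return GRID_PLACEHOLDER.sub(substitute, text)


grid = {"input_dataframe": ["birds.csv", "fish.csv", "flowers.csv"]}
for value in grid["input_dataframe"]:
    print(expand_grid_placeholders("output/[[input_dataframe]]", {"input_dataframe": value}))

# Hypothetical path mixing both notations: only the [[...]] part is expanded,
# so {{root}} stays available for a separate env.yaml rendering step
print(expand_grid_placeholders("{{root}}/[[input_dataframe]]", {"input_dataframe": "birds.csv"}))
```

Keeping the two notations distinct means the two rendering steps can run in either order without one consuming the other's tags.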

@hornste

hornste commented Mar 30, 2022

Yes, it would definitely be helpful to be able to use it anywhere in the product path, whether as a directory name or as part of the file name.

product:
   data: output/[[input_dataframe]]/one.csv
   data2: output/[[input_dataframe]].csv
   data3: output/testing.[[extension]]

So it could be a prefix, suffix, directory name, etc. And of course, you would need to be able to use more than one of them at once.

product:
   data: output/[[dir]]/[[filename]].[[extension]]
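
As a rough sketch of combining several placeholders in one path, the Python snippet below expands a product mapping for every combination of grid values. The expand_products helper and the grid values for dir, filename and extension are hypothetical, chosen only for illustration.

```python
import re
from itertools import product


def expand_products(products, grid):
    """Return one expanded product dict per combination of grid values."""
    keys = list(grid)
    expanded = []
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        expanded.append({
            name: re.sub(
                r"\[\[(\w+)\]\]",
                lambda match: str(params[match.group(1)]),
                path,
            )
            for name, path in products.items()
        })
    return expanded


# Hypothetical grid values, chosen only for illustration
grid = {"dir": ["raw", "clean"], "filename": ["birds", "fish"], "extension": ["csv"]}
products = {"data": "output/[[dir]]/[[filename]].[[extension]]"}

for item in expand_products(products, grid):
    print(item["data"])
```

With these values the loop prints four paths, one per combination of dir and filename, each placed in its own folder.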

@edublancas
Contributor

Awesome! The implementation is almost ready; this will be part of our next release!

@hornste

hornste commented Mar 31, 2022

On further thought, it would be nice to be able to use the [[param]] anywhere in the task definition (source, name, product, etc.)

That way, you could also have more meaningful task names rather than the current number suffixes.

@edublancas
Contributor

Thanks for the feedback. I created #698 to track this.

I see the use case for the task name. But what would be a use case for source? The purpose of using grid is to execute the same source script with many parameters, so parametrizing the source file might be confusing. In that case, it might be better to add multiple grid entries, which is already supported. Thoughts?

neelasha23 pushed a commit to neelasha23/ploomber that referenced this issue May 22, 2022
4 participants