Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduced User-Defined outputs_path for Pipeline Execution #402

Conversation

MinuraPunchihewa
Copy link
Collaborator

This Pull Request aims to allow users to define a custom path to store the results of their Pipeline executions. This is done by using the outputs_path parameter in the run() method of the Pipeline class.

Given below is an example of how this works,

from naas.pipeline.pipeline import (Pipeline, DummyStep,End)

pipeline = Pipeline()

step1 = DummyStep("Notebook 1")
step2 = DummyStep("Notebook 2")
step3 = DummyStep("Notebook 3")

pipeline >> step1 >> step2 >> step3 >> End()
pipeline.run(outputs_path='outputs')

As shown here, the outputs directory is created as specified above and the results of the pipeline executions are stored in separate sub-directories within it,
image

The contents of the outputs directory,
image

If a path is not specified, the results will be stored within the pipeline_executions directory, by default.

Note: This directory is also visible in the above screenshot, because I have the run the Pipeline twice; once without specifying the outputs_path and a second time by passing the argument.

MinuraPunchihewa and others added 3 commits June 16, 2023 23:20
@jravenel
Copy link

Hey @MinuraPunchihewa thanks!👌 from what I understand from you screenshots, if I specify outputs_paths then the pipeline exécutions folder is not created anymore and go straight to the location specified. Is that correct ?

@MinuraPunchihewa
Copy link
Collaborator Author

Hey @jravenel,
Yes, that is correct.

@jravenel
Copy link

Ok. I think @FlorentLvr would argue it will be best to keep the name of the folder pipeline_executions every time. Otherwise things can get messy, or give us the job to create a new folder, name it, etc when we don't really need that effort.

@MinuraPunchihewa
Copy link
Collaborator Author

Oh, I don't quite understand. I worked on this to resolve this issue,
#373

@@ -39,7 +39,7 @@ class ExecutionContext:
output_dir: str = None
output_path: str = None

def __init__(self, output_dir: str = "pipeline_executions"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MinuraPunchihewa, I think that what @jravenel refers to is to keep this default value when output_dir is not provided, that way we will keep the local pipeline_executions directory.

And then we can chose to completely overwrite it 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dr0p42 This default value is available in the run() method of the the Pipeline class.

@@ -428,7 +428,7 @@ def run(
self,
style: Literal["diagram", "progess"] = "diagram",
monitor: bool = True,
outputs_path="",
outputs_path="pipeline_executions",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Dr0p42,
This is what I meant. If the parameter is not passed, the results will be stored in the pipeline_executions directory.

@FlorentLvr
Copy link
Contributor

Ok. I think @FlorentLvr would argue it will be best to keep the name of the folder pipeline_executions every time. Otherwise things can get messy, or give us the job to create a new folder, name it, etc when we don't really need that effort.

I don't understand what you mean @jravenel! This PR seems to resolve the issue i had 🙈

@FlorentLvr
Copy link
Contributor

@MinuraPunchihewa,
Let's summarize to ensure everything functions as intended:

  • If no value is specified for "output_dir," the pipeline executions will be stored in a folder named "pipeline_executions," located at the position where the pipeline notebook is invoked. This maintains the same behavior as before.
  • If a specific path is provided for "output_dir," such as "/home/ftp/data-product/outputs/pipeline_executions," all executions will be stored in that specified location when the pipeline is executed.
    Is my understanding accurate?

@jravenel
Copy link

jravenel commented Jun 19, 2023

The question I was raising is:
Do we want to have:
A/ all the pipeline executions in any folder (it will create a lot of folders in the directory)
B/ specify a folder (outputs) and keep a pipeline_executions folder in the target folder generated automatically

@MinuraPunchihewa @FlorentLvr

@MinuraPunchihewa
Copy link
Collaborator Author

@MinuraPunchihewa, Let's summarize to ensure everything functions as intended:

  • If no value is specified for "output_dir," the pipeline executions will be stored in a folder named "pipeline_executions," located at the position where the pipeline notebook is invoked. This maintains the same behavior as before.
  • If a specific path is provided for "output_dir," such as "/home/ftp/data-product/outputs/pipeline_executions," all executions will be stored in that specified location when the pipeline is executed.
    Is my understanding accurate?

@FlorentLvr Yes, you are right.

@jravenel
Copy link

And my point is why do I need to specify /pipeline_executions in the output dir? Can't we have pipeline_executions created automatically ?

@FlorentLvr
Copy link
Contributor

And my point is why do I need to specify /pipeline_executions in the output dir? Can't we have pipeline_executions created automatically ?

@jravenel, you don't. It is default parameter in the function. @MinuraPunchihewa

@FlorentLvr
Copy link
Contributor

@MinuraPunchihewa, Let's summarize to ensure everything functions as intended:

  • If no value is specified for "output_dir," the pipeline executions will be stored in a folder named "pipeline_executions," located at the position where the pipeline notebook is invoked. This maintains the same behavior as before.
  • If a specific path is provided for "output_dir," such as "/home/ftp/data-product/outputs/pipeline_executions," all executions will be stored in that specified location when the pipeline is executed.
    Is my understanding accurate?

@FlorentLvr Yes, you are right.

@MinuraPunchihewa, sounds good to me! You can merge the PR once all checks are valid :)

@MinuraPunchihewa
Copy link
Collaborator Author

Hey @Dr0p42, @FlorentLvr,
Do you know why the checks are failing?

@jravenel
Copy link

Hey @Dr0p42, @FlorentLvr, Do you know why the checks are failing?

@MinuraPunchihewa it's coming from the linter, go to GitHub Action you'll be able to have infos.

Screenshot 2023-06-20 at 16 38 16

@MinuraPunchihewa
Copy link
Collaborator Author

Hey @jravenel,
Yes, I saw that, but it's not clear to me what the issue is.

@jravenel
Copy link

Calling Mr @Dr0p42 for help here 🙏

@sonarcloud
Copy link

sonarcloud bot commented Jun 29, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@Dr0p42 Dr0p42 changed the title Introduced User-Defined outputs_path for Pipeline Execution feat: Introduced User-Defined outputs_path for Pipeline Execution Jun 29, 2023
@Dr0p42 Dr0p42 merged commit bf941f9 into main Jun 29, 2023
@Dr0p42 Dr0p42 deleted the 373-naaspipeline-setup-the-output_dir-as-parameters-to-get-the-pipeline_executions branch June 29, 2023 07:44
@Dr0p42
Copy link
Collaborator

Dr0p42 commented Jun 29, 2023

Ok @MinuraPunchihewa @jravenel this is finally fixed :)

@jravenel
Copy link

jravenel commented Jun 29, 2023

;) thanks @Dr0p42. We will be implementing it today or tomorrow.

Dr0p42 added a commit that referenced this pull request Jul 20, 2023
* fix: Remove templates tab

* fix: Remove usage and credits

* fix: Remove links at the top and update the account button

* feat: Introduced User-Defined outputs_path for Pipeline Execution (#402)

* used the output_path paremeter in creating the ExecutionContext within the run() method of Pipeline

* fix: Local dev

* updated the default value of the outputs_path parameter in the run() method of Pipeline and removed the default value from ExecutionContext

* fix: Applied Black

* fix: apply black

* ci(tests): Trying to fix tzlocal

---------

Co-authored-by: Maxime Jublou <jubloum@gmail.com>
Co-authored-by: FlorentLvr <48032461+FlorentLvr@users.noreply.github.com>

* bump: version 2.10.3 → 2.11.0

* fix(dependency): update naas_drivers 0.110.2

* bump: version 2.11.0 → 2.11.1

* fix: Remove codecov upload for now

* bump: version 2.11.1 → 2.11.2

* fix: Rework based on workspace arrival

* fix: Applied linters

---------

Co-authored-by: Minura Punchihewa <49385643+MinuraPunchihewa@users.noreply.github.com>
Co-authored-by: FlorentLvr <48032461+FlorentLvr@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: actions-user <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

naas.pipeline() - Setup the output_dir as parameters to get the pipeline_executions
4 participants