Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--params option added to CLI commands similar to one that Kedro has #884

Merged

Conversation

IanVlasov
Copy link
Contributor

@IanVlasov IanVlasov commented Jun 3, 2022

Signed-off-by: Yan Vlasov vlasov.yan.ed@gmail.com

Description

Resolve #883

Development notes

  • Added additional --params click.option to cli.py and passed it to run_server_kwargs
  • Added extra_params param to server.py and passed it to kedro_data_loader.load_data
  • Added 'extra_params' param to data_loader.py and passed it to KedroSession.create
  • Updated cli tests

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

@datajoely
Copy link
Contributor

This is really cool!

@IanVlasov IanVlasov force-pushed the feature/add-params-to-cli-commands branch from 0de0dc3 to 9f72a55 Compare June 3, 2022 09:44
@tynandebold
Copy link
Member

Thank you for this, @IanVlasov. We'll have a look at it and get back to you!

cc @AntonyMilneQB @limdauto

@antonymilne
Copy link
Contributor

Hi @IanVlasov, thanks very much for your contribution and congratulations on your first open source PR! My initial thought on this was "why would we want to add params to kedro viz?", but after reading #883 and remembering the discussion on kedro-org/kedro#1436 I think it makes more sense!

Just to check I'm getting this right, please could you confirm the following things:

  1. Is the after_context_created + global variable approach working well for you outside kedro viz?
  2. The current problem you have is that the pipeline run by kedro run --params=path:path/to/parameter/file is completely different from the one shown by kedro viz (which is presumably empty or just doesn't work at all) because the pipeline definition in kedro viz won't have the right pipeline.yml file loaded?
  3. Does the solution you've given here actually work right for you in kedro viz? At a glance it looks like it should, but would be good to try out on your own project if you haven't already. Let me know if you're not sure how to test it.

What you say about the current --params approach/yml pipeline being a bit of a hacky workaround that we don't necessarily encourage is definitely correct. But given that extra_params is the only argument we're not already passing to KedroSession.create, if this PR is all it takes to get this working then it seems like a good addition to me since it will make kedro viz usable in such cases where it currently isn't at all 👍

@IanVlasov
Copy link
Contributor Author

Hello @AntonyMilneQB, thanks for congratulations! And, yes, you are totally right that this issue is related to the one I asked in kedro project about.

  1. I can confirm that the approach with after_context_created + global variable works good. At least, it covers our case. I can give a bit more context, probably you will find a bit more elegant solution that could be implemented in kedro: as I have already described in #883 we have a group of data scientists which life we try to make a bit easier. They do many experiments during every single day, staying more or less inside the same pipeline. So, we defined something like a pipeline template where they have some freedom to play with the parameters or to decide which nodes they need or doesn't need and in what order. Still, we do not give them the full control to define pipelines through yaml files (like it was represented here for example). We assume that if they really need to develop something new, then they should really know what exactly they require, so it can be done using python. To speed up their experiments, we would like to give them an opportunity to run several experiments simultaneously from different folders: they define a parameter file, put it in any folder they like, copy it as many times as they want and can run everything from different terminal session modifying their parameters manually or by using --params option. Related data for any particular run is written to the folder with the related parameter file, and all experiments are tracked by mlflow (many thanks to Galileo-Galilei and his awesome extension). As I understand, currently, kedro suppots only single run at one time because otherwise we would have collisions inside data folder. Our next step is to write an extension that could do something like grid-search through some set of possible parameters, so we could see it in MLFlow as a nested run. All local data produced by pipeline we leave as something auxiliary if, for example, they see that something goes wrong, and they need to investigate it via jupyter notebooks. Seems it is pretty full description of our particular case. I didn't mention only that for defining proper paths and inputs/outputs to the nodes we use slightly rewritten version of kedro-wings, but I am not sure if it is important. Hopefully, this description will be useful for you in furthe development.

  2. Yes, that is true. As I use templated config loader and globals.yml I can get around it by simply defining path: path/to/parameter/file inside globals.yml for the used environment. But it is slightly inconvenient when you run a pipeline itself by using one command (kedro run --params=path:path/to/parameter/file) and should manually modify project files to see the visualization in kedro viz instead of write something like kedro viz --params=path:path/to/parameter/file where we just change run option to viz.

  3. I can confirm that with these modifications we can see the visualization of the correct parameter file. I tried to modify kedro viz files inside my project (which is definitely not good, but still) and tested it a bit.

@merelcht
Copy link
Member

merelcht commented Jun 7, 2022

Another Kedro user reached out to me about this functionality as well, specifically for Experiment Tracking:

"I am running a node ABC by passing params --params user:poorni,env:dev and DS in my team runs a node ABC by passing params --params user:ds,env:dev. We are storing the the resultant dataset of node ABC in two different locations.

(our catalog entries are like below)

ABC:
   filepath: blabla/{user}/{env}/abc.csv

Assume this abc dataset is having metrics, I want to see the results which DS has stored in abc dataset. So when i am running kedro viz it shows the results of dataset which got stored under --params user:poorni,env:dev if i wanted to see DS results i need to change in globals.yml file

if i have an option through kedro viz only we can just pass the params kedro viz --params user:poorni,env:dev"

In summary, currently she has to change her globals.yml in order to see the results for a run where specific parameters were used.

@antonymilne
Copy link
Contributor

antonymilne commented Jun 7, 2022

@IanVlasov thanks very much for all the extra information, it's very helpful! I'm definitely happy to accept this change in principle then. I'll review now in terms of the actual implementation.

Unrelated to this PR, but re: "only single run at one time because otherwise we would have collisions inside data folder". This is an interesting topic that has definitely been considered by other people before so worth searching around in case you find other ways of doing it that might help you. e.g. kedro-org/kedro#1303. One very simple approach if you're worried about collisions between multiple users running the pipeline is to make users save to different locations, e.g. set user: ian in your conf/local/globals.yml file and then use ${user} in your dataset filepaths to make sure that your output doesn't overwrite anyone else's. Versioning datasets might also help. But definitely sounds like your particular situation needs some special treatment!

Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you for your work on it! I've left a few small comments and will be very happy to approve and get this nice new feature in after those are dealt with 🎉

RELEASE.md Outdated Show resolved Hide resolved
package/kedro_viz/launchers/cli.py Outdated Show resolved Hide resolved
package/kedro_viz/server.py Outdated Show resolved Hide resolved
package/kedro_viz/server.py Outdated Show resolved Hide resolved
dependabot bot and others added 7 commits June 8, 2022 12:02
…dro-org#881)

Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](EventSource/eventsource@v1.1.0...v1.1.1)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tynan DeBold <thdebold@gmail.com>
Signed-off-by: yan.vlasov <vlasov.yan.ed@gmail.com>
Signed-off-by: Yan Vlasov <vlasov.yan.ed@gmail.com>

Signed-off-by: yan.vlasov <vlasov.yan.ed@gmail.com>
Signed-off-by: yan.vlasov <vlasov.yan.ed@gmail.com>
Signed-off-by: yan.vlasov <vlasov.yan.ed@gmail.com>
Signed-off-by: yan.vlasov <vlasov.yan.ed@gmail.com>
Signed-off-by: yan.vlasov <vlasov.yan.ed@gmail.com>
Bumps [semver-regex](https://github.com/sindresorhus/semver-regex) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/sindresorhus/semver-regex/releases)
- [Commits](https://github.com/sindresorhus/semver-regex/commits/v3.1.4)

---
updated-dependencies:
- dependency-name: semver-regex
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: yan.vlasov <vlasov.yan.ed@gmail.com>
@IanVlasov IanVlasov force-pushed the feature/add-params-to-cli-commands branch from 78e1bdb to e1ecdcb Compare June 8, 2022 08:03
@IanVlasov
Copy link
Contributor Author

@AntonyMilneQB many thanks for the reviewing! I added suggested changes.

Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @IanVlasov! I think there's one more occurrence of KedroSession.create that needs updating in data_loader.py but after that's done please feel free to merge 👍

@IanVlasov
Copy link
Contributor Author

I double-checked it. I don't know why, but in my last commit there are really only 2 new occurences. It seems GitHub does not count previous change. In the 'Files changed' tab everything should be ok and 3 occurences of KedroSession.create were changed. Or am I missing something?

@antonymilne
Copy link
Contributor

Ahh yes I see it now. Sorry, I was just misreading things on github! Feel free to merge when ready 🙂 And thank you again for this!

Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @IanVlasov! I will merge this on your behalf!

@rashidakanchwala rashidakanchwala merged commit fd84153 into kedro-org:main Jun 8, 2022
cvetankanechevska pushed a commit that referenced this pull request Jun 9, 2022
…884)

Signed-off-by: Cvetanka <cventanka_nechevska@external.mckinsey.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

--params option in CLI commands
6 participants