Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change CLI to Click #77

Merged
merged 16 commits into from Apr 28, 2020
Merged

Change CLI to Click #77

merged 16 commits into from Apr 28, 2020

Conversation

schuderer
Copy link
Owner

@schuderer schuderer commented Feb 23, 2020

Reference issues/PRs

When implementing #75 (addressing code style issues), I was reminded again that mllaunchpad's command-line interface module cli has a high cyclomatic complexity (command line arguments are checked using an enormous if-elif-elif-elif...-else monster, which is bad style, as it's easy to lose track on what happens where and why exactly).

What this implements/fixes

I took this as an opportunity to do a complete cleanup/rewrite of the cli module, using the Click framework.

@Bart92, @gosiarorat, @bobplatte, @planeetjupyter What would be your opinion on these changes? Would this stand in the way of something you are doing or planning to do with mllaunchpad?

Upsides:

  • Better code structure, and the command line arguments are now (in my opinion) more cleanly separated into commands (train/retest/(run)api/generate-raml, without initial --) vs. options like --config (-c for short) or --log-config (-l). This is the same way most standard linux commands work, cf. git and pip.
  • Easier to unit-test
  • We also get it to generate a help text essentially for free:
Usage: mllaunchpad [OPTIONS] COMMAND [ARGS]...

  Train, test or run a config file's model.

Options:
  --version              Show the version and exit.
  -v, --verbose          Print debug messages.
  -c, --config PATH      Use this configuration file.
  -l, --log-config PATH  Use this log configuration file.
  -h, --help             Show this message and exit.

Commands:
  api            Run API server in unsafe debug mode.
  generate-raml  Generate and print RAML template from data source name.
  predict        Run prediction on features from JSON file ( - for stdin).
  retest         Retest model, update metrics.
  train          Run training, store model and metrics.

Downside: This changes how mllaunchpad is used from the command line. I actually think it's an improvement, but a breaking change is a breaking change, and maybe someone already uses the "old" way.

Here are some examples of the differences (using unabbreviated --options and arguments for clarity (both can be abbreviated in actual use)):

Training:

# Before:
$ mllaunchpad --config my_conf.yml --train
# After:
$ mllaunchpad --config my_conf.yml train

Predicting:

# Before:
$ mllaunchpad --config my_conf.yml --predict  # only empty features possible
# After:
$ mllaunchpad --config my_conf.yml predict args.json

Creating an initial RAML file from datasource petals:

# Before:
$ mllaunchpad --config my_conf.yml --generateraml petals >my.raml
# After:
$ mllaunchpad --config my_conf.yml generate-raml petals >my.raml

Like before, --config and --log-config are optional arguments (the file names can also be optained from environment variables and default files).

Other comments

If we decide to go through with this pull request, I still need to

@schuderer schuderer changed the title Change CLI to Click (no tests, no doc) Change CLI to Click Feb 23, 2020
-l <file> / --logconfig=<file> : Log config file to use
-g <datsrc> / --generateraml=<datsrc> : Generate and print RAML template
from data source name
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anymore the aliases like '-t' for '--train'. Does it mean that we always have to use full 'train'? It doesn't matter for the review but just for my clarity ;)

Copy link
Owner Author

@schuderer schuderer Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I‘ll push a commit where the commands (train, predict, api, generate-raml) are matched from the beginning (as long as they are unique), so you can use e.g. t or tr which match train uniquely, g or gen or so, which matches generate-raml uniquely, a for api and so on.

\b
Example JSON: { "petal.width": 1.4, "petal.length": 2.0,
"sepal.width": 1.8, "sepal.length": 4.0 }
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always have to provide json with the arguments? Is there a possibility to add some arguments in for a single prediction api call after '?'. (as previous comment it is just a question for the general understanding not a remark)

Copy link
Owner Author

@schuderer schuderer Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that this PR only deals with the command line interface, so it won’t change how API calls work. The ‘mllaunchpad predict’ command calls prediction functionality directly without going through the Web API. This is also why there is no ‘?’ In the command line interface.

Before this change, it was not possible at all to provide any input to the prediction from the command line (it always used the empty set). Allowing for providing JSON from a file (or stdin) was the easiest way to make providing input possible (as prediction functionality expects the features as a dictionary). I’d wait and see if and how people use the new possibilities that ‘mllaunchpad predict’ provides, and use what we learned to decide whether we need to support more direct command line string format of prediction input (which might or might not look like query parameters).

Again, just to be unambiguous: we’re not taking away any ways that one was able to try out their models up until now (running the web api, then calling its url via browser/curl/postman), but we’re adding an additional way to do prediction without going through the Web API

setup.py Outdated
@@ -15,6 +15,7 @@
"dill",
"pandas",
"pyyaml",
"click",
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General remark about the requirements file - shouldn't be there a specific version of the package? Or should always the latest version be installed and in case something doesn't work with the latest version then we would use the specific version?

Copy link
Owner Author

@schuderer schuderer Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question!

TL;DR: You want to play it loose with requirements for libraries (which mllaunchpad is), and strict for applications (which mllaunchpad is not).

Edit: This person explains it better than I do: https://stackoverflow.com/a/44938662

In this case, for a library (which mllaunchpad is), not specifying specific versions is okay. Because our users will have to manage a bunch of their own dependencies (=requirements), too, we want to give them as much flexibility as possible. Otherwise, if we specified for example pandas==1.0.1 and the library's users had to use a pre-1.0 version of pandas to be compatible with some other library they want to use, this would cause them problems.

Not specifying a version is not the same as saying "use the newest version available". While in practice, of course, pip tries to get the newest versions that are available, we leave this open, so that there's wiggle room to find compatible versions. Some tools (not pip (at least not yet), but e.g. pip-tools, pipenv, poetry and dephell) do proper dependency resolution, searching for versions that satisfy all dependencies (also inter-dependency).

That said, as soon as we learn from actual users that mllaunchpad does not work with specific versions of its dependencies, such as very old versions of pandas, etc., there's no harm in adding these >=[version] contstraints to setup.py then. I'd rather react if something doesn't work, than make a choice now that is maybe ill-informed and unnecessarily excludes some users. It would be a lot of work to make a well-informed choice on this (to try out old dependency versions and see if everything still works). Maybe when we have good unit test coverage, we might want to run a matrix check on the dependency versions to fill in this information.

As a side note, one practice I see a lot in libraries recently and explicitly don't want to adopt is that some libraries just generally exclude the next major version of all their dependencies. I get that they want to make sure that nothing breaks when some dependency has a breaking change, because this would be out of their control. But this is only acceptable if they have some daily check running that automatically tests new versions of all dependencies for compatibility and then auto-deploys a new version of their package to pypi with the updated dependency specifications. Otherwise, if mllaunchpad is slow to adopt new versions, while the users want to use a newer major version of one of mllaunchpads dependency, they are out of luck. In my opinion, particularly seeing how some projects tend to appear semi-abandoned from time to time, projects that do this just play it safe on the backs of their users (particularly those who only use pip), forbidding installs of their library in settings where they would work perfectly, creating even more of a dependency hell for their users.

For applications on the other hand (which mllaunchpad is not), you definitely want to freeze/pin your dependencies to specific versions, for reproducibility of your deployments. For example, data scientists who use mllaunchpad to deploy an API should absolutely pin their project's dependencies.

Copy link
Contributor

@gosiarorat gosiarorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Andreas, I made a poging to review your code, although it was pretty difficult since I don't understand it fully :(.

@schuderer
Copy link
Owner Author

schuderer commented Feb 25, 2020

Hi Gosia, thanks for looking at the PR and for your questions. I hope my answers above make sense. I just pushed a change right now so that you can abbreviate commands like "train" with "t", and so on. Good point that you made there; it improves the CLI's ease-of-use, in my opinion. 😄

If there are no further comments or questions, I'll earmark this PR for release 1.0 and work on the unit tests and documentation as soon as I find the time.

@gosiarorat
Copy link
Contributor

Hi Andreas,
Your answers make sense, I have no further comments :)

@schuderer schuderer added the enhancement New feature or request label Mar 1, 2020
@schuderer schuderer added the hygiene The code could be better, or better managed label Mar 14, 2020
# Manually resolved conflicts:
#	pytest.ini
#	requirements/prod.txt
#	setup.py
# Manually resolved conflicts:
#	pytest.ini
#	requirements/prod.txt
#	setup.py
@schuderer schuderer moved this from To do to In progress in Release 1.0.0 tracker Apr 3, 2020
@schuderer schuderer mentioned this pull request Apr 3, 2020
6 tasks
@schuderer schuderer marked this pull request as ready for review April 8, 2020 16:48
@schuderer
Copy link
Owner Author

All done, waiting for the 1.0.0 release to merge.

schuderer and others added 3 commits April 22, 2020 23:29
* Add some tests for model_actions (#99)

* 86 type guessing fails (#100)

* fix_issue86_type_guessing_fails_oracle

* Update CHANGELOG.rst

add fix #86

* Add some tests for api module (#101)

Co-authored-by: bobplatte <33421563+bobplatte@users.noreply.github.com>
@schuderer
Copy link
Owner Author

schuderer commented Apr 24, 2020

  • update any comments or scripts in the examples that might still refer to the old cli style

@schuderer schuderer merged commit ab41628 into master Apr 28, 2020
Release 1.0.0 tracker automation moved this from In progress to Done Apr 28, 2020
@schuderer schuderer deleted the move_cli_to_click branch April 28, 2020 16:43
@schuderer schuderer mentioned this pull request Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request hygiene The code could be better, or better managed
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants