Change CLI to Click #77

schuderer · 2020-02-23T17:48:59Z

Reference issues/PRs

When implementing #75 (addressing code style issues), I was reminded again that mllaunchpad's command-line interface module cli has a high cyclomatic complexity (command line arguments are checked using an enormous if-elif-elif-elif...-else monster, which is bad style, as it's easy to lose track on what happens where and why exactly).

What this implements/fixes

I took this as an opportunity to do a complete cleanup/rewrite of the cli module, using the Click framework.

@Bart92, @gosiarorat, @bobplatte, @planeetjupyter What would be your opinion on these changes? Would this stand in the way of something you are doing or planning to do with mllaunchpad?

Upsides:

Better code structure, and the command line arguments are now (in my opinion) more cleanly separated into commands (train/retest/(run)api/generate-raml, without initial --) vs. options like --config (-c for short) or --log-config (-l). This is the same way most standard linux commands work, cf. git and pip.
Easier to unit-test
We also get it to generate a help text essentially for free:

Usage: mllaunchpad [OPTIONS] COMMAND [ARGS]...

  Train, test or run a config file's model.

Options:
  --version              Show the version and exit.
  -v, --verbose          Print debug messages.
  -c, --config PATH      Use this configuration file.
  -l, --log-config PATH  Use this log configuration file.
  -h, --help             Show this message and exit.

Commands:
  api            Run API server in unsafe debug mode.
  generate-raml  Generate and print RAML template from data source name.
  predict        Run prediction on features from JSON file ( - for stdin).
  retest         Retest model, update metrics.
  train          Run training, store model and metrics.

Downside: This changes how mllaunchpad is used from the command line. I actually think it's an improvement, but a breaking change is a breaking change, and maybe someone already uses the "old" way.

Here are some examples of the differences (using unabbreviated --options and arguments for clarity (both can be abbreviated in actual use)):

Training:

# Before:
$ mllaunchpad --config my_conf.yml --train
# After:
$ mllaunchpad --config my_conf.yml train

Predicting:

# Before:
$ mllaunchpad --config my_conf.yml --predict  # only empty features possible
# After:
$ mllaunchpad --config my_conf.yml predict args.json

Creating an initial RAML file from datasource petals:

# Before:
$ mllaunchpad --config my_conf.yml --generateraml petals >my.raml
# After:
$ mllaunchpad --config my_conf.yml generate-raml petals >my.raml

Like before, --config and --log-config are optional arguments (the file names can also be optained from environment variables and default files).

Other comments

If we decide to go through with this pull request, I still need to

get mllaunchpad <cmd> --help to work,
get help to print the defaults description for --config and --log-config,
add unit tests, and
update the documentation (see https://stackoverflow.com/questions/39392753/how-can-i-document-click-commands-using-sphinx)
update any comments or scripts in the examples that might still refer to the old CLI style

* Manually added to help text instead of using Click's default/show_default parameters

gosiarorat · 2020-02-25T10:35:38Z

mllaunchpad/cli.py

-l <file>   / --logconfig=<file>      : Log config file to use
-g <datsrc> / --generateraml=<datsrc> : Generate and print RAML template
-                                        from data source name
-"""


I don't see anymore the aliases like '-t' for '--train'. Does it mean that we always have to use full 'train'? It doesn't matter for the review but just for my clarity ;)

Good point! I‘ll push a commit where the commands (train, predict, api, generate-raml) are matched from the beginning (as long as they are unique), so you can use e.g. t or tr which match train uniquely, g or gen or so, which matches generate-raml uniquely, a for api and so on.

gosiarorat · 2020-02-25T10:39:51Z

mllaunchpad/cli.py

+    \b
+    Example JSON: { "petal.width": 1.4, "petal.length": 2.0,
+                    "sepal.width": 1.8, "sepal.length": 4.0 }
+    """


Do we always have to provide json with the arguments? Is there a possibility to add some arguments in for a single prediction api call after '?'. (as previous comment it is just a question for the general understanding not a remark)

Keep in mind that this PR only deals with the command line interface, so it won’t change how API calls work. The ‘mllaunchpad predict’ command calls prediction functionality directly without going through the Web API. This is also why there is no ‘?’ In the command line interface.

Before this change, it was not possible at all to provide any input to the prediction from the command line (it always used the empty set). Allowing for providing JSON from a file (or stdin) was the easiest way to make providing input possible (as prediction functionality expects the features as a dictionary). I’d wait and see if and how people use the new possibilities that ‘mllaunchpad predict’ provides, and use what we learned to decide whether we need to support more direct command line string format of prediction input (which might or might not look like query parameters).

Again, just to be unambiguous: we’re not taking away any ways that one was able to try out their models up until now (running the web api, then calling its url via browser/curl/postman), but we’re adding an additional way to do prediction without going through the Web API

gosiarorat · 2020-02-25T10:44:26Z

setup.py

@@ -15,6 +15,7 @@
    "dill",
    "pandas",
    "pyyaml",
+    "click",
 ]


General remark about the requirements file - shouldn't be there a specific version of the package? Or should always the latest version be installed and in case something doesn't work with the latest version then we would use the specific version?

Great question!

TL;DR: You want to play it loose with requirements for libraries (which mllaunchpad is), and strict for applications (which mllaunchpad is not).

Edit: This person explains it better than I do: https://stackoverflow.com/a/44938662

In this case, for a library (which mllaunchpad is), not specifying specific versions is okay. Because our users will have to manage a bunch of their own dependencies (=requirements), too, we want to give them as much flexibility as possible. Otherwise, if we specified for example pandas==1.0.1 and the library's users had to use a pre-1.0 version of pandas to be compatible with some other library they want to use, this would cause them problems.

Not specifying a version is not the same as saying "use the newest version available". While in practice, of course, pip tries to get the newest versions that are available, we leave this open, so that there's wiggle room to find compatible versions. Some tools (not pip (at least not yet), but e.g. pip-tools, pipenv, poetry and dephell) do proper dependency resolution, searching for versions that satisfy all dependencies (also inter-dependency).

That said, as soon as we learn from actual users that mllaunchpad does not work with specific versions of its dependencies, such as very old versions of pandas, etc., there's no harm in adding these >=[version] contstraints to setup.py then. I'd rather react if something doesn't work, than make a choice now that is maybe ill-informed and unnecessarily excludes some users. It would be a lot of work to make a well-informed choice on this (to try out old dependency versions and see if everything still works). Maybe when we have good unit test coverage, we might want to run a matrix check on the dependency versions to fill in this information.

As a side note, one practice I see a lot in libraries recently and explicitly don't want to adopt is that some libraries just generally exclude the next major version of all their dependencies. I get that they want to make sure that nothing breaks when some dependency has a breaking change, because this would be out of their control. But this is only acceptable if they have some daily check running that automatically tests new versions of all dependencies for compatibility and then auto-deploys a new version of their package to pypi with the updated dependency specifications. Otherwise, if mllaunchpad is slow to adopt new versions, while the users want to use a newer major version of one of mllaunchpads dependency, they are out of luck. In my opinion, particularly seeing how some projects tend to appear semi-abandoned from time to time, projects that do this just play it safe on the backs of their users (particularly those who only use pip), forbidding installs of their library in settings where they would work perfectly, creating even more of a dependency hell for their users.

For applications on the other hand (which mllaunchpad is not), you definitely want to freeze/pin your dependencies to specific versions, for reproducibility of your deployments. For example, data scientists who use mllaunchpad to deploy an API should absolutely pin their project's dependencies.

gosiarorat

Hi Andreas, I made a poging to review your code, although it was pretty difficult since I don't understand it fully :(.

schuderer · 2020-02-25T15:12:37Z

Hi Gosia, thanks for looking at the PR and for your questions. I hope my answers above make sense. I just pushed a change right now so that you can abbreviate commands like "train" with "t", and so on. Good point that you made there; it improves the CLI's ease-of-use, in my opinion. 😄

If there are no further comments or questions, I'll earmark this PR for release 1.0 and work on the unit tests and documentation as soon as I find the time.

gosiarorat · 2020-02-25T15:32:02Z

Hi Andreas,
Your answers make sense, I have no further comments :)

# Manually resolved conflicts: # pytest.ini # requirements/prod.txt # setup.py

…_to_click

# Conflicts: # docs/_static/examples.zip # setup.cfg

schuderer · 2020-04-08T16:57:55Z

All done, waiting for the 1.0.0 release to merge.

* Add some tests for model_actions (#99) * 86 type guessing fails (#100) * fix_issue86_type_guessing_fails_oracle * Update CHANGELOG.rst add fix #86 * Add some tests for api module (#101) Co-authored-by: bobplatte <33421563+bobplatte@users.noreply.github.com>

schuderer · 2020-04-24T21:09:04Z

update any comments or scripts in the examples that might still refer to the old cli style

schuderer added 3 commits February 23, 2020 16:35

Change CLI to Click (no tests, no doc)

b4d69b7

Fix subcommand --help, improve help texts

ddd02a7

Add default description for --config and --log-config

c579f51

* Manually added to help text instead of using Click's default/show_default parameters

schuderer changed the title ~~Change CLI to Click (no tests, no doc)~~ Change CLI to Click Feb 23, 2020

gosiarorat reviewed Feb 25, 2020

View reviewed changes

schuderer added 2 commits February 25, 2020 16:05

Merge branch 'master' into move_cli_to_click

8a832bd

Add cli command abbreviation and abbreviated cli script name 'mllp'

a8b8fc5

schuderer added this to To do in Release 1.0.0 tracker Feb 25, 2020

schuderer mentioned this pull request Feb 29, 2020

(Automated) dependency compatibility tests #79

Open

schuderer added the enhancement New feature or request label Mar 1, 2020

schuderer added the hygiene The code could be better, or better managed label Mar 14, 2020

schuderer added 3 commits March 20, 2020 19:51

Merge remote-tracking branch 'upstream/master' into move_cli_to_click

9f8fb52

# Manually resolved conflicts: # pytest.ini # requirements/prod.txt # setup.py

Merge remote-tracking branch 'upstream/master' into move_cli_to_click

8a784d9

# Manually resolved conflicts: # pytest.ini # requirements/prod.txt # setup.py

Merge remote-tracking branch 'origin/move_cli_to_click' into move_cli…

5bd2b56

…_to_click

schuderer moved this from To do to In progress in Release 1.0.0 tracker Apr 3, 2020

schuderer mentioned this pull request Apr 3, 2020

Coverage >80% #46

Closed

6 tasks

schuderer added 3 commits April 6, 2020 19:32

Add unit tests for CLI

bed3b67

Merge branch 'master' into move_cli_to_click

7485e0d

# Conflicts: # docs/_static/examples.zip # setup.cfg

Update CLI docs using Click

d400d5b

schuderer marked this pull request as ready for review April 8, 2020 16:48

schuderer and others added 3 commits April 22, 2020 23:29

Merge from master

ce464b0

* Add some tests for model_actions (#99) * 86 type guessing fails (#100) * fix_issue86_type_guessing_fails_oracle * Update CHANGELOG.rst add fix #86 * Add some tests for api module (#101) Co-authored-by: bobplatte <33421563+bobplatte@users.noreply.github.com>

Merge branch 'master' into move_cli_to_click

9aa3490

Update CHANGELOG.rst

3ba95b8

schuderer added 2 commits April 28, 2020 18:29

Update remaining docs

27a17ba

Update CLI usage comments in examples

e9f27e7

schuderer merged commit ab41628 into master Apr 28, 2020

Release 1.0.0 tracker automation moved this from In progress to Done Apr 28, 2020

schuderer deleted the move_cli_to_click branch April 28, 2020 16:43

schuderer mentioned this pull request Jun 8, 2020

Release 1.0.0 #122

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change CLI to Click #77

Change CLI to Click #77

schuderer commented Feb 23, 2020 •

edited

gosiarorat Feb 25, 2020

schuderer Feb 25, 2020 •

edited

gosiarorat Feb 25, 2020

schuderer Feb 25, 2020 •

edited

gosiarorat Feb 25, 2020

schuderer Feb 25, 2020 •

edited

gosiarorat left a comment

schuderer commented Feb 25, 2020 •

edited

gosiarorat commented Feb 25, 2020

schuderer commented Apr 8, 2020

schuderer commented Apr 24, 2020 •

edited

Change CLI to Click #77

Change CLI to Click #77

Conversation

schuderer commented Feb 23, 2020 • edited

Reference issues/PRs

What this implements/fixes

Other comments

gosiarorat Feb 25, 2020

Choose a reason for hiding this comment

schuderer Feb 25, 2020 • edited

Choose a reason for hiding this comment

gosiarorat Feb 25, 2020

Choose a reason for hiding this comment

schuderer Feb 25, 2020 • edited

Choose a reason for hiding this comment

gosiarorat Feb 25, 2020

Choose a reason for hiding this comment

schuderer Feb 25, 2020 • edited

Choose a reason for hiding this comment

gosiarorat left a comment

Choose a reason for hiding this comment

schuderer commented Feb 25, 2020 • edited

gosiarorat commented Feb 25, 2020

schuderer commented Apr 8, 2020

schuderer commented Apr 24, 2020 • edited

schuderer commented Feb 23, 2020 •

edited

schuderer Feb 25, 2020 •

edited

schuderer Feb 25, 2020 •

edited

schuderer Feb 25, 2020 •

edited

schuderer commented Feb 25, 2020 •

edited

schuderer commented Apr 24, 2020 •

edited