Change CLI to Click #77
* Manually added to help text instead of using Click's default/show_default parameters
-l <file> / --logconfig=<file> : Log config file to use
-g <datsrc> / --generateraml=<datsrc> : Generate and print RAML template
                                        from data source name
"""
I don't see the aliases like '-t' for '--train' anymore. Does that mean we always have to type the full 'train'? It doesn't matter for the review; it's just for my own clarity ;)
Good point! I'll push a commit where the commands (train, predict, api, generate-raml) are matched by prefix (as long as the prefix is unique), so you can use e.g. t or tr, which uniquely match train; g or gen, which uniquely match generate-raml; a for api; and so on.
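A minimal sketch of how such prefix matching can be done in Click, adapted from the "command aliases" pattern in Click's documentation (subclass click.Group and override get_command). The class name PrefixMatchingGroup, the placeholder command bodies and the run() helper are hypothetical, not mllaunchpad's actual code. The -c/--config and -l/--log-config group options mirror the ones described in this PR, with None standing for "not given, use the usual fallback lookup" (not sketched here).

```python
import click
from click.testing import CliRunner


class PrefixMatchingGroup(click.Group):
    """A click.Group that also accepts any unique prefix of a command name."""

    def get_command(self, ctx, cmd_name):
        # An exact match always wins, e.g. "train".
        cmd = super().get_command(ctx, cmd_name)
        if cmd is not None:
            return cmd
        matches = [name for name in self.list_commands(ctx) if name.startswith(cmd_name)]
        if not matches:
            return None  # Click then reports "No such command".
        if len(matches) > 1:
            ctx.fail(f"Ambiguous command {cmd_name!r}: could be {', '.join(matches)}")
        return super().get_command(ctx, matches[0])

    def resolve_command(self, ctx, args):
        # Use the full command name (e.g. "train", not "t") in usage/help output.
        _, cmd, remaining = super().resolve_command(ctx, args)
        return (cmd.name if cmd else None), cmd, remaining


@click.group(cls=PrefixMatchingGroup)
@click.option("-c", "--config", type=click.Path(), default=None, help="Config file to use.")
@click.option("-l", "--log-config", type=click.Path(), default=None, help="Log config file to use.")
@click.pass_context
def cli(ctx, config, log_config):
    """Hypothetical stand-in for the mllaunchpad entry point."""
    # None means "not given"; a real CLI would fall back to env vars / default files.
    ctx.obj = {"config": config, "log_config": log_config}


@cli.command()
@click.pass_obj
def train(obj):
    """Train a model (placeholder body)."""
    click.echo(f"train (config={obj['config']})")


@cli.command()
@click.pass_obj
def predict(obj):
    """Run a prediction (placeholder body)."""
    click.echo(f"predict (config={obj['config']})")


@cli.command()
def api():
    """Start the Web API (placeholder body)."""
    click.echo("api")


@cli.command(name="generate-raml")
@click.argument("datasource")
def generate_raml(datasource):
    """Print a RAML template for DATASOURCE (placeholder body)."""
    click.echo(f"generate-raml {datasource}")


def run(*args):
    """Invoke the CLI in-process, as a test would; return (exit_code, output)."""
    result = CliRunner().invoke(cli, list(args))
    return result.exit_code, result.output
```

In this sketch an exact name always wins, an ambiguous prefix fails with a usage error listing the candidates, and an unknown name falls through to Click's usual "No such command" error. Every subcommand also gets a --help option from Click automatically.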
\b
Example JSON: { "petal.width": 1.4, "petal.length": 2.0,
                "sepal.width": 1.8, "sepal.length": 4.0 }
"""
Do we always have to provide the arguments as JSON? Is it possible to pass some arguments for a single prediction API call after a '?'? (Like my previous comment, this is just a question for my general understanding, not a remark.)
Keep in mind that this PR only deals with the command line interface, so it won’t change how API calls work. The ‘mllaunchpad predict’ command calls the prediction functionality directly, without going through the Web API. This is also why there is no ‘?’ in the command line interface.
Before this change, it was not possible at all to provide any input to the prediction from the command line (it always used the empty set). Allowing JSON input from a file (or stdin) was the easiest way to make providing input possible, as the prediction functionality expects the features as a dictionary. I’d wait and see if and how people use the new possibilities that ‘mllaunchpad predict’ provides, and use what we learn to decide whether we need to support a more direct command-line string format for prediction input (which might or might not look like query parameters).
Again, just to be unambiguous: we’re not taking away any of the ways one could try out models until now (running the Web API, then calling its URL via browser/curl/Postman); we’re just adding another way to do prediction without going through the Web API.
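For illustration, a minimal sketch of a Click command that reads the prediction input as a JSON object from a file or, by default, from stdin (Click's File type treats the name "-" as stdin). run_prediction and predict_from_text are hypothetical placeholders, not mllaunchpad's actual functions; the real command would hand the dictionary to mllaunchpad's prediction functionality instead.

```python
import json

import click
from click.testing import CliRunner


def run_prediction(features: dict) -> dict:
    """Hypothetical placeholder for the actual prediction call."""
    return {"n_features": len(features), "features": sorted(features)}


@click.command()
@click.argument("json_file", type=click.File("r"), default="-")
def predict(json_file):
    """Predict using features given as a JSON object in JSON_FILE.

    If JSON_FILE is omitted or "-", the JSON is read from stdin.

    \b
    Example JSON: { "petal.width": 1.4, "petal.length": 2.0,
                    "sepal.width": 1.8, "sepal.length": 4.0 }
    """
    features = json.load(json_file)
    # The prediction functionality expects the features as a dictionary.
    if not isinstance(features, dict):
        raise click.BadParameter("expected a JSON object", param_hint="JSON_FILE")
    click.echo(json.dumps(run_prediction(features)))


def predict_from_text(json_text):
    """Invoke the command in-process with json_text on stdin; return (exit_code, output)."""
    result = CliRunner().invoke(predict, [], input=json_text)
    return result.exit_code, result.output
```

The \b line in the docstring tells Click not to rewrap the following paragraph in --help output, which keeps the example JSON readable.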
setup.py
@@ -15,6 +15,7 @@
     "dill",
     "pandas",
     "pyyaml",
     "click",
 ]
General remark about the requirements file: shouldn't there be a specific version of the package? Or should the latest version always be installed, and only if something doesn't work with it would we switch to a specific version?
Great question!
TL;DR: You want to play it loose with requirements for libraries (which mllaunchpad is), and strict for applications (which mllaunchpad is not).
Edit: This person explains it better than I do: https://stackoverflow.com/a/44938662
In this case, for a library (which mllaunchpad is), not pinning specific versions is okay. Because our users will have to manage a bunch of their own dependencies (=requirements), too, we want to give them as much flexibility as possible. Otherwise, if we specified, for example, pandas==1.0.1 and the library's users had to use a pre-1.0 version of pandas to be compatible with some other library they want to use, they could not install both libraries together.
Not specifying a version is not the same as saying "use the newest version available". In practice, of course, pip tries to get the newest versions that are available, but we leave this open so that there's wiggle room to find compatible versions. Some tools (not pip (at least not yet), but e.g. pip-tools, pipenv, poetry and dephell) do proper dependency resolution, searching for versions that satisfy all dependencies, including the dependencies' requirements on each other.
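To make "searching for versions that satisfy all dependencies" more concrete, here is a toy brute-force resolver. The package index, the invented package otherlib and all constraints are hypothetical; real resolvers such as the tools named above work on PEP 440 version specifiers and search far more cleverly than trying every combination.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, int]
Constraint = Callable[[Version], bool]

# Hypothetical package index: available versions per package.
INDEX: Dict[str, List[Version]] = {
    "pandas": [(0, 25), (1, 0)],
    "otherlib": [(1, 0), (2, 0)],  # invented library, for illustration only
}

# Hypothetical inter-dependencies: what each (package, version) requires of others.
REQUIRES: Dict[Tuple[str, Version], Dict[str, Constraint]] = {
    ("otherlib", (1, 0)): {"pandas": lambda v: v < (1, 0)},
    ("otherlib", (2, 0)): {"pandas": lambda v: v >= (1, 0)},
}


def resolve(wanted: Dict[str, Constraint]) -> Optional[Dict[str, Version]]:
    """Pick one version per package so that every constraint holds.

    `wanted` holds top-level constraints (e.g. from a library's setup.py);
    packages missing from it are unconstrained. Newer versions are tried
    first. Returns None if no combination works.
    """
    names = sorted(INDEX)
    newest_first = [sorted(INDEX[name], reverse=True) for name in names]
    for combo in product(*newest_first):
        chosen = dict(zip(names, combo))
        top_level_ok = all(ok(chosen[pkg]) for pkg, ok in wanted.items())
        inter_ok = all(
            ok(chosen[dep])
            for pkg, version in chosen.items()
            for dep, ok in REQUIRES.get((pkg, version), {}).items()
        )
        if top_level_ok and inter_ok:
            return chosen
    return None
```

With no top-level constraint, resolve({}) picks the newest combination. A library-side pin such as {"pandas": lambda v: v == (0, 25)} still resolves, but only by falling back to the older otherlib; if the user additionally needs otherlib 2.x, no combination exists and resolve returns None.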
That said, as soon as we learn from actual users that mllaunchpad does not work with specific versions of its dependencies, such as very old versions of pandas, etc., there's no harm in adding >=[version] constraints to setup.py at that point. I'd rather react when something doesn't work than make a possibly ill-informed choice now that unnecessarily excludes some users. Making a well-informed choice would be a lot of work (trying out old dependency versions and checking whether everything still works). Once we have good unit test coverage, we might want to run a matrix check over dependency versions to fill in this information.
As a side note, one practice I see a lot in libraries recently, and explicitly don't want to adopt, is generally excluding the next major version of all dependencies. I get that they want to make sure that nothing breaks when some dependency has a breaking change, because this would be out of their control. But this is only acceptable if they have a daily check that automatically tests new versions of all dependencies for compatibility and then auto-deploys a new version of their package to PyPI with the updated dependency specifications. Otherwise, if mllaunchpad were slow to adopt new versions while users want to use a newer major version of one of mllaunchpad's dependencies, they would be out of luck. In my opinion, particularly seeing how some projects tend to appear semi-abandoned from time to time, projects that do this play it safe on the backs of their users (particularly those who only use pip), forbidding installs of their library in settings where it would work perfectly and creating even more of a dependency hell for their users.
For applications, on the other hand (which mllaunchpad is not), you definitely want to freeze/pin your dependencies to specific versions, for reproducibility of your deployments. For example, data scientists who use mllaunchpad to deploy an API should absolutely pin their project's dependencies.
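In practice, pinning an application's dependencies usually means recording the exact installed versions, e.g. with pip freeze > requirements.txt or with pip-tools' pip-compile. As a rough, stdlib-only sketch of what such a pinned list looks like (it is an approximation, not pip's implementation, and ignores editable and VCS installs, which pip handles specially):

```python
from importlib.metadata import distributions
from typing import List


def pinned_requirements() -> List[str]:
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()  # a set drops duplicates if a package is visible on several paths
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print one pinned requirement per line, like a requirements.txt file.
    print("\n".join(pinned_requirements()))
```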
Hi Andreas, I made an attempt to review your code, although it was pretty difficult since I don't fully understand it :(.
Hi Gosia, thanks for looking at the PR and for your questions. I hope my answers above make sense. I just pushed a change so that you can abbreviate commands like "train" with "t", and so on. Good point that you made there; it improves the CLI's ease of use, in my opinion. 😄 If there are no further comments or questions, I'll earmark this PR for release 1.0 and work on the unit tests and documentation as soon as I find the time.
Hi Andreas,
# Manually resolved conflicts: # pytest.ini # requirements/prod.txt # setup.py
# Conflicts: # docs/_static/examples.zip # setup.cfg
All done, waiting for the 1.0.0 release to merge.
Reference issues/PRs
When implementing #75 (addressing code style issues), I was reminded again that mllaunchpad's command-line interface module cli has a high cyclomatic complexity: command line arguments are checked using an enormous if-elif-elif-elif...-else monster, which is bad style, as it's easy to lose track of what happens where and why exactly.
What this implements/fixes
I took this as an opportunity to do a complete cleanup/rewrite of the cli module, using the Click framework.
@Bart92, @gosiarorat, @bobplatte, @planeetjupyter What would be your opinion on these changes? Would this stand in the way of something you are doing or planning to do with mllaunchpad?
Upsides:
… --) vs. options like --config (-c for short) or --log-config (-l). This is the same way most standard Linux commands work, cf. git and pip.
Downside: This changes how mllaunchpad is used from the command line. I actually think it's an improvement, but a breaking change is a breaking change, and maybe someone already uses the "old" way.
Here are some examples of the differences (using unabbreviated --options and arguments for clarity; both can be abbreviated in actual use):
Training:
Predicting:
Creating an initial RAML file from datasource petals:
Like before, --config and --log-config are optional (the file names can also be obtained from environment variables and default files).
Other comments
If we decide to go through with this pull request, I still need to … mllaunchpad <cmd> --help to work, … --config and --log-config, …