RFC: hyperopt facelift #350
Comments
hey @aldanor, thanks for all your suggestions.
@aldanor how confident are you that you can pull off replacing mongo? I'd be happy if we could mostly focus on fixing failing tests and setting up CI, before making a new release. Afterwards we could begin to tackle other things.
Just to reply quickly to a few points:
It's actually not a whole lot of work, at least the initial cleaning - at least compared to other points, like fixing up the tests or especially point (4.). In fact I've already hacked it together in a local py3-only branch to see if it's feasible before posting this, took me less than an hour :) Of course, there's more detailed py3-related work as outlined in (3.), that will take more time but again, it's fairly mechanical.
With a simple/naive in-memory db + joblib? Again, it's a bit of work but not that hard, I would perhaps volunteer to try and hack it together later (once we have a stable release with tests passing and all that). Basically, a stable method of pickling (e.g. cloudpickle), a multiprocessing-safe queue to transfer hand-pickled stuff in and out, and a parallelization backend. Most of the work really would be trying to fit this to the existing Trials interface and all the existing conventions.
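A rough sketch of what that could look like, using only the standard library: plain `pickle` stands in for cloudpickle, and `concurrent.futures.ProcessPoolExecutor` stands in for joblib and the explicit queue. `InMemoryStore`, `evaluate`, `run_trials` and the toy `objective` are made-up names, and nothing here is fitted to the existing Trials interface:

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


def objective(params):
    # Toy objective (hypothetical): minimum at x = 3.
    return (params["x"] - 3.0) ** 2


def evaluate(payload):
    """Run one trial in a worker: unpickle the input, return a pickled result."""
    tid, params = pickle.loads(payload)
    result = {"tid": tid, "params": params,
              "loss": objective(params), "status": "ok"}
    return pickle.dumps(result)


class InMemoryStore:
    """Naive in-memory stand-in for a trials database."""

    def __init__(self):
        self.results = []

    def insert(self, payload):
        # Results arrive as pickled bytes from the workers.
        self.results.append(pickle.loads(payload))

    def best(self):
        # Return the lowest-loss successful result, or None if there is none.
        ok = [r for r in self.results if r["status"] == "ok"]
        return min(ok, key=lambda r: r["loss"]) if ok else None


def run_trials(candidates, max_workers=2):
    """Evaluate all candidates in worker processes and collect the results."""
    store = InMemoryStore()
    payloads = [pickle.dumps((tid, p)) for tid, p in enumerate(candidates)]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        for out in pool.map(evaluate, payloads):
            store.insert(out)
    return store
```

`run_trials([{"x": 0.0}, {"x": 3.0}]).best()` would then return the stored result with the lowest loss; when run as a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.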
I broke out the license question to new issue ^^ to discuss there.
Re: dropping python2 support, I'd be fine with that. I sent an email to hyperopt-discuss to see if anyone objects. If no one objects in a week, then let's say python2 support is dropped.
As a suggestion, if some of the other cleanup can be done without Python 2 support getting in the way, why not have a final Python 2 release for those who are stuck there for various reasons before moving to Python 3?
* Remove references to bandit in plotting utils.
* Remove some large block of deactivated, undocumented and unreadable code (ain't nobody gonna need that).
* Cleanup some imports.
* Minor style cleanup, removal of unused code, smarter column management in main_plot_history
* Add plotting function to plot 1D trial attachments (e.g. learning progress)
This post has been 😴 for 18+ months! I did a small summary:
Other requests:
maybe @aldanor could include this (or something similar) as a TLDR at the beginning of the issue to keep track of the different branches this issue has created?
Life would be a lot easier if we can work with dask and hyperopt together. |
This is a continuation of a discussion started in #348, where I promised to summarise my thoughts on what could be cleaned up in the current codebase to hopefully make it easier for everyone.
Preface: while trying to build a conda package for hyperopt, I had to run the tests on circleci / appveyor / travis on all possible platforms, and many of them were failing randomly on different boxes. I then spent a good few days digging through the codebase and trying to make sense of it, also noting the existing todos in the comments.
Note 1: while some of the points below may seem too intrusive, please hear me out first.
Note 2: apologies for the wall of text.
I've noticed that some of the files claim they are MIT-licensed, e.g. `hyperopt/ipy.py`, while the whole project is 3-clause BSD. Would the authors consider switching the project to a somewhat friendlier MIT license? (unless there are reasons we're not aware of that prohibit doing so)
In all seriousness, I suggest dropping Python 2 support for good. Before jumping to pros/cons, let's look at the current state of things in Python world:
Pros:
* Dependency on the `six` package can be removed.
* Dependency on the `future` package can be removed.
* Decluttering the imports; all of the below can be removed (this is taken from `hyperopt/fmin.py`, as an example): […] which becomes just […]
* Type annotations can (and should) be used throughout the code; this helps general readability and allows IDEs and static checkers like mypy to verify type correctness. E.g., […] can be annotated as […] You could also use more complex annotations including generics, such as `Dict[K, V]` or `Iterator[T]`. Note that many docstrings that say 'this expects an argument of such and such type and returns such type' can essentially be replaced with a type annotation; instead, docstrings can be used for describing the effects and/or side effects, i.e. what the function actually does.
* Enables the use of `asyncio` if needed.
* Enums should really be treated as such; e.g., […] should probably just be […] (The same goes for string enums). For the sake of backwards-compatibility, functions expecting those enums could take either enums or the contained values (at least during the transition period).
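To make the type-annotation and enum points above a bit more concrete, here is a minimal sketch. The names (`JobState`, `Status`, `coerce_state`, `count_by_state`) and the values are made up for illustration and are not taken from the hyperopt codebase:

```python
from enum import Enum, IntEnum
from typing import Dict, Iterable, Union


class JobState(IntEnum):
    """Hypothetical integer enum replacing bare module-level int constants."""
    NEW = 0
    RUNNING = 1
    DONE = 2
    ERROR = 3


class Status(str, Enum):
    """Hypothetical string enum; members still compare equal to plain strings."""
    OK = "ok"
    FAIL = "fail"


def coerce_state(state: Union[JobState, int]) -> JobState:
    """Accept either an enum member or its raw value (transition period)."""
    return JobState(state)  # raises ValueError for unknown values


def count_by_state(states: Iterable[Union[JobState, int]]) -> Dict[JobState, int]:
    """Count how many jobs are in each state."""
    counts: Dict[JobState, int] = {}
    for raw in states:
        state = coerce_state(raw)
        counts[state] = counts.get(state, 0) + 1
    return counts
```

Because `coerce_state` accepts both forms, existing callers that pass raw integers would keep working while new code switches to the enum members.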
Cons:
[…]

* `distribute_setup.py` can be removed as it's not needed.
* […] `setup.py` except the `setup()` call itself can be removed, it's not needed (I've just checked).
* […] `hyperopt-mongo-worker`) is through entry points hooks and not through `scripts`. E.g., you could have a `hyperopt/mongo/worker.py` with a `main()` function, and then in `setup.py` register […]
* […] `RELEASE.txt` (update the version in `setup.py` -> update the version in `hyperopt/__init__.py` -> sdist -> git tag), I think the versioning process could be simplified by using a tag-based scheme from the start. A good tool for that is `setuptools_scm`, made by the same guys who maintain pip and setuptools. Basically, in `setup.py` instead of hard-coding a version number you do this: […]

A list of things I've noticed, in no particular order, just so I don't forget:
* `class A(object):` becomes just `class A:`
* […] `super().__init__()` in Python 3 (instead of `Base.__init__()` or `super(Foo, self).__init__()`).
* […] `from hyperopt.base import foo` instead of `from .base import foo`.
* […] `TODO` and `XXX` comments and either remove or resolve them (or convert to GH issues), many of them have been there for years.
* […] `SONify` or arguments like `N`.
* […] `fmin` imports `base` which imports `fmin`. There's even a note saying "Stop-gap implementation! fmin should have been a Trials method in the first place but for now it's still sitting in another file." Should it be refactored to become a Trials method then?

It would be nice to have a multiprocessing backend that 'just works'. As for parallelization itself, it could be done via e.g. `joblib`. However, this will involve a non-trivial amount of work, e.g. implementing an (in-memory?) shared database which would collect results across processes. Another option is using distributed (dask), as suggested in #282.

Note: looking at `hyperopt/ipy.py`, it says 'WARNING: IPythonTrials is not as complete, stable'. Would it become redundant if a joblib backend was implemented?

[…] `pymongo` optional?)
optional?)I understand that Mongo backend is an essential part of hyperopt, however quite often it won't be needed (people just import
fmin
andhp
and go on optimizing). It would be nice if, at the very least, the core code didn't try to importpymongo
unless you tried to create aMongoTrials
.It might be nice to provide a clean separation, so that mongo-related code is not intertwined with the core codebase; also,
pymongo
could be made a separate dependency, enabled via[mongo]
setuptools feature (which would also install the mongo backend), like so:Random hiccups I've stumbled upon while using hyperopt myself:
* […] `xgboost` or `lightgbm`). However, there are parameters that are integers (e.g., maximum depth of a tree, or number of estimators). While you can specify `hp.quniform(min, max, 1)`, you will still have to convert it to int manually in the objective function. It would be nice if hyperopt supported this natively as this is a very common use case. E.g., `hp.quniformint(min, max, 1)` and `hp.qloguniformint(min, max)` (with step defaulting to 1).
* […] `Trials` from the objective function. Maybe I'm missing something, but the objective function currently receives space and doesn't have direct access to the trials object? E.g., if you wanted to log something like: […] `Objective` subclass, with a richer API, and providing access to the trials object. Or maybe it could be done differently.
* `Trials.best_trial` will throw an exception (argmin of an empty array) if there have been no successful trials recorded; the code does something like this: […] `Trials.successful_trials` (optionally, also `Trials.has_best`) properties.

[…]

* […] `/tests/` folder which would not have an `__init__.py` in it. In-tree tests could still be easily run via `PYTHONPATH=. <run-tests>` if need be.
* […] `nose`. There's no reason to install it if you're just using the package.
* […] `tox` as well (on Travis) to simplify testing across interpreter versions, both locally and on CI. On Windows though, you'll have to do it manually since Python is normally installed via conda, which is not compatible with venv.

To the tests themselves:
* […] `nose` with a better test runner and an ecosystem of plugins. I could help with migration if need be, once the test failures on master are fixed. Main pros: shared fixtures, fixture parametrization, less boilerplate, expression expansion on failures, native support for exact/approximate matching for numpy arrays (in the most recent versions).
* […] `test_domains` being imported in other test_files). The fixtures could be provided in the form of proper (possibly parametrized) fixtures; other shared stuff could also be exposed via conftest.
* […] `DISPLAY` is not set; running plotting tests opens a new plotting window, which is not nice. Plotting tests should configure the matplotlib backend to avoid either of the above problems.
* […] `test_basic*`, `test_mu_is_used*`, `test_cdf*`, `test_pdf_logpdf*`, `test_random*`, `TestExperimentWithThreads*`, `test_plot*`, `test_q1lognormal*` (exact failures can be reproduced if needed by re-enabling these tests in "Add hyperopt recipe" conda-forge/staged-recipes#4710 and re-running the builds on all platforms).
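For the plotting point above, one possible approach (a sketch, assuming the tests go through a single entry point such as a `conftest.py`) is to select matplotlib's non-interactive `Agg` backend through the `MPLBACKEND` environment variable before matplotlib is first imported. The helper name `force_headless_backend` is made up for illustration:

```python
import os
from typing import MutableMapping


def force_headless_backend(env: MutableMapping[str, str] = os.environ) -> str:
    """Select matplotlib's non-interactive Agg backend via MPLBACKEND.

    Meant to run before matplotlib is first imported (e.g. at the top of a
    conftest.py), so plotting tests neither need DISPLAY nor open windows.
    """
    env["MPLBACKEND"] = "Agg"
    return env["MPLBACKEND"]


# Apply to the current process environment at import time.
force_headless_backend()
```

Setting the variable in the CI configuration instead would have the same effect without touching the test code.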