Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request : Make FLAML installable with Conda #194

Closed
fleuryc opened this issue Sep 8, 2021 · 35 comments
Closed

Feature Request : Make FLAML installable with Conda #194

fleuryc opened this issue Sep 8, 2021 · 35 comments
Labels
help wanted Extra attention is needed

Comments

@fleuryc
Copy link

fleuryc commented Sep 8, 2021

Basically the title.

Gherkin style :

  • AS A ML developer
  • WHEN I run conda install flaml (or conda -c conda-forge install flaml)
  • THEN it installs FLAML in my current Conda environment
    • AND I can execute the following code :
from flaml import AutoML
from sklearn.datasets import load_iris
# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "test/iris.log",
}
X_train, y_train = load_iris(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict_proba(X_train))
# Export the best model
print(automl.model)
@sonichi sonichi added the help wanted Extra attention is needed label Sep 9, 2021
@sonichi sonichi pinned this issue Sep 11, 2021
@sonichi
Copy link
Collaborator

sonichi commented Sep 11, 2021

Can someone help with this issue? Right now catboost is the only required package unavailable from conda. I can make flaml work when catboost is not installed. Can someone help with uploading the package to conda after that?

@MichalChromcak
Copy link
Collaborator

MichalChromcak commented Oct 6, 2021

@sonichi I can have a look into that. If you want, you can check the progress here

Right now, the issue is with catboost (might need a new release to make it work first)

@sonichi
Copy link
Collaborator

sonichi commented Oct 6, 2021

@MichalChromcak Thanks for taking care of it. Do I need to remove catboost from the "install_requires" field in setup.py? That will affect the pypi package. Is there a way you could skip catboost only in the conda package?

@MichalChromcak
Copy link
Collaborator

@sonichi

Catboost

catboost in install_requires is really the cause.

I tried to keep it along the way, yet it requires plotly, matplotlib and graphviz to be installed and even with that, the graphviz being installed over conda (see here) still fails to build on the pip check command with catboost requires graphviz to be installed (see here)

Unfortunately, install_requires is the minimum, that is installed along the way (the script part of the build contains command pip install --no-deps https://files.pythonhosted.org/..../FLAML-0.6.5-py3-none-any.whl)

Tarball over wheel

Recommended way is also to use tarball distribution (not wheels) - would you be able to update that as well?

If yes, feel free to re-use e.g. hcrystalball's CD github actions

What are your thoughts on this?

@sonichi
Copy link
Collaborator

sonichi commented Oct 7, 2021

@MichalChromcak Thanks for the suggestions. @qingyun-wu I tend to remove catboost from install_requires and make it optional. What do you think? We'll need to make a PR in the automlbenchmark to install the catboost option. Also, we may need to document this in the notebook example.

@qingyun-wu
Copy link
Collaborator

Removing catboost from `install_requires' sounds good to me. I can make a PR to automlbenchmark to reflect this change.

@sonichi sonichi mentioned this issue Oct 9, 2021
@sonichi
Copy link
Collaborator

sonichi commented Oct 9, 2021

@MichalChromcak and @qingyun-wu v0.6.6 is published to PyPI with the CD github actions. Thanks @MichalChromcak again for recommending it.
Request for @MichalChromcak could you try the conda package build again?
Request for @qingyun-wu could you make the PR in the automlbenchmark?

@sonichi
Copy link
Collaborator

sonichi commented Oct 11, 2021

@MichalChromcak please wait for v0.6.7 as I am adding pandas into install_requires in #244

@sonichi
Copy link
Collaborator

sonichi commented Oct 11, 2021

@MichalChromcak v0.6.7 is released.

@MichalChromcak
Copy link
Collaborator

MichalChromcak commented Oct 13, 2021

@sonichi The conda package with version v0.6.7 is available here.
The code snippet metined by @fleuryc works if a test folder exists (otherwise it fails for me), but I suppose this is unrelated.
I am also adding a badge to the README.md (feel free to position it differently).

There are still few things to consider...

  1. There is a following automation in place

    1. When a new release is created on PyPI, the conda-forge's regro-cf-autotick-bot will create a PR with new build to be created
    2. conda-forge automerge bot (github actions) will merge the PR automatically (notifications are being sent do feedstock maintainers), but does not update the dependent package versions in the conda recipe (meta.yaml)
  2. Dependent packages version updates

    1. If there is an update in setup.py of some dependency (say pandas to a higher version) it should be reflected also in the conda recipe. We should align how to make this process easy and reliable. Depending on how often the release without dependent package version changes is made, we could decide whether to
      1. Switch off automerge bot and while reviewing the PR, directly update the dependencies
      2. Keep automerge - when notified, check for version matches and if an update is needed create new PR(s) with build_number>0
  3. Maintainers

    1. I think it is a good practise to have more than 1 maintainer. Is there someone, who would like to be added? Once it is setup, it should rather be about maintenance I guess. Adding a maintainer is easily done with a PR with comment in the conda feedstock (I would just need a github account name to extend the current list of conda maintainers)

@sonichi
Copy link
Collaborator

sonichi commented Oct 13, 2021

@sonichi The conda package with version v0.6.7 is available here. The code snippet metined by @fleuryc works if a test folder exists (otherwise it fails for me), but I suppose this is unrelated. I am also adding a badge to the README.md (feel free to position it differently).

That's fantastic. I really appreciate it.

There are still few things to consider...

  1. There is a following automation in place

    1. When a new release is created on PyPI, the conda-forge's regro-cf-autotick-bot will create a PR with new build to be created
    2. conda-forge automerge bot (github actions) will merge the PR automatically (notifications are being sent do feedstock maintainers), but does not update the dependent package versions in the conda recipe (meta.yaml)
  2. Dependent packages version updates

    1. If there is an update in setup.py of some dependency (say pandas to a higher version) it should be reflected also in the conda recipe. We should align how to make this process easy and reliable. Depending on how often the release without dependent package version changes is made, we could decide whether to

      1. Switch off automerge bot and while reviewing the PR, directly update the dependencies
      2. Keep automerge - when notified, check for version matches and if an update is needed create new PR(s) with build_number>0

Dependency in install_requires rarely changes. So we can keep automerge.

  1. Maintainers

    1. I think it is a good practise to have more than 1 maintainer. Is there someone, who would like to be added? Once it is setup, it should rather be about maintenance I guess. Adding a maintainer is easily done with a PR with comment in the conda feedstock (I would just need a github account name to extend the current list of conda maintainers)

@fleuryc @qingyun-wu @ekzhu @slhuang @cdeil @yue-msr @anshumandutt @aminsaied @angela97lin @Yard1 @gianpdomiziani @bnriiitb @int-chaos any volunteer to be added to the conda feedstock maintainer? See the paragraph above.

@MichalChromcak
Copy link
Collaborator

@sonichi glad to be helpful :)

Good, we should just make sure, that if package versions in install_requires updates, we update it also in the conda recipe.

@sonichi
Copy link
Collaborator

sonichi commented Oct 14, 2021

@MichalChromcak To confirm, if the package version in install_requires changes, we need to update in the conda recipe before we publish the pypi package, right? Otherwise the automerge will publish a conda package with wrong dependencies and I suppose there is no way to revert it.

@MichalChromcak
Copy link
Collaborator

@sonichi One can make a manual second build with updated recipe. It might take some time (to 1 hour?) until conda install would prefer the new build for the same version…(see the two builds on the anaconda.org as an example)

@sonichi
Copy link
Collaborator

sonichi commented Oct 15, 2021

It's good to know that the same version can be overwritten, which is different from pypi packages. Thanks!

@sonichi
Copy link
Collaborator

sonichi commented Oct 22, 2021

@MichalChromcak FYI #223 (comment). There seems to be a conda package of catboost, though I've no idea how to leverage it.

@MichalChromcak
Copy link
Collaborator

@sonichi I can look into that again if you want to extend the dependency to catboost.

I dropped the idea of running optional pip check command in the test section and only check for importability, so it might be easier to accomplish it.

Do you want me to try adding it?

One should also make a mind in general what should and what should not be a part of conda bundle. Whether to

  • keep it rather small (that's what we went for in HCrystalBall) and let others install other packages along side (allowing e.g. different versions) or
  • keep it rich (big bundle to download, harder to maintain, easier to install*).

Note: The ease of installation might though be solved with adding a full blown environment.yml somewhere to the repository as an example how to typically install everything (see e.g. this section)

@MichalChromcak
Copy link
Collaborator

@sonichi let me know, whether to make a change still or whether the issue can be closed already please :)

@sonichi
Copy link
Collaborator

sonichi commented Oct 22, 2021

@MichalChromcak I'm with you in keeping the installation small. We don't need to include catboost in the bundle. I'm just curious what failed our earlier conda bundle if catboost is already in conda. I thought it was because catboost was not available in conda.

@sonichi sonichi mentioned this issue Oct 25, 2021
@sonichi
Copy link
Collaborator

sonichi commented Oct 27, 2021

@MichalChromcak could you change the required sklearn version in the conda package to be >=0.24? The reason is in #260

@MichalChromcak
Copy link
Collaborator

@sonichi sorry, wasn't around for a while.

The catboost - if I remember it well the problem was with other libraries, that would need to be added - like graphviz. For some reason, it seems, that the conda install <package_name> work as expected, but following pip check command is sometimes missing already existing installation. The pip check is no longer there, so maybe it would work now.

I will update the sklearn version and try again if the latest release is as expected.

@MichalChromcak
Copy link
Collaborator

MichalChromcak commented Nov 4, 2021

@sonichi there already is sklearn >=0.24 in the recipe. The following now is the minimal configuration of the conda environment if created as is making the test above pass. Any update needed?

conda create -n flaml
conda install -c conda-forge flaml
conda list
# packages in environment at /home/michal/miniconda3/envs/flaml:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
_py-xgboost-mutex         2.0                       cpu_0    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
flaml                     0.7.0              pyhd8ed1ab_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libblas                   3.9.0           12_linux64_openblas    conda-forge
libcblas                  3.9.0           12_linux64_openblas    conda-forge
libffi                    3.4.2                h9c3ff4c_4    conda-forge
libgcc-ng                 11.2.0              h1d223b6_11    conda-forge
libgfortran-ng            11.2.0              h69a702a_11    conda-forge
libgfortran5              11.2.0              h5c6108e_11    conda-forge
libgomp                   11.2.0              h1d223b6_11    conda-forge
liblapack                 3.9.0           12_linux64_openblas    conda-forge
libopenblas               0.3.18          pthreads_h8fe5266_0    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_11    conda-forge
libxgboost                1.3.3                h2531618_0  
libzlib                   1.2.11            h36c2ea0_1013    conda-forge
lightgbm                  3.2.1            py39he80948d_0    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.21.3           py39hdbf815f_1    conda-forge
openssl                   3.0.0                h7f98852_2    conda-forge
pandas                    1.3.4            py39hde0f152_0    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
py-xgboost                1.3.3            py39h06a4308_0  
python                    3.9.7           hf930737_3_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
scikit-learn              1.0.1            py39h7c5d8c9_1    conda-forge
scipy                     1.7.1            py39hee8e79c_0    conda-forge
setuptools                58.5.2           py39hf3d152e_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlite                    3.36.0               h9cd32fc_2    conda-forge
threadpoolctl             3.0.0              pyh8a188c0_0    conda-forge
tk                        8.6.11               h27826a3_1    conda-forge
tzdata                    2021e                he74cb21_0    conda-forge
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
xgboost                   1.3.3            py39h06a4308_0  
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h36c2ea0_1013    conda-forge

@sonichi
Copy link
Collaborator

sonichi commented Nov 4, 2021

@MichalChromcak that looks good to me. Thanks for double checking.
@qingyun-wu @fleuryc @cdeil @angela97lin Could you test the new conda package, especially on macos? We can close this issue if it works.
@cdeil you had a good suggestion about adding conda package test into the CI. Would you like to create a PR or an issue for it?

@cdeil
Copy link
Collaborator

cdeil commented Nov 4, 2021

To be realistic, no, I don't have any spare time, sorry.

@skzhang1
Copy link
Collaborator

@MichalChromcak @sonichi Can I help maintain the conda package? I am very happy to do this.

@MichalChromcak
Copy link
Collaborator

@sonichi such test might be added to the test section of the conda package creation if you want - replacing just import check with something more detailed. Is there already some code, which should be executed to check for correctness?

@Shao-kun-Zhang from my end - sure, happy to have more people on board.

@skzhang1
Copy link
Collaborator

@MichalChromcak ok, If there is anything I can help with, please tell me!

@sonichi
Copy link
Collaborator

sonichi commented Nov 12, 2021

@MichalChromcak what about adding the test for the example code in the main post?
Change "test/iris.log" into "iris.log".
@Shao-kun-Zhang Thanks for helping!

@MichalChromcak
Copy link
Collaborator

@sonichi Yes I can do that :)

@MichalChromcak
Copy link
Collaborator

@sonichi, @Shao-kun-Zhang
I added the test in this PR.

There were few architectural decisions made, let me comment on them...

I decided to keep the test within the flaml-feedstock repository, as it provides more control over

  • the environment (no need to install all the dependencies)
  • test duration
  • test definition itself

On the other hand, this setup might cause out of sync behavior with the package development.

So if wanted, another options are to:

  1. use existing test within flaml repository (less prefered)
    • this might need dependencies, that are not currently installed
    • we might get in more complicated and time consuming test setup here
  2. create a new test in flaml repository, which is dedicated to the conda forge build
    • in this case, the test file itself might be the same as the one my PR introduces, just live in different repository
    • such setup might benefit from pytest markers (we could mark the file to not run by default in your CI, and explicitly run it in conda forge build test section)

As this seems nicer for the testing code locality, both above mentioned options require contributions to the flaml repository, making redeployments if the test changes and so on.

Let me know, if you prefer the current way, where the conda forge build is self contained also with the test or any of the proposed changes.

@MichalChromcak
Copy link
Collaborator

MichalChromcak commented Nov 16, 2021

@Shao-kun-Zhang I added you to the list of maintainers (via admin web services over issue#6, that created PR#7 and I already approved the review (just mentioning the process for further extension of maintainers if you would do that later with someone else and also to mention how cool conda forge has this setup).

Can you double check, that expected changes work and add comment to the PR, that you accept this?

@sonichi
Copy link
Collaborator

sonichi commented Nov 16, 2021

@sonichi, @Shao-kun-Zhang I added the test in this PR.

Thank you! It looks good.

There were few architectural decisions made, let me comment on them...

I decided to keep the test within the flaml-feedstock repository, as it provides more control over

  • the environment (no need to install all the dependencies)
  • test duration
  • test definition itself

On the other hand, this setup might cause out of sync behavior with the package development.

So if wanted, another options are to:

  1. use existing test within flaml repository (less prefered)

    • this might need dependencies, that are not currently installed
    • we might get in more complicated and time consuming test setup here
  2. create a new test in flaml repository, which is dedicated to the conda forge build

    • in this case, the test file itself might be the same as the one my PR introduces, just live in different repository
    • such setup might benefit from pytest markers (we could mark the file to not run by default in your CI, and explicitly run it in conda forge build test section)

I like option 2. We can put the test file you need in test/conda/.

As this seems nicer for the testing code locality, both above mentioned options require contributions to the flaml repository, making redeployments if the test changes and so on.

Let me know, if you prefer the current way, where the conda forge build is self contained also with the test or any of the proposed changes.

@MichalChromcak
Copy link
Collaborator

Ok, the last change was made and conda distribution should now contain test taken from flaml repo. pytest.ini was also not copied over during the distribution creation, but as pytest markers work also without it I just skipped copying it. Changing pytest.ini option into setup.cfg should not need any change in the flaml-feedstock repo now.

I would propose to close this now, what do you think @sonichi?

@sonichi
Copy link
Collaborator

sonichi commented Nov 30, 2021

@MichalChromcak Thanks for the great effort in helping with this issue. I really appreciate it. I'm closing it now.

@sonichi sonichi closed this as completed Nov 30, 2021
@sonichi sonichi unpinned this issue Nov 30, 2021
@fleuryc
Copy link
Author

fleuryc commented Dec 1, 2021

Thanks everyone ! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

6 participants