
OneHotEncoder Failure: Simple example failure #147

Closed
chclam opened this issue Mar 21, 2022 · 5 comments

Comments

@chclam
Contributor

chclam commented Mar 21, 2022

An error occurs when I try to run the following simple example from the main page:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, accuracy_score
from gama import GamaClassifier

if __name__ == '__main__':
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    automl = GamaClassifier(max_total_time=180, store="nothing")
    print("Starting `fit` which will take roughly 3 minutes.")
    automl.fit(X_train, y_train)

    label_predictions = automl.predict(X_test)
    probability_predictions = automl.predict_proba(X_test)

    print('accuracy:', accuracy_score(y_test, label_predictions))
    print('log loss:', log_loss(y_test, probability_predictions))
    # the `score` function outputs the score on the metric optimized towards (by default, `log_loss`)
    print('log_loss', automl.score(X_test, y_test))

The error I get:

Traceback (most recent call last):
  File "/Users/chris/Development/gradproject/issues/gama/gama/./test.py", line 13, in <module>
    automl.fit(X_train, y_train)
  File "/Users/chris/Development/gradproject/issues/gama/gama/gama/GamaClassifier.py", line 134, in fit
    super().fit(x, y, *args, **kwargs)
  File "/Users/chris/Development/gradproject/issues/gama/gama/gama/gama.py", line 502, in fit
    self._x, self._basic_encoding_pipeline = basic_encoding(
  File "/Users/chris/Development/gradproject/issues/gama/gama/gama/utilities/preprocessing.py", line 63, in basic_encoding
    x_enc = encoding_pipeline.fit_transform(x, y=None)  # Is this dangerous?
  File "/usr/local/lib/python3.9/site-packages/sklearn/pipeline.py", line 434, in fit_transform
    return last_step.fit_transform(Xt, y, **fit_params_last_step)
  File "/usr/local/lib/python3.9/site-packages/sklearn/base.py", line 847, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/usr/local/lib/python3.9/site-packages/category_encoders/one_hot.py", line 152, in fit
    oe_missing_strat = {
KeyError: 'ignore'

It seems to be caused by passing an invalid value to the `handle_missing` parameter of `OneHotEncoder` in the category_encoders dependency; the `KeyError` shows that the value `'ignore'` is being passed.
According to the docs, the valid options are `error`, `return_nan`, `value`, and `indicator`, where `value` is the default.
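The traceback ends in a bare `KeyError` rather than a descriptive `ValueError` because, judging by the failing line (`oe_missing_strat = {`), the option is looked up in a dictionary inside `fit`. The sketch below reproduces that failure mode with a toy stand-in and adds a hypothetical guard that rejects unsupported values up front. It assumes only the four documented options listed above; `VALID_HANDLE_MISSING`, `lookup_like_fit`, and `check_handle_missing` are illustrative names, not part of category_encoders or GAMA.

```python
# Options documented for category_encoders.OneHotEncoder's handle_missing
# parameter, as listed in this issue; "value" is the documented default.
VALID_HANDLE_MISSING = ("error", "return_nan", "value", "indicator")


def lookup_like_fit(handle_missing: str) -> str:
    """Toy stand-in for the dict lookup seen in the traceback:
    an unsupported key surfaces as a bare KeyError."""
    strategies = {option: option for option in VALID_HANDLE_MISSING}
    return strategies[handle_missing]


def check_handle_missing(handle_missing: str) -> str:
    """Hypothetical guard: reject unsupported values with a readable error."""
    if handle_missing not in VALID_HANDLE_MISSING:
        raise ValueError(
            f"handle_missing={handle_missing!r} is not supported; "
            f"expected one of {VALID_HANDLE_MISSING}"
        )
    return handle_missing


if __name__ == "__main__":
    try:
        lookup_like_fit("ignore")
    except KeyError as exc:
        print("KeyError:", exc)  # prints: KeyError: 'ignore'
    print(check_handle_missing("value"))  # prints: value
```

Validating at construction time like this would turn the opaque `KeyError: 'ignore'` into an error message that names the parameter and the accepted values.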

@PGijsbers
Member

Hi, thanks for opening the issue and providing a solution 👍
It looks like this is specific to the latest category_encoders release (the change was undocumented and came without deprecation warnings) :)

For those who run into this issue before a new GAMA release is available on PyPI: please downgrade category_encoders to 2.3:
pip install category-encoders==2.3
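To confirm which category_encoders release an environment actually picked up after applying the pin, a small check along these lines could help. It is a sketch assuming Python 3.8+ (for `importlib.metadata`) and that only the major.minor part of the version matters; `major_minor` and `has_pinned_release` are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.3.0'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def has_pinned_release(dist: str = "category_encoders", pin: tuple = (2, 3)) -> bool:
    """Return True if `dist` is installed and its major.minor equals `pin`."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        # Not installed at all, so the pin is certainly not in effect.
        return False
    return major_minor(installed) == pin


if __name__ == "__main__":
    print("category_encoders 2.3 installed:", has_pinned_release())
```

Comparing only major.minor keeps the check tolerant of patch releases in the 2.3 series while still flagging any newer release.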

@chclam
Contributor Author

chclam commented Mar 23, 2022

Hey, I'm glad to be of any help 👍

@alanwilter

I'm hitting the same problem right now, and I need it to work in Docker.

If you don't have a Linux machine, try it on gitpod.io with https://github.com/openml/automlbenchmark:

yes | python runbenchmark.py gama:latest example test -m docker -s force
...
Collecting liac-arff>=2.2.2
  Downloading liac-arff-2.5.0.tar.gz (13 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Requirement already satisfied: psutil in ./frameworks/GAMA/venv/lib/python3.7/site-packages (from gama==22.0.1.dev0) (5.8.0)
ERROR: Ignored the following versions that require a different python version: 1.1.0 Requires-Python >=3.8; 1.1.0rc1 Requires-Python >=3.8; 1.1.1 Requires-Python >=3.8; 1.1.2 Requires-Python >=3.8; 1.4.0 Requires-Python >=3.8; 1.4.0rc0 Requires-Python >=3.8; 1.4.1 Requires-Python >=3.8; 1.4.2 Requires-Python >=3.8; 1.4.3 Requires-Python >=3.8; 1.4.4 Requires-Python >=3.8; 1.5.0rc0 Requires-Python >=3.8; 1.8.0 Requires-Python >=3.8,<3.11; 1.8.0rc1 Requires-Python >=3.8,<3.11; 1.8.0rc2 Requires-Python >=3.8,<3.11; 1.8.0rc3 Requires-Python >=3.8,<3.11; 1.8.0rc4 Requires-Python >=3.8,<3.11; 1.8.1 Requires-Python >=3.8,<3.11; 1.9.0 Requires-Python >=3.8,<3.12; 1.9.0rc1 Requires-Python >=3.8,<3.12; 1.9.0rc2 Requires-Python >=3.8,<3.12; 1.9.0rc3 Requires-Python >=3.8,<3.12; 1.9.1 Requires-Python >=3.8,<3.12
ERROR: Could not find a version that satisfies the requirement scikit-learn>=1.1.0 (from gama) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0b1, 0.15.0b2, 0.15.0, 0.15.1, 0.15.2, 0.16b1, 0.16.0, 0.16.1, 0.17b1, 0.17, 0.17.1, 0.18, 0.18.1, 0.18.2, 0.19b2, 0.19.0, 0.19.1, 0.19.2, 0.20rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.21rc2, 0.21.0, 0.21.1, 0.21.2, 0.21.3, 0.22rc2.post1, 0.22rc3, 0.22, 0.22.1, 0.22.2, 0.22.2.post1, 0.23.0rc1, 0.23.0, 0.23.1, 0.23.2, 0.24.dev0, 0.24.0rc1, 0.24.0, 0.24.1, 0.24.2, 1.0rc1, 1.0rc2, 1.0, 1.0.1, 1.0.2)
ERROR: No matching distribution found for scikit-learn>=1.1.0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'gama'

Cloning into '/bench/frameworks/GAMA/lib/gama'...
ERROR: Ignored the following versions that require a different python version: 1.1.0 Requires-Python >=3.8; 1.1.0rc1 Requires-Python >=3.8; 1.1.1 Requires-Python >=3.8; 1.1.2 Requires-Python >=3.8; 1.4.0 Requires-Python >=3.8; 1.4.0rc0 Requires-Python >=3.8; 1.4.1 Requires-Python >=3.8; 1.4.2 Requires-Python >=3.8; 1.4.3 Requires-Python >=3.8; 1.4.4 Requires-Python >=3.8; 1.5.0rc0 Requires-Python >=3.8; 1.8.0 Requires-Python >=3.8,<3.11; 1.8.0rc1 Requires-Python >=3.8,<3.11; 1.8.0rc2 Requires-Python >=3.8,<3.11; 1.8.0rc3 Requires-Python >=3.8,<3.11; 1.8.0rc4 Requires-Python >=3.8,<3.11; 1.8.1 Requires-Python >=3.8,<3.11; 1.9.0 Requires-Python >=3.8,<3.12; 1.9.0rc1 Requires-Python >=3.8,<3.12; 1.9.0rc2 Requires-Python >=3.8,<3.12; 1.9.0rc3 Requires-Python >=3.8,<3.12; 1.9.1 Requires-Python >=3.8,<3.12
ERROR: Could not find a version that satisfies the requirement scikit-learn>=1.1.0 (from gama) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0b1, 0.15.0b2, 0.15.0, 0.15.1, 0.15.2, 0.16b1, 0.16.0, 0.16.1, 0.17b1, 0.17, 0.17.1, 0.18, 0.18.1, 0.18.2, 0.19b2, 0.19.0, 0.19.1, 0.19.2, 0.20rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.21rc2, 0.21.0, 0.21.1, 0.21.2, 0.21.3, 0.22rc2.post1, 0.22rc3, 0.22, 0.22.1, 0.22.2, 0.22.2.post1, 0.23.0rc1, 0.23.0, 0.23.1, 0.23.2, 0.24.dev0, 0.24.0rc1, 0.24.0, 0.24.1, 0.24.2, 1.0rc1, 1.0rc2, 1.0, 1.0.1, 1.0.2)
ERROR: No matching distribution found for scikit-learn>=1.1.0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'gama'

Command '['/bench/frameworks/GAMA/setup.sh', 'latest']' returned non-zero exit status 1.
The command '/bin/sh -c $PY runbenchmark.py gama:latest -s only' returned a non-zero code: 2
Traceback (most recent call last):
  File "runbenchmark.py", line 182, in <module>
    bench.setup(amlb.SetupMode[args.setup])
  File "/bench/amlb/benchmark.py", line 126, in setup
    _activity_timeout_=rconfig().setup.activity_timeout)
  File "/bench/frameworks/GAMA/__init__.py", line 7, in setup
    call_script_in_same_dir(__file__, "setup.sh", *args, **kwargs)
  File "/bench/amlb/utils/process.py", line 259, in call_script_in_same_dir
    return run_script(script_path, *args, **kwargs)
  File "/bench/amlb/utils/process.py", line 253, in run_script
    return run_cmd(script_path, *args, **kwargs)
  File "/bench/amlb/utils/process.py", line 247, in run_cmd
    raise e
  File "/bench/amlb/utils/process.py", line 234, in run_cmd
    preexec_fn=params.preexec_fn)
  File "/bench/amlb/utils/process.py", line 77, in run_subprocess
    raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/bench/frameworks/GAMA/setup.sh', 'latest']' returned non-zero exit status 1.

The command '/bin/sh -c $PY runbenchmark.py gama:latest -s only' returned a non-zero code: 2

Command 'docker build --no-cache -t automlbenchmark/gama:latest-dev -f /workspace/automlbenchmark/frameworks/GAMA/.setup/Dockerfile .' returned non-zero exit status 2.
Traceback (most recent call last):
  File "runbenchmark.py", line 182, in <module>
    bench.setup(amlb.SetupMode[args.setup])
  File "/workspace/automlbenchmark/amlb/runners/container.py", line 80, in setup
    self.image = self._build_image(cache=(mode != SetupMode.force))
  File "/workspace/automlbenchmark/amlb/runners/container.py", line 194, in _build_image
    self._run_container_build_command(image, cache)
  File "/workspace/automlbenchmark/amlb/runners/docker.py", line 98, in _run_container_build_command
    run_cmd("docker build {options} -t {container} -f {script} .".format(
  File "/workspace/automlbenchmark/amlb/utils/process.py", line 247, in run_cmd
    raise e
  File "/workspace/automlbenchmark/amlb/utils/process.py", line 221, in run_cmd
    completed = run_subprocess(str_cmd if params.shell else full_cmd,
  File "/workspace/automlbenchmark/amlb/utils/process.py", line 77, in run_subprocess
    raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'docker build --no-cache -t automlbenchmark/gama:latest-dev -f /workspace/automlbenchmark/frameworks/GAMA/.setup/Dockerfile .' returned non-zero exit status 2.

If I run it locally, it works, but only because my local Python is 3.8.

@alanwilter

Never mind: I changed the Python version to 3.8 in resources/config.yaml (python: 3.8) and it worked.

@PGijsbers
Member

As expected: the latest GAMA release (22.0.0) is only available for Python 3.8+.
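The Docker log shows the same constraint from pip's side: on Python 3.7, pip skips releases that declare `Requires-Python >=3.8` and so finds no version satisfying GAMA's `scikit-learn>=1.1.0` requirement, which only surfaces later as `No module named 'gama'`. A minimal pre-flight check that fails fast instead could look like the sketch below. It assumes the 3.8 minimum stated in this comment; `MIN_PYTHON` and `python_supported` are hypothetical names, not part of GAMA or automlbenchmark.

```python
import sys

# Minimum interpreter version stated in this thread for GAMA 22.0.0.
MIN_PYTHON = (3, 8)


def python_supported(current=None, minimum=MIN_PYTHON) -> bool:
    """Return True if the interpreter version meets the minimum.

    `current` defaults to the running interpreter's (major, minor).
    """
    if current is None:
        current = sys.version_info[:2]
    # Tuple comparison orders versions correctly, e.g. (3, 10) > (3, 8).
    return tuple(current) >= tuple(minimum)


if __name__ == "__main__":
    if not python_supported():
        sys.exit(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required; "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    print("Python version OK")
```

Comparing version tuples rather than strings avoids the classic pitfall where `"3.10" < "3.8"` lexicographically.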
