Input contains NaN, infinity or a value too large for ('float64') #105

Closed
paulomann opened this issue Jul 16, 2023 · 2 comments

Comments

paulomann commented Jul 16, 2023

  • OCTIS version: 1.13.0
  • Python version: 3.10.12
  • Operating System: Google Colab

Description

I am trying to run the Google Colab example provided in the repo README. The only change I made was the dataset: I load a custom dataset in .tsv format with load_custom_dataset_from_folder(). The algorithm ran without problems with a small vocabulary (39 words), but with a "big" vocabulary (7894 words) it fails with a ValueError raised in sklearn/utils/validation.py, shown under "What I Did" below.

Also, note that my dataset is split into train (70%), val (10%) and test (20%).

What I Did

Current call:  0
Current call:  1
Current call:  2
Current call:  3
Current call:  4
Current call:  5
Current call:  6
Current call:  7
Current call:  8
Current call:  9
Current call:  10
Current call:  11
Current call:  12
Current call:  13
Current call:  14
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-31-2dff32b1aedb>] in <cell line: 2>()
      1 optimizer=Optimizer()
----> 2 optimization_result = optimizer.optimize(
      3     model, dataset, npmi, search_space, number_of_call=optimization_runs,
      4     model_runs=model_runs, save_models=True,
      5     extra_metrics=None, # to keep track of other metrics

10 frames
[/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py] in _assert_all_finite(X, allow_nan, msg_dtype)
    101                 not allow_nan and not np.isfinite(X).all()):
    102             type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103             raise ValueError(
    104                     msg_err.format
    105                     (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

paulomann (Author) commented

I also tried running it locally, with a different version and environment:

  • OCTIS version: 1.10.2
  • Python version: 3.7.3
  • Operating System: Linux

This gave me the full traceback below. Inspecting the variables, I found that f_val had the value -inf when it was passed to opt.tell().

Traceback (most recent call last):
  File "/home/paulomann/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/paulomann/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/paulomann/workspace/reddit-topic-modelling/octis_training/training_and_optimization.py", line 102, in <module>
    model_runs=5, plot_best_seen=True) # number of runs of the topic model
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/octis/optimization/optimizer.py", line 160, in optimize
    results = self._optimization_loop(opt)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/octis/optimization/optimizer.py", line 288, in _optimization_loop
    res = opt.tell(next_x, f_val)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/skopt/optimizer/optimizer.py", line 493, in tell
    return self._tell(x, y, fit=fit)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/skopt/optimizer/optimizer.py", line 536, in _tell
    est.fit(self.space.transform(self.Xi), self.yi)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_forest.py", line 304, in fit
    accept_sparse="csc", dtype=DTYPE)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 805, in check_X_y
    ensure_2d=False, dtype=None)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 645, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 99, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
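
The traceback shows the non-finite f_val only being rejected deep inside scikit-learn, after it has been passed to opt.tell(next_x, f_val). A minimal sketch of an earlier check is given below. The helper name require_finite and its error message are hypothetical and are not part of OCTIS or scikit-optimize; the idea is only to fail at the call that produced the bad metric value, with a message that points at the data rather than at the surrogate model.

```python
import math


def require_finite(f_val: float, call_index: int) -> float:
    """Return f_val unchanged, or raise if it is NaN or +/-inf.

    Hypothetical helper: intended to be called on the objective value
    before it is handed to an optimizer's tell() method.
    """
    if not math.isfinite(f_val):
        raise ValueError(
            f"Objective value at call {call_index} is {f_val!r}; "
            "check that the metric can be computed on this dataset "
            "(for example, that every vocabulary word occurs in the corpus)."
        )
    return f_val


# A finite value passes through untouched.
ok_value = require_finite(0.25, call_index=0)

# A non-finite value is reported immediately.
try:
    require_finite(float("-inf"), call_index=14)
except ValueError as err:
    message = str(err)
```

With a check like this, the run would stop with a message naming the failing call instead of the generic scikit-learn validation error.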

paulomann (Author) commented

It was related to a topic that was absent from the dataset. Due to a bug on my side, my vocabulary contained words that did not occur in the primary dataset.
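
For anyone hitting the same error, a quick way to detect this condition before running the optimizer is to compare the vocabulary against the tokens that actually occur in the corpus. The sketch below assumes the corpus is available as a list of token lists and the vocabulary as a list of strings; the function name find_unused_vocabulary and the sample data are hypothetical and do not come from OCTIS.

```python
from typing import Iterable, List, Set


def find_unused_vocabulary(
    vocabulary: Iterable[str], documents: Iterable[List[str]]
) -> Set[str]:
    """Return the vocabulary words that never occur in any document."""
    seen: Set[str] = set()
    for tokens in documents:
        seen.update(tokens)
    return {word for word in vocabulary if word not in seen}


# Toy data for illustration only.
docs = [["topic", "model"], ["model", "corpus"]]
vocab = ["topic", "model", "corpus", "ghost"]

missing = find_unused_vocabulary(vocab, docs)
print(sorted(missing))  # prints ['ghost']
```

If the returned set is non-empty, the vocabulary file and the corpus are out of sync and should be regenerated together before training.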
