Input contains NaN, infinity or a value too large for ('float64') #105

paulomann · 2023-07-16T17:51:19Z

OCTIS version: 1.13.0
Python version: 3.10.12
Operating System: Google Colab

Description

I am trying to run the Google Colab example provided in the repo README. The only change I made was the dataset: I load a custom dataset in .tsv format using load_custom_dataset_from_folder(). The optimization ran without problems with a small vocabulary (39 words), but with a "big" vocabulary (7894 words) I got an error from sklearn/utils/validation.py, shown under "What I Did".

Also, note that my dataset is split into train (70%), val (10%) and test (20%).

What I Did

Current call:  0
Current call:  1
Current call:  2
Current call:  3
Current call:  4
Current call:  5
Current call:  6
Current call:  7
Current call:  8
Current call:  9
Current call:  10
Current call:  11
Current call:  12
Current call:  13
Current call:  14
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-31-2dff32b1aedb>] in <cell line: 2>()
      1 optimizer=Optimizer()
----> 2 optimization_result = optimizer.optimize(
      3     model, dataset, npmi, search_space, number_of_call=optimization_runs,
      4     model_runs=model_runs, save_models=True,
      5     extra_metrics=None, # to keep track of other metrics

10 frames
[/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py] in _assert_all_finite(X, allow_nan, msg_dtype)
    101                 not allow_nan and not np.isfinite(X).all()):
    102             type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103             raise ValueError(
    104                     msg_err.format
    105                     (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

paulomann · 2023-07-16T17:56:34Z

I also tried running it locally, with a different version and environment:

Octis: 1.10.2
Python: 3.7.3
OS: Linux

This is the full traceback. On inspection, f_val had a value of -inf:

Traceback (most recent call last):
  File "/home/paulomann/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/paulomann/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/paulomann/.vscode-server/extensions/ms-python.python-2023.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/paulomann/workspace/reddit-topic-modelling/octis_training/training_and_optimization.py", line 102, in <module>
    model_runs=5, plot_best_seen=True) # number of runs of the topic model
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/octis/optimization/optimizer.py", line 160, in optimize
    results = self._optimization_loop(opt)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/octis/optimization/optimizer.py", line 288, in _optimization_loop
    res = opt.tell(next_x, f_val)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/skopt/optimizer/optimizer.py", line 493, in tell
    return self._tell(x, y, fit=fit)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/skopt/optimizer/optimizer.py", line 536, in _tell
    est.fit(self.space.transform(self.Xi), self.yi)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_forest.py", line 304, in fit
    accept_sparse="csc", dtype=DTYPE)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 805, in check_X_y
    ensure_2d=False, dtype=None)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 645, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/home/paulomann/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 99, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
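
Since the traceback fails inside opt.tell(next_x, f_val) and f_val was -inf, one way to surface the problem earlier is to check the objective value before it reaches the optimizer. The following is only a minimal sketch in plain Python: ensure_finite is a hypothetical helper, not part of OCTIS or scikit-optimize, and where to call it in a real pipeline is an assumption.

```python
import math


def ensure_finite(f_val, label="objective"):
    """Return f_val unchanged, or raise if it is NaN or +/-infinity.

    Hypothetical helper: it only illustrates the kind of check that
    would catch a value like -inf before a surrogate model sees it.
    """
    if not math.isfinite(f_val):
        raise ValueError(
            f"{label} is not finite ({f_val!r}); "
            "check the dataset and vocabulary before optimizing"
        )
    return f_val


# A finite score passes through untouched.
score = ensure_finite(0.12)

# A score of -inf, as seen for f_val, is rejected with a clearer message.
try:
    ensure_finite(float("-inf"))
except ValueError as err:
    print(err)
```

Failing at this point names the offending value directly instead of reporting it from sklearn's input validation several frames later.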

paulomann · 2023-07-17T15:52:49Z

It was related to a topic that was absent from the dataset. Due to a bug on my side, my vocabulary contained words that did not appear in the primary dataset.
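
For anyone hitting the same error, a quick consistency check between the vocabulary and the corpus can catch this before running the optimizer. This is a minimal sketch, assuming documents are whitespace-tokenized strings and the vocabulary is a list of words already loaded into memory; find_unused_vocabulary is a hypothetical helper, not an OCTIS function.

```python
def find_unused_vocabulary(documents, vocabulary):
    """Return the vocabulary words that never occur in any document."""
    seen = set()
    for doc in documents:
        # Assumes tokens are separated by whitespace.
        seen.update(doc.split())
    return sorted(word for word in set(vocabulary) if word not in seen)


# Toy data standing in for the corpus and vocabulary of a custom dataset.
documents = ["topic models find themes", "themes emerge from words"]
vocabulary = ["topic", "themes", "words", "reddit"]

unused = find_unused_vocabulary(documents, vocabulary)
print(unused)  # ['reddit']
```

An empty result means every vocabulary word occurs somewhere in the corpus; any word it returns is a candidate for the mismatch described above.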

paulomann closed this as completed Jul 17, 2023
