Error while fitting #1299

Open
mattia-lecci opened this issue May 11, 2023 · 0 comments

I used TPOTRegressor on my dataset, adding and removing features from the input data for different tests.
When fitting with all 18 features of my 28 data points and passing sample_weight, TPOT fails with a ValueError.
The error does not occur when sample_weight is omitted.

It also does not occur on the same dataset when using, for example, only 10 of the 18 features, or on a different dataset with 8 features and 55 data points.

Process to reproduce the issue

I'm afraid I cannot share the data. This is a mockup of the code used:

import pandas as pd
import tpot

# Load data (the real data cannot be shared; only the shapes are given)
# train_x: pd.DataFrame, shape (28, 18)
# train_y: pd.Series, shape (28,)
# train_weight: pd.Series, shape (28,)

model = tpot.TPOTRegressor(generations=50, population_size=20, cv=5, random_state=42, verbosity=2)
model.fit(features=train_x, target=train_y, sample_weight=train_weight)

The same result is obtained when using .values on the pandas variables.

Yields:

                                                                           
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File .\.venv\lib\site-packages\tpot\base.py:816, in TPOTBase.fit(self, features, target, sample_weight, groups)
    815         warnings.simplefilter("ignore")
--> 816         self._pop, _ = eaMuPlusLambda(
    817             population=self._pop,
    818             toolbox=self._toolbox,
    819             mu=self.population_size,
    820             lambda_=self._lambda,
    821             cxpb=self.crossover_rate,
    822             mutpb=self.mutation_rate,
    823             ngen=self.generations,
    824             pbar=self._pbar,
    825             halloffame=self._pareto_front,
    826             verbose=self.verbosity,
    827             per_generation_function=self._check_periodic_pipeline,
    828             log_file=self.log_file_,
    829         )
    831 # Allow for certain exceptions to signal a premature fit() cancellation

File .\.venv\lib\site-packages\tpot\gp_deap.py:228, in eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar, stats, halloffame, verbose, per_generation_function, log_file)
    226     initialize_stats_dict(ind)
--> 228 population[:] = toolbox.evaluate(population)
    230 record = stats.compile(population) if stats is not None else {}

File .\.venv\lib\site-packages\tpot\base.py:1531, in TPOTBase._evaluate_individuals(self, population, features, target, sample_weight, groups)
   1530 self._stop_by_max_time_mins()
-> 1531 val = partial_wrapped_cross_val_score(
   1532     sklearn_pipeline=sklearn_pipeline
   1533 )
   1534 result_score_list = self._update_val(val, result_score_list)

File .\.venv\lib\site-packages\stopit\utils.py:145, in base_timeoutable.__call__.<locals>.wrapper(*args, **kwargs)
    144     # ``result`` may not be assigned below in case of timeout
--> 145     result = func(*args, **kwargs)
    146 return result

File .\.venv\lib\site-packages\tpot\gp_deap.py:416, in _wrapped_cross_val_score(sklearn_pipeline, features, target, cv, scoring_function, sample_weight, groups, use_dask)
    393 """Fit estimator and compute scores for a given dataset split.
    394 
    395 Parameters
   (...)
    414     Whether to use dask
    415 """
--> 416 sample_weight_dict = set_sample_weight(sklearn_pipeline.steps, sample_weight)
    418 features, target, groups = indexable(features, target, groups)

File .\.venv\lib\site-packages\tpot\operator_utils.py:111, in set_sample_weight(pipeline_steps, sample_weight)
    110 for (pname, obj) in pipeline_steps:
--> 111     if inspect.getargspec(obj.fit).args.count("sample_weight"):
    112         step_sw = pname + "__sample_weight"

File ~\AppData\Local\Programs\Python\Python310\lib\inspect.py:1245, in getargspec(func)
   1244 if kwonlyargs or ann:
-> 1245     raise ValueError("Function has keyword-only parameters or annotations"
   1246                      ", use inspect.signature() API which can support them")
   1247 return ArgSpec(args, varargs, varkw, defaults)

ValueError: Function has keyword-only parameters or annotations, use inspect.signature() API which can support them

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[58], line 4
      1 import tpot
      3 tmp = tpot.TPOTRegressor(generations=50, population_size=20, cv=5, random_state=seed, verbosity=2)
----> 4 tmp.fit(features=train_x.values, target=train_y.values, sample_weight=train_weight.values)
      5 # tpot_train_y = tmp.predict(train_x)
      6 # tpot_test_y = tmp.predict(test_x)

File .\.venv\lib\site-packages\tpot\base.py:863, in TPOTBase.fit(self, features, target, sample_weight, groups)
    860     except (KeyboardInterrupt, SystemExit, Exception) as e:
    861         # raise the exception if it's our last attempt
    862         if attempt == (attempts - 1):
--> 863             raise e
    864 return self

File .\.venv\lib\site-packages\tpot\base.py:854, in TPOTBase.fit(self, features, target, sample_weight, groups)
    851 if not isinstance(self._pbar, type(None)):
    852     self._pbar.close()
--> 854 self._update_top_pipeline()
    855 self._summary_of_best_pipeline(features, target)
    856 # Delete the temporary cache before exiting

File .\.venv\lib\site-packages\tpot\base.py:961, in TPOTBase._update_top_pipeline(self)
    957             self._last_optimized_pareto_front_n_gens = 0
    958 else:
    959     # If user passes CTRL+C in initial generation, self._pareto_front (halloffame) shoule be not updated yet.
    960     # need raise RuntimeError because no pipeline has been optimized
--> 961     raise RuntimeError(
    962         "A pipeline has not yet been optimized. Please call fit() first."
    963     )

RuntimeError: A pipeline has not yet been optimized. Please call fit() first.
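
The second traceback is a consequence of the first: the RuntimeError is raised while the ValueError is still being handled, so the ValueError is the one to look at. A minimal sketch of how that chaining can be inspected, using only the standard library; `evaluate`, `fit`, and `root_cause` here are hypothetical stand-ins, not TPOT code:

```python
def evaluate():
    # Stand-in for the inner failure (the ValueError in the first traceback).
    raise ValueError("Function has keyword-only parameters or annotations")


def fit():
    try:
        evaluate()
    except Exception:
        # Stand-in for a follow-up check that raises while the first
        # exception is still being handled (implicit chaining).
        raise RuntimeError("A pipeline has not yet been optimized. Please call fit() first.")


def root_cause(exc):
    """Follow __cause__ / __context__ links back to the first exception."""
    while True:
        prev = exc.__cause__ or exc.__context__
        if prev is None:
            return exc
        exc = prev


try:
    fit()
except RuntimeError as err:
    first = root_cause(err)
    print(type(first).__name__, "->", type(err).__name__)
```

Wrapping the real `fit()` call the same way and calling `root_cause` on the caught exception should surface the original ValueError rather than the generic RuntimeError.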

Expected result

Without using sample_weight:

Generation 1 - Current best internal CV score: -0.10226660695789169
Generation 2 - Current best internal CV score: -0.10226660695789169
Generation 3 - Current best internal CV score: -0.08510081133846376
...
Generation 50 - Current best internal CV score: -0.07952325321214902

Best pipeline: AdaBoostRegressor(Nystroem(ExtraTreesRegressor(PolynomialFeatures(input_matrix, degree=2, include_bias=False, interaction_only=False), bootstrap=False, max_features=0.05, min_samples_leaf=5, min_samples_split=12, n_estimators=100), gamma=0.75, kernel=polynomial, n_components=10), learning_rate=0.01, loss=linear, n_estimators=100)

Environment

OS: Windows 10
Python 3.10.5
TPOT==0.11.7
pandas==1.5.3
numpy==1.24.2
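
The ValueError message itself points at `inspect.signature()` as the API that supports keyword-only parameters and annotations. The sketch below shows how a check for a `sample_weight` parameter could look with that API, and why it would not fail on such a `fit()` method. This is an illustration only, not TPOT's code or an official fix: `accepts_sample_weight`, `sample_weight_kwargs`, `AnnotatedStep`, and `PlainStep` are hypothetical names, and only the `"<step>__sample_weight"` key format is taken from the traceback above.

```python
import inspect


def accepts_sample_weight(fit_method):
    """Return True if the callable has a parameter named sample_weight."""
    return "sample_weight" in inspect.signature(fit_method).parameters


class AnnotatedStep:
    # Hypothetical estimator: fit() has an annotation and a keyword-only
    # parameter, the two cases named in the ValueError message.
    def fit(self, X, y=None, *, sample_weight: list = None):
        return self


class PlainStep:
    # Hypothetical estimator whose fit() takes no sample_weight.
    def fit(self, X, y=None):
        return self


def sample_weight_kwargs(steps, sample_weight):
    """Build {"<step>__sample_weight": weights} for steps that accept it."""
    if sample_weight is None:
        return {}
    return {
        f"{name}__sample_weight": sample_weight
        for name, obj in steps
        if accepts_sample_weight(obj.fit)
    }


steps = [("annotated", AnnotatedStep()), ("plain", PlainStep())]
print(sample_weight_kwargs(steps, [1.0, 2.0]))
```

Because the check only runs when `sample_weight` is passed, and only fails for steps whose `fit()` has annotations or keyword-only parameters, this would be consistent with the error appearing for some feature sets and not others; that link is an assumption, not something verified against the data.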
