
Revert "standard scaler pipeline" #28

Merged
merged 1 commit into from
Jul 7, 2017
Merged

Conversation

lacava (Owner) commented on Jul 5, 2017

Reverts #26

This passes the tests, but it's throwing an error on the dataset I'm currently applying it to. The error comes in the fit method, line 314 in few.py:

$python -m few.few ../../data/maize/d_maize-dent-tass.csv -p 100 -max_depth 3 -ms 25 --weight_parents 

warning: ValueError in ml fit. X.shape: (100, 100) y_t shape: (146,)
First ten entries X: [[ 1.  1.  1. ...  1.  1.  1.]
 ...
 [ 1.  1.  1. ...  1.  1.  1.]]
(all ten printed rows of X are identical: 1.0 in every one of the 100 columns)
First ten entries y_t: [ 826.47  884.33  904.55  848.71  879.46  885.12  905.36  886.69  821.05
  912.51]
equations: ['x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_55', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55']
FEW parameters: {'min_depth': 1, 'ml': None, 'elitism': True, 'clean': False, 'erc': False, 'classification': False, 'crossover_rate': 0.5, 'c': True, 'op_weight': 1, 'scoring_function': None, 'population_size': '100', 'mdr': False, 'weight_parents': True, 'max_depth': 3, 'tourn_size': 2, 'mutation_rate': 0.5, 'max_depth_init': 2, 'random_state': None, 'max_stall': 25, 'otype': 'f', 'generations': 100, 'sel': 'epsilon_lexicase', 'verbosity': 1, 'track_diversity': False, 'seed_with_ml': True, 'disable_update_check': False, 'fit_choice': None, 'boolean': False}
Traceback (most recent call last):
  File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 314, in fit
    self.ml.fit(self.X[self.valid_loc(),:].transpose(),y_t)
  File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 270, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/linear_model/least_angle.py", line 1141, in fit
    axis=0)(all_alphas)
  File "/home/bill/anaconda3/lib/python3.5/site-packages/scipy/interpolate/interpolate.py", line 483, in __init__
    "least %d entries" % minval)
ValueError: x and y arrays must have at least 2 entries

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bill/anaconda3/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/bill/anaconda3/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 906, in <module>
    main()
  File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 893, in main
    learner.fit(training_features, training_labels)
  File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 330, in fit
    raise(ValueError)
ValueError

coveralls commented

Coverage Status: Coverage increased (+0.3%) to 73.387% when pulling 3b60381 on revert-26-standard_scaler into aa214bb on master.

lacava merged commit 81b2221 into master on Jul 7, 2017
lacava deleted the revert-26-standard_scaler branch on Jul 7, 2017 at 19:10
lacava (Owner, Author) commented on Jul 7, 2017

It seems like the problem is related to an open issue in scikit-learn where some estimators (lasso, in this case) err when features have std = 0. In this dataset, some of the features are constant.

For other datasets the pipeline works, so I'm going to re-merge it.
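
For anyone hitting this later, here is a minimal sketch (not code from FEW or this PR) of the failure mode described above. It assumes the reverted pipeline is roughly StandardScaler followed by LassoLarsCV, which is what the traceback through least_angle.py suggests; the shapes and target values are made up to mimic the log.

# Minimal sketch (assumption: the reverted pipeline is roughly
# StandardScaler -> LassoLarsCV, as the traceback suggests; data is illustrative).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoLarsCV

rng = np.random.RandomState(0)
X = np.ones((146, 100))            # every feature is constant, so std = 0
y = 800.0 + 50.0 * rng.rand(146)   # arbitrary continuous target, like y_t above

pipe = make_pipeline(StandardScaler(), LassoLarsCV())
try:
    pipe.fit(X, y)
except ValueError as e:
    # Zero-variance features come out of StandardScaler as all zeros, the LARS
    # path collapses to a single alpha, and (on scikit-learn versions of this
    # vintage) the interpolation over alphas raises
    # "x and y arrays must have at least 2 entries", matching the traceback above.
    print("ValueError:", e)
else:
    print("fit succeeded (newer scikit-learn may handle constant features)")

Newer scikit-learn releases may no longer raise here; dropping or flagging zero-variance features before the estimator (for example with sklearn.feature_selection.VarianceThreshold) is one way to sidestep the issue.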
