
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by check_pairwise_arrays. #16

Closed
cryptocoinserver opened this issue Nov 9, 2021 · 14 comments


@cryptocoinserver

cryptocoinserver commented Nov 9, 2021

multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.8/dist-packages/tuneta/optimize.py", line 240, in fit
    ke.fit(correlations)
  File "/usr/local/lib/python3.8/dist-packages/yellowbrick/cluster/elbow.py", line 316, in fit
    self.k_scores_.append(self.scoring_metric(X, self.estimator.labels_))
  File "/usr/local/lib/python3.8/dist-packages/yellowbrick/cluster/elbow.py", line 104, in distortion_score
    distances = pairwise_distances(instances, center, metric=metric)
  File "/usr/local/lib/python3.8/dist-packages/sklearn/metrics/pairwise.py", line 1884, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/sklearn/metrics/pairwise.py", line 1425, in _parallel_pairwise
    return func(X, Y, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/sklearn/metrics/pairwise.py", line 299, in euclidean_distances
    X, Y = check_pairwise_arrays(X, Y)
  File "/usr/local/lib/python3.8/dist-packages/sklearn/metrics/pairwise.py", line 156, in check_pairwise_arrays
    X = check_array(
  File "/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py", line 797, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by check_pairwise_arrays.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tuneta_opti.py", line 29, in <module>
    tt.fit(X_train, y_train,
  File "/usr/local/lib/python3.8/dist-packages/tuneta/tune_ta.py", line 137, in fit
    self.fitted = [fit.get() for fit in self.fitted]
  File "/usr/local/lib/python3.8/dist-packages/tuneta/tune_ta.py", line 137, in <listcomp>
    self.fitted = [fit.get() for fit in self.fitted]
  File "/usr/local/lib/python3.8/dist-packages/multiprocess/pool.py", line 771, in get
    raise self._value
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by check_pairwise_arrays.
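The inner traceback shows where the failure originates: Yellowbrick's `distortion_score` passes the members of one cluster to scikit-learn's `pairwise_distances`, and that array has zero rows (shape `(0, 1)`: no samples, one feature). Here is a minimal sketch, assuming only NumPy and scikit-learn are installed, that triggers the same validation error outside TuneTA. The helper `distances_to_center` is hypothetical:

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def distances_to_center(instances, center):
    """Return pairwise distances, or the ValueError message if validation fails."""
    try:
        return pairwise_distances(instances, center, metric="euclidean")
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    # A cluster with no members: zero rows, one feature (like a column of correlations).
    empty_cluster = np.empty((0, 1))
    center = np.zeros((1, 1))
    print(distances_to_center(empty_cluster, center))

    # A non-empty cluster validates fine and returns one distance per row.
    print(distances_to_center(np.array([[1.0], [3.0]]), center))
```

The first call should report the same "Found array with 0 sample(s)" validation message, which suggests that the problem is an empty cluster reaching the distance computation rather than anything in the indicator itself.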

It ran smoothly until this point:

[I 2021-11-09 11:17:59,067] Trial 71 finished with value: 0.15180702108510608 and parameters: {'length': 40, 'atr_length': 8}. Best is trial 22 with value: 0.15180702108510608.
[I 2021-11-09 11:17:59,091] A new study created in memory with name: pta.kc(X.high, X.low, X.close, length=trial.suggest_int('length', 2, 48), scalar=trial.suggest_int('scalar', 2, 48), )
[I 2021-11-09 11:17:59,115] Trial 72 finished with value: 0.1516975180327597 and parameters: {'length': 38, 'atr_length': 13}. Best is trial 22 with value: 0.15180702108510608.
[I 2021-11-09 11:17:59,162] Trial 0 finished with value: 0.20201019266930986 and parameters: {'length': 34, 'scalar': 15}. Best is trial 0 with value: 0.20201019266930986.

So it might have been caused by pta.kc. Any ideas how to solve this?

Thank you for this great package!

@jmrichardson
Owner

Thanks for reporting the error. This is a new one that I haven't seen before and can't replicate. Could you provide me the code you are using as well as the output of the following from your environment:

from importlib.metadata import version
print(f"Pandas-TA: {version('pandas-ta')}")
print(f"FinTa: {version('finta')}")
print(f"Ta-Lib: {version('ta-lib')}")
print(f"Pathos: {version('pathos')}")
print(f"Tabulate: {version('tabulate')}")
print(f"Dcor: {version('dcor')}")
print(f"yFinance: {version('yfinance')}")
print(f"Optuna: {version('optuna')}")
print(f"Yellowbrick: {version('yellowbrick')}")
print(f"Sckit-Learn: {version('scikit-learn')}")
print(f"Pandas: {version('pandas')}")
print(f"Numpy: {version('numpy')}")
print(f"Numba: {version('numba')}")
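`importlib.metadata.version` raises `PackageNotFoundError` for a distribution that is not installed, so the snippet above stops at the first missing package. A slightly more forgiving sketch, assuming the same distribution names as above plus `tuneta`, reports every package and marks the absent ones instead of crashing. The function name `report_versions` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["pandas-ta", "finta", "ta-lib", "pathos", "tabulate", "dcor",
                "yfinance", "optuna", "yellowbrick", "scikit-learn",
                "pandas", "numpy", "numba", "tuneta"]
    for name, ver in report_versions(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```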

@cryptocoinserver
Author

Sure!

Pandas-TA: 0.3.14b0
FinTa: 1.3
Ta-Lib: 0.4.21
Pathos: 0.2.8
Tabulate: 0.8.9
Dcor: 0.5.3
yFinance: 0.1.63
Optuna: 2.10.0
Yellowbrick: 1.3.post1
Sckit-Learn: 0.24.2
Pandas: 1.3.3
Numpy: 1.20.3
Numba: 0.54.1

@cryptocoinserver
Author

I used indicators=['all'], and the error happens late in the optimization. Maybe RAM?

@cryptocoinserver
Author

I think it's the same:
#11 (comment)

@jmrichardson
Owner

jmrichardson commented Nov 9, 2021

Hmmmm... I've replicated your environment and was unable to reproduce the error. Could you run this code and test it on your end:

from tuneta.tune_ta import TuneTA
import pandas as pd
from pandas_ta import percent_return
from sklearn.model_selection import train_test_split
import yfinance as yf


if __name__ == "__main__":
    # Download data set from yahoo, calculate next day return and split into train and test
    X = yf.download("AAPL", period="10y", interval="1d", auto_adjust=True)
    y = percent_return(X.Close, offset=-1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, shuffle=False)

    # Initialize with x cores and show trial results
    tt = TuneTA(n_jobs=7, verbose=True)

    # Optimize indicators
    tt.fit(X_train, y_train,
        indicators=['all'],
        ranges=[(2, 100)],
        trials=30,
        early_stop=10,
    )

I'm wondering if it has something to do with the data. If the above works, could you send me your X_train and y_train?

@cryptocoinserver
Author

cryptocoinserver commented Nov 9, 2021

This might be it. I use a custom data source.
I pickled it (Download). If you are also using Python 3.9, you should be able to load it with:

    import pickle

    with open('./candles.pkl', 'rb') as f:
        X = pickle.load(f)
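A self-contained sketch of the save/load round trip, assuming the file is written with an explicit pickle protocol (protocol 4 is readable by Python 3.4 and later) so that a reader on a different Python version can still open it. A pickled pandas object can additionally depend on compatible pandas versions on both ends. The helpers `save_pickle` and `load_pickle` and the stand-in data are hypothetical:

```python
import os
import pickle
import tempfile


def save_pickle(obj, path, protocol=4):
    """Write obj to path; protocol 4 is readable by Python 3.4+."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical stand-in for the X price data used in this issue.
    candles = {"Close": [101.2, 102.5, 100.9]}
    path = os.path.join(tempfile.gettempdir(), "candles_demo.pkl")
    save_pickle(candles, path)
    print(load_pickle(path) == candles)
```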

Thank you for this incredible support!

Just fired up a test with yfinance. Will report back if that works.

@cryptocoinserver
Author

Strange, yfinance doesn't work either:

[I 2021-11-09 17:15:34,498] A new study created in memory with name: tta.DX(X.high, X.low, X.close, timeperiod=trial.suggest_int('timeperiod', 2, 100), )
[I 2021-11-09 17:15:34,500] Trial 14 finished with value: 0.11147144074882838 and parameters: {'timeperiod': 25}. Best is trial 13 with value: 0.11913746577305458.
[I 2021-11-09 17:15:34,510] Trial 11 finished with value: 0.11583889796679482 and parameters: {'timeperiod': 12}. Best is trial 1 with value: 0.13536161141313785.
[I 2021-11-09 17:15:34,514] Trial 16 finished with value: 0.09896255266983982 and parameters: {'timeperiod': 19}. Best is trial 16 with value: 0.09896255266983982.
[I 2021-11-09 17:15:34,521] Trial 34 finished with value: 0.06372177855344523 and parameters: {'timeperiod': 48}. Best is trial 4 with value: 0.09774473093630688.
[I 2021-11-09 17:15:34,521] Trial 0 finished with value: 0.04227110044971143 and parameters: {'timeperiod': 70}. Best is trial 0 with value: 0.04227110044971143.
[I 2021-11-09 17:15:34,522] Trial 10 finished with value: 0.13836197640888684 and parameters: {'timeperiod': 10}. Best is trial 10 with value: 0.13836197640888684.
[I 2021-11-09 17:15:34,528] Trial 18 finished with value: 0.08295114460901958 and parameters: {'fastperiod': 30, 'slowperiod': 26}. Best is trial 10 with value: 0.11425592453571463.
[I 2021-11-09 17:15:34,529] Trial 15 finished with value: 0.1155967683245075 and parameters: {'timeperiod': 24}. Best is trial 13 with value: 0.11913746577305458.
[I 2021-11-09 17:15:34,529] Trial 70 finished with value: 0.05935943353020685 and parameters: {'timeperiod': 8}. Best is trial 13 with value: 0.062016877300818736.
[I 2021-11-09 17:15:34,530] Trial 36 finished with value: 0.05644099122647902 and parameters: {'timeperiod': 24}. Best is trial 11 with value: 0.0974994756356342.
[I 2021-11-09 17:15:34,531] Trial 40 finished with value: 0.055003654253323424 and parameters: {'timeperiod': 38}. Best is trial 11 with value: 0.06219734835957756.
[I 2021-11-09 17:15:34,537] Trial 12 finished with value: 0.1364460842741541 and parameters: {'timeperiod': 27}. Best is trial 12 with value: 0.1364460842741541.
[I 2021-11-09 17:15:34,540] A new study created in memory with name: tta.MACD(X.close, fastperiod=trial.suggest_int('fastperiod', 2, 100), slowperiod=trial.suggest_int('slowperiod', 2, 100), signalperiod=trial.suggest_int('signalperiod', 2, 100), )
[I 2021-11-09 17:15:34,548] Trial 17 finished with value: 0.09248373225478898 and parameters: {'timeperiod': 15}. Best is trial 16 with value: 0.09896255266983982.
[I 2021-11-09 17:15:34,557] Trial 11 finished with value: 0.1164897633704645 and parameters: {'timeperiod': 3}. Best is trial 10 with value: 0.13836197640888684.
[I 2021-11-09 17:15:34,562] Trial 35 finished with value: 0.0675826808773287 and parameters: {'timeperiod': 33}. Best is trial 4 with value: 0.09774473093630688.
[I 2021-11-09 17:15:34,563] Trial 16 finished with value: 0.11807535107417427 and parameters: {'timeperiod': 19}. Best is trial 13 with value: 0.11913746577305458.
[I 2021-11-09 17:15:34,564] Trial 1 finished with value: 0.05137525234234987 and parameters: {'timeperiod': 30}. Best is trial 1 with value: 0.05137525234234987.
[I 2021-11-09 17:15:34,565] Trial 37 finished with value: 0.0801308186933188 and parameters: {'timeperiod': 81}. Best is trial 11 with value: 0.0974994756356342.

multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/src/tuneta/tuneta/optimize.py", line 240, in fit
    ke.fit(correlations)
  File "/usr/local/lib/python3.9/site-packages/yellowbrick/cluster/elbow.py", line 316, in fit
    self.k_scores_.append(self.scoring_metric(X, self.estimator.labels_))
  File "/usr/local/lib/python3.9/site-packages/yellowbrick/cluster/elbow.py", line 104, in distortion_score
    distances = pairwise_distances(instances, center, metric=metric)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 1790, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
  File "/usr/local/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 1359, in _parallel_pairwise
    return func(X, Y, **kwds)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 272, in euclidean_distances
    X, Y = check_pairwise_arrays(X, Y)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 146, in check_pairwise_arrays
    X = check_array(X, accept_sparse=accept_sparse, dtype=dtype,
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 726, in check_array
    raise ValueError("Found array with %d sample(s) (shape=%s) while a"
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by check_pairwise_arrays.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/live/tuneta_opti.py", line 38, in <module>
    tt.fit(X_train, y_train,
  File "/home/src/tuneta/tuneta/tune_ta.py", line 137, in fit
    self.fitted = [fit.get() for fit in self.fitted]
  File "/home/src/tuneta/tuneta/tune_ta.py", line 137, in <listcomp>
    self.fitted = [fit.get() for fit in self.fitted]
  File "/usr/local/lib/python3.9/site-packages/multiprocess/pool.py", line 771, in get
    raise self._value
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by check_pairwise_arrays.

@cryptocoinserver
Author

cryptocoinserver commented Nov 9, 2021

I had trials=500, early_stop=100. With your values of 30 and 10 it works now, for both yfinance and my own data source.

@jmrichardson
Owner

Ha, that is interesting. Your data works fine for me (I ran it twice without issue):


from tuneta.tune_ta import TuneTA
import pandas as pd
from pandas_ta import percent_return
from sklearn.model_selection import train_test_split
import yfinance as yf
import pickle


if __name__ == "__main__":
    with open('candles.pkl', 'rb') as f:
        X = pickle.load(f)
    y = percent_return(X.Close, offset=-1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, shuffle=False)

    # Initialize with x cores and show trial results
    tt = TuneTA(n_jobs=7, verbose=True)

    # Optimize indicators
    tt.fit(X_train, y_train,
        indicators=['all'],
        ranges=[(2, 100)],
        trials=50,
        early_stop=20,
    )

Hmmm... what version of TuneTA are you using?

    from importlib.metadata import version
    print(f"TuneTA: {version('tuneta')}")

@jmrichardson
Owner

OK, let me run with trials=500, early_stop=100.

@jmrichardson
Owner

What ranges are you using?

@cryptocoinserver
Author

TuneTA: 0.1.37

ranges=[(2, 100)],

@jmrichardson
Owner

I think this may be the problem:

DistrictDataLabs/yellowbrick#1185

This fix was merged into Yellowbrick, but no release containing it has been published yet. Can you please install it directly from the repository:

pip install git+https://github.com/DistrictDataLabs/yellowbrick.git -U

That may be why I couldn't replicate it: I likely installed from the GitHub repo too (and forgot), but it still shows as Yellowbrick: 1.3.post1.
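The linked pull request is not quoted in this thread, so its exact change is not reproduced here. As an assumption, not Yellowbrick's actual code, here is one mechanism that yields exactly this empty-cluster error: cluster labels are re-encoded to 0..n-1 (the way scikit-learn's `LabelEncoder` does), but the loop then iterates over the original label values. Whenever a cluster id goes unused, the highest original label no longer matches any encoded label and selects no rows. Plain Python, with hypothetical helper names:

```python
def encode_labels(labels):
    """Map each distinct label to 0..n-1 in sorted order (like sklearn's LabelEncoder)."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


def cluster_sizes(labels, iterate_encoded=True):
    """Count members per cluster after encoding.

    iterate_encoded=False loops over the ORIGINAL label values (the suspected bug);
    iterate_encoded=True loops over the ENCODED values, which always match.
    """
    encoded, classes = encode_labels(labels)
    keys = sorted(set(encoded)) if iterate_encoded else classes
    return {k: sum(1 for e in encoded if e == k) for k in keys}


if __name__ == "__main__":
    labels = [0, 0, 1, 3, 3]  # cluster id 2 was never assigned
    print(cluster_sizes(labels, iterate_encoded=False))  # {0: 2, 1: 1, 3: 0}
    print(cluster_sizes(labels, iterate_encoded=True))   # {0: 2, 1: 1, 2: 2}
```

With contiguous labels (no unused ids) both variants return the same counts, so a bug of this shape would only surface on some runs.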

@cryptocoinserver
Author

Indeed! This worked. Thank you a lot.
