TuneTa idles after tuning #19

Closed
atomcracker opened this issue Mar 6, 2022 · 7 comments

Comments
@atomcracker

atomcracker commented Mar 6, 2022

I am interested in fitting indicator settings on a training set.

The issue occurs when tuning with a high number of trials. The run starts as expected, but after a few hours of tuning, when it reaches the very last indicator and finishes it due to early stopping, it stays idle in that state.

I'm running in a Jupyter notebook. The kernel goes idle after some time, although the cell is still running (the cmd window is also still active).

I tried the same test with a very low number of trials, which completed successfully, so the problem only arises when a high number of trials is searched. I already tried to reduce the search space by only searching 'pta', but it did not help.

When I Ctrl+C out of the loop, it immediately prints the number of ProcessPools and then aborts, so this probably indicates the problem lies in completing the tuning part?

Help would be greatly appreciated!

@jmrichardson
Owner

@atomcracker, happy to reproduce if you can send me the code you are using, and the dataset if possible.

@atomcracker
Author

Hi, thanks for the quick response!
This is the relevant code:

```python
import os

# Initialize with x cores and show trial results
amount_cpus = os.cpu_count()
tt = TuneTA(n_jobs=amount_cpus - 1, verbose=True)
```

[train_valid.csv](https://github.com/jmrichardson/tuneta/files/8193596/train_valid.csv)
[y_train_valid.csv](https://github.com/jmrichardson/tuneta/files/8193597/y_train_valid.csv)

```python
# y data was made with this (next-day percentage return)
y = df_data['close'].pct_change().shift(-1)
```
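For clarity, the next-day percentage return target can be sketched in plain Python (hypothetical prices, no pandas):

```python
# Next-day percentage return: (close[t+1] - close[t]) / close[t],
# aligned to day t (equivalent to pct_change().shift(-1) in pandas).
def next_day_return(closes):
    returns = []
    for t in range(len(closes) - 1):
        returns.append((closes[t + 1] - closes[t]) / closes[t])
    returns.append(None)  # last day has no next-day return (NaN in pandas)
    return returns

prices = [100.0, 102.0, 99.96]
print(next_day_return(prices))  # approximately [0.02, -0.02, None]
```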

```python
# Optimize indicators
tt.fit(X_train_valid, y_train_valid,
       indicators=['pta'],  # preferably 'all'
       ranges=[(2, 200)],
       trials=3000,
       early_stop=500,
       min_target_correlation=0.001)
```

[test.csv](https://github.com/jmrichardson/tuneta/files/8193598/test.csv)

```python
X = pd.concat([X_train_valid, X_test], axis=0)
features = tt.transform(X)
X = pd.concat([X, features], axis=1)
X.to_csv(...)
```

Come to think of it, I concatenate the train_valid and test sets before the transform, so the moving-average-style indicators will use the previous data in the test set (it will be this way with live data too, so it seemed like a good thing). But maybe it doesn't react well to transform(X) having a different length than the initially optimized dataset?

Since I'm already here: do you have a recommendation for the trials and early_stop values? The objective is accuracy.

Thanks a lot!

@jmrichardson
Owner

Thanks for sharing the data and code. Yes, that is a significant number of trials :) For indicators with one parameter in particular, this is overkill: you would never need more trials than the maximum of the range (in this case 200), and even that is too high, as Optuna quickly clusters around the parameter(s) that optimize the correlation. For indicators with multiple parameters, on the other hand, Optuna will try combinations of those parameters, which requires more trials. Perhaps you can break out the fit for single-parameter indicators and use another fit for the multi-parameter ones so you don't waste resources.
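The suggested split could be sketched like this (the parameter counts below are hypothetical placeholders, not values taken from tuneta's config):

```python
# Hypothetical map of indicator name -> number of tunable parameters.
PARAM_COUNTS = {
    'pta.roc': 1, 'pta.mom': 1, 'pta.ema': 1,
    'pta.macd': 3, 'pta.stoch': 3,
}

def split_by_param_count(indicators, param_counts):
    """Partition indicators into single- and multi-parameter groups."""
    single = [i for i in indicators if param_counts[i] == 1]
    multi = [i for i in indicators if param_counts[i] > 1]
    return single, multi

single, multi = split_by_param_count(list(PARAM_COUNTS), PARAM_COUNTS)
# One cheap fit for single-parameter indicators (few trials needed),
# then a second fit with a larger budget for multi-parameter ones, e.g.:
#   tt.fit(X, y, indicators=single, trials=200, ...)
#   tt.fit(X, y, indicators=multi, trials=3000, ...)
```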

Regarding the hanging, I just tried your code on a few indicators from PandasTA (pta) rather than the entire set:

```python
tt.fit(X_train_valid, y_train_valid,
       # indicators=['pta'],  # preferably 'all'
       indicators=['pta.roc', 'pta.mom'],
       ranges=[(2, 200)],
       trials=3000,
       early_stop=500,
       min_target_correlation=0.001)
```

I also tested with all "tta" indicators with no issue:

```python
tt.fit(X_train_valid, y_train_valid,
       # indicators=['pta'],  # preferably 'all'
       indicators=['tta'],
       ranges=[(2, 200)],
       trials=3000,
       early_stop=500,
       min_target_correlation=0.001)
```

I wasn't able to reproduce the hang you experienced. However, I do know that some indicators from PTA take a significant amount of time to complete; TA-Lib indicators are C based and much more efficient. Since the number of trials is large (at least 500), I would suggest reducing the pandas-ta indicator subset (the list of available indicators is in config.py) until you are happy with the response times. My guess is that it is simply taking a long time to finish all the trials, but I could be wrong.

Also, it would be helpful if you could provide the screen output at the point where it hangs.

You can concatenate your train and test sets so that your indicators have enough history. However, when going live, I would only include enough data to cover your parameter lookback periods, to avoid unnecessary calculation of indicator values.
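Trimming live data to the longest lookback might look like this (a sketch; `max_lookback` is assumed to be the largest tuned window, here a placeholder value):

```python
def trim_history(rows, max_lookback):
    """Keep only the most recent rows needed to warm up the indicators."""
    # max_lookback bars of history plus the current bar are enough
    # for an indicator whose longest window is max_lookback.
    return rows[-(max_lookback + 1):]

live_rows = list(range(1000))   # stand-in for a price series
trimmed = trim_history(live_rows, max_lookback=200)
print(len(trimmed))  # 201
```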

@atomcracker
Author

Hi,

Sorry for the late reply. The absurdly high amount of trials came from the idea that some indicators have a lot of tunable parameters, but I understand this is not optimally efficient.

Regarding the tests, I ran 'tta' as well without any issues. But 'pta' still gave some issues. Surprisingly it did not stay strangly idle as before, but instead it gave an error message saying the set range of 2 was to low, which should have been => 3 in that case. I ran it again with a range of 3 to 200, but it displayed the same message but now that the expected range should have been => 4. I sadly did not get to make screenshots and due to it taking a long time to reproduce I hope this information is enough to get a grasp, otherwise I will gladly run again.

Regarding the tests, I also ran 'tta' without any issues, but 'pta' still gave trouble. Surprisingly it did not stay strangely idle as before; instead it gave an error message saying the configured lower range of 2 was too low and should have been >= 3. I ran it again with a range of 3 to 200, but it displayed the same message, now saying the expected value should be >= 4. Sadly I did not get to take screenshots, and since it takes a long time to reproduce, I hope this information is enough to get a grasp of it; otherwise I will gladly run it again.

I did put a print statement before the transform call, which was not displayed when the error message appeared. Also, the error occurs exactly when all the indicators should be done with the very last trial (just like before).

Thanks again for your help!

@jmrichardson
Owner

Hi, I would use a higher value for the low end of the range. I recall that some indicators need a higher minimum to avoid generating an error. From what you described, one PTA indicator needed a value greater than 3, and on the next run another needed a value greater than 4. To be safe, run with a low value of, say, 10 to avoid that issue. Unfortunately I don't have free resources to dedicate to running your code with your data set, but hopefully you can give it another shot and let me know how it goes. If you run into the hang again, please send me a screenshot and how long it has been hanging. Don't be surprised if, on large data sets, some indicators take a while to complete.
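Clamping the low end of every range to a safe floor can be sketched as follows (the floor of 10 follows the suggestion above; pre-processing the ranges before `tt.fit` is an assumption, not tuneta API):

```python
SAFE_LOW = 10  # floor suggested above; some PTA indicators reject smaller windows

def clamp_ranges(ranges, floor=SAFE_LOW):
    """Raise the low end of each (low, high) range to at least `floor`."""
    return [(max(low, floor), high) for low, high in ranges]

print(clamp_ranges([(2, 200)]))  # [(10, 200)]
# Then, e.g.: tt.fit(X, y, indicators=['pta'], ranges=clamp_ranges([(2, 200)]), ...)
```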

@atomcracker
Author

Thanks again for the fast reply! I let the hang go on for multiple hours, something like 8, but when I looked at my resources, none of the cores were working. I will now start a run with a lower range bound of 10 as you suggested and will get back to you!

@atomcracker
Author

Great news: running 'all' with a range of (10, 200) worked! I don't understand why I couldn't see any error messages during the first few runs, but that doesn't matter now. Thanks a lot for your help and time, have a great one!!
