TuneTa idles after tuning #19

Closed
atomcracker opened this issue Mar 6, 2022 · 7 comments

Comments
@atomcracker

atomcracker commented Mar 6, 2022

I am interested in fitting indicator settings on a training set.

The issue occurs when tuning with a high number of trials. The run starts as expected, but after a few hours of tuning, when it reaches the very last indicator and finishes it due to early stopping, it stays idle in that state.

I'm running in a Jupyter notebook. The kernel goes idle after some time, although the cell is still running (the cmd window is also still active).

I tried the same test with a very low number of trials, which completed successfully, so the problem only arises when a high number of trials is searched. I already tried to reduce the search space by only searching 'pta', but it did not help.

When I Ctrl+C out of the loop, it immediately prints the number of ProcessPools and then aborts, so this probably indicates the problem lies in completing the tuning part?

Help would be greatly appreciated!

@jmrichardson
Owner

@atomcracker, happy to reproduce if you can send me the code you are using, and the dataset if possible.

@atomcracker
Author

Hi, thanks for the quick response!
This is the relevant code:

```python
import os

# Initialize with x cores and show trial results
amount_cpus = os.cpu_count()
tt = TuneTA(n_jobs=amount_cpus - 1, verbose=True)
```

[train_valid.csv](https://github.com/jmrichardson/tuneta/files/8193596/train_valid.csv)
[y_train_valid.csv](https://github.com/jmrichardson/tuneta/files/8193597/y_train_valid.csv)

```python
# y data was made with this (next-day percentage return)
y = df_data['close'].pct_change().shift(-1)
```
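For clarity, the next-day percentage return target can be sketched in plain Python (hypothetical prices, no pandas):

```python
# Next-day percentage return: (close[t+1] - close[t]) / close[t],
# aligned to day t (equivalent to pct_change().shift(-1) in pandas).
def next_day_return(closes):
    returns = []
    for t in range(len(closes) - 1):
        returns.append((closes[t + 1] - closes[t]) / closes[t])
    returns.append(None)  # last day has no next-day return (NaN in pandas)
    return returns

prices = [100.0, 102.0, 99.96]
print(next_day_return(prices))  # approximately [0.02, -0.02, None]
```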

```python
# Optimize indicators
tt.fit(X_train_valid, y_train_valid,
       indicators=['pta'],  # preferably 'all'
       ranges=[(2, 200)],
       trials=3000,
       early_stop=500,
       min_target_correlation=0.001)
```

[test.csv](https://github.com/jmrichardson/tuneta/files/8193598/test.csv)

```python
X = pd.concat([X_train_valid, X_test], axis=0)
features = tt.transform(X)
X = pd.concat([X, features], axis=1)
X.to_csv(...)
```

Come to think of it, I concatenate the train_valid and test sets before the transform, so the moving-average-style indicators will use the previous data in the test set (it will be this way with live data too, so it seemed like a good thing). But maybe it doesn't react well to transform(X) having a different length than the initially optimized dataset?

Since I'm already here: do you have a recommendation for the trials and early_stop values? The objective is accuracy.

Thanks a lot!

@jmrichardson
Owner

Thanks for sharing the data and code. Yes, that is a significant number of trials :) For indicators with one parameter in particular, this is overkill: you would never need more trials than the maximum of the range (in this case 200), and even that is too high, as Optuna quickly clusters around the parameter(s) that optimize the correlation. For indicators with multiple parameters, on the other hand, Optuna will try combinations of those parameters, which requires more trials. Perhaps you can break out the fit for single-parameter indicators and use another fit for the multi-parameter ones so you don't waste resources.
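The suggested split could be sketched like this (the parameter counts below are hypothetical placeholders, not values taken from tuneta's config):

```python
# Hypothetical map of indicator name -> number of tunable parameters.
PARAM_COUNTS = {
    'pta.roc': 1, 'pta.mom': 1, 'pta.ema': 1,
    'pta.macd': 3, 'pta.stoch': 3,
}

def split_by_param_count(indicators, param_counts):
    """Partition indicators into single- and multi-parameter groups."""
    single = [i for i in indicators if param_counts[i] == 1]
    multi = [i for i in indicators if param_counts[i] > 1]
    return single, multi

single, multi = split_by_param_count(list(PARAM_COUNTS), PARAM_COUNTS)
# One cheap fit for single-parameter indicators (few trials needed),
# then a second fit with a larger budget for multi-parameter ones, e.g.:
#   tt.fit(X, y, indicators=single, trials=200, ...)
#   tt.fit(X, y, indicators=multi, trials=3000, ...)
```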

Regarding the hanging, I just tried your code on a few indicators from PandasTA (pta) rather than the entire set:

```python
tt.fit(X_train_valid, y_train_valid,
       # indicators=['pta'],  # preferably 'all'
       indicators=['pta.roc', 'pta.mom'],
       ranges=[(2, 200)],
       trials=3000,
       early_stop=500,
       min_target_correlation=0.001)
```

I also tested with all "tta" indicators with no issue:

```python
tt.fit(X_train_valid, y_train_valid,
       # indicators=['pta'],  # preferably 'all'
       indicators=['tta'],
       ranges=[(2, 200)],
       trials=3000,
       early_stop=500,
       min_target_correlation=0.001)
```

I wasn't able to reproduce the hang you experienced. However, I do know that some indicators from PTA take a significant amount of time to complete; TA-Lib indicators are C based and much more efficient. Since the number of trials is large (at least 500), I would suggest reducing the pandas-ta indicator subset (the list of available indicators is in config.py) until you are happy with the response times. My guess is that it is simply taking a long time to finish all the trials, but I could be wrong.

Also, it would be helpful if you could provide the screen output at the point where it hangs.

You can concatenate your train and test sets so that your indicators have enough history. However, when going live, I would only include enough data to cover your parameter lookback periods, to avoid unnecessary calculation of indicator values.
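Trimming live data to the longest lookback might look like this (a sketch; `max_lookback` is assumed to be the largest tuned window, here a placeholder value):

```python
def trim_history(rows, max_lookback):
    """Keep only the most recent rows needed to warm up the indicators."""
    # max_lookback bars of history plus the current bar are enough
    # for an indicator whose longest window is max_lookback.
    return rows[-(max_lookback + 1):]

live_rows = list(range(1000))   # stand-in for a price series
trimmed = trim_history(live_rows, max_lookback=200)
print(len(trimmed))  # 201
```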

@atomcracker
Author

Hi,

Sorry for the late reply. The absurdly high amount of trials came from the idea that some indicators have a lot of tunable parameters, but I understand this is not optimally efficient.

Regarding the tests, I ran 'tta' as well without any issues. But 'pta' still gave some issues. Surprisingly it did not stay strangly idle as before, but instead it gave an error message saying the set range of 2 was to low, which should have been => 3 in that case. I ran it again with a range of 3 to 200, but it displayed the same message but now that the expected range should have been => 4. I sadly did not get to make screenshots and due to it taking a long time to reproduce I hope this information is enough to get a grasp, otherwise I will gladly run again.

Regarding the tests, I also ran 'tta' without any issues, but 'pta' still gave trouble. Surprisingly it did not stay strangely idle as before; instead it gave an error message saying the configured lower range of 2 was too low and should have been >= 3. I ran it again with a range of 3 to 200, but it displayed the same message, now saying the expected value should be >= 4. Sadly I did not get to take screenshots, and since it takes a long time to reproduce, I hope this information is enough to get a grasp of it; otherwise I will gladly run it again.

I did put a print statement before the transform call, which was not displayed when the error message appeared. Also, the error occurs exactly when all the indicators should be done with the very last trial (just like before).

Thanks again for your help!

@jmrichardson
Owner

Hi, I would use a higher value for the low end of the range. I recall that some indicators need a higher minimum to avoid generating an error. From what you described, one PTA indicator needed a value greater than 3, and on the next run another needed a value greater than 4. To be safe, run with a low value of, say, 10 to avoid that issue. Unfortunately I don't have free resources to dedicate to running your code with your data set, but hopefully you can give it another shot and let me know how it goes. If you run into the hang again, please send me a screenshot and how long it has been hanging. Don't be surprised if, on large data sets, some indicators take a while to complete.
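Clamping the low end of every range to a safe floor can be sketched as follows (the floor of 10 follows the suggestion above; pre-processing the ranges before `tt.fit` is an assumption, not tuneta API):

```python
SAFE_LOW = 10  # floor suggested above; some PTA indicators reject smaller windows

def clamp_ranges(ranges, floor=SAFE_LOW):
    """Raise the low end of each (low, high) range to at least `floor`."""
    return [(max(low, floor), high) for low, high in ranges]

print(clamp_ranges([(2, 200)]))  # [(10, 200)]
# Then, e.g.: tt.fit(X, y, indicators=['pta'], ranges=clamp_ranges([(2, 200)]), ...)
```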

@atomcracker
Author

Thanks again for the fast reply! I let the hang go on for multiple hours, something like 8, but when I looked at my resources, none of the cores were working. I will now start a run with a lower range bound of 10 as you suggested and will get back to you!

@atomcracker
Author

Great news: running 'all' with a range of (10, 200) worked! I don't understand why I couldn't see any error messages during the first few runs, but that doesn't matter now. Thanks a lot for your help and time, have a great one!!
