Can calculate EMA Incrementally? #114

Open
dahai001 opened this issue Sep 19, 2016 · 21 comments

Comments

@dahai001

I want to calculate the EMA every minute using only the latest one-minute K-bar. Is there a way to do this? For now, it has to be recalculated from the very first K-bar.

@mrjbq7
Collaborator

mrjbq7 commented Sep 19, 2016

I pushed an experimental "Streaming API" (in the talib.stream module) that you could use that calculates the result for only the most recent time period rather than the full array.

Any function you can import from the Function API (from talib import MOM or from talib.func import MOM), you can just import from the Streaming API (from talib.stream import MOM) and call them the same way. The result is just a single value instead of an array.

@mrjbq7 mrjbq7 closed this as completed Sep 19, 2016
@dahai001
Author

Thanks very much, that's wonderful.

Does the original C/C++ API support this kind of "Streaming API"? I ask because my online software uses the C/C++ version, while the Python version is used for back-testing.

@mrjbq7
Collaborator

mrjbq7 commented Sep 20, 2016

Yes, my talib.stream module (like talib.func) is just calling the C API using cython.

You can see, for example, the code for calling MOM specifies the beginidx and endidx to cause only a single element to be produced:

https://github.com/mrjbq7/ta-lib/blob/master/talib/stream.pyx#L6486

@dahai001
Author

dahai001 commented Sep 21, 2016

I am sorry, but I must report a problem: the streaming API's stream.EMA does not seem to work well, while stream.MA is fine. The test code follows.

The key reason for the EMA issue may be that it needs EMA[i-1] when calculating EMA[i], whereas MA does not.

import numpy as np
from talib import func, stream

a = np.array([1,1,2,3,5,8,13,5,7,8,9,10,12,4], dtype=float)
for x in func.EMA(a,3):print x
...
nan
nan
1.33333333333
2.16666666667
3.58333333333
5.79166666667
9.39583333333
7.19791666667
7.09895833333
7.54947916667
8.27473958333
9.13736979167
10.5686848958
7.28434244792
for x in range(len(a)-2): print stream.EMA(a[x:3+x],3)
...
1.33333333333
2.0
3.33333333333
5.33333333333
8.66666666667
8.66666666667
8.33333333333
6.66666666667
8.0
9.0
10.3333333333
8.66666666667

for x in func.MA(a,3):print x
...
nan
nan
1.33333333333
2.0
3.33333333333
5.33333333333
8.66666666667
8.66666666667
8.33333333333
6.66666666667
8.0
9.0
10.3333333333
8.66666666667
for x in range(len(a)-2): print stream.MA(a[x:3+x],3)
...
1.33333333333
2.0
3.33333333333
5.33333333333
8.66666666667
8.66666666667
8.33333333333
6.66666666667
8.0
9.0
10.3333333333
8.66666666667
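
The numbers above can be reproduced without TA-Lib. The following is a minimal pure-Python sketch, assuming the EMA is seeded with the simple average of the first `period` values and uses a smoothing factor of 2 / (period + 1), which is consistent with the func.EMA output shown. The helper name `ema_series` is hypothetical and not part of TA-Lib. It suggests why a window of exactly `period` values gives the same result as MA: only the seed is ever computed.

```python
def ema_series(values, period):
    """EMA seeded with the simple average of the first `period` values.

    Assumption: this mirrors the seeding that the func.EMA output in this
    thread is consistent with; it is not TA-Lib's actual code.
    """
    if len(values) < period:
        return []
    alpha = 2.0 / (period + 1)
    ema = sum(values[:period]) / float(period)  # seed: simple moving average
    out = [ema]
    for x in values[period:]:
        ema = alpha * x + (1.0 - alpha) * ema  # each step needs the previous EMA
        out.append(ema)
    return out


a = [1, 1, 2, 3, 5, 8, 13, 5, 7, 8, 9, 10, 12, 4]

# Full history: every value depends on all earlier values.
full = ema_series(a, 3)

# Sliding window of exactly `period` values: only the seed is computed,
# so each result is just the simple average of that window.
windowed = [ema_series(a[i:i + 3], 3)[-1] for i in range(len(a) - 2)]

print([round(v, 4) for v in full])
print([round(v, 4) for v in windowed])
```

Under these assumptions, `full` lines up with the func.EMA column and `windowed` with the stream.EMA column, so the two APIs may simply be answering different questions rather than one of them being numerically wrong.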

@mrjbq7
Collaborator

mrjbq7 commented Sep 21, 2016

Hmm, I'll need to look into that -- maybe there is a way for me to call it differently to have the numerical differences become smaller.

@mrjbq7 mrjbq7 reopened this Sep 21, 2016
@gzcf

gzcf commented Sep 26, 2016

As far as I know, TA-Lib doesn't support incremental calculation. The Streaming API just uses part of the data for its calculation.

See http://www.kbasm.com/blog/ta-lib-not-incremental-and-wrong.html

@briancappello
Contributor

@dahai001 The differences in the values coming out of the EMA function are actually an intrinsic property of exponential moving averages, which have a "memory" of past values affecting the most recent values. In other words, even though you're using a period of 3, the 4th, 5th, ... Nth values of the input array (if available) all factor into the calculation (with exponentially decreasing weights as you go further back in time). func.EMA knows about these prior values, whereas stream.EMA does not, hence the apparent difference in results.

Functions which behave this way are said to have an "unstable period" in upstream TA-Lib nomenclature, and can be identified through the abstract API:

from talib.abstract import EMA
if 'Function has an unstable period' in EMA.function_flags:
    print('unstable')
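
To make the "memory" described above concrete, here is a small sketch of the weight an EMA places on each past value, assuming the usual smoothing factor of 2 / (period + 1). The function name `ema_weights` is made up for illustration and is not a TA-Lib call.

```python
def ema_weights(period, n_terms):
    """Weight that an EMA places on the latest value, the one before it, and so on."""
    alpha = 2.0 / (period + 1)  # standard EMA smoothing factor (assumed here)
    return [alpha * (1.0 - alpha) ** k for k in range(n_terms)]


weights = ema_weights(3, 8)
for k, w in enumerate(weights):
    print("value %d bars back: weight %.6f" % (k, w))

# Whatever weight is left over belongs to values even further back (or to the seed).
print("weight beyond 8 bars: %.6f" % (1.0 - sum(weights)))
```

With a period of 3 the weights halve at every step, so values older than the period still contribute, just less and less.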

@mrjbq7
Collaborator

mrjbq7 commented Sep 26, 2016

Hmm, @briancappello, that makes sense! If we keep the "Streaming API" around, we should probably document that somewhere. Or maybe remove the functions that behave like that, or have them emit a warning?

@briancappello
Contributor

@mrjbq7 Yeah, it would probably be a good idea to document the unstable functions somewhere.

What were your primary design goals with the streaming API? It could be worth the performance trade-off to give unstable functions more input values than are absolutely necessary (as defined by the lookback), just not the entire array. For the EMA function at least, and depending on how much accuracy one desires, this value seems to be about 20 times the timeperiod:

import numpy as np
import talib as ta

def test_ema(timeperiod):
    arr = np.random.rand(5000) * 50  # fake (if highly volatile) price data
    func_ema_val = ta.func.EMA(arr, timeperiod)[-1]  # EMA over the full history
    trim = timeperiod  # number of trailing values passed to EMA
    while True:
        # EMA computed from only the last `trim` values
        stream_ema_val = ta.func.EMA(arr[-trim:], timeperiod)[-1]
        if func_ema_val == stream_ema_val:  # or convert to str and chop at desired accuracy
            break
        trim += 1
    return trim * 1. / timeperiod  # history needed, as a multiple of timeperiod

multipliers = [test_ema(timeperiod) for timeperiod in range(2, 500)]
max(multipliers) < 20  # True

Exact values vary depending on the random input array, but they're consistently between 16x and 19x the timeperiod. Whether multiplying by the timeperiod is always the right thing to do, I am not sure. It seems to work well enough for the EMA function, but I haven't tested any others to see if the relationship holds up.
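
One possible explanation for the 16x to 19x figure, offered as a back-of-the-envelope sketch rather than something verified against TA-Lib: if the smoothing factor is 2 / (timeperiod + 1), the influence of the starting value shrinks by a factor of (1 - alpha) on every bar, and it falls below double-precision resolution (taken here as a relative tolerance of 1e-16, which is an assumption) after roughly 18 periods. The names `bars_needed` and `rel_tol` are hypothetical.

```python
import math


def bars_needed(period, rel_tol=1e-16):
    """Bars after which the seed's weight, (1 - alpha) ** n, drops below rel_tol."""
    alpha = 2.0 / (period + 1)
    return int(math.ceil(math.log(rel_tol) / math.log(1.0 - alpha)))


for period in (2, 10, 50, 200):
    n = bars_needed(period)
    print(period, n, round(n / float(period), 2))
```

The same function with a looser tolerance, for example `bars_needed(period, 1e-6)`, needs noticeably less history, which matches the remark that the multiplier depends on how much accuracy one desires.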

PS. Can you push the script you used to generate the streaming API?

@mrjbq7
Collaborator

mrjbq7 commented Sep 26, 2016

I've had several people ask about "just calculate the most recent bar" functionality (mostly for the performance improvement that might give). I wasn't aware of the unstable functions, although that makes total sense.

I pushed tools/generate_stream.py which you can look at if you want to play around with it!

@dahai001
Author

Yeah, incremental calculation is a very useful feature when building an online trading system with TA-Lib.

Can we change the C/C++ TA-Lib to support this feature?

@trufanov-nok
Contributor

trufanov-nok commented Apr 7, 2017

@mrjbq7, I implemented such incremental indicators in my fork two years ago.
https://github.com/trufanov-nok/ta-lib/tree/exp
Note that it's in the exp branch, not master, and isn't well documented.

The main idea is that TA-Lib processes arrays of data element by element, so processing can be paused and resumed after any step as long as the algorithm's internal state is stored between steps. So a few new functions were added for each indicator:

TA_XXX_StateInit() - allocates an indicator-specific state object and returns a pointer to it. The internals of the object aren't exposed to the end user, so to them it's just a chunk of memory. Indicators that don't need memory (like ACOS or ADD) return null, of course.
TA_XXX_State() - the indicator function itself. Compared with TA_XXX(), it takes the same arguments, but expects single values instead of pointers to arrays of data. It also accepts a pointer to the state object in order to store internal state between steps, and it can return a new error code indicating that everything is fine but more data must be fed in before the first meaningful indicator value is available.
TA_XXX_StateFree() - destroys the previously allocated state object.

This means that, along with

TA_MACD(int startIdx, int endIdx, const double inReal[], int optInFastPeriod, int optInSlowPeriod, int optInSignalPeriod, int *outBegIdx, int *outNBElement, double outMACD[], double outMACDSignal[], double outMACDHist[] )

the user has:

int TA_MACD_StateInit( struct TA_MACD_State** _state, int optInFastPeriod, int optInSlowPeriod, int optInSignalPeriod )
int TA_MACD_State( struct TA_MACD_State* _state, const double inReal, double *outMACD, double *outMACDSignal, double *outMACDHist )
int TA_MACD_StateFree( struct TA_MACD_State** _state )

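And instead of calling

TA_MACD(indexes, optArgs, myDataArray, result)

you can use something like

/* myDataSize (the number of elements in myDataArray) is a placeholder name */
struct TA_MACD_State* state;
double macd, macdSignal, macdHist;
TA_MACD_StateInit(&state, optInFastPeriod, optInSlowPeriod, optInSignalPeriod);
for (int i = 0; i < myDataSize; i++) {
  /* feed one value at a time; check the return code, which may signal that more data is needed */
  TA_MACD_State(state, myDataArray[i], &macd, &macdSignal, &macdHist);
  printf("%f %f %f\n", macd, macdSignal, macdHist);
}
TA_MACD_StateFree(&state);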

Thus an indicator can easily be applied to data received online at runtime. Of course, you pay some memory overhead for that. Still, state objects are small enough to run hundreds of indicators at once.
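
The same pause-and-resume idea can be sketched in Python for a single EMA. This is an illustration only: `EmaState` and `update` are hypothetical names that belong neither to TA-Lib, nor to the Python wrapper, nor to the fork, and the seeding with a simple average is an assumption. Returning `None` stands in for the "more data needed" return code described above.

```python
class EmaState(object):
    """Hypothetical Python analogue of a per-indicator state object."""

    def __init__(self, period):
        self.period = period
        self.alpha = 2.0 / (period + 1)
        self.warmup = []   # values collected until the seed can be computed
        self.ema = None    # last EMA value, kept between calls

    def update(self, value):
        """Feed one value; return the new EMA, or None while more data is needed."""
        if self.ema is None:
            self.warmup.append(value)
            if len(self.warmup) < self.period:
                return None
            # Seed with the simple average of the first `period` values.
            self.ema = sum(self.warmup) / float(self.period)
            self.warmup = []
        else:
            self.ema = self.alpha * value + (1.0 - self.alpha) * self.ema
        return self.ema


state = EmaState(3)
results = [state.update(x) for x in [1, 1, 2, 3, 5, 8, 13, 5, 7, 8, 9, 10, 12, 4]]
print(results)
```

Each call does a constant amount of work and the object holds only a few numbers, which is in line with the point about state objects being small.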

I think there were also some state object serialization/deserialization functions, to save and restore this process from disk or to roll back the state, plus some other fancy helper functions.

The new functions are properly declared using modified templates and headers generated by tools\gen_code, so Java/C# headers are generated for them too. Thus this is not something layered on top of the TA-Lib codebase; I've embedded it inside TA-Lib's guts and made it work within its architecture, which is quite exotic, by the way. I've even adapted tools\ta_regtest for the state functions, so their correctness is now automatically compared against the old indicator results.

The problems are:

  1. I'm using it with Qt and am not sure the C#/Java wrappers really work. But if Python binds to the C declarations, it should be fine.
  2. I've generated State function declarations for all indicators but didn't port the code for the last 25 candle detectors, simply because I didn't need them much in my own project and at some point I got tired of this project. Anyway, candles were the easiest indicators to port, so this can be finished quickly.
  3. I didn't document it much, and tools\ta_regtest doesn't cover all functions. There are no ready-made binaries, and building the library from scratch might require some skill.

Anyway, sooner or later I want to bring it to a releasable state. So if someone wants to try the code in their own project, or just needs some internal C modifications to it, let me know and it may happen faster.

@hhashim1

Has the Streaming API been updated in a while or is it stable to be used now?

@mrjbq7
Collaborator

mrjbq7 commented Feb 14, 2022

The streaming API works fine, but it behaves somewhat differently because it does not keep memory in the same way as operating on all the previous data points. The suggested state object idea is a good one, but so far we are trying to match only the features in the released TA-Lib.

@hhashim1

So is it otherwise usable for live/streaming data?

@liho00

liho00 commented Nov 21, 2022

I need this feature for WebSocket data; I hope it can be released soon.

@mrjbq7
Collaborator

mrjbq7 commented Nov 28, 2022

We are working with the author of TA-Lib to participate in maintaining the C library, which would allow us to consider new use-cases like this.

@mrjbq7
Collaborator

mrjbq7 commented Nov 28, 2022

I would encourage trying out the streaming API, which does also work, although in the case of EMA it uses only the "N previous observations" rather than all previous ones.

@zgpnuaa

zgpnuaa commented Aug 2, 2023

Any progress?
This also leads to a significant difference between the MACD values calculated by TA-Lib and those shown in brokerage software.
Standard MACD vs Talib MACD - Different Values

@mrjbq7
Collaborator

mrjbq7 commented Aug 2, 2023 via email

@zgpnuaa

zgpnuaa commented Aug 3, 2023

What does resampledata do that breaks the ta-lib calculation?

Maybe there was an error when using dataframe.resample(). This is for resampling, which means combining minute-level OHLC data into hourly OHLC data. However, the trading hours are from 9:30 to 11:30 and 13:00 to 15:00. I noticed that resample is not working correctly for trading hours starting at half past nine (9:30).
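
One way to sidestep this is to bucket the minute bars relative to each session's opening time instead of the top of the clock hour. The following is a minimal standard-library sketch, not a pandas recipe. It assumes the two sessions mentioned above, assumes each minute bar is labelled with its start time, and uses made-up prices; `SESSIONS`, `hour_bucket`, and `to_hourly` are hypothetical names.

```python
from datetime import datetime, time, timedelta

# Assumed trading sessions (open, close), taken from the hours mentioned above.
SESSIONS = [(time(9, 30), time(11, 30)), (time(13, 0), time(15, 0))]


def hour_bucket(ts):
    """Return the start of the session-aligned hourly bar containing ts, or None."""
    for open_t, close_t in SESSIONS:
        if open_t <= ts.time() < close_t:
            session_open = datetime.combine(ts.date(), open_t)
            hours_in = int((ts - session_open).total_seconds() // 3600)
            return session_open + timedelta(hours=hours_in)
    return None  # outside trading hours


def to_hourly(minute_bars):
    """Aggregate (timestamp, open, high, low, close) minute bars into hourly bars.

    Assumes minute_bars is sorted by timestamp.
    """
    hourly = {}
    for ts, o, h, l, c in minute_bars:
        key = hour_bucket(ts)
        if key is None:
            continue
        if key not in hourly:
            hourly[key] = [o, h, l, c]
        else:
            bar = hourly[key]
            bar[1] = max(bar[1], h)  # running high
            bar[2] = min(bar[2], l)  # running low
            bar[3] = c               # latest close
    return sorted(hourly.items())


# Synthetic one-minute bars for one morning session (made-up prices).
start = datetime(2023, 8, 1, 9, 30)
bars = [(start + timedelta(minutes=i), 10.0 + i, 10.5 + i, 9.5 + i, 10.2 + i)
        for i in range(120)]
hourly_bars = to_hourly(bars)
for key, bar in hourly_bars:
    print(key, bar)
```

If the data source labels bars with their end time instead, the bucket boundaries would need to shift by one minute.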

8 participants