Resample vs Informative merge #180

Closed
Masmar25 opened this issue Apr 17, 2021 · 7 comments · Fixed by #183

Comments

@Masmar25
Contributor

Hi,
This is my first time using GitHub and I'm quite new to Python, so sorry for any inconvenience.
Resampled columns and informative pairs behave differently when they are merged back into the main dataframe (1m/5m example attached).
I read the explanation of why the informative pair is merged this way (freqtrade/freqtrade#4073) and I think it is correct, but the logic seems different for resample. Shouldn't it be the same?
I can't tell whether I am missing something in the resample logic.

Resample_vs_Informative_Merge.xlsx

@xmatthias
Member

Assuming you use 15m and 1h pairs, the 15m candle at 10:45 becomes available at the same time as the 1h candle at 10:00 (the dates are candle open dates, so both candles close at 11:00).
Therefore, merging the "informative" pair must account for this (this is also explained in the issue you linked, but I guess it's rather lengthy to read through...).
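To make the open-date timing concrete, here is a small plain-Python sketch. The helper names are hypothetical (they are not part of freqtrade or technical); it only assumes that every candle is labelled with its open date and is complete once its timeframe has elapsed.

```python
from datetime import datetime, timedelta


def candle_close(open_time: datetime, timeframe_minutes: int) -> datetime:
    """A candle labelled with its open date is only complete at open + timeframe."""
    return open_time + timedelta(minutes=timeframe_minutes)


def last_base_candle_for(informative_open: datetime, inf_minutes: int, base_minutes: int) -> datetime:
    """Open date of the last base-timeframe candle that closes together with the
    informative candle, i.e. the earliest base candle allowed to use it."""
    return candle_close(informative_open, inf_minutes) - timedelta(minutes=base_minutes)


hour_open = datetime(2021, 4, 17, 10, 0)
print(candle_close(hour_open, 60))              # 2021-04-17 11:00:00
print(last_base_candle_for(hour_open, 60, 15))  # 2021-04-17 10:45:00
```

Both the 10:00 1h candle and the 10:45 15m candle close at 11:00, which is why the 1h values should first appear on the 10:45 row.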

I'm not certain at the moment whether this logic should apply to resampled_merge as well... Based on your sample it should, but are the "resample" columns created from a resampled dataframe, or from the informative pair?

@Masmar25
Contributor Author

Yes, they are from the resampled dataframe; the column names are generated automatically by the function util/resampled_merge (I don't know how to link the code, sorry). For clarity, I highlighted the columns from the informative pair in green and those from the resampled dataframe in yellow.

@xmatthias
Member

I'm not certain the problem is in the resampled_merge function alone...

If I look at resample_to_interval(), it resamples to the "right border", which at first glance seems correct (so a 15m candle from 14:00 ends up in the 1h candle labelled 15:00).

However, as we're dealing with open dates (which is easy to forget and complicates things), this isn't entirely correct: the correct open date would be 14:00, as this candle covers 14:00-15:00.
The problem is that if you merge a 14:00-labelled 1h candle back into the dataframe directly (without additional time math), you'll look into the future on the 14:00 15m candle (the 14:00 1h candle should only be available at 14:45).
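The difference between the two labels can be reproduced with plain pandas. This sketch uses eight made-up 15m candles (only the index alignment matters) and resamples them to 1h once with `label="right"` and once with `label="left"`; pandas' default `closed="left"` for minute frequencies is assumed, so each 1h bin covers [14:00, 15:00) and [15:00, 16:00).

```python
import pandas as pd

# Eight hypothetical 15m candles with open dates 14:00-15:45 and made-up prices.
dates = pd.date_range("2021-04-17 14:00", periods=8, freq="15min", tz="UTC")
df = pd.DataFrame({"date": dates, "open": range(8), "close": range(1, 9)}).set_index("date")

ohlc = {"open": "first", "close": "last"}
right = df.resample("60min", label="right").agg(ohlc)  # labelled with the bin's end
left = df.resample("60min", label="left").agg(ohlc)    # labelled with the bin's start (open date)

print(right.index.strftime("%H:%M").tolist())  # ['15:00', '16:00']
print(left.index.strftime("%H:%M").tolist())   # ['14:00', '15:00']
```

Both frames hold the same values; only the label moves. With `label="left"` the 14:00-15:00 candle gets its open date 14:00, and its close is only known at 15:00, so a direct merge onto the 14:00 15m row would leak future data.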

The same happens if you merge informative pairs directly, which is why we have the merge_informative_pair helper function in freqtrade.

I think the correct behaviour would be to use the left border for the resample instead of the "right" one (so the 1h candle ends up at 14:00), and then to use the merge_informative_pair math to merge it back into the dataframe.

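```python
from pandas import DatetimeIndex
from freqtrade.exchange import timeframe_to_minutes
from freqtrade.strategy import merge_informative_pair


def resample_to_interval2(dataframe, interval):
    """
    Resamples the given dataframe to the desired interval, labelling each
    resampled candle with its left border (i.e. its open date).
    Please be aware you need to upscale this to join the results
    with the other dataframe.

    :param dataframe: dataframe containing date/open/high/low/close/volume
    :param interval: target interval, in minutes (int) or as a timeframe string like "1h"
    :return: resampled dataframe with a "date" column holding candle open dates
    """
    if isinstance(interval, str):
        # timeframe_to_minutes replaces the TICKER_INTERVAL_MINUTES lookup
        interval = timeframe_to_minutes(interval)

    df = dataframe.copy()
    df = df.set_index(DatetimeIndex(df["date"]))
    ohlc_dict = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    # label="left": the 14:00-15:00 candle is labelled 14:00 (its open date)
    df = df.resample(str(interval) + "min", label="left").agg(ohlc_dict).dropna()
    df["date"] = df.index

    return df.reset_index(drop=True)


# `dataframe` is the strategy's 15m dataframe.
# This new function labels candles with the left border instead of the right one.
dataframe_long2 = resample_to_interval2(dataframe, 60)
# Merge with the informative helper for now; it returns a new dataframe.
dataframe = merge_informative_pair(dataframe, dataframe_long2, '15m', '1h')
```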
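The time math behind this kind of merge can be sketched with plain pandas. This is a simplified, hypothetical re-implementation of the idea (shift each left-labelled informative candle to the last base candle that closes together with it, then left-merge and forward-fill), not freqtrade's actual merge_informative_pair code; the `merge_shifted` name and the `_60m` column suffix are made up for illustration.

```python
import pandas as pd


def merge_shifted(base: pd.DataFrame, informative: pd.DataFrame,
                  base_minutes: int, inf_minutes: int) -> pd.DataFrame:
    """Hypothetical helper sketching the merge_informative_pair time math.

    Both frames carry a "date" column with candle open dates. An informative
    candle opened at T is only complete at T + inf_minutes, so it is attached
    to the base candle opened at T + inf_minutes - base_minutes (the last base
    candle that closes at the same moment).
    """
    suffix = f"_{inf_minutes}m"
    inf = informative.copy()
    inf["date_merge"] = inf["date"] + pd.Timedelta(minutes=inf_minutes - base_minutes)
    inf = inf.drop(columns="date").add_suffix(suffix)
    key = f"date_merge{suffix}"
    merged = base.merge(inf, how="left", left_on="date", right_on=key).drop(columns=key)
    # Forward-fill the informative columns so later base candles keep seeing
    # the most recent completed informative candle.
    inf_cols = [c for c in inf.columns if c != key]
    merged[inf_cols] = merged[inf_cols].ffill()
    return merged


# Eight 15m candles (14:00-15:45) and two left-labelled 1h candles (14:00, 15:00).
base = pd.DataFrame({"date": pd.date_range("2021-04-17 14:00", periods=8, freq="15min", tz="UTC"),
                     "close": [float(i) for i in range(1, 9)]})
hourly = pd.DataFrame({"date": pd.date_range("2021-04-17 14:00", periods=2, freq="60min", tz="UTC"),
                       "close": [4.0, 8.0]})

out = merge_shifted(base, hourly, base_minutes=15, inf_minutes=60)
print(out)
```

The 1h candle opened at 14:00 first shows up on the 14:45 row; the three rows before it stay empty instead of seeing a close that is only known at 15:00.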

@Masmar25
Contributor Author

I tried changing to label="left" and using merge_informative_pair, as you suggested.
Now the values obtained via "informative_pairs" and via "resample" are the same.
I tested it by resampling 1m candles to 5m and comparing the result with the output of a 5m informative pair.
It works as I would expect, and it also includes the "volume" field that was missing from the resampled_merge output.
Does this mean resampled_merge is a sort of duplicate of merge_informative_pair?
There are small differences, but if I understand correctly they work quite similarly.
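A rough way to reproduce this check without exchange data is to build synthetic 1m candles, resample them to 5m with `label="left"`, and compare the result against 5m candles aggregated by hand from the same rows. The hand-built frame only stands in for a real 5m informative pair (an assumption of this sketch), and the prices are made up; only the alignment and aggregation are being tested.

```python
import pandas as pd

# Ten synthetic 1m candles (open dates 10:00-10:09) with made-up prices.
dates = pd.date_range("2021-04-17 10:00", periods=10, freq="1min", tz="UTC")
one_min = pd.DataFrame({
    "date": dates,
    "open": [float(i) for i in range(10)],
    "high": [i + 2.0 for i in range(10)],
    "low": [i - 1.0 for i in range(10)],
    "close": [i + 1.0 for i in range(10)],
    "volume": [10.0] * 10,
})

ohlcv = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
resampled = (one_min.set_index("date")
             .resample("5min", label="left").agg(ohlcv)
             .dropna().reset_index())

# Reference 5m candles built by hand: candle k covers 1m rows 5k..5k+4 and is
# labelled with the open date of its first 1m candle.
rows = []
for start in range(0, 10, 5):
    chunk = one_min.iloc[start:start + 5]
    rows.append({"date": chunk["date"].iloc[0], "open": chunk["open"].iloc[0],
                 "high": chunk["high"].max(), "low": chunk["low"].min(),
                 "close": chunk["close"].iloc[-1], "volume": chunk["volume"].sum()})
reference = pd.DataFrame(rows)

# Raises an AssertionError if labels or aggregated values differ.
pd.testing.assert_frame_equal(resampled, reference)
print(resampled)
```

With `label="right"` the same comparison would fail, because every resampled candle would be stamped 5 minutes later than the hand-built one.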

Thanks for your help!

@xmatthias
Member

Yes, they do, but resampled_merge is a lot older than merge_informative_pair... and their (initial) purposes were slightly different.

@Masmar25
Copy link
Contributor Author

If you agree, I can open a PR with the label="left" change.
Even if it is a minimal change, it would be a good first experience with GitHub for me.
On the other hand, I don't know what to do about resampled_merge. Do you plan to move the whole resampling feature to the main repo and use only merge_informative_pair, or is it better to change resampled_merge in this repository so that it duplicates merge_informative_pair?

@xmatthias
Member

There's nothing wrong with opening a PR.

I do think we'll need to fix both points (the left label in the resample and the merge logic in resampled_merge) at the same time, though; otherwise the two will become quite ... disconnected and dangerous to use, especially for people who didn't read this issue (which will be 99% of the users)...
