raw_to_Xy doesn't handle gaps in data #71
Comments
Hey there! Could you please share a minimal reproducible example? In general, I encourage you to check the implementation of Line 203 in ea894c5
Additionally, check any of the end-to-end examples where the
Thank you for your response; I realize my question may not have been very clear. I took a look at the implementation and noticed I believe I've been able to address this particular issue by using:
...and then call raw_to_Xy with that custom frequency...
I'm still having issues and will provide sample data and additional information.
You can use the code below and the attached CSV file (sample_raw_df.txt; GitHub does not allow .csv attachments). The data has 19 rows (timesteps), so with a 5-day lookback, a gap of 0, and a horizon of 1 it should yield 14 windowed samples. Running it through raw_to_Xy yields 13.
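The expected count of 14 follows from simple window arithmetic. The sketch below spells that out; `count_windows` is a hypothetical helper written for illustration, and it encodes the counting convention described in this comment, which is not necessarily the one raw_to_Xy applies.

```python
def count_windows(n_timesteps: int, lookback: int, gap: int, horizon: int) -> int:
    """Count complete (lookback, gap, horizon) windows in a series.

    Hypothetical helper: each sample needs lookback + gap + horizon
    consecutive rows, and consecutive samples are shifted by one row.
    """
    window_len = lookback + gap + horizon
    return max(n_timesteps - window_len + 1, 0)


# 19 rows with a 5-step lookback, no gap, 1-step horizon.
print(count_windows(19, lookback=5, gap=0, horizon=1))
# The same settings on a series that is one row shorter.
print(count_windows(18, lookback=5, gap=0, horizon=1))
```

Under this convention, every row dropped from the series removes exactly one sample, so a difference of one sample points to a difference of one row in the index being windowed.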
Thank you for the example! I would guess that the thing that confused you (I blame the documentation, see #72 for a fix) is that the true value of The
index = pd.date_range(start=raw_data.index[0], end=raw_data.index[-1], freq=freq)
So it does not really matter what happens in between the start and the end timestamp: the new index is just generated from scratch based on the frequency and the end points. In your example, you changed the frequency to a custom one
index_custom = pd.date_range(start=raw_df.index[0], end=raw_df.index[-1], freq=bday_us)
index_default = pd.date_range(start=raw_df.index[0], end=raw_df.index[-1], freq='B')
print(len(index_custom), len(index_default), set(index_default) - set(index_custom))
That means that just by providing your custom index you will lose 1 sample with respect to the default one.
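A self-contained way to see this effect, without the attached file, is to compare the two frequencies over a short range that contains a holiday. This is only a sketch: the thread does not show how `bday_us` was defined, so the definition here (a `CustomBusinessDay` using pandas' `USFederalHolidayCalendar`) is an assumption, and the date range is chosen for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Assumed stand-in for the thread's `bday_us`; the original definition is not shown.
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Two business weeks around Monday, Jan 20, 2020 (a US federal holiday).
start, end = "2020-01-13", "2020-01-24"
index_custom = pd.date_range(start=start, end=end, freq=bday_us)
index_default = pd.date_range(start=start, end=end, freq="B")

# Timestamps that the default business-day index has but the custom one skips.
missing = sorted(set(index_default) - set(index_custom))
print(len(index_custom), len(index_default), missing)
```

The default `"B"` frequency only skips weekends, so it keeps the holiday as a row, while the custom frequency drops it.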
I think you forgot to factor in the fact that the
Thanks again for the feedback. As background, I just wanted to quickly test out deepdow with a limited dataset, so I was following the getting_started.ipynb notebook and simply replacing the generated data with a sample of my own, closer to the format noted in Data Loading. I'm used to creating windowed training datasets as is typical for LSTMs, e.g. 3D numpy arrays with samples, lookback, features, and the matching target array (y). Using a toy dataset fed to I think it may be easier to take your earlier advice and create X and y from scratch. Looking at the generated data in the end-to-end examples is a start, though it's only a single feature (channel). At this point I've still not been able to get a toy dataset successfully trained (currently seeing a Thanks for your patience.
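For reference, building X and y from scratch in the LSTM-style layout mentioned above could look roughly like the following. This is a minimal sketch, not deepdow's API: `make_windows` is a hypothetical helper, the (samples, lookback, features) layout is the one described in this comment rather than the one any particular deepdow network expects, and the toy array stands in for real data.

```python
import numpy as np


def make_windows(data, lookback, gap, horizon):
    """Slice a (n_timesteps, n_features) array into rolling windows.

    Hypothetical helper. Returns X with shape (n_samples, lookback, n_features)
    and y with shape (n_samples, horizon, n_features).
    """
    data = np.asarray(data)
    n_samples = data.shape[0] - (lookback + gap + horizon) + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window")

    # Inputs: `lookback` consecutive rows starting at row i.
    X = np.stack([data[i:i + lookback] for i in range(n_samples)])
    # Targets: `horizon` rows that start `gap` rows after the input window ends.
    y = np.stack([
        data[i + lookback + gap:i + lookback + gap + horizon]
        for i in range(n_samples)
    ])
    return X, y


# Toy data: 19 timesteps, 3 features.
raw = np.arange(19 * 3, dtype=float).reshape(19, 3)
X, y = make_windows(raw, lookback=5, gap=0, horizon=1)
print(X.shape, y.shape)
```

Because the windows are cut by row position rather than by timestamp, this approach is indifferent to holidays or other irregular gaps in the calendar.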
After more experimentation, the relationship between the dataset shape and the network is now clearer. I had assumed the dataset and network were generic, but now I see that different networks expect different dataset shapes (e.g. number of channels). I had assumed the errors I was seeing when attempting to train were due to something in my dataset construction. In actuality, it was a mismatch between what the network was expecting (e.g. 1 channel or multiple channels) and what I was feeding it.
Well, I hope you managed to do what you wanted! Feel free to ask any other questions at any point! Cheers!
raw_to_Xy appears to handle regular gaps in data (e.g. weekend days) but cannot handle irregular gaps such as holidays.
When fed trading data similar to the example at https://deepdow.readthedocs.io/en/latest/source/data_loading.html but covering an entire trading year, it gets out of sync on every holiday, e.g. a Monday that would typically be a trading day but is not because of a holiday, such as Jan 20, 2020.
The result is that the assertion
assert timestamps[0] == raw_df.index[lookback]
fails. This, and likely other data formatting issues, causes an error when executing
history = run.launch(30)
which is RuntimeError: mat1 and mat2 shapes cannot be multiplied