I have some question about the input and output #9

chendiva · 2020-07-29T01:33:31Z

Hi there,
I am working with a time series that has only two columns: Date and Price. Can I use this algorithm in this situation, training the model only on the price and forecasting future prices? In other words, can the model build the lagged inputs automatically, so that I don't need to create the "lag" features myself? Thank you for your help!

kdgutier · 2020-07-29T03:26:15Z

Hi Chendiva, try to parse your data as described in the README ( https://github.com/kdgutier/esrnn_torch), with the price data in y_df, and add a simple constant column to X_df if you don't have any exogenous variables. The algorithm needs the same column names as the dataframes specified in the README; the lag variables are computed within the algorithm. Good luck.
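A minimal sketch of that reshaping, assuming the original file has columns named `Date` and `Price`, and assuming the README's frames use the columns `unique_id`, `ds`, `y` for `y_df` and `unique_id`, `ds`, `x` for `X_df` (please double-check these names against the README). The series label `"price"` and the constant `x` value are arbitrary placeholders, not values the library prescribes.

```python
import pandas as pd

def to_esrnn_frames(df, series_id="price", constant_x="constant"):
    """Split a two-column (Date, Price) frame into long X_df / y_df frames.

    Assumed column names: y_df -> unique_id, ds, y; X_df -> unique_id, ds, x.
    series_id and constant_x are illustrative placeholders.
    """
    # Parse and sort by date so both frames are in chronological order
    df = df.assign(Date=pd.to_datetime(df["Date"])).sort_values("Date")
    dates = df["Date"].to_numpy()
    # unique_id is kept as a string label, not a number
    ids = [str(series_id)] * len(df)
    y_df = pd.DataFrame({"unique_id": ids, "ds": dates,
                         "y": df["Price"].astype(float).to_numpy()})
    # No exogenous data: a single constant column stands in for x
    X_df = pd.DataFrame({"unique_id": ids, "ds": dates,
                         "x": [constant_x] * len(df)})
    return X_df, y_df
```

Both frames get the same default index and row order, which keeps them aligned if they are later combined column by column.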

chendiva · 2020-07-29T03:54:56Z

Hi,
Will it affect the forecasting result when I add the exogenous variable?

kdgutier · 2020-07-29T04:00:59Z

I recommend always having benchmarks to test complex models such as the ESRNN. In our case, we included the OWA metric on the validation set to compare the relative performance against the Naive2 model, as was done in the M4 competition. Take extra care with the learning-rate hyperparameters when tuning your model.
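For reference, the M4 competition defines OWA as the average of sMAPE and MASE, each taken relative to the Naive2 benchmark, so a value below 1 means the model beats Naive2. The sketch below computes it for a single series only; M4 itself forms the ratios from errors averaged over all series. Producing the Naive2 forecast (a seasonally adjusted naive method) is left out and assumed to be given, and the function names are illustrative, not the ESRNN package's own API.

```python
import numpy as np

def smape(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    # Symmetric MAPE on a 0-200 scale, as used in M4
    return 200.0 * np.mean(np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))

def mase(y_train, y, y_hat, m=1):
    y_train = np.asarray(y_train, dtype=float)
    # Scale: in-sample MAE of the seasonal naive forecast with period m
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    err = np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))
    return np.mean(err) / scale

def owa(y_train, y, y_hat, y_naive2, m=1):
    # Average of sMAPE and MASE, each relative to the Naive2 forecast
    rel_smape = smape(y, y_hat) / smape(y, y_naive2)
    rel_mase = mase(y_train, y, y_hat, m) / mase(y_train, y, y_naive2, m)
    return 0.5 * (rel_smape + rel_mase)
```

Running the model once with and once without an extra exogenous column and comparing the two OWA values is one way to answer "does it affect the result" empirically.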

chendiva · 2020-07-29T04:07:15Z

Sorry, I am still confused. If I add the exogenous variable as you recommend, will it affect the result? Is the x variable in your example something you added, or was it originally included in the dataset and used for forecasting?

kdgutier · 2020-07-29T04:09:57Z

I would recommend answering the question empirically (with the OWA metric): try the model and see whether the performance remains acceptable.

chendiva · 2020-07-30T01:07:48Z

This also gives me NaN for my y_hat.

kdgutier · 2020-07-30T03:10:15Z

Hi chendiva, I checked the bug. The ESRNN produces correct outputs, but in the predict method they fail to merge with X_test_df if the frequency of the dataset is not specified correctly. For instance, in the M3 dataset the frequency appears to be 'MS' (dates at the beginning of the month), whereas in the previously reported bug they were using the 'M' frequency (dates at the end of the month). Let me know if this solves the problem.
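To see why a wrong frequency can turn correct forecasts into NaN, here is a small pandas illustration. It is not the ESRNN's internal code; it only assumes that forecast dates are generated by stepping forward from the last training date with some offset and then left-merged onto the test frame. With month-start data, stepping with a month-end offset (the 'M' case) produces dates that never match, so every y_hat comes back NaN. The helper names and the series id `"s1"` are made up for the example.

```python
import pandas as pd

def forecast_dates(last_ds, horizon, offset):
    # date_range begins at the first on-offset date >= last_ds; drop it
    # so only the `horizon` future steps remain
    return pd.date_range(start=last_ds, periods=horizon + 1, freq=offset)[1:]

def merge_forecast(X_test_df, last_ds, y_hat, offset):
    preds = pd.DataFrame({
        "unique_id": "s1",
        "ds": forecast_dates(last_ds, len(y_hat), offset),
        "y_hat": y_hat,
    })
    # Match the test frame's datetime resolution before merging
    preds["ds"] = preds["ds"].astype(X_test_df["ds"].dtype)
    # Left merge keeps every test row; unmatched dates get NaN in y_hat
    return X_test_df.merge(preds, on=["unique_id", "ds"], how="left")

# Month-start test dates, as in the M3 example
X_test_df = pd.DataFrame({
    "unique_id": "s1",
    "ds": pd.date_range("2020-01-01", periods=3, freq=pd.offsets.MonthBegin()),
})
last_ds = pd.Timestamp("2019-12-01")
ok = merge_forecast(X_test_df, last_ds, [1.0, 2.0, 3.0], pd.offsets.MonthBegin())
bad = merge_forecast(X_test_df, last_ds, [1.0, 2.0, 3.0], pd.offsets.MonthEnd())
```

Here `ok` carries all three forecasts, while `bad` has NaN in every row because its dates land on 2020-01-31, 2020-02-29 and 2020-03-31.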

chendiva · 2020-07-30T03:13:12Z

Hi, how can I decide the frequency then?

kdgutier · 2020-07-30T03:15:18Z

When you instantiate the ESRNN model: `model = ESRNN(params, ..., frequency='MS')`
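If you are unsure which alias matches your data, pandas can infer it from the date column with `pd.infer_freq`. A small sketch (the helper name is hypothetical): it returns aliases such as 'D' or 'MS' for regularly spaced dates and typically None when the spacing is irregular, for example a daily series with missing dates. Recent pandas versions report month-end data as 'ME' rather than 'M', so whether the ESRNN accepts a given alias for its `frequency` argument is worth checking against its own code.

```python
import pandas as pd

def guess_frequency(ds):
    """Return pandas' inferred frequency alias for a date column, or None."""
    idx = pd.DatetimeIndex(pd.to_datetime(ds)).sort_values().unique()
    if len(idx) < 3:
        return None  # infer_freq needs at least three dates
    return pd.infer_freq(idx)
```

A None result is also a useful warning sign in itself, since gaps in the dates are a common source of NaN later in the pipeline.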

chendiva · 2020-07-30T03:17:42Z

I'm actually using my own dataset now, not M3. My dataset is daily, so I set frequency = 'D', but then I got this error:

[image: image] <https://user-images.githubusercontent.com/22489898/88876633-b3145880-d1f1-11ea-9e72-9cb9161432ee.png>

kdgutier · 2020-07-30T03:20:20Z

That is a safeguard that aims to protect the model from NaN values. Clean the NaNs from the data before using the ESRNN.

chendiva · 2020-07-30T03:21:56Z

Yes, I checked the dataframe with df.isnull().values.any(), which returns False, but I still get the error above.

kdgutier · 2020-07-30T03:23:31Z

Have you tried printing those unique_ids? Have you also tried the isnan function?
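One reason `df.isnull()` can report no missing values while the model still complains is that `isnull` does not flag infinite values, and it says nothing about entries that are not numeric at all. A minimal per-series check, assuming `y_df` has the `unique_id` and `y` columns described in the README (the helper name is hypothetical): it coerces `y` to numbers and counts every entry that is NaN, ±inf, or could not be parsed.

```python
import numpy as np
import pandas as pd

def nonfinite_report(y_df):
    """Count NaN, +/-inf and non-numeric y values per unique_id."""
    # Strings that are not numbers become NaN here instead of slipping through
    y = pd.to_numeric(y_df["y"], errors="coerce").to_numpy(dtype=float)
    bad = ~np.isfinite(y)
    counts = pd.Series(bad, index=y_df.index).groupby(y_df["unique_id"]).sum()
    # Keep only the series that actually have problems
    return counts[counts > 0]
```

An empty result means every `y` value is a finite number; otherwise the index tells you which series to clean.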

chendiva · Wed, 29 Jul 2020, 11:30 pm (comment quoted in kdgutier's email reply)

I actually got this after using the function you mentioned: [image: image] <https://user-images.githubusercontent.com/22489898/88877408-63369100-d1f3-11ea-9fff-69b4875d9096.png>

kdgutier · 2020-07-30T03:33:08Z

You are treating the unique_ids as a numeric variable. I suggest checking the README on GitHub, where the input dataframes for the model are explained in detail.
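A minimal sketch of casting the identifier to a string label in every input frame before fitting (the helper names are hypothetical). One pitfall worth checking: if the ids were read as floats in one frame and as integers in another, `astype(str)` turns them into "1.0" and "1", which no longer match, so cast from the same dtype everywhere.

```python
import pandas as pd

def ids_as_strings(*frames):
    """Return copies of the frames with unique_id cast to str."""
    out = []
    for df in frames:
        df = df.copy()  # leave the caller's frame untouched
        df["unique_id"] = df["unique_id"].astype(str)
        out.append(df)
    return out

def ids_match(a, b):
    # Every series in one frame should also appear in the other
    return set(a["unique_id"]) == set(b["unique_id"])
```

Running `ids_match` on the train and test frames after the cast is a quick way to confirm that the identifiers really line up.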

Yu-1245 · 2020-07-31T17:06:06Z

Hi there, I have the same problem as you. Have you solved it? I tried slicing the M4 data provided by the prepare_m4_data function and found that, even when I made sure the identifiers in the training and test sets were the same, it still generated NaN for the evaluation metrics and the predictions, which was weird.

AzulGarza · 2020-07-31T17:26:17Z

Hi!

I think this answer could be useful.

Yu-1245 · 2020-07-31T17:31:16Z

Hi,
I saw your answer. I have checked my dataset and made the changes you mentioned, but it still generates NaN for me.
@FedericoGarza

chendiva · 2020-07-31T22:48:30Z

> Hi there, I have the same problem as you. Have you solved it?

No, I haven't solved the problem yet, even after trying his method.

Worben · 2020-08-23T12:55:01Z

Hi,
I have the same problem with my dataset. While trying to find the cause, I found that the NaN values first appear in the long_to_wide function, more precisely in the for loop. Any idea how to solve this? My data is structured exactly according to the specifications.

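```python
import numpy as np


def long_to_wide(self, X_df, y_df):
    # Combine features and target into one long frame.
    # Note: this assignment aligns on the index, so X_df and y_df
    # must share the same index or unmatched rows get NaN in y.
    data = X_df.copy()
    data['y'] = y_df['y'].copy()

    # Map each distinct date stamp to an integer position 0..T-1
    sorted_ds = np.sort(data['ds'].unique())
    ds_map = {}
    for dmap, t in enumerate(sorted_ds):
        ds_map[t] = dmap
    data['ds_map'] = data['ds'].map(ds_map)
    data = data.sort_values(by=['ds_map', 'unique_id'])

    # One row per series, one column per date position; a series with
    # no observation at some position gets NaN in that cell
    df_wide = data.pivot(index='unique_id', columns='ds_map')['y']

    # Per-series exogenous value (first row) and last observed date
    x_unique = data[['unique_id', 'x']].groupby('unique_id').first()
    last_ds = data[['unique_id', 'ds']].groupby('unique_id').last()
    assert len(x_unique) == len(data.unique_id.unique())
    df_wide['x'] = x_unique
    df_wide['last_ds'] = last_ds
    df_wide = df_wide.reset_index().rename_axis(None, axis=1)

    ds_cols = data.ds_map.unique().tolist()
    X = df_wide.filter(items=['unique_id', 'x', 'last_ds']).values
    y = df_wide.filter(items=ds_cols).values

    return X, y
```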
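Two pandas behaviours in `long_to_wide` can produce NaN even when the input frames themselves have no missing values. First, `data['y'] = y_df['y']` aligns on the index, so if `X_df` and `y_df` were sliced or filtered separately and no longer share the same index, rows get NaN. Second, the pivot puts all series on one shared date axis, so a series that starts later or ends earlier than the others is padded with NaN. A small diagnostic sketch that checks both (hypothetical helper, not part of the package):

```python
import pandas as pd

def diagnose_wide_nans(X_df, y_df):
    """Report two common sources of NaN in a long-to-wide pivot."""
    # 1) Column assignment aligns on the index; mismatched indices give NaN
    index_mismatch = not X_df.index.equals(y_df.index)
    # 2) A shared date axis pads series that do not cover every date
    wide = y_df.pivot(index="unique_id", columns="ds", values="y")
    padded = wide.isna().sum(axis=1)
    # Map each affected unique_id to its number of padded cells
    return index_mismatch, padded[padded > 0].to_dict()
```

If the first value is True, resetting both indices (or building both frames from the same source rows) should help; if the dictionary is non-empty, the listed series do not share the same date range as the rest.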

kdgutier · 2020-09-04T21:43:11Z

Have you solved the issue, Worben?

Repository owner deleted a comment from chendiva Sep 4, 2020
