Hello, thanks for your great works, I'm confused with the dataset. #54

Open
StarDxxx opened this issue Apr 16, 2022 · 10 comments

Comments

@StarDxxx

Hello, I'm confused about the dataset. Could you share dataset_57M.npz or another demo dataset?
I just don't know the dataset's structure.

@maxjcohen
Owner

Hello, for the dataset used in these examples, please see #2. The expected structure of the input data is described in the Transformer's documentation; you can implement your own dataset as long as it matches this input shape.

@chuzheng88

> Hello, for the dataset used in these examples, please see #2. The expected structure of the input data is described in the Transformer's documentation; you can implement your own dataset as long as it matches this input shape.

Hi, I have read the docs. This is how I understand the model's inputs and outputs:
d_input and d_output are the numbers of input and output features. For example, if we use PM2.0 and PM5 to predict the pollution level, then d_input and d_output are 2 and 1, respectively. However, I don't understand the parameter K in the Input and Output tensors, as in the shape (batch_size, K, d_output).

@chuzheng88

In other words, I want to solve a regression task, which can be described as follows:
X has two features: X = [[x01, x02, ..., x0j], [x11, x12, ..., x1j]].
Y (the labels) has one feature: Y = [y1, y2, ..., yj]. Put simply, we use two sequences to predict one sequence, like using the sin and cos functions to predict the tan function.
In this case, how should we construct the dataset?

@maxjcohen
Owner

K is the length of the time series. In your example K=j, each batch of data should consist of inputs with shape (batch_size, j, 2) and outputs with shape (batch_size, j, 1).
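
To make these shapes concrete, here is a minimal sketch of how such a dataset could be built with NumPy for the sin/cos to tan example. The sliding-window construction, the toy signal, and all variable names are assumptions chosen for illustration; they are not taken from the repository.

```python
import numpy as np

K = 12    # length of each time series window (the K in the shapes above)
T = 100   # total number of time steps in the toy signal (arbitrary choice)

# Toy signal on [0, 1], which stays away from the poles of tan.
t = np.linspace(0.0, 1.0, T)
features = np.stack([np.sin(t), np.cos(t)], axis=-1)  # (T, 2): d_input = 2
targets = np.tan(t)[:, None]                          # (T, 1): d_output = 1

# Row i of `idx` holds the time indices i, i+1, ..., i+K-1 of window i.
idx = np.arange(T - K + 1)[:, None] + np.arange(K)[None, :]  # (T-K+1, K)

X = features[idx]  # (T-K+1, K, 2)
Y = targets[idx]   # (T-K+1, K, 1)

# One batch, as the model would receive it.
batch_size = 8
x_batch, y_batch = X[:batch_size], Y[:batch_size]
print(x_batch.shape, y_batch.shape)  # (8, 12, 2) (8, 12, 1)
```

Overlapping windows are only one way to cut a long series into samples; non-overlapping chunks of length K would give the same per-batch shapes.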

@chuzheng88

chuzheng88 commented Apr 25, 2022

> K is the length of the time series. In your example K=j, each batch of data should consist of inputs with shape (batch_size, j, 2) and outputs with shape (batch_size, j, 1).

Thanks for your reply. In this case, can the parameter attention_size be set to any value <= K?

@maxjcohen
Owner

Yes exactly !
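
As a rough illustration of why values up to K make sense, the sketch below builds a local attention mask under the assumption that attention_size limits each time step to neighbours at most that many steps away. That reading, and the helper name local_attention_mask, are assumptions for this example; the library's documentation is the reference for how the parameter is actually used.

```python
import numpy as np

def local_attention_mask(K, attention_size):
    """Boolean (K, K) mask; True marks pairs that may NOT attend to each other.

    Hypothetical helper: assumes attention is limited to time steps that are
    at most `attention_size` positions apart.
    """
    positions = np.arange(K)
    distance = np.abs(positions[:, None] - positions[None, :])
    return distance > attention_size

K = 12
narrow = local_attention_mask(K, attention_size=3)
full = local_attention_mask(K, attention_size=K)

print(narrow.sum())  # number of masked pairs with a window of 3
print(full.any())    # False: with attention_size = K nothing is masked
```

Under this assumption, any attention_size of K - 1 or more is equivalent to full attention over the series, so larger values bring no change.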

@chuzheng88

> Yes exactly !

Hi, I used a dataset X, produced by the sin function, to predict Y (produced by the cos function), with K set to 12. During validation, the loss is NaN, and I don't know why.
The full code is shown below:
image
image
image

@maxjcohen
Owner

Hi, I don't see directly where a NaN could come from, I encourage you to debug during the validation loss computation in order to see what tensor or function is malfunctioning.
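
One possible first step, sketched here with NumPy, is to count NaN and infinite values in each array before it reaches the loss. The helper name report_non_finite and the toy arrays are made up for this example; with tensors, the same check can be applied after converting them to arrays.

```python
import numpy as np

def report_non_finite(name, array):
    """Print and return how many NaN and infinite entries `array` contains."""
    array = np.asarray(array, dtype=float)
    n_nan = int(np.isnan(array).sum())
    n_inf = int(np.isinf(array).sum())
    print(f"{name}: shape={array.shape}, NaN={n_nan}, inf={n_inf}")
    return n_nan, n_inf

# Toy data shaped (batch_size, K, d) with K = 12, mimicking the sin -> cos setup.
x = np.sin(np.linspace(0.0, 6.0, 24)).reshape(2, 12, 1)
y = np.cos(np.linspace(0.0, 6.0, 24)).reshape(2, 12, 1)

# A deliberately corrupted copy, to show what a bad target looks like.
y_bad = y.copy()
y_bad[0, 3, 0] = np.nan

report_non_finite("x", x)
report_non_finite("y", y)
report_non_finite("y_bad", y_bad)
```

Running the check on the inputs, the targets, the model output, and the loss value in that order narrows down the first place where a non-finite value appears.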

@chuzheng88

> Hi, I don't see directly where a NaN could come from, I encourage you to debug during the validation loss computation in order to see what tensor or function is malfunctioning.

In fact, the loss is already NaN during training, e.g.:
image

In my opinion, with loss_function = OZELoss(alpha=0.3), the training loss shouldn't be NaN, but I don't understand why it is.

Furthermore, I used the compute_loss function to calculate the loss during validation, as follows:
image

@chuzheng88

Is my dataset wrong?
image
