
Model not converging: the sparse categorical accuracy stays the same for all epochs #1813

Open
subalbeura opened this issue Mar 27, 2024 · 6 comments

@subalbeura

subalbeura commented Mar 27, 2024

When comparing with the results provided at https://keras.io/examples/timeseries/timeseries_classification_transformer/, the dense layer is different:

 dense_2 (Dense)        (None, 128)    256    ['global_average_pooling1d_2[0][0]']
 dropout_25 (Dropout)   (None, 128)    0      ['dense_2[0][0]']

How can this be fixed?
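
For reference, a minimal sketch (input shape and layer sizes are assumed, not taken from the example page) that reproduces the 256-parameter Dense layer shown in the summary above:

    import keras
    from keras import layers

    # Assumed: 500 timesteps, 1 feature, and a Dense(128) head as in the quoted summary.
    inputs = keras.Input(shape=(500, 1))
    x = layers.GlobalAveragePooling1D()(inputs)   # default data_format gives output shape (None, 1)
    x = layers.Dense(128, activation="relu")(x)   # 1 * 128 + 128 = 256 parameters, matching the summary
    x = layers.Dropout(0.25)(x)                   # dropout rate assumed
    keras.Model(inputs, x).summary()

The 256 parameters mean the Dense layer only sees a single input unit, which already hints that the pooling output is smaller here than on the example page.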

@jentstemmerman

jentstemmerman commented Apr 1, 2024

Same experience here: the model training stops because of the EarlyStopping callback.
Setting callbacks=None results in 150 epochs of zero progress in accuracy.

Also reproducible in Colab
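
For reference, a minimal sketch of the training call being described (parameter values are assumed, not copied from the example page):

    import keras

    # model, x_train, y_train as built earlier in the example
    callbacks = [keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]

    model.fit(
        x_train,
        y_train,
        validation_split=0.2,
        epochs=150,           # the 150 epochs mentioned above
        batch_size=64,        # assumed
        callbacks=callbacks,  # set callbacks=None to disable early stopping, as tried above
    )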

@jentstemmerman

jentstemmerman commented Apr 1, 2024

I think I might have found the solution. The GlobalAveragePooling1D just before the dense layer returns the wrong shape:
it returns (None, 1) for an input of (None, 500, 1), while the result on the example page is (None, 500).
Changing the data_format parameter to "channels_first" as below fixes that:
x = layers.GlobalAveragePooling1D(data_format="channels_first", name="TooSmallOutput")(x)

With that change, the model does converge.

DISCLAIMER: I am new to transformers; I have only played with LSTMs so far.
From the TensorFlow spec:
"channels_last" corresponds to inputs with shape (batch, steps, features), while "channels_first" corresponds to inputs with shape (batch, features, steps).

If 500 is the number of steps (the sequence length?) and there is only one feature in the dataset, channels_last makes sense, so I don't understand why channels_first works and channels_last does not.
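
A small sketch (not from the example) that reproduces the shape difference described above, assuming an input of shape (batch, 500, 1):

    import numpy as np
    from keras import layers

    # With "channels_last", the layer averages over the 500 steps and keeps the 1 feature -> (batch, 1).
    # With "channels_first", the input is read as (batch, features, steps), so it averages over the
    # last axis and keeps 500 -> (batch, 500), which is what the Dense head expects here.
    x = np.zeros((2, 500, 1), dtype="float32")

    print(layers.GlobalAveragePooling1D()(x).shape)                                # (2, 1)
    print(layers.GlobalAveragePooling1D(data_format="channels_first")(x).shape)    # (2, 500)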

@subalbeura
Author

Without the name="TooSmallOutput" argument it also works.
Regarding TNNs, I am also new to machine learning and am currently trying out some examples.

@jentstemmerman

Ah, I forgot to remove that.
I added name="TooSmallOutput" to debug it.
It just prints that name in the model summary.

@subalbeura
Author

Thanks for the reply

@subalbeura
Author

For multiclass classification using time series data, the accuracy comes out to nearly 0.009. What factors can improve accuracy?
