Transformer keeps predicting the same token #82
I guess it is related to issue #80; I will update my repo and retrain on a small dataset.
Hi, it's awesome that you have the resources to train the model!
About the commit loss weight: the LFQ uses a 'diversity_gamma' variable (default 1.0) which can be passed during its creation. The higher the commit loss weight, the more importance is given to how well the encoder can quantize the mesh. During my training on 13k meshes with a 2k codebook I used a very high commit loss weight of around 0.45, which I then reduced gradually (to 0.2) so the training focused more on the reconstruction loss. If you experience a higher reconstruction loss with the autoencoder, you might want to take a look at the model below, which has been the most successful during testing.
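For reference, a rough sketch of how those knobs map onto the LFQ from vector-quantize-pytorch; the kwarg names are how I remember that library's LFQ and the values are just the ones discussed above, so verify against your installed version and against how meshgpt-pytorch forwards them:

```python
import torch
from vector_quantize_pytorch import LFQ

# lookup-free quantizer; diversity_gamma weights the codebook-entropy (usage) term
quantizer = LFQ(
    dim = 192,                      # encoder feature dim (projected internally to the code bits)
    codebook_size = 2 ** 14,
    diversity_gamma = 1.0,
    commitment_loss_weight = 0.45,  # start high (~0.45), then reduce towards ~0.2
)

feats = torch.randn(1, 1024, 192)   # (batch, tokens, dim) dummy encoder output
quantized, indices, aux_loss = quantizer(feats)
```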
I have an idea about the inference issue: since the model is probabilistic, it probably generates slightly wrong tokens at the start, and this snowballs because the sequence becomes out of distribution (never seen before) and the probabilities 'go crazy'. About your transformer setup: use CLIP as the text conditioner model, since its embeddings are well separated from each other, and also set "text_condition_cond_drop_prob" to 0.0, which will help with the text conditioning.
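Roughly what I mean for the transformer setup (a sketch: the conditioning kwargs are the ones I remember from meshgpt-pytorch, and `dim` etc. are placeholders, so check them against your installed version):

```python
from meshgpt_pytorch import MeshTransformer

transformer = MeshTransformer(
    autoencoder,                             # your trained MeshAutoencoder
    dim = 512,                               # placeholder; use your own model size
    condition_on_text = True,
    text_condition_model_types = ('clip',),  # CLIP gives well-separated text embeddings
    text_condition_cond_drop_prob = 0.0,     # never drop the text condition
)
```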
About the training setup: it's recommended that you have an effective batch size of 64 (as per the paper) so the model generalizes better and has less of a 'knee-jerk' reaction when it's hit with a high loss.
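One way to reach that effective batch size when only a small batch fits in VRAM is gradient accumulation, e.g. something like this with the transformer trainer (kwargs are from memory, so double-check them against the version you have installed):

```python
from meshgpt_pytorch import MeshTransformerTrainer

trainer = MeshTransformerTrainer(
    transformer,
    dataset = dataset,           # your mesh dataset
    batch_size = 8,
    grad_accum_every = 8,        # 8 * 8 = 64 effective batch size
    learning_rate = 1e-4,
    num_train_steps = 100_000,
)
trainer()
```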
The latest updates have actually helped with that issue; I've not tested the very latest, but the previous ones managed to follow the text conditioning. By the way, to avoid fully retraining your model after updates and the like, you can just load the previous model using strict = False and ignore the warnings, or, if there is an error, delete the affected keys.
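Something like this in plain PyTorch (the checkpoint filename and the 'model' key are just placeholders for however you saved it):

```python
import torch

state = torch.load('mesh_transformer_old.pt', map_location = 'cpu')
state_dict = state.get('model', state)    # unwrap if the weights are nested under a key

# strict = False ignores keys that were added or removed by library updates
missing, unexpected = transformer.load_state_dict(state_dict, strict = False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)

# shape mismatches still raise an error even with strict = False,
# so pop those keys from state_dict first if that happens
```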
Both training and validation loss are reduced to around 0.34.
This refers to the formula used in the paper; I dived into the code. entropy_aux_loss is a regularization term which encourages less uncertainty in the distribution for a given embedding and even usage of the codebook. Therefore per_sample_entropy is minimized to close to zero, and codebook_entropy is maximized towards log(codebook_size) (the entropy of an even distribution), which is log2(2**14) == 14 in my case. Therefore, I guess entropy_aux_loss will be minimized to around -10 to -14, because codebook_entropy is maximized to around 10 to 14.
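In symbols, my reading of the LFQ implementation is roughly the following (hedged; the exact log base and reductions should be checked against the code):

$$
\text{entropy\_aux\_loss}
= \underbrace{\mathbb{E}_x\big[H\big(p(c \mid x)\big)\big]}_{\text{per\_sample\_entropy}\;\to\;0}
\;-\; \gamma \,\underbrace{H\big(\mathbb{E}_x\big[p(c \mid x)\big]\big)}_{\text{codebook\_entropy}\;\to\;\log|C|}
$$

with $\gamma$ being diversity_gamma, so under even codebook usage and $\gamma = 1$ the term approaches $-\log|C|$, i.e. about $-14$ for a $2^{14}$ codebook when the log is taken base 2.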
This is my shallow understanding of why the commit_loss ends up around -1. It may be wrong, please correct me if that is the case!
From my above discussion, my understanding is that a commit loss below 0 indicates that the codebook has good utilization?
I agree with this. I tried increasing the number of layers in the encoder before, but the performance deteriorated. I did not use the attention layers in either the decoder or the encoder. I remember trying them, but convergence became slower? I am not sure.
For the dataset, I create the data on the fly: I do not read all the vertices and faces beforehand, create face edges, and store this information on disk. I think that is inflexible when I want to change the dataset, and it occupies a lot of storage space (I am not sure about this). The disadvantage of my approach is higher VRAM consumption, I guess. The data augmentation is also done during training. Worth mentioning: the augmented mesh is not fixed, and each data instance has a probability of 0.3 of being augmented. I have not tuned this hyperparameter, and I do not know whether this approach is feasible or not. Let me know your opinion! The __getitem__ function in my dataset class:

```python
def __getitem__(self, index):
    file_path = self.data_paths[index]
    vertices, faces = self.get_mesh(file_path)
    if self.augmentation_enabled and np.random.rand() <= 0.3:
        vertices = vertices.numpy()
        vertices = self.center_vertices(vertices)
        vertices = self.normalize_to_unit_scale(vertices)
        vertices = self.random_rotation(vertices)
        # vertices = self.random_shift(vertices)
        vertices = self.normalize_to_unit_scale(vertices)
        vertices = torch.from_numpy(vertices)
    return vertices, faces
```
I just checked that the default value for attention heads (attn_heads) is 16. Still grateful to learn that it is best to keep the dim size at 64.
I'm not that great at algebra, so I haven't really dived into this; I've more just glanced over the code and done some light debugging.
That is my understanding as well.
I found that when I moved from the standard resnet setup to a longer chain of resnet blocks (6, 12, 24, 6), it really helped the training capture more detail with only a small increase in parameters. The encoder should be as simple as possible; the performance has always been worse every time I've increased the graph encoder's dims or layers.
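For reference, this is roughly how that block chain can be expressed, assuming your version of MeshAutoencoder exposes a `decoder_dims_through_depth` argument (the widths below are illustrative, not the exact ones I trained with):

```python
from meshgpt_pytorch import MeshAutoencoder

# (6, 12, 24, 6) resnet blocks in the decoder, growing in width with depth
autoencoder = MeshAutoencoder(
    num_discrete_coors = 128,
    decoder_dims_through_depth = (128,) * 6 + (192,) * 12 + (256,) * 24 + (384,) * 6,
)
```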
The space required for 218k models (max 250 faces) is around 213 MB (including face edges) when saving with np.savez_compressed, so I wouldn't worry about disk requirements. The only things I generate live before training are the codes and the text embeddings, since they might consume a lot of disk space; I've not checked how much, but generating those is pretty quick. One thing to mention is that you shouldn't rescale the vertices so they reach exactly -1 or 1; I keep the scale so they are within -0.95 to 0.95. With the on-the-fly approach the performance will be worse, since you'll need to create the face edges as well as tokenize the mesh each time, but I like the idea of the dynamic dataset since it would produce a very robust model.
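A minimal sketch of caching one pre-processed mesh with np.savez_compressed (the arrays here are dummies standing in for whatever your real pipeline produces):

```python
import numpy as np

# dummy data: 100 vertices kept within +-0.95, 250 triangular faces, face-edge pairs
vertices   = (np.random.rand(100, 3).astype(np.float32) * 1.9) - 0.95
faces      = np.random.randint(0, 100, size = (250, 3), dtype = np.int64)
face_edges = np.random.randint(0, 250, size = (700, 2), dtype = np.int64)

np.savez_compressed('mesh_00001.npz',
                    vertices = vertices,
                    faces = faces,
                    face_edges = face_edges)

# load it back before training
data = np.load('mesh_00001.npz')
vertices, faces, face_edges = data['vertices'], data['faces'], data['face_edges']
```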
Oh alright, I must have confused it with something else :)
Hey, the issue has been resolved and the model can now generate output following the text guidance. Here is the model I've published: https://huggingface.co/MarcusLoren/MeshGPT-preview
@MarcusLoppe |
Hi @lucidrains and @MarcusLoppe,
I have successfully trained a MeshAutoencoder where the validation loss is as low as the training loss. I used around 28,000 meshes from ShapeNet for training and validation, and the maximum face count is 800. The loss graphs are below.
[training and validation loss curve images]
I was using the lookup-free quantizer. I wonder if it is normal to see the fluctuation in the commit_loss, and for the commit_loss to stay near -1 in the end. The commit_loss_weight is set to 0.5.
Afterwards, when training the MeshTransformer without text conditioning, I encountered a problem that I could not solve despite many attempts.
The transformer keeps generating the same token repeatedly during inference, and I have no idea why.
I spent nearly 4 days training on an 8-GPU server, and the training loss would not even go below 1.
Below is the config I used for training both the autoencoder and the transformer: