Here is a list of potential improvements for gpt-tfjs in Disco:
Create a compile method that initializes the optimizer once, rather than when fitDataset is called, so that the optimizer state persists across multiple calls to fitDataset
Expose the gpt-tfjs config (learning rate, number of iterations) as Disco parameters rather than hard-coding it
Implement save and load methods so that a trained model can be saved and reused
Rename classes for clarity and consistency; for example, multiple classes and functions are currently named GPT
Assess whether we can use TF.js' native fitDataset method rather than overriding it with a custom training loop
Assess whether we can use tf.CustomCallbackArgs rather than redefining an interface for TrainingCallbacks
Reading a text file with TF.js only supports reading line by line, which is not ideal for LLM inputs; try implementing a file reader that reads chunk by chunk rather than line by line
To generate text with a model trained in Disco, we currently have to retrieve the model instance through the aggregator. Implement a more direct interface to the language generation API.
Make sure pad tokens are ignored in the loss computation (similar to PyTorch, whose cross-entropy loss ignores targets equal to -100 by default)
There is a memory leak in model disposal: one tensor per attention layer is still not disposed after calling model.dispose. Edit: the federated/decentralized mechanism also allocates new tensors every round (see Garbage Collecting past node contributions #683)
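For the compile item, here is a minimal sketch of the intended control flow that assumes nothing about the real gpt-tfjs classes: `Trainer`, `MomentumSgd` and the `Optimizer` interface are hypothetical stand-ins (in gpt-tfjs the optimizer would presumably be a TF.js optimizer, e.g. one created with `tf.train.adam`). The point is that the optimizer, and therefore its internal state (here a momentum buffer), is created once in `compile` and reused by every `fitDataset` call instead of being recreated per call.

```typescript
// Hypothetical optimizer interface; a stand-in for a TF.js optimizer.
interface Optimizer {
  step(params: number[], grads: number[]): void;
  readonly iterations: number;
}

// Hypothetical SGD with momentum: its velocity buffer is state that must persist.
class MomentumSgd implements Optimizer {
  private velocity: number[] | null = null;
  iterations = 0;

  constructor(private readonly lr: number, private readonly momentum = 0.9) {}

  step(params: number[], grads: number[]): void {
    // Allocate the state lazily on the first step; afterwards it is reused.
    if (this.velocity === null) this.velocity = grads.map(() => 0);
    const v = this.velocity;
    for (let i = 0; i < params.length; i++) {
      v[i] = this.momentum * v[i] + grads[i];
      params[i] -= this.lr * v[i];
    }
    this.iterations++;
  }
}

// Hypothetical trainer: compile() creates the optimizer, fitDataset() only uses it.
class Trainer {
  private optimizer: Optimizer | null = null;

  constructor(public params: number[]) {}

  compile(optimizer: Optimizer): void {
    this.optimizer = optimizer;
  }

  // Each "batch" here is just a gradient vector, to keep the sketch self-contained.
  fitDataset(batches: number[][]): void {
    const opt = this.optimizer;
    if (opt === null) throw new Error("call compile() before fitDataset()");
    for (const grads of batches) opt.step(this.params, grads);
  }

  get optimizerIterations(): number {
    return this.optimizer?.iterations ?? 0;
  }
}
```

With this structure, calling `fitDataset` once per Disco round keeps accumulating optimizer state, whereas recreating the optimizer inside `fitDataset` would reset the momentum buffer at every round.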
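For the config item, one possible shape is to merge user-supplied task parameters over defaults and validate them in a single place. `GptTrainingConfig`, `resolveGptConfig` and the default values below are hypothetical placeholders, not the values currently hard-coded in gpt-tfjs.

```typescript
// Hypothetical subset of gpt-tfjs training options exposed as Disco task parameters.
interface GptTrainingConfig {
  lr: number;      // learning rate
  maxIter: number; // number of training iterations
}

// Placeholder defaults for illustration only.
const DEFAULT_GPT_CONFIG: Readonly<GptTrainingConfig> = { lr: 0.001, maxIter: 10 };

// Merge user-supplied parameters over the defaults, then validate the result.
function resolveGptConfig(overrides: Partial<GptTrainingConfig> = {}): GptTrainingConfig {
  const config: GptTrainingConfig = {
    lr: overrides.lr ?? DEFAULT_GPT_CONFIG.lr,
    maxIter: overrides.maxIter ?? DEFAULT_GPT_CONFIG.maxIter,
  };
  if (!(config.lr > 0)) {
    throw new RangeError(`lr must be positive, got ${config.lr}`);
  }
  if (!Number.isInteger(config.maxIter) || config.maxIter <= 0) {
    throw new RangeError(`maxIter must be a positive integer, got ${config.maxIter}`);
  }
  return config;
}
```

For example, `resolveGptConfig({ maxIter: 50 })` returns `{ lr: 0.001, maxIter: 50 }`, while an invalid value such as `{ lr: -1 }` fails early instead of surfacing as a silent training problem.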
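For the chunked reader item, a sketch of the chunking logic itself, assuming the file has already been turned into a (possibly lazily produced) stream of token ids by some tokenizer; `tokenWindows` is a hypothetical helper. Line breaks are treated as ordinary tokens, and each yielded window contains `blockSize` input tokens together with their next-token targets.

```typescript
// Yields fixed-size training windows from a token stream, ignoring line boundaries.
// Each window spans blockSize + 1 tokens: xs = inputs, ys = inputs shifted by one.
function* tokenWindows(
  tokens: Iterable<number>,
  blockSize: number,
): Generator<{ xs: number[]; ys: number[] }> {
  if (!Number.isInteger(blockSize) || blockSize <= 0) {
    throw new RangeError("blockSize must be a positive integer");
  }
  let window: number[] = [];
  for (const token of tokens) {
    window.push(token);
    if (window.length === blockSize + 1) {
      yield { xs: window.slice(0, blockSize), ys: window.slice(1) };
      // Keep the last token so the next window continues where this one ended.
      window = [window[blockSize]];
    }
  }
  // A trailing partial window is dropped; it could instead be padded with a pad token.
}
```

Because the generator accepts any `Iterable<number>` and only buffers one window, it could be fed by a tokenizer that processes the file piece by piece, so the text never has to be split into lines first.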
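For the pad token item, a plain TypeScript sketch of a masked cross-entropy: positions whose label equals the ignore index (-100 here, mirroring PyTorch's default) contribute nothing, and the mean is taken over the remaining positions only. `maskedCrossEntropy` is a hypothetical helper working on plain arrays rather than tensors; returning 0 when every position is padding is a design choice of this sketch.

```typescript
const IGNORE_INDEX = -100; // same sentinel PyTorch uses by default

// logits: [seqLen][vocabSize], labels: [seqLen] (valid class indices or IGNORE_INDEX).
function maskedCrossEntropy(
  logits: number[][],
  labels: number[],
  ignoreIndex: number = IGNORE_INDEX,
): number {
  let total = 0;
  let count = 0;
  for (let t = 0; t < labels.length; t++) {
    const label = labels[t];
    if (label === ignoreIndex) continue; // padding position: excluded from sum and count
    const row = logits[t];
    // Numerically stable log-sum-exp.
    const max = Math.max(...row);
    const logSumExp = max + Math.log(row.reduce((s, z) => s + Math.exp(z - max), 0));
    total += logSumExp - row[label]; // -log softmax(row)[label]
    count++;
  }
  return count === 0 ? 0 : total / count;
}
```

In TF.js the same effect can probably be obtained by passing a 0/1 mask as the `weights` argument of `tf.losses.softmaxCrossEntropy`, whose default reduction divides by the number of non-zero weights; this should be checked against how gpt-tfjs currently computes its loss.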
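For the memory leak item, a sketch of one way to make disposal complete: each layer records every tensor it allocates, including non-trainable ones, and frees all of them in `dispose`. `FakeTensor`, `AttentionLayer` and `Model` are hypothetical stand-ins so the example runs without TF.js; which tensor actually leaks in gpt-tfjs is not identified here, and the "mask buffer" below is only an illustrative guess.

```typescript
// Stand-in for a tensor that counts live instances (like a tensor counter in TF.js).
class FakeTensor {
  static live = 0;
  private disposed = false;

  constructor() {
    FakeTensor.live++;
  }

  dispose(): void {
    if (!this.disposed) {
      this.disposed = true;
      FakeTensor.live--;
    }
  }
}

// Hypothetical attention layer owning weights plus an extra non-trainable tensor.
class AttentionLayer {
  private readonly owned: FakeTensor[] = [];

  constructor() {
    this.track(new FakeTensor()); // weights
    this.track(new FakeTensor()); // e.g. a mask buffer, easy to forget in dispose()
  }

  // Register every allocation so dispose() cannot miss it.
  private track(t: FakeTensor): FakeTensor {
    this.owned.push(t);
    return t;
  }

  dispose(): void {
    for (const t of this.owned) t.dispose();
    this.owned.length = 0;
  }
}

class Model {
  constructor(private readonly layers: AttentionLayer[]) {}

  dispose(): void {
    for (const layer of this.layers) layer.dispose();
  }
}
```

With TF.js itself, a similar regression check can compare `tf.memory().numTensors` before building the model and after calling `model.dispose()`.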
#656 and #657 should be addressed first