Improve and rework GPT-tfjs #654

Open
4 of 10 tasks
JulienVig opened this issue Mar 27, 2024 · 0 comments
Labels
discojs Related to Disco.js rework Code that needs to be improved

Comments

Collaborator

JulienVig commented Mar 27, 2024

Here is a list of potential improvements for gpt-tfjs in Disco:

  • Create a compile method to initialize the optimizer (rather than initializing it when fitDataset is called). This ensures the optimizer state is persisted across multiple calls to fitDataset
  • Expose the GPT-tfjs config (learning rate, number of iterations) as Disco parameters rather than hard-coding it
  • Implement save and load methods to save and re-use a trained model
  • Rename classes for better clarity and consistency, e.g. multiple classes and functions are called GPT
  • Assess whether we can use TFJS' native fitDataset method rather than overriding it with a custom training loop
  • Assess whether we can use tf.CustomCallbackArgs rather than redefining an interface for TrainingCallbacks
  • TF.js only supports reading a text file line by line, which is not ideal for LLM inputs. Try implementing a file reader that reads chunk by chunk rather than line by line
  • To use a trained model in Disco to generate text, we have to get the model instance through the aggregator. Implement a better interface to access the language generation API.
  • Make sure pad tokens are ignored in the loss computation (similarly to pytorch ignoring -100 as padding token)
  • There is a memory leak in the model disposal: one tensor per attention layer is still not disposed after calling model.dispose. Edit: the federated/decentralized mechanism also allocates new tensors every round (see "Garbage Collecting past node contributions", #683)
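
For the pad-token item above, here is a minimal sketch of what "ignoring pad tokens in the loss" could mean, written in plain TypeScript on number arrays rather than TF.js tensors. The names `maskedCrossEntropy` and `IGNORE_INDEX` are hypothetical and not part of gpt-tfjs; the value -100 is only assumed here to mirror the PyTorch convention mentioned in the list.

```typescript
// Hypothetical helper, not part of gpt-tfjs or Disco.
// Assumption: padded target positions are marked with -100, as in PyTorch.
const IGNORE_INDEX = -100;

// Mean cross-entropy over the positions whose target is not IGNORE_INDEX.
// logits: [sequenceLength][vocabSize], targets: [sequenceLength]
function maskedCrossEntropy(logits: number[][], targets: number[]): number {
  if (logits.length !== targets.length) {
    throw new Error("logits and targets must have the same length");
  }
  let total = 0;
  let counted = 0;
  for (let i = 0; i < targets.length; i++) {
    const target = targets[i];
    if (target === IGNORE_INDEX) continue; // padded position: no loss, no count
    const row = logits[i];
    // Numerically stable log-sum-exp of the logits at this position
    const max = Math.max(...row);
    const logSumExp =
      max + Math.log(row.reduce((acc, x) => acc + Math.exp(x - max), 0));
    total += logSumExp - row[target];
    counted += 1;
  }
  // Average over real tokens only, so padding does not dilute the loss
  return counted === 0 ? 0 : total / counted;
}
```

With this sketch, appending padded positions to a sequence leaves the loss unchanged, which is the property the item asks for. A tensor-based version would presumably build a 0/1 mask from the targets and divide the summed loss by the mask's sum.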

#656 and #657 should be addressed first
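
For the file-reading item in the list above, the sketch below shows one way to regroup line-delimited input into fixed-size blocks that ignore line boundaries. It assumes the lines have already been tokenized into arrays of token ids; `chunkTokens` and `blockSize` are hypothetical names for illustration, not existing Disco or TF.js APIs.

```typescript
// Hypothetical helper, not part of gpt-tfjs or Disco.
// Regroups per-line token ids into blocks of exactly `blockSize` tokens,
// carrying leftover tokens from one line over to the next.
function chunkTokens(lines: number[][], blockSize: number): number[][] {
  if (!Number.isInteger(blockSize) || blockSize <= 0) {
    throw new Error("blockSize must be a positive integer");
  }
  const blocks: number[][] = [];
  let buffer: number[] = [];
  for (const line of lines) {
    buffer = buffer.concat(line);
    // Emit as many full blocks as the buffer currently holds
    while (buffer.length >= blockSize) {
      blocks.push(buffer.slice(0, blockSize));
      buffer = buffer.slice(blockSize);
    }
  }
  // Trailing partial block: kept as-is here; a caller might pad or drop it
  if (buffer.length > 0) blocks.push(buffer);
  return blocks;
}

// Three short "lines" become blocks of 4 tokens regardless of line breaks
const blocks = chunkTokens([[1, 2], [3, 4, 5], [6]], 4);
console.log(blocks); // [[1, 2, 3, 4], [5, 6]]
```

This version takes all lines up front to keep the example short. A real reader would more likely consume the file as a stream and emit blocks lazily, with the same buffering logic.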

@JulienVig JulienVig added rework Code that needs to be improved discojs Related to Disco.js labels Mar 27, 2024
@JulienVig JulienVig self-assigned this Mar 27, 2024