Skip to content

Latest commit

 

History

History
18 lines (12 loc) · 639 Bytes

README.md

File metadata and controls

18 lines (12 loc) · 639 Bytes

Multi-Modality

GPT4o

Community Open Source Implementation of GPT4o in PyTorch

Install

Architecture

  • TikToken Tokenzier: We know fursure the tokenizer. Which is here
  • Model understands Images and Audio Natively. There are 2 approaches, process them natively or use encoders for each. I think here they're using encoders like whisper and vit for simplicity and brevity.
  • Using DALLE3 as the output head to generate images
  • Tokens to denote when to generate an image or audio
  • Whisper output head for the audio outputs

License

MIT