KosmosG

My implementation of the KOSMOS-G model from the paper "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models".

Installation

pip install kosmosg

Usage

import torch
from kosmosg.main import KosmosG

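# Dummy inputs: one 256x256 RGB image and 1024 random token IDs in [0, 20000)
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))

# Build the model with its default settings and run a forward pass
model = KosmosG()
output = model(img, text)
print(output)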

Architecture

text, image => KosmosG => text tokens with multimodal understanding
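
To make the data flow above concrete, here is a minimal sketch of how an image and a token sequence could be fused and turned back into per-token outputs. It is not the code in this package: the class name ToyMultimodalLM, the patch size, the embedding width, and the layer counts are all assumptions chosen for illustration, and the sketch leaves out positional embeddings and causal masking for brevity.

```python
import torch
from torch import nn


class ToyMultimodalLM(nn.Module):
    """Hypothetical stand-in for illustration only; NOT the kosmosg implementation."""

    def __init__(self, vocab_size=20000, dim=64, patch_size=32, heads=4, depth=2):
        super().__init__()
        # Split the image into non-overlapping patches and project each to `dim`.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.token_embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.to_logits = nn.Linear(dim, vocab_size)

    def forward(self, img, text):
        # (B, 3, H, W) -> (B, dim, H/p, W/p) -> (B, num_patches, dim)
        patches = self.patch_embed(img).flatten(2).transpose(1, 2)
        tokens = self.token_embed(text)  # (B, seq_len, dim)
        # Prepend image patches so text positions can attend to them.
        x = self.encoder(torch.cat([patches, tokens], dim=1))
        # Keep only the text positions and project them to vocabulary logits.
        return self.to_logits(x[:, patches.shape[1]:])


img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 16))  # short sequence to keep the demo fast
logits = ToyMultimodalLM()(img, text)
print(logits.shape)  # torch.Size([1, 16, 20000])
```

The sketch returns one logit vector per text position, which is one plausible reading of "text tokens with multimodal understanding"; the actual output of KosmosG may differ.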

License

MIT

Todo

Create Aligner in PyTorch
Create Diffusion module
Integrate these pieces
Create a training script
