A plug-and-play suite of embodied multi-modal robotic transformers for robotics.


CyberTron



CyberTron is an open-source suite of robotic transformer models designed to simplify training, finetuning, and inference. With its plug-and-play functionality, CyberTron provides an easy-to-use interface for working with a variety of robotic transformer models. Whether you're working on robotics research, autonomous systems, or AI-driven robotic applications, CyberTron offers a comprehensive toolkit to enhance your projects.

Key Features

  • Easy integration and plug-and-play functionality.
  • Diverse range of pre-trained robotic transformer models.
  • Efficient training and finetuning pipelines.
  • Seamless inference capabilities.
  • Versatile and customizable for various robotic applications.
  • Active community and ongoing development.

Architecture

CyberTron is built on a modular architecture, enabling flexibility and extensibility for different use cases. The suite consists of the following components:

  1. Model Library: CyberTron provides a comprehensive library of pre-trained robotic transformer models. These models are designed to tackle a wide range of robotics tasks, such as perception, motion planning, control, and more. Available models include VC-1, RT-1, RoboCat, KOSMOS-X, and many others.

  2. Training and Finetuning: CyberTron offers a streamlined training and finetuning pipeline. You can easily train models from scratch or finetune existing models using your own datasets. The suite provides efficient data preprocessing, augmentation, and optimization techniques to enhance the training process.

  3. Inference: CyberTron supports seamless inference with trained models. You can deploy them in real-world scenarios and robotics applications, or integrate them into existing systems for robotic perception, decision-making, and control.
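The training and inference stages described above can be illustrated with a self-contained toy sketch. Everything here (the toy data, the one-weight "model", the observation format, and the `Policy` class) is an illustrative stand-in, not CyberTron's actual API:

```python
# Toy end-to-end sketch of the pipeline described above: preprocess data,
# train a model, then deploy it behind a predict() interface. The one-weight
# "model" and the observation format are illustrative stand-ins, not
# CyberTron's actual API.

def preprocess(samples):
    """Toy preprocessing step: rescale inputs to [0, 1]."""
    xs = [x for x, _ in samples]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in samples]

def train(samples, lr=0.1, epochs=200):
    """Fit y = w * x with plain SGD on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

class Policy:
    """Deploys the trained weight behind a simple inference interface."""

    def __init__(self, w):
        self.w = w

    def predict(self, observation):
        # A real model would map camera images + proprioception to motor
        # commands; here we just threshold one predicted value.
        score = self.w * observation["distance"]
        return "grasp" if score < 0.2 else "approach"

raw = [(0.0, 0.0), (5.0, 1.0), (10.0, 2.0)]   # targets follow y = x / 5
w = train(preprocess(raw))                    # converges to w ≈ 2.0
policy = Policy(w)
print(round(w, 2), policy.predict({"distance": 0.05}))  # → 2.0 grasp
```

The point of the sketch is the shape of the pipeline, preprocess → train → deploy, which is the same shape the real models follow at much larger scale.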

Getting Started

To get started with CyberTron, follow the instructions below:

  1. Clone the CyberTron repository:

     git clone https://github.com/kyegomez/CyberTron.git

  2. Install the required dependencies:

     pip install -r requirements.txt

  3. Choose the desired model from the model library.

  4. Utilize the provided examples and code snippets to train, finetune, or conduct inference using the selected model.

  5. Customize the models and pipelines according to your specific requirements.
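A common pattern for the customization step above is to start from a default configuration and override only what your task needs. The keys and values below are illustrative assumptions, not CyberTron's actual configuration schema:

```python
# Sketch of pipeline customization via config overrides. The keys here
# (model, image_size, etc.) are hypothetical, not CyberTron's real schema.

DEFAULT_CONFIG = {
    "model": "rt-1",
    "image_size": 300,     # EfficientNet-B3's native input resolution
    "batch_size": 32,
    "learning_rate": 1e-4,
}

def make_config(**overrides):
    """Return a new config with user overrides applied on top of defaults."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}

cfg = make_config(batch_size=8, learning_rate=3e-5)
print(cfg["batch_size"], cfg["model"])  # → 8 rt-1
```

Rejecting unknown keys up front catches typos in overrides early instead of silently training with the wrong settings.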

Roadmap

The future development of CyberTron includes the following milestones:

  • Expansion of the model library with additional pre-trained robotic transformer models.
  • Integration of advanced optimization techniques and model architectures.
  • Support for more diverse robotic applications and tasks.
  • Enhanced documentation, tutorials, and code examples.
  • Community-driven contributions and collaborations.

Stay tuned for exciting updates and improvements in CyberTron!

Model Directory

| Model | Description | Tasks | Key Features | Code and Resources |
| --- | --- | --- | --- | --- |
| RT-1 | Robotics Transformer for real-world control at scale | Picking and placing items, opening and closing drawers, getting items in and out of drawers, placing elongated items upright, knocking objects over, pulling napkins, opening jars, and more | Transformer architecture with image and action tokenization; EfficientNet-B3 model for image tokenization; token compression for faster inference; supports a wide range of tasks and environments | Project Website; RT-1 Code Repository |
| Gato | Generalist Agent for multi-modal, multi-task robotics | Playing Atari games, image captioning, chatbot interactions, real-world robot arm manipulation, and more | Multi-modal support for text, images, proprioception, continuous actions, and discrete actions; serialized tokenization of data for processing with a transformer neural network; flexibility to output different modalities based on context | Published Paper; Gato Code Repository |
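The "serialized tokenization" idea behind Gato can be sketched in a few lines: every modality (text, discrete actions, continuous values) is mapped into one shared integer-token sequence so a single transformer can consume it. The vocabulary, offsets, and bin count below are made up for the example and do not match Gato's real tokenizer:

```python
# Toy illustration of serialized multi-modal tokenization: each modality
# gets its own id range, and an episode becomes one flat token sequence.
# All vocabulary sizes and offsets here are invented for the example.

TEXT_VOCAB = {"pick": 0, "up": 1, "the": 2, "cube": 3}
TEXT_OFFSET = 0        # text tokens occupy ids 0..99
ACTION_OFFSET = 100    # discrete action tokens occupy ids 100..199
CONT_OFFSET = 200      # binned continuous values occupy ids 200..455

def tokenize_text(words):
    return [TEXT_OFFSET + TEXT_VOCAB[w] for w in words]

def tokenize_action(action_id):
    return [ACTION_OFFSET + action_id]

def tokenize_continuous(value, bins=256):
    """Discretize a value in [-1, 1] into one of `bins` buckets."""
    clipped = max(-1.0, min(1.0, value))
    bucket = min(int((clipped + 1.0) / 2.0 * bins), bins - 1)
    return [CONT_OFFSET + bucket]

# One interleaved episode: instruction, a joint angle, then an action.
sequence = (
    tokenize_text(["pick", "up", "the", "cube"])
    + tokenize_continuous(0.5)
    + tokenize_action(7)
)
print(sequence)  # → [0, 1, 2, 3, 392, 107]
```

Because each modality owns a disjoint id range, the model can emit whichever modality the context calls for, which is the flexibility the table above describes.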

Datasets Directory

This section provides an overview of the datasets used in the project. The datasets fall into two categories: control datasets used to train Gato, and vision & language datasets used for vision and language tasks. See the datasets library for the full list.

Contributing

Contributions are welcome! If you have any ideas, suggestions, or bug reports, please feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Roadmap

  • Integrate ChatGPT for Robotics

  • Integrate Kosmos-X, Kosmos-2, PaLM-E, and RoboCat. Any other robotic models we should integrate? Let us know!

  • Integrate an embedding provider for RT-1

  • Integrate Flash Attention for RT-1

  • Integrate Flash Attention for Gato

  • Integrate VC-1
