Tool introductions:
Paper reading:
- GeminiFusion | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
- Qwen-VL | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- ODGEN | ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
- ControlNet | Adding Conditional Control to Text-to-Image Diffusion Models
- YOLO-World | YOLO-World: Real-Time Open-Vocabulary Object Detection