thomas-yanxin

Regular bencher


It is not that I praise you to everyone I meet: you are at once unrestrained, chivalrous, and refined.

297 followers · 202 following

Achievements

x2 x3


Highlights

Developer Program Member

Organizations

Stars

VLM

28 repositories

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

Python · 1,051 stars · 75 forks · Updated Nov 18, 2024

X-PLUG / MobileAgent

Mobile-Agent: The Powerful GUI Agent Family

Python · 7,009 stars · 722 forks · Updated Dec 2, 2025

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

17,228 stars · 1,107 forks · Updated Dec 26, 2025

Open3DA / LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Python · 310 stars · 14 forks · Updated Jul 17, 2024

PKU-YuanGroup / Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Python · 942 stars · 48 forks · Updated Oct 16, 2024

EvolvingLMMs-Lab / lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python · 3,572 stars · 491 forks · Updated Jan 20, 2026

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python · 2,264 stars · 132 forks · Updated May 30, 2025

open-compass / VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python · 3,724 stars · 614 forks · Updated Jan 15, 2026

Alpha-VLLM / Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python · 2,248 stars · 94 forks · Updated Feb 16, 2025

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.

Python · 9,725 stars · 756 forks · Updated Sep 22, 2025

LLaVA-VL / LLaVA-NeXT

Python · 4,518 stars · 441 forks · Updated Sep 14, 2025

thomas-yanxin / Sunsimiao-V

1 star · Updated May 14, 2024

pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI

Python · 9,907 stars · 1,641 forks · Updated Jan 20, 2026

RLHF-V / RLAIF-V

[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Python · 441 stars · 21 forks · Updated May 14, 2025

ge25nab / Awesome-VLM-AD-ITS

[T-IV] This repository collects research papers of large Vision Language Models in Autonomous driving and Intelligent Transportation System. The repository will be continuously updated to track the…

441 stars · 32 forks · Updated Apr 1, 2025

AILab-CVC / SEED-X

Multimodal Models in Real World

Jupyter Notebook · 552 stars · 23 forks · Updated Feb 24, 2025

0nutation / SpeechGPT

SpeechGPT Series: Speech Large Language Models

Python · 1,400 stars · 94 forks · Updated Jul 22, 2024

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python · 2,174 stars · 137 forks · Updated Dec 15, 2025

modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python · 2,740 stars · 242 forks · Updated Dec 8, 2025

opendatalab / labelU

Data annotation toolbox that supports image, audio, and video data.

Python · 1,473 stars · 160 forks · Updated Oct 1, 2025

HCPLab-SYSU / Book-of-MLM

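"Multimodal Large Models: A New-Generation Paradigm of Artificial Intelligence Technology" (《多模态大模型：新一代人工智能技术范式》), by Liu Yang and Lin Liang

HTML · 258 stars · 24 forks · Updated Dec 5, 2024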

SunzeY / Bootstrap3D

[ICCV-2025] Official implementation of Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

Python · 95 stars · 3 forks · Updated Jul 26, 2025

openvla / openvla

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python · 5,053 stars · 612 forks · Updated Mar 23, 2025

thomas-yanxin / Awesome-MMLM-Datasets

3 stars · Updated Jun 4, 2024

showlab / Show-o

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python · 1,856 stars · 83 forks · Updated Jan 8, 2026

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python · 3,111 stars · 219 forks · Updated May 19, 2025

opendilab / CleanS2S

High-quality, streaming Speech-to-Speech interactive agent in a single file: a streaming, full-duplex voice-interaction prototype agent implemented in just one file.

Python · 491 stars · 52 forks · Updated Dec 15, 2025

valeoai / VideoActionModel

VaViM and VaVAM: Autonomous Driving through Video Generative Modeling (official repository).

Jupyter Notebook · 138 stars · 8 forks · Updated Jul 3, 2025
