sitodowubb/README.md

Hi, I'm Boyang 👋

MS student @ Lanzhou University, perpetually waiting for a GPU slot


🔭 what I work on

I'm a CS master's student at Lanzhou University, focused on two corners of multimodal ML that don't get along with each other as well as they should:

  • 3D-aware vision-language models — making MLLMs reason about layout, occlusion, rotation, not just object identity.
  • Speaker-side speech models — small, practical tools for screening cloned / synthetic voices, and for understanding what makes a TTS clip sound off.

Both interests stem from the same place: most "multimodal" systems treat each modality as a flat bag of tokens, and they fall down on the parts that need a little structure.

🛠 stack

Python · PyTorch · HuggingFace · SpeechBrain · CUDA · Slurm · Linux · LaTeX

📌 things I've shipped

⚡ random

  • I write papers in tmux + vim, but I'm not proud of it
  • The slowest part of my workflow is convincing the cluster scheduler I deserve a GPU
  • My first ML project was a digit classifier; my second was the same one, debugged

📍 Lanzhou, China · he/him · happy to chat about benchmarks and voice anti-spoofing

Pinned

  1. spatial-vqa-bench Public

    Spatial-VQA-Bench: a focused benchmark of spatial visual reasoning for multimodal LLMs.

    Python 222

  2. deeplethe/forkd Public

    Fork() for AI agent microVMs. Spawn 100 children in ~100ms from a warm parent; BRANCH a live VM in ~150ms. KVM-isolated, snapshot CoW.

    Rust 1.2k 85

  3. openmemind/memind Public

    Self-evolving cognitive memory and context engine for AI agents in Java. Empowering 24/7 proactive agents like OpenClaw with understanding and SOTA performance.

    Java 895 84

  4. voice-clone-detect Public

    Speaker-embedding-based detector for cloned / synthetic voices.

    Python 1