Boyang Ma sitodowubb

Hi, I'm Boyang 👋

MS student @ Lanzhou University, perpetually waiting for a GPU slot

🔭 what I work on

I'm a CS master's student at Lanzhou University, focused on two corners of multimodal ML that don't get along with each other as well as they should:

3D-aware vision-language models: making MLLMs reason about layout, occlusion, and rotation, not just object identity.
Speaker-side speech models: small, practical tools for screening cloned / synthetic voices, and for understanding what makes a TTS clip sound off.

Both interests stem from the same place: most "multimodal" systems treat each modality as a flat bag of tokens, and they fall down on the parts that need a little structure.

🛠 stack

📌 things I've shipped

📊 stats

⚡ random

I write papers in tmux + vim, but I'm not proud of it
The slowest part of my workflow is convincing the cluster scheduler I deserve a GPU
My first ML project was a digit classifier; my second was the same one, debugged

📍 Lanzhou, China · he/him · happy to chat about benchmarks and voice anti-spoofing
