方先生 falenai

方先生

M.Eng. student at Zhejiang University, somewhere between finishing my thesis and convincing myself I understand attention mechanisms. I work on things at the intersection of vision, language, and speech. Specifically: why are these models so slow, and can we make them not slow?

My day-to-day is a mix of reading papers I won't finish, writing PyTorch code that almost works, and wondering if 576 visual tokens is really necessary when most of the image is background.

Currently a member of the CCNT Lab, where we build and break multimodal systems.

Now:

Finishing up LightVLM — dynamic visual token pruning for faster VLM inference
Evaluating speech LLMs systematically with SpeechLLM-Bench
Cleaning messy web-scraped data with vl-data-engine
Reading every paper on KV-cache compression that shows up on arxiv

🔬 Research

Multimodal LLMs · Visual Token Compression · Speech Understanding · Audio-Visual Learning · Efficient Inference · Vision-Language Alignment

Lately I've been obsessed with making large VLMs practical — not just technically impressive. A model that takes 3 seconds to process one image is useless on a laptop. There's a lot of room between "full attention over all tokens" and "something smarter."
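The simplest version of "something smarter" looks like this: score each visual token by how much attention a single query (say, the last text token) pays to it, then keep only the top fraction. This is a generic sketch in plain Python, not LightVLM's actual method. The function names, the `keep_ratio` parameter, and the choice of one query vector are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def attention_weights(query: Vector, keys: Sequence[Vector]) -> List[float]:
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prune_visual_tokens(
    query: Vector, visual_keys: Sequence[Vector], keep_ratio: float
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: keep the visual tokens that receive the most attention.

    Returns the kept token indices (in original order) and the attention weights.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    weights = attention_weights(query, visual_keys)
    k = max(1, round(keep_ratio * len(visual_keys)))  # always keep at least one token
    # Rank token indices by attention received, highest first (stable for ties).
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    # Restore the original (spatial) order so positional structure is preserved.
    kept = sorted(ranked[:k])
    return kept, weights
```

With 576 visual tokens (e.g., a 24×24 patch grid) and `keep_ratio=0.25`, 144 tokens survive into the LLM. Restoring the original order matters because the language model still sees position information for the kept tokens; a real system would also have to decide which layer and which query to score with.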

On the speech side: as speech-integrated LLMs become mainstream, I think we need better evaluation protocols. Ad-hoc demos are not benchmarks.
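Part of turning demos into benchmarks is scoring every model with the same fixed metric code. For ASR, one of the tasks speechllm-bench covers, a minimal word error rate (WER) sketch is word-level Levenshtein distance divided by the reference length. It assumes whitespace tokenization and no text normalization (for Chinese, character error rate is the usual choice). It is an illustration, not speechllm-bench's scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, dropping one word from a six-word reference gives a WER of 1/6. Because WER can exceed 1.0 when the hypothesis has many insertions, it is usually reported as-is rather than clipped.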

🛠️ Stack

📌 Projects

Repo	Description
LightVLM	Efficient VLM inference via dynamic visual token pruning — 2-5× faster prefill with minimal accuracy drop
speechllm-bench	Unified evaluation benchmark for speech LLMs: ASR, emotion recognition, speech translation, TTS quality
vl-data-engine	Scalable pipeline for cleaning VL pretraining data — filtering, deduplication, bilingual augmentation
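For the deduplication step, the cheapest first pass is exact matching after normalization, before any fuzzier near-duplicate detection. A minimal sketch, assuming image-caption pairs arrive as `(image_url, caption)` tuples; `normalize_caption`, `clean_pairs`, and the `min_chars` threshold are hypothetical names, and this is not vl-data-engine's actual pipeline.

```python
import hashlib
import re
import unicodedata
from typing import Iterable, List, Set, Tuple


def normalize_caption(text: str) -> str:
    """NFKC-normalize, lowercase, strip punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    # \w is Unicode-aware in Python 3, so CJK characters survive this step.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def clean_pairs(
    pairs: Iterable[Tuple[str, str]], min_chars: int = 3
) -> List[Tuple[str, str]]:
    """Drop too-short captions and exact duplicates of (image_url, normalized caption)."""
    seen: Set[str] = set()
    kept: List[Tuple[str, str]] = []
    for image_url, caption in pairs:
        norm = normalize_caption(caption)
        if len(norm) < min_chars:  # length in characters, so it also works for Chinese
            continue
        key = hashlib.sha1(f"{image_url}\t{norm}".encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append((image_url, caption))  # keep the original, un-normalized caption
    return kept
```

Hashing the normalized pair keeps memory flat per record. Catching paraphrased or re-encoded duplicates would need something like MinHash over shingles or image embeddings on top of this.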

⚡ misc

I have very strong opinions about which papers should have released their code and did not
Terminal > IDE (sorry)
If your batch size is 1, your experiments don't count
Favorite quote: "The purpose of computing is insight, not numbers." — R. Hamming
