Graduate student focused on Multimodal LLMs.
- Vision-language alignment and grounding
- Multimodal RAG for document intelligence
- Lightweight and reproducible VLM evaluation
- Multimodal web-agent trajectory analysis
- 2024: Built small prototypes for PDF-based multimodal retrieval and chunking.
- 2025: Standardized evaluation workflow and modality-aware metrics.
- 2026: Expanded to agent trajectory analytics and benchmark-oriented reporting.
- mm-rag-playbook: multimodal RAG patterns for PDF-like documents.
- vlm-eval-mini: compact evaluation harness for vision-language models.
- webagent-trajectory-lab: trajectory analytics toolkit for visual web agents.
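As a hypothetical sketch of what a compact harness in the spirit of vlm-eval-mini might look like (the `Example` schema, `model_fn` interface, and exact-match metric are illustrative assumptions, not the repo's actual API):

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """One (image, question, gold answer) evaluation item."""
    image_path: str
    question: str
    answer: str


def exact_match_accuracy(
    examples: List[Example],
    model_fn: Callable[[str, str], str],
) -> float:
    """Fraction of examples where the model's normalized answer matches the gold answer."""
    if not examples:
        return 0.0
    hits = 0
    for ex in examples:
        pred = model_fn(ex.image_path, ex.question)
        if pred.strip().lower() == ex.answer.strip().lower():
            hits += 1
    return hits / len(examples)


# Usage with a stub model that always answers "cat".
examples = [
    Example("img1.png", "What animal is shown?", "cat"),
    Example("img2.png", "What animal is shown?", "dog"),
]
print(exact_match_accuracy(examples, lambda image, question: "cat"))  # 0.5
```

Keeping the model behind a plain callable like this makes the harness reproducible: the same examples and metric can be rerun against any VLM backend without changing the evaluation code.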
- Better multimodal retrieval reranking
- Robust science/engineering benchmark protocols
- Data-efficient adaptation for medium-size VLMs
- Long-context multimodal reasoning
- Agentic planning with visual grounding
- Efficient evaluation and error taxonomy
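The first direction above, multimodal retrieval reranking, can be sketched in its simplest form as cosine-similarity reranking over precomputed chunk embeddings (the function names and the toy 2-D embeddings are illustrative assumptions; real rerankers would use learned cross-modal scores):

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(
    query_emb: List[float],
    candidates: List[Tuple[str, List[float]]],
    top_k: int = 3,
) -> List[str]:
    """Re-order retrieved (chunk_id, embedding) pairs by similarity to the query."""
    scored = sorted(candidates, key=lambda c: cosine(query_emb, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:top_k]]


# Toy 2-D embeddings: "table-3" points almost the same way as the query.
query = [1.0, 0.0]
cands = [
    ("caption-1", [0.0, 1.0]),
    ("table-3", [0.9, 0.1]),
    ("para-7", [0.5, 0.5]),
]
print(rerank(query, cands, top_k=2))  # ['table-3', 'para-7']
```

In a document-intelligence setting, the candidate list would come from a first-stage retriever over mixed text/figure/table chunks, and the reranker's job is to push the modality-appropriate chunks to the top before they reach the generator.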