I'm a PhD candidate in Computational Linguistics at Indiana University with a minor in Computer Science. My work connects speech processing, data analysis, and natural language understanding to real-world applications, especially for low-resource languages and clinical contexts like Parkinson’s disease detection.
My research focuses on:
- 🧪 Voice-based biomarker detection (e.g., Parkinson's severity from speech)
- 🗣️ Automatic Speech Recognition (ASR) for under-represented and tonal languages
- 🌍 Multilingual NLP and syntax-aware models (e.g., Korean, Arabic, Yoruba)
- 💬 Grammatical Error Correction (GEC) with T5 and dependency-aware attention
- 🔍 NLP tools for radicalization discourse analysis (with NICC, Brussels)
 
Languages: Python · R (basic) · Java (basic) · HTML (basic)
Frameworks: PyTorch · TensorFlow · Hugging Face · spaCy · scikit-learn
Speech/NLP Tools: OpenAI Whisper · Wav2Vec2 · librosa · Praat · openSMILE
Other: Git · JupyterLab · WSL · LaTeX · ELAN
Here are some of the projects I'm most proud of:
Fine-tuned summarization of Whisper-transcribed TED Talks using GPT and T5, comparing zero-shot, role-based, and chain-of-thought prompting strategies.
Signal processing and machine learning pipeline to classify Parkinson's severity using extracted speech features (e.g., jitter, shimmer, MFCCs).
Grammatical error correction system incorporating dependency relations into the attention mechanism for multilingual grammar correction.
Building an ASR system for the under-resourced Dimasa language using multilingual models and tone-aware evaluation strategies on a corpus from the Computational Resource on South Asian Languages (CoRSAL).
- A Survey of Multilingual Models for ASR (reading group, CoRSAL 2025)
- Fine-tuning Whisper for Tonal Languages (upcoming, LREC 2025 submission)
 
- 📫 Email: erin.steitz [at] iu [dot] edu
 
“To speak a language is to take on a world, a culture.” — Frantz Fanon
