ensteitz/README.md

👋 Hi, I'm Rin Steitz!

I'm a PhD candidate in Computational Linguistics at Indiana University with a minor in Computer Science. My work bridges speech processing, data analysis, natural language understanding, and real-world applications, especially for low-resource languages and clinical contexts such as Parkinson's disease detection.

My research focuses on:

  • 🧪 Voice-based biomarker detection (e.g., Parkinson's severity from speech)
  • 🗣️ Automatic Speech Recognition (ASR) for under-represented and tonal languages
  • 🌍 Multilingual NLP and syntax-aware models (e.g., Korean, Arabic, Yoruba)
  • 💬 Grammatical Error Correction (GEC) with T5 and dependency-aware attention
  • 🔍 NLP tools for radicalization discourse analysis (with NICC, Brussels)

🧰 Tools & Languages I Use

Languages: Python · R (basic) · Java (basic) · HTML (basic)
Frameworks: PyTorch · TensorFlow · Hugging Face · spaCy · scikit-learn
Speech/NLP Tools: OpenAI Whisper · Wav2Vec2 · librosa · Praat · openSMILE
Other: Git · JupyterLab · WSL · LaTeX · ELAN


📌 Pinned Projects

Here are some of the projects I'm most proud of:

Fine-tuned summarization of Whisper-transcribed TED Talks using GPT and T5, comparing zero-shot, role-based, and chain-of-thought prompting strategies.

Signal processing and machine learning pipeline to classify Parkinson's severity using extracted speech features (e.g., jitter, shimmer, MFCCs).

Grammatical error correction system incorporating dependency relations into the attention mechanism for multilingual grammar correction.

Building an ASR system for the under-resourced Dimasa language using multilingual models and tone-aware evaluation strategies on the Computational Resource on South Asian Languages (CoRSAL) corpus.


Publications & Presentations

  • A Survey of Multilingual Models for ASR (reading group, CoRSAL 2025)
  • Fine-tuning Whisper for Tonal Languages (upcoming, LREC 2025 submission)

Let's Connect

  • 🔗 LinkedIn
  • 📫 Email: erin.steitz [at] iu [dot] edu

“To speak a language is to take on a world, a culture.” — Frantz Fanon

Popular repositories

  1. LING-L555 Public

    Forked from ftyers/LING-L555

    Python

  2. LING-L545 Public

    Forked from ftyers/LING-L545

    Rin's fork of the LING-L545 class.

    Python

  3. ILS-Z639 Public

    Social Media Data Mining at IU

    Python

  4. goodreads-scraper Public

    Forked from maria-antoniak/goodreads-scraper

    A Python scraper for Goodreads books and reviews. To be used for ILS-Z 639

    Jupyter Notebook

  5. GithubIntro Public

    A basic introduction to teaching Git and GitHub

  6. ForkedGithubIntro Public

    Forked from TheCurryMan/GithubIntro

    The forked version from the YouTube tutorial, which gives a basic introduction to teaching Git and GitHub