fgmoradi/DeepLearning_HW3

DeepLearning HW3 — Spoken-SQuAD Extractive QA

A minimal, reproducible BERT baseline for extractive question answering over ASR-transcribed passages (Spoken-SQuAD). Trains a span-prediction head on top of bert-base-uncased, logs EM/F1/WER, and saves figures/tables for the report.
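For readers unfamiliar with the EM and F1 numbers mentioned above, here is a minimal sketch of how they are conventionally computed for SQuAD-style answers. It assumes the standard SQuAD normalization (lowercasing, stripping punctuation and English articles); whether Bert.py uses exactly this normalization is not stated here, and the function names are illustrative only.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several gold answers, the usual convention is to take the maximum score over them and then average across questions.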

Project Structure

  • Bert.py — main training + evaluation script

  • run_outputs/ — generated artifacts after running

    • figures/ — plots (training loss, EM/F1, WER, length histograms)
    • tables/history.json, preds_epoch_*.json (sample preds vs. gold)
    • base_model_wer.txt — WER per epoch (plain text)
  • data/ — dataset folder
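The WER values written to base_model_wer.txt are word error rates. As a rough illustration only, the sketch below computes WER as word-level edit distance divided by the reference length. Which pairs of strings Bert.py actually compares, and whether it relies on jiwer instead, is not documented here, so treat `word_error_rate` as a hypothetical helper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many more words than the reference.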

Dataset

Spoken-SQuAD JSON files:

  • spoken_train-v1.1.json
  • spoken_test-v1.1.json
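The file names suggest the SQuAD v1.1 layout (data → paragraphs → qas → answers). Assuming that layout holds, the following sketch flattens a file into one record per (question, answer) pair. The helper names `iter_qa_examples` and `load_examples` are illustrative and are not taken from Bert.py.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_qa_examples(squad_dict: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per (question, answer) pair in a SQuAD v1.1-style dict."""
    for article in squad_dict["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    yield {
                        "id": qa["id"],
                        "question": qa["question"],
                        "context": context,
                        "answer_text": answer["text"],
                        "answer_start": start,
                        # End is exclusive, so context[start:end] is the span.
                        "answer_end": start + len(answer["text"]),
                    }


def load_examples(path: str) -> List[Dict[str, Any]]:
    """Read a SQuAD-style JSON file and return all flattened records."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_qa_examples(json.load(handle)))
```

Because the passages are ASR transcripts, it is worth checking that `context[answer_start:answer_end]` actually equals `answer_text` before training, since character offsets may not line up for every example.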

Environment

Python 3.10+, CUDA (optional), and:

pip install torch transformers tqdm matplotlib
# if using evaluate/jiwer variants:
pip install evaluate jiwer

Training and Evaluation

python3 Bert.py \
  --train_json /data/spoken_train-v1.1.json \
  --valid_json /data/spoken_test-v1.1.json
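For reference, a script accepting the two flags above could parse them roughly as follows. This is only a sketch with argparse: the real Bert.py may define additional options or different defaults, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the command above."""
    parser = argparse.ArgumentParser(
        description="Train and evaluate an extractive QA model on Spoken-SQuAD."
    )
    parser.add_argument(
        "--train_json",
        required=True,
        help="Path to the training JSON file (e.g. spoken_train-v1.1.json).",
    )
    parser.add_argument(
        "--valid_json",
        required=True,
        help="Path to the evaluation JSON file (e.g. spoken_test-v1.1.json).",
    )
    return parser


# Example: parse an explicit argument list instead of sys.argv.
example_args = build_parser().parse_args(
    ["--train_json", "train.json", "--valid_json", "valid.json"]
)
```

Marking both flags as required makes a missing dataset path fail immediately with a usage message rather than later during data loading.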
