# Getting timestamps of Swedish Librivox

> "With wav2vec2 and HuggingFace transformers"

- toc: false
- branch: master
- comments: true
- categories: [long audio, wav2vec2, huggingface, timestamps, librivox, swedish]


First, an audio sample: [this video](https://www.youtube.com/watch?v=Kw5jkyLGFGc) from YouTube. YouTube lists it as 11 minutes and 51 seconds long, which should be enough to check that striding works.

In [2]:
DATA = """
#avskedssang	https://www.archive.org/download/multilingual_poetry_018_1112_lv/swedish_avskedssang_andersson_e.mp3	http://runeberg.org/daefter/12.html	Elin	https://librivox.org/reader/6686	f
efter_torgdagen	https://www.archive.org/download/mshortworks_001_1202_librivox/msw001_19_eftertorsdagen_benedictsson_at.mp3	https://litteraturbanken.se/f%C3%B6rfattare/BenedictssonV/titlar/Ber%C3%A4ttelserOchUtkast/sida/126/etext	owly	https://librivox.org/reader/2857	f
en_saga_om_vreden	https://ia800801.us.archive.org/0/items/mshortworks_001_1202_librivox/msw001_12_en_saga_om_vreden_runeberg_jb.mp3	http://web.archive.org/web/20190814032041/http://freetexthost.com:80/bcp31m60i4	Johan Borg	https://librivox.org/reader/5958
elin_i_hagen	https://www.archive.org/download/multilingual_poetry_014_1002/swedish_elinihagen_froding_ear.mp3	http://runeberg.org/dragharm/elinhage.html	Elina Riuttanen	https://librivox.org/reader/4498	f
"""

In [5]:
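items = {}
for line in DATA.split("\n"):
    # Skip blank lines and entries disabled with a leading "#"
    if line.strip() == "" or line.startswith("#"):
        continue
    # Tab-separated fields: id, mp3 URL, text URL, reader name, reader URL, optional trailing flag
    parts = line.split("\t")
    items[parts[0]] = parts[1:]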

In [7]:
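for item in items:
    # Download the mp3; wget saves it under the last component of the URL
    !wget {items[item][0]}
    mp3name = items[item][0].split("/")[-1]
    # Convert to 16-bit PCM, mono, 16 kHz, named after the item id
    !ffmpeg -i {mp3name} -acodec pcm_s16le -ac 1 -ar 16000 {item}.wav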

--2022-06-30 14:37:14--  https://www.archive.org/download/mshortworks_001_1202_librivox/msw001_19_eftertorsdagen_benedictsson_at.mp3
Resolving www.archive.org (www.archive.org)... 207.241.224.2
Connecting to www.archive.org (www.archive.org)|207.241.224.2|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://archive.org/download/mshortworks_001_1202_librivox/msw001_19_eftertorsdagen_benedictsson_at.mp3 [following]
--2022-06-30 14:37:14--  https://archive.org/download/mshortworks_001_1202_librivox/msw001_19_eftertorsdagen_benedictsson_at.mp3
Resolving archive.org (archive.org)... 207.241.224.2
Connecting to archive.org (archive.org)|207.241.224.2|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ia800801.us.archive.org/0/items/mshortworks_001_1202_librivox/msw001_19_eftertorsdagen_benedictsson_at.mp3 [following]
--2022-06-30 14:37:14--  https://ia800801.us.archive.org/0/items/mshortworks_001_1202_librivox/
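
The download and conversion loop above relies on IPython's `!` shell escapes, so it only runs inside a notebook. As a rough sketch of the same conversion step in plain Python, assuming `ffmpeg` is on the `PATH`: the helper names `mp3_name_from_url`, `ffmpeg_wav_command` and `convert` are made up for this illustration, while the ffmpeg flags are the ones used in the cell above.

```python
import subprocess
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def mp3_name_from_url(url: str) -> str:
    """Return the last path component of the URL (the name wget saves under)."""
    return Path(urlparse(url).path).name


def ffmpeg_wav_command(src: str, dst: str) -> List[str]:
    """Build the ffmpeg call from the cell above: 16-bit PCM, mono, 16 kHz."""
    return ["ffmpeg", "-i", src, "-acodec", "pcm_s16le", "-ac", "1", "-ar", "16000", dst]


def convert(src: str, dst: str) -> None:
    """Run ffmpeg; check=True raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_wav_command(src, dst), check=True)
```

With the `items` dictionary from earlier, `convert(mp3_name_from_url(items[item][0]), f"{item}.wav")` would stand in for the `!ffmpeg` line once the file has been downloaded.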

In [None]:
!ffmpeg -i Kw5jkyLGFGc.m4a -acodec pcm_s16le -ac 1 -ar 16000 Kw5jkyLGFGc.wav

The actual ASR work starts here.

In [None]:
!pip install transformers

In [None]:
_SWE_MODEL = "KBLab/wav2vec2-large-voxrex-swedish"

In [None]:
from transformers import pipeline

In [None]:
pipe = pipeline(model=_SWE_MODEL)

Chunking long audio with strides is explained in a Hugging Face [blog post](https://huggingface.co/blog/asr-chunking).
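
To make the idea of striding concrete, here is a small sketch of how overlapping chunks can be laid out over a long file. It is an illustration only, not the pipeline's own code: `chunk_windows` is a hypothetical helper, and the default stride of one sixth of the chunk length on each side is my understanding of the pipeline's default, so treat it as an assumption. Each chunk is transcribed in full, but only the inner "keep" region is used, so every word is decoded with some context on both sides.

```python
def chunk_windows(total_s, chunk_length_s=10.0, stride_length_s=None):
    """Return (start, end, keep_start, keep_end) tuples, in seconds, for each chunk."""
    if stride_length_s is None:
        # Assumed default: one sixth of the chunk length on each side
        stride_length_s = chunk_length_s / 6
    # Each new chunk advances by the chunk length minus both strides
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("stride is too large for the chunk length")

    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_length_s, total_s)
        # The first chunk has no left stride, the last has no right stride
        keep_start = start if start == 0.0 else start + stride_length_s
        keep_end = end if end >= total_s else end - stride_length_s
        windows.append((start, end, keep_start, keep_end))
        if end >= total_s:
            break
        start += step
    return windows
```

For the sample video (11 minutes 51 seconds, so 711 seconds), `chunk_windows(711.0, chunk_length_s=10.0)` lists the windows that a 10 second chunk length would produce under these assumptions.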

Getting timestamps from a pipeline is not well documented, but the details are in this [pull request](https://github.com/huggingface/transformers/pull/15792).

In [None]:
output = pipe("/content/Kw5jkyLGFGc.wav", chunk_length_s=10)

In [None]:
output = pipe("/content/Kw5jkyLGFGc.wav", chunk_length_s=10, return_timestamps="word")

In [None]:
import json
with open("/content/Kw5jkyLGFGc.json", "w") as f:
    json.dump(output, f)
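
As a sketch of what can be done with the saved file: with `return_timestamps="word"`, the output is expected to be a dictionary with the full `"text"` plus a `"chunks"` list, where each entry has a `"text"` and a `"timestamp"` pair of start and end times in seconds. The helper `words_to_tsv` and the sample data below are made up for illustration; the sample is not real model output.

```python
import json


def words_to_tsv(result):
    """Format word-level chunks as 'start<TAB>end<TAB>word' lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"{start:.2f}\t{end:.2f}\t{chunk['text']}")
    return "\n".join(lines)


# Hypothetical sample in the shape described above (not real model output)
sample = {
    "text": "hej världen",
    "chunks": [
        {"text": "hej", "timestamp": (0.5, 0.8)},
        {"text": "världen", "timestamp": (0.9, 1.4)},
    ],
}

# JSON has no tuple type, so timestamps come back as lists after a round trip
roundtrip = json.loads(json.dumps(sample))
print(words_to_tsv(roundtrip))
```

The same function should work on the dictionary loaded back from `/content/Kw5jkyLGFGc.json`, since unpacking `start, end` works for both tuples and lists.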