# Sametinget WebTV exploration

> "For a master's student"

- categories: [scraper, sametinget, webtv]
- hidden: true
- branch: master

[This video page](https://sametinget.kommunetv.no/archive/244) has the ID 244, which is passed to the API:

In [4]:
URL = "https://sametinget.kommunetv.no/api/streams?streamType=1&id=244"
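The API URL can also be derived from the archive page URL rather than typed by hand. This is a minimal sketch, assuming that every archive page follows the `/archive/<id>` pattern seen above and that `streamType=1` is the right value for archived videos (it is the only value that appears here). `build_api_url` and `API_ENDPOINT` are hypothetical names, not part of the site's API.

```python
from urllib.parse import urlencode, urlsplit

API_ENDPOINT = "https://sametinget.kommunetv.no/api/streams"


def build_api_url(page_url, stream_type=1):
    """Build the API URL for an archive page such as .../archive/244."""
    # The numeric ID is the last path component of the page URL.
    stream_id = urlsplit(page_url).path.rstrip("/").rsplit("/", 1)[-1]
    if not stream_id.isdigit():
        raise ValueError(f"no numeric ID in {page_url!r}")
    # Assumption: streamType=1 is correct for every archived video.
    query = urlencode({"streamType": stream_type, "id": stream_id})
    return f"{API_ENDPOINT}?{query}"


api_url = build_api_url("https://sametinget.kommunetv.no/archive/244")
print(api_url)
```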

In [5]:
import requests

request = requests.get(URL)

In [6]:
assert request.status_code == 200

I already know that the response body is JSON, so I can parse it directly:

In [8]:
import json
streams = json.loads(request.text)

In [12]:
streams.keys()

dict_keys(['stream', 'bookmarks', 'speakerEntries', 'messages', 'playlist', 'proposals', 'references'])

`'stream'` has some useful metadata that I won't look at in detail now:

In [11]:
streams['stream'].keys()

dict_keys(['assetId', 'assetCopyrightText', 'attachments', 'availableFrom', 'availableTo', 'categoryId', 'description', 'id', 'imageUrl', 'publishDate', 'streamType', 'title', 'tupleId', 'views', 'voting', 'externalCode', 'movieConfiguration', 'hidden', 'allowQuestionForm', 'likes', 'dislikes'])

In [37]:
streams['stream']['title']

'Direktesending - Sametingets plenum 05.06.2025'

In [38]:
streams['stream']['description']

'<p>Plenumsm&oslash;tet starter tirsdag 3. juni kl. 09.00 og avsluttes fredag 6. juni innen kl. 10.00. Se saksliste og program:&nbsp;https://sametinget.no/Kalender/CalendarEvent.aspx?Id=1494&amp;MId1=7</p>'
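The description is an HTML fragment with character entities (`&oslash;`, `&nbsp;`, `&amp;`). If plain text is wanted, something like the following sketch should do, using only the standard library. `description_to_text` is a hypothetical helper, and the regex-based tag stripping assumes the descriptions stay as simple as this one.

```python
import html
import re


def description_to_text(raw):
    """Turn the HTML description into plain text."""
    # Drop tags first so that escaped angle brackets are never read as markup.
    without_tags = re.sub(r"<[^>]+>", "", raw)
    # Decode entities such as &oslash; and &amp;.
    text = html.unescape(without_tags)
    # &nbsp; decodes to U+00A0; use an ordinary space instead.
    return text.replace("\xa0", " ").strip()


raw = (
    "<p>Plenumsm&oslash;tet starter tirsdag 3. juni kl. 09.00 og avsluttes "
    "fredag 6. juni innen kl. 10.00. Se saksliste og program:&nbsp;"
    "https://sametinget.no/Kalender/CalendarEvent.aspx?Id=1494&amp;MId1=7</p>"
)
print(description_to_text(raw))
```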

...but `'playlist'` is where the video files are ultimately retrieved from:

In [15]:
streams['playlist'][0].keys()

dict_keys(['tupleId', 'id', 'streamType', 'started', 'duration', 'cameraId', 'cameraName', 'sortOrder', 'playlist', 'lf', 'll', 'mainCamera'])

In [16]:
streams['playlist'][0]['started']

'2025-06-04T22:00:00Z'
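The trailing `Z` means the timestamp is in UTC, which probably explains why it says 4 June while the title says 05.06.2025: 22:00 UTC is midnight in Norway during summer time. A small sketch of the conversion, assuming a fixed UTC+2 offset (which only holds in summer):

```python
from datetime import datetime, timedelta, timezone

started = "2025-06-04T22:00:00Z"

# fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
# so spell out the UTC offset explicitly.
started_utc = datetime.fromisoformat(started.replace("Z", "+00:00"))

# Assumption: fixed UTC+2 (Central European Summer Time), valid for June only.
cest = timezone(timedelta(hours=2), "CEST")
started_local = started_utc.astimezone(cest)

print(started_local.isoformat())  # 2025-06-05T00:00:00+02:00
```

For timestamps from any time of year, `zoneinfo.ZoneInfo("Europe/Oslo")` would be the more robust choice than a fixed offset.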

In [19]:
streams['playlist'][0]['playlist'][0].keys()

dict_keys(['bookmarkId', 'description', 'file', 'image', 'startTime', 'title'])
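So there are two levels of nesting: `streams['playlist']` is a list of entries (apparently one per camera), and each of those has its own `'playlist'` list of items with a `'file'` key. A sketch of flattening this, run here on made-up data that mirrors only the keys shown above; `iter_files` is a hypothetical helper.

```python
def iter_files(streams):
    """Yield (camera name, title, file URL) for every playlist item."""
    for camera in streams.get("playlist", []):
        for item in camera.get("playlist", []):
            yield camera.get("cameraName"), item.get("title"), item["file"]


# Hypothetical data: same shape as the API response, invented values.
sample = {
    "playlist": [
        {
            "cameraName": "camera-a",
            "playlist": [
                {"title": "part 1", "file": "https://example.org/a/playlist.m3u8"},
                {"title": "part 2", "file": "https://example.org/b/playlist.m3u8"},
            ],
        }
    ]
}

files = [file for _, _, file in iter_files(sample)]
print(files)
```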

In [20]:
streams['playlist'][0]['playlist'][0]['file']

'https://apollowms.aventia.no/arkivh/_definst_/sametinget/arkiv/244_97.smil/playlist.m3u8?wowzaplaystart=1155000&wowzaplayduration=5437000'
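The query string carries two parameters, `wowzaplaystart` and `wowzaplayduration`, which look like a start offset and a length for the clip. The sketch below pulls them out with `urllib.parse`, assuming both values are milliseconds (this is not verified here).

```python
from datetime import timedelta
from urllib.parse import parse_qs, urlsplit

file_url = (
    "https://apollowms.aventia.no/arkivh/_definst_/sametinget/arkiv/"
    "244_97.smil/playlist.m3u8?wowzaplaystart=1155000&wowzaplayduration=5437000"
)

query = parse_qs(urlsplit(file_url).query)

# Assumption: both values are given in milliseconds.
start = timedelta(milliseconds=int(query["wowzaplaystart"][0]))
duration = timedelta(milliseconds=int(query["wowzaplayduration"][0]))

print(start, duration)  # 0:19:15 1:30:37
```

Under that assumption, the clip starts 19 minutes into the recording and runs for about an hour and a half.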

`playlist.m3u8` is essentially a default filename, much like `index.html` on a web server.

In [21]:
smilreq = requests.get(streams['playlist'][0]['playlist'][0]['file'])

In [23]:
print(smilreq.text)

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=1572536,RESOLUTION=1280x720
chunklist_w873789732_b1572536_ps1155000_pd5437000.m3u8



A playlist like this can list several other playlists (for example, one per bitrate), so this function collects all of them:

In [25]:
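def get_m3u(text):
    """Return the playlist entries (non-comment lines that mention m3u8)."""
    res = []
    # splitlines() also copes with "\r\n" line endings
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            # tags and comments start with "#"
            continue
        if "m3u8" in line:
            res.append(line)
    return res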

In [26]:
get_m3u(smilreq.text)

['chunklist_w873789732_b1572536_ps1155000_pd5437000.m3u8']
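The `#EXT-X-STREAM-INF` line that precedes each entry describes it (bandwidth and resolution here), which would matter if a playlist offered several qualities. A sketch that keeps that information next to each URI; `parse_variants` is a hypothetical helper and only looks at the two attributes that appear in the output above.

```python
import re


def parse_variants(text):
    """Pair each variant playlist URI with its #EXT-X-STREAM-INF attributes."""
    variants = []
    pending = None  # attributes from the most recent #EXT-X-STREAM-INF line
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            # The lookbehind avoids matching AVERAGE-BANDWIDTH.
            bandwidth = re.search(r"(?<![A-Z-])BANDWIDTH=(\d+)", line)
            resolution = re.search(r"RESOLUTION=(\d+x\d+)", line)
            pending = {
                "bandwidth": int(bandwidth.group(1)) if bandwidth else None,
                "resolution": resolution.group(1) if resolution else None,
            }
        elif line and not line.startswith("#") and pending is not None:
            variants.append({**pending, "uri": line})
            pending = None
    return variants


# The playlist text printed above.
master = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=1572536,RESOLUTION=1280x720
chunklist_w873789732_b1572536_ps1155000_pd5437000.m3u8
"""

variants = parse_variants(master)
best = max(variants, key=lambda v: v["bandwidth"] or 0)
print(best["resolution"], best["uri"])
```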

In [30]:
m3u_list = get_m3u(smilreq.text)

The entry is just a filename rather than a full URL, which implies that it is relative to the URL of the playlist:

In [27]:
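def get_base_url(url):
    # Keep everything up to and including the last "/", which drops the
    # filename and the query string (assuming the query contains no "/").
    last_slash = url.rfind("/")
    return url[:last_slash + 1]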

In [28]:
get_base_url(streams['playlist'][0]['playlist'][0]['file'])

'https://apollowms.aventia.no/arkivh/_definst_/sametinget/arkiv/244_97.smil/'

In [29]:
base = get_base_url(streams['playlist'][0]['playlist'][0]['file'])

In [31]:
urls = [base + x for x in m3u_list]
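The standard library can do the same resolution without manual slicing: `urllib.parse.urljoin` resolves an entry against the playlist URL, drops the playlist's own filename and query string, and leaves entries that are already absolute untouched. A sketch with the values from above:

```python
from urllib.parse import urljoin

playlist_url = (
    "https://apollowms.aventia.no/arkivh/_definst_/sametinget/arkiv/"
    "244_97.smil/playlist.m3u8?wowzaplaystart=1155000&wowzaplayduration=5437000"
)
entry = "chunklist_w873789732_b1572536_ps1155000_pd5437000.m3u8"

# A relative entry replaces the last path segment of the playlist URL.
chunklist_url = urljoin(playlist_url, entry)
print(chunklist_url)

# An absolute entry is returned as-is (example.org is a placeholder).
absolute = urljoin(playlist_url, "https://example.org/other/chunklist.m3u8")
```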

In [33]:
m3u_req = requests.get(urls[0])

In [34]:
m3u_req.status_code

200

In [None]:
print(m3u_req.text)

I won't include this output: essentially, it is a list of `.ts` files, each of which contains one chunk of the stream. They need to be concatenated to create a full video (`ffmpeg` can do this).
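To give an idea of the shape, here is a sketch that reads the segments out of a chunklist and builds an `ffmpeg` command for the download. The sample text is made up in the standard HLS layout and is not the real output; `parse_segments` and `ffmpeg_command` are hypothetical helpers. The command relies on `ffmpeg` reading an `.m3u8` URL directly and copying the streams without re-encoding, which I have not tested against this server.

```python
def parse_segments(text):
    """Return (duration in seconds, URI) for each segment in a media playlist."""
    segments = []
    duration = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # Format: "#EXTINF:<duration>,<optional title>"
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#"):
            segments.append((duration, line))
            duration = None
    return segments


def ffmpeg_command(chunklist_url, output_path):
    """Argument list asking ffmpeg to copy the HLS stream into one file."""
    return ["ffmpeg", "-i", chunklist_url, "-c", "copy", output_path]


# Made-up chunklist, not the real output.
sample = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
media_0.ts
#EXTINF:7.5,
media_1.ts
#EXT-X-ENDLIST
"""

segments = parse_segments(sample)
total_seconds = sum(duration for duration, _ in segments)
print(len(segments), total_seconds)
```

The list returned by `ffmpeg_command(urls[0], "plenum.mp4")` could then be handed to `subprocess.run(..., check=True)`; the output filename is only an example.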