<a href="https://colab.research.google.com/github/kirajcg/pyscbwrapper/blob/master/pyscbwrapper_en.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

We want to construct a graph of the population density in each county ('län') in Sweden over time. To help with this we have Statistics Sweden's data and Python, and our goal is to never leave the Python environment, not even to fetch the data. Luckily there is a package made for this purpose, called pyscbwrapper.

# Installation

pyscbwrapper is a kind of interface to the Statistical Database, written in Python 3. To run this interface we first need to install and import it. (If you do not have access to Python you can run this via Colab by clicking the button above.)

In [1]:
!pip install -q pyscbwrapper
from pyssbwrapper import SSB

# Initialisation

The class we just imported contains functions for navigating the Statistical Database and fetching metadata as well as data from it. To use it we first need to initialise an object, which takes one mandatory argument: the language of the data and metadata. English and Swedish are supported. We choose English:

In [2]:
ssb = SSB('en')

# Navigation and metadata

Now we can look at the top node in the tree that is the Statistical Database's metadata: 

In [3]:
ssb.info()

[{'id': 'al', 'type': 'l', 'text': 'Labour market and earnings'},
 {'id': 'bf', 'type': 'l', 'text': 'Banking and financial markets'},
 {'id': 'vf', 'type': 'l', 'text': 'Establishments, enterprises and accounts'},
 {'id': 'be', 'type': 'l', 'text': 'Population'},
 {'id': 'bb', 'type': 'l', 'text': 'Construction, housing and property'},
 {'id': 'ei', 'type': 'l', 'text': 'Energy and manufacturing'},
 {'id': 'he', 'type': 'l', 'text': 'Health'},
 {'id': 'if', 'type': 'l', 'text': 'Income and consumption'},
 {'id': 'in', 'type': 'l', 'text': 'Immigration and immigrants'},
 {'id': 'js',
  'type': 'l',
  'text': 'Agriculture, forestry, hunting and fishing'},
 {'id': 'kf', 'type': 'l', 'text': 'Culture and recreation'},
 {'id': 'nk', 'type': 'l', 'text': 'National accounts and business cycles'},
 {'id': 'nm', 'type': 'l', 'text': 'Nature and the environment'},
 {'id': 'os', 'type': 'l', 'text': 'Public sector'},
 {'id': 'pp', 'type': 'l', 'text': 'Prices and price indices'},
 {'id': 'sk', '

We can now go down in the tree by using the id tag in the metadata we fetched. Let's say we are interested in the population statistics. 

In [4]:
ssb.go_down('be')

To fetch the metadata about the population statistics, we once again run the function info():

In [5]:
ssb.info()

[{'id': 'be01', 'type': 'l', 'text': 'Children, families and households'},
 {'id': 'be02', 'type': 'l', 'text': 'Population projections'},
 {'id': 'be03', 'type': 'l', 'text': 'Migration'},
 {'id': 'be05', 'type': 'l', 'text': 'Population count'},
 {'id': 'be06', 'type': 'l', 'text': 'Births and deaths'},
 {'id': 'be07', 'type': 'l', 'text': 'Immigrants'},
 {'id': 'be08', 'type': 'l', 'text': 'Gender equality'},
 {'id': 'be09', 'type': 'l', 'text': 'Names'},
 {'id': 'be11', 'type': 'l', 'text': 'Regions'}]
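
When a listing is long, scanning it by eye gets tedious. The following is a minimal sketch, assuming info() returns a list of dictionaries with 'id' and 'text' keys as in the output above. The helper find_ids is hypothetical and not part of pyscbwrapper, and the sample list is copied by hand from that output rather than fetched.

```python
def find_ids(nodes, keyword):
    """Return the ids of all nodes whose text contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [node['id'] for node in nodes if keyword in node['text'].lower()]


# Hand-copied excerpt of the listing above (not fetched from the API).
sample_nodes = [
    {'id': 'be03', 'type': 'l', 'text': 'Migration'},
    {'id': 'be05', 'type': 'l', 'text': 'Population count'},
    {'id': 'be09', 'type': 'l', 'text': 'Names'},
]

matches = find_ids(sample_nodes, 'population')
print(matches)
```

With a live object, the list returned by info() could be passed in place of sample_nodes, and the resulting id handed to go_down().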

We can keep going down in the tree: 

In [6]:
ssb.go_down('be09')
ssb.info()

[{'id': 'navn', 'type': 'l', 'text': 'Names'}]

Whoops! We did not want the name statistics, but the population statistics. We go up one step and back down to the correct node: 

In [7]:
ssb.go_up()
ssb.go_down('be05')
ssb.info()

[{'id': 'befsvalbard', 'type': 'l', 'text': 'Population of Svalbard'},
 {'id': 'beftett',
  'type': 'l',
  'text': 'Population and land area in urban settlements'},
 {'id': 'flyktninger', 'type': 'l', 'text': 'Refugees'},
 {'id': 'fobbolig',
  'type': 'l',
  'text': 'Population and Housing Census, dwellings (discontinued)'},
 {'id': 'fobhoved',
  'type': 'l',
  'text': 'Population and Housing Census, main figures (discontinued)'},
 {'id': 'fobhushold',
  'type': 'l',
  'text': 'Population and Housing Census, households (discontinued)'},
 {'id': 'fobhusinnt',
  'type': 'l',
  'text': 'Population and Housing Census, household income (discontinued)'},
 {'id': 'fobinnvbolig',
  'type': 'l',
  'text': 'Population and Housing Census, housing conditions of immigrants (discontinued)'},
 {'id': 'fobsysut',
  'type': 'l',
  'text': 'Population and Housing Census, employment and education (discontinued)'},
 {'id': 'folkemengde', 'type': 'l', 'text': 'Population'},
 {'id': 'folkfram', 'type': 'l',

# Direct route to specific nodes

If we know where in the tree we want to go, we can initialise an object with the id tags as extra arguments:

In [8]:
ssb = SSB('en', 'be', 'be05', 'folkemengde')
ssb.info()

[{'id': 'SBMENU6924',
  'type': 'h',
  'text': 'Municipalities, counties and the whole country, population per 1 January'},
 {'id': 'NY3026',
  'type': 't',
  'text': '07459: Population, by sex and one-year age groups  (M) 1986 - 2025',
  'updated': '2025-02-25T08:00:00'},
 {'id': 'Folkemengd1951',
  'type': 't',
  'text': '06913: Population 1 January and population changes during the calendar year (M) 1951 - 2025',
  'updated': '2025-03-13T08:00:00'},
 {'id': 'Rd0002AaX2',
  'type': 't',
  'text': '03027: Population, by marital status (M) 1986 - 2025',
  'updated': '2025-02-25T08:00:00'},
 {'id': 'Rd0002AaX5',
  'type': 't',
  'text': '03031: Population, by sex, age and marital status (C) 1986 - 2025',
  'updated': '2025-02-25T08:00:00'},
 {'id': 'FolkemengdAreal',
  'type': 't',
  'text': '11342: Population and area (M) 2007 - 2025',
  'updated': '2025-02-26T08:00:00'},
 {'id': 'BefolkKomStor',
  'type': 't',
  'text': '12871: Population, by size of municipality, sex and age 2017 - 2

As you can see, we end up directly in the population statistics. 

---

The specific initialisation of the object does not stop us from navigating in the tree: 

In [9]:
ssb.go_up()
ssb.info()

[{'id': 'befsvalbard', 'type': 'l', 'text': 'Population of Svalbard'},
 {'id': 'beftett',
  'type': 'l',
  'text': 'Population and land area in urban settlements'},
 {'id': 'flyktninger', 'type': 'l', 'text': 'Refugees'},
 {'id': 'fobbolig',
  'type': 'l',
  'text': 'Population and Housing Census, dwellings (discontinued)'},
 {'id': 'fobhoved',
  'type': 'l',
  'text': 'Population and Housing Census, main figures (discontinued)'},
 {'id': 'fobhushold',
  'type': 'l',
  'text': 'Population and Housing Census, households (discontinued)'},
 {'id': 'fobhusinnt',
  'type': 'l',
  'text': 'Population and Housing Census, household income (discontinued)'},
 {'id': 'fobinnvbolig',
  'type': 'l',
  'text': 'Population and Housing Census, housing conditions of immigrants (discontinued)'},
 {'id': 'fobsysut',
  'type': 'l',
  'text': 'Population and Housing Census, employment and education (discontinued)'},
 {'id': 'folkemengde', 'type': 'l', 'text': 'Population'},
 {'id': 'folkfram', 'type': 'l',

Anyway, we go directly back to population density: 

In [10]:
ssb.go_down('beftett')
ssb.info()

[{'id': 'FolkTettSpredt',
  'type': 't',
  'text': '05212: Population in densely and sparsely populated areas, by sex (M) 1990 - 2024',
  'updated': '2024-10-01T08:00:00'},
 {'id': 'FolkTettsted',
  'type': 't',
  'text': '05277: Population, by age and sex (US) 1999 - 2024',
  'updated': '2024-10-01T08:00:00'},
 {'id': 'ArealBefTett',
  'type': 't',
  'text': '04859: Area and population of urban settlements (US) 2000 - 2024',
  'updated': '2024-10-01T08:00:00'},
 {'id': 'ArealBefKomm',
  'type': 't',
  'text': '04861: Area and population of urban settlements (M) 2000 - 2024',
  'updated': '2024-10-01T08:00:00'},
 {'id': 'ArealBefFylk',
  'type': 't',
  'text': '04860: Area and population of urban settlements (C) 2000 - 2024',
  'updated': '2024-10-01T08:00:00'},
 {'id': 'ArealBefLand',
  'type': 't',
  'text': '04862: Area and population of urban settlements 2000 - 2024',
  'updated': '2024-10-01T08:00:00'},
 {'id': 'Beftett01',
  'type': 't',
  'text': '14216: Area and population of u

Now we pick the table node we want and go down to it:

In [11]:
ssb.go_down('FolkTettSpredt')
ssb.info()

{'title': '05212: Population, by region, densely/sparsely populated areas, sex, contents and year',
 'variables': [{'code': 'Region',
   'text': 'region',
   'values': ['0',
    '31',
    '3101',
    '3103',
    '3105',
    '3107',
    '3110',
    '3112',
    '3114',
    '3116',
    '3118',
    '3120',
    '3122',
    '3124',
    '32',
    '3201',
    '3203',
    '3205',
    '3207',
    '3209',
    '3212',
    '3214',
    '3216',
    '3218',
    '3220',
    '3222',
    '3224',
    '3226',
    '3228',
    '3230',
    '3232',
    '3234',
    '3236',
    '3238',
    '3240',
    '3242',
    '30',
    '01',
    '3001',
    '3002',
    '3003',
    '3004',
    '3005',
    '3006',
    '3007',
    '3011',
    '3012',
    '3013',
    '3014',
    '3015',
    '3016',
    '3017',
    '3018',
    '3019',
    '3020',
    '3021',
    '3022',
    '3023',
    '3024',
    '3025',
    '3026',
    '3027',
    '3028',
    '3029',
    '3030',
    '3031',
    '3032',
    '3033',
    '3034',
    '3035',
    '3

Note how the metadata differs from previous nodes: the key 'variables' is present, which indicates that we are in a leaf node. From here we can therefore fetch the actual data.
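
That check can be written down as a small predicate. This is only a sketch based on the two output shapes shown so far (a list for folder nodes, a dictionary with a 'variables' entry for tables). The name is_leaf is hypothetical, and the sample metadata is a shortened, hand-copied excerpt of the outputs above.

```python
def is_leaf(meta):
    """Guess whether info() output describes a table (leaf) rather than a folder."""
    return isinstance(meta, dict) and 'variables' in meta


# Folder-style output: a list of child nodes.
folder_meta = [{'id': 'be05', 'type': 'l', 'text': 'Population count'}]

# Table-style output: a dictionary with a 'variables' entry (values shortened).
table_meta = {
    'title': '05212: Population, by region, densely/sparsely populated areas, '
             'sex, contents and year',
    'variables': [{'code': 'Region', 'text': 'region', 'values': ['0', '31', '3101']}],
}

print(is_leaf(folder_meta), is_leaf(table_meta))
```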

---

It is not necessary to call info() after each go_down(), but it is a good idea to do so anyway if you are not sure what the database looks like.

# Fetch data

Now that we are in a leaf node we can look at which variables there are, and their respective ranges: 

In [13]:
ssb.get_variables()

{'region': ['The whole country',
  'Østfold',
  'Halden',
  'Moss',
  'Sarpsborg',
  'Fredrikstad',
  'Hvaler',
  'Råde',
  'Våler (Østfold)',
  'Skiptvet',
  'Indre Østfold',
  'Rakkestad',
  'Marker',
  'Aremark',
  'Akershus',
  'Bærum',
  'Asker',
  'Lillestrøm',
  'Nordre Follo',
  'Ullensaker',
  'Nesodden',
  'Frogn',
  'Vestby',
  'Ås',
  'Enebakk',
  'Lørenskog',
  'Rælingen',
  'Aurskog-Høland',
  'Nes',
  'Gjerdrum',
  'Nittedal',
  'Lunner',
  'Jevnaker',
  'Nannestad',
  'Eidsvoll',
  'Hurdal',
  'Viken (2020-2023)',
  'Østfold (-2019)',
  'Halden (2020-2023)',
  'Moss (2020-2023)',
  'Sarpsborg (2020-2023)',
  'Fredrikstad (2020-2023)',
  'Drammen (2020-2023)',
  'Kongsberg (2020-2023)',
  'Ringerike (2020-2023)',
  'Hvaler (2020-2023)',
  'Aremark (2020-2023)',
  'Marker (2020-2023)',
  'Indre Østfold (2020-2023)',
  'Skiptvet (2020-2023)',
  'Rakkestad (2020-2023)',
  'Råde (2020-2023)',
  'Våler (Viken) (2020-2023)',
  'Vestby (2020-2023)',
  'Nordre Follo (2020-2023)'

Now that we have these, we can choose what we are interested in and create a JSON query. Let's say we are interested in the number of inhabitants per square kilometer in Örebro county over the latest five years.

In [15]:
ssb.set_query(region=["Oslo"], 
              observations=["densely/sparsely populated areas"], 
              year=["2014", "2015", "2016", "2017", "2018"])

Now we can check how the query looks: 

In [17]:
ssb.get_query()

{'query': [{'code': 'Region',
   'selection': {'filter': 'item', 'values': ['03']}},
  {'code': 'Tid',
   'selection': {'filter': 'item',
    'values': ['2014', '2015', '2016', '2017', '2018']}}],
 'response': {'format': 'json'}}
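
To make the shape of that query concrete, here is a minimal sketch that builds the same structure by hand from variable codes and value codes. How set_query() translates readable names into codes is internal to the package and not reproduced here. The function build_query is a hypothetical helper, and it assumes every selection uses the 'item' filter as in the output above.

```python
def build_query(selections, response_format='json'):
    """Build a query dict from a mapping of variable code to list of value codes."""
    return {
        'query': [
            {'code': code,
             'selection': {'filter': 'item', 'values': list(values)}}
            for code, values in selections.items()
        ],
        'response': {'format': response_format},
    }


# Codes copied from the get_query() output above.
query = build_query({
    'Region': ['03'],
    'Tid': ['2014', '2015', '2016', '2017', '2018'],
})
print(query)
```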

The query is automatically formatted in the right way to fetch the data from the API. We fetch the data: 

In [18]:
ssb.get_data()

{'columns': [{'code': 'Region', 'text': 'region', 'type': 'd'},
  {'code': 'Tid', 'text': 'year', 'type': 't'},
  {'code': 'Folkemengde', 'text': 'Population', 'type': 'c'}],
 'comments': [],
 'data': [{'key': ['03', '2014'], 'values': ['634463']},
  {'key': ['03', '2015'], 'values': ['647676']},
  {'key': ['03', '2016'], 'values': ['658390']},
  {'key': ['03', '2017'], 'values': ['666759']},
  {'key': ['03', '2018'], 'values': ['673469']}],
 'metadata': [{'infofile': 'None',
   'updated': '2024-10-01T06:00:00Z',
   'label': '05212: Population, by region, year and contents',
   'source': 'Statistics Norway'}]}

This is the same data that one can fetch from the Statistical Database on the website. Via a function in pyscbwrapper we can get the URL to the page with the data: 

In [19]:
ssb.get_url()

'https://data.ssb.no/api/v0/en/table/START__be__be05__beftett/FolkTettSpredt'

The data can of course be fetched without pyscbwrapper, by navigating to the URL above. There we select "Population density per sq. km", choose region County and select Örebro county, and select the last five years. On the next page we can click "API for this table" and get the query and a URL to post it to, e.g. via the requests package, but we have to change "format": "px" to "format": "json".

In [2]:
import requests
import json

session = requests.Session()

query = {
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "vs:RegionLän99EjAggr",
        "values": [
          "18"
        ]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": [
          "BE0101U1"
        ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": [
          "2014",
          "2015",
          "2016",
          "2017",
          "2018"
        ]
      }
    }
  ],
  "response": {
    "format": "json"
  }
}

url = "https://api.scb.se/OV0104/v1/doris/en/ssd/START/BE/BE0101/BE0101C/BefArealTathetKon"

response = session.post(url, json=query)
response_json = json.loads(response.content.decode('utf-8-sig'))

response_json

{'columns': [{'code': 'Region', 'text': 'region', 'type': 'd'},
  {'code': 'Tid', 'text': 'year', 'type': 't'},
  {'code': 'BE0101U1', 'text': 'Population density per sq. km', 'type': 'c'}],
 'comments': [],
 'data': [{'key': ['18', '2014'], 'values': ['33.9']},
  {'key': ['18', '2015'], 'values': ['34.2']},
  {'key': ['18', '2016'], 'values': ['34.7']},
  {'key': ['18', '2017'], 'values': ['35.1']},
  {'key': ['18', '2018'], 'values': ['35.5']}]}

As you can see, we get the exact same data. 
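
A note on the decode('utf-8-sig') call in the cell above: it is presumably there because the response body may start with a UTF-8 byte order mark (BOM). The sketch below uses a hand-built byte string, not a real API response, to show what that codec does.

```python
import json

# A JSON payload preceded by the UTF-8 byte order mark (EF BB BF), built by hand.
raw = b'\xef\xbb\xbf' + '{"values": ["33.9"]}'.encode('utf-8')

# 'utf-8-sig' strips a leading BOM if present and otherwise behaves like 'utf-8'.
parsed = json.loads(raw.decode('utf-8-sig'))
print(parsed)

# Plain 'utf-8' keeps the BOM as the character U+FEFF at the start of the string.
with_bom = raw.decode('utf-8')
print(with_bom.startswith('\ufeff'))
```

Since 'utf-8-sig' also decodes input without a BOM, using it is harmless when the mark is absent.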

# More advanced calls

Now that we have seen what the data looks like we can fetch more of it to make interesting graphs. Since we are already in the correct place in the API structure we only have to construct a new query. Let's say we want data from every available year for each county. First we pick out a list of all regions, filter out the counties with a regular expression, and then use that list in the JSON query:

In [21]:
import re

regions = ssb.get_variables()['region']
r = re.compile(r'.* county')
county = list(filter(r.match, regions))

ssb.set_query(region=county,
              observations=["Population density per sq. km"])
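
The filtering step can be tried offline. In this sketch the region names are a hypothetical sample; the real names come from get_variables()['region'] and depend on the database and language in use.

```python
import re

# Hypothetical sample of region names, for illustration only.
sample_regions = ['Sweden', 'Stockholm county', 'Örebro county', 'Örebro']

# match() anchors at the start of the string, so this keeps names
# that contain ' county' after any prefix.
pattern = re.compile(r'.* county')
counties = list(filter(pattern.match, sample_regions))
print(counties)
```

If no name matches the pattern the list comes out empty, so it is worth checking the filtered list before passing it to set_query().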

In [None]:
ssb.get_query()

{'query': [{'code': 'Region',
   'selection': {'filter': 'item',
    'values': ['01',
     '03',
     '04',
     '05',
     '06',
     '07',
     '08',
     '09',
     '10',
     '12',
     '13',
     '14',
     '17',
     '18',
     '19',
     '20',
     '21',
     '22',
     '23',
     '24',
     '25']}},
  {'code': 'ContentsCode',
   'selection': {'filter': 'item', 'values': ['BE0101U1']}}],
 'response': {'format': 'json'}}

This is the exact query we need. We fetch the data and place it in a variable so we can use it later: 

In [23]:
ssb_data = ssb.get_data()

As is good practice we look at the data before we do anything else: 

In [24]:
ssb_data

{'columns': [{'code': 'Tid', 'text': 'year', 'type': 't'},
  {'code': 'Folkemengde', 'text': 'Population', 'type': 'c'}],
 'comments': [],
 'data': [{'key': ['1990'], 'values': ['4233116']},
  {'key': ['1991'], 'values': ['4249830']},
  {'key': ['1992'], 'values': ['4273634']},
  {'key': ['1993'], 'values': ['4299167']},
  {'key': ['1994'], 'values': ['4324815']},
  {'key': ['1995'], 'values': ['4348410']},
  {'key': ['1996'], 'values': ['4369957']},
  {'key': ['1997'], 'values': ['4392714']},
  {'key': ['1998'], 'values': ['4417599']},
  {'key': ['1999'], 'values': ['4445329']},
  {'key': ['2000'], 'values': ['4478497']},
  {'key': ['2001'], 'values': ['4503436']},
  {'key': ['2002'], 'values': ['4524066']},
  {'key': ['2003'], 'values': ['4552252']},
  {'key': ['2004'], 'values': ['4577457']},
  {'key': ['2005'], 'values': ['4606363']},
  {'key': ['2006'], 'values': ['4640219']},
  {'key': ['2007'], 'values': ['4681134']},
  {'key': ['2008'], 'values': ['4737171']},
  {'key': ['2009'

The actual data we look for is here: 

In [27]:
scb_fetch = ssb_data['data']

Once again we check that we have gotten the correct data: 

In [28]:
scb_fetch

[{'key': ['1990'], 'values': ['4233116']},
 {'key': ['1991'], 'values': ['4249830']},
 {'key': ['1992'], 'values': ['4273634']},
 {'key': ['1993'], 'values': ['4299167']},
 {'key': ['1994'], 'values': ['4324815']},
 {'key': ['1995'], 'values': ['4348410']},
 {'key': ['1996'], 'values': ['4369957']},
 {'key': ['1997'], 'values': ['4392714']},
 {'key': ['1998'], 'values': ['4417599']},
 {'key': ['1999'], 'values': ['4445329']},
 {'key': ['2000'], 'values': ['4478497']},
 {'key': ['2001'], 'values': ['4503436']},
 {'key': ['2002'], 'values': ['4524066']},
 {'key': ['2003'], 'values': ['4552252']},
 {'key': ['2004'], 'values': ['4577457']},
 {'key': ['2005'], 'values': ['4606363']},
 {'key': ['2006'], 'values': ['4640219']},
 {'key': ['2007'], 'values': ['4681134']},
 {'key': ['2008'], 'values': ['4737171']},
 {'key': ['2009'], 'values': ['4799252']},
 {'key': ['2011'], 'values': ['4920305']},
 {'key': ['2012'], 'values': ['4985870']},
 {'key': ['2013'], 'values': ['5051275']},
 {'key': ['

Now we need to understand the structure of the data. We have a list of dictionaries in which 'key' holds the domain (in this case county and year) and 'values' holds the value of the observation variable (in this case inhabitants per square kilometer). To turn this into time series that can be used for visualisation we need a few syntactic tricks that are described below.

# Data processing

What we are looking for is one time series per county. Therefore we need to restructure the data before we can do anything with it. This is outside the functionality of pyscbwrapper, but we can easily solve it. A good structure would be a dictionary with county as key and another dictionary as value, where the inner dictionary has year as key and the variable value as value. To achieve this we take the list of counties that we created earlier and connect it to the county codes, which we can take from the query returned by get_query(); the region selection is the first entry in its 'query' list, at index 0. By comparing these codes, which are now connected to the county names, to the codes in the data, we can connect the county names to the data.

This way we create the structure we want, and we take the opportunity to cast the values as numeric.

In [33]:
codes = ssb.get_query()['query'][0]['selection']['values']

# Map each county code to its name (assumes both lists are in the same order)
countydic = dict(zip(codes, county))

countydata = {}

for code, name in countydic.items():
  countydata[name] = {}
  for record in scb_fetch:
    # 'key' holds [region code, year]; 'values' holds the observation as a string
    if record['key'][0] == code:
      countydata[name][record['key'][1]] = float(record['values'][0])


This got a bit hacky, so let's see if we got the structure we wanted: 

In [34]:
countydata

{}

This looks about right. Now we can loop over the keys and plot the values using key on the x axis and value on the y axis. 
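
Because the cell above depends on a live query, the restructuring can also be checked offline. This is a minimal sketch on records copied by hand from the Örebro response shown earlier. The name_by_code mapping is a hypothetical stand-in for countydic, and the single-pass grouping with setdefault() is an alternative to the nested loops, not the method used above.

```python
# Records copied by hand from the earlier response for region code '18'.
records = [
    {'key': ['18', '2014'], 'values': ['33.9']},
    {'key': ['18', '2015'], 'values': ['34.2']},
    {'key': ['18', '2016'], 'values': ['34.7']},
]

# Hypothetical stand-in for countydic: region code -> readable name.
name_by_code = {'18': 'Örebro county'}

series = {}
for rec in records:
    code, year = rec['key']
    if code in name_by_code:
        # Create the inner dictionary on first sight of a county, then fill it.
        series.setdefault(name_by_code[code], {})[year] = float(rec['values'][0])

print(series)
```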

# Data visualisation

We need numpy, pandas, and matplotlib for this. We install and import. 

In [31]:
!pip install -q matplotlib
!pip install -q pandas
!pip install -q numpy

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

Now we can make a neat graph. 

In [32]:
df = pd.DataFrame(countydata)
df = df.reset_index()
df = df.rename(columns={"index":"Year"})
# Plot against row position; passing x=df.index raises
# "ValueError: x must be a label or position", so the Year column is
# dropped from the plotted data and used only for the tick labels.
ax = df.drop(columns="Year").plot(xticks=np.arange(len(df.index)), colormap='hsv')
ax.set_xticklabels(df["Year"], rotation=45)
plt.title("Population density per county")
plt.xlabel("Year")
plt.ylabel("Inhabitants per square kilometer")
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()

Now we have what we wanted, and we never left the Python environment!