# Plot data on inequality and size for Norwegian municipalities

The theory goes that cities are drivers of economic inequality. Statistics Norway publishes data on both income inequality and population at the municipality level. Norway currently has 428 municipalities, ranging in population from about 200 to more than 600 000. Most of them aren't cities (let alone big cities), but the correlation is still interesting to look at. The project is inspired by Richard Florida.

We start by importing some libraries.

In [3]:
from bokeh.plotting import figure, show

In [4]:
from bokeh.io import output_notebook
output_notebook()

In [36]:
import requests
from pyjstat import pyjstat
from collections import OrderedDict
import json
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool

## Population vs inequality

### Get the data

Statistics Norway has a nice REST API, which we are going to use. The API takes POST requests, so we specify the URL and the (somewhat complex) JSON query. The query is basically just a filter statement, plus a specification that we want the response in JSON-stat format.

In [6]:
GINI_URL = 'http://data.ssb.no/api/v0/no/table/09114'

In [7]:
GINI_PAYLOAD = {
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "all",
        "values": [
          "*"
        ]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": [
          "Ginikoeffisient"
        ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "top",
        "values": [
            1
        ]
      }
    }
  ],
  "response": {
    "format": "json-stat"
  }
}
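The payloads in this notebook all share the same shape: every region, one contents code, the latest period, and sometimes one extra dimension. As a minimal sketch, a small helper could build them. The name `build_payload` and its parameters are hypothetical, not part of the SSB API; the dictionary layout is copied from `GINI_PAYLOAD` above.

```python
def build_payload(contents_code, extra_filters=None):
    """Build a query dict shaped like GINI_PAYLOAD (hypothetical helper).

    contents_code: the value for the "ContentsCode" filter.
    extra_filters: optional mapping of dimension code -> list of values,
                   e.g. {"TettSpredt": ["10"]}.
    """
    def item(code, values):
        return {"code": code, "selection": {"filter": "item", "values": list(values)}}

    # All regions first, as in the hand-written payloads
    query = [{"code": "Region", "selection": {"filter": "all", "values": ["*"]}}]
    # Optional extra dimensions go between Region and ContentsCode
    for code, values in (extra_filters or {}).items():
        query.append(item(code, values))
    query.append(item("ContentsCode", [contents_code]))
    # "top" with value 1 asks for the most recent period only
    query.append({"code": "Tid", "selection": {"filter": "top", "values": [1]}})
    return {"query": query, "response": {"format": "json-stat"}}


gini_payload = build_payload("Ginikoeffisient")
```

Under these assumptions, `build_payload("Ginikoeffisient")` produces the same dictionary as the hand-written `GINI_PAYLOAD`.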


Send the request.

In [8]:
ginidata = requests.post(GINI_URL, json=GINI_PAYLOAD)
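The call above does not check whether the request succeeded. A minimal sketch of a wrapper that fails loudly on HTTP errors is shown below. The name `fetch_table`, the `post` parameter (there only so the helper can be exercised offline), and the 30-second timeout are assumptions for illustration.

```python
def fetch_table(url, payload, post=None, timeout=30):
    """POST a JSON query and return the response, raising on HTTP errors."""
    if post is None:
        # Imported lazily so the helper can be tested without network access
        import requests
        post = requests.post
    response = post(url, json=payload, timeout=timeout)
    # Raises an exception for 4xx/5xx status codes instead of failing later
    response.raise_for_status()
    return response
```

It could then replace the bare call, for example `ginidata = fetch_table(GINI_URL, GINI_PAYLOAD)`.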

Convert the JSON-stat response to a pandas data frame.

In [9]:
ginidf = pyjstat.from_json_stat(ginidata.json(object_pairs_hook=OrderedDict), naming='id')[0]
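To make the conversion less of a black box, here is a rough sketch of the idea behind it: a JSON-stat dataset stores its values as one flat list in row-major order, and the dimension metadata tells us which category combination each value belongs to. This is not pyjstat's implementation. The function name `jsonstat_to_rows` is hypothetical, and the sketch assumes every dimension has a `category.index` and that `value` is a plain list. The toy dataset is made up.

```python
from itertools import product


def jsonstat_to_rows(dataset):
    """Expand a JSON-stat style dataset dict into a list of row dicts."""
    dims = dataset["dimension"]
    ids = dims["id"]
    ordered = []
    for dim_id in ids:
        index = dims[dim_id]["category"]["index"]
        if isinstance(index, dict):
            # Mapping of category id -> position: sort ids by position
            cats = sorted(index, key=index.get)
        else:
            # Already a list of category ids in positional order
            cats = list(index)
        ordered.append(cats)
    rows = []
    # product() varies the last dimension fastest, matching row-major order
    for combo, value in zip(product(*ordered), dataset["value"]):
        row = dict(zip(ids, combo))
        row["value"] = value
        rows.append(row)
    return rows


# Made-up toy dataset with two dimensions of two categories each
toy = {
    "dimension": {
        "id": ["Region", "Tid"],
        "size": [2, 2],
        "Region": {"category": {"index": {"A": 0, "B": 1}}},
        "Tid": {"category": {"index": {"t0": 0, "t1": 1}}},
    },
    "value": [1.0, 2.0, 3.0, 4.0],
}
rows = jsonstat_to_rows(toy)
```

Each entry in `rows` is one observation, which is the long format that the data frames in this notebook use.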

In [10]:
POP_URL = 'http://data.ssb.no/api/v0/no/table/01222'
POP_PAYLOAD = {
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "all",
        "values": [
          "*"
        ]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": [
          "Folketallet1"
        ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "top",
        "values": [
            1
        ]
      }
    }
  ],
  "response": {
    "format": "json-stat"
  }
}


In [11]:
popdata = requests.post(POP_URL, json=POP_PAYLOAD)

In [12]:
popdf = pyjstat.from_json_stat(popdata.json(object_pairs_hook=OrderedDict), naming='id')[0]

In [13]:
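# ContentsCode and Tid are constant (one contents code, latest period), so drop them
popdf.drop(["ContentsCode", "Tid"], axis=1, inplace=True)
ginidf.drop(["ContentsCode", "Tid"], axis=1, inplace=True)

In [14]:
# Give the value columns descriptive names (folketall = population)
popdf.rename(columns={'value': 'folketall'}, inplace=True)
ginidf.rename(columns={'value': 'gini'}, inplace=True)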

In [17]:
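# Keep only municipalities (four-character region codes);
# the other rows are area totals and the country as a whole
ginidf = ginidf[ginidf['Region'].str.len()==4]
popdf = popdf[popdf['Region'].str.len()==4]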

Join the Gini data with the population data on the region code.

In [18]:
ad2 = pd.merge(popdf, ginidf, on='Region')
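`pd.merge` performs an inner join by default, so regions that appear in only one of the two frames are silently dropped. The sketch below, on made-up toy frames (not SSB data), shows one way to see what an inner join discards by running an outer join with `indicator=True`.

```python
import pandas as pd

# Made-up toy frames; the region codes and numbers are illustrative only
pop_toy = pd.DataFrame({"Region": ["0001", "0002", "0003"],
                        "folketall": [100, 200, 300]})
gini_toy = pd.DataFrame({"Region": ["0002", "0003", "0004"],
                         "gini": [0.20, 0.25, 0.30]})

# Default inner join: only regions present in both frames survive
inner = pd.merge(pop_toy, gini_toy, on="Region")

# Outer join with an indicator column to audit what the inner join dropped
audit = pd.merge(pop_toy, gini_toy, on="Region", how="outer", indicator=True)
unmatched = audit[audit["_merge"] != "both"]
```

Running the same audit on `popdf` and `ginidf` would list any municipalities missing from one of the tables.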

## Visualization

Create the first visualization, plotting Gini coefficients against municipality population (on a log scale).

In [40]:
pop_hover = HoverTool(tooltips=[
    ("Municipality_id", "@Region"),
    ("Gini", "@gini"),
    ("Total population", "@folketall")
])

In [41]:
source = ColumnDataSource(data=ad2)
p = figure(tools=[pop_hover],
           x_axis_type='log', 
           x_axis_label="Municipality size", 
           y_axis_label="Gini coefficient", 
           title="Municipality size and income inequality in Norway")
p.circle(x='folketall', y='gini', source=source)
show(p)
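The plot shows the relationship visually. To put a single number on it, one option is the Pearson correlation between the logarithm of population (matching the log axis) and the Gini coefficient. The sketch below uses only the standard library. The helper name `pearson` is hypothetical, and the numbers are made-up toy values, not Statistics Norway data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy numbers, NOT Statistics Norway data
toy_population = [100, 1_000, 10_000, 100_000]
toy_gini = [0.20, 0.22, 0.24, 0.26]

# Correlate against log10(population), since the plot uses a log x-axis
log_pop = [math.log10(p) for p in toy_population]
r = pearson(log_pop, toy_gini)
```

With the real data, the inputs would be the `folketall` and `gini` columns of `ad2`, assuming neither column contains zeros or missing values.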

## Population density vs inequality

We don't really expect a very different result here, because population size and population density are strongly correlated.

Basically, we repeat the process, but with population density data. Density is expressed as a share: the number of people living in densely populated areas divided by the total population of the municipality. This takes two separate queries, whose results are merged.

In [20]:
D_URL = 'http://data.ssb.no/api/v0/no/table/05212'
D_PAYLOAD1 = {
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "all",
        "values": [
          "*"
        ]
      }
    },
    {
      "code": "TettSpredt",
      "selection": {
        "filter": "item",
        "values": [
          "10"
        ]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": [
          "Folkemengde"
        ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "top",
        "values": [
            1
        ]
      }
    }
  ],
  "response": {
    "format": "json-stat"
  }
}


D_PAYLOAD2 = {
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "all",
        "values": [
          "*"
        ]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": [
          "Folkemengde"
        ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "top",
        "values": [
            1
        ]
      }
    }
  ],
  "response": {
    "format": "json-stat"
  }
}


In [21]:
densedata = requests.post(D_URL, json=D_PAYLOAD1)

In [22]:
totaldata = requests.post(D_URL, json=D_PAYLOAD2)

In [23]:
densedf = pyjstat.from_json_stat(densedata.json(object_pairs_hook=OrderedDict), naming='id')[0]
totaldf = pyjstat.from_json_stat(totaldata.json(object_pairs_hook=OrderedDict), naming='id')[0]

In [42]:
# Filter out rows that are not municipalities: area totals and the country as a whole
densedf = densedf[densedf['Region'].str.len()==4]
totaldf = totaldf[totaldf['Region'].str.len()==4]

In [25]:
densedf.drop(["ContentsCode", 'Tid', 'TettSpredt'], axis=1, inplace=True)
totaldf.drop(["ContentsCode", "Tid"], axis=1, inplace=True)
densedf.rename(columns={'value': 'tettbygd'}, inplace=True)
totaldf.rename(columns={'value':'totalt'}, inplace=True)

In [26]:
denseshare_df = pd.merge(densedf, totaldf, on='Region')
denseshare_df['denseshare'] = denseshare_df['tettbygd']/denseshare_df['totalt']

In [27]:
denseshare_df.head()

| | Region | tettbygd | totalt | denseshare |
|---|---|---|---|---|
| 0 | 101 | 26491 | 30544 | 0.867306 |
| 1 | 102 | 0 | 0 | NaN |
| 2 | 103 | 0 | 0 | NaN |
| 3 | 104 | 31634 | 32182 | 0.982972 |
| 4 | 105 | 49584 | 54678 | 0.906836 |
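The output above contains rows where both counts are 0, so the share is 0/0 and comes out as a missing value. One way to deal with this, sketched here on a made-up toy frame (not SSB data), is to keep only rows with a positive total before dividing.

```python
import pandas as pd

# Made-up toy frame with the same column names as denseshare_df
toy = pd.DataFrame({
    "Region": ["a", "b", "c"],
    "tettbygd": [80, 0, 30],
    "totalt": [100, 0, 60],
})

# Keep only rows with a positive total, then compute the share
toy = toy[toy["totalt"] > 0].copy()
toy["denseshare"] = toy["tettbygd"] / toy["totalt"]
```

In this notebook the empty rows also disappear in the next step, because the merge with `ginidf` keeps only regions present in both frames.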


In [28]:
dense_gini_df = pd.merge(denseshare_df, ginidf, on='Region')

In [29]:
dense_gini_df.head()

| | Region | tettbygd | totalt | denseshare | gini |
|---|---|---|---|---|---|
| 0 | 101 | 26491 | 30544 | 0.867306 | 0.232 |
| 1 | 104 | 31634 | 32182 | 0.982972 | 0.253 |
| 2 | 105 | 49584 | 54678 | 0.906836 | 0.231 |
| 3 | 106 | 72937 | 78967 | 0.923639 | 0.24 |
| 4 | 111 | 2675 | 4511 | 0.592995 | 0.247 |


## Visualization

In [37]:
hover = HoverTool(tooltips=[
    ("Municipality_id", "@Region"),
    ("Gini", "@gini"),
    ("Total population", "@totalt")
])

In [39]:
source = ColumnDataSource(data=dense_gini_df)
p = figure(tools=[hover],
           x_axis_label="Share of population living in dense areas", 
           y_axis_label="Gini coefficient", 
           title="Population density and income inequality in Norway")
p.circle(x='denseshare', y='gini', source=source)
show(p)

That's all, folks!