# Norwegian Postcodes

For a project, I needed the Norwegian postcode areas. Luckily, Norway is pretty open with its data, so you can download the dataset from [Kartkatalog](https://kartkatalog.geonorge.no/metadata/kartverket/postnummeromrader/462a5297-33ef-438a-82a5-07fff5799be3).

This notebook walks through how the dataset was extracted. In the end we want a pickled dataframe that has:
1. The polygon of points that represents the postal area
2. Postcode "postnummer"
3. Postal area "poststed"
4. Municipality "kommune"

In [1]:
from osgeo import ogr
import json
import pandas as pd

In [2]:
reader = ogr.Open('Basisdata_0000_Norge_25833_Postnummeromrader_GML.gml')
layer = reader.GetLayer()

In [3]:
layer.GetFeature(50)

Let's look at an arbitrary entry, in this case feature 50.

In [4]:
json.loads(layer.GetFeature(50).ExportToJson())

{'type': 'Feature',
 'geometry': {'type': 'Polygon',
  'coordinates': [[[216726.090049086, 6622104.24964588],
    [217142.690048385, 6621595.3796504],
    [216797.980048704, 6619853.21964821],
    [216745.830048758, 6619635.55964785],
    [216730.610048774, 6619572.02964774],
    [216612.700048942, 6619460.71964664],
    [216568.740049007, 6619432.69964622],
    [216353.380049331, 6619363.5196441],
    [216312.480049393, 6619349.5696437],
    [216116.27004969, 6619273.32964178],
    [216071.510049756, 6619244.56964134],
    [216017.830049836, 6619210.48964082],
    [215968.880049909, 6619184.18964035],
    [215911.270049996, 6619150.81963978],
    [215853.910050082, 6619120.23963922],
    [215832.060050115, 6619107.12963901],
    [215786.250050184, 6619087.56963856],
    [215722.310050282, 6619061.08963792],
    [215658.870050379, 6619034.56963729],
    [215625.33005043, 6619019.48963696],
    [215560.45005053, 6618993.97963632],
    [215528.630050579, 6618980.909636],
    [215509.6600

Here we see that each feature is structured roughly like this:
1. type (feature)
2. geometry
    1. type (geometry)
    2. coordinates
3. properties
    1. lokalId
    2. navnerom
    3. versjonId
    4. datauttaksdato
    5. opphav
    6. målemetode
    7. nøyaktighet
    8. postnummer
    9. poststed
    10. kommune
    11. oppdateringsdato
4. id
    
We would like to extract the id, coordinates, postnummer, poststed, and kommune. Note also that the coordinates are not in latitude/longitude degrees; we will need to correct that later. First, let's extract the fields into a dictionary, and then put that dictionary into a dataframe.

In [5]:
# create an empty dictionary to populate for the dataframe
data_dict = dict()

# cycle through each feature
for i in range(layer.GetFeatureCount()):
    # extract and export as json 
    json_element = json.loads(layer.GetFeature(i).ExportToJson())
    
    # extract the relevant data from the json
    node_id = json_element['id']
    coordinates = json_element['geometry']['coordinates']
    postnummer = json_element['properties']['postnummer']
    poststed = json_element['properties']['poststed']
    kommune = json_element['properties']['kommune']
    
    # add a dictionary of the extracted fields, keyed by the feature id
    data_dict[node_id] = {'postnummer': postnummer, 'poststed': poststed,
                          'kommune': kommune, 'coordinates': coordinates}

In [9]:
# set the column names for the dataframe
cols = ['postnummer', 'poststed', 'kommune', 'coordinates']    

# create the dataframe using dictionary keys as the rows
df = pd.DataFrame.from_dict(data_dict, orient='index', columns=cols)
df.head()

| | postnummer | poststed | kommune | coordinates |
|---|---|---|---|---|
| 0 | 1339 | VØYENENGA | 219 | [[[245469.990020313, 6650942.7998409], [245370... |
| 1 | 1361 | ØSTERÅS | 219 | [[[253912.000015147, 6653573.99987799], [25395... |
| 2 | 1354 | BÆRUMS VERK | 219 | [[[248576.47001836, 6654230.96985486], [248488... |
| 3 | 1346 | GJETTUM | 219 | [[[249541.000017632, 6651700.99986], [249317.0... |
| 4 | 1362 | HOSLE | 219 | [[[253912.000015147, 6653573.99987799], [25385... |


Let's also look at the range of postcodes:

In [10]:
print("Max Postcode = {}".format(df.postnummer.max()))
print("Min Postcode = {}".format(df.postnummer.min()))

Max Postcode = 9990
Min Postcode = 10
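
Norwegian postcodes are written with four digits, so a minimum of 10 suggests the codes were read as integers and lost their leading zeros (10 would normally be written 0010). A minimal sketch of restoring the padded form for display or joins follows; `format_postnummer` is a hypothetical helper and not part of the cells above.

```python
def format_postnummer(code: int) -> str:
    """Render an integer postcode as a zero-padded four-digit string."""
    return f"{int(code):04d}"


# The minimum and maximum values from the dataframe
print(format_postnummer(10))    # 0010
print(format_postnummer(9990))  # 9990
```

On the dataframe itself, something like `df.postnummer.astype(str).str.zfill(4)` should give the same result, assuming the column really does hold integers.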


Great. That's about the range given elsewhere for Norwegian postcodes. Finally, let's pickle the dataframe for use here and elsewhere.

In [11]:
df.to_pickle("./postnummerpoly.pkl")
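
The pickled coordinates are still in the projected system rather than latitude/longitude. The `25833` in the GML filename presumably refers to EPSG:25833 (ETRS89 / UTM zone 33N); that is an assumption, not something checked in this notebook. Under that assumption, the sketch below converts easting/northing pairs to degrees in pure Python, using the standard Krüger series for the inverse transverse Mercator projection truncated at third order. The names `utm33_to_latlon` and `ring_to_latlon` are hypothetical helpers. In practice a projection library such as pyproj or GDAL's `osr` module would be the more robust choice.

```python
import math

# Assumed projection parameters: GRS80 ellipsoid, UTM zone 33N
A_AXIS = 6378137.0           # semi-major axis in metres
F = 1 / 298.257222101        # flattening
K0 = 0.9996                  # UTM scale factor on the central meridian
E0 = 500000.0                # false easting in metres
LON0 = math.radians(15.0)    # central meridian of zone 33


def utm33_to_latlon(easting, northing):
    """Convert UTM zone 33N metres to (latitude, longitude) in degrees."""
    n = F / (2 - F)
    # Rectifying radius of the ellipsoid
    radius = A_AXIS / (1 + n) * (1 + n**2 / 4 + n**4 / 64)
    # Series coefficients, truncated at n**3
    beta = [n / 2 - 2 * n**2 / 3 + 37 * n**3 / 96,
            n**2 / 48 + n**3 / 15,
            17 * n**3 / 480]
    delta = [2 * n - 2 * n**2 / 3 - 2 * n**3,
             7 * n**2 / 3 - 8 * n**3 / 5,
             56 * n**3 / 15]

    xi = northing / (K0 * radius)
    eta = (easting - E0) / (K0 * radius)
    xi_p = xi - sum(b * math.sin(2 * j * xi) * math.cosh(2 * j * eta)
                    for j, b in enumerate(beta, start=1))
    eta_p = eta - sum(b * math.cos(2 * j * xi) * math.sinh(2 * j * eta)
                      for j, b in enumerate(beta, start=1))

    # Conformal latitude, then geodetic latitude
    chi = math.asin(math.sin(xi_p) / math.cosh(eta_p))
    lat = chi + sum(d * math.sin(2 * j * chi)
                    for j, d in enumerate(delta, start=1))
    lon = LON0 + math.atan2(math.sinh(eta_p), math.cos(xi_p))
    return math.degrees(lat), math.degrees(lon)


def ring_to_latlon(ring):
    """Convert one polygon ring of [easting, northing] pairs."""
    return [utm33_to_latlon(e, n) for e, n in ring]


# First vertex of the first row shown in df.head()
lat, lon = utm33_to_latlon(245469.990020313, 6650942.7998409)
print(round(lat, 3), round(lon, 3))
```

If the assumptions hold, the printed point should land a little west of Oslo. Each entry in the `coordinates` column is a list of rings, so `ring_to_latlon` would be applied to every ring in turn. Note that the result is (latitude, longitude), the reverse of the easting-first order used in the source data.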